
Geospatial Vector Search: Building an AI-Powered Geo-Aware News Search

Creating a geo-aware news search using News API, PyCountry, GeoPy, and Qdrant vector database.

Introduction

Geospatial applications increasingly need to answer questions that combine meaning and place, such as “what is being reported about scientific breakthroughs in North America?” Vector databases handle the first half. They store text, images, and other data as embedding vectors and retrieve the items whose vectors are closest to a query, which makes them a natural fit for semantic search.

Some vector databases also handle the second half. Alongside each vector they store structured attributes such as names, categories, numbers, and geographic coordinates. They can filter search results on those attributes, for example by checking whether a point lies inside a rectangle, a circle, or a polygon. Dedicated indexes on these fields keep such filtered searches fast.

Combining semantic similarity with geographic filtering lets an application return results that are both relevant to the query and tied to the region a user cares about. That is exactly what a geo-aware news search needs.

Qdrant: A Vector Database with Geospatial Search

Qdrant’s geo features make it a strong option for geospatial search among vector databases.

The following are the key aspects of these features and how they set Qdrant apart from other vector databases:

  1. Geo Filter: Qdrant can filter query results by location using a rectangle (geo bounding box), a circle (geo radius), or a polygon (geo polygon). Polygon filtering lets you express complex geographic boundaries, such as a country’s outline. This is useful when data points must be filtered by intricate spatial regions. Support for polygon filters distinguishes Qdrant from many other vector databases.
  2. Geo Hash Layer: Qdrant’s geo index divides the world into rectangles called geohashes. During indexing, each entry is assigned to the geohash that contains its location. At query time, Qdrant first identifies the geohashes that overlap the query area and then checks only the candidates inside them. This narrows the search space and speeds up spatial queries.
  3. Payload Support: Besides the vector, Qdrant stores an additional payload (a JSON object) with each point. Search results can be filtered on payload values of many types and with many kinds of conditions, including geo-locations. Retrieval is therefore not limited to location: you can filter on any attribute stored alongside the vectors.
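The geohash idea from point 2 can be made concrete with a small encoder. The sketch below follows the standard public geohash algorithm. It is not Qdrant’s internal code, and the function name `geohash_encode` is hypothetical. Each bit halves either the longitude or the latitude interval, alternating and starting with longitude. Every 5 bits become one base32 character, so longer hashes describe smaller rectangles.

```python
# Base32 alphabet used by the standard geohash encoding
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a (lat, lon) point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0      # bits collected for the current character
    value = 0     # 5-bit value being built
    even = True   # True -> the next bit refines longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6))    # ezs42
print(geohash_encode(42.61, -5.59))  # ezs42 (a nearby point falls in the same cell)
```

Because nearby points share a hash prefix, an index can first look up the few cells that overlap a query area. It then checks exact coordinates only for the candidates inside those cells.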

News API and PyCountry: Creating News Article Dataset

The idea for a geo-aware news search engine came from the growing importance of location in how people consume news. Geospatial technologies and vector databases make it possible to improve the user experience by delivering news that is more relevant to a reader’s context.

The main objective was to build a platform that combines the efficient similarity search of a vector database with spatial filtering. Users then receive news tailored to their geographic interests and preferences.

Let’s see how we can leverage vector databases for geo-aware news search.

To create the news article dataset, we’ll use News API. It requires an API key, which you can get by registering at newsapi.org.

NewsAPI.org provides a user-friendly API for accessing news from over 30,000 global sources. The API is freely available for non-commercial projects, including open-source initiatives and in-development commercial projects. However, registration is required to obtain an ‘API key’.

For retrieving country names, we will be using the PyCountry module. PyCountry is a Python module designed to facilitate working with country-related data. This module offers a range of functionalities, including the retrieval of information about countries such as their official names, common names, and ISO 3166 codes (both alpha-2 and alpha-3).

We’ll begin by installing the dependencies.

%%capture
pip install -q newsapi-python
pip install -q pycountry
pip install -q sentence-transformers qdrant-client

Using the API key, we’ll write a script that asks for a country and a news category, fetches the top headlines for that combination, and saves them to a CSV file.

# Import necessary libraries
import csv
from newsapi.newsapi_client import NewsApiClient
import pycountry

# Initialize NewsAPI client with your API key
newsapi = NewsApiClient(api_key='your-api-key')

# Map every country name to its ISO 3166 alpha-2 code (built once, outside the loop)
countries = {country.name: country.alpha_2 for country in pycountry.countries}

# Categories accepted by the top-headlines endpoint, in the order of the menu below
categories = ['business', 'entertainment', 'general', 'health', 'science', 'technology']

# Loop until the user decides to exit
while True:
    # Get the name of the country from the user as input
    input_country = input("Country: ").strip()

    # Look up the alpha-2 code; ask again if the country name is not recognized
    code = countries.get(input_country.title())
    if code is None:
        print(f"Unknown country: {input_country}")
        continue

    # Get the user's interest category (menu number or category name)
    option = input("Which category are you interested in?\n1.Business\n2.Entertainment\n3.General\n4.Health\n5.Science\n6.Technology\n\nEnter here: ").strip().lower()
    if option.isdigit() and 1 <= int(option) <= len(categories):
        category = categories[int(option) - 1]
    elif option in categories:
        category = option
    else:
        print(f"Unknown category: {option}")
        continue

    # Fetch top headlines based on the user's choices
    top_headlines = newsapi.get_top_headlines(category=category, language='en', country=code.lower())
    headlines = top_headlines['articles']

    # Display news for the user and write it to CSV
    if headlines:
        with open('news_data.csv', 'w', newline='', encoding='utf-8') as csvfile:
            csv_writer = csv.writer(csvfile)
            csv_writer.writerow(['News Article', 'Country'])

            for article in headlines:
                title = article['title']
                # Titles usually end with " - Source": move the source to the front
                head, sep, source = title.rpartition(" - ")
                if sep:
                    if "news" in source.lower():
                        news_title = f"{source}: {head}"
                    else:
                        news_title = f"{source} News: {head}"
                else:
                    news_title = title

                # Write data to CSV
                csv_writer.writerow([news_title, input_country])

                # Print for user readability
                print(news_title)

        print(f"CSV file 'news_data.csv' created successfully with news articles for {input_country}.")
    else:
        print(f"Sorry, no articles found for {input_country}.")

    # Ask the user if they want to search again
    option = input("Do you want to search again [Yes/No]? ")
    if option.strip().lower() != 'yes':
        break
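The title-reformatting step can be pulled out into a small pure function, which makes it easy to test without calling News API. This is a sketch, and the name `format_title` is hypothetical. It splits on the last " - " (space, hyphen, space) rather than on any hyphen, so hyphenated words such as "drug-resistant" are left intact.

```python
def format_title(title: str) -> str:
    """Move a trailing ' - Source' suffix to the front as 'Source News: ...'."""
    head, sep, source = title.rpartition(" - ")
    if not sep:
        # No ' - Source' suffix: keep the title unchanged
        return title
    if "news" in source.lower():
        # The source name already contains "News", e.g. "BBC News"
        return f"{source}: {head}"
    return f"{source} News: {head}"


print(format_title("Starts With A Bang podcast #101 - Quantum Computing - Big Think"))
# Big Think News: Starts With A Bang podcast #101 - Quantum Computing
```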

I ran the script above 11 times and saved each run under a different file name, since every run overwrites news_data.csv. Then I merged the 11 files into one DataFrame.

import pandas as pd
from sklearn.utils import shuffle

# List of file names, one per run of the script above
file_names = ['/content/news_data.csv', '/content/news_data1.csv', '/content/news_data2.csv', '/content/news_data3.csv',
              '/content/news_data4.csv', '/content/news_data5.csv', '/content/news_data6.csv', '/content/news_data7.csv',
              '/content/news_data8.csv', '/content/news_data9.csv', '/content/news_data10.csv']

# Read every CSV file and concatenate them into a single DataFrame
merged_data = pd.concat([pd.read_csv(file) for file in file_names], ignore_index=True)

# Shuffle the rows so articles from different runs are interleaved
shuffled_data = shuffle(merged_data)

# Display the shuffled DataFrame
print(shuffled_data)

# Assumed step: save the result. The index is written as an unnamed first column,
# which the next section reads back from "News Article.csv" and renames to Id.
shuffled_data.to_csv('/content/News Article.csv')

Next, let’s assign a latitude and longitude to each country. Our dataset contains only country names, so we’ll use the single representative point that the geocoder returns for each country, which is roughly its center.

GeoPy: Getting Latitude and Longitude

We’ll start by renaming the unnamed index column to Id. Every point stored in Qdrant needs a unique ID (an unsigned integer or a UUID), and this column will provide it.

import pandas as pd
data = pd.read_csv("/content/News Article.csv")
data = data.rename(columns = {'Unnamed: 0':'Id'})
data.sample(3)

Then, we’ll install the GeoPy dependency.

GeoPy is a Python library for geocoding, which means converting addresses to geographic coordinates (latitude and longitude) and back. For example, it can turn an address like “1600 Amphitheatre Parkway, Mountain View, CA” into the corresponding coordinates. GeoPy also supports reverse geocoding, which returns location details or addresses for given coordinates. It can calculate distances between two points on the Earth’s surface using geodesic or great-circle distance. GeoPy works with multiple geocoding services, including OpenStreetMap Nominatim, Google Geocoding API, and Bing Maps API.

%%capture
pip install geopy

Using the Nominatim geocoder, we’ll define a helper function that returns the latitude and longitude for a country name. We’ll then use it to add lat and lon columns to our dataset.

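# Initialize the Nominatim geocoder
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
geolocator = Nominatim(user_agent="MyApp")

# Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# Function to get latitude and longitude for one country name
def get_lat_lon(country):
    location = geocode(country)
    if location:
        return location.latitude, location.longitude
    return None, None

# Geocode each distinct country once, then map the result onto every row
coords = {country: get_lat_lon(country) for country in data['Country'].unique()}
data['lat'] = data['Country'].map(lambda country: coords[country][0])
data['lon'] = data['Country'].map(lambda country: coords[country][1])
data.head()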

Now, let’s transform news articles into vectors.

Transforming Texts into Vectors

Hugging Face offers an extensive range of pre-trained models for various purposes. For our semantic search, we’ll use the multi-qa-MiniLM-L6-cos-v1 model, which was designed specifically for this task.

This model belongs to the sentence-transformers family. It maps sentences and paragraphs to a 384-dimensional dense vector space and was trained on 215 million question-answer pairs from diverse sources.

Keep in mind the model’s limitations. Input is limited to 512 word pieces, and longer text is truncated. The model was also trained only on inputs of up to 250 word pieces, so it may perform worse on longer texts.

In our case, we’ll encode the “News Article” column, where most headlines have at most 50 words. Using the sentence transformer model, we’ll add a vector column that holds the embedding of each headline.

from sentence_transformers import SentenceTransformer
from tqdm import tqdm
tqdm.pandas()
# load the model
model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
# encode all the news article with the model
data["vector"] = data["News Article"].progress_apply(lambda x: model.encode(x.lower()))
data.head()

Now that our vectors are ready, let’s move to the geospatial vector search.

Geospatial Vector Search

Now we need a collection to store the vectors we created. We’ll initialize an in-memory Qdrant client and create a collection. I’m using the Qdrant vector database here because of its advanced geo filtering capabilities.

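from qdrant_client import QdrantClient, models
from qdrant_client.http.models import *

# In-memory instance: convenient for experiments, data is lost when the process exits
client = QdrantClient(":memory:")

# Create (or re-create) the collection for 384-dimensional embeddings
client.recreate_collection(
    collection_name="geo_collection",
    vectors_config=VectorParams(size=384, distance=Distance.DOT),
)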
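The collection uses Distance.DOT. According to its model card, multi-qa-MiniLM-L6-cos-v1 produces normalized (unit-length) embeddings, and for unit vectors the dot product equals cosine similarity. The toy sketch below checks that identity in plain Python, using made-up 3-dimensional vectors in place of the real 384-dimensional ones.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for sentence embeddings
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

print(round(cosine(a, b), 6))                     # 0.733333
print(round(dot(normalize(a), normalize(b)), 6))  # 0.733333
```

This is also why choosing Distance.COSINE would rank results the same way here. Qdrant implements cosine similarity as a dot product over vectors that it normalizes on upload.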

We’ll create a payload index for the “News Article” field in the “geo_collection” collection, by applying text indexing with specific parameters such as word tokenization, minimum and maximum token lengths, and optional lowercase conversion.

client.create_payload_index(
    collection_name="geo_collection",
    field_name="News Article",
    field_schema=models.TextIndexParams(
        type="text",
        tokenizer=models.TokenizerType.WORD,
        min_token_len=2,
        max_token_len=30,
        lowercase=True,
    ),
)

Now, we’ll define a helper function that inserts one Pandas row at a time in our created collection.

def post_qdrant(row):
    """Inserting each row separately for simplicity. Can be optimized through inserting multiple rows at once."""

    # Create a payload without the vector, lat, and lon (the last three columns)
    row_payload = row.iloc[:-3].to_dict()

    # Add lat and lon to the payload as a dictionary under the key "location"
    row_payload["location"] = row[["lat", "lon"]].to_dict()

    # Extract the vector from the "vector" column and convert it to a list
    row_vector = row["vector"].tolist()

    # Qdrant point IDs must be unsigned integers (or UUIDs), so cast to a plain int
    row_id = int(row["Id"])

    # Upsert the point into the collection through the client
    operation_info = client.upsert(
        collection_name="geo_collection",
        wait=True,
        points=[
            PointStruct(id=row_id, vector=row_vector, payload=row_payload),
        ]
    )
    return operation_info

Then, we’ll apply the helper function to our dataset.

data.progress_apply(lambda x: post_qdrant(x), axis=1)
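The docstring of post_qdrant notes that inserting several rows per request would be faster. A minimal sketch of that idea is a batching helper. The name `batched` is hypothetical here; Python 3.12 ships a similar `itertools.batched` that yields tuples. The usage with the article’s `client`, `data`, and `PointStruct` is shown only as comments, so the block runs on its own. That usage assumes the same columns and payload layout as post_qdrant.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical usage with the objects defined earlier in this article:
# for batch in batched(data.to_dict("records"), 64):
#     client.upsert(
#         collection_name="geo_collection",
#         points=[
#             PointStruct(
#                 id=int(r["Id"]),
#                 vector=r["vector"].tolist(),
#                 payload={"Id": int(r["Id"]), "News Article": r["News Article"],
#                          "Country": r["Country"],
#                          "location": {"lat": r["lat"], "lon": r["lon"]}},
#             )
#             for r in batch
#         ],
#     )

print([len(b) for b in batched(range(150), 64)])  # [64, 64, 22]
```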

You can run a semantic search by calling client.search with the encoded query. I used the search term “scientific breakthrough” and limited the results to 5.

search_term = "scientific breakthrough"
search_result = client.search(
    collection_name="geo_collection",
    query_vector=model.encode(search_term),
    limit=5,
)
search_result

The results are sorted by score in descending order. The headline that is semantically closest to the search term, which here literally contains “Scientific breakthrough”, has the highest score.

[ScoredPoint(id=23, version=0, score=0.5760013461112976, payload={'Id': 23, 'News Article': "New York Post News: 'Scientific breakthrough' leads to discovery of first antibiotic that kills drug-resistant bacteria in 50 years", 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=40, version=0, score=0.3289460241794586, payload={'Id': 40, 'News Article': 'Futurism News: Scientists Concerned About Devices That Literally Read Your Mind', 'Country': 'New Zealand', 'location': {'lat': -41.5000831, 'lon': 172.8344077}}, vector=None, shard_key=None),
 ScoredPoint(id=103, version=0, score=0.3289460241794586, payload={'Id': 103, 'News Article': 'Futurism News: Scientists Concerned About Devices That Literally Read Your Mind', 'Country': 'Canada', 'location': {'lat': 61.0666922, 'lon': -107.991707}}, vector=None, shard_key=None),
 ScoredPoint(id=72, version=0, score=0.31487736105918884, payload={'Id': 72, 'News Article': 'Big Think News: Starts With A Bang podcast #101 - Quantum Computing', 'Country': 'United states', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=48, version=0, score=0.2746903598308563, payload={'Id': 48, 'News Article': 'IndiaTimes News: 4 mushroom recipes that can help boost vitamin D in winter', 'Country': 'United Kingdom', 'location': {'lat': 54.7023545, 'lon': -3.2765753}}, vector=None, shard_key=None)]

Now, let’s filter on the “News Article” field instead. This uses scroll, which returns stored points that satisfy a filter without ranking them by similarity. With a MatchText condition, a point is returned only if its “News Article” text matches the search text.

search_term = "Why fears over a 'tripledemic' are surging"
search_result = client.scroll(
    collection_name="geo_collection",
    scroll_filter=Filter(
        must=[
            FieldCondition(
                key="News Article",
                match=MatchText(text=search_term),
            ),
        ]
    ),
    limit=3,
    with_payload=True,
)
search_result

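We set the limit to 3 but got only one result. MatchText returns only points whose text contains the query. With a full-text index, every word of the query must appear in the text; without one, Qdrant checks for the query as a substring. Either way, only one headline qualifies here. The second element of the returned tuple, None, is the offset of the next page, meaning there is nothing more to scroll through.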

([Record(id=34, payload={'Id': 34, 'News Article': 'The Hill News: Why fears over a 'tripledemic' are surging', 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None)],
 None)
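The token-based behavior of a full-text match can be approximated in a few lines. This sketch assumes that the index parameters above (lowercasing, word tokens of 2 to 30 characters) work as shown. Qdrant’s actual tokenizer may split text differently, and the names `tokenize` and `matches_text` are hypothetical. The sketch shows why “scientific breakthrough” would not match “Scientists ...”: full-text matching compares words, not meanings.

```python
import re


def tokenize(text, min_len=2, max_len=30):
    """Lowercase, split on non-word characters, and keep tokens within the length bounds."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if min_len <= len(t) <= max_len}


def matches_text(field_value, query):
    """True if every query token also appears among the field's tokens."""
    return tokenize(query) <= tokenize(field_value)


headlines = [
    "The Hill News: Why fears over a 'tripledemic' are surging",
    "Futurism News: Scientists Concerned About Devices That Literally Read Your Mind",
]
query = "Why fears over a 'tripledemic' are surging"
print([h for h in headlines if matches_text(h, query)])
# ["The Hill News: Why fears over a 'tripledemic' are surging"]
```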

Now, let’s add a geo bounding box filter, defined by its top-left and bottom-right corners.

The Geo Bounding Box search involves defining a rectangular area on the Earth’s surface using coordinates of the upper left (top_left) and lower right (bottom_right) corners. Here, the rectangle is specified by longitude and latitude values, and locations inside this rectangle will match the condition.
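The rectangle test described above can be sketched in plain Python. The sketch assumes the usual convention that top_left is the north-west corner (larger latitude, smaller longitude) and that the box does not cross the antimeridian. Qdrant’s own handling of edges and of antimeridian-crossing boxes may differ. The corners below are illustrative and are not the ones used in the query that follows. The helper name `in_bounding_box` is hypothetical.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Check whether (lat, lon) lies inside a box given by (lat, lon) corner tuples."""
    (top, left), (bottom, right) = top_left, bottom_right
    return bottom <= lat <= top and left <= lon <= right


# Illustrative box roughly covering the contiguous United States
top_left = (50.0, -125.0)
bottom_right = (24.0, -66.0)

print(in_bounding_box(39.7837304, -100.445882, top_left, bottom_right))  # True  (US point)
print(in_bounding_box(54.7023545, -3.2765753, top_left, bottom_right))   # False (UK point)
```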

search_term = "scientific breakthrough"

search_result = client.search(
    collection_name="geo_collection",
    query_vector=model.encode(search_term),
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_bounding_box=models.GeoBoundingBox(
                    bottom_right=models.GeoPoint(lat=61.0666922, lon=-107.991707),
                    top_left=models.GeoPoint(lat=22.351115, lon=-3.2765753),
                ),
            ),
        ]
    ),
    limit=6,
)

search_result

Here are the results, again sorted by score in descending order. This time the geo filter removes the New Zealand headline (id 40) that appeared in the unfiltered search.

[ScoredPoint(id=23, version=0, score=0.5760013461112976, payload={'Id': 23, 'News Article': "New York Post News: 'Scientific breakthrough' leads to discovery of first antibiotic that kills drug-resistant bacteria in 50 years", 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=103, version=0, score=0.3289460241794586, payload={'Id': 103, 'News Article': 'Futurism News: Scientists Concerned About Devices That Literally Read Your Mind', 'Country': 'Canada', 'location': {'lat': 61.0666922, 'lon': -107.991707}}, vector=None, shard_key=None),
 ScoredPoint(id=72, version=0, score=0.31487736105918884, payload={'Id': 72, 'News Article': 'Big Think News: Starts With A Bang podcast #101 - Quantum Computing', 'Country': 'United states', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=48, version=0, score=0.2746903598308563, payload={'Id': 48, 'News Article': 'IndiaTimes News: 4 mushroom recipes that can help boost vitamin D in winter', 'Country': 'United Kingdom', 'location': {'lat': 54.7023545, 'lon': -3.2765753}}, vector=None, shard_key=None),
 ScoredPoint(id=37, version=0, score=0.2717020809650421, payload={'Id': 37, 'News Article': 'TechCrunch News: Harvard's robotic exoskeleton can improve walking, decrease falls in people with Parkinson's', 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=14, version=0, score=0.249062180519104, payload={'Id': 14, 'News Article': 'Samsung News: Samsung Teases Evolution of Mobile Experience in Major Cities Ahead of Unpacked 2024', 'Country': 'India', 'location': {'lat': 22.3511148, 'lon': 78.6677428}}, vector=None, shard_key=None)]

Besides bounding boxes, we can restrict results to a circular area, defined by a center (latitude and longitude) and a radius, using Geo Radius.

Geo Radius search focuses on finding locations within a circular area defined by a center point and a specified radius. Here, the center is specified by longitude and latitude, and locations within the circle with a radius of a certain distance (in meters) from the center will match the condition.
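The “within radius meters of the center” condition can be sketched with the haversine great-circle formula on a spherical Earth. That Qdrant uses exactly this formula internally is an assumption, and the helper names `haversine_m` and `in_radius` are hypothetical. The center below matches the one in the query that follows.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_radius(lat, lon, center_lat, center_lon, radius_m):
    """True if the point lies within radius_m meters of the center."""
    return haversine_m(lat, lon, center_lat, center_lon) <= radius_m


center = (39.78373, -100.445882)
print(in_radius(39.7837304, -100.445882, *center, 10_000))  # True  (US point, a few cm away)
print(in_radius(61.0666922, -107.991707, *center, 10_000))  # False (Canada point)
```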

search_term = "scientific breakthrough"

search_result = client.search(
    collection_name="geo_collection",
    query_vector=model.encode(search_term),
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_radius=models.GeoRadius(
                    center=models.GeoPoint(lat=39.78373, lon=-100.445882),
                    radius=10_000,  # meters
                ),
            ),
        ]
    ),
    limit=5,
)

search_result

Following are the results, where we can see that all the results are from the same location that we defined in our code.

[ScoredPoint(id=23, version=0, score=0.5760013461112976, payload={'Id': 23, 'News Article': "New York Post News: 'Scientific breakthrough' leads to discovery of first antibiotic that kills drug-resistant bacteria in 50 years", 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=72, version=0, score=0.31487736105918884, payload={'Id': 72, 'News Article': 'Big Think News: Starts With A Bang podcast #101 - Quantum Computing', 'Country': 'United states', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=37, version=0, score=0.2717020809650421, payload={'Id': 37, 'News Article': 'TechCrunch News: Harvard's robotic exoskeleton can improve walking, decrease falls in people with Parkinson's', 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=34, version=0, score=0.2585242688655853, payload={'Id': 34, 'News Article': 'The Hill News: Why fears over a 'tripledemic' are surging', 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=113, version=0, score=0.24455253779888153, payload={'Id': 113, 'News Article': "Fox Business News: Experts express concern over pharmaceutical giant Eli Lilly's new website connecting patients to obesity drugs", 'Country': 'United states', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None)]

Let’s try the Geo Polygon search, which is useful for irregularly shaped areas such as country boundaries. A polygon is defined by an exterior ring and optional interior rings (holes), each given as a list of longitude and latitude points. A location matches if it lies inside the exterior ring and outside every interior ring. Here, we define only the exterior ring.
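The matching rule (inside the exterior ring, outside every hole) can be sketched with the classic ray-casting test. This sketch treats longitude and latitude as flat x/y coordinates and ignores spherical geometry. Qdrant’s own implementation may handle edges and boundaries differently. The helper names are hypothetical, and rings are lists of [lon, lat] pairs, like polygon_coords in this section.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: an odd number of edge crossings to the east means inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lon, lat, exterior, interiors=()):
    """Inside the exterior ring and outside every interior ring (hole)."""
    if not point_in_ring(lon, lat, exterior):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in interiors)


polygon_coords = [[-130, 70], [-120, 0], [200, -50], [140, 20], [40, 100]]
print(point_in_polygon(-100.445882, 39.7837304, polygon_coords))  # True  (US point)
print(point_in_polygon(0, -80, polygon_coords))                   # False
```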

Before querying, let’s plot the polygon together with the locations in our “News Article” data to see which points fall inside it.

import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Coordinates of the GeoPolygon
polygon_coords = [
    [-130, 70],
    [-120, 0],
    [200, -50],
    [140, 20],
    [40, 100],
]

# Coordinates of the points in our dataset
lon = [-100.445882, 78.6677428, -107.991707, -3.2765753, 134.755, 172.8344077]
lat = [39.7837304, 22.3511148, 61.0666922, 54.7023545, -24.7761086, -41.5000831]

# Create a plot
fig, ax = plt.subplots()

# Plot the GeoPolygon
polygon = Polygon(polygon_coords, closed=True, edgecolor='b', alpha=0.3)
ax.add_patch(polygon)

# Plot the points
ax.scatter(lon, lat, color='r', label='Data Points')

# Set axis labels
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Show legend
ax.legend()

# Show the plot
plt.show()

The plot shows the polygon (blue) with the country locations from our dataset (red) overlaid.

Now, let’s query with the Geo Polygon coordinates.

polygon_coords = [
    [-130, 70],
    [-120, 0],
    [200, -50],
    [140, 20],
    [40, 100],
]
# Note: [200, -50] and [40, 100] lie outside the valid longitude (-180..180) and
# latitude (-90..90) ranges. The in-memory client accepted them, but a Qdrant
# server may reject them, in which case these vertices need adjusting.
search_term = "scientific breakthrough"

# Build the exterior ring from the [lon, lat] pairs.
# Qdrant expects a closed ring, so the first point is repeated at the end.
ring = polygon_coords + [polygon_coords[0]]
geo_polygon = models.GeoPolygon(
    exterior=models.GeoLineString(
        points=[models.GeoPoint(lon=lon, lat=lat) for lon, lat in ring]
    )
)

# Use the GeoPolygon as a filter in the search query
search_result = client.search(
    collection_name="geo_collection",
    query_vector=model.encode(search_term),
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_polygon=geo_polygon,
            )
        ]
    ),
)
search_result

Here are the results. We got 10 of them because no limit was set, and search returns 10 results by default.

[ScoredPoint(id=23, version=0, score=0.5760013461112976, payload={'Id': 23, 'News Article': "New York Post News: 'Scientific breakthrough' leads to discovery of first antibiotic that kills drug-resistant bacteria in 50 years", 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=40, version=0, score=0.3289460241794586, payload={'Id': 40, 'News Article': 'Futurism News: Scientists Concerned About Devices That Literally Read Your Mind', 'Country': 'New Zealand', 'location': {'lat': -41.5000831, 'lon': 172.8344077}}, vector=None, shard_key=None),
 ScoredPoint(id=103, version=0, score=0.3289460241794586, payload={'Id': 103, 'News Article': 'Futurism News: Scientists Concerned About Devices That Literally Read Your Mind', 'Country': 'Canada', 'location': {'lat': 61.0666922, 'lon': -107.991707}}, vector=None, shard_key=None),
 ScoredPoint(id=72, version=0, score=0.31487736105918884, payload={'Id': 72, 'News Article': 'Big Think News: Starts With A Bang podcast #101 - Quantum Computing', 'Country': 'United states', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=48, version=0, score=0.2746903598308563, payload={'Id': 48, 'News Article': 'IndiaTimes News: 4 mushroom recipes that can help boost vitamin D in winter', 'Country': 'United Kingdom', 'location': {'lat': 54.7023545, 'lon': -3.2765753}}, vector=None, shard_key=None),
 ScoredPoint(id=37, version=0, score=0.2717020809650421, payload={'Id': 37, 'News Article': 'TechCrunch News: Harvard's robotic exoskeleton can improve walking, decrease falls in people with Parkinson's', 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=34, version=0, score=0.2585242688655853, payload={'Id': 34, 'News Article': 'The Hill News: Why fears over a 'tripledemic' are surging', 'Country': 'United States', 'location': {'lat': 39.7837304, 'lon': -100.445882}}, vector=None, shard_key=None),
 ScoredPoint(id=57, version=0, score=0.249062180519104, payload={'Id': 57, 'News Article': 'Samsung News: Samsung Teases Evolution of Mobile Experience in Major Cities Ahead of Unpacked 2024', 'Country': 'Canada', 'location': {'lat': 61.0666922, 'lon': -107.991707}}, vector=None, shard_key=None),
 ScoredPoint(id=79, version=0, score=0.249062180519104, payload={'Id': 79, 'News Article': 'Samsung News: Samsung Teases Evolution of Mobile Experience in Major Cities Ahead of Unpacked 2024', 'Country': 'Canada', 'location': {'lat': 61.0666922, 'lon': -107.991707}}, vector=None, shard_key=None),
 ScoredPoint(id=14, version=0, score=0.249062180519104, payload={'Id': 14, 'News Article': 'Samsung News: Samsung Teases Evolution of Mobile Experience in Major Cities Ahead of Unpacked 2024', 'Country': 'India', 'location': {'lat': 22.3511148, 'lon': 78.6677428}}, vector=None, shard_key=None)]

These results show how pairing semantic vector search with geo filters makes an AI-powered, geo-aware news search possible.

Conclusion

With the help of News API and PyCountry, we created a custom news article dataset. Using GeoPy, we assigned coordinates to each country, which enabled geospatial vector search in the Qdrant vector database. Each geo filter (bounding box, radius, and polygon) narrowed the semantic search results by location.

I greatly enjoyed constructing my dataset and implementing it to create a geo-aware news search. Thank you for reading!



