I recently had the challenge of migrating images from one online platform to another. This became particularly interesting since the platform we needed to move the images to presented a GraphQL API (the "QL" stands for "Query Language", and "API" is short for Application Programming Interface).
Let's dive into the world of GraphQL and how we can use it in Python.
Contents
· Getting Started
· Our First GraphQL Query
· Preparing Pagination
· Extracting GraphQL Queries to Separate Files
· Splitting up logic
· Handling Errors
· A Finished Product
· Conclusion and Next Steps
Getting Started
The images that need to be migrated are associated with particular products. But not all of the products are available on the destination platform. It makes sense then to start by asking the destination platform what products were available.
Our First GraphQL Query
The SKU (Stock Keeping Unit) code of each product is common across both platforms, so we can start by querying the destination platform's API for the SKUs of all products. The basic query looks like this:
query {
  products {
    sku
  }
}
One of the great things about GraphQL is how instantly readable the syntax is. We can easily see that this is a query to request the SKUs of all products.
Querying in Python
We can use a Python library to execute this query. Doing so requires a bit of setup. We can use pip to download the gql (short for Graph Query Language) and aiohttp libraries. To handle the configuration parameters required by the API, it would also be useful to have the python-dotenv library.
We can then create a simple script to instantiate the GraphQL client:
from dotenv import dotenv_values
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport
env = dotenv_values()
headers = {"Authorization": f"Bearer {env['DEST_TOKEN']}"}
transport = AIOHTTPTransport(url=env['DEST_URL'], headers=headers)
client = Client(transport=transport)
Let's walk through this quickly. We're starting with the library for handling .env files, which are a great way of keeping configuration parameters such as passwords out of your code. Our .env file for this project looks something like this:
DEST_URL="https://destination-api.example.com"
DEST_TOKEN="super_secret_generated_token"
Our code can use these values to set up the header and HTTP transport layer used by the gql client, which then makes queries for us like this:
...
products_query = gql("""
    query {
      products {
        sku
      }
    }
""")
response = client.execute(products_query)
print(len(response['products']))
The gql function prepares the query string. Then the client sends the query and parses the JSON (JavaScript Object Notation) response into a dictionary containing the key "products", whose value is a list of dictionaries holding the product fields we requested, in this case SKUs. That list should look something like this:
[{'sku': '12345'}, {'sku': '23456'}, {'sku': '34567'}]
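Since the response is just a dictionary of lists and dictionaries, flattening it into a plain list of SKU strings is a one-line comprehension (shown here with the sample response above standing in for a live API call):

```python
# sample data shaped like the API response described above
response = {'products': [{'sku': '12345'}, {'sku': '23456'}, {'sku': '34567'}]}

# flatten the list of dicts into a plain list of SKU strings
skus = [product['sku'] for product in response['products']]
print(skus)  # ['12345', '23456', '34567']
```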
But looking at the length of that list, and comparing it to the number of expected products (numbering in the thousands), it's immediately clear that not all the products are being returned. What's wrong?
Preparing Pagination
To avoid sending a lot of information all at once, which might overwhelm both the requesting and responding servers, many APIs will only provide a small portion of the entries available, for example, 100 records at a time. The documentation of this product API shows that it permits these parameters:
- "first": asks for the first n items.
- "skip": the number of records to skip over before providing a response.
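To make the arithmetic concrete, here's a hypothetical helper (not part of the API or the final script) that builds the first/skip pair for a given zero-indexed page:

```python
def page_params(page_number, page_size=100):
    """Build the first/skip variables for a zero-indexed page."""
    return {"first": page_size, "skip": page_number * page_size}

print(page_params(0))  # {'first': 100, 'skip': 0}   -> records 1-100
print(page_params(1))  # {'first': 100, 'skip': 100} -> records 101-200
```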
In GraphQL syntax we can pass these parameters like this:
query {
  products(first: 100, skip: 100) {
    sku
  }
}
This would return us 100 products, starting with the 101st product. In other words, it returns the second "page" of records. But how many records are there? Do we need to prepare a separate query for each set? That would get pretty messy. Fortunately, GraphQL allows us to pass parameters to our query and use them in place of previously hard-coded values like this:
query ($first: Int!, $skip: Int!) {
  products(first: $first, skip: $skip) {
    sku
  }
}
GraphQL parsers understand that any word following a "$" is now considered a variable. Adding the exclamation point to the parameter type (Int) makes these required parameters, which means we'll get an error if we forget to include them.
We can now easily loop through all our products like this:
...
products_query = gql("""
    query ($first: Int!, $skip: Int!) {
      products(first: $first, skip: $skip) {
        sku
      }
    }
""")

page_size = 100
skip = 0
while True:
    vars = {"first": page_size, "skip": skip}
    response = client.execute(products_query, variable_values=vars)
    skip += page_size
    if not response['products']:
        break
    # fetch images and upload, etc.
Here we create an infinite loop and keep asking for more products, increasing the "skip" value by the page size on each iteration. When no products are returned, we break out of the loop.
Extracting GraphQL Queries to Separate Files
An obvious "code smell" is seeing code written in another language encapsulated in a string variable. We can easily extract our GraphQL query into its own file, naming it something like products_query.graphql, and load it with a function:
def load_query(path):
    with open(path) as f:
        return gql(f.read())

products_query = load_query('products_query.graphql')
This reads much more nicely than having a multi-line string containing our query, especially when we need to add more properties to the query later on. If the query file isn't available our script will crash, but that's actually for the best. There's no point in executing an empty query.
Splitting up logic
Our script is already getting pretty long. Let's try encapsulating our GraphQL logic in its own class.
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport

class ProductProvider:
    def __init__(self, conf):
        url = conf['DEST_URL']
        headers = {"Authorization": f"Bearer {conf['DEST_TOKEN']}"}
        transport = AIOHTTPTransport(url=url, headers=headers)
        self._client = Client(transport=transport)
        self._query = self._load_query('products_query.graphql')

    def get_products(self, page_size, skip):
        v = {"first": page_size, "skip": skip}
        return self._client.execute(self._query, variable_values=v)

    def _load_query(self, path):
        with open(path) as f:
            return gql(f.read())
This is OK, but it still means our script will have to handle pagination. A much nicer technique would be to use Python's yield keyword, making our function into a generator. We can then return products one by one and still handle pagination within our function.
...

    def get_products(self, page_size=100):
        skip = 0
        while True:
            v = {"first": page_size, "skip": skip}
            products = self._client.execute(
                self._query,
                variable_values=v
            )
            if not products['products']:
                return
            skip += page_size
            for product in products['products']:
                yield product
...
Including the optional page_size parameter makes it obvious that pagination is occurring and is handled within the function.
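To see the generator pattern in isolation, here's a self-contained sketch where a fake in-memory "API" stands in for the GraphQL client. The names here are illustrative, not from the real code:

```python
# pretend catalogue of five products
ALL_PRODUCTS = [{'sku': str(1000 + i)} for i in range(5)]

def fake_execute(first, skip):
    # stands in for client.execute: returns one "page" of products
    return {'products': ALL_PRODUCTS[skip:skip + first]}

def get_products(page_size=2):
    skip = 0
    while True:
        response = fake_execute(page_size, skip)
        if not response['products']:
            return
        skip += page_size
        for product in response['products']:
            yield product

# the caller sees a flat stream of products and never touches the pages
print([p['sku'] for p in get_products()])  # ['1000', '1001', '1002', '1003', '1004']
```

The caller iterates once over the generator; the page size becomes a tuning detail rather than something the calling code has to manage.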
Handling Errors
One nice finishing touch is some query error handling. Looking at the gql library, we can see that it raises a TransportQueryError that can carry multiple error messages.
After some experimentation, we can see that we usually only care about the first error message, so we can add a function that wraps the client.execute method and transforms that error into a custom exception.
...
from gql.transport.exceptions import TransportQueryError

class ProductProviderError(Exception):
    pass

class ProductProvider:
    ...

    def _execute(self, query, vars):
        try:
            return self._client.execute(query, variable_values=vars)
        except TransportQueryError as err:
            raise ProductProviderError(err.errors[0]['message'])
We can then easily catch and print exceptions like this:
from product_provider import ProductProvider, ProductProviderError
...

try:
    for product in product_provider.get_products():
        print(product)
except ProductProviderError as err:
    print(err)
A Finished Product
With the encapsulation of our GraphQL calls into a custom class, we have a very simple script to run.
#!/usr/bin/env python3
from dotenv import dotenv_values
from product_provider import ProductProvider, ProductProviderError

conf = dotenv_values()
product_provider = ProductProvider(conf)

try:
    for product in product_provider.get_products():
        print(product)
except ProductProviderError as err:
    print(err)
Our ProductProvider class is a bit more complicated, but it encapsulates the API's pagination and the gql library's exceptions. It also provides the potential for reuse.
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport
from gql.transport.exceptions import TransportQueryError

class ProductProviderError(Exception):
    pass

class ProductProvider:
    def __init__(self, conf):
        headers = {"Authorization": f"Bearer {conf['DEST_TOKEN']}"}
        transport = AIOHTTPTransport(url=conf['DEST_URL'], headers=headers)
        self._client = Client(transport=transport)
        self._sku_query = self._load_query('products_query.graphql')

    def get_products(self, page_size=100):
        skip = 0
        while True:
            vars = {"first": page_size, "skip": skip}
            products = self._execute(self._sku_query, vars)
            if not products['products']:
                return
            skip += page_size
            for product in products['products']:
                yield product

    def _load_query(self, path):
        with open(path) as f:
            return gql(f.read())

    def _execute(self, query, variable_values):
        try:
            return self._client.execute(query, variable_values=variable_values)
        except TransportQueryError as err:
            raise ProductProviderError(err.errors[0]['message'])
Our query is in its own file, making it easy to read, update, and reuse.
query ($first: Int!, $skip: Int!) {
  products(first: $first, skip: $skip) {
    sku
  }
}
Finally, for dependency management, our requirements.txt file should look something like this:
gql~=3.0.0a6
aiohttp~=3.7
python-dotenv~=0.18
To install them all at once, we can run pip3 install -r requirements.txt.
Conclusion and Next Steps
Good development practices are usually iterative. We create a rough idea that works and then make it more readable, robust and reusable.
Our script started out very rough, but since we could quickly prove that the concept worked, we could follow up with best practices, distributing responsibility and creating a reusable component.
Weāll look at a way of handling image downloads and uploads for this project in a future post.