
Send HTTP Requests As Fast As Possible in Python

Compare synchronous requests, multi-threading with a queue, ThreadPoolExecutor, and the asyncio event loop in Python for making 100 HTTP requests, and see which solution performs best.




Who dives faster? by Charles Zhu, my 6yo boy

It is easy to send a single HTTP request with the requests package. What if I want to send hundreds or even millions of HTTP requests asynchronously? This article is a set of exploratory notes on finding the fastest way I can to send HTTP requests.

The code runs on a Linux (Ubuntu) VM in the cloud with Python 3.7. All code in the gists is ready to copy and run.

Solution #1: The Synchronous Way

The simplest and easiest-to-understand way, but also the slowest. I build a list of 100 links for the test with Python's list repetition operator:

url_list = ["https://www.google.com/","https://www.bing.com"]*50

The code:
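A minimal sketch of such a synchronous loop, assuming the requests package is installed. The names download_link and download_all, the 10-second timeout, and the fetch parameter (which only exists so the HTTP call can be swapped out) are illustrative assumptions, not part of the original setup.

```python
import time

import requests


def download_link(url):
    """Fetch one URL and return the HTTP status code."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    return response.status_code


def download_all(urls, fetch=download_link):
    """Fetch the URLs one after another and report the elapsed time."""
    start = time.time()
    # Each call blocks until its response arrives before the next one starts.
    results = [fetch(url) for url in urls]
    elapsed = time.time() - start
    print(f"download {len(urls)} links in {elapsed} seconds")
    return results
```

Calling download_all(url_list) with the list built above would run the 100 requests strictly in sequence, which is why the total time is roughly the sum of the individual round trips.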

It takes about 10 seconds to finish downloading 100 links.

...
download 100 links in 9.438416004180908 seconds

Even as a synchronous solution, there is still room to improve. We can leverage the Session object to further increase the speed. The Session object uses urllib3's connection pooling, which means that for repeated requests to the same host, the underlying TCP connection is reused, giving a performance gain.

So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase — Session Objects

To ensure the Session object is closed whether or not the requests succeed, I am going to use the with statement as a wrapper. The with keyword in Python is just a clean replacement for try… finally….

Let's see how many seconds are saved by changing to this:
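A minimal sketch of the Session-based variant, again assuming the requests package. The session_factory parameter is a hypothetical hook added here so a different session class can be substituted; by default it is requests.Session.

```python
import time

import requests


def download_link(url, session):
    """Fetch one URL through the given Session and return the status code."""
    response = session.get(url, timeout=10)  # timeout value is an assumption
    return response.status_code


def download_all(urls, session_factory=requests.Session):
    """Fetch the URLs sequentially, reusing one Session for every request."""
    start = time.time()
    # The with block closes the session (and its pooled connections)
    # even if one of the requests raises, like try ... finally would.
    with session_factory() as session:
        results = [download_link(url, session) for url in urls]
    elapsed = time.time() - start
    print(f"download {len(urls)} links in {elapsed} seconds")
    return results
```

The only structural change from a plain requests.get loop is that every call goes through the same session, so connections to a host that was already contacted can be reused.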

Looks like the performance is really boosted to 5.x seconds.

...
download 100 links in 5.367443561553955 seconds

But this is still slow. Let's try the multi-threading solution.

Solution #2: The Multi-Threading Way

Python threading is a dangerous topic to discuss; sometimes, multi-threading can be even slower! David Beazley delivered a wonderful presentation covering this dangerous topic. Here is the YouTube link.

Anyway, I am still going to use Python threads to do the HTTP request job. I will use a queue to hold the 100 links and create 10 HTTP download worker threads to consume the 100 links concurrently.

How the multi-thread works

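It would be a waste to create a new Session object for every request; I want each Session to be created once and reused for all the downloading work its thread does. To make that happen, the code uses the local object from the threading package. Note that threading.local gives each thread its own storage, so the 10 workers end up with 10 Session objects (one per thread) rather than literally sharing one; each Session is still reused for every request its thread handles.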

from threading import Thread,local
...
thread_local = local()
...
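To make the behavior of threading.local concrete, here is a small standard-library sketch. The get_object helper and the plain object() stand-in (used in place of requests.Session()) are assumptions for illustration; the point is that an attribute set on thread_local is visible only to the thread that set it.

```python
from threading import Thread, local

thread_local = local()
seen = []  # one (object, reused?) entry per worker thread


def get_object():
    """Return the calling thread's object, creating it on first use."""
    if not hasattr(thread_local, "obj"):
        thread_local.obj = object()  # stand-in for requests.Session()
    return thread_local.obj


def worker():
    first = get_object()
    second = get_object()
    # Within one thread, the second call returns the same object.
    seen.append((first, first is second))


threads = [Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Three threads produce three distinct objects.
print(len({id(obj) for obj, _ in seen}))  # prints 3
```

Each thread creates its object once and then reuses it, which is the property the download workers rely on for their sessions.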

The code:
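A minimal sketch of a Thread-and-Queue version along these lines, assuming the requests package. The helper names, the daemon-thread choice, the timeout, and the fetch parameter are illustrative assumptions rather than the original gist.

```python
import time
from queue import Queue
from threading import Thread, local

import requests

thread_local = local()


def get_session():
    """Return the calling thread's Session, creating it on first use."""
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session


def download_link(url):
    """Fetch one URL with this thread's Session and return the status code."""
    response = get_session().get(url, timeout=10)  # timeout is an assumption
    return response.status_code


def worker(q, results, fetch):
    """Consume URLs from the queue for as long as the program runs."""
    while True:
        url = q.get()
        try:
            results.append(fetch(url))
        except Exception as exc:  # record the failure and keep the worker alive
            results.append(exc)
        finally:
            q.task_done()


def download_all(urls, num_workers=10, fetch=download_link):
    """Download all URLs with a pool of worker threads fed by a queue."""
    start = time.time()
    q = Queue()
    results = []
    for _ in range(num_workers):
        # Daemon threads do not keep the interpreter alive after the work is done.
        Thread(target=worker, args=(q, results, fetch), daemon=True).start()
    for url in urls:
        q.put(url)
    q.join()  # block until every queued URL has been processed
    elapsed = time.time() - start
    print(f"download {len(urls)} links in {elapsed} seconds")
    return results
```

Because the workers finish in whatever order the responses arrive, the results list is not guaranteed to match the order of the input URLs.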

The result:

...
download 100 links in 1.1333789825439453 seconds

This is fast! Way faster than the synchronous solution.

Solution #3: Multi-Threading by ThreadPoolExecutor

Python also provides ThreadPoolExecutor for multi-threaded work, and I like ThreadPoolExecutor a lot.

In the Thread and Queue version, there is a while True loop in the HTTP request worker. This tangles the worker function with the Queue and requires additional code changes when moving from the synchronous version to the asynchronous version.

Using ThreadPoolExecutor and its map function, we can create a multi-threaded version with very concise code that requires minimal changes from the synchronous version.

How the ThreadPoolExecutor version works

The code:
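A minimal sketch of a ThreadPoolExecutor version, assuming the requests package and the same thread-local Session idea as before. The helper names, the timeout, and the fetch parameter are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from threading import local

import requests

thread_local = local()


def get_session():
    """Return the calling thread's Session, creating it on first use."""
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session


def download_link(url):
    """Fetch one URL with this thread's Session and return the status code."""
    response = get_session().get(url, timeout=10)  # timeout is an assumption
    return response.status_code


def download_all(urls, max_workers=10, fetch=download_link):
    """Download all URLs on a thread pool and report the elapsed time."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order; an exception raised in a
        # worker is re-raised here when its result is consumed.
        results = list(executor.map(fetch, urls))
    elapsed = time.time() - start
    print(f"download {len(urls)} links in {elapsed} seconds")
    return results
```

Compared with the queue-based workers, the per-URL function stays identical to the synchronous one; only the loop is replaced by executor.map.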

And the output is as fast as the Thread-Queue version:

...
download 100 links in 1.0798051357269287 seconds

Solution #4: asyncio with aiohttp

Everyone says asyncio is the future, and it is fast. Some folks have used it to make [1 million HTTP requests with Python asyncio and aiohttp](https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html). Although asyncio is super fast, it uses zero Python multi-threading.

Believe it or not, asyncio runs in one thread, in one CPU core.

asyncio Event Loop

The event loop implemented in asyncio is almost the same thing as the one beloved in JavaScript.
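The single-thread claim can be checked with a standard-library sketch. Here fake_request is a hypothetical stand-in for an HTTP call: it waits with asyncio.sleep instead of touching the network and reports which thread it ran on.

```python
import asyncio
import threading


async def fake_request(i, delay=0.1):
    """Stand-in for an HTTP request: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return i, threading.get_ident()


async def main(n=100):
    # gather() schedules all coroutines on the same event loop at once.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


results = asyncio.run(main())
thread_ids = {tid for _, tid in results}
print(len(results), len(thread_ids))  # prints: 100 1
```

All 100 coroutines report the same thread id: the loop switches between them whenever one is waiting, rather than running them on separate threads.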

Asyncio is so fast that it can send almost any number of requests to the server; the only limits are your machine and your internet bandwidth.

Sending too many HTTP requests at once looks like an attack. Some websites may ban your IP address if they detect too many requests, and even Google will ban you. To avoid being banned, I use a custom TCP connector object that caps the number of simultaneous TCP connections at 10. (It may be safe to raise this to 20.)

my_conn = aiohttp.TCPConnector(limit=10)

The code is pretty short and concise:
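A minimal sketch of an asyncio version, assuming the aiohttp package is installed. The function names and the return_exceptions=True choice are assumptions for illustration; the connector with limit=10 follows the line shown above.

```python
import asyncio
import time

import aiohttp


async def download_link(url, session):
    """Fetch one URL and return the HTTP status code."""
    async with session.get(url) as response:
        await response.read()  # consume the body so the connection can be reused
        return response.status


async def download_all(urls, limit=10):
    """Fetch all URLs concurrently with at most `limit` open connections."""
    my_conn = aiohttp.TCPConnector(limit=limit)
    async with aiohttp.ClientSession(connector=my_conn) as session:
        tasks = [download_link(url, session) for url in urls]
        # return_exceptions=True keeps one failed URL from cancelling the rest.
        return await asyncio.gather(*tasks, return_exceptions=True)


def timed_download(urls):
    """Run the download on a fresh event loop and report the elapsed time."""
    start = time.time()
    results = asyncio.run(download_all(urls))
    elapsed = time.time() - start
    print(f"download {len(urls)} links in {elapsed} seconds")
    return results
```

Calling timed_download(url_list) starts every request up front; the connector then holds back any request beyond the tenth until a connection frees up.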

And the above code finished downloading 100 links in 0.74 seconds!

...
download 100 links in 0.7412574291229248 seconds

Note that if you are running the code in Jupyter Notebook or IPython, please also install the nest-asyncio package. (Thanks to this StackOverflow link. Credit to Diaf Badreddine.)

pip install nest-asyncio

Then add the following two lines at the beginning of the code.

import nest_asyncio
nest_asyncio.apply()

Solution #5: What about Node.js?

I am wondering, what if I do the same work in Node.js, which has a built-in event loop?

Here is the full code:

It takes 1.1 to 1.5 seconds. You can run it to see the result on your machine.

...
test: 1195.290ms

Python wins the speed game!

(It looks like the request Node package is deprecated, but this sample is just for testing how Node.js's event loop performs compared with Python's event loop.)

Let me know if you have a better or faster solution. If you have any questions, leave a comment and I will do my best to answer. If you spot an error or mistake, don't hesitate to point it out. Thanks for reading.



