
Bypassing CAPTCHAs Using Python

Bypassing Google's reCAPTCHA v2 using the Python requests library and Selenium

Graphic of a robotic finger

We have all seen CAPTCHAs around the internet, whether while signing up for or logging into a website, or in many other situations.

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are designed to be a gate that lets humans through and keeps robots (programs) out. They are getting smarter day by day, and one current example is version 2 of Google's reCAPTCHA, which looks something like this:

image

And sometimes you will even see something like this:

image

This happens mainly because your score is below Google's human threshold. You will also get puzzles like these if you are using Selenium or the requests library in Python.

There are many services that solve these CAPTCHAs for you in exchange for a small fee, such as 2captcha and deathbycaptcha. In this example, we will be using 2captcha.

Pre-requisites

This step assumes that you already have a 2captcha account and its API key (which you will find on the screen as soon as you log in). As an example, we will bypass the reCAPTCHA on Google's demo page at https://www.google.com/recaptcha/api2/demo.

You'll first need the site key from the page. The site key is a unique key that every site gets when reCAPTCHA is integrated into its forms. We will send this site key to 2captcha. You'll find it when you inspect the page's elements, like this:

image

Copy this site key and store it.
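If you would rather not copy the key by hand, it can usually be pulled out of the page source. The snippet below is a minimal sketch, assuming the widget exposes the key as a data-sitekey attribute (the usual reCAPTCHA v2 markup); the function name extract_site_key and the sample HTML fragment are illustrative and not part of any library.

```python
import re
from typing import Optional

# Matches data-sitekey="..." or data-sitekey='...' anywhere in the HTML.
_SITEKEY_RE = re.compile(r'data-sitekey=["\']([^"\']+)["\']')


def extract_site_key(html: str) -> Optional[str]:
    """Return the first data-sitekey value found in html, or None."""
    match = _SITEKEY_RE.search(html)
    return match.group(1) if match else None


# Hypothetical fragment resembling the widget markup on a page.
sample_html = (
    '<div class="g-recaptcha" '
    'data-sitekey="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-"></div>'
)
print(extract_site_key(sample_html))
```

With Selenium, the same function could be called on driver.page_source once the page has loaded.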

When we send a solve request to 2captcha, we receive the solved CAPTCHA response, which we need to enter in the hidden text field with the ID g-recaptcha-response. The image below shows where you will find it; I have disabled the CSS display property to make it visible.

image

Additionally, we need the ID of the submit button (recaptcha-demo-submit), which we can easily find on the page and which will let us submit the form later.

For reCAPTCHA v2, the results can take 15-30 seconds or more.
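Writing the solved response into that hidden field comes down to running a line of JavaScript in the page. The helper below is a small sketch of one way to build that line; build_injection_script is a hypothetical name, and using json.dumps to quote the values is my own choice, made so that an unexpected quote or backslash in the token cannot break the script.

```python
import json


def build_injection_script(token: str, field_id: str = "g-recaptcha-response") -> str:
    """Return JavaScript that writes token into the element with field_id."""
    # json.dumps yields a valid JavaScript string literal, so quotes or
    # backslashes in the values cannot break out of the script.
    return (
        f"document.getElementById({json.dumps(field_id)})"
        f".innerHTML = {json.dumps(token)};"
    )


# EXAMPLE_TOKEN is a placeholder, not a real solved response.
script = build_injection_script("EXAMPLE_TOKEN")
print(script)
```

With a Selenium driver, the resulting string can be passed to execute_script before submitting the form.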

Coding part

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import requests
import time

pageurl = 'https://www.google.com/recaptcha/api2/demo'

# Selenium 4 takes the chromedriver path through a Service object
service = Service(executable_path=r'chromedriver_win32\chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get(pageurl)

Importing the required libraries and visiting the page in Selenium

NOTE: I have set executable_path to the location of chromedriver on my system; edit it accordingly.

Now we need to send the solve request to 2captcha. First, the code:

site_key = "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-"

# Read the 2captcha API key, dropping any trailing newline
with open(r"api_key.txt", "r") as f:
    api_key = f.read().strip()

# Form fields for the first step: submitting the CAPTCHA to in.php
form = {"method": "userrecaptcha",
        "googlekey": site_key,
        "key": api_key,
        "pageurl": pageurl,
        "json": 1}

response = requests.post('http://2captcha.com/in.php', data=form)
request_id = response.json()['request']

Now, let's go through the code. 2captcha's API works via a two-step process: first we send the data in, and then we check for the results using the request ID that we receive. For reCAPTCHA v2 we need to send the site key (mentioned earlier), passed as googlekey, and the pageurl to tell 2captcha where this key is located. We also need to make sure that the method is set to userrecaptcha. The json parameter in the form simply indicates that we would prefer a JSON response.

Now that our form data for the first step is ready, we make a POST request to 2captcha, from which we receive a request ID. Using this request ID, we check whether our response is ready.

After you make this call and get a request ID back, you need to make another request to the res.php URL with your API key and the request ID to get the response.
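The first step can also fail, for example when the API key is wrong, so it can help to check the reply before using the request ID. The helper below is a sketch that assumes in.php replies use the same status/request JSON shape as the res.php replies shown later in this post; parse_request_id is a hypothetical name and the sample reply is made up.

```python
def parse_request_id(payload: dict) -> str:
    """Return the request ID from a decoded in.php JSON reply, or raise."""
    if payload.get("status") == 1:
        return str(payload["request"])
    # Assumption: with status 0, "request" carries an error code, not an ID.
    raise RuntimeError(f"2captcha rejected the request: {payload.get('request')}")


# Made-up reply in the assumed format; a real ID comes from response.json().
print(parse_request_id({"status": 1, "request": "1234567890"}))
```

In the script above, this would wrap the decoded reply, as in parse_request_id(response.json()).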

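from selenium.webdriver.common.by import By

url = f"http://2captcha.com/res.php?key={api_key}&action=get&id={request_id}&json=1"

# Poll res.php every 3 seconds until the solved token is ready
while True:
    result = requests.get(url).json()
    if result['status'] == 1:
        token = result['request']
        break
    if result['request'] != 'CAPTCHA_NOT_READY':
        # Any other status-0 reply is an error code, so stop polling
        raise RuntimeError(f"2captcha error: {result['request']}")
    time.sleep(3)

# Inject the token into the hidden textarea, then submit the form
js = f'document.getElementById("g-recaptcha-response").innerHTML="{token}";'
driver.execute_script(js)
driver.find_element(By.ID, "recaptcha-demo-submit").submit()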

If your CAPTCHA is not ready, you'll receive a CAPTCHA_NOT_READY response, which indicates that you need to try again after a short wait.

{'status':0, 'request': 'CAPTCHA_NOT_READY'}

When it is ready, the response will contain the appropriate data for the method you sent. Until then, we keep checking every 3 seconds. On a successful response, we inject the result into the textarea that we talked about earlier.

{'status':1, 'request': '[VALID_RESPONSE]'}
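The two reply shapes above suggest a small polling helper. This is a sketch rather than part of the 2captcha API: wait_for_token is a hypothetical name, the 120-second default timeout is an arbitrary assumption, and the fetch argument is any function that returns the decoded JSON reply, which keeps the example runnable without a network call.

```python
import time
from typing import Callable


def wait_for_token(fetch: Callable[[], dict], interval: float = 3.0,
                   timeout: float = 120.0) -> str:
    """Call fetch() until it reports a solved token, an error, or a timeout."""
    deadline = time.monotonic() + timeout
    while True:
        reply = fetch()
        if reply.get("status") == 1:
            return reply["request"]
        if reply.get("request") != "CAPTCHA_NOT_READY":
            # Any other status-0 reply is treated as a failure here.
            raise RuntimeError(f"solver error: {reply.get('request')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("CAPTCHA was not solved before the timeout")
        time.sleep(interval)


# Simulated replies standing in for real res.php responses.
replies = iter([
    {"status": 0, "request": "CAPTCHA_NOT_READY"},
    {"status": 1, "request": "EXAMPLE_TOKEN"},
])
print(wait_for_token(lambda: next(replies), interval=0.01))
```

In the real script, fetch could be something like lambda: requests.get(url).json(), with the default 3-second interval.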

After this, we need to submit the form. You'll see this in the end:

image

A quick demo!

Is this ethically correct?

This post isn't a hackery tool that you can use to bypass CAPTCHAs. No one in their right mind relies on CAPTCHAs as an actual defence; they are more of a hurdle that adds a cost to automation systems. One can absorb this cost by implementing the approach shown here.

I hope this post gives you a good enough idea of how these CAPTCHAs work in the backend and how to go about bypassing them using neat little scripts.



