Explore the future of Web Scraping. Request a free invite to ScrapeCon 2024

Auto-Schedule Python Scripts using Crontab

Automation is the future, and Python scripts are an excellent way to get started in your automation journey. You will need a way of scheduling though in order for this automation process to not only be effective but also manageable!

In my example, we will send a daily mail with 404 errors from Google Search Console using a local cronjob on your Macbook — which isn’t difficult at all by any means.

image

Case: 404’s in Google Search Console

In order to make your SEO audit stand out from the crowd, it’s important that you do a little bit more than just check the coverage error report in Google. If any 404 webpages are indexed by Google and these go unnoticed and un-checked then they could cause very bad UX for users of search engines who try looking up information about your business. You don’t want to have a 404 page as a first impression. Eventually, also your rankings and traffic will decrease.

In a previous blog post, I shared a script to get all 404 webpages in a Google Search console account. Listing 404 Pages Indexed by Google Using Python Getting started with Python for SEOpython.plainenglish.io

The next step would be sending updates via email daily/weekly enough so no one can ever say ‘I didn’t know’ when they accidentally removed a good ranking webpage.

The updated script can be found here.

The output will be an e-mail with a list of the 404 errors and for every 404 webpage, the amount of clicks and impressions in the last “[numberdays]” days.

Script scheduling on Mac

Now, how can you execute a Python script every hour, day, or week? You may be asking yourself this.

Well, for starters, if the program was made with Google Colab or Jupyter Notebook then convert it into a .py (python) file so that it can run using a command-line tool like terminal.

All it takes are only 5 steps.

  1. Open the terminal

  2. Check if you can execute the script

Type in the following script

python3 /path/to/your/script.py

with /path/to/your/script.py your location in my case

python3 /Users/Michael.vandenreym/Desktop/python/console404.py

If the script is running correctly, proceed with the next step.

  • Note: Depending on your version of python you may have to replace python3 with python or /usr/bin/python

3. Open crontab

Type the following command to open the crontab file.

nano crontab -e

You now opened the crontab file in the nano editor with editing privileges.

4. Edit crontab

In the crontab file enter the following command.

0 1 * * * python3 /path/script.py

This will run the scrape at minute = 0 , hour = 1, day = _ : every day, month=: _ every month, weekday= every weekday

Using the website https://crontab.guru/, it’s easy to create any schedule you can wish for.

Don’t forget to replace /path/script.py with the path of your Python file.

5. Save the crontab file

Keyword combination CTRL+O will save the file and you can exit using CTRL+X

Great work, now your script should be running on schedule.

Question or feedback? You can contact me on LinkedIn: https://www.linkedin.com/in/michaelvdr or Twitter: https://www.twitter.com/vdrweb




Continue Learning