Discover the principle of arbitrage betting, from the initial idea through the creation of a system, its development and its implementation.
The arbitrage bot will be developed in Python and will allow you to learn how to collect, identify and isolate data, structure it, analyze it, and finally use the results obtained and present them to the user in an optimal way.
The odds format used in this article is decimal odds, where:
Total Payout = Stake x Decimal Odds
The practice of arbitrage betting may not be legal or permitted by the platform you are using. It is your responsibility to decide whether or not to go against the terms and conditions of the platform. You may face sanctions such as suspension of your betting accounts and/or funds, or legal action.
This article is in no way an encouragement to practice arbitrage on sports betting sites. It is here to present the general concept and the management of an IT project from A to Z, as well as the approach followed.
This article is intended for educational purposes only.
Arbitrage opportunities exist on a large number of markets, whether in traditional financial markets, crypto-currency exchanges, online marketplaces such as Amazon, sports betting sites, etc.
Arbitrage software, or nowadays an arbitrage bot, is a third-party program that identifies price differences between providers selling the same service or object, and then exploits this difference by buying on the cheapest market and selling on the market where the price is highest.
The question to ask is: why is there a price difference for the same object or service?
The answer seems simple: each platform is independent and sets its own prices. Price is the main marketing tool nowadays, so platforms aim to offer a price that beats the competition rather than a universal price that would put all platforms on an equal footing.
The platforms and especially the financial markets are subject to the law of supply and demand. The higher the demand on a platform, the higher the price is. This also influences the price independently of the platform’s will.
A concrete example from cryptocurrency exchanges: in April 2021, a sudden surge in demand on a Korean platform pushed the price of Bitcoin 17% higher than on the European market.
Bitcoin sold on exchange “A”, Binance, at $57,000.
On the Korean exchange “B”, Bithumb, it sold at $66,500.
A person wishing to take advantage of this price difference would therefore buy Bitcoin on platform A and sell it on platform B to make a profit.
Illustration of the price difference on several cryptocurrency exchanges
The phenomenon of arbitrage allows the market to self-regulate and balance itself on the different platforms since by buying on the platform where the price is the lowest, demand will increase and thus push the price up.
Sports Betting Sites
Arbitrage betting can also take place on sports betting sites.
Indeed, the goal is to find a match where the differences in odds are such that by betting on all the possibilities (Team 1 win, draw, Team 2 win) it is certain to make a profit regardless of the final result.
First of all, it is important to know that the sports betting site keeps a margin on all sports events. Indeed, the implied probabilities of the odds on a sports event add up to more than 100%, which guarantees the bookmaker a profit regardless of the outcome.
It is possible to know the margin of a bookmaker on a sporting event through a very simple calculation.
The probability of the event occurring according to the bookmaker is:
Probability = 1 / Decimal Odds
Example with the game Paris SG vs Angers, quoted at 1.31–5.75–9.55 on Betclic.
According to Betclic, a Paris SG win quoted at 1.31 corresponds to a probability of 1/1.31, i.e. 76.34%.
Likewise, the draw has a probability of 1/5.75, or 17.39%.
Finally, Angers has (still according to Betclic) a probability of 1/9.55 of winning, or 10.47%.
The match is therefore covered at: 76.34% + 17.39% + 10.47% = 104.20%
Betclic covers the match at 104.20%, but the outcomes together can only add up to 100%, so Betclic prices the match above its true value and keeps the difference, 4.20%, as its margin.
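The margin calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function names `implied_probability` and `bookmaker_margin` are ours and are not part of the bot described in this article.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds: 1 / odds."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def bookmaker_margin(odds: list[float]) -> float:
    """Sum of the implied probabilities of every outcome, minus 100%."""
    return sum(implied_probability(o) for o in odds) - 1.0


# Paris SG vs Angers on Betclic, odds taken from the example above.
margin = bookmaker_margin([1.31, 5.75, 9.55])
print(f"Margin: {margin:.2%}")  # Margin: 4.20%
```

Rounding each probability before adding them up can change the last digit of the result slightly.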
Finally, in an event with 3 possible outcomes, such as a soccer match, you have to find a match that is covered at less than 100% in order to bet on every outcome and still make a profit.
To do this, you need to get the odds from different bookmakers (because such an opportunity cannot occur on a single site, except through an internal error) and find out whether the event is covered at less than 100% across the different sites.
Let’s take for example the game Porto vs Milan: the Porto win is quoted at 2.835 on 1Xbet, the draw is quoted at 3.8 on Pinnacle and the Milan win is at 2.65 on Marathon. The event is therefore covered at:
1/2.835 + 1/3.8 + 1/2.65 = 0.9933, i.e. 99.33%
This means that by betting on Porto’s win on 1Xbet, the draw on Pinnacle and Milan’s win on Marathon, the odds cover only 99.33% of the probability of the event, whereas by betting on every outcome we cover 100% of the possible results, which ensures a profit of about 0.67%.
To make this easier to see, we can calculate the amounts to bet and derive the winnings for each result. If I want to bet $1000 on this particular game:
I bet $355.1 on Porto at 2.835, so if Porto wins I receive $1006.71.
I bet $264.9 on the draw at 3.8, so in case of a draw I receive $1006.62.
I bet $379.9 on Milan at 2.65, so if Milan wins I receive $1006.74.
I can see that no matter what the final result is, I get back my initial bet of $1000 plus a profit of about 0.67%, which is about $6.7.
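The stake split for this Porto vs Milan example can be reproduced with a short sketch. The helper name `split_stake` is an assumption made for illustration. The stakes are left unrounded here, so the payouts differ by a few cents from the rounded figures quoted in the text.

```python
def split_stake(total: float, odds: list[float]) -> list[float]:
    """Split `total` so that every outcome returns the same payout.

    Each stake is proportional to 1/odds, so stake * odds is the same
    for every outcome and equals total / coverage.
    """
    coverage = sum(1.0 / o for o in odds)
    return [total / (coverage * o) for o in odds]


# Porto win (1Xbet), draw (Pinnacle), Milan win (Marathon).
odds = [2.835, 3.8, 2.65]
stakes = split_stake(1000, odds)
for stake, o in zip(stakes, odds):
    print(f"stake {stake:.2f} at {o} -> payout {stake * o:.2f}")
```

Every line prints the same payout, slightly above $1006, whichever outcome occurs.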
Even if this profit seems low, this kind of arbitrage can take place daily and better opportunities can occur during events that shake up the odds on different sports betting sites. In addition, the power of compound interest can make profits grow exponentially. I invite you to learn more about this area, maybe in a future article.
Difference between simple and compound interest
Now that you have had an introduction to the principle of arbitrage betting, we can begin to code a computer program, in our case a bot that will do the work for us. This work is divided into several tasks:
First of all, we have to be able to get all the data (the odds on the different sports betting sites) in order to use it afterwards.
Once the information has been collected, we need to sort out the information we want to analyze and structure it in a table so that we can compare it and apply calculations.
Once the data has been retrieved, sorted and structured in a table, the matches from the different sites must be linked together to compare their odds. This task can become problematic, since the different sites do not all use the same names for the sports teams.
Once the data is linked, we can finally calculate whether an arbitrage opportunity exists or not.
If an arbitrage opportunity exists, the user should be notified, whether on Discord, Twitter or Telegram.
Finally, we need to put this program on a server that will be able to run it continuously so that the bot can look for arbitrage opportunities 24/7.
First of all, to perform analyses, we need to identify a data set. To do this, we select the data we need:
The site where the data comes from
The teams that will compete (Team 1 and Team 2)
The odds for the different outcomes (Win Team 1, Draw, Win Team 2)
Optional: Date of the match
The date of the match is optional, but it tells us how much time separates us from the match, which helps decide which arbitrage opportunity to take if several are offered at the same time.
Now that we have defined the dataset we want to retrieve, we have to choose the sites we want to analyze. For our project, we select the following sites, with a majority of French sites and a few foreign sites to get larger differences in odds.
- Betclic, Unibet, Zebet, 22 Bet, PMU, Winamax, Marathon, Netbet, Pinnacle, 1Xbet.
To retrieve our data we will use web scraping:
This simply means that we will download or browse the source code of the web page, identify the data we are interested in from the website code, isolate this data and extract it.
For this, we’ll use several Python libraries such as :
- Requests, Beautiful Soup, Selenium.
The sites are not all built in the same way. Some sites expose an API that is not secured or private, which lets us directly download a .json file containing all the data shown on the web page. This is done with the Requests library.
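As a minimal sketch of this first case, the code below downloads a JSON document with Requests and flattens it into one record per match. The payload layout (`events`, `home`, `away`, `odds`, `start`), the sample date and the record keys are all invented for illustration: every bookmaker uses its own structure, which has to be inspected by hand.

```python
def fetch_json(url: str) -> dict:
    """Download and decode a JSON document with Requests."""
    import requests  # third-party; imported here so the parser below runs without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


def parse_events(payload: dict, bookmaker: str) -> list[dict]:
    """Turn a (hypothetical) payload into one flat record per match."""
    records = []
    for event in payload.get("events", []):
        records.append({
            "bookmaker": bookmaker,
            "team1": event["home"],
            "team2": event["away"],
            "odds_win1": float(event["odds"]["1"]),
            "odds_draw": float(event["odds"]["X"]),
            "odds_win2": float(event["odds"]["2"]),
            "game_date": event.get("start"),  # optional field
        })
    return records


# Invented sample payload; a real site will use a different layout.
sample = {"events": [{"home": "Paris SG", "away": "Angers",
                      "start": "2021-01-01T21:00",
                      "odds": {"1": 1.31, "X": 5.75, "2": 9.55}}]}
print(parse_events(sample, "Betclic"))
```

With a real endpoint, the call would look like `parse_events(fetch_json(url), "Betclic")`, where `url` is the address found by inspecting the site's network traffic.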
Other websites are not “dynamic”: they do not update their odds directly on the web page, and the data is present in the source code. In this case, we download the HTML code of the whole web page and use Beautiful Soup to parse it. We then isolate the values we need thanks to the structure of the page, i.e. the HTML tags and the class name or ID of the element in question.
Finally, some sites are dynamic and do not present their data in the source code. For these, we will use Selenium, a Python library that drives a real web browser and navigates the web page to extract the required information.
Once the data from the different sites has been retrieved, we must first isolate it. This can be done easily with Python's built-in string and list operations, or with the Selenium or Beautiful Soup libraries, which can isolate data by HTML tag or class name.
For each match we have the following data set:
Bookmaker, Team1, Team2, Odds Win1, Odds Draw, Odds Win2, Game date
We will structure the data set as a list of Python dictionaries to facilitate processing and analysis later.
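One possible shape for such a record is sketched below. The key names and the `MatchOdds` type are assumptions chosen for readability; the article only specifies which pieces of information each record holds.

```python
from typing import Optional, TypedDict


class MatchOdds(TypedDict):
    """One match as seen on one bookmaker's site."""
    bookmaker: str
    team1: str
    team2: str
    odds_win1: float
    odds_draw: float
    odds_win2: float
    game_date: Optional[str]  # optional; None when the site gives no date


# The whole dataset is a plain list of such dictionaries.
matches: list[MatchOdds] = [
    {"bookmaker": "Betclic", "team1": "Paris SG", "team2": "Angers",
     "odds_win1": 1.31, "odds_draw": 5.75, "odds_win2": 9.55,
     "game_date": None},
]

# Filtering by bookmaker is a one-line list comprehension.
betclic_only = [m for m in matches if m["bookmaker"] == "Betclic"]
print(len(betclic_only), betclic_only[0]["odds_draw"])
```

A `TypedDict` behaves like an ordinary dictionary at run time; it only documents the expected keys for readers and type checkers.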
Once the data is collected and structured we can turn to the analysis.
First, we will group the records of the same match into a “pack”.
Example: put the different “Paris vs Marseille” recovered on the different sites together in order to look for arbitrage opportunities on this specific game.
To do this we will use a module from the Python standard library: difflib, whose SequenceMatcher class measures the similarity between two strings. We group together the matches whose team names are very similar, which lets us gather identical matches.
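A minimal sketch of this grouping step is shown below. The 0.7 threshold, the greedy grouping strategy and the helper names are assumptions made for illustration, not the bot's actual settings.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Ratio between 0 and 1; 1.0 means identical (case is ignored here)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_match(m1: dict, m2: dict, threshold: float = 0.7) -> bool:
    """Two records are the same match if both team names are similar enough."""
    return (similarity(m1["team1"], m2["team1"]) >= threshold
            and similarity(m1["team2"], m2["team2"]) >= threshold)


def group_matches(records: list[dict], threshold: float = 0.7) -> list[list[dict]]:
    """Greedy grouping: a record joins the first pack whose first record matches it."""
    packs: list[list[dict]] = []
    for record in records:
        for pack in packs:
            if same_match(pack[0], record, threshold):
                pack.append(record)
                break
        else:  # no existing pack matched: start a new one
            packs.append([record])
    return packs


records = [
    {"bookmaker": "A", "team1": "Paris Saint-Germain", "team2": "Olympique Marseille"},
    {"bookmaker": "B", "team1": "Paris Saint Germain", "team2": "Olympique de Marseille"},
    {"bookmaker": "C", "team1": "FC Porto", "team2": "AC Milan"},
]
for pack in group_matches(records):
    print([r["bookmaker"] for r in pack])
```

Here the records from bookmakers A and B end up in the same pack and C in its own. Abbreviations such as "PSG" score much lower against the full name, so a real bot would probably also need a table of known aliases.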
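Finally, to determine whether or not arbitrage is possible, we use the same coverage calculation as before:
1/Odds Win1 + 1/Odds Draw + 1/Odds Win2
An arbitrage opportunity exists when this sum is below 1, i.e. when the event is covered at less than 100%.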
Two arbitrage methods are implemented, and the user chooses between them:
- Method 1 tests every possible arbitrage combination between the odds of the different bookmakers, where:
- V1: odds of a Team 1 win
- N: odds of a draw
- V2: odds of a Team 2 win
- Possibility 1: V1 on Book n°1 / N on Book n°1 / V2 on Book n°2
- Possibility 2: V1 on Book n°1 / N on Book n°1 / V2 on Book n°3
- Possibility 3: V1 on Book n°1 / N on Book n°2 / V2 on Book n°1
- etc.
Since each of the three outcomes can be taken from any bookmaker, this makes (number of bookmakers)³ possible combinations, which requires a lot of computing power when the number of bookmakers is large. That is why, from 9 bookmakers upwards, we favour method number 2.
- Method 2 selects the highest V1, N and V2 odds among all bookmakers and checks whether these 3 odds form an arbitrage opportunity, using the same formula as before. This second method requires far fewer calculations and therefore allows faster analysis of the games.
However, using this second method we will only find the best opportunity on a game and not all possible opportunities. In our case, this detail does not bother us since we want the opportunities with the highest profit.
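The two methods can be compared with the sketch below, which runs on one pack of records for the Porto vs Milan game. Only the best odds for each outcome (2.835, 3.8 and 2.65) come from the example earlier in this article; the other odds, the record keys and the function names are invented for illustration.

```python
from itertools import product


def coverage(odds_win1: float, odds_draw: float, odds_win2: float) -> float:
    """Sum of implied probabilities; below 1 means an arbitrage exists."""
    return 1 / odds_win1 + 1 / odds_draw + 1 / odds_win2


def all_combinations(pack: list[dict]) -> list[tuple[float, str, str, str]]:
    """Method 1: try every triple of bookmakers, one per outcome."""
    found = []
    for b1, bn, b2 in product(pack, repeat=3):
        c = coverage(b1["odds_win1"], bn["odds_draw"], b2["odds_win2"])
        if c < 1:
            found.append((c, b1["bookmaker"], bn["bookmaker"], b2["bookmaker"]))
    return sorted(found)  # lowest coverage (highest profit) first


def best_odds(pack: list[dict]):
    """Method 2: keep only the highest odds for each outcome."""
    b1 = max(pack, key=lambda r: r["odds_win1"])
    bn = max(pack, key=lambda r: r["odds_draw"])
    b2 = max(pack, key=lambda r: r["odds_win2"])
    c = coverage(b1["odds_win1"], bn["odds_draw"], b2["odds_win2"])
    if c < 1:
        return (c, b1["bookmaker"], bn["bookmaker"], b2["bookmaker"])
    return None


# Best odds per outcome are from the article; the other values are made up.
pack = [
    {"bookmaker": "1Xbet", "odds_win1": 2.835, "odds_draw": 3.5, "odds_win2": 2.5},
    {"bookmaker": "Pinnacle", "odds_win1": 2.7, "odds_draw": 3.8, "odds_win2": 2.55},
    {"bookmaker": "Marathon", "odds_win1": 2.6, "odds_draw": 3.6, "odds_win2": 2.65},
]
print(all_combinations(pack))
print(best_odds(pack))
```

On this pack both methods report the same single opportunity (1Xbet / Pinnacle / Marathon). Method 1 would additionally list weaker opportunities when several combinations fall below 1.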
These calculations are repeated on each of the “packs” of matches built previously, so that for each game we know whether an arbitrage opportunity exists, what its potential profit is (in % of the total amount invested on the game), which combination of bookmakers to use, and the date of the game.
In addition, each opportunity is rated according to the date of the match: the further away the match is, the lower the rating. This lets us identify the best opportunities and compound our profits, by betting on opportunities whose match is played soon so that the winnings can be reinvested right away.
To get the score use the following calculation:
An opportunity rated 10 means that a profit of 1% per day is feasible, which is a very decent return.
The Use of the Results
Once the probabilities have been calculated for a game, we know whether an arbitrage opportunity exists or not.
So now we have to:
Notify the user that an opportunity has been found.
Calculate the amount to bet on each odds to generate a guaranteed profit.
The calculation of the investment follows this formula:
Amount to bet on an outcome = Total amount / (Coverage x Odds of the outcome)
where Coverage is the sum 1/Odds Win1 + 1/Odds Draw + 1/Odds Win2 used to detect the opportunity.
Example with $1000 to invest on a match covered at 99%, i.e. a margin of about 1% to be made by arbitrage betting on this sports event, and a draw quoted at 3:
Amount to bet on the draw: $1000 / (0.99 x 3) = $336.7
In case of a draw, the amount recovered is $336.7 (bet) x 3 (odds) = $1010, which is equivalent to the starting investment of $1000 plus the 1% profit.
This calculation is to be repeated on each odds of the arbitrage opportunity found.
To notify the user of an opportunity, several channels can be used. Twitter, Telegram and Discord are chosen for this project.
The three communication channels are managed through APIs and Python libraries, which simplifies their use.
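As an illustration for one of these channels, here is a minimal sketch that formats an opportunity and posts it through the `sendMessage` method of the Telegram Bot API. The article does not say which Telegram library the bot uses, so this sketch assumes none and relies on the standard library only. The message layout is invented, and `token` and `chat_id` are placeholders to be obtained from Telegram.

```python
import json
import urllib.parse
import urllib.request


def format_opportunity(profit_pct: float, team1: str, team2: str, picks: dict) -> str:
    """Build the notification text; `picks` maps outcome -> (bookmaker, odds)."""
    lines = [f"Arbitrage found: {team1} vs {team2} (+{profit_pct:.2f}%)"]
    for outcome, (bookmaker, odds) in picks.items():
        lines.append(f"{outcome}: {odds} on {bookmaker}")
    return "\n".join(lines)


def send_telegram(token: str, chat_id: str, text: str) -> dict:
    """POST the text to the Telegram Bot API and return the decoded reply."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=data, timeout=10) as response:
        return json.load(response)


message = format_opportunity(
    0.67, "Porto", "Milan",
    {"V1": ("1Xbet", 2.835), "N": ("Pinnacle", 3.8), "V2": ("Marathon", 2.65)},
)
print(message)
# send_telegram(token, chat_id, message)  # requires a real bot token and chat id
```

Keeping the formatting separate from the sending makes it easy to reuse the same text for Discord or Twitter.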
Install the Program on a Server
Once all the algorithms and code are finished, we finally have an operational bot. We only need to deploy it on a private server to run the scripts automatically 24 hours a day, 7 days a week, and to be alerted at any time so as not to miss any opportunity.
To do this, Linux provides the cron scheduler: a “cron job”, configured through crontab, launches a program at regular intervals.
We will simply configure crontab to launch the bot analysis every 30 minutes on weekdays and every 15 minutes on weekends, when more games are played.
Configuration of crontab to perform a command every 30 minutes from 4 am to 10 pm
In practice the Bot is online on an AWS server.
Log and Error Management
To be able to debug our program, follow its evolution over time, identify errors that may occur and perform maintenance, it is important to have a logging and error management system.
All logs are stored in a Log file that will be unique for each session (every 30 minutes since the program will be run at regular intervals).
Another file “logErrors” will collect the list of errors that occurred during the sessions. Each logErrors contains the list of errors encountered in a whole day.
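A logging setup along these lines could be sketched with the standard `logging` module. The directory name, the file name patterns and the logger name below are assumptions; only the split between one log per session and one logErrors file per day comes from the description above.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str = "logs") -> logging.Logger:
    """Create a logger writing a per-session log and a per-day error log."""
    now = datetime.now()
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("arbitrage_bot")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # One file per session: the name includes the start time of the run.
    session = logging.FileHandler(directory / f"log_{now:%Y-%m-%d_%H-%M}.txt")
    session.setFormatter(formatter)
    logger.addHandler(session)

    # One error file per day: every session of the day appends to it.
    errors = logging.FileHandler(directory / f"logErrors_{now:%Y-%m-%d}.txt")
    errors.setLevel(logging.ERROR)
    errors.setFormatter(formatter)
    logger.addHandler(errors)
    return logger


# Typical use at the start of a run:
#   logger = setup_logging()
#   logger.info("scan started")
#   logger.error("could not parse odds")
```

Messages at INFO level go to the session file only, while errors are written to both files.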
Example of the contents of the logErrors file which contains all the errors encountered during the execution of the script
A “logFounds” file gathers all the opportunities found during the day. A program checks that an opportunity has not already been notified to the user, to avoid repetition.
Example of the content of the logFounds file with [profit,Team1,Team2,Book1,Book2,Book3,Odds V1,Odds Draw,Odds V2,Date]
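The duplicate check could look like the sketch below. It assumes the logFounds file is stored as CSV with the columns listed above, and that two opportunities are the same when the teams, the three bookmakers and the date match. Both choices are ours, as are the function names.

```python
import csv
from pathlib import Path

FIELDS = ["profit", "team1", "team2", "book1", "book2", "book3",
          "odds_v1", "odds_draw", "odds_v2", "date"]


def opportunity_key(row: dict) -> tuple:
    """Identify an opportunity by teams, bookmakers and date (odds may move)."""
    names = ["team1", "team2", "book1", "book2", "book3", "date"]
    return tuple(str(row[name]) for name in names)


def record_if_new(path: str, row: dict) -> bool:
    """Append `row` to the file and return True, unless it is already logged."""
    file = Path(path)
    if file.exists():
        with file.open(newline="") as handle:
            for logged in csv.DictReader(handle, fieldnames=FIELDS):
                if opportunity_key(logged) == opportunity_key(row):
                    return False  # already notified today
    with file.open("a", newline="") as handle:
        csv.DictWriter(handle, fieldnames=FIELDS).writerow(row)
    return True


# Typical use: only notify when the opportunity was not seen before.
#   if record_if_new("logFounds.csv", opportunity):
#       notify(opportunity)  # `notify` stands for whichever channel is used
```

Because the profit and the odds are left out of the key, a small movement in the odds does not trigger a second notification for the same game.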
Notification to the User
Finally here are some examples of notifications that are sent to the user:
Notification on Telegram of a found opportunity
Notification on Twitter of a found opportunity
Data for AI
Every day a “logData” file is generated which gathers all the data recovered from the bookmakers’ sites.
This file will be used in a future project that will try to use neural networks to detect anomalies or positive expected value (EV+) odds on a team's upcoming games, based on a historical data set.
To conclude, this arbitrage bot illustrates how data is processed, from data collection to the final result and the notification sent to the user.
This personal project allowed me to discover the management of information and data through Python on a concrete use and to automate a long and repetitive task system.
Finally, the program can be improved to analyze not only soccer competitions but also basketball, tennis, baseball, rugby, etc.
This enhancement would allow data from different sports to be retrieved and processed in the search for arbitrage opportunities.
It may be interesting to compare sports with three possible outcomes, such as soccer, to sports with two outcomes, such as tennis or basketball (including overtime).
Thanks for reading!