Abstract
Across countries and continents, football (soccer) has drawn increasing attention over recent decades and has developed into a huge commercial industry. Consequently, the market for bookmakers offering bets on the results of football matches has grown rapidly, especially since the advent of the internet. With many games played every week in multiple countries, football league matches hold enormous potential for generating profits over time through advanced betting strategies. In this paper, we use machine learning to predict the outcome of football league matches from data on match characteristics. Drawing on insights from statistical arbitrage in stock market trading, we show that betting according to these predictions could generate meaningful profits over time. A simulation study of the matches in the top five European football leagues from the 2013/14 to the 2017/18 season produced economically and statistically significant returns, achieved by applying modern machine learning algorithms to large data sets. In contrast, neither an ordinary linear regression approach nor simple betting strategies, e.g., always betting on the home team, reached the break-even point.
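To make the notion of a break-even point concrete, the following is a minimal sketch, assuming decimal odds and a flat one-unit stake per match, of how the average return of the naive "always bet on the home team" baseline could be computed. With decimal odds o, a winning bet returns o − 1 per unit staked and a losing bet loses the stake, so a single bet breaks even in expectation when the win probability equals 1/o. The `Match` type, the function names, and the odds in the example are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Match:
    home_odds: float  # hypothetical decimal odds offered on a home win
    home_won: bool    # True if the home team won the match


def flat_stake_return(matches: Sequence[Match], stake: float = 1.0) -> float:
    """Average net return per unit staked when always betting on the home team."""
    if not matches:
        raise ValueError("need at least one match")
    profit = 0.0
    for m in matches:
        # A winning bet pays stake * odds (stake included), i.e. a net gain of
        # stake * (odds - 1); a losing bet forfeits the stake.
        profit += stake * (m.home_odds - 1.0) if m.home_won else -stake
    # Normalize by total amount staked, so 0.0 is the break-even point.
    return profit / (stake * len(matches))


def break_even_probability(decimal_odds: float) -> float:
    """Win probability at which a bet at these odds has zero expected profit."""
    return 1.0 / decimal_odds
```

For four illustrative matches, `Match(2.0, True)`, `Match(1.5, False)`, `Match(2.5, False)` and `Match(1.8, True)`, the net profit is 1.0 − 1 − 1 + 0.8 = −0.2 units on 4 units staked, an average return of about −5%, i.e. below break-even. A strategy beats the bookmaker only if its average return stays above zero across many matches.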
Notes
We thank https://www.sportal.de/ for providing the data.
We thank https://www.football-data.co.uk/data.php for providing the data.
Please contact the authors if you are interested in the data and the code.
Without loss of generality, our model can also be used for matches without home advantage, e.g., the FIFA World Cup and the UEFA European Championship. In this case, both teams would be treated as neutral.
Acknowledgements
We are grateful to two anonymous referees for many helpful suggestions on this topic.
Cite this article
Knoll, J., Stübinger, J. Machine-Learning-Based Statistical Arbitrage Football Betting. Künstl Intell 34, 69–80 (2020). https://doi.org/10.1007/s13218-019-00610-4