Machine-Learning-Based Statistical Arbitrage Football Betting

  • Technical Contribution
KI - Künstliche Intelligenz

Abstract

Across countries and continents, football (soccer) has attracted increasing attention over recent decades and has developed into a huge commercial complex. Consequently, the market for bookmakers offering bets on the results of football matches grew rapidly, especially with the rise of the internet. With a large number of matches every week in multiple countries, football league matches hold enormous potential for generating profits over time through advanced betting strategies. In this paper, we use machine learning to predict the outcome of football league matches by exploiting data on match characteristics. Drawing on insights from statistical arbitrage in stock market trading, we show that betting according to these predictions can generate meaningful profits over time. A simulation study analyzing the matches of the top five European football leagues from the 2013/14 to the 2017/18 season yields economically and statistically significant returns, achieved by exploiting large data sets with modern machine learning algorithms. By contrast, neither an ordinary linear regression approach nor simple betting strategies, such as always betting on the home team, reached the break-even point.
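The abstract measures strategies against the break-even point and compares model-driven betting with simple baselines such as always backing the home team. The Python sketch below illustrates that kind of comparison under assumptions that are ours, not the paper's: every bet stakes one unit at decimal (European) odds, and a hypothetical expected-value rule (bet on the outcome with the largest `p * odds - 1` if it exceeds a margin) stands in for the authors' actual statistical arbitrage strategy, whose details are not given here. The names `Match`, `always_home`, `value_bets`, and `mean_return`, as well as the toy odds and probabilities, are illustrative only and are not data from the study.

```python
"""Hypothetical sketch: evaluating 1X2 football betting strategies against break-even.

Assumptions (not from the paper): one-unit stakes, decimal odds, and model
probabilities supplied per match. The value-betting rule is a stand-in, not
the authors' method.
"""
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

OUTCOMES = ("H", "D", "A")  # home win, draw, away win


@dataclass
class Match:
    odds: Dict[str, float]                       # decimal odds per outcome
    result: str                                  # realised outcome: "H", "D" or "A"
    model_probs: Optional[Dict[str, float]] = None  # hypothetical model output


def bet_return(odds: float, won: bool) -> float:
    """Net profit of a one-unit stake: odds - 1 if the bet wins, otherwise -1."""
    return odds - 1.0 if won else -1.0


def always_home(matches: List[Match]) -> List[float]:
    """Baseline strategy: stake one unit on the home team in every match."""
    return [bet_return(m.odds["H"], m.result == "H") for m in matches]


def value_bets(matches: List[Match], margin: float = 0.0) -> List[float]:
    """Hypothetical rule: bet on the outcome with the highest expected value
    p * odds - 1, but only if that value exceeds `margin`; otherwise skip the match."""
    returns = []
    for m in matches:
        ev = {o: m.model_probs[o] * m.odds[o] - 1.0 for o in OUTCOMES}
        best = max(ev, key=ev.get)
        if ev[best] > margin:
            returns.append(bet_return(m.odds[best], m.result == best))
    return returns


def mean_return(returns: List[float]) -> float:
    """Average net profit per bet; a value above 0 means the strategy beats break-even."""
    return mean(returns) if returns else 0.0


# Toy inputs for illustration only (not data from the study).
matches = [
    Match({"H": 2.0, "D": 3.5, "A": 4.0}, "H", {"H": 0.55, "D": 0.25, "A": 0.20}),
    Match({"H": 1.5, "D": 4.0, "A": 6.0}, "A", {"H": 0.60, "D": 0.22, "A": 0.18}),
    Match({"H": 2.5, "D": 3.2, "A": 3.0}, "A", {"H": 0.35, "D": 0.34, "A": 0.31}),
]

if __name__ == "__main__":
    print("always home:", always_home(matches), mean_return(always_home(matches)))
    print("value bets: ", value_bets(matches), mean_return(value_bets(matches)))
```

With these toy inputs, `always_home` returns `[1.0, -1.0, -1.0]` (a mean of -1/3 per unit staked), while the hypothetical value rule places three bets returning `[1.0, 5.0, -1.0]`. Raising `margin` makes the rule more selective; with `margin=0.09` only the first match qualifies. A real evaluation would, of course, use out-of-sample predictions over many seasons and test whether the mean return differs significantly from zero.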

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. We thank https://www.sportal.de/ for providing the data.

  2. We thank https://www.football-data.co.uk/data.php for providing the data.

  3. Please contact the authors if you are interested in the data and the code.

  4. Our model can equally be applied to matches without home advantage, e.g., those at the FIFA World Cup or the UEFA European Championship. In this case, both teams are treated as neutral.

Acknowledgements

We are grateful to two anonymous referees for many helpful suggestions on this topic.

Author information

Correspondence to Julian Knoll.

About this article

Cite this article

Knoll, J., Stübinger, J. Machine-Learning-Based Statistical Arbitrage Football Betting. Künstl Intell 34, 69–80 (2020). https://doi.org/10.1007/s13218-019-00610-4
