A data-driven optimization approach to baseball roster management

Barnes, Sean; Bjarnadóttir, Margrét; Smolyak, Daniel; Thiele, Aurélie

doi:10.1007/s10479-023-05725-4


Original Research
Published: 15 January 2024

Volume 335, pages 33–58 (2024)

Annals of Operations Research

Sean Barnes¹, Margrét Bjarnadóttir² (ORCID: orcid.org/0000-0003-2955-1992), Daniel Smolyak³ & Aurélie Thiele⁴


Abstract

Each year, Major League Baseball (MLB) teams face complex decisions about which players to retain and which to recruit. In addition to operational, team and budget constraints, these decisions are further complicated by the fact that an athlete’s future performance and its impact on the team are both uncertain. In this paper, we combine prediction modeling with decision optimization to study the MLB free agent market. We develop optimization models for the allocation of a team’s recruitment budget using six different metrics that evaluate a player’s contributions to a team’s success. We consider both an ideal case, where each team can choose among all free agents, and a sequential case, where we assume that teams with stronger appeal (big-market teams) are more successful in attracting talent, while teams with less pull must optimize their rosters over a much smaller pool of remaining players. Using the best-performing metric, which takes into account both players’ positions and their positional flexibility, we develop a series of quantitative tools that help teams, especially those with small budgets, identify (1) the players who deliver a key competitive advantage to their teams, appearing in both their ideal and sequential rosters, and (2) the players who are in many ideal rosters and thus are likely to be hired by teams with big budgets, perhaps at a substantial salary premium. To gain and maintain an edge in the fiercely competitive free agent market, teams need to continuously adapt their strategies, and our models represent a first step towards prescriptive (not just predictive) analytics designed to help them do so. Further, our analysis indicates that a few players are in high demand from many teams (for instance, in every year of the period considered, the ten most in-demand players appear in the ideal rosters of at least seven teams), while most players appear in one ideal roster or none at all.
Our models go beyond players’ individual performance metrics to help teams understand which players will be in high demand due to teams’ position needs in a given year. The results further emphasize the increasing importance of contract extensions as a strategy to bypass the free agent market.
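The ideal and sequential cases, and the two roster-based indicators, can be illustrated with a minimal sketch. It assumes a toy setting: each team has a budget and a list of open position slots, each player has a set of playable positions (positional flexibility), a predicted value and a salary, and each slot takes at most one distinct player. Brute-force enumeration stands in for the authors' optimization models, which are not reproduced here; all names (`Player`, `Team`, `best_roster`, `ideal_rosters`, `sequential_rosters`, `demand_counts`, `key_players`) and all numbers are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Player:
    name: str
    positions: frozenset  # positions the player can fill (positional flexibility)
    value: float          # predicted contribution, e.g. a projected WAR (toy number)
    salary: float         # expected salary in $m (toy number)


@dataclass(frozen=True)
class Team:
    name: str
    budget: float         # recruitment budget in $m
    open_slots: tuple     # positions the team needs to fill
    appeal: int           # higher = bigger market, signs earlier in the sequential case


def best_roster(team, pool):
    """Try every assignment of at most one distinct player per open slot and keep
    the highest total predicted value that fits the budget (tiny instances only)."""
    options = [[None] + [p for p in pool if slot in p.positions]
               for slot in team.open_slots]
    best, best_value = (), 0.0
    for choice in product(*options):
        picked = [p for p in choice if p is not None]
        if len(picked) != len(set(picked)):
            continue  # one player cannot fill two slots
        if sum(p.salary for p in picked) > team.budget:
            continue  # over budget
        value = sum(p.value for p in picked)
        if value > best_value:
            best, best_value = tuple(picked), value
    return best


def ideal_rosters(teams, pool):
    # Ideal case: every team chooses from the full free-agent pool.
    return {t.name: best_roster(t, pool) for t in teams}


def sequential_rosters(teams, pool):
    # Sequential case: higher-appeal teams sign first; later teams see what is left.
    remaining, rosters = list(pool), {}
    for t in sorted(teams, key=lambda t: t.appeal, reverse=True):
        rosters[t.name] = best_roster(t, remaining)
        remaining = [p for p in remaining if p not in rosters[t.name]]
    return rosters


def demand_counts(ideal):
    # How many ideal rosters each player appears in.
    return Counter(p.name for roster in ideal.values() for p in roster)


def key_players(ideal, sequential):
    # Players who appear in both the ideal and the sequential roster of a team.
    return {team: sorted({p.name for p in ideal[team]} & {p.name for p in sequential[team]})
            for team in ideal}


# Toy usage with invented players and teams.
players = [
    Player("A", frozenset({"SS", "2B"}), 4.0, 10.0),
    Player("B", frozenset({"SS"}), 3.0, 6.0),
    Player("C", frozenset({"2B"}), 2.0, 3.0),
    Player("D", frozenset({"1B"}), 2.5, 5.0),
    Player("E", frozenset({"1B"}), 1.0, 1.0),
]
teams = [Team("Big", 16.0, ("SS", "1B"), 2), Team("Small", 13.0, ("SS", "2B"), 1)]
ideal = ideal_rosters(teams, players)
sequential = sequential_rosters(teams, players)
print(demand_counts(ideal))             # player A appears in both ideal rosters
print(key_players(ideal, sequential))   # {'Big': ['A', 'D'], 'Small': ['C']}
```

In this toy instance the flexible player A is in both ideal rosters, but the big-market team signs him first, so the small-market team's sequential roster falls back to B and C; only C survives in both of its rosters. Real instances would need an integer-programming solver rather than enumeration.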


References

Albert, J. (2006). Pitching statistics, talent and luck, and the best strikeout seasons of all-time. Journal of Quantitative Analysis in Sports, 2(1).
Barnes, S. L., & Bjarnadóttir, M. V. (2016). Great expectations: An analysis of major league baseball free agent performance. Statistical Analysis and Data Mining, 9(5), 295–309.
Baumer, B., & Zimbalist, A. (2014). The sabermetric revolution: Assessing the growth of analytics in baseball. University of Pennsylvania Press.
Bendtsen, M. (2017). Regimes in baseball players’ career data. Data Mining and Knowledge Discovery, 31, 1580–1621.
Ben-Tal, A., El Ghaoui, L., & Nemirovski, A. (2009). Robust optimization. Princeton Series in Applied Mathematics. Princeton University Press.
Bertsimas, D., & Sim, M. (2003). The price of robustness. Operations Research, 52, 35–53.
Brave, S. A., Butters, R. A., & Roberts, K. A. (2019). Uncovering the sources of team synergy: Player complementarities in the production of wins. Journal of Sports Analytics, 5(4), 247–279.
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroskedasticity and random coefficient variation. Econometrica, 47(5), 1287–1294.
Büsing, C., Koster, A., & Kutschka, M. (2011). Recoverable robust knapsacks: The discrete scenario case. Optimization Letters, 5, 379–392.
Chan, T. C. Y., & Fearing, D. S. (2013). The value of flexibility in baseball roster construction. In MIT Sloan Sports Analytics Conference.
Chan, T. C. Y., & Fearing, D. S. (2019). Process flexibility in baseball: The value of positional flexibility. Management Science, 65(4), 1642–1666.
Chung, D. J. (2017). How much is a win worth? An application to intercollegiate athletics. Management Science, 63, 548–565.
Cot’s Baseball Contracts. Highest paid players. https://legacy.baseballprospectus.com/compensation/cots/league-info/highest-paid-players/
DeBrock, L., Hendricks, W., & Koenker, R. (2004). Pay and performance. The impact of salary distribution on firm-level outcomes in baseball. Journal of Sports Economics, 5(3), 243–261.
Depken, C. A. (2000). Wage disparity and team productivity: Evidence from major league baseball. Economics Letters, 67, 87–92.
Elitzur, R. (2020). Data analytics effects in major league baseball. Omega, 90, 102001. https://doi.org/10.1016/j.omega.2018.11.010
Farrar, A., & Bruggink, T. H. (2011). A new test of the Moneyball hypothesis. The Sport Journal, 14(1), 1–9.
Frick, B., Prinz, J., & Winkelmann, K. (2003). Pay inequalities and team performance: Empirical evidence from the North American major leagues. International Journal of Manpower, 24(4), 472–488.
Gross, A., & Link, C. (2017). Does option theory hold for Major League Baseball contracts? Economic Inquiry, 55(1), 425–433.
Hakes, J. K., & Sauer, R. D. (2006). An economic evaluation of the Moneyball hypothesis. Journal of Economic Perspectives, 20(3), 173–185.
Hall, S., Szymanski, S., & Zimbalist, A. S. (2002). Testing causality between team performance and payroll. The cases of major league baseball and English soccer. Journal of Sports Economics, 3, 149–168.
Humphrey, S. E., Morgenson, F. P., & Mannor, M. J. (2009). Developing a theory of the strategic core of teams: A role composition model of team performance. Journal of Applied Psychology, 94(1), 48–60.
Humphreys, B. R., & Pyun, H. (2017). Monopsony exploitation in professional sport: Evidence from major league baseball position players, 2000–2011. Managerial and Decision Economics, 28, 676–688.
Kahn, L. M. (1993). Managerial quality, team success, and individual player performance in major league baseball. ILR Review, 46, 531–547.
Kasperski, A., & Zieliński, P. (2016). Robust discrete optimization under discrete and interval uncertainty: A survey. In Robustness analysis in decision aiding, optimization and analytics. Springer.
Kim, J. W., & King, B. G. (2014). Seeing stars: Matthew effects and status bias in major league baseball umpiring. Management Science, 60(11), 2619–2644.
Koop, G. (2002). Comparing the performance of baseball players. Journal of the American Statistical Association, 97(459), 710–720. https://doi.org/10.1198/016214502388618456
Koseler, K., & Stephan, M. (2017). Machine learning applications in baseball: A systematic literature review. Applied Artificial Intelligence, 31(9–10), 745–763. https://doi.org/10.1080/08839514.2018.1442991
Krautmann, A. C. (1990). Shirking or stochastic productivity in major league baseball? Southern Economic Journal, 5(4), 961–968.
Krautmann, A. C. (2016). Contract extensions: The case of major league baseball. Journal of Sports Economics, 19, 1–16.
Lackritz, J. R. (1990). Salary evaluation for professional baseball players. The American Statistician, 44(1), 4–8. https://doi.org/10.1080/00031305.1990.10475682
Lesaege, C., & Poss, M. (2016). The partial choice recoverable knapsack problem. Computational Management Science, 1, 189–194.
Lewis, M. (2004). Moneyball: The art of winning an unfair game. W. W. Norton & Company.
Liebchen, C., Lübbecke, M., Möhring, R., & Stiller, S. (2009). The concept of recoverable robustness, linear programming recovery, and railway applications. In Robust and online large-scale optimization (pp. 1–27). Springer.
MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3), 305–325.
Monaci, M., Pferschy, U., & Serafini, P. (2013). Exact solution of the robust knapsack problem. Computers and Operations Research, 40, 2625–2631.
Nasrabadi, E., & Orlin, J. (2013). Robust optimization with incremental recourse. Technical report. MIT Sloan School of Management.
Raimondo, H. J. (1983). Free agents’ impact on the labor market for baseball players. Journal of Labor Research, 4(2), 183–193.
Rockerbie, D. W. (2009). Strategic free agency in baseball. Journal of Sports Economics, 10(3), 278–291.
Schall, T., & Smith, G. (2000). Do baseball players regress toward the mean? The American Statistician, 54(4), 231–235.
Schultz, R., & Curnow, C. (1988). Peak performance and age among superathletes: Track and field, swimming, baseball, tennis and golf. Journal of Gerontology, 43(5), 113–120.
Scully, G. W. (1974). Pay and performance in major league baseball. The American Economic Review, 64(6), 915–930.
Silver, N. (2012). The signal and the noise. Penguin.
Spotrac. MLB offseason spending. Online tool. https://www.spotrac.com/mlb/tools/offseason/
Timmerman, T. A. (2000). Racial diversity, age diversity, interdependence, and team performance. Small Group Research, 13(5), 592–606.
Turvey, J. (2013). The future of baseball contracts: A look at the growing trend in long-term contracts. The Baseball Research Journal, 42(2), 101–107.
Tymkovich, J. L. (2012). A study of minor league baseball prospects and their expected future value. CMC Senior Theses (p. 442). http://scholarship.claremont.edu/cmc_theses/442
van den Akker, J., Bouman, P., Hoogeveen, J., & Tonissen, D. (2014). Decomposition approaches for recoverable robust optimization problems. Technical report, Utrecht University, Utrecht.
Wiseman, F., & Chatterjee, S. (2003). Team payroll and team performance in major league baseball: 1985–2002. Economics Bulletin, 1(2), 1–10.


Author information

Authors and Affiliations

¹ Netflix, Los Angeles, CA, USA (Sean Barnes)
² Robert H. Smith School of Business, University of Maryland, College Park, MD, USA (Margrét Bjarnadóttir)
³ Computer Science Department, University of Maryland, College Park, MD, USA (Daniel Smolyak)
⁴ Engineering Management, Information and Systems, Southern Methodist University, Dallas, TX, USA (Aurélie Thiele)


Corresponding author

Correspondence to Margrét Bjarnadóttir.


Supplementary Information

Electronic supplementary material: Supplementary file 1 (PDF, 503 KB).

Appendices

Appendix A: Additional examples: ideal rosters exceeding and falling short of actual rosters

Best performance of our metric.

Kansas City Royals (KCR) in 2009: The actual roster is more balanced than the optimized one, with a maximum salary of $4.7m and five players paid over $1m, while in the optimized roster the maximum is $9.2m, the next highest salary is $1.1m, and the other 10 players are all paid less than $1m, nine of them at base salary. The overperformance of the optimization approach is due to the outstanding performance of Kenshin Kawakami, whom the optimization model selected despite his predicted WAR of 0. Hence, in this case the overperformance may be due to chance rather than a systematic advantage.
Washington Nationals (WSN) in 2011: Five players in the actual roster and four in the optimized roster are paid over $1m, but in the actual roster two players are paid at least $10m and the third highest salary drops to $1.6m. In the optimized roster, by contrast, the maximum salary is $10.5m and the next two highest are $5.5m and $3.7m. While the highest paid player in the optimized roster had an actual WPA of −0.9, the second and third highest paid players had WPAs of 2.1 and 1.2, respectively. In the actual roster, the highest WPA was 1.1 and the second highest was 0.4.
Washington Nationals (WSN) in 2012: This is a case where the maximum salary is higher in the optimized roster than in the actual one ($13.4m, Carlos Beltran, vs. $11.3m, Edwin Jackson). In this case, Beltran did very well with an actual WAR of 3.9 and an actual WPA of 2.4. The optimized roster was also helped by the presence of Fernando Rodney, with an actual WAR of 3.8 and WPA of 5.1 (with an adjusted salary of $1.8m).
Atlanta Braves (ATL) in 2013: The main reason the optimized roster of five new players performs better than the actual one is that, instead of signing B.J. Upton at $12.7m (who underperformed, with an actual WAR of −1.3 and an actual WPA of −2.8), the optimization approach signs two players in the $6.1–6.7m adjusted salary range, both of whom performed quite well. In addition, the optimization approach signs Dioner Navarro, who also exceeded predictions, at a $1.8m adjusted salary.
Milwaukee Brewers (MIL) in 2013: Both the optimized and actual rosters sign Kyle Lohse, who has the highest adjusted salary at $11.2m and had an actual WAR of 3.3 and WPA of 1.1. However, none of the other WPAs in the actual roster are positive, while four of the other WPAs in the optimized roster are, leading to a cumulative WPA of −0.3 in the optimized case vs. −4.9 in the real world. Because the salary distribution is not fundamentally altered, the overperformance of the optimization approach may be partly due to luck.

Worst performance of our metric.

Houston Astros (HOU) in 2010: The optimization approach results in a roster of 11 new players with a maximum adjusted salary of $7.6m, two players in the $0.71–0.76m range and the remaining eight at base salary, while the actual roster has two players in the $3.3–4.6m range, two in the $0.76–0.87m range, and the remaining seven at base salary. Hence, the star of the optimized roster is Carl Pavano, at a salary of $7.6m, with all other salaries much lower, while the actual roster splits that salary over two players. With an actual WAR of 4 and WPA of 0.6, Pavano did quite well, but his performance is counterbalanced by that of Rodrigo Lopez, with a WAR of −0.7 and WPA of −3.2. The worst WPA in the actual roster is −1.1 (Gustavo Chacin).
Arizona Diamondbacks (ARI) in 2012: With 11 new players on each roster, the maximum salary is $7.7m in the actual roster and $8.2m in the optimized one. The second highest salary in the actual roster is $5.6m, and the third highest drops to $2.0m. Six players in the actual roster are paid over $1m. In the optimized roster, four players are paid above $1m, all of them at least $2m. The cumulative WPA of the players paid over $1m was 3.4 in the actual roster and −5 in the optimized roster. Particularly detrimental to the optimized roster was the selection of Francisco Rodriguez, its highest paid player, who had an actual WAR of −0.2 and WPA of −1.3.
Baltimore Orioles (BAL) in 2012: This is another case where the optimization approach leads to an overemphasis on very expensive players, which backfires. Here the optimization approach signs Casey Kotchman at $3.1m, but his actual WAR was −0.9 and his WPA was −2.8.
Seattle Mariners (SEA) in 2013: The underperformance of the optimization approach is due to the actual team's signing of Hisashi Iwakuma, who far exceeded predictions with a WAR of 7 and WPA of 3.5.
San Francisco Giants (SFG) in 2013: The underperformance of the optimization approach is due to the optimized team's signing of B.J. Upton, who underperformed, at an adjusted salary of $12.7m. The maximum adjusted salary in the actual roster was $8.4m, which allowed two other salaries in the $6.1–6.8m range. In the optimized roster, the next highest salaries are $8.1m and $1.4m.

Appendix B: Heteroskedasticity in team performance models

We investigate potential misspecification of the models in Sect. 3.1 by testing for heteroskedasticity. We apply the Breusch–Pagan Lagrange multiplier test (Breusch & Pagan, 1979) to each model; the results are shown in Supplementary Table 1. All p-values are below 0.01 for Models 1–3, indicating the presence of heteroskedasticity. Supplementary Figure 1 highlights the heteroskedasticity of Model 1: the model tends to predict values closer to the mean, producing a pattern of under-prediction for high-performing teams and over-prediction for low-performing teams.
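For readers who want to reproduce this kind of check, a minimal sketch of the original Breusch–Pagan (1979) LM statistic follows, using NumPy and SciPy on synthetic data. It does not use the paper's team data or Models 1–3; the helper names `ols` and `breusch_pagan` and the simulated regression are hypothetical. The squared OLS residuals are scaled by their mean, regressed on the explanatory variables, and half the explained sum of squares is compared with a chi-square distribution whose degrees of freedom equal the number of non-intercept regressors.

```python
import numpy as np
from scipy import stats


def ols(y, X):
    """Ordinary least squares; X must already include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def breusch_pagan(resid, Z):
    """Breusch-Pagan (1979) Lagrange multiplier test for heteroskedasticity.

    Scales squared residuals by their mean (the ML variance estimate), regresses
    them on Z (intercept included), and returns LM = ESS / 2 with its chi-square
    p-value on Z.shape[1] - 1 degrees of freedom.
    """
    n = resid.shape[0]
    g = resid**2 / (resid @ resid / n)
    gamma, _ = ols(g, Z)
    ess = np.sum((Z @ gamma - g.mean()) ** 2)  # explained sum of squares
    lm = ess / 2.0
    return lm, stats.chi2.sf(lm, Z.shape[1] - 1)


# Synthetic example (not the paper's data): noise grows with the regressor.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)
_, e = ols(y, X)
lm, p_value = breusch_pagan(e, X)
print(f"LM = {lm:.1f}, p = {p_value:.2g}")
```

This version assumes normally distributed errors; Koenker's studentized n·R² variant is a common alternative that is less sensitive to non-normality, so reported statistics can differ depending on which form is used.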

Heteroskedasticity commonly results in inconsistent estimates of the standard errors of linear regression models, leading to confidence intervals that are either too wide or too narrow. To investigate this effect, we reran Models 1–3 with robust standard errors using the HC1 estimator (MacKinnon & White, 1985). The robust standard errors for each model were in fact close to or lower than the original standard errors: in all three models, the intercept standard error decreased, and the standard error for WPA and/or WAR increased by less than 10% (with p-values remaining highly significant). More importantly, because in this paper we use the models as predictive inputs to the decision models, it is important to note that the regression estimates themselves are not affected by using robust standard errors.
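The point that robust standard errors leave the regression estimates unchanged can be seen directly in a small sketch of the HC1 estimator. Assuming an OLS design matrix X with an intercept column, HC1 is the White sandwich covariance rescaled by n / (n − k); the coefficient vector is computed once and only the covariance differs. The function name `ols_with_hc1` and the simulated data are hypothetical and are not the paper's Models 1–3.

```python
import numpy as np


def ols_with_hc1(y, X):
    """OLS coefficients with classical and HC1 (heteroskedasticity-robust) SEs.

    X must include an intercept column. HC1 rescales the White sandwich
    estimator by n / (n - k).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y            # same point estimates for both SE choices
    e = y - X @ beta
    s2 = e @ e / (n - k)
    se_classic = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * e[:, None] ** 2)  # sum_i e_i^2 x_i x_i'
    cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_classic, se_hc1


# Synthetic heteroskedastic data (not the paper's data).
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)
beta, se_classic, se_hc1 = ols_with_hc1(y, X)
for name, b, s0, s1 in zip(["intercept", "slope"], beta, se_classic, se_hc1):
    print(f"{name}: beta={b:.3f}  classical SE={s0:.3f}  HC1 SE={s1:.3f}")
```

Whether the HC1 standard errors come out larger or smaller than the classical ones depends on how the error variance co-moves with the regressors, which is consistent with the mixed changes reported above.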

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Barnes, S., Bjarnadóttir, M., Smolyak, D. et al. A data-driven optimization approach to baseball roster management. Ann Oper Res 335, 33–58 (2024). https://doi.org/10.1007/s10479-023-05725-4


Received: 31 July 2021
Accepted: 22 August 2023
Published: 15 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10479-023-05725-4
