Abstract
Each year, Major League Baseball (MLB) teams face complex decisions about which players to retain and which players to recruit. In addition to operational, team, and budget constraints, these decisions are further complicated by the fact that an athlete’s future performance and its impact on the team are both uncertain. In this paper, we combine prediction modeling with decision optimization to study the MLB free agent market. We develop optimization models for the allocation of a team’s recruitment budget using six different metrics that evaluate a player’s contributions to a team’s success. We consider both an ideal case, where each team can choose among all free agents, and a sequential case, where we assume that teams with stronger appeal (big-market teams) are more successful in attracting talent, while teams with less pull must optimize their rosters over a much smaller pool of remaining players. Using the best-performing metric, which takes into account both players’ positions and their positional flexibility, we develop a series of quantitative tools that help teams, especially those with small budgets, identify (1) the players who deliver a key competitive advantage to their teams, appearing in both their ideal and sequential rosters, and (2) the players who are in many ideal rosters and thus are likely to be hired by teams with big budgets, perhaps at a substantial salary premium. In order to gain and maintain an edge in the fiercely competitive free agent market, teams need to continuously adapt their strategies, and our models represent a first step towards prescriptive (not just predictive) analytics designed to help them do so. Further, our analysis indicates that a few players are in high demand from many teams (for instance, in every year of the period considered, the ten most in-demand players appear in the ideal rosters of at least seven teams), while most players appear in one ideal roster or none at all.
Our models go beyond players’ individual performance metrics to help teams understand which players will be in high demand due to teams’ position needs in a given year. The results further emphasize the increasing importance of contract extensions as a strategy to bypass the free agent market.
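The ideal and sequential cases can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a single predicted metric (WAR), treats each team's choice as a plain 0/1 knapsack over free agents at positions the team needs (ignoring positional flexibility and the other metrics), and uses hypothetical players, teams, budgets, and salaries in $0.1m units. In the sequential case, teams choose in order of market appeal and signed players leave the pool; counting how many ideal rosters contain each player gives a simple demand measure.

```python
from collections import Counter

# Hypothetical free agents: (name, position, salary in $0.1m units, predicted WAR)
FREE_AGENTS = [
    ("A", "SS", 60, 3.0),
    ("B", "SS", 20, 1.5),
    ("C", "SP", 80, 4.0),
    ("D", "SP", 30, 2.0),
    ("E", "C", 10, 0.8),
]

# Hypothetical teams, ordered from strongest to weakest market appeal:
# (team, recruitment budget in $0.1m units, positions the team needs)
TEAMS = [
    ("Big", 100, {"SS", "SP"}),
    ("Mid", 60, {"SP", "C"}),
    ("Small", 40, {"SS", "C"}),
]


def best_roster(players, budget, needs):
    """Exact 0/1 knapsack: maximize total predicted WAR over players at
    needed positions, subject to total salary <= budget."""
    eligible = [p for p in players if p[1] in needs]
    # best[b] = (total WAR, chosen names) achievable with salary <= b
    best = [(0.0, ())] * (budget + 1)
    for name, _pos, salary, war in eligible:
        # Iterate budgets downward so each player is used at most once
        for b in range(budget, salary - 1, -1):
            cand = best[b - salary][0] + war
            if cand > best[b][0]:
                best[b] = (cand, best[b - salary][1] + (name,))
    return set(best[budget][1])


def ideal_rosters(players, teams):
    """Ideal case: every team chooses from the full free-agent pool."""
    return {t: best_roster(players, bud, needs) for t, bud, needs in teams}


def sequential_rosters(players, teams):
    """Sequential case: teams pick in order of appeal; signed players leave the pool."""
    pool, rosters = list(players), {}
    for t, bud, needs in teams:
        rosters[t] = best_roster(pool, bud, needs)
        pool = [p for p in pool if p[0] not in rosters[t]]
    return rosters


ideal = ideal_rosters(FREE_AGENTS, TEAMS)
seq = sequential_rosters(FREE_AGENTS, TEAMS)
# Demand: number of ideal rosters in which each player appears
demand = Counter(name for roster in ideal.values() for name in roster)
# Players appearing in both a team's ideal and sequential roster
both = {t: ideal[t] & seq[t] for t in ideal}
```

In this toy example, the small-budget team's ideal roster consists of B and E, the two players in highest demand; in the sequential case both are signed by teams with more pull, and the small team ends up with no one it can afford at a needed position.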
References
Albert, J. (2006). Pitching statistics, talent and luck, and the best strikeout seasons of all-time. Journal of Quantitative Analysis in Sports, 2(1).
Barnes, S. L., & Bjarnadóttir, M. V. (2016). Great expectations: An analysis of major league baseball free agent performance. Statistical Analysis and Data Mining, 9(5), 295–309.
Baumer, B., & Zimbalist, A. (2014). The sabermetric revolution: Assessing the growth of analytics in baseball. University of Pennsylvania Press.
Bendtsen, M. (2017). Regimes in baseball players’ career data. Data Mining and Knowledge Discovery, 31, 1580–1621.
Ben-Tal, A., El Ghaoui, L., & Nemirovski, A. (2009). Robust optimization. Princeton Series in Applied Mathematics. Princeton University Press.
Bertsimas, D., & Sim, M. (2003). Price of robustness. Operations Research, 52, 35–53.
Brave, S. A., Butters, R. A., & Roberts, K. A. (2019). Uncovering the sources of team synergy: Player complementarities in the production of wins. Journal of Sports Analytics, 5(4), 247–279.
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroskedasticity and random coefficient variation. Econometrica, 47(5), 1287–1294.
Büsing, C., Koster, A., & Kutschka, M. (2011). Recoverable robust knapsacks: The discrete scenario case. Optimization Letters, 5, 379–392.
Chan, T. C. Y., & Fearing, D. S. (2013). The value of flexibility in baseball roster construction. In MIT Sloan Sports Analytics Conference.
Chan, T. C. Y., & Fearing, D. S. (2019). Process flexibility in baseball: The value of positional flexibility. Management Science, 65(4), 1642–1666.
Chung, D. J. (2017). How much is a win worth? An application to intercollegiate athletics. Management Science, 63, 548–565.
Cot’s Baseball Contracts. Highest paid players. https://legacy.baseballprospectus.com/compensation/cots/league-info/highest-paid-players/
DeBrock, L., Hendricks, W., & Koenker, R. (2004). Pay and performance: The impact of salary distribution on firm-level outcomes in baseball. Journal of Sports Economics, 5(3), 243–261.
Depken, C. A. (2000). Wage disparity and team productivity: Evidence from major league baseball. Economics Letters, 67, 87–92.
Elitzur, R. (2020). Data analytics effects in major league baseball. Omega, 90, 102001. https://doi.org/10.1016/j.omega.2018.11.010
Farrar, A., & Bruggink, T. H. (2011). A new test of the Moneyball hypothesis. The Sport Journal, 14(1), 1–9.
Frick, B., Prinz, J., & Winkelmann, K. (2003). Pay inequalities and team performance: Empirical evidence from the North American major leagues. International Journal of Manpower, 24(4), 472–488.
Gross, A., & Link, C. (2017). Does option theory hold for Major League Baseball contracts? Economic Inquiry, 55(1), 425–433.
Hakes, J. K., & Sauer, R. D. (2006). An economic evaluation of the Moneyball hypothesis. Journal of Economic Perspectives, 20(3), 173–185.
Hall, S., Szymanski, S., & Zimbalist, A. S. (2002). Testing causality between team performance and payroll: The cases of major league baseball and English soccer. Journal of Sports Economics, 3, 149–168.
Humphrey, S. E., Morgenson, F. P., & Mannor, M. J. (2009). Developing a theory of the strategic core of teams: A role composition model of team performance. Journal of Applied Psychology, 94(1), 48–60.
Humphreys, B. R., & Pyun, H. (2017). Monopsony exploitation in professional sport: Evidence from major league baseball position players, 2000–2011. Managerial and Decision Economics, 28, 676–688.
Kahn, L. M. (1993). Managerial quality, team success, and individual player performance in major league baseball. ILR Review, 46, 531–547.
Kasperski, A., & Zielinski, P. (2016). Robust discrete optimization under discrete and interval uncertainty: A survey. In Robustness analysis in decision aiding, optimization and analytics. Springer.
Kim, J. W., & King, B. G. (2014). Seeing stars: Matthew effects and status bias in major league baseball umpiring. Management Science, 60(11), 2619–2644.
Koop, G. (2002). Comparing the performance of baseball players. Journal of the American Statistical Association, 97(459), 710–720. https://doi.org/10.1198/016214502388618456
Koseler, K., & Stephan, M. (2017). Machine learning applications in baseball: A systematic literature review. Applied Artificial Intelligence, 31(9–10), 745–763. https://doi.org/10.1080/08839514.2018.1442991
Krautmann, A. C. (1990). Shirking or stochastic productivity in major league baseball? Southern Economic Journal, 5(4), 961–968.
Krautmann, A. C. (2016). Contract extensions: The case of major league baseball. Journal of Sports Economics, 19, 1–16.
Lackritz, J. R. (1990). Salary evaluation for professional baseball players. The American Statistician, 44(1), 4–8. https://doi.org/10.1080/00031305.1990.10475682
Lesaege, C., & Poss, M. (2016). The partial choice recoverable knapsack problem. Computational Management Science, 1, 189–194.
Lewis, M. (2004). Moneyball: The art of winning an unfair game. W. W. Norton & Company.
Liebchen, C., Lübbecke, M., Möhring, R., & Stiller, S. (2009). The concept of recoverable robustness, linear programming recovery, and railway applications. In Robust and online large-scale optimization (pp. 1–27). Springer.
MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3), 305–325.
Monaci, M., Pferschy, U., & Serafini, P. (2013). Exact solution of the robust knapsack problem. Computers and Operations Research, 40, 2625–2631.
Nasrabadi, E., & Orlin, J. (2013). Robust optimization with incremental recourse. Technical report. MIT Sloan School of Management.
Raimondo, H. J. (1983). Free agents’ impact on the labor market for baseball players. Journal of Labor Research, 4(2), 183–193.
Rockerbie, D. W. (2009). Strategic free agency in baseball. Journal of Sports Economics, 10(3), 278–291.
Schall, T., & Smith, G. (2000). Do baseball players regress toward the mean? The American Statistician, 54(4), 231–235.
Schultz, R., & Curnow, C. (1988). Peak performance and age among superathletes: Track and field, swimming, baseball, tennis and golf. Journal of Gerontology, 43(5), 113–120.
Scully, G. W. (1974). Pay and performance in major league baseball. The American Economic Review, 64(6), 915–930.
Silver, N. (2012). The signal and the noise. Penguin.
Spotrac. MLB offseason spending. Online tool. https://www.spotrac.com/mlb/tools/offseason/
Timmerman, T. A. (2000). Racial diversity, age diversity, interdependence, and team performance. Small Group Research, 13(5), 592–606.
Turvey, J. (2013). The future of baseball contracts: A look at the growing trend in long-term contracts. The Baseball Research Journal, 42(2), 101–107.
Tymkovich, J. L. (2012). A study of minor league baseball prospects and their expected future value. CMC Senior Theses (p. 442). http://scholarship.claremont.edu/cmc_theses/442
van den Akker, J., Bouman, P., Hoogeveen, J., & Tonissen, D. (2014). Decomposition approaches for recoverable robust optimization problems. Technical report, Utrecht University, Utrecht.
Wiseman, F., & Chatterjee, S. (2003). Team payroll and team performance in major league baseball: 1985–2002. Economics Bulletin, 1(2), 1–10.
Supplementary Information
The electronic supplementary material is available via a link in the online version of this article.
Appendices
Appendix A: Additional examples: ideal rosters exceeding and falling short of actual rosters
Best performance of our metric.
- Kansas City Royals (KCR) in 2009: The actual roster is more balanced than the optimized one, with a maximum salary of $4.7m and five players paid over $1m. In the optimized roster, the maximum is $9.2m, the next highest salary is $1.1m, and the other 10 players are all paid less than $1m, nine of them at base salary. The overperformance of the optimization approach is due to the outstanding performance of Kenshin Kawakami, who was selected by the optimization model despite a predicted WAR of 0. Hence, in this case the overperformance may be due to chance rather than to a systematic advantage.
- Washington Nationals (WSN) in 2011: Five players in the actual roster and four in the optimized roster are paid over $1m. In the actual roster, two players are paid at least $10m and the third highest salary drops to $1.6m, whereas in the optimized roster the maximum salary is $10.5m and the next two highest are $5.5m and $3.7m. While the highest-paid player in the optimized roster had an actual WPA of −0.9, the second and third highest-paid players had WPAs of 2.1 and 1.2, respectively. In the actual roster, by contrast, the highest WPA was 1.1 and the second highest was 0.4.
- Washington Nationals (WSN) in 2012: Here the maximum salary is higher in the optimized roster than in the actual one ($13.4m for Carlos Beltran vs. $11.3m for Edwin Jackson). Beltran performed very well, with an actual WAR of 3.9 and an actual WPA of 2.4. The optimized roster was also helped by Fernando Rodney, who had an actual WAR of 3.8 and WPA of 5.1 at an adjusted salary of $1.8m.
- Atlanta Braves (ATL) in 2013: The optimized roster of five new players outperforms the actual one mainly because, instead of signing B.J. Upton at $12.7m, who underperformed (actual WAR −1.3, actual WPA −2.8), the optimization approach signs two players with adjusted salaries of $6.1–6.7m, both of whom performed quite well. In addition, the optimization approach signs Dioner Navarro, who also exceeded predictions, at a $1.8m adjusted salary.
- Milwaukee Brewers (MIL) in 2013: Both the optimized and actual rosters sign Kyle Lohse, who has the highest adjusted salary at $11.2m and had an actual WAR of 3.3 and WPA of 1.1. However, none of the other WPAs in the actual roster are positive, while four of the other WPAs in the optimized roster are, giving a cumulative WPA of −0.3 for the optimized roster vs. −4.9 for the actual one. Because the salary distribution is not fundamentally altered, the overperformance of the optimization approach may be partly due to luck.
Worst performance of our metric.
- Houston Astros (HOU) in 2010: The optimization approach produces a roster of 11 new players with a maximum adjusted salary of $7.6m, two players in the $0.71–0.76m range, and the remaining eight at base salary. The actual roster has two players in the $3.3–4.6m range, two in the $0.76–0.87m range, and the remaining seven at base salary. Hence, the star of the optimized roster is Carl Pavano at $7.6m, with all other salaries much lower, while the actual roster splits a comparable amount across two players. With an actual WAR of 4 and WPA of 0.6, Pavano did quite well, but his performance is counterbalanced by that of Rodrigo Lopez, with a WAR of −0.7 and WPA of −3.2. By comparison, the worst WPA in the actual roster is −1.1 (Gustavo Chacin).
- Arizona Diamondbacks (ARI) in 2012: For this roster of 11 new players, the maximum salary is $7.7m in the actual roster and $8.2m in the optimized one. In the actual roster, the second highest salary is $5.6m, the third highest drops to $2.0m, and six players are paid over $1m. In the optimized roster, four players are paid over $1m, all of them at least $2m. The cumulative WPA of the players paid over $1m was 3.4 in the actual roster and −5 in the optimized roster. Particularly detrimental to the optimized roster was the selection of Francisco Rodriguez, its highest-paid player, who had an actual WAR of −0.2 and WPA of −1.3.
- Baltimore Orioles (BAL) in 2012: This is another case where the optimization approach leads to an overemphasis on very expensive players that backfires. Here, the optimized roster signs Casey Kotchman at $3.1m, but his actual WAR was −0.9 and his WPA was −2.8.
- Seattle Mariners (SEA) in 2013: The underperformance of the optimization approach is due to the actual team's signing of Hisashi Iwakuma, who far exceeded predictions with a WAR of 7 and WPA of 3.5.
- San Francisco Giants (SFG) in 2013: The underperformance of the optimization approach is due to the optimized team's signing of B.J. Upton, who underperformed, at an adjusted salary of $12.7m. The maximum adjusted salary in the actual roster was $8.4m, leaving room for two other salaries in the $6.1–6.8m range; in the optimized roster, the next highest salaries are $8.1m and $1.4m.
Appendix B: Heteroskedasticity in team performance models
We test the team performance models of Sect. 3.1 for heteroskedasticity as a check for potential misspecification. We apply the Breusch–Pagan Lagrange multiplier test (Breusch & Pagan, 1979) to each model; the results are shown in Supplementary Table 1. All p-values are below 0.01 for Models 1–3, indicating the presence of heteroskedasticity. Supplementary Figure 1 illustrates the heteroskedasticity of Model 1. The models tend to predict values closer to the mean than those observed, producing a pattern of under-prediction for high-performing teams and over-prediction for low-performing teams.
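The test can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not the computation behind Supplementary Table 1: it implements the original (non-studentized) Breusch–Pagan statistic for a regression with a single regressor, so that the statistic is compared with a χ² distribution with one degree of freedom, whose upper tail is `erfc(sqrt(LM/2))`. The function name `breusch_pagan_1d` and the data are hypothetical; statsmodels offers a general implementation, `het_breuschpagan`, in `statsmodels.stats.diagnostic`.

```python
import math

import numpy as np


def breusch_pagan_1d(y, x):
    """Breusch-Pagan LM test for heteroskedasticity in y = b0 + b1*x + e.

    Original (non-studentized) form: regress e^2 / sigma^2 on [1, x] and take
    half the explained sum of squares. Under homoskedastic normal errors the
    statistic is asymptotically chi-squared with 1 degree of freedom.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)            # ML estimate of the error variance
    g = resid ** 2 / sigma2                    # scaled squared residuals
    gamma, *_ = np.linalg.lstsq(X, g, rcond=None)
    ess = np.sum((X @ gamma - g.mean()) ** 2)  # explained sum of squares
    lm = 0.5 * ess
    p_value = math.erfc(math.sqrt(lm / 2))     # upper tail of chi-squared(1)
    return lm, p_value


# Simulated example: in y_het the noise standard deviation grows with x
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y_het = 1 + 2 * x + rng.normal(0, 0.2 + 0.5 * x)
y_hom = 1 + 2 * x + rng.normal(0, 1.0, 500)
lm_het, p_het = breusch_pagan_1d(y_het, x)
lm_hom, p_hom = breusch_pagan_1d(y_hom, x)
```

With strongly growing noise, `p_het` is far below 0.01, mirroring the kind of result reported for Models 1–3; models with several regressors would instead use a χ² distribution with as many degrees of freedom as non-constant regressors.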
Heteroskedasticity commonly leads to inconsistent estimates of the standard errors of linear regression models, and hence to confidence intervals that are either too wide or too narrow. To investigate this effect, we reran Models 1–3 with robust standard errors using the HC1 estimator (MacKinnon & White, 1985). The robust standard errors for each model were in fact close to or lower than the original standard errors: in all three models, the intercept standard error decreased and the standard error for WPA and/or WAR increased by less than 10% (with p-values remaining highly significant). More importantly, because we use the models as predictive inputs to the decision models, we note that the regression coefficient estimates are not affected by the use of robust standard errors; only their estimated uncertainty changes.
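Why robust errors leave the predictions untouched can be seen in a minimal NumPy sketch of the HC1 estimator. It uses simulated data with error spread that grows with the regressor; the function name `ols_with_hc1` and the data are hypothetical, not the paper's Models 1–3. The coefficient vector is computed once and shared by both covariance estimators, so only the standard errors differ. With statsmodels, `OLS(y, X).fit(cov_type="HC1")` requests robust standard errors of this type.

```python
import numpy as np


def ols_with_hc1(X, y):
    """OLS fit returning coefficients, classical SEs, and HC1 robust SEs.

    X must already include an intercept column. HC1 is the White sandwich
    estimator rescaled by n / (n - k).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                     # same point estimates either way
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                 # assumes constant error variance
    se_classical = np.sqrt(np.diag(s2 * XtX_inv))
    meat = (X * resid[:, None] ** 2).T @ X       # sum_i e_i^2 x_i x_i'
    cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_hc1


# Simulated example: error spread grows with |x|
rng = np.random.default_rng(1)
x = rng.uniform(-10, 10, 300)
y = 81 + 1.5 * x + rng.normal(0, 1 + 0.3 * np.abs(x))
X = np.column_stack([np.ones_like(x), x])
beta, se_classical, se_hc1 = ols_with_hc1(X, y)
```

Comparing `se_classical` with `se_hc1` reproduces the type of check described above, while `beta`, and therefore every prediction `X @ beta`, is identical under both.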
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Barnes, S., Bjarnadóttir, M., Smolyak, D. et al. A data-driven optimization approach to baseball roster management. Ann Oper Res 335, 33–58 (2024). https://doi.org/10.1007/s10479-023-05725-4
DOI: https://doi.org/10.1007/s10479-023-05725-4