Abstract
Baseball, one of the most popular sports in the world, has a uniquely discrete gameplay structure. This stop-and-go style of play makes it natural for fans and observers to record information about the game in progress, resulting in a wealth of data available for analysis. Major League Baseball (MLB), the professional baseball league in the US and Canada, uses a system known as PITCHf/x to record information about every pitch thrown in league play. We extend pitch classification to pitch prediction (fastball or nonfastball) by restricting our analysis to pre-pitch features. By performing significant feature analysis and introducing a novel approach to feature selection, we achieve a moderate improvement over published results.
Notes
1. Strike-result percentage (SRP): a metric we created that measures the percentage of strikes from all pitches in the given situation.
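As an illustration of the SRP definition, the following is a minimal sketch rather than the authors' implementation. It assumes each previous pitch is stored as a hypothetical `(is_fastball, is_strike)` pair, and it reads "the given situation" as "pitches of one type within the last `window` pitches"; both the record layout and that reading are assumptions, as are the names `strike_result_percentage` and `srp_by_type`.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record for one pitch: (is_fastball, is_strike).
Pitch = Tuple[bool, bool]


def strike_result_percentage(is_strike: Sequence[bool]) -> Optional[float]:
    """Percentage of strikes among the given pitches, or None if there are none."""
    if not is_strike:
        return None
    return 100.0 * sum(is_strike) / len(is_strike)


def srp_by_type(previous: Sequence[Pitch], fastball: bool, window: int) -> Optional[float]:
    """SRP of fastballs (or nonfastballs) among the last `window` previous pitches."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = previous[-window:]
    # Keep only the strike flags of pitches of the requested type.
    return strike_result_percentage([s for f, s in recent if f == fastball])


# Toy history, oldest pitch first (made-up values for illustration only).
history = [(True, True), (False, False), (True, False), (True, True), (False, True)]
fastball_srp = srp_by_type(history, fastball=True, window=5)
nonfastball_srp = srp_by_type(history, fastball=False, window=5)
```

Returning `None` when no pitch of the requested type falls in the window is a design choice of this sketch; a real pipeline would need some convention for such empty situations.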
Acknowledgements
Part of this research was supported by NSA grant H98230-12-1-0299 and NSF grants DMS-1063010 and DMS-0943855. The authors would like to thank Lori Layne, David Padget, Brian Lewis, and Jessica Gronsbell, all from MIT LL, for their helpful advice and constructive input; Mr. Tom Tippett, Director of Baseball Information Services for the Boston Red Sox, for meeting with us to discuss this research and providing useful feedback; and Minh Nhat Phan for helping us scrape the PITCHf/x data.
A Appendix
From the original 18 features given in Table 1, we generated a total of 76 features and arranged them into 6 groups as follows:
- Group 1
  1. Inning
  2. Time (day/afternoon/night)
  3. Number of outs
  4. Last at bat events
  5-7. Pitcher vs. batter specific: fastball or nonfastball on previous pitch/lifetime percentage of fastballs/previous pitch's events
  8. Numeric score of previous at bat event
  9-11. Player on first base/second base/third base (true/false)
  12. Number of base runners
  13. Weighted base score
- Group 2
  1-3. Percentage of fastballs thrown in previous inning/game/at-bat
  4. Lifetime percentage of fastballs thrown to a specific batter over all at bats
  5-8. Percentage of fastballs over previous 5/10/15/20 pitches
  9-10. Previous pitch in specific count: pitch type/fastball or nonfastball
  11-12. Previous 2 or 3 pitches in specific count: fastball/nonfastball combo
  13-14. Previous pitch: pitch type/fastball or nonfastball
  15. Previous 2 pitches: fastball/nonfastball combo
  16. Player on first base (true/false)
  17-18. Percentage of fastballs over previous 10/15 pitches thrown to a specific batter
  19-21. Previous 5/10/15 pitches in specific count: percentage of fastballs
- Group 3
  1. Previous pitch: velocity
  2-3. Previous 2 pitches/3 pitches: velocity average
  4. Previous pitch in specific count: velocity
  5-6. Previous 2 pitches/3 pitches in specific count: velocity average
- Group 4
  1-2. Previous pitch: horizontal/vertical position
  3-4. Previous 2 pitches: horizontal/vertical position average
  5-6. Previous 3 pitches: horizontal/vertical position average
  7. Previous pitch: zone (Cartesian quadrant)
  8-9. Previous 2 pitches/3 pitches: zone (Cartesian quadrant) average
  10-11. Previous pitch in specific count: horizontal/vertical position
  12-13. Previous 2 pitches in specific count: horizontal/vertical position average
  14-15. Previous 3 pitches in specific count: horizontal/vertical position average
  16. Previous pitch in specific count: zone (Cartesian quadrant)
  17-18. Previous 2 or 3 pitches in specific count: zone (Cartesian quadrant) average
- Group 5
  1. SRP (see Note 1) of fastball thrown in the previous inning
  2. SRP of fastball thrown in the previous game
  3-5. SRP of fastball thrown in the previous 5 pitches/10 pitches/15 pitches
  6. SRP of fastball thrown in previous 5 pitches thrown to a specific batter
  7-8. SRP of nonfastball thrown in the previous inning/previous game
  9-11. SRP of nonfastball thrown in the previous 5 pitches/10 pitches/15 pitches
  12. SRP of nonfastball thrown in previous 5 pitches thrown to a specific batter
- Group 6
  1. Previous pitch: ball or strike (boolean)
  2-3. Previous 2 pitches/3 pitches: ball/strike combo
  4. Previous pitch in specific count: ball or strike
  5-6. Previous 2 pitches/3 pitches in specific count: ball/strike combo
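To make the "previous N pitches" features more concrete, here is a minimal sketch of how Group 2, features 5-8 (percentage of fastballs over the previous 5/10/15/20 pitches) might be derived so that each value uses only pitches thrown before the one being predicted. The function name `previous_fastball_percentages`, the boolean input encoding, and the choice to return `None` when fewer than N previous pitches exist are assumptions for illustration, not details taken from the paper.

```python
from typing import Dict, List, Optional, Sequence

WINDOWS = (5, 10, 15, 20)  # window sizes named in Group 2, features 5-8


def previous_fastball_percentages(
    is_fastball: Sequence[bool],
    windows: Sequence[int] = WINDOWS,
) -> List[Dict[int, Optional[float]]]:
    """For each pitch, the fastball percentage over the N pitches before it.

    `is_fastball` lists one pitcher's pitches in chronological order.
    The current pitch is never included in its own features, which keeps
    them strictly pre-pitch.
    """
    rows: List[Dict[int, Optional[float]]] = []
    for i in range(len(is_fastball)):
        history = is_fastball[:i]  # only pitches thrown before pitch i
        row: Dict[int, Optional[float]] = {}
        for w in windows:
            recent = history[-w:]
            # Assumption: leave the feature undefined until a full window exists.
            row[w] = 100.0 * sum(recent) / w if len(recent) == w else None
        rows.append(row)
    return rows


# Toy sequence (made-up values): True = fastball, False = nonfastball.
sequence = [True, False, True, True, False, True]
features = previous_fastball_percentages(sequence, windows=(2, 5))
```

The same slicing pattern would extend to the windowed velocity, position, and SRP features in Groups 3 to 5, with the averaged quantity swapped in for the fastball flag.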
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hoang, P., Hamilton, M., Murray, J., Stafford, C., Tran, H. (2015). A Dynamic Feature Selection Based LDA Approach to Baseball Pitch Prediction. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_11
DOI: https://doi.org/10.1007/978-3-319-25660-3_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25659-7
Online ISBN: 978-3-319-25660-3
eBook Packages: Computer Science, Computer Science (R0)