
A Dynamic Feature Selection Based LDA Approach to Baseball Pitch Prediction

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9441)

Abstract

Baseball, one of the most popular sports in the world, has a uniquely discrete gameplay structure. This stop-and-go style of play makes it natural for fans and observers to record information about the game in progress, resulting in a wealth of data available for analysis. Major League Baseball (MLB), the professional baseball league in the US and Canada, uses a system known as PITCHf/x to record information about every individual pitch thrown in league play. We extend pitch classification to pitch prediction (fastball or nonfastball) by restricting our analysis to pre-pitch features. By performing feature significance analysis and introducing a novel approach to feature selection, we achieve a moderate improvement over published results.
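The title names linear discriminant analysis (LDA) as the classifier. As a point of reference only, the following is a minimal sketch of standard two-class LDA (pooled covariance, class-prior term) in plain Python. It does not implement the paper's dynamic feature selection; the function names, the small `ridge` term, and the label encoding 1 = fastball, 0 = nonfastball are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def column_means(rows: Sequence[Vector]) -> Vector:
    """Mean of each feature (column) over the given rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(x: Sequence[Vector], y: Sequence[int], ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Fit two-class LDA (assumed labels: 1 = fastball, 0 = nonfastball); return (weights, bias)."""
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    mu1, mu0 = column_means(pos), column_means(neg)
    d = len(mu1)
    # Pooled within-class covariance shared by both classes (the core LDA assumption).
    cov = [[0.0] * d for _ in range(d)]
    for rows, mu in ((pos, mu1), (neg, mu0)):
        for row in rows:
            diff = [v - c for v, c in zip(row, mu)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j] / (len(x) - 2)
    for i in range(d):
        cov[i][i] += ridge  # hypothetical small ridge keeps the matrix invertible
    w = solve(cov, [a - b for a, b in zip(mu1, mu0)])  # w = cov^-1 (mu1 - mu0)
    midpoint = [(a + b) / 2 for a, b in zip(mu1, mu0)]
    # Threshold at the midpoint of the class means, shifted by the log prior ratio.
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint)) + math.log(len(pos) / len(neg))
    return w, bias


def predict(w: Vector, bias: float, row: Vector) -> int:
    """Predict 1 (fastball) if the linear discriminant score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, row)) + bias > 0 else 0
```

In practice the rows would hold pre-pitch features such as those listed in the Appendix, and accuracy would be measured on held-out pitches rather than on the training set.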

Notes

  1. Strike-result percentage (SRP): a metric we created that measures the percentage of strikes among all pitches in a given situation.
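A minimal sketch of the SRP definition in Note 1, under assumptions not stated in the paper: each pitch is given as a `(situation, is_strike)` pair, where the situation key might be, for example, the `(balls, strikes)` count, and the caller decides what counts as a strike. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def strike_result_percentage(pitches: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, float]:
    """Return, for each situation key, the percentage of its pitches that were strikes."""
    totals: Dict[Hashable, int] = defaultdict(int)
    strikes: Dict[Hashable, int] = defaultdict(int)
    for situation, is_strike in pitches:
        totals[situation] += 1
        strikes[situation] += int(is_strike)
    return {s: 100.0 * strikes[s] / totals[s] for s in totals}
```

For example, three pitches in a 0-0 count of which two were strikes give an SRP of about 66.7 for that situation, and a single ball in a 3-2 count gives 0.0.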

Acknowledgements

Part of this research was supported by NSA grant H98230-12-1-0299 and NSF grants DMS-1063010 and DMS-0943855. The authors would like to thank Lori Layne, David Padget, Brian Lewis and Jessica Gronsbell, all from MIT LL, for their helpful advice and constructive input; Mr. Tom Tippett, Director of Baseball Information Services for the Boston Red Sox, for meeting with us to discuss this research and providing useful feedback; and Minh Nhat Phan for helping us scrape the PITCHf/x data.

Author information

Corresponding author

Correspondence to Phuong Hoang.

A Appendix

From the original 18 features given in Table 1, we generated a total of 76 features and arranged them into 6 groups as follows:

  • Group 1

    1. Inning
    2. Time (day/afternoon/night)
    3. Number of outs
    4. Last at-bat events
    5-7. Pitcher vs. batter specific: fastball or nonfastball on previous pitch / lifetime percentage of fastballs / previous pitch's events
    8. Numeric score of previous at-bat event
    9-11. Player on first base / second base / third base (true/false)
    12. Number of base runners
    13. Weighted base score

  • Group 2

    1-3. Percentage of fastballs thrown in the previous inning / game / at-bat
    4. Lifetime percentage of fastballs thrown to a specific batter over all at-bats
    5-8. Percentage of fastballs over the previous 5 / 10 / 15 / 20 pitches
    9-10. Previous pitch in specific count: pitch type / fastball or nonfastball
    11-12. Previous 2 or 3 pitches in specific count: fastball/nonfastball combo
    13-14. Previous pitch: pitch type / fastball or nonfastball
    15. Previous 2 pitches: fastball/nonfastball combo
    16. Player on first base (true/false)
    17-18. Percentage of fastballs over the previous 10 / 15 pitches thrown to a specific batter
    19-21. Previous 5 / 10 / 15 pitches in specific count: percentage of fastballs

  • Group 3

    1. Previous pitch: velocity
    2-3. Previous 2 pitches / 3 pitches: velocity average
    4. Previous pitch in specific count: velocity
    5-6. Previous 2 pitches / 3 pitches in specific count: velocity average

  • Group 4

    1-2. Previous pitch: horizontal / vertical position
    3-4. Previous 2 pitches: horizontal / vertical position average
    5-6. Previous 3 pitches: horizontal / vertical position average
    7. Previous pitch: zone (Cartesian quadrant)
    8-9. Previous 2 pitches / 3 pitches: zone (Cartesian quadrant) average
    10-11. Previous pitch in specific count: horizontal / vertical position
    12-13. Previous 2 pitches in specific count: horizontal / vertical position average
    14-15. Previous 3 pitches in specific count: horizontal / vertical position average
    16. Previous pitch in specific count: zone (Cartesian quadrant)
    17-18. Previous 2 or 3 pitches in specific count: zone (Cartesian quadrant) average

  • Group 5

    1. SRP (see Note 1) of fastballs thrown in the previous inning
    2. SRP of fastballs thrown in the previous game
    3-5. SRP of fastballs thrown in the previous 5 / 10 / 15 pitches
    6. SRP of fastballs thrown in the previous 5 pitches to a specific batter
    7-8. SRP of nonfastballs thrown in the previous inning / previous game
    9-11. SRP of nonfastballs thrown in the previous 5 / 10 / 15 pitches
    12. SRP of nonfastballs thrown in the previous 5 pitches to a specific batter

  • Group 6

    1. Previous pitch: ball or strike (boolean)
    2-3. Previous 2 pitches / 3 pitches: ball/strike combo
    4. Previous pitch in specific count: ball or strike
    5-6. Previous 2 pitches / 3 pitches in specific count: ball/strike combo
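Many of the features above are window statistics over a pitcher's pitch history, optionally restricted to pitches thrown in a specific ball-strike count. The sketch below shows one way to compute representative features from Groups 2 through 6. It is an illustration, not the authors' feature-extraction code, and rests on these assumptions: the `Pitch` record and all function names are hypothetical; "specific count" is read as the same `(balls, strikes)` count; windowed SRP is computed over the fastballs (or nonfastballs) among the previous n pitches; and the zone quadrant is taken around an assumed zone center at horizontal position 0 and height 2.5 ft. Functions return `None` when the window is empty.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Count = Tuple[int, int]  # (balls, strikes) before the pitch


@dataclass
class Pitch:
    """Hypothetical record for one previously thrown pitch (not the PITCHf/x schema)."""
    count: Count
    is_fastball: bool
    velocity: float   # mph
    px: float         # horizontal location, ft; assumed 0 = center of the plate
    pz: float         # vertical location, ft above the ground
    is_strike: bool   # pitch result: strike (True) or ball (False)


def last_pitches(history: Sequence[Pitch], n: int, count: Optional[Count] = None) -> List[Pitch]:
    """The n most recent pitches (oldest first), optionally only those thrown in `count`."""
    pool = [p for p in history if count is None or p.count == count]
    return pool[-n:] if n > 0 else []


def pct_fastball(history: Sequence[Pitch], n: int, count: Optional[Count] = None) -> Optional[float]:
    """Group 2 style: percentage of fastballs over the previous n pitches."""
    window = last_pitches(history, n, count)
    if not window:
        return None
    return 100.0 * sum(p.is_fastball for p in window) / len(window)


def avg_velocity(history: Sequence[Pitch], n: int, count: Optional[Count] = None) -> Optional[float]:
    """Group 3 style: average velocity of the previous n pitches."""
    window = last_pitches(history, n, count)
    if not window:
        return None
    return sum(p.velocity for p in window) / len(window)


def zone_quadrant(px: float, pz: float, center_z: float = 2.5) -> int:
    """Group 4 style: Cartesian quadrant 1-4 around an assumed zone center (0, center_z)."""
    up = pz >= center_z
    if px >= 0:
        return 1 if up else 4
    return 2 if up else 3


def windowed_srp(history: Sequence[Pitch], n: int, fastball: bool = True) -> Optional[float]:
    """Group 5 style: strike percentage of the fastballs (or nonfastballs) among the previous n pitches."""
    window = [p for p in last_pitches(history, n) if p.is_fastball == fastball]
    if not window:
        return None
    return 100.0 * sum(p.is_strike for p in window) / len(window)


def ball_strike_combo(history: Sequence[Pitch], k: int, count: Optional[Count] = None) -> str:
    """Group 6 style: previous k results as a string, oldest first, e.g. 'BSS'."""
    return "".join("S" if p.is_strike else "B" for p in last_pitches(history, k, count))
```

Encoding combos as short strings such as "BSS" keeps them readable; before being fed to a linear classifier they would still need to be mapped to numeric or one-hot values.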

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hoang, P., Hamilton, M., Murray, J., Stafford, C., Tran, H. (2015). A Dynamic Feature Selection Based LDA Approach to Baseball Pitch Prediction. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_11

  • DOI: https://doi.org/10.1007/978-3-319-25660-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25659-7

  • Online ISBN: 978-3-319-25660-3

  • eBook Packages: Computer Science, Computer Science (R0)
