Learning strategies in table tennis using inverse reinforcement learning

  • Original Paper
  • Published in: Biological Cybernetics

Abstract

Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent’s court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework allows us to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert from players with different skill levels and playing styles.
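As a rough illustration of the kind of reward learning described above, the following is a minimal sketch of model-free inverse reinforcement learning with a linear reward r(s, a) = w·φ(s, a), in the spirit of feature-expectation-matching and max-margin methods. It is not the authors' implementation; the feature map, the trajectories, and all hyperparameters are hypothetical placeholders.

    import numpy as np

    # Hedged sketch: linear reward r(s, a) = w . phi(s, a), estimated by pushing the
    # expert's discounted feature expectations above those of competing policies.
    # phi, the trajectory data, and the hyperparameters are hypothetical placeholders.

    def feature_expectations(trajectories, phi, gamma=0.95):
        """Empirical discounted feature expectations over a set of (state, action) trajectories."""
        mu = None
        for traj in trajectories:
            for t, (s, a) in enumerate(traj):
                f = (gamma ** t) * phi(s, a)   # phi must return a NumPy vector
                mu = f if mu is None else mu + f
        return mu / len(trajectories)

    def max_margin_irl(expert_trajs, other_traj_sets, phi, iters=100, lr=0.1):
        """Subgradient sketch of a max-margin style update on the reward weights w."""
        mu_expert = feature_expectations(expert_trajs, phi)
        w = np.zeros_like(mu_expert)
        for _ in range(iters):
            # competing policy whose return under the current w is largest
            mu_other = max((feature_expectations(trajs, phi) for trajs in other_traj_sets),
                           key=lambda mu: w @ mu)
            w += lr * (mu_expert - mu_other)   # enlarge the expert's margin
            w /= max(np.linalg.norm(w), 1.0)   # keep the weights bounded
        return w

In a formulation of this kind, a player's strategy is summarized by the learned weights w, and comparing w-weighted returns across players is what allows the expert to be distinguished from players with other skill levels and styles.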


Notes

  1. Note that in order to include uncertain state information, such as assumptions about the opponent's strategy or the spin of the ball, a formulation in terms of partially observable MDPs would be necessary (see the belief-update sketch after these notes).

  2. Please note that the performance of k-NN regression depends on the density of the data. In the table tennis context, most of the data were sufficiently concentrated in a small region (see the k-NN sketch after these notes).

  3. Expedite system: additional rules to discourage slow play in a table tennis match. It is used after 10 minutes of play or if requested by both players.

  4. In the following, the first value corresponds to the reward differences obtained by the MMS algorithm and the second value to those obtained by the RE algorithm.

  5. Please note that such a reward function could also contain agent-specific intrinsic costs, which might not be straightforward to transfer to an artificial system.
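
As an illustration of note 1, the following is a minimal belief-update sketch, assumed rather than taken from the paper: tracking an unobserved quantity such as spin would require maintaining a belief distribution over it, which is what turns the Markov decision problem into a partially observable one. The spin categories and the transition and observation matrices are hypothetical.

    import numpy as np

    # Hypothetical hidden-state categories; the paper's state contains only observed
    # quantities, so this belief tracking is not part of its model.
    SPINS = ["topspin", "backspin", "no_spin"]

    def belief_update(belief, obs_index, T, O):
        """One Bayes-filter step over the hidden spin variable.
        T[i, j] = p(next spin j | current spin i); O[j, o] = p(observation o | spin j)."""
        predicted = T.T @ belief                 # predict the hidden state at the next stroke
        corrected = O[:, obs_index] * predicted  # weight by the likelihood of the observation
        return corrected / corrected.sum()       # renormalize to a probability distribution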
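
For note 2, the following is a minimal k-NN regression sketch on hypothetical data (not the paper's implementation). It makes the density dependence explicit: the estimate at a query point is simply an average over its k nearest neighbours, so it degrades where the demonstrations are sparse.

    import numpy as np

    def knn_regress(X, y, query, k=5):
        """Estimate the target at `query` as the mean over its k nearest training samples.
        X: (n, d) array of inputs, y: (n,) array of targets."""
        dists = np.linalg.norm(X - query, axis=1)        # Euclidean distances to all samples
        nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
        return y[nearest].mean(), dists[nearest].mean()  # estimate and a crude local-sparsity proxy

A large average neighbour distance signals that the query lies in a sparsely covered region, where the estimate should be treated with caution.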


Acknowledgments

We would like to thank Ekaterina Volkova for her support with the calibration of, and advice on, the motion suits and the VICON system, as well as Volker Grabe for his technical support in integrating the Kinect and VICON with ROS. We would also like to thank Dr. Tobias Meilinger for helpful comments on the psychological part of this experiment and Oliver Kroemer for proofreading this paper.

Author information

Corresponding author

Correspondence to Katharina Muelling.

Additional information

This article forms part of a special issue of Biological Cybernetics entitled “Structural Aspects of Biological Cybernetics: Valentino Braitenberg, Neuroanatomy, and Brain Function”.


About this article


Cite this article

Muelling, K., Boularias, A., Mohler, B. et al. Learning strategies in table tennis using inverse reinforcement learning. Biol Cybern 108, 603–619 (2014). https://doi.org/10.1007/s00422-014-0599-1

