Learning strategies in table tennis using inverse reinforcement learning

  • Original Paper
  • Published in: Biological Cybernetics

Abstract

Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent’s court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework allows us to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert from players with different skill levels and playing styles.
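As a rough illustration of the kind of reward learning described above, the following is a minimal sketch of model-free inverse reinforcement learning with a linear reward r(s, a) = w·φ(s, a), in the spirit of feature-expectation-matching and max-margin methods. It is not the authors' implementation; the feature map, the trajectories, and all hyperparameters are hypothetical placeholders.

    import numpy as np

    # Hedged sketch: linear reward r(s, a) = w . phi(s, a), estimated by pushing the
    # expert's discounted feature expectations above those of competing policies.
    # phi, the trajectory data, and the hyperparameters are hypothetical placeholders.

    def feature_expectations(trajectories, phi, gamma=0.95):
        """Empirical discounted feature expectations over a set of (state, action) trajectories."""
        mu = None
        for traj in trajectories:
            for t, (s, a) in enumerate(traj):
                f = (gamma ** t) * phi(s, a)   # phi must return a NumPy vector
                mu = f if mu is None else mu + f
        return mu / len(trajectories)

    def max_margin_irl(expert_trajs, other_traj_sets, phi, iters=100, lr=0.1):
        """Subgradient sketch of a max-margin style update on the reward weights w."""
        mu_expert = feature_expectations(expert_trajs, phi)
        w = np.zeros_like(mu_expert)
        for _ in range(iters):
            # competing policy whose return under the current w is largest
            mu_other = max((feature_expectations(trajs, phi) for trajs in other_traj_sets),
                           key=lambda mu: w @ mu)
            w += lr * (mu_expert - mu_other)   # enlarge the expert's margin
            w /= max(np.linalg.norm(w), 1.0)   # keep the weights bounded
        return w

In a formulation of this kind, a player's strategy is summarized by the learned weights w, and comparing w-weighted returns across players is what allows the expert to be distinguished from players with other skill levels and styles.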


Notes

  1. Note that in order to include uncertain state information, such as assumptions about the opponent's strategy or the spin of the ball, a formulation in terms of partially observable MDPs would be necessary (see the belief-update sketch after these notes).

  2. Please note that the performance of k-NN regression depends on the density of the data. In the table tennis context, most of the data were sufficiently concentrated in a small region (see the k-NN sketch after these notes).

  3. Expedite system: additional rules to discourage slow play in a table tennis match. It is used after 10 minutes of play or if requested by both players.

  4. In the following, the first value corresponds to the reward differences obtained by the MMS algorithm and the second value to those obtained by the RE algorithm.

  5. Please note that such a reward function could also contain agent-specific intrinsic costs, which might not be straightforward to transfer to an artificial system.
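
As an illustration of note 1, the following is a minimal belief-update sketch, assumed rather than taken from the paper: tracking an unobserved quantity such as spin would require maintaining a belief distribution over it, which is what turns the Markov decision problem into a partially observable one. The spin categories and the transition and observation matrices are hypothetical.

    import numpy as np

    # Hypothetical hidden-state categories; the paper's state contains only observed
    # quantities, so this belief tracking is not part of its model.
    SPINS = ["topspin", "backspin", "no_spin"]

    def belief_update(belief, obs_index, T, O):
        """One Bayes-filter step over the hidden spin variable.
        T[i, j] = p(next spin j | current spin i); O[j, o] = p(observation o | spin j)."""
        predicted = T.T @ belief                 # predict the hidden state at the next stroke
        corrected = O[:, obs_index] * predicted  # weight by the likelihood of the observation
        return corrected / corrected.sum()       # renormalize to a probability distribution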
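
For note 2, the following is a minimal k-NN regression sketch on hypothetical data (not the paper's implementation). It makes the density dependence explicit: the estimate at a query point is simply an average over its k nearest neighbours, so it degrades where the demonstrations are sparse.

    import numpy as np

    def knn_regress(X, y, query, k=5):
        """Estimate the target at `query` as the mean over its k nearest training samples.
        X: (n, d) array of inputs, y: (n,) array of targets."""
        dists = np.linalg.norm(X - query, axis=1)        # Euclidean distances to all samples
        nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
        return y[nearest].mean(), dists[nearest].mean()  # estimate and a crude local-sparsity proxy

A large average neighbour distance signals that the query lies in a sparsely covered region, where the estimate should be treated with caution.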


Acknowledgments

We would like to thank Ekaterina Volkova for her support with the calibration of, and advice on, the motion suits and the VICON system, as well as Volker Grabe for his technical support in integrating the Kinect and VICON with ROS. We would also like to thank Dr. Tobias Meilinger for helpful comments on the psychological part of this experiment and Oliver Kroemer for proofreading this paper.

Author information

Corresponding author

Correspondence to Katharina Muelling.

Additional information

This article forms part of a special issue of Biological Cybernetics entitled “Structural Aspects of Biological Cybernetics: Valentino Braitenberg, Neuroanatomy, and Brain Function”.


About this article


Cite this article

Muelling, K., Boularias, A., Mohler, B. et al. Learning strategies in table tennis using inverse reinforcement learning. Biol Cybern 108, 603–619 (2014). https://doi.org/10.1007/s00422-014-0599-1

