Abstract
Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent’s court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework allows us to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert among players with different skill levels as well as different playing styles.
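The approach described in the abstract can be illustrated with a short example. The Python snippet below is a minimal sketch under our own simplifying assumptions, not the implementation evaluated in the paper: it assumes the reward is a linear combination of state-action features, r(s, a) = w · φ(s, a), and estimates the weight vector w from demonstrated rallies by comparing the expert’s discounted feature expectations with those of a baseline policy, which is the basic quantity that model-free inverse reinforcement learning methods operate on. The names phi, expert_trajs, and baseline_trajs are placeholders for a feature map and sets of recorded (state, action) trajectories.

import numpy as np

def feature_expectations(trajectories, phi, gamma=0.95):
    """Empirical discounted feature expectations of a set of trajectories.
    Each trajectory is a list of (state, action) pairs and phi(state, action)
    returns a NumPy feature vector."""
    return np.mean(
        [sum(gamma ** t * phi(s, a) for t, (s, a) in enumerate(traj))
         for traj in trajectories],
        axis=0,
    )

def reward_weights(expert_trajs, baseline_trajs, phi):
    """Illustrative weight estimate: point w towards the gap between the
    expert's and a baseline policy's feature expectations, so that
    r(s, a) = w . phi(s, a) scores the demonstrated strategy higher than
    the baseline.  Model-free IRL algorithms refine this basic comparison."""
    w = feature_expectations(expert_trajs, phi) - feature_expectations(baseline_trajs, phi)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

The resulting weights can then be used to score candidate striking decisions via r(s, a) = w · φ(s, a), the quantity a strategy would seek to maximise when choosing where and how to return the ball.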
Notes
Note that including such uncertain state information as assumptions about the opponent’s strategy or the spin of the ball would require a problem formulation in the form of partially observable MDPs.
Please note that the performance of k-NN regression depends on the density of the data; in the table tennis context, most of the data were sufficiently concentrated in a small region (see the sketch after these notes).
Expedite system: additional rules to discourage slow play in a table tennis match. It is used after 10 minutes of play or if requested by both players.
In the following, the first value corresponds to the reward differences obtained with the MMS algorithm and the second value to those obtained with the RE algorithm.
Please note that such a reward function could also contain agent-specific intrinsic costs, which might not be straightforward to transfer to an artificial system.
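To make the k-NN footnote above concrete, the following is a generic k-nearest-neighbour regression sketch in Python, not the exact estimator used in the paper; X_train, y_train, and x_query are placeholder names for the training inputs, training targets, and a query point.

import numpy as np

def knn_regress(X_train, y_train, x_query, k=5):
    """Average the targets of the k training points closest to the query.
    In sparsely sampled regions the k neighbours lie far from the query and
    the estimate degrades, which is the density effect noted above."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()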
Acknowledgments
We would like to thank Ekaterina Volkova for her support with the calibration of, and advice on, the motion suits and the VICON system, as well as Volker Grabe for his technical support in integrating the Kinect and VICON with ROS. We would also like to thank Dr. Tobias Meilinger for helpful comments on the psychological part of this experiment and Oliver Kroemer for proofreading this paper.
Additional information
This article forms part of a special issue of Biological Cybernetics entitled “Structural Aspects of Biological Cybernetics: Valentino Braitenberg, Neuroanatomy, and Brain Function”.
Cite this article
Muelling, K., Boularias, A., Mohler, B. et al. Learning strategies in table tennis using inverse reinforcement learning. Biol Cybern 108, 603–619 (2014). https://doi.org/10.1007/s00422-014-0599-1