Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots

  • Conference paper
  • Appears in: RoboCup 2021: Robot World Cup XXIV (RoboCup 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13132)

Abstract

The design and implementation of behaviors for robots operating in dynamic and complex environments is becoming a mandatory requirement in modern applications. Reinforcement learning consistently shows remarkable results in learning effective action policies and in achieving super-human performance in various tasks, without exploiting prior knowledge. However, in robotics, the use of purely learning-based techniques is still subject to strong limitations, foremost among them sample efficiency: such techniques are known to require large training datasets and long training sessions in order to develop effective action policies. Hence, in this paper, to alleviate this constraint and to allow learning in such robotic scenarios, we introduce SErP (Sample Efficient robot Policies), an iterative algorithm that improves the sample efficiency of learning algorithms. SErP exploits a sub-optimal planner (here implemented with a monitor-replanning algorithm) to guide the exploration of the learning agent through its initial iterations. Intuitively, SErP exploits the planner as an expert in order to focus exploration and to avoid the portions of the search space that are not useful for solving the robot's task. Finally, to confirm our insights and to show the improvements that SErP brings, we report the results obtained in two different robotic scenarios: (1) a cartpole scenario and (2) a soccer-robot scenario within the RoboCup@Soccer SPL environment.
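The guided-exploration idea described in the abstract can be sketched compactly. The following Python snippet is a minimal sketch in the spirit of SErP, not the authors' implementation: with a probability that decays over the initial training iterations, the agent defers to the sub-optimal planner's action instead of sampling from its current policy, and the collected trajectory feeds the usual RL update (e.g., PPO). The `env`, `policy`, and `planner` interfaces, and names such as `plan_action` and `sample_action`, are hypothetical.

```python
import random

def serp_rollout(env, policy, planner, iteration, guide_iters=50, seed=None):
    """Collect one episode with planner-guided exploration (SErP-style sketch).

    `env`, `policy`, and `planner` are assumed interfaces, not real APIs.
    """
    rng = random.Random(seed)
    # Probability of deferring to the planner; decays linearly to zero
    # over the first `guide_iters` training iterations.
    beta = max(0.0, 1.0 - iteration / guide_iters)
    trajectory = []
    obs, done = env.reset(), False
    while not done:
        if rng.random() < beta:
            # Early on, the sub-optimal planner acts as an expert and
            # steers the agent toward task-relevant regions of the space.
            action = planner.plan_action(obs)
        else:
            # Later, the agent increasingly relies on its own policy.
            action = policy.sample_action(obs)
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return trajectory  # passed to the standard RL update as usual
```

Because the planner's influence vanishes after the initial iterations, the learner is free to improve beyond the sub-optimal expert rather than merely imitate it.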

Notes

  1. https://github.com/openai/baselines.


Author information

Corresponding author

Correspondence to Emanuele Antonioni.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Antonioni, E., Riccio, F., Nardi, D. (2022). Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots. In: Alami, R., Biswas, J., Cakmak, M., Obst, O. (eds) RoboCup 2021: Robot World Cup XXIV. RoboCup 2021. Lecture Notes in Computer Science (LNAI), vol 13132. Springer, Cham. https://doi.org/10.1007/978-3-030-98682-7_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-98682-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98681-0

  • Online ISBN: 978-3-030-98682-7

  • eBook Packages: Computer Science, Computer Science (R0)
