Abstract
Designing and implementing behaviors for robots operating in dynamic and complex environments is essential in today's applications. Reinforcement learning consistently shows remarkable results in learning effective action policies and in achieving super-human performance on various tasks, without exploiting prior knowledge. However, in robotics, purely learning-based techniques are still subject to strong limitations, foremost among them sample efficiency: such techniques are known to require large amounts of training data and long training sessions to develop effective action policies. Hence, to alleviate this constraint and to enable learning in such robotic scenarios, this paper introduces SErP (Sample Efficient robot Policies), an iterative algorithm that improves the sample efficiency of learning algorithms. SErP exploits a sub-optimal planner (here implemented as a monitor-replanning algorithm) to guide the exploration of the learning agent through its initial iterations. Intuitively, SErP uses the planner as an expert to focus exploration and to avoid portions of the search space that do not contribute to solving the robot's task. Finally, to confirm our insights and to show the improvements that SErP provides, we report the results obtained in two different robotic scenarios: (1) a cartpole scenario and (2) a soccer-robot scenario within the RoboCup@Soccer SPL environment.
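To make the mechanism concrete, below is a minimal, hypothetical Python sketch of planner-guided exploration in the spirit of SErP; it is not the authors' implementation. A hand-coded sub-optimal planner selects actions during early episodes, its influence is annealed away over iterations, and a standard Q-learner updates on every transition. The toy grid environment and all names (GridWorld, greedy_planner, the beta schedule) are illustrative assumptions.

```python
# Minimal sketch of planner-guided exploration (not the authors' SErP code).
# A sub-optimal "planner" supplies actions during early episodes; its
# influence is annealed so the learner gradually takes over.
import random
from collections import defaultdict

class GridWorld:
    """Toy deterministic grid: start at (0, 0), goal at (n-1, n-1)."""
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, n=6):
        self.n, self.goal = n, (n - 1, n - 1)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, a):
        dx, dy = self.ACTIONS[a]
        x = min(max(self.pos[0] + dx, 0), self.n - 1)
        y = min(max(self.pos[1] + dy, 0), self.n - 1)
        self.pos = (x, y)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else -0.01), done

def greedy_planner(env, s):
    """Sub-optimal 'planner': greedily reduce distance to the goal."""
    gx, gy = env.goal
    if s[0] < gx: return 2          # move down
    if s[1] < gy: return 0          # move right
    return random.randrange(4)

def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.1):
    env, Q = GridWorld(), defaultdict(float)
    for ep in range(episodes):
        beta = max(0.0, 1.0 - ep / 100)  # planner influence, annealed to 0
        s, done = env.reset(), False
        while not done:
            if random.random() < beta:
                a = greedy_planner(env, s)      # expert-led exploration
            elif random.random() < eps:
                a = random.randrange(4)         # residual random exploration
            else:
                a = max(range(4), key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)
            # Standard Q-learning update on every transition, whether the
            # planner or the learner chose the action.
            target = r + (0 if done else gamma * max(Q[(s2, a_)] for a_ in range(4)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

if __name__ == "__main__":
    train()
```

Because the learner also updates on planner-chosen transitions, early experience concentrates in task-relevant regions of the state space, which is the intuition behind the sample-efficiency gain the abstract describes.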
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Antonioni, E., Riccio, F., Nardi, D. (2022). Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots. In: Alami, R., Biswas, J., Cakmak, M., Obst, O. (eds.) RoboCup 2021: Robot World Cup XXIV. Lecture Notes in Computer Science, vol. 13132. Springer, Cham. https://doi.org/10.1007/978-3-030-98682-7_9
DOI: https://doi.org/10.1007/978-3-030-98682-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98681-0
Online ISBN: 978-3-030-98682-7
eBook Packages: Computer Science (R0)