Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots

  • Conference paper
  • Appears in: RoboCup 2021: Robot World Cup XXIV (RoboCup 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13132)

Abstract

The design and implementation of behaviors for robots operating in dynamic and complex environments is becoming a mandatory requirement in modern applications. Reinforcement learning consistently shows remarkable results in learning effective action policies and in achieving super-human performance in various tasks, without exploiting prior knowledge. However, in robotics, the use of purely learning-based techniques is still subject to strong limitations, foremost among them sample efficiency: such techniques are known to require large training datasets and long training sessions in order to develop effective action policies. Hence, in this paper, to alleviate this constraint and to allow learning in such robotic scenarios, we introduce SErP (Sample Efficient robot Policies), an iterative algorithm that improves the sample efficiency of learning algorithms. SErP exploits a sub-optimal planner (here implemented with a monitor-replanning algorithm) to guide the exploration of the learning agent through its initial iterations. Intuitively, SErP exploits the planner as an expert in order to focus exploration and to avoid the portions of the search space that are not useful for solving the robot's task. Finally, to confirm our insights and to show the improvements that SErP brings, we report the results obtained in two different robotic scenarios: (1) a cartpole scenario and (2) a soccer-robot scenario within the RoboCup@Soccer SPL environment.
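The guided-exploration idea described in the abstract can be sketched compactly. The following Python snippet is a minimal sketch in the spirit of SErP, not the authors' implementation: with a probability that decays over the initial training iterations, the agent defers to the sub-optimal planner's action instead of sampling from its current policy, and the collected trajectory feeds the usual RL update (e.g., PPO). The `env`, `policy`, and `planner` interfaces, and names such as `plan_action` and `sample_action`, are hypothetical.

```python
import random

def serp_rollout(env, policy, planner, iteration, guide_iters=50, seed=None):
    """Collect one episode with planner-guided exploration (SErP-style sketch).

    `env`, `policy`, and `planner` are assumed interfaces, not real APIs.
    """
    rng = random.Random(seed)
    # Probability of deferring to the planner; decays linearly to zero
    # over the first `guide_iters` training iterations.
    beta = max(0.0, 1.0 - iteration / guide_iters)
    trajectory = []
    obs, done = env.reset(), False
    while not done:
        if rng.random() < beta:
            # Early on, the sub-optimal planner acts as an expert and
            # steers the agent toward task-relevant regions of the space.
            action = planner.plan_action(obs)
        else:
            # Later, the agent increasingly relies on its own policy.
            action = policy.sample_action(obs)
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return trajectory  # passed to the standard RL update as usual
```

Because the planner's influence vanishes after the initial iterations, the learner is free to improve beyond the sub-optimal expert rather than merely imitate it.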

Notes

  1. https://github.com/openai/baselines.


Author information

Corresponding author

Correspondence to Emanuele Antonioni.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Antonioni, E., Riccio, F., Nardi, D. (2022). Improving Sample Efficiency in Behavior Learning by Using Sub-optimal Planners for Robots. In: Alami, R., Biswas, J., Cakmak, M., Obst, O. (eds) RoboCup 2021: Robot World Cup XXIV. RoboCup 2021. Lecture Notes in Computer Science (LNAI), vol 13132. Springer, Cham. https://doi.org/10.1007/978-3-030-98682-7_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-98682-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98681-0

  • Online ISBN: 978-3-030-98682-7

  • eBook Packages: Computer Science, Computer Science (R0)
