
Heuristically Accelerated Reinforcement Learning by Means of Case-Based Reasoning and Transfer Learning

Journal of Intelligent & Robotic Systems

Abstract

Reinforcement Learning (RL) is a well-known technique for learning solutions to control problems from an agent's interactions with its domain. However, RL is known to be inefficient in real-world problems, where the state space and the set of actions grow quickly. Recently, heuristics, case-based reasoning (CBR) and transfer learning have been used as tools to accelerate the RL process. This paper investigates a class of algorithms, called Transfer Learning Heuristically Accelerated Reinforcement Learning (TLHARL), that uses CBR as heuristics within a transfer learning setting to accelerate RL. The main contributions of this work are the proposal of a new TLHARL algorithm based on the traditional RL algorithm Q(λ) and the application of TLHARL to two distinct real-robot domains: robot soccer with small-scale robots and humanoid-robot stability learning. Experimental results show that our proposed method leads to a significant improvement of the learning rate in both domains.
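To make the mechanism concrete, the sketch below (an illustration, not the authors' code) combines the two ingredients the abstract names: an action-selection rule that biases a tabular Q-function with a heuristic H(s, a), assumed here to be filled in from cases retrieved from a previously solved source task, and a Watkins-style Q(λ) update with eligibility traces. All names (ACTIONS, xi, select_action, q_lambda_update) and parameter values are illustrative assumptions; the case retrieval and adaptation steps of TLHARL are omitted.

```python
import random
from collections import defaultdict

# Illustrative HARL-style agent: the heuristic H (assumed to come from
# cases retrieved in a source task) only biases action selection; the
# Q(lambda) update itself is unchanged, so the heuristic can speed up
# learning without altering what is ultimately learned.

ACTIONS = [0, 1, 2, 3]        # hypothetical discrete action set
alpha, gamma, lam = 0.1, 0.99, 0.9
epsilon, xi = 0.1, 1.0        # exploration rate, heuristic weight

Q = defaultdict(float)        # tabular action values, keyed by (state, action)
H = defaultdict(float)        # heuristic bonus derived from retrieved cases
E = defaultdict(float)        # eligibility traces

def select_action(state):
    """Epsilon-greedy over the heuristically biased value Q + xi * H."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)] + xi * H[(state, a)])

def q_lambda_update(s, a, r, s_next, a_next):
    """One backup of Watkins's Q(lambda); traces are cut whenever the
    action actually taken next (a_next) is exploratory (non-greedy)."""
    a_star = max(ACTIONS, key=lambda b: Q[(s_next, b)])
    delta = r + gamma * Q[(s_next, a_star)] - Q[(s, a)]
    E[(s, a)] += 1.0                                  # accumulating trace
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] = gamma * lam * E[key] if a_next == a_star else 0.0
```

Because H enters only the policy and not the value update, this style of acceleration preserves the usual convergence behavior of the underlying Q-learning rule, which is the key design choice behind heuristically accelerated RL methods.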



Acknowledgements

Reinaldo Bianchi acknowledges support from FAPESP (2016/21047-3), Paulo E. Santos acknowledges support from FAPESP-IBM (2016/18792-9) and CNPq (307093/2014-0), Isaac J. da Silva acknowledges support from CAPES, and Ramon Lopez de Mantaras acknowledges support from Generalitat de Catalunya Research Grant 2014 SGR 118 and CSIC Project 201550E022.

Author information

Correspondence to Reinaldo A. C. Bianchi.


About this article


Cite this article

Bianchi, R.A.C., Santos, P.E., da Silva, I.J. et al. Heuristically Accelerated Reinforcement Learning by Means of Case-Based Reasoning and Transfer Learning. J Intell Robot Syst 91, 301–312 (2018). https://doi.org/10.1007/s10846-017-0731-2
