Learning by reusing previous advice: a memory-based teacher–student framework

Autonomous Agents and Multi-Agent Systems

Abstract

Reinforcement learning (RL) is widely used to solve sequential decision-making problems, but it often suffers from slow learning in complex scenarios. Teacher–student frameworks address this issue by enabling agents to ask for and give advice, so that a student agent can leverage the knowledge of a teacher agent to accelerate its learning. In this paper, we consider the effect of reusing previous advice and propose a novel memory-based teacher–student framework in which student agents memorize advice received from teacher agents and reuse it later. In particular, we propose two methods for deciding whether previous advice should be reused: Q-Change per Step, which reuses advice if following it led to an increase in Q-values, and Decay Reusing Probability, which reuses advice with a probability that decays over time. Experiments on diverse RL tasks (Mario, Predator–Prey, and Half Field Offense) confirm that our proposed framework significantly outperforms existing frameworks in which previous advice is not reused.
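To make the two reuse rules concrete, the following is a minimal, illustrative Python sketch of a tabular Q-learning student that memorizes and reuses teacher advice. It is an assumption-laden reading of the abstract, not the paper's published pseudocode: the class name MemoryStudent, the teacher.advise interface, the last_delta bookkeeping, and the exact forms of the Q-Change per Step and Decay Reusing Probability tests are all hypothetical.

```python
import random
from collections import defaultdict

class MemoryStudent:
    """Tabular Q-learning student that memorizes and reuses teacher advice.

    Illustrative sketch only: the reuse tests below are plausible readings
    of "Q-Change per Step" and "Decay Reusing Probability" as described in
    the abstract, not the paper's exact algorithms.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 reuse_mode="q_change", decay=0.999):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)        # Q-table: (state, action) -> value
        self.last_delta = {}               # most recent Q update per (state, action)
        self.memory = {}                   # advice memory: state -> advised action
        self.reuse_mode = reuse_mode       # "q_change" or "decay_prob"
        self.reuse_prob = 1.0              # current reuse probability (decay mode)
        self.decay = decay

    def act(self, state, teacher=None):
        # Fresh advice takes priority and is stored in memory for later reuse.
        if teacher is not None:
            advice = teacher.advise(state)  # hypothetical teacher API
            if advice is not None:
                self.memory[state] = advice
                return advice
        # No fresh advice: possibly reuse memorized advice for this state.
        if state in self.memory and self._should_reuse(state):
            return self.memory[state]
        # Otherwise fall back to epsilon-greedy on the student's own Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _should_reuse(self, state):
        advised = self.memory[state]
        if self.reuse_mode == "q_change":
            # Q-Change per Step (assumed form): keep reusing the advice while
            # the last update to Q(state, advised action) was non-negative,
            # i.e. following the advice has been increasing the Q-value.
            return self.last_delta.get((state, advised), 0.0) >= 0.0
        # Decay Reusing Probability (assumed form): reuse with a probability
        # that shrinks geometrically each time reuse is considered.
        reuse = random.random() < self.reuse_prob
        self.reuse_prob *= self.decay
        return reuse

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update; the signed change is recorded
        # so the Q-Change per Step test can see whether the advice helped.
        target = r + self.gamma * max(self.q[(s2, b)] for b in self.actions)
        delta = self.alpha * (target - self.q[(s, a)])
        self.q[(s, a)] += delta
        self.last_delta[(s, a)] = delta
```

Under this reading, Q-Change per Step ties reuse to observed learning progress, while Decay Reusing Probability gradually weans the student off old advice regardless of its quality; the abstract reports that both variants outperform frameworks in which previous advice is not reused.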



Acknowledgements

This work was supported by the National Natural Science Foundation of China (62076100), the Fundamental Research Funds for the Central Universities, SCUT (D2210010, D2200150, and D2201300), and the Science and Technology Planning Project of Guangdong Province (2020B0101100002).

Author information

Corresponding author

Correspondence to Shuyue Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhu, C., Cai, Y., Hu, S. et al. Learning by reusing previous advice: a memory-based teacher–student framework. Auton Agent Multi-Agent Syst 37, 14 (2023). https://doi.org/10.1007/s10458-022-09595-1
