
Coaching: accelerating reinforcement learning through human-assisted approach

  • Regular Paper
  • Published in Progress in Artificial Intelligence

Abstract

The learning process in reinforcement learning is time-consuming because, in early episodes, the agent relies heavily on exploration. The proposed “coaching” approach aims to accelerate learning in systems with sparse environmental rewards, and it works well with linear \(\epsilon \)-greedy Q-learning with eligibility traces. To coach an agent, a human coach provides an intermediate target as a sub-goal for the agent to pursue; this sub-goal gives an additional clue that guides the agent toward the actual terminal state. During the coaching phase, the agent pursues the intermediate target with an aggressive policy, but the reward associated with the intermediate target is never used to update the state-action values; only the environmental reward is. After a small number of coaching episodes, learning proceeds normally with an \(\epsilon \)-greedy policy. In this way, the agent ends up with an optimal policy that is not under the influence or supervision of the human coach. The proposed method was tested on three tasks: mountain car, ball following, and obstacle avoidance. The experimental results show that, even with human coaches of various skill levels, the method speeds up the agent's learning in all three tasks.
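To make the mechanism concrete, the sketch below shows one way the coaching scheme described above could be realized. It is a minimal Python illustration under our own assumptions, not the paper's implementation: a simplified linear Q-learner with accumulating eligibility traces, a hypothetical environment interface (env.reset, env.step, env.predict_next), a hypothetical feature map, and a placeholder sub_goal_score heuristic standing in for however the coach's sub-goal is translated into an aggressive action choice. The essential point from the abstract is preserved: the coach's sub-goal steers action selection during early episodes only, while the TD target always uses the environmental reward alone.

    import numpy as np

    def coached_q_learning(env, features, n_actions, sub_goal,
                           coaching_episodes=10, episodes=500,
                           alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
        n_features = features(env.reset(), 0).size
        w = np.zeros(n_features)                      # linear Q(s, a) = w . phi(s, a)

        def q(s, a):
            return w @ features(s, a)

        def sub_goal_score(s, a):
            # Hypothetical heuristic: prefer the action whose predicted next
            # state is closest to the coach's sub-goal (the paper's actual
            # criterion may differ).
            return -np.linalg.norm(env.predict_next(s, a) - sub_goal)

        for ep in range(episodes):
            s, done = env.reset(), False
            e = np.zeros(n_features)                  # eligibility trace
            while not done:
                if ep < coaching_episodes:
                    # Coaching phase: act aggressively toward the sub-goal.
                    a = max(range(n_actions), key=lambda a_: sub_goal_score(s, a_))
                elif np.random.rand() < epsilon:
                    a = np.random.randint(n_actions)  # normal exploration afterwards
                else:
                    a = max(range(n_actions), key=lambda a_: q(s, a_))
                s2, r_env, done = env.step(a)
                # The TD target uses ONLY the environmental reward r_env; the
                # sub-goal never enters the update, so the learned values are
                # not biased by the coach.
                target = r_env if done else r_env + gamma * max(
                    q(s2, a_) for a_ in range(n_actions))
                delta = target - q(s, a)
                e = gamma * lam * e + features(s, a)  # accumulating trace
                w += alpha * delta * e
                s = s2
        return w

Because the sub-goal affects only which actions get tried, not the value targets, the value function that emerges is still the one induced by the environmental reward, which is why the final policy is free of the coach's influence.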





Author information

Correspondence to Nakarin Suppakun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Suppakun, N., Maneewarn, T. Coaching: accelerating reinforcement learning through human-assisted approach. Prog Artif Intell 9, 155–169 (2020). https://doi.org/10.1007/s13748-020-00204-4

