
Coaching: accelerating reinforcement learning through human-assisted approach

  • Regular Paper
  • Published in Progress in Artificial Intelligence

Abstract

The learning process in reinforcement learning is time-consuming because, in early episodes, the agent relies heavily on exploration. The proposed “coaching” approach aims to accelerate learning in systems with sparse environmental rewards, and it works well with linear \(\epsilon \)-greedy Q-learning with eligibility traces. To coach an agent, a human coach provides an intermediate target as a sub-goal for the agent to pursue; this sub-goal gives an additional clue that guides the agent toward the actual terminal state. During the coaching phase, the agent pursues the intermediate target with an aggressive policy, but the reward associated with the intermediate target is never used to update the state-action values; only the environmental reward is. After a small number of coaching episodes, learning proceeds normally with an \(\epsilon \)-greedy policy. In this way, the agent ends up with an optimal policy that is not under the influence or supervision of the human coach. The proposed method was tested on three tasks: mountain car, ball following, and obstacle avoidance. The experimental results show that, even with human coaches of various skill levels, the method speeds up the agent's learning in all three tasks.
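To make the mechanism concrete, the sketch below shows one way the coaching scheme described above could be realized. It is a minimal Python illustration under our own assumptions, not the paper's implementation: a simplified linear Q-learner with accumulating eligibility traces, a hypothetical environment interface (env.reset, env.step, env.predict_next), a hypothetical feature map, and a placeholder sub_goal_score heuristic standing in for however the coach's sub-goal is translated into an aggressive action choice. The essential point from the abstract is preserved: the coach's sub-goal steers action selection during early episodes only, while the TD target always uses the environmental reward alone.

    import numpy as np

    def coached_q_learning(env, features, n_actions, sub_goal,
                           coaching_episodes=10, episodes=500,
                           alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
        n_features = features(env.reset(), 0).size
        w = np.zeros(n_features)                      # linear Q(s, a) = w . phi(s, a)

        def q(s, a):
            return w @ features(s, a)

        def sub_goal_score(s, a):
            # Hypothetical heuristic: prefer the action whose predicted next
            # state is closest to the coach's sub-goal (the paper's actual
            # criterion may differ).
            return -np.linalg.norm(env.predict_next(s, a) - sub_goal)

        for ep in range(episodes):
            s, done = env.reset(), False
            e = np.zeros(n_features)                  # eligibility trace
            while not done:
                if ep < coaching_episodes:
                    # Coaching phase: act aggressively toward the sub-goal.
                    a = max(range(n_actions), key=lambda a_: sub_goal_score(s, a_))
                elif np.random.rand() < epsilon:
                    a = np.random.randint(n_actions)  # normal exploration afterwards
                else:
                    a = max(range(n_actions), key=lambda a_: q(s, a_))
                s2, r_env, done = env.step(a)
                # The TD target uses ONLY the environmental reward r_env; the
                # sub-goal never enters the update, so the learned values are
                # not biased by the coach.
                target = r_env if done else r_env + gamma * max(
                    q(s2, a_) for a_ in range(n_actions))
                delta = target - q(s, a)
                e = gamma * lam * e + features(s, a)  # accumulating trace
                w += alpha * delta * e
                s = s2
        return w

Because the sub-goal affects only which actions get tried, not the value targets, the value function that emerges is still the one induced by the environmental reward, which is why the final policy is free of the coach's influence.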





Author information

Correspondence to Nakarin Suppakun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Suppakun, N., Maneewarn, T. Coaching: accelerating reinforcement learning through human-assisted approach. Prog Artif Intell 9, 155–169 (2020). https://doi.org/10.1007/s13748-020-00204-4

