Abstract
A reinforcement learning agent learns an optimal policy for a given environment through an evaluative reward function. In some applications, however, such an agent may lack the adaptability to handle scenarios that differ even slightly from those encountered during learning; that is, the agent may be expected to satisfy additional demands that are not encoded in the reward function. This paper proposes an interactive approach that accommodates human reinforcement and environmental rewards in a shaped reward function, so that a robot can be coached despite a modified goal or an inaccurate reward. The approach coaches a robot already equipped with a reinforcement learning mechanism, using human reinforcement feedback to overcome insufficiencies or shortsightedness in the environmental reward function. The proposed algorithm links direct policy evaluation and human reinforcement in autonomous robots, shaping the reward function by combining the two reinforcement signals. Relative information entropy is applied to resolve the conflict between the human reinforcement and the robot's core policy, yielding more effective learning. In this work, the human coaching is conveyed by a bystander's facial expressions, which are transformed into a scalar index by a type-2 fuzzy system. Simulated and experimental results show that a short-sighted robot could walk successfully through a swamp, and that an under-powered car could reach the top of a mountain with coaching from a bystander. The learning system was fast enough that the robot could continually adapt to an altered goal or environment.
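To make the combination concrete, the sketch below illustrates one way such a shaped reward could be computed: the bystander's scalar reinforcement is blended with the environmental reward, and the relative information entropy (Kullback-Leibler divergence) between the action distribution implied by the human feedback and the robot's current policy is used to attenuate conflicting human input. This is a minimal sketch under stated assumptions; the names (`shaped_reward`, `beta`) and the exponential weighting are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    action distributions, used here as a measure of conflict between
    the human's implied preference and the robot's core policy."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def shaped_reward(env_reward, human_signal, human_dist, policy_dist, beta=1.0):
    """Blend the environmental reward with the bystander's scalar
    reinforcement (e.g., the index derived from facial expressions).
    The human term is attenuated when it conflicts strongly with the
    current policy; the exponential weighting is an assumption made
    for illustration only."""
    conflict = relative_entropy(human_dist, policy_dist)
    weight = np.exp(-beta * conflict)  # high conflict -> trust human feedback less
    return env_reward + weight * human_signal

# Example: mildly conflicting human preference over three actions.
policy_dist = [0.7, 0.2, 0.1]   # robot's current action distribution
human_dist  = [0.5, 0.4, 0.1]   # distribution implied by human feedback
print(shaped_reward(env_reward=-1.0, human_signal=0.8,
                    human_dist=human_dist, policy_dist=policy_dist))
```

In the paper's tasks, `env_reward` would come from the swamp or mountain-car environment, and `human_signal` would be the scalar index produced by the type-2 fuzzy mapping of the bystander's facial expression.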
Cite this article
Hwang, KS., Lin, JL., Shi, H. et al. Policy Learning with Human Reinforcement. Int. J. Fuzzy Syst. 18, 618–629 (2016). https://doi.org/10.1007/s40815-016-0194-9