
Policy Learning with Human Reinforcement

Published in the International Journal of Fuzzy Systems

Abstract

A reinforcement learning agent learns an optimal policy for a given environment from an evaluative reward function. In some applications, such an agent lacks the adaptability to handle scenarios that differ only slightly from the one encountered during learning; that is, the agent may be expected to satisfy minor demands that are not captured by the reward function. This paper proposes an interactive approach that accommodates human reinforcement and environmental rewards in a shaped reward function, so that a robot can be coached despite goal modifications or an inaccurate reward. The approach coaches a robot, already equipped with a reinforcement learning mechanism, through human reinforcement feedback, overcoming insufficiencies or shortsightedness in the environmental reward function. The proposed algorithm links direct policy evaluation and human reinforcement in autonomous robots, shaping the reward function by combining both reinforcement signals. Relative information entropy is applied to resolve conflicts between the human reinforcement and the robot's core policy, yielding more effective learning. In this work, the human coaching is conveyed by a bystander's facial expressions, which are transformed into a scalar index by a type-2 fuzzy system. Simulated and experimental results show that a shortsighted robot could walk successfully through a swamp, and an underpowered car could reach the top of a mountain, with coaching from a bystander. The learning system worked quickly enough that the robot could continually adapt to an altered goal or environment.
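The shaped-reward idea described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's implementation: a Q-learning agent on a short chain world receives a "shortsighted" environmental reward (only the goal state pays off) plus a scalar human reinforcement signal standing in for the paper's type-2 fuzzy mapping of facial expressions, and the human signal is down-weighted by the relative entropy (KL divergence) between the coach's implied action preference and the agent's current policy. The combination rule, the constants, and the `HUMAN_PREF` distribution are all assumptions made for the sketch.

```python
import math
import random

random.seed(0)

N_STATES = 6          # chain world: states 0..5, goal at state 5
ACTIONS = [-1, +1]    # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def env_reward(s_next):
    # "shortsighted" environmental reward: only reaching the goal pays off
    return 1.0 if s_next == N_STATES - 1 else 0.0

def human_reinforcement(action_idx):
    # scalar coaching signal (stand-in for the facial-expression-to-scalar
    # fuzzy mapping): the bystander approves of moving right
    return 1.0 if ACTIONS[action_idx] == 1 else -1.0

def policy(s, tau=0.5):
    # Boltzmann distribution over actions: the robot's current policy
    exps = [math.exp(q / tau) for q in Q[s]]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    # relative entropy D(p || q): conflict between coach and current policy
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

HUMAN_PREF = [0.1, 0.9]   # assumed action distribution implied by the coaching

for episode in range(200):
    s = 0
    for _ in range(50):
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: Q[s][i])
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        # down-weight human feedback when it conflicts strongly with the policy
        w = 1.0 / (1.0 + kl(HUMAN_PREF, policy(s)))
        r = env_reward(s_next) + w * human_reinforcement(a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next
        if s == N_STATES - 1:
            break

greedy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(greedy)
```

With the coaching signal shaping the otherwise sparse environmental reward, the learned greedy policy should move right in every non-goal state; without the human term, the agent would have to stumble on the goal by exploration alone before any learning signal appeared.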


Figures 1–18 (not reproduced here; captions unavailable).




Author information


Corresponding author

Correspondence to Jin-Ling Lin.


About this article


Cite this article

Hwang, KS., Lin, JL., Shi, H. et al. Policy Learning with Human Reinforcement. Int. J. Fuzzy Syst. 18, 618–629 (2016). https://doi.org/10.1007/s40815-016-0194-9

