Abstract
A reinforcement learning agent learns an optimal policy for a given environment through an evaluative reward function. In some applications, however, such an agent may lack the adaptability to handle scenarios that differ even slightly from those encountered during learning; that is, the agent may be expected to satisfy additional demands that are not encoded in the reward function. This paper proposes an interactive approach that accommodates human reinforcement and environmental rewards in a shaped reward function, so that a robot can be coached despite a modified goal or an inaccurate reward. The approach coaches a robot already equipped with a reinforcement learning mechanism, using human reinforcement feedback to overcome insufficiencies or shortsightedness in the environmental reward function. The proposed algorithm links direct policy evaluation and human reinforcement in autonomous robots, shaping the reward function by combining the two reinforcement signals. Relative information entropy is applied to resolve the conflict between the human reinforcement and the robot's core policy, yielding more effective learning. In this work, the human coaching is conveyed by a bystander's facial expressions, which are transformed into a scalar index by a type-2 fuzzy system. Simulated and experimental results show that a short-sighted robot could walk successfully through a swamp, and that an under-powered car could reach the top of a mountain with coaching from a bystander. The learning system was fast enough that the robot could continually adapt to an altered goal or environment.
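To make the combination concrete, the sketch below illustrates one way such a shaped reward could be computed: the bystander's scalar reinforcement is blended with the environmental reward, and the relative information entropy (Kullback-Leibler divergence) between the action distribution implied by the human feedback and the robot's current policy is used to attenuate conflicting human input. This is a minimal sketch under stated assumptions; the names (`shaped_reward`, `beta`) and the exponential weighting are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    action distributions, used here as a measure of conflict between
    the human's implied preference and the robot's core policy."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def shaped_reward(env_reward, human_signal, human_dist, policy_dist, beta=1.0):
    """Blend the environmental reward with the bystander's scalar
    reinforcement (e.g., the index derived from facial expressions).
    The human term is attenuated when it conflicts strongly with the
    current policy; the exponential weighting is an assumption made
    for illustration only."""
    conflict = relative_entropy(human_dist, policy_dist)
    weight = np.exp(-beta * conflict)  # high conflict -> trust human feedback less
    return env_reward + weight * human_signal

# Example: mildly conflicting human preference over three actions.
policy_dist = [0.7, 0.2, 0.1]   # robot's current action distribution
human_dist  = [0.5, 0.4, 0.1]   # distribution implied by human feedback
print(shaped_reward(env_reward=-1.0, human_signal=0.8,
                    human_dist=human_dist, policy_dist=policy_dist))
```

In the paper's tasks, `env_reward` would come from the swamp or mountain-car environment, and `human_signal` would be the scalar index produced by the type-2 fuzzy mapping of the bystander's facial expression.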
Cite this article
Hwang, KS., Lin, JL., Shi, H. et al. Policy Learning with Human Reinforcement. Int. J. Fuzzy Syst. 18, 618–629 (2016). https://doi.org/10.1007/s40815-016-0194-9