Abstract
This paper considers the problem of extending Training an Agent Manually via Evaluative Reinforcement (TAMER) to continuous state and action spaces. The TAMER framework enables a non-technical human to train an agent through a natural form of feedback (positive or negative). TAMER's advantages have been demonstrated on tasks in which agents are trained either by human feedback alone or by human feedback combined with environment rewards. However, these methods were originally designed for problems with discrete states and actions, or with continuous states and discrete actions. This paper proposes an extension of TAMER, called ACTAMER, that allows both continuous states and actions. The new framework can use any general function approximator to model a human trainer's feedback signal. Moreover, the combination of ACTAMER with reinforcement learning is also investigated and evaluated, in both a sequential and a simultaneous setting. Our experimental results demonstrate that the proposed method successfully allows a human to train an agent in two continuous state-action domains: Mountain Car and Cart-pole (balancing).
Notes
With respect to the actor’s parameters.
Acknowledgements
This work was supported by the Collaborative Center of Applied Research on Service Robotics (ZAFH Servicerobotik, http://www.zafh-servicerobotik.de) and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2010-0012609).
Cite this article
Vien, N.A., Ertel, W. & Chung, T.C. Learning via human feedback in continuous state and action spaces. Appl Intell 39, 267–278 (2013). https://doi.org/10.1007/s10489-012-0412-6