Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning

Abstract

For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as numeric reward to be maximized, and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, such that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) which take trainer strategy into account, and can therefore learn from cases where no feedback is provided. Through online user studies we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users, and of factors that can affect their choice of strategy.
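
To make the abstract's central idea concrete, the following is a minimal sketch (ours, not the authors' released code) of a Bayesian update in which the absence of feedback is itself evidence about the target behavior. The feedback model, the names feedback_likelihood and update_posterior, and the parameters mu_plus, mu_minus, and eps with their default values are illustrative assumptions in the spirit of the paper's trainer-strategy model, not its exact likelihood.

    # A minimal sketch (assumed model, not the authors' code) of treating "no feedback"
    # as informative evidence when inferring which action the trainer wants.
    # mu_plus: assumed probability the trainer stays silent instead of rewarding a correct action.
    # mu_minus: assumed probability the trainer stays silent instead of punishing an incorrect action.
    # eps: assumed rate of erroneous (flipped) explicit feedback.

    def feedback_likelihood(feedback, action_is_correct, mu_plus=0.3, mu_minus=0.7, eps=0.1):
        """P(feedback | whether the performed action matches the target behavior)."""
        if action_is_correct:
            p_reward = (1 - eps) * (1 - mu_plus)
            p_punish = eps * (1 - mu_minus)
        else:
            p_reward = eps * (1 - mu_plus)
            p_punish = (1 - eps) * (1 - mu_minus)
        p_silence = 1.0 - p_reward - p_punish  # remaining mass: no explicit feedback
        return {"reward": p_reward, "punish": p_punish, "none": p_silence}[feedback]

    def update_posterior(posterior, performed_action, feedback, **params):
        """One Bayesian update of P(target action) after observing the trainer's response."""
        unnorm = {target: p * feedback_likelihood(feedback, performed_action == target, **params)
                  for target, p in posterior.items()}
        z = sum(unnorm.values())
        return {target: p / z for target, p in unnorm.items()}

    # Example: three candidate commands; the agent performs "sit" and the trainer says nothing.
    belief = {"sit": 1 / 3, "stay": 1 / 3, "roll": 1 / 3}
    belief = update_posterior(belief, "sit", "none")
    print(belief)  # silence from a reward-leaning trainer lowers P(target == "sit")

Under this sketch, with a reward-leaning strategy (mu_plus < mu_minus) silence after an action shifts belief away from that action being correct, and with a punishment-leaning strategy the effect reverses; this is the sense in which the lack of feedback conveys information about the behavior.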

Notes

  1. Note that for the \(\mu \) parameters, \(+\) and \(-\) distinguish reward and punishment, and not explicit/implicit feedback as in the R\(+\)/P\(+\) notation taken from the behaviorism literature; an illustrative likelihood using these parameters is sketched after these notes.

  2. Though participants were instructed to enable their computer speakers, we have no way of knowing whether they could actually hear the dog cry.

  3. We exclude more data in the Mechanical Turk studies in order to remove participants who did only the minimum amount of work needed to receive their compensation.
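
To illustrate the notation in note 1, the display below is an assumed parameterization (a sketch consistent with the note, not necessarily the paper's exact model) showing where \(\mu ^+\), \(\mu ^-\), and an assumed trainer error rate \(\epsilon \) would enter a feedback likelihood when the performed action is correct:

    % Illustrative only: \epsilon is an assumed error rate; \mu^{+} (\mu^{-}) is the
    % assumed probability of withholding explicit reward (punishment) when it would apply.
    \begin{align*}
      p(\text{reward} \mid \text{correct})      &= (1-\epsilon)\,(1-\mu^{+}) \\
      p(\text{punishment} \mid \text{correct})  &= \epsilon\,(1-\mu^{-}) \\
      p(\text{no feedback} \mid \text{correct}) &= (1-\epsilon)\,\mu^{+} + \epsilon\,\mu^{-}
    \end{align*}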


Acknowledgments

This work was supported in part by Grants IIS-1149917 and IIS-1319412 from the National Science Foundation.

Author information

Corresponding author

Correspondence to Robert Loftin.

About this article

Cite this article

Loftin, R., Peng, B., MacGlashan, J. et al. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Auton Agent Multi-Agent Syst 30, 30–59 (2016). https://doi.org/10.1007/s10458-015-9283-7
