DOI: 10.1145/1015330.1015430
Article

Apprenticeship learning via inverse reinforcement learning

Published: 04 July 2004

Abstract

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where performance is measured with respect to the expert's unknown reward function.
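
The abstract only outlines the approach, so the following is a minimal, hypothetical sketch of how the idea can be instantiated for a small finite MDP. It assumes the reward is linear in known features, R(s) = w·φ(s), so that a policy's value is determined by its discounted feature expectations, and it alternates between choosing a reward weight that separates the expert from the policies found so far and solving the MDP for that reward. The toy interface (transition matrices P[a], feature matrix phi, expert feature expectations mu_expert) and the projection-style update are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical illustration (not taken from the paper): apprenticeship learning by
# matching feature expectations under a reward assumed linear in known features.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=500):
    """Greedy policy for a small finite MDP with transitions P[a][s, s'] and rewards R[s]."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = np.array([R + gamma * P[a] @ V for a in range(len(P))])  # shape (nA, nS)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)                      # deterministic policy, one action per state

def feature_expectations(P, phi, policy, gamma=0.95, horizon=200, start=0):
    """mu(pi) = E[ sum_t gamma^t * phi(s_t) ], starting from a fixed start state."""
    n_states = phi.shape[0]
    d = np.zeros(n_states); d[start] = 1.0       # current state distribution
    P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
    mu = np.zeros(phi.shape[1])
    for t in range(horizon):
        mu += (gamma ** t) * (d @ phi)
        d = d @ P_pi
    return mu

def apprenticeship_learning(P, phi, mu_expert, gamma=0.95, eps=1e-3, max_iter=50):
    """Iteratively propose reward weights and policies until the expert's feature
    expectations are approximately matched; returns the last policy and weights."""
    policy = np.zeros(phi.shape[0], dtype=int)   # arbitrary initial policy
    mu_bar = feature_expectations(P, phi, policy, gamma)
    for _ in range(max_iter):
        w = mu_expert - mu_bar                   # reward direction separating the expert
        if np.linalg.norm(w) <= eps:             # expert matched to within eps: stop
            break
        policy = value_iteration(P, phi @ w, gamma)          # optimal policy for R = phi @ w
        mu = feature_expectations(P, phi, policy, gamma)
        # projection-style update of mu_bar toward mu_expert along the segment to mu
        step = (mu - mu_bar) @ (mu_expert - mu_bar) / ((mu - mu_bar) @ (mu - mu_bar) + 1e-12)
        mu_bar = mu_bar + np.clip(step, 0.0, 1.0) * (mu - mu_bar)
    return policy, w
```

In practice, mu_expert would be estimated from the expert's demonstration trajectories, for example by averaging the discounted feature sums observed along each trajectory.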




    Published In

    ICML '04: Proceedings of the twenty-first international conference on Machine learning
    July 2004
    934 pages
    ISBN: 1581138385
    DOI: 10.1145/1015330
    • Conference Chair: Carla Brodley

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 July 2004

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%

    Contributors

    • Pieter Abbeel
    • Andrew Y. Ng


    Cited By

    • Multimodal and Force-Matched Imitation Learning With a See-Through Visuotactile Sensor. IEEE Transactions on Robotics, 41, 946-959 (2025). DOI: 10.1109/TRO.2024.3521864
    • Drone’s Objective Inference Using Policy Error Inverse Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 36(1), 1329-1340 (Jan 2025). DOI: 10.1109/TNNLS.2023.3333551
    • Uncovering Reward Goals in Distributed Drone Swarms Using Physics-Informed Multiagent Inverse Reinforcement Learning. IEEE Transactions on Cybernetics, 55(1), 14-23 (Jan 2025). DOI: 10.1109/TCYB.2024.3489967
    • Offline reward shaping with scaling human preference feedback for deep reinforcement learning. Neural Networks, 181 (Jan 2025). DOI: 10.1016/j.neunet.2024.106848
    • Learning from different perspectives for regret reduction in reinforcement learning: A free energy approach. Neurocomputing, 614, 128797 (Jan 2025). DOI: 10.1016/j.neucom.2024.128797
    • Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions. Computers & Industrial Engineering, 200, 110856 (Feb 2025). DOI: 10.1016/j.cie.2025.110856
    • Models of rational agency in human-centered AI: the realist and constructivist alternatives. AI and Ethics (Jan 2025). DOI: 10.1007/s43681-025-00658-z
    • One shot inverse reinforcement learning for stochastic linear bandits. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 1491-1512 (Jul 2024). DOI: 10.5555/3702676.3702746
    • Is inverse reinforcement learning harder than standard reinforcement learning? A theoretical perspective. Proceedings of the 41st International Conference on Machine Learning, 60957-61020 (Jul 2024). DOI: 10.5555/3692070.3694592
    • Learning reward for robot skills using large language models via self-alignment. Proceedings of the 41st International Conference on Machine Learning, 58366-58386 (Jul 2024). DOI: 10.5555/3692070.3694478
