DOI: 10.1145/1015330.1015430
Article

Apprenticeship learning via inverse reinforcement learning

Published: 04 July 2004

Abstract

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where performance is measured with respect to the expert's unknown reward function.
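
The abstract only outlines the approach, so the following is a minimal, hypothetical sketch of how the idea can be instantiated for a small finite MDP. It assumes the reward is linear in known features, R(s) = w·φ(s), so that a policy's value is determined by its discounted feature expectations, and it alternates between choosing a reward weight that separates the expert from the policies found so far and solving the MDP for that reward. The toy interface (transition matrices P[a], feature matrix phi, expert feature expectations mu_expert) and the projection-style update are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical illustration (not taken from the paper): apprenticeship learning by
# matching feature expectations under a reward assumed linear in known features.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=500):
    """Greedy policy for a small finite MDP with transitions P[a][s, s'] and rewards R[s]."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = np.array([R + gamma * P[a] @ V for a in range(len(P))])  # shape (nA, nS)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)                      # deterministic policy, one action per state

def feature_expectations(P, phi, policy, gamma=0.95, horizon=200, start=0):
    """mu(pi) = E[ sum_t gamma^t * phi(s_t) ], starting from a fixed start state."""
    n_states = phi.shape[0]
    d = np.zeros(n_states); d[start] = 1.0       # current state distribution
    P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
    mu = np.zeros(phi.shape[1])
    for t in range(horizon):
        mu += (gamma ** t) * (d @ phi)
        d = d @ P_pi
    return mu

def apprenticeship_learning(P, phi, mu_expert, gamma=0.95, eps=1e-3, max_iter=50):
    """Iteratively propose reward weights and policies until the expert's feature
    expectations are approximately matched; returns the last policy and weights."""
    policy = np.zeros(phi.shape[0], dtype=int)   # arbitrary initial policy
    mu_bar = feature_expectations(P, phi, policy, gamma)
    for _ in range(max_iter):
        w = mu_expert - mu_bar                   # reward direction separating the expert
        if np.linalg.norm(w) <= eps:             # expert matched to within eps: stop
            break
        policy = value_iteration(P, phi @ w, gamma)          # optimal policy for R = phi @ w
        mu = feature_expectations(P, phi, policy, gamma)
        # projection-style update of mu_bar toward mu_expert along the segment to mu
        step = (mu - mu_bar) @ (mu_expert - mu_bar) / ((mu - mu_bar) @ (mu - mu_bar) + 1e-12)
        mu_bar = mu_bar + np.clip(step, 0.0, 1.0) * (mu - mu_bar)
    return policy, w
```

In practice, mu_expert would be estimated from the expert's demonstration trajectories, for example by averaging the discounted feature sums observed along each trajectory.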




    Published In

    ICML '04: Proceedings of the twenty-first international conference on Machine learning
    July 2004
    934 pages
    ISBN: 1581138385
    DOI: 10.1145/1015330
    • Conference Chair: Carla Brodley

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 July 2004

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%

    Contributors

    • Pieter Abbeel
    • Andrew Y. Ng


    Cited By

    • Multimodal and Force-Matched Imitation Learning With a See-Through Visuotactile Sensor. IEEE Transactions on Robotics, 41, 946-959 (2025). DOI: 10.1109/TRO.2024.3521864
    • Drone’s Objective Inference Using Policy Error Inverse Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 36(1), 1329-1340 (Jan 2025). DOI: 10.1109/TNNLS.2023.3333551
    • Uncovering Reward Goals in Distributed Drone Swarms Using Physics-Informed Multiagent Inverse Reinforcement Learning. IEEE Transactions on Cybernetics, 55(1), 14-23 (Jan 2025). DOI: 10.1109/TCYB.2024.3489967
    • Offline reward shaping with scaling human preference feedback for deep reinforcement learning. Neural Networks, 181 (Jan 2025). DOI: 10.1016/j.neunet.2024.106848
    • Learning from different perspectives for regret reduction in reinforcement learning: A free energy approach. Neurocomputing, 614, 128797 (Jan 2025). DOI: 10.1016/j.neucom.2024.128797
    • Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions. Computers & Industrial Engineering, 200, 110856 (Feb 2025). DOI: 10.1016/j.cie.2025.110856
    • Models of rational agency in human-centered AI: the realist and constructivist alternatives. AI and Ethics (Jan 2025). DOI: 10.1007/s43681-025-00658-z
    • One shot inverse reinforcement learning for stochastic linear bandits. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 1491-1512 (Jul 2024). DOI: 10.5555/3702676.3702746
    • Is inverse reinforcement learning harder than standard reinforcement learning? A theoretical perspective. Proceedings of the 41st International Conference on Machine Learning, 60957-61020 (Jul 2024). DOI: 10.5555/3692070.3694592
    • Learning reward for robot skills using large language models via self-alignment. Proceedings of the 41st International Conference on Machine Learning, 58366-58386 (Jul 2024). DOI: 10.5555/3692070.3694478
