Inverse Reinforcement Learning

Reference work entry, Encyclopedia of Machine Learning

Synonyms

Intent recognition; Inverse optimal control; Plan recognition

Definition

Inverse reinforcement learning (inverse RL) considers the problem of extracting a reward function from observed (nearly) optimal behavior of an expert acting in an environment.
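
One standard way to make the problem precise, following the linear-feature formulation of Abbeel and Ng (2004) listed in the Recommended Reading below, is sketched here; the feature map $\phi$, weight vector $w$, and feature expectations $\mu$ are notation introduced for this sketch rather than taken from the entry itself. The unknown reward is assumed to be linear in known state features,

\[
R(s) = w^{\top}\phi(s), \qquad \|w\|_{1} \le 1,
\]

and each policy $\pi$ is summarized by its expected discounted feature counts,

\[
\mu(\pi) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t) \,\middle|\, \pi\right].
\]

Inverse RL then seeks a weight vector $w$ (equivalently, a reward function) under which the expert's observed behavior is (near-)optimal:

\[
w^{\top}\mu(\pi_E) \;\ge\; w^{\top}\mu(\pi) \quad \text{for all policies } \pi,
\]

where $\pi_E$ denotes the expert's policy and $\mu(\pi_E)$ is estimated from the demonstrations.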

Motivation and Background

The motivation for inverse RL is twofold:

  1.

    For many RL applications, it is difficult to write down an explicit reward function specifying how different desiderata should be traded off exactly. In fact, engineers often spend significant effort tweaking the reward function such that the optimal policy corresponds to performing the task they have in mind. For example, consider the task of driving a car well. Various desiderata have to be traded off, such as speed, following distance, lane preference, frequency of lane changes, distance from the curb, and so on. Specifying the reward function for the task of driving requires explicitly writing down the trade-off between these features.

    Inverse RL algorithms...
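
    As an illustration of the kind of feature trade-off in the driving example above, the following minimal sketch (Python with NumPy) encodes a driving-style reward as a linear combination of hand-chosen features and compares the empirical discounted feature expectations of demonstrated trajectories with those of a candidate policy. The feature names, weights, and data are hypothetical stand-ins introduced for this sketch, not taken from the entry.

        import numpy as np

        GAMMA = 0.95  # discount factor

        # Hypothetical per-step driving features:
        # [speed, following_distance, lane_changes, distance_from_curb]

        def discounted_feature_expectations(trajectories):
            """Empirical estimate of mu = E[sum_t gamma^t * phi(s_t)]."""
            mus = []
            for traj in trajectories:  # each traj is an array of shape (T, 4)
                discounts = GAMMA ** np.arange(len(traj))
                mus.append((discounts[:, None] * traj).sum(axis=0))
            return np.mean(mus, axis=0)

        def linear_reward(phi, w):
            """Reward as an explicit trade-off between features: R(s) = w . phi(s)."""
            return float(np.dot(w, phi))

        # Toy stand-ins for expert demonstrations and a candidate policy's rollouts.
        rng = np.random.default_rng(0)
        expert_trajs = [rng.normal(size=(50, 4)) + np.array([1.0, 2.0, -0.5, 0.0])
                        for _ in range(20)]
        candidate_trajs = [rng.normal(size=(50, 4)) for _ in range(20)]

        mu_expert = discounted_feature_expectations(expert_trajs)
        mu_candidate = discounted_feature_expectations(candidate_trajs)

        # Inverse RL searches for weights under which the expert's behavior scores
        # at least as well as any alternative; here a single guess is evaluated.
        w = np.array([0.4, 0.4, -0.1, 0.1])  # hypothetical trade-off weights
        print("example step reward:", linear_reward(expert_trajs[0][0], w))
        print("expert value       :", np.dot(w, mu_expert))
        print("candidate value    :", np.dot(w, mu_candidate))

    A full inverse RL algorithm would search over the weights (e.g., by matching feature expectations or maximizing a margin) rather than fixing them by hand.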

Recommended Reading

  • Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of ICML, Banff, Alberta, Canada.

  • Doya, K., & Sejnowski, T. (1995). A novel reinforcement model of birdsong vocalization learning. In Neural Information Processing Systems 7. Cambridge, MA: MIT Press.

  • Montague, P. R., Dayan, P., Person, C., & Sejnowski, T. J. (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377(6551), 725–728.

  • Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In NIPS 1. San Francisco, CA: Morgan Kaufmann.

  • Ratliff, N., Bagnell, J., & Zinkevich, M. (2006). Maximum margin planning. In Proceedings of ICML, Pittsburgh, Pennsylvania.

  • Ratliff, N., Bradley, D., Bagnell, J., & Chestnutt, J. (2007). Boosting structured prediction for imitation learning. In Neural Information Processing Systems 19. Cambridge, MA: MIT Press.

  • Sammut, C., Hurst, S., Kedzier, D., & Michie, D. (1992). Learning to fly. In Proceedings of ICML, Aberdeen, Scotland, UK.

  • Schmajuk, N. A., & Zanutto, B. S. (1997). Escape, avoidance, and imitation. Adaptive Behavior, 6, 63–129.

  • Taskar, B., Guestrin, C., & Koller, D. (2003). Max-margin Markov networks. In Neural Information Processing Systems Conference (NIPS03), Vancouver, Canada.

  • Touretzky, D. S., & Saksida, L. M. (1997). Operant conditioning in Skinnerbots. Adaptive Behavior, 5, 219–247.

  • Watkins, C. J. (1989). Models of delayed reinforcement learning. PhD thesis, Psychology Department, Cambridge University.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Abbeel, P., Ng, A.Y. (2011). Inverse Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_417
