
Maximum margin planning

Published: 25 June 2006

Abstract

Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant in problems where A* and dynamic programming approaches make learning policies tractable at scales beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.
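The approach described above admits a compact illustration. The following is a minimal, hypothetical Python sketch of an MMP-style subgradient step on a 4-connected grid: per-cell features map to a cost map through a weight vector, loss-augmented shortest-path inference (plain Dijkstra here, standing in for the A*/dynamic-programming planners the abstract mentions) finds the path the current costs favor, and the update shifts the weights so the expert's path becomes the cheapest by a margin. The grid setup, the 0/1 per-cell loss, the positivity clamps, and all step sizes are illustrative assumptions, not details from the paper.

# A minimal sketch, assuming a 4-connected grid world with K features per cell.
import heapq
import numpy as np

def dijkstra(cost, start, goal):
    """Shortest path under a per-cell cost map: the fast inference subroutine."""
    H, W = cost.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[start] = cost[start]
    pq = [(float(cost[start]), start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(pq, (float(dist[nr, nc]), (nr, nc)))
    path, node = [goal], goal  # walk predecessors back to the start
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

def feature_counts(path, features):
    """Sum of the feature vectors of the cells a path visits."""
    return sum(features[r, c] for r, c in path)

def mmp_step(w, features, expert_path, start, goal, lam=0.01, eta=0.1):
    """One subgradient step on  w.F(expert) - min_mu [w.F(mu) - L(mu)] + (lam/2)|w|^2."""
    H, W, _ = features.shape
    cost = np.maximum(features @ w, 1e-6)  # clamp: planners need positive costs
    loss = np.ones((H, W))                 # hypothetical 0/1 loss: 1 off the
    for r, c in expert_path:               # expert's path, 0 on it
        loss[r, c] = 0.0
    # Loss-augmented inference: subtracting the loss makes deviating paths look
    # cheaper, so the margin is enforced where it is hardest; the second clamp
    # keeps augmented costs valid for Dijkstra (a sketch-level shortcut).
    mu_star = dijkstra(np.maximum(cost - loss, 1e-6), start, goal)
    g = feature_counts(expert_path, features) - feature_counts(mu_star, features)
    return w - eta * (g + lam * w)

A usage sketch, with a hypothetical 10x10 grid and an expert path running along two edges of it:

rng = np.random.default_rng(0)
features = rng.random((10, 10, 5))   # 5 random features per cell, for illustration
expert = [(r, 0) for r in range(10)] + [(9, c) for c in range(1, 10)]
w = np.ones(5)
for t in range(100):
    w = mmp_step(w, features, expert, (0, 0), (9, 9), eta=1.0 / (1.0 + t))

With a decaying step size, the subgradient iteration converges on this convex objective, and the same loop carries over unchanged if Dijkstra is swapped for A* or another exact planner, which is the scalability property the abstract emphasizes.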



Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN:1595933832
DOI:10.1145/1143844

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2006
DOI: 10.1145/1143844.1143936


Acceptance Rates

ICML '06 Paper Acceptance Rate: 140 of 548 submissions, 26%
Overall Acceptance Rate: 140 of 548 submissions, 26%


Cited By

  • (2024) Walking the values in Bayesian inverse reinforcement learning. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 10.5555/3702676.3702688, pp. 273-287. Online publication date: 15-Jul-2024
  • (2024) Is inverse reinforcement learning harder than standard reinforcement learning? A theoretical perspective. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694592, pp. 60957-61020. Online publication date: 21-Jul-2024
  • (2024) OLLIE. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694462, pp. 57966-58018. Online publication date: 21-Jul-2024
  • (2024) Offline inverse RL. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3693114, pp. 26085-26151. Online publication date: 21-Jul-2024
  • (2024) A unified linear programming framework for offline reward learning from human demonstrations and feedback. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3693059, pp. 24694-24712. Online publication date: 21-Jul-2024
  • (2024) Data-Driven Policy Learning Methods from Biological Behavior: A Systematic Review. Applied Sciences, 10.3390/app14104038, 14(10), 4038. Online publication date: 9-May-2024
  • (2024) Imitating with Sequential Masks: Alleviating Causal Confusion in Autonomous Driving. Journal of Advanced Computational Intelligence and Intelligent Informatics, 10.20965/jaciii.2024.p0882, 28(4), pp. 882-892. Online publication date: 20-Jul-2024
  • (2024) Tube Acceleration: Robust Dexterous Throwing Against Release Uncertainty. IEEE Transactions on Robotics, 10.1109/TRO.2024.3386391, 40, pp. 2831-2849. Online publication date: 2024
  • (2024) Human-in-the-Loop Behavior Modeling via an Integral Concurrent Adaptive Inverse Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2023.3259581, 35(8), pp. 11359-11370. Online publication date: Aug-2024
  • (2024) Deep Reinforcement Learning: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2022.3207346, pp. 1-15. Online publication date: 2024
