
Maximum margin planning

Published: 25 June 2006

Abstract

Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant in problems where A* and dynamic programming approaches make learning policies tractable at scales beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.
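The approach described above admits a compact illustration. The following is a minimal, hypothetical Python sketch of an MMP-style subgradient step on a 4-connected grid: per-cell features map to a cost map through a weight vector, loss-augmented shortest-path inference (plain Dijkstra here, standing in for the A*/dynamic-programming planners the abstract mentions) finds the path the current costs favor, and the update shifts the weights so the expert's path becomes the cheapest by a margin. The grid setup, the 0/1 per-cell loss, the positivity clamps, and all step sizes are illustrative assumptions, not details from the paper.

# A minimal sketch, assuming a 4-connected grid world with K features per cell.
import heapq
import numpy as np

def dijkstra(cost, start, goal):
    """Shortest path under a per-cell cost map: the fast inference subroutine."""
    H, W = cost.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[start] = cost[start]
    pq = [(float(cost[start]), start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(pq, (float(dist[nr, nc]), (nr, nc)))
    path, node = [goal], goal  # walk predecessors back to the start
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

def feature_counts(path, features):
    """Sum of the feature vectors of the cells a path visits."""
    return sum(features[r, c] for r, c in path)

def mmp_step(w, features, expert_path, start, goal, lam=0.01, eta=0.1):
    """One subgradient step on  w.F(expert) - min_mu [w.F(mu) - L(mu)] + (lam/2)|w|^2."""
    H, W, _ = features.shape
    cost = np.maximum(features @ w, 1e-6)  # clamp: planners need positive costs
    loss = np.ones((H, W))                 # hypothetical 0/1 loss: 1 off the
    for r, c in expert_path:               # expert's path, 0 on it
        loss[r, c] = 0.0
    # Loss-augmented inference: subtracting the loss makes deviating paths look
    # cheaper, so the margin is enforced where it is hardest; the second clamp
    # keeps augmented costs valid for Dijkstra (a sketch-level shortcut).
    mu_star = dijkstra(np.maximum(cost - loss, 1e-6), start, goal)
    g = feature_counts(expert_path, features) - feature_counts(mu_star, features)
    return w - eta * (g + lam * w)

A usage sketch, with a hypothetical 10x10 grid and an expert path running along two edges of it:

rng = np.random.default_rng(0)
features = rng.random((10, 10, 5))   # 5 random features per cell, for illustration
expert = [(r, 0) for r in range(10)] + [(9, c) for c in range(1, 10)]
w = np.ones(5)
for t in range(100):
    w = mmp_step(w, features, expert, (0, 0), (9, 9), eta=1.0 / (1.0 + t))

With a decaying step size, the subgradient iteration converges on this convex objective, and the same loop carries over unchanged if Dijkstra is swapped for A* or another exact planner, which is the scalability property the abstract emphasizes.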



Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN:1595933832
DOI:10.1145/1143844

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2006
DOI: 10.1145/1143844.1143936


Acceptance Rates

ICML '06 Paper Acceptance Rate: 140 of 548 submissions, 26%
Overall Acceptance Rate: 140 of 548 submissions, 26%


Cited By

  • (2024) Walking the values in Bayesian inverse reinforcement learning. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 10.5555/3702676.3702688, pp. 273-287. Online publication date: 15-Jul-2024
  • (2024) Is inverse reinforcement learning harder than standard reinforcement learning? A theoretical perspective. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694592, pp. 60957-61020. Online publication date: 21-Jul-2024
  • (2024) OLLIE. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694462, pp. 57966-58018. Online publication date: 21-Jul-2024
  • (2024) Offline inverse RL. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3693114, pp. 26085-26151. Online publication date: 21-Jul-2024
  • (2024) A unified linear programming framework for offline reward learning from human demonstrations and feedback. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3693059, pp. 24694-24712. Online publication date: 21-Jul-2024
  • (2024) Data-Driven Policy Learning Methods from Biological Behavior: A Systematic Review. Applied Sciences, 10.3390/app14104038, 14(10), 4038. Online publication date: 9-May-2024
  • (2024) Imitating with Sequential Masks: Alleviating Causal Confusion in Autonomous Driving. Journal of Advanced Computational Intelligence and Intelligent Informatics, 10.20965/jaciii.2024.p0882, 28(4), pp. 882-892. Online publication date: 20-Jul-2024
  • (2024) Tube Acceleration: Robust Dexterous Throwing Against Release Uncertainty. IEEE Transactions on Robotics, 10.1109/TRO.2024.3386391, 40, pp. 2831-2849. Online publication date: 2024
  • (2024) Human-in-the-Loop Behavior Modeling via an Integral Concurrent Adaptive Inverse Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2023.3259581, 35(8), pp. 11359-11370. Online publication date: Aug-2024
  • (2024) Deep Reinforcement Learning: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2022.3207346, pp. 1-15. Online publication date: 2024
