
Active reward learning with a novel acquisition function

Published in Autonomous Robots

Abstract

Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework wherein the robot simultaneously learns an action policy and a model of the reward function by actively querying a human expert for ratings. We represent the reward model using a Gaussian process and evaluate several classical acquisition functions (AFs) from the Bayesian optimization literature in this context. Furthermore, we present a novel AF, expected policy divergence. We demonstrate results of our method for a robot grasping task and show that the learned reward function generalizes to a similar task. Additionally, we evaluate the proposed novel AF on a real robot pendulum swing-up task.
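
The following is a minimal sketch, not the authors' implementation, of the kind of loop the abstract describes: a Gaussian-process reward model fit to expert ratings of outcome features, with a classical acquisition function (upper confidence bound is used here as a stand-in) selecting which new outcome to query the expert about. All names (rbf_kernel, gp_posterior, ucb) and the toy data are illustrative assumptions.

```python
# Sketch of GP-based active reward learning: keep a GP over outcome
# features, and query the human expert only on the outcome that an
# acquisition function deems most informative.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of outcome features."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_rated, y_ratings, X_candidates, noise_var=0.1):
    """GP posterior mean and std of the reward at candidate outcomes."""
    K = rbf_kernel(X_rated, X_rated) + noise_var * np.eye(len(X_rated))
    K_s = rbf_kernel(X_rated, X_candidates)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y_ratings
    cov = rbf_kernel(X_candidates, X_candidates) - K_s.T @ K_inv @ K_s
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def ucb(mean, std, beta=2.0):
    """A classical acquisition function (upper confidence bound)."""
    return mean + beta * std

# Active-query step: ask for a rating only on the most informative outcome.
rng = np.random.default_rng(0)
X_rated = rng.normal(size=(3, 2))          # outcome features already rated
y_ratings = rng.normal(size=3)             # expert ratings for them
X_candidates = rng.normal(size=(20, 2))    # outcomes from new rollouts

mean, std = gp_posterior(X_rated, y_ratings, X_candidates)
query_idx = int(np.argmax(ucb(mean, std)))
print("ask the expert to rate outcome", query_idx)
```

In the paper's setting the acquisition function would instead be one of the evaluated classical AFs or the proposed expected policy divergence, and the GP mean would serve as the reward signal for the policy update between queries.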



Acknowledgments

The authors gratefully acknowledge the support of the European Union Projects #FP7-ICT-270327 (Complacs) and #FP7-ICT-2013-10 (3rd Hand).

Author information

Corresponding author

Correspondence to Christian Daniel.

Additional information

This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.


About this article


Cite this article

Daniel, C., Kroemer, O., Viering, M. et al. Active reward learning with a novel acquisition function. Auton Robot 39, 389–405 (2015). https://doi.org/10.1007/s10514-015-9454-z
