Abstract
In recent years, reinforcement learning has been widely applied to robotic tasks. However, most of these tasks involve more than one objective. In these cases, the construction of a reward function is a key and difficult issue. A typical solution is to combine the multiple objectives into a single-objective reward function; quite often, however, this formulation is far from intuitive, and the learning process may converge to a behaviour far from the one we need. Another way to tackle these multi-objective tasks is transfer learning, in which the experience gained while learning one objective is reused to learn a new one. Nevertheless, the transfer affects only the learned policy, leaving out other acquired information that might be relevant. In this paper, we propose a different approach to learning problems with more than one objective. In particular, we describe a two-stage approach. During the first stage, our algorithm learns a policy compatible with a main goal while gathering information that is relevant for a subsequent search process. Once this is done, a second stage starts: a cyclical process of small perturbations and stabilizations that searches for a new policy which remains valid for the main goal but also optimizes a sub-objective, while avoiding any degradation of the system's performance. We have applied our proposal to the learning of biped walking, and we have tested it on a humanoid robot, both in simulation and on the real robot.
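To make the two-stage scheme concrete, the following is a minimal, runnable Python sketch on a toy quadratic problem. The objective functions, the Gaussian perturbations and the random-search "stabilization" are our own illustrative assumptions (the information-gathering part of stage one is omitted here); this is not the algorithm evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def main_objective(theta):
    # Stand-in for the main goal (e.g. walking without falling).
    return -np.sum((theta - 1.0) ** 2)

def sub_objective(theta):
    # Stand-in for a secondary goal (e.g. walking speed).
    return -np.sum(theta ** 2)

def stabilize(theta, steps=50, sigma=0.02):
    # Local random search that only restores the main objective.
    for _ in range(steps):
        candidate = theta + rng.normal(0.0, sigma, theta.shape)
        if main_objective(candidate) > main_objective(theta):
            theta = candidate
    return theta

# Stage 1: learn a policy compatible with the main goal.
theta = stabilize(rng.normal(0.0, 1.0, 4), steps=500)
baseline = main_objective(theta)

# Stage 2: cycles of small perturbations followed by stabilization; a
# candidate is accepted only if the main objective is (almost) preserved
# and the sub-objective improves.
for _ in range(200):
    candidate = stabilize(theta + rng.normal(0.0, 0.1, theta.shape))
    if (main_objective(candidate) >= baseline - 0.05
            and sub_objective(candidate) > sub_objective(theta)):
        theta = candidate

print("final parameters:", theta)
```

Under these assumptions, the acceptance test plays the role described in the abstract: the perturbed policy may only replace the current one if it stays within a tolerance of the main-goal performance while improving the sub-objective.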
Notes
This parameter setting is the best one found after extensive testing with varying values of \(\alpha _{1}\) and \(\alpha _{2}\).
We have relied on the Mann–Whitney test at \(p \le 0.05\) and obtained \(p = 0.022\).
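For reference, the significance test mentioned in the second note can be run with SciPy as in the sketch below; the two samples are hypothetical placeholders, not the experimental data from the paper.

```python
from scipy.stats import mannwhitneyu

# Made-up illustrative samples, e.g. returns of two controllers.
returns_a = [0.81, 0.79, 0.85, 0.88, 0.76, 0.83]
returns_b = [0.70, 0.74, 0.69, 0.77, 0.72, 0.68]

stat, p = mannwhitneyu(returns_a, returns_b, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # the difference is significant if p <= 0.05
```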
Acknowledgments
This work was supported by the research grant TIN2012-32262 (FEDER) and by the Galician Government (Xunta de Galicia) under the Consolidation Program of Competitive Reference Groups (GRC2014/030).
Cite this article
García, J., Iglesias, R., Rodríguez, M.A. et al. Incremental reinforcement learning for multi-objective robotic tasks. Knowl Inf Syst 51, 911–940 (2017). https://doi.org/10.1007/s10115-016-0992-2