Abstract
In recent years, reinforcement learning has been widely applied to robotic tasks. However, most of these tasks involve more than one objective. In these cases, the construction of a reward function is a key and difficult issue. A typical solution is to combine the multiple objectives into a single-objective reward function; quite often, however, this formulation is far from intuitive, and the learning process may converge to a behaviour far from the one we need. Another way to tackle these multi-objective tasks is transfer learning, in which the experience gained while learning one objective is reused to learn a new one. Nevertheless, the transfer affects only the learned policy, leaving out other acquired information that might be relevant. In this paper, we propose a different approach to learning problems with more than one objective. In particular, we describe a two-stage approach. During the first stage, our algorithm learns a policy compatible with a main goal while gathering information that is relevant for a subsequent search process. Once this is done, a second stage starts: a cyclical process of small perturbations and stabilizations that searches for a new policy which remains valid for the main goal but also optimizes a sub-objective, while avoiding any degradation of the system's performance. We have applied our proposal to the learning of biped walking, and we have tested it on a humanoid robot, both in simulation and on the real robot.
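To make the two-stage scheme concrete, the following is a minimal, runnable Python sketch on a toy quadratic problem. The objective functions, the Gaussian perturbations and the random-search "stabilization" are our own illustrative assumptions (the information-gathering part of stage one is omitted here); this is not the algorithm evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def main_objective(theta):
    # Stand-in for the main goal (e.g. walking without falling).
    return -np.sum((theta - 1.0) ** 2)

def sub_objective(theta):
    # Stand-in for a secondary goal (e.g. walking speed).
    return -np.sum(theta ** 2)

def stabilize(theta, steps=50, sigma=0.02):
    # Local random search that only restores the main objective.
    for _ in range(steps):
        candidate = theta + rng.normal(0.0, sigma, theta.shape)
        if main_objective(candidate) > main_objective(theta):
            theta = candidate
    return theta

# Stage 1: learn a policy compatible with the main goal.
theta = stabilize(rng.normal(0.0, 1.0, 4), steps=500)
baseline = main_objective(theta)

# Stage 2: cycles of small perturbations followed by stabilization; a
# candidate is accepted only if the main objective is (almost) preserved
# and the sub-objective improves.
for _ in range(200):
    candidate = stabilize(theta + rng.normal(0.0, 0.1, theta.shape))
    if (main_objective(candidate) >= baseline - 0.05
            and sub_objective(candidate) > sub_objective(theta)):
        theta = candidate

print("final parameters:", theta)
```

Under these assumptions, the acceptance test plays the role described in the abstract: the perturbed policy may only replace the current one if it stays within a tolerance of the main-goal performance while improving the sub-objective.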
Notes
This parameter setting is the best one found after extensive testing with varying values of \(\alpha _{1}\) and \(\alpha _{2}\).
We have relied on the Mann–Whitney test at \(p \le 0.05\) and obtained \(p = 0.022\).
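For reference, the significance test mentioned in the second note can be run with SciPy as in the sketch below; the two samples are hypothetical placeholders, not the experimental data from the paper.

```python
from scipy.stats import mannwhitneyu

# Made-up illustrative samples, e.g. returns of two controllers.
returns_a = [0.81, 0.79, 0.85, 0.88, 0.76, 0.83]
returns_b = [0.70, 0.74, 0.69, 0.77, 0.72, 0.68]

stat, p = mannwhitneyu(returns_a, returns_b, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # the difference is significant if p <= 0.05
```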
Acknowledgments
This work was supported by the research grant TIN2012-32262 (FEDER) and by the Galician Government (Xunta de Galicia) under the Consolidation Program of Competitive Reference Groups (GRC2014/030).
Cite this article
García, J., Iglesias, R., Rodríguez, M.A. et al. Incremental reinforcement learning for multi-objective robotic tasks. Knowl Inf Syst 51, 911–940 (2017). https://doi.org/10.1007/s10115-016-0992-2