Reinforcement learning has recently been widely applied to robotic tasks. However, most of these tasks involve more than one objective, which makes the construction of a reward function a key and difficult issue. A typical solution is to combine the multiple objectives into a single-objective reward function. Quite often, however, this formulation is far from intuitive, and the learning process might converge to a behaviour far from the one we need. An alternative way to face these multi-objective tasks is transfer learning, where the idea is to reuse the experience gained while learning one objective to learn a new one. Nevertheless, the transfer affects only the learned policy, leaving out other gained information that might be relevant. In this paper, we propose a different approach to learning problems with more than one objective. In particular, we describe a two-stage approach. During the first stage, our algorithm learns a policy compatible with a main goal while it gathers relevant information for a subsequent search process. Once this is done, a second stage starts, which consists of a cyclical process of small perturbations and stabilizations. This stage tries to avoid degrading the performance of the system while it searches for a new policy that is still valid but also optimizes a sub-objective. We have applied our proposal to the learning of biped walking and tested it on a humanoid robot, both in simulation and on a real robot.

This parameter setting is the best one found after extensive testing with varying values of \(\alpha _{1}\) and \(\alpha _{2}\).
We relied on the Mann–Whitney test at a significance level of \(p \le 0.05\) and obtained \(p = 0.022\).
This work was supported by the research grant TIN2012-32262 (FEDER), and by the Galician Government (Xunta de Galicia) under the Consolidation Program of Competitive Reference Groups (GRC2014/030).
García, J., Iglesias, R., Rodríguez, M.A. et al. Incremental reinforcement learning for multi-objective robotic tasks. Knowl Inf Syst 51, 911–940 (2017). https://doi.org/10.1007/s10115-016-0992-2
