
Universal Notice Networks: Transferring Learned Skills Through a Broad Panel of Applications

  • Short Paper
  • Published:
Journal of Intelligent & Robotic Systems

Abstract

Despite the great achievements of reinforcement learning based works, these methods are known for their poor sample efficiency. Given time constraints, this drawback usually means that training agents in simulated environments is the only viable option. Furthermore, reinforcement learning agents have a strong tendency to overfit to their environment, suffering a drastic loss of performance at test time. As a result, tying the agent's logic to its current body may well make transfer inefficient. To tackle this issue, we propose the Universal Notice Network (UNN) method, which enforces a separation between the neural network layers holding the information needed to solve the task and those related to robot properties, hence enabling easier transfer of knowledge between entities. We demonstrate the efficiency of this method on a broad panel of applications: we consider different kinds of robots with different morphological structures, performing kinematic, dynamic, single- and multi-robot tasks. We show that our method produces zero-shot transfers (without additional learning) that can outperform state-of-the-art approaches, and that fast tuning further enhances performance.
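To make the layer separation concrete, below is a minimal sketch of this modular split in PyTorch. All class names, layer sizes, and the interface between modules are our own illustrative assumptions, not the paper's implementation: a robot-specific input module maps observations to a robot-agnostic task representation, a shared task module (the UNN proper) holds the task-solving layers, and a robot-specific output module maps the result to that robot's action space.

```python
import torch.nn as nn

class RobotInputModule(nn.Module):
    """Robot-specific: maps raw observations to a robot-agnostic task state."""
    def __init__(self, obs_dim, task_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, task_dim))

    def forward(self, obs):
        return self.net(obs)

class TaskModule(nn.Module):
    """Robot-agnostic: holds the task-solving knowledge shared across robots."""
    def __init__(self, task_dim, intent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(task_dim, 128), nn.ReLU(),
                                 nn.Linear(128, intent_dim))

    def forward(self, task_state):
        return self.net(task_state)

class RobotOutputModule(nn.Module):
    """Robot-specific: maps the task module's output to this robot's actions."""
    def __init__(self, intent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(intent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))

    def forward(self, intent):
        return self.net(intent)

class UNNPolicy(nn.Module):
    """Full policy: robot-specific modules sandwich the shared task module."""
    def __init__(self, base, task, head):
        super().__init__()
        self.base, self.task, self.head = base, task, head

    def forward(self, obs):
        return self.head(self.task(self.base(obs)))

# Zero-shot transfer under this decomposition: reuse the trained task module
# with a new robot's input/output modules (all dimensions are placeholders).
trained_task = TaskModule(task_dim=16, intent_dim=16)
new_robot_policy = UNNPolicy(RobotInputModule(obs_dim=24, task_dim=16),
                             trained_task,
                             RobotOutputModule(intent_dim=16, act_dim=7))
```

Under such a decomposition, only the input and output modules need to change when the policy moves to a morphologically different robot, which is what makes zero-shot reuse of the task layers possible.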


Availability of data and materials

Not applicable

Code Availability

Not applicable


Acknowledgements

This work has been sponsored by the French government research program Investissements d’avenir through the RobotEx Equipment of Excellence (ANR-10-EQPX-44) and the IMobS3 Laboratory of Excellence (ANR-10-LABX-16-01), by the European Union through the program of Regional competitiveness and employment 2007-2013 (ERDF – Auvergne region), by the Auvergne region and by French Institute for Advanced Mechanics.

Author information

Authors and Affiliations

Authors

Contributions

The technical implementation and the first draft of the manuscript were produced by Mehdi Mounsif, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sébastien Lengagne.

Ethics declarations

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for Publication

Not applicable

Conflict of Interest

The authors have no relevant financial or nonfinancial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Additional Results

In this section, additional results for the presented environments are detailed. Specifically, we show both learning curves and test performances for the Raise Plank, Cooperative Raise Plank, and Basket-Ball tasks. These curves confirm the observations detailed in Section 4, especially concerning transfer efficiency. Indeed, while the vanilla PPO policy and the UNN reach a similar final mean reward on Raise Plank and Cooperative Raise Plank (see Figs. 14 and 15), the transferred PPO policy is negatively impacted in these cases and its learning progress is very slow, if not null. By contrast, the transferred UNN policy starts with a superior mean reward and, for the Raise Plank and Cooperative Raise Plank environments, reaches the final mean reward of the source policy in a fraction of the total time. At test time, we use the following metrics, averaged over 100 episodes (a minimal code sketch of these counters follows the list):

  • Raise Plank: A counter is incremented at each timestep for which the plank altitude is above a height threshold

  • Cooperative Raise Plank: A counter is incremented at each timestep for which the plank is in the proximity of a target position

  • Basket-Ball: A counter is incremented at each timestep for which the ball is kept above a height threshold
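As an illustration, here is a minimal sketch of these counters; the episode interface, threshold values, and function names are our own assumptions, not taken from the paper.

```python
def count_above(values, threshold):
    """Counter for Raise Plank (plank altitude) and Basket-Ball (ball height):
    number of timesteps at which the tracked height stays above the threshold."""
    return sum(1 for v in values if v > threshold)

def count_near(distances, eps):
    """Counter for Cooperative Raise Plank: number of timesteps at which the
    plank stays within eps of the target position."""
    return sum(1 for d in distances if d < eps)

def mean_test_score(episode_scores):
    """Average the per-episode counters over the 100 test episodes."""
    return sum(episode_scores) / len(episode_scores)
```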

These results, detailed in Table 5, are consistent with the final mean rewards of the learning curves and further emphasize that transfer learning with the UNN approach outperforms straightforward transfer by an important margin. As Table 5 refers in some cases to the Generic 2 (G2) robot, we inform the reader that this architecture is similar to the G3 robot, with the slight difference that one segment was removed.
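As a complement, here is a minimal sketch of the fast-tuning step mentioned in the abstract, reusing the hypothetical UNNPolicy sketch given after the abstract: the shared task module is frozen and only the robot-specific modules are briefly updated. For brevity it is written with a generic supervised loss; in the paper's setting the update would instead come from the reinforcement learning objective.

```python
import torch

def fast_tune(policy, loss_fn, batches, lr=1e-4):
    """Freeze the shared task module; update only robot-specific modules."""
    for p in policy.task.parameters():
        p.requires_grad_(False)  # the transferred task knowledge stays fixed
    params = list(policy.base.parameters()) + list(policy.head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for obs, target in batches:
        opt.zero_grad()
        loss = loss_fn(policy(obs), target)
        loss.backward()
        opt.step()
```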

Fig. 14: Additional learning curves for Raise Plank learning performances and transfer from KUKA to G2

Fig. 15: Additional learning curves for Cooperative Raise Plank learning performances and transfer from G3-G3 to G2-G3

Table 5: Test performances summary

Fig. 16: Additional learning curves for Basket-Ball learning performances and transfer from HCL to KUKA-KUKA

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mounsif, M., Lengagne, S., Thuilot, B. et al. Universal Notice Networks: Transferring Learned Skills Through a Broad Panel of Applications. J Intell Robot Syst 107, 18 (2023). https://doi.org/10.1007/s10846-023-01809-2
