
Universal Notice Networks: Transferring Learned Skills Through a Broad Panel of Applications

  • Short Paper
  • Published:
Journal of Intelligent & Robotic Systems

Abstract

Despite the great achievements of reinforcement learning based works, these methods are known for their poor sample efficiency. Given time constraints, this drawback usually means that training agents in simulated environments is the only viable option. Furthermore, reinforcement learning agents have a strong tendency to overfit to their environment, suffering a drastic loss of performance at test time. As a result, tying the agent's logic to its current body may well make transfer inefficient. To tackle this issue, we propose the Universal Notice Network (UNN) method, which enforces a separation between the neural network layers holding the information needed to solve the task and those related to robot properties, hence enabling easier transfer of knowledge between entities. We demonstrate the efficiency of this method on a broad panel of applications: we consider different kinds of robots with different morphological structures, performing kinematic, dynamic, single- and multi-robot tasks. We show that our method produces zero-shot transfers (without additional learning) that can outperform state-of-the-art approaches, and that fast tuning further enhances performance.
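To make the layer separation concrete, below is a minimal sketch of this modular split in PyTorch. All class names, layer sizes, and the interface between modules are our own illustrative assumptions, not the paper's implementation: a robot-specific input module maps observations to a robot-agnostic task representation, a shared task module (the UNN proper) holds the task-solving layers, and a robot-specific output module maps the result to that robot's action space.

```python
import torch.nn as nn

class RobotInputModule(nn.Module):
    """Robot-specific: maps raw observations to a robot-agnostic task state."""
    def __init__(self, obs_dim, task_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, task_dim))

    def forward(self, obs):
        return self.net(obs)

class TaskModule(nn.Module):
    """Robot-agnostic: holds the task-solving knowledge shared across robots."""
    def __init__(self, task_dim, intent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(task_dim, 128), nn.ReLU(),
                                 nn.Linear(128, intent_dim))

    def forward(self, task_state):
        return self.net(task_state)

class RobotOutputModule(nn.Module):
    """Robot-specific: maps the task module's output to this robot's actions."""
    def __init__(self, intent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(intent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))

    def forward(self, intent):
        return self.net(intent)

class UNNPolicy(nn.Module):
    """Full policy: robot-specific modules sandwich the shared task module."""
    def __init__(self, base, task, head):
        super().__init__()
        self.base, self.task, self.head = base, task, head

    def forward(self, obs):
        return self.head(self.task(self.base(obs)))

# Zero-shot transfer under this decomposition: reuse the trained task module
# with a new robot's input/output modules (all dimensions are placeholders).
trained_task = TaskModule(task_dim=16, intent_dim=16)
new_robot_policy = UNNPolicy(RobotInputModule(obs_dim=24, task_dim=16),
                             trained_task,
                             RobotOutputModule(intent_dim=16, act_dim=7))
```

Under such a decomposition, only the input and output modules need to change when the policy moves to a morphologically different robot, which is what makes zero-shot reuse of the task layers possible.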


Availability of data and materials

Not applicable

Code Availability

Not applicable


Acknowledgements

This work has been sponsored by the French government research program Investissements d’avenir through the RobotEx Equipment of Excellence (ANR-10-EQPX-44) and the IMobS3 Laboratory of Excellence (ANR-10-LABX-16-01), by the European Union through the program of Regional competitiveness and employment 2007-2013 (ERDF – Auvergne region), by the Auvergne region and by French Institute for Advanced Mechanics.

Author information

Authors and Affiliations

Authors

Contributions

The technical implementation and the first draft of the manuscript were produced by Mehdi Mounsif, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sébastien Lengagne.

Ethics declarations

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for Publication

Not applicable

Conflict of Interest

The authors have no relevant financial or nonfinancial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Additional Results

In this section, additional results for the presented environments are detailed. Specifically, we show both learning curves and test performances for the Raise Plank, Cooperative Raise Plank, and Basket-Ball tasks. These curves confirm the observations detailed in Section 4, especially concerning transfer efficiency. Indeed, while the vanilla PPO policy and the UNN reach a similar final mean reward on Raise Plank and Cooperative Raise Plank (see Figs. 14 and 15), the transferred PPO policy is negatively impacted in these cases and its learning progress is very slow, if not null. By contrast, the transferred UNN policy starts with a superior mean reward and, for the Raise Plank and Cooperative Raise Plank environments, reaches the final mean reward of the source policy in a fraction of the total time. At test time, we use the following metrics, averaged over 100 episodes (a minimal code sketch of these counters follows the list):

  • Raise Plank: A counter is incremented at each timestep for which the plank altitude is above a height threshold

  • Cooperative Raise Plank: A counter is incremented at each timestep for which the plank is in the proximity of a target position

  • Basket-Ball: A counter is incremented at each timestep for which the ball is kept above a height threshold
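As an illustration, here is a minimal sketch of these counters; the episode interface, threshold values, and function names are our own assumptions, not taken from the paper.

```python
def count_above(values, threshold):
    """Counter for Raise Plank (plank altitude) and Basket-Ball (ball height):
    number of timesteps at which the tracked height stays above the threshold."""
    return sum(1 for v in values if v > threshold)

def count_near(distances, eps):
    """Counter for Cooperative Raise Plank: number of timesteps at which the
    plank stays within eps of the target position."""
    return sum(1 for d in distances if d < eps)

def mean_test_score(episode_scores):
    """Average the per-episode counters over the 100 test episodes."""
    return sum(episode_scores) / len(episode_scores)
```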

These results, detailed in Table 5, are consistent with the final mean rewards of the learning curves and further emphasize that transfer learning with the UNN approach outperforms straightforward transfer by an important margin. As Table 5 refers in some cases to the Generic 2 (G2) robot, we inform the reader that this architecture is similar to the G3 robot, with the slight difference that one segment was removed.
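As a complement, here is a minimal sketch of the fast-tuning step mentioned in the abstract, reusing the hypothetical UNNPolicy sketch given after the abstract: the shared task module is frozen and only the robot-specific modules are briefly updated. For brevity it is written with a generic supervised loss; in the paper's setting the update would instead come from the reinforcement learning objective.

```python
import torch

def fast_tune(policy, loss_fn, batches, lr=1e-4):
    """Freeze the shared task module; update only robot-specific modules."""
    for p in policy.task.parameters():
        p.requires_grad_(False)  # the transferred task knowledge stays fixed
    params = list(policy.base.parameters()) + list(policy.head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for obs, target in batches:
        opt.zero_grad()
        loss = loss_fn(policy(obs), target)
        loss.backward()
        opt.step()
```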

Fig. 14: Additional learning curves for Raise Plank learning performances and transfer from KUKA to G2

Fig. 15: Additional learning curves for Cooperative Raise Plank learning performances and transfer from G3-G3 to G2-G3

Table 5: Test performances summary

Fig. 16: Additional learning curves for Basket-Ball learning performances and transfer from HCL to KUKA-KUKA

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mounsif, M., Lengagne, S., Thuilot, B. et al. Universal Notice Networks: Transferring Learned Skills Through a Broad Panel of Applications. J Intell Robot Syst 107, 18 (2023). https://doi.org/10.1007/s10846-023-01809-2
