Task-Specific Loss: A Teacher-Centered Approach to Transfer Learning Between Distinctly Structured Robotic Agents

Mounsif, Mehdi; Lengagne, Sébastien; Thuilot, Benoit; Adouane, Lounis

doi:10.1007/978-3-030-92442-3_10

Mehdi Mounsif ORCID: orcid.org/0000-0002-2763-3890³⁹,
Sébastien Lengagne³⁹,
Benoit Thuilot³⁹ &
…
Lounis Adouane ORCID: orcid.org/0000-0002-5686-5279⁴⁰

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 793))

Included in the following conference series:

International Conference on Informatics in Control, Automation and Robotics

505 Accesses

Abstract

Recent progress in robotics and artificial intelligence let us envision a future where robot presence and activity will be ubiquitous. Fueled by economics, cultural background, and design choices, human creativity will most likely design robots of various forms and shapes, using a wide range of sensors and actuators to accomplish their tasks. Consequently, it is highly probable that differently structured robots will be required to perform the same task. As many of these tasks may require learning-based control, which still relies on millions of examples to perform correctly, it would be eminently useful to be able to transfer skills from one agent to another, notwithstanding their distinct physical structure. As such, we propose a new method for the fast transfer of skills using a family of differentiable task-specific distance metrics called Task-Specific Losses (TSL). After highlighting the main shared concepts and differences with the closest existing state-of-the-art method, we demonstrate this technique on two different assistive control tasks, showing that we can indeed transfer the realization of a task learned from an expert/teacher to an agent with no previous interaction with the environment.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 189.00; Price excludes VAT (USA)

Softcover Book: USD 249.99; Price excludes VAT (USA)

Hardcover Book: USD 249.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Inter-task Similarity Measure for Heterogeneous Tasks

Learning from Humans

Robot Learning

References

Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., Mordatch, I.: Emergent tool use from multi-agent autocurricula. In: International Conference on Learning Representations (ICLR) (2020)
Google Scholar
Bansal, T., Pachocki, J., Sidor, S., Sutskever, I., Mordatch, I.: Emergent complexity via multi-agent competition. International Conference on Learning Representations (ICLR) (2018)
Google Scholar
Bellemare, M.G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., Munos, R.: Unifying count-based exploration and intrinsic motivation. In: Advances in Neural Information Processing Systems (2016)
Google Scholar
Burgess, C.P., et al.: Understanding disentangling in $\beta $-VAE. In: International Conference on Representation Learning (ICLR) (2018)
Google Scholar
Cobbe, K., Klimov, O., Hesse, C., Kim, T., Schulman, J.: Quantifying generalization in reinforcement learning. In: Proceedings of Machine Learning Research (MLR) (2018)
Google Scholar
Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context. CoRR abs/1901.02860 (2019). http://arxiv.org/abs/1901.02860
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis (2019). https://doi.org/10.18653/v1/N19-1423. https://www.aclweb.org/anthology/N19-1423
Duan, Y., Schulman, J., Chen, X., Bartlett, P.L., Sutskever, I., Abbeel, P.: RL2: fast reinforcement learning via slow reinforcement learning. In: International Conference on Representation Learning (ICRL) (2017)
Google Scholar
Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: learning skills without a reward function. In: International Conference on Representation Learning (ICRL) (2019)
Google Scholar
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning (ICML) (2017)
Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448. IEEE Computer Society, USA (2015). https://doi.org/10.1109/ICCV.2015.169
Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer Vision and Pattern Recognition (CVPR) (2014)
Google Scholar
Goodfellow, I.J., et al.: Generative adversarial nets. In: International Conference on Neural Information Processing Systems (NeurIPS) (2014)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR) (2015)
Google Scholar
Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Association for Computational Linguistics (ACL) (2018)
Google Scholar
James, S., Johns, E.: 3D simulation for robot arm control with deep Q-learning. arXiv (2016)
Google Scholar
Kaiser, L., et al.: One model to learn them all. arXiv:170605137v1 (2017)
Kingma, D.P., Lei Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980v9 (2017)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: 2nd International Conference on Learning Representations, ICLR Banff, AB, Canada, 14–16 April, Conference Track Proceedings (2014). http://arxiv.org/abs/1312.6114
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS’12, vol. 1, pp. 1097–1105. Curran Associates Inc., Red Hook (2012)
Google Scholar
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: Albert: a lite BERT for self-supervised learning of language representations. In: International Conference on Learning Representations (2020)
Google Scholar
Lopes, M., Lang, T., Toussaint, M., Yves Oudeyer, P.: Exploration in model-based reinforcement learning by empirically estimating learning progress. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 206–214. Curran Associates, Inc. (2012)
Google Scholar
Losey, D.P., Srinivasan, K., Mandlekar, A., Garg, A., Sadigh, D.: Controlling assistive robots with learned latent actions. arXiv e-prints arXiv:1909.09674 (2019)
Martin, L., et al.: CamemBERT: a tasty French language model (2019)
Google Scholar
Mnih, V., et al.: Playing atari with deep reinforcement learning. CoRR abs/1312.5602 (2013). http://arxiv.org/abs/1312.5602
Mounsif, M., Lengagne, S., Thuilot, B., Adouane, L.: Universal notice network: transferable knowledge among agents. In: 6th 2019 International Conference on Control, Decision and Information Technologies (IEEE-CoDIT) (2019)
Google Scholar
Mounsif, M., Lengagne, S., Thuilot, B., Adouane, L.: BAM! base abstracted modeling with universal notice network: fast skill transfer between mobile manipulators. In: 7th 2020 International Conference on Control, Decision and Information Technologies (IEEE-CoDIT) (2020)
Google Scholar
Mounsif, M., Lengagne, S., Thuilot, B., Adouane, L.: CoachGAN: fast adversarial transfer learning between differently shaped entities. In: 17th International Conference on Informatics in Control, Automation and Robotics (ICINCO) (2020)
Google Scholar
Nachum, O., Norouzi, M., Xu, K., Schuurmans, D.: Bridging the gap between value and policy based reinforcement learning. In: International Conference on Neural Information Processing Systems (NeurIPS) (2017)
Google Scholar
Nichol, A., Achiam, J., Schulman, J.: On first-order meta-learning algorithms. In: International Conference on Representation Learning (ICRL) (2019)
Google Scholar
OpenAI, Akkaya, I., et al.: Solving Rubik’s cube with a robot hand. arXiv e-prints arXiv:1910.07113 (2019)
OpenAI, Andrychowicz, M., et al.: Learning dexterous in-hand manipulation. CoRR abs/1808.00177 (2018). http://arxiv.org/abs/1808.00177
Peng, X.B., Abbeel, P., Levine, S., van de Panne, M.: DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ArXiv e-prints (2018)
Google Scholar
Peng, X.B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Sim-to-real transfer of robotic control with dynamics randomization. In: International Conference on Robotic and Automation (ICRA) (2018)
Google Scholar
Peng, X.B., Berseth, G., Yin, K., van de Panne, M.: DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Trans. Graph. (Proc. SIGGRAPH 2017) 36(4), 1–13 (2017)
Google Scholar
Peng, X.B., Chang, M., Zhang, G., Abbeel, P., Levine, S.: MCP: learning composable hierarchical control with multiplicative compositional policies. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 3681–3692. Curran Associates, Inc. (2019)
Google Scholar
Peng, X.B., Kanazawa, A., Malik, J., Abbeel, P., Levine, S.: SFV: reinforcement learning of physical skills from videos. ACM Trans. Graph. 37(6), 1–14 (2018). https://doi.org/10.1145/3272127.3275014
Article Google Scholar
Peng, X.B., Kumar, A., Zhang, G., Levine, S.: Advantage-weighted regression: simple and scalable off-policy reinforcement learning. CoRR abs/1910.00177 (2019). https://arxiv.org/abs/1910.00177
Pinto, L., Andrychowicz, M., Welinder, P., Zaremba, W., Abbeel, P.: Asymmetric actor critic for image-based robot learning. In: Kress-Gazit, H., Srinivasa, S.S., Howard, T., Atanasov, N. (eds.) Robotics: Science and Systems XIV, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, 26–30 June 2018 (2018)
Google Scholar
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI Report (2019)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection (2016)
Google Scholar
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. CoRR abs/1804.02767 (2018), http://arxiv.org/abs/1804.02767
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 91–99. Curran Associates, Inc. (2015)
Google Scholar
Riedmiller, M.A., et al.: Learning by playing - solving sparse reward tasks from scratch. In: International Conference on Learning Representation (ICLR) (2018)
Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28. http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a. arXiv:1505.04597 [cs.CV]
Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. ArXiv e-prints (2017)
Google Scholar
Schmidhuber, J.: Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Trans. Auton. Ment. Dev. 2(3), 230–247 (2010)
Article Google Scholar
Schmidhuber, J.: A possibility for implementing curiosity and boredom in model-building neural controllers. In: Proceedings of the First International Conference on Simulation of Adaptive Behavior on From Animals to Animats (1991)
Google Scholar
Schmidhuber, J.: Evolutionary principles in self-referential learning. Ph.D. thesis, Technische Universitat München (1987)
Google Scholar
Schulman, J., Abbeel, P., Chen, X.: Equivalence between policy gradients and soft Q-learning. CoRR abs/1704.06440 (2017). http://arxiv.org/abs/1704.06440
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: Trust region policy optimization. In: International Conference on Machine Learning (ICML) (2015)
Google Scholar
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR abs/1707.06347 (2017). http://arxiv.org/abs/1707.06347
Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017)
Article Google Scholar
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. J. Nat. (2015)
Google Scholar
Szegedy, C., et al.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015)
Google Scholar
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. ArXiv e-prints (2017)
Google Scholar
Wang, J.X., et al.: Learning to reinforcement learn. CoRR abs/1611.05763 (2016). http://arxiv.org/abs/1611.05763
Wang, R., Lehman, J., Clune, J., Stanley, K.O.: Paired open-ended trailblazer (POET): endlessly generating increasingly complex and diverse learning environments and their solutions. CoRR abs/1901.01753 (2019). http://arxiv.org/abs/1901.01753
Xavier, G., Bengio, Y.: Understanding the difficulty of training deep feedforward neural network. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)
Google Scholar
Xie, S., Girshick, R.B., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Yu, T., et al.: Meta-world: a benchmark and evaluation for multi-task and meta-reinforcement learning (2019). https://github.com/rlworkgroup/metaworld
Ziebart, B.D., Maas, A., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: Association for Advancement of Artificial Intelligence (AAAI) (2008)
Google Scholar
Zintgraf, L.M., Shiarlis, K., Kurin, V., Hofmann, K., Whiteson, S.: CAML: fast context adaptation via meta-learning. ArXiv e-prints (2018)
Google Scholar

Download references

Acknowledgment

This work has been sponsored by the French government research program Investissements d’Avenir through the RobotEx Equipment of Excellence (ANR-10-EQPX-44) and the IMobS3 Laboratory of Excellence (ANR-10-LABX-16-01), by the European Union through the program of Regional competitiveness and employment 2007–2013 (ERDF - Auvergne Region), by the Auvergne region and the French Institute for Advanced Mechanics.

Author information

Authors and Affiliations

Université Clermont Auvergne, CNRS, SIGMA Clermont, Institut Pascal, 63000, Clermont-Ferrand, France
Mehdi Mounsif, Sébastien Lengagne & Benoit Thuilot
Université de Technologie de Compiègne, CNRS, Heudiasyc, 60200, Compiègne, France
Lounis Adouane

Authors

Mehdi Mounsif
View author publications
You can also search for this author in PubMed Google Scholar
Sébastien Lengagne
View author publications
You can also search for this author in PubMed Google Scholar
Benoit Thuilot
View author publications
You can also search for this author in PubMed Google Scholar
Lounis Adouane
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mehdi Mounsif .

Editor information

Editors and Affiliations

Ford Research and Advanced Engineering, Dearborn, MI, USA
Oleg Gusikhin
University Paris-Est Créteil (UPEC), Créteil, France
Kurosh Madani
University of Reims Champagne-Ardenne, Reims, France
Janan Zaytoon

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mounsif, M., Lengagne, S., Thuilot, B., Adouane, L. (2022). Task-Specific Loss: A Teacher-Centered Approach to Transfer Learning Between Distinctly Structured Robotic Agents. In: Gusikhin, O., Madani, K., Zaytoon, J. (eds) Informatics in Control, Automation and Robotics. ICINCO 2020. Lecture Notes in Electrical Engineering, vol 793. Springer, Cham. https://doi.org/10.1007/978-3-030-92442-3_10

Download citation

DOI: https://doi.org/10.1007/978-3-030-92442-3_10
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92441-6
Online ISBN: 978-3-030-92442-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Task-Specific Loss: A Teacher-Centered Approach to Transfer Learning Between Distinctly Structured Robotic Agents

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Inter-task Similarity Measure for Heterogeneous Tasks

Learning from Humans

Robot Learning

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Task-Specific Loss: A Teacher-Centered Approach to Transfer Learning Between Distinctly Structured Robotic Agents

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Inter-task Similarity Measure for Heterogeneous Tasks

Learning from Humans

Robot Learning

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation