
A hierarchical representation of behaviour supporting open ended development and progressive learning for artificial agents

  • Original Research
  • Published in: Autonomous Robots

Abstract

One of the challenging aspects of open-ended or lifelong agent development is that the behaviour an agent is trained for at a given moment can later become an element in the creation of one, or even several, behaviours of greater complexity, whose purpose cannot be anticipated. In this paper, we present modular influence network design (MIND), an artificial agent control architecture suited to open-ended and cumulative learning. The MIND architecture encapsulates sub-behaviours into modules and combines them into a hierarchy reflecting the modular and hierarchical nature of complex tasks. Compared to similar research, the main original aspect of MIND is its multi-layered hierarchy, which uses a generic control signal, the influence, to obtain an efficient global behaviour. This article shows the ability of MIND to learn a curriculum of independent didactic tasks of increasing complexity covering different aspects of a desired behaviour. In so doing, we demonstrate the contributions of MIND to open-ended development: encapsulation into modules allows all the skills acquired during the curriculum to be preserved, reused and retrained in a focused way, while the modular structure supports an evolving topology by easing the coordination of new sensors, actuators and heterogeneous learning structures.
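To make the abstract's core idea concrete, here is a minimal sketch of how modules might combine sub-behaviour outputs through an influence signal. The paper's actual definitions are not reproduced in this excerpt, so every name and the blending rule below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a MIND-style modular hierarchy. Each module maps an
# observation to an action proposal plus an influence in [0, 1]; a parent node
# blends its children's proposals, weighting each by its influence. All names
# and the weighted-average rule are assumptions for illustration only.

from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class Module:
    name: str
    # Maps an observation to (action proposal, influence in [0, 1]).
    policy: Callable[[float], Tuple[float, float]]

def combine(modules: Sequence[Module], observation: float) -> float:
    """Influence-weighted average of the sub-behaviours' proposals."""
    proposals = [m.policy(observation) for m in modules]
    total = sum(infl for _, infl in proposals)
    if total == 0.0:
        return 0.0  # no module claims relevance to this observation
    return sum(act * infl for act, infl in proposals) / total

# Two toy sub-behaviours: one steers toward a target, one away from an obstacle.
seek = Module("seek", lambda obs: (+1.0, 0.8))
avoid = Module("avoid", lambda obs: (-1.0, 0.2))

print(combine([seek, avoid], observation=0.0))  # ≈ 0.6
```

Because each module only exposes a proposal and an influence, a trained module can be reused unchanged as a child of a new, more complex behaviour, which is the reusability property the abstract emphasises.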


Notes

  1. Objects can be naturally split into parts and sub-parts, complex features and simple features (Kruger et al. 2013).

  2. JBox2d: www.jbox2d.org.

  3. Videos of the results are available at the following addresses:

  4. Raspberry Pi 3: www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/.

  5. Grove Pi: www.dexterindustries.com/grovepi/.

  6. OpenCV computer vision library: https://opencv.org/.

  7. Videos of the results are available at the following address: www.lirmm.fr/~suro/videos/clawDemo.mp4; https://hal.archives-ouvertes.fr/hal-02594407.

References

  • Arkin, R. C., & Balch, T. (1997). Aura: Principles and practice in review. Journal of Experimental & Theoretical Artificial Intelligence, 9(2–3), 175–189.


  • Barto, A. G., Singh, S., & Chentanez, N. (2004). Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of the 3rd International Conference on Development and Learning (pp. 112–19).

  • Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 41–48). ACM.

  • Blaes, S., Pogančić, M. V., Zhu, J., & Martius, G. (2019). Control what you can: Intrinsically motivated task-planning agent. In Advances in Neural Information Processing Systems (pp. 12520–12531).

  • Braitenberg, V. (1986). Vehicles: Experiments in synthetic psychology. Cambridge: MIT Press.


  • Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1), 14–23.


  • Connor, J. T., Martin, R. D., & Atlas, L. E. (1994). Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2), 240–254.


  • De Jong, K. A. (1992). Are genetic algorithms function optimizers? PPSN, 2, 3–14.


  • Devin, C., Gupta, A., Darrell, T., Abbeel, P., & Levine, S. (2017). Learning modular neural network policies for multi-task and multi-robot transfer. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 2169–2176). IEEE.

  • Dorigo, M., & Colombetti, M. (1994). Robot shaping: Developing autonomous agents through learning. Artificial Intelligence, 71(2), 321–370.


  • Dorigo, M., & Colombetti, M. (1998). Robot shaping: An experiment in behavior engineering. Cambridge: MIT Press.


  • Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47.


  • Foglino, F., Christakou, C. C., & Leonetti, M. (2019). An optimization framework for task sequencing in curriculum learning. In Joint IEEE 9th International Conference ICDL-EpiRob (pp. 207–214). IEEE.

  • Forestier, S., Mollard, Y., & Oudeyer, P.-Y. (2017). Intrinsically motivated goal exploration processes with automatic curriculum learning. arXiv preprint arXiv:1708.02190.

  • Gen, M., & Lin, L. (2007). Genetic algorithms. In Wiley Encyclopedia of Computer Science and Engineering (pp. 1–15).

  • Gülçehre, Ç., Moczulski, M., Visin, F., & Bengio, Y. (2016). Mollifying networks. CoRR, abs/1608.04980.

  • Heess, N., Wayne, G., Tassa, Y., Lillicrap, T. P., Riedmiller, M. A., & Silver, D. (2016). Learning and transfer of modulated locomotor controllers. CoRR, abs/1610.05182.

  • Hester, T., & Stone, P. (2017). Intrinsically motivated model learning for developing curious robots. Artificial Intelligence, 247, 170–186.


  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • Kruger, N., Janssen, P., Kalkan, S., Lappe, M., Leonardis, A., Piater, J., et al. (2013). Deep hierarchies in the primate visual cortex: What can we learn for computer vision? IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1847–1871.


  • Larsen, T., & Hansen, S. T. (2005). Evolving composite robot behaviour: A modular architecture. In Proceedings of the Fifth International Workshop on Robot Motion and Control (RoMoCo'05) (pp. 271–276). IEEE.

  • Lessin, D., Fussell, D., & Miikkulainen, R. (2013). Open-ended behavioral complexity for evolved virtual creatures. In Proceedings of the 15th annual conference on Genetic and evolutionary computation (pp. 335–342).

  • Lessin, D., Fussell, D., Miikkulainen, R., & Risi, S. (2015). Increasing behavioral complexity for evolved virtual creatures with the esp method. arXiv preprint arXiv:1510.07957.

  • Lopes, M., & Oudeyer, P.-Y. (2012). The strategic student approach for life-long exploration and learning. In 2012 IEEE international conference on development and learning and epigenetic robotics (ICDL) (pp. 1–8). IEEE.

  • Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149.


  • Lungarella, M., Metta, G., Pfeifer, R., & Sandini, G. (2003). Developmental robotics: A survey. Connection Science, 15(4), 151–190.


  • Narvekar, S., Sinapov, J., Leonetti, M., & Stone, P. (2016). Source task creation for curriculum learning. In: Proceedings of the 2016 international conference on autonomous agents & multiagent systems (pp. 566–574). International Foundation for Autonomous Agents and Multiagent Systems.

  • Niël, R., & Wiering, M. A. (2018). Hierarchical reinforcement learning for playing a dynamic dungeon crawler game. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1159–1166). IEEE.

  • Oudeyer, P.-Y. (2012). Developmental robotics. In Encyclopedia of the sciences of learning (pp 969–972). Springer.

  • Oudeyer, P.-Y., & Kaplan, F. (2007). What is intrinsic motivation? a typology of computational approaches. Frontiers in Neurorobotics, 1, 6.


  • Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.


  • Piaget, J., & Duckworth, E. (1970). Genetic epistemology. American Behavioral Scientist, 13(3), 459–480.


  • Reynolds, C. W. (1987). Flocks, herds and schools: A distributed behavioral model. In ACM SIGGRAPH computer graphics (Vol. 21, pp. 25–34). ACM.

  • Rudolph, G. (1994). Convergence analysis of canonical genetic algorithms. IEEE Transactions on Neural Networks, 5(1), 96–101.


  • Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River: Prentice Hall Press.


  • Santucci, V. G., Baldassarre, G., & Cartoni, E. (2019). Autonomous reinforcement learning of multiple interrelated tasks. In 2019 Joint IEEE 9th international conference on development and learning and epigenetic robotics (ICDL-EpiRob) (pp. 221–227). IEEE.

  • Santucci, V. G., Baldassarre, G., & Mirolli, M. (2016). Grail: A goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Transactions on Cognitive and Developmental Systems, 8(3), 214–231.


  • Schrum, J., & Miikkulainen, R. (2015). Discovering multimodal behavior in MS PAC-man through evolution of modular neural networks. IEEE Transactions on Computational Intelligence and AI in Games, 8(1), 67–81.


  • Simonin, O., & Ferber, J. (2000). Modeling self satisfaction and altruism to handle action selection and reactive cooperation. In Proceedings of the 6th international conference on the simulation of adaptive behavior (Vol. 2, pp. 314–323).

  • Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99–127.


  • Stone, P., & Veloso, M. (2000). Layered learning. In European conference on machine learning (pp. 369–381). Springer.

  • Whiteson, S., Kohl, N., Miikkulainen, R., & Stone, P. (2003). Evolving keepaway soccer players through task decomposition. In Genetic and Evolutionary Computation Conference (pp. 356–368). Springer.


Acknowledgements

We thank our anonymous reviewers for their many constructive comments and suggestions, which greatly improved the present article. We also thank Eric Bourreau and Marianne Huchard (LIRMM) and the members of the SMILE team (LIRMM) for their comments that helped improve our initial work. This work was realized with the support of the High Performance Computing Platform HPC@LR, financed by the Occitanie / Pyrénées-Méditerranée Region, Montpellier Mediterranean Metropole and the University of Montpellier, France.

Author information


Corresponding author

Correspondence to François Suro.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

The following tables give the settings used in scenario 1. All values are given in the units of the physics engine (JBox2d). These values were used to obtain the results shown in this article; however, other settings can be used, with substantial improvements in convergence time and computing cost.

Table 4 Settings used in scenario 1 (JBox2d engine)
Table 5 Exhaustive list of the settings and rewards of the curriculum used in scenario 1


About this article


Cite this article

Suro, F., Ferber, J., Stratulat, T. et al. A hierarchical representation of behaviour supporting open ended development and progressive learning for artificial agents. Auton Robot 45, 245–264 (2021). https://doi.org/10.1007/s10514-020-09960-7

