Abstract
Behavior planning and decision-making learning in a dynamic environment are usually treated as separate tasks in control systems for intelligent agents. A new unified hierarchical formulation of the problem of simultaneous learning and planning (SLAP) is proposed within object-oriented reinforcement learning, and the architecture of a cognitive agent solving this problem is described. A new algorithm is proposed for learning actions in a partially observable external environment using a reward signal, an object-oriented description of environment states, and dynamically updated action plans. The main properties and advantages of the proposed algorithm are discussed, including the absence of a fixed cognitive cycle, which in earlier algorithms necessitated separate planning and learning subsystems, and the ability to construct and update a model of interaction with the environment, which increases learning efficiency. A theoretical justification of some provisions of this approach is given, a model example is presented, and the operation of a SLAP agent driving an unmanned vehicle is demonstrated.
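To illustrate the core idea of interleaving planning and learning in a single loop, the sketch below shows a minimal agent that keeps a learned transition model and action values, plans greedily through that model, and rebuilds its plan after every environment step. All names (`SLAPAgent`, `replan`, `step`) and the tabular structure are assumptions for illustration only, not the paper's actual algorithm.

```python
class SLAPAgent:
    """Minimal sketch of a simultaneous-learning-and-planning loop.

    Illustrative only: a tabular stand-in for the hierarchical,
    object-oriented agent described in the abstract.
    """

    def __init__(self, actions, horizon=3, alpha=0.5):
        self.actions = actions
        self.horizon = horizon
        self.alpha = alpha      # learning rate for action values
        self.q = {}             # learned action values: (state, action) -> value
        self.model = {}         # learned transition model: (state, action) -> next state
        self.plan = []          # current action plan, rebuilt as the model changes

    def value(self, s, a):
        return self.q.get((s, a), 0.0)

    def replan(self, s):
        # Greedy rollout through the learned model up to the horizon.
        plan, cur = [], s
        for _ in range(self.horizon):
            a = max(self.actions, key=lambda act: self.value(cur, act))
            plan.append(a)
            nxt = self.model.get((cur, a))
            if nxt is None:
                break           # model gap: the plan ends here until learning fills it in
            cur = nxt
        return plan

    def step(self, s, env_step):
        # One cycle interleaves acting, model learning, value learning,
        # and replanning -- no separate planner/learner subsystems.
        if not self.plan:
            self.plan = self.replan(s)
        a = self.plan.pop(0)
        s2, r = env_step(s, a)
        self.model[(s, a)] = s2                                     # update world model
        self.q[(s, a)] = self.value(s, a) + self.alpha * (r - self.value(s, a))
        self.plan = self.replan(s2)                                 # dynamically update the plan
        return s2, r
```

The point of the sketch is structural: because the plan is recomputed from the freshly updated model inside the same cycle, there is no fixed boundary between a planning phase and a learning phase, which is the property the abstract attributes to the SLAP formulation.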
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 18-29-22027.
Translated by V. Potapchouck
Cite this article
Panov, A.I. Simultaneous Learning and Planning in a Hierarchical Control System for a Cognitive Agent. Autom Remote Control 83, 869–883 (2022). https://doi.org/10.1134/S0005117922060054