Abstract
In reinforcement learning navigation, agent exploration driven by intrinsic rewards is subject to three kinds of uncertainty arising from partially observable environments: observation, action, and neural network prediction uncertainty. These uncertainties stall exploration, randomize the learned policy, and produce unreasonable intrinsic rewards. This study proposes a spatial consciousness model of intrinsic rewards (SC-modelX) for agent navigation in partially observable environments. In this method, a spatial consciousness state generator both remembers the agent's historical observations and regresses the agent's current spatial position; in similar scenes that are difficult to distinguish, these two capabilities suppress observation uncertainty and action uncertainty, respectively. A suppress-uncertainty model is further proposed to strengthen the input–output relationship of the intrinsic reward and thereby suppress the influence of neural network prediction uncertainty. Experimental results demonstrate that the spatial consciousness model of intrinsic rewards effectively evaluates the novelty of observations and improves exploration efficiency, policy performance, and convergence speed. The three uncertainties identified in this paper are complementary factors in agent exploration in partially observable environments, and SC-modelX effectively improves the agent's exploration ability.
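For concreteness, the following is a minimal PyTorch sketch of the mechanism the abstract describes: a recurrent state generator that accumulates observation history and regresses the agent's position, together with an intrinsic-reward function whose output is damped as prediction error grows, suppressing the effect of network prediction uncertainty. All module names, dimensions, and the damping form are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): a recurrent "spatial
# consciousness" state that (a) remembers observation history and (b) regresses
# the agent's position, plus an intrinsic reward whose scale is damped when the
# prediction error (hence network uncertainty) is large.
import torch
import torch.nn as nn

class SpatialConsciousnessState(nn.Module):
    def __init__(self, obs_dim=64, hidden_dim=128, pos_dim=2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)   # observation embedding
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)   # remembers history
        self.pos_head = nn.Linear(hidden_dim, pos_dim)  # regresses position

    def forward(self, obs, h):
        z = torch.relu(self.encoder(obs))
        h = self.rnn(z, h)                # fold current observation into memory
        pos_hat = self.pos_head(h)        # estimate of current spatial position
        return h, pos_hat

def intrinsic_reward(pred, target, beta=0.5):
    """Novelty as prediction error; the denominator damps the reward
    when the error is large, suppressing prediction uncertainty."""
    err = (pred - target).pow(2).mean(dim=-1)
    return err / (1.0 + beta * err)

# Usage sketch with random tensors standing in for real observations.
model = SpatialConsciousnessState()
h = torch.zeros(1, 128)
obs = torch.randn(1, 64)
h, pos_hat = model(obs, h)
r_int = intrinsic_reward(h, h.detach() + 0.1 * torch.randn_like(h))
print(pos_hat.shape, r_int.item())
```

The saturating form of the reward is one simple way to bound the novelty signal; the paper's suppress-uncertainty model may use a different functional form.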
Data availability
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China (Key Program, Grant No. 51935005) and the Basic Scientific Research Projects of China (Grant No. JCKY20200603C010), from which author Peng Liu received research support, and by the Science and Technology Program Projects of Heilongjiang Province, China (Grant No. GA21C031), from which author Ye Jin received research support.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Zhenghongyuan Ni, Ye Jin, Peng Liu and Wei Zhao. The first draft of the manuscript was written by Zhenghongyuan Ni and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ni, Z., Jin, Y., Liu, P. et al. Spatial Consciousness Model of Intrinsic Reward in Partially Observable Environments. J Intell Robot Syst 106, 71 (2022). https://doi.org/10.1007/s10846-022-01771-5