Abstract
This paper presents a learning-based control policy design for point-to-point vehicle positioning in urban environments using BeiDou navigation. In urban canyons, reflection and diffraction of the BeiDou signal give rise to the multipath effect, an interference that causes the navigation signal to drift and severely degrades vehicle localization. Here, the authors formulate the navigation control system with unknown vehicle dynamics as an optimal control problem for a linear discrete-time system, and handle point-to-point localization control with off-policy reinforcement learning for feedback control. The proposed learning-based design guarantees optimality with prescribed performance and stabilizes the closed-loop navigation system without full knowledge of the vehicle dynamics. The proposed method is shown to withstand the multipath effect while satisfying the prescribed convergence rate. A case study demonstrates that the proposed algorithms effectively drive the vehicle to a desired setpoint under multipath effects taken from actual BeiDou navigation experiments in an urban environment.
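The idea of optimal feedback with a prescribed convergence rate for a linear discrete-time system can be illustrated with a small model-based sketch. This is not the authors' off-policy learning algorithm, which the abstract does not spell out and which works without knowing the dynamics. Instead, it is the classic policy iteration of Hewer for discrete-time LQR, applied to the pair (A/rho, B/rho), which is one standard way to place the closed-loop eigenvalues inside a disc of radius rho. The double-integrator matrices, the weights Q and R, the rate rho, and the initial gain K0 are all illustrative assumptions.

```python
import numpy as np


def solve_dlyap(Ac, M):
    """Solve P = Ac.T @ P @ Ac + M by Kronecker vectorization."""
    n = Ac.shape[0]
    lhs = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)


def hewer_iteration(A, B, Q, R, K0, rho=1.0, iters=50, tol=1e-10):
    """Policy iteration for u = -K x on the scaled pair (A/rho, B/rho).

    K0 must stabilize the scaled pair. A gain that stabilizes the scaled
    pair gives A - B K a spectral radius below rho.
    """
    As, Bs = A / rho, B / rho
    K = K0
    for _ in range(iters):
        # Policy evaluation: cost matrix of the current gain K.
        P = solve_dlyap(As - Bs @ K, Q + K.T @ R @ K)
        # Policy improvement: greedy gain with respect to P.
        K_next = np.linalg.solve(R + Bs.T @ P @ Bs, Bs.T @ P @ As)
        done = np.linalg.norm(K_next - K) < tol
        K = K_next
        if done:
            break
    return K, P


# Hypothetical position/velocity error model with a 0.1 s sample time.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
rho = 0.9  # assumed prescribed convergence rate

# Assumed initial gain: places both eigenvalues of A - B K0 at 0.5,
# so it also stabilizes the scaled pair (0.5 / 0.9 < 1).
K0 = np.array([[25.0, 8.75]])

K, P = hewer_iteration(A, B, Q, R, K0, rho=rho)
radius = max(abs(np.linalg.eigvals(A - B @ K)))
print("gain:", K, "closed-loop spectral radius:", radius)
```

In a data-driven variant, the policy evaluation step would be replaced by a least-squares fit on measured state and input trajectories, so that A and B never appear explicitly; the scaling by rho plays the same role there.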
Ethics declarations
The authors declare no conflict of interest.
Additional information
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 62320106008 and 62373114, and in part by the Collaborative Innovation Center for Transportation Science and Technology of Guangzhou under Grant No. 202206010056.
About this article
Cite this article
Qin, Y., Zhang, C., Chen, C. et al. Control Policy Learning Design for Vehicle Urban Positioning via BeiDou Navigation. J Syst Sci Complex 37, 114–135 (2024). https://doi.org/10.1007/s11424-024-3357-z
DOI: https://doi.org/10.1007/s11424-024-3357-z