Q-Learning-Based Target Selection for Bearings-Only Autonomous Navigation

Xiong, Kai; Wei, Chunling

doi:10.1007/s11424-020-9265-y

Published: 12 January 2021

Volume 34, pages 1401–1425, (2021)

Journal of Systems Science and Complexity

Abstract

This paper presents a Q-learning-based target selection algorithm for spacecraft autonomous navigation using bearing observations of known visible targets. In the navigation system considered, the position and velocity of the spacecraft are estimated by an extended Kalman filter (EKF) from measurements of inter-satellite line-of-sight (LOS) vectors obtained with an onboard star camera. The paper focuses on adaptively selecting the appropriate target for the star camera at each observation period, so that the performance of the EKF is enhanced. To derive an effective algorithm, a Q-function is designed to select a proper observation region, while a U-function is introduced to rank the targets in the selected region. Both the Q-function and the U-function are constructed from the sequence of innovations obtained from the EKF. The efficiency of the algorithm is illustrated via numerical simulations, which show that it outperforms the traditional target selection strategy based on the Cramér-Rao bound (CRB) when the prior knowledge about the target location is inaccurate.
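
The abstract does not give the form of the Q-function or the U-function, so the following is only a minimal sketch of the general idea: a tabular Q-learning update that scores candidate observation regions using a reward computed from an EKF innovation. Everything specific here is an assumption made for illustration, not the authors' method: the single-state (bandit-style) formulation, the epsilon-greedy policy, the reward defined as the negative normalized innovation squared, and the names `innovation_reward`, `q_update`, and `select_region`.

```python
import random


def innovation_reward(innovation, innovation_var):
    """Hypothetical reward: negative normalized innovation squared.

    `innovation` is a list of residual components and `innovation_var` a
    scalar variance. Both the inputs and this reward shape are assumptions.
    """
    nis = sum(v * v for v in innovation) / innovation_var
    return -nis


def q_update(q, region, reward, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for a single-state problem.

    q[region] moves toward reward + gamma * max(q) with step size alpha.
    """
    target = reward + gamma * max(q)
    q[region] += alpha * (target - q[region])
    return q


def select_region(q, epsilon, rng):
    """Epsilon-greedy choice of an observation region index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))  # explore a random region
    return max(range(len(q)), key=lambda i: q[i])  # exploit the best region


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.0, 0.0, 0.0]  # one value per hypothetical observation region
    region = select_region(q, epsilon=0.2, rng=rng)
    reward = innovation_reward([0.3, -0.1], innovation_var=0.5)
    q_update(q, region, reward)
    print(region, q)
```

In a full system the innovation would come from the EKF measurement update for the target observed in the chosen region, and a second score (the U-function in the paper) would rank the targets inside that region; neither step is modeled here.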

Author information

Authors and Affiliations

Beijing Institute of Control Engineering, Science and Technology on Space Intelligent Control Laboratory, Beijing, 100094, China
Kai Xiong & Chunling Wei

Corresponding author

Correspondence to Kai Xiong.

Additional information

This paper was supported by the National Natural Science Foundation under Grant Nos. 61573059, 61525301, 61690215.

This paper was recommended for publication by Editor SUN Jian.

About this article

Cite this article

Xiong, K., Wei, C. Q-Learning-Based Target Selection for Bearings-Only Autonomous Navigation. J Syst Sci Complex 34, 1401–1425 (2021). https://doi.org/10.1007/s11424-020-9265-y

Received: 23 September 2019
Revised: 02 June 2020
Published: 12 January 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11424-020-9265-y
