Abstract
This paper presents a Q-learning-based target selection algorithm for spacecraft autonomous navigation using bearing observations of known visible targets. In the navigation system considered, the position and velocity of the spacecraft are estimated by an extended Kalman filter (EKF) from measurements of inter-satellite line-of-sight (LOS) vectors obtained with an onboard star camera. The paper focuses on adaptively selecting an appropriate target for the star camera at each observation period, so that the performance of the EKF is enhanced. To this end, a Q-function is designed to select a suitable observation region, and a U-function is introduced to rank the targets within the selected region. Both the Q-function and the U-function are constructed from the sequence of innovations produced by the EKF. Numerical simulations illustrate the effectiveness of the algorithm and show that it outperforms the traditional target selection strategy based on the Cramér-Rao bound (CRB) when prior knowledge of the target location is inaccurate.
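The two-level idea described above (a Q-function over observation regions, a U-function ranking targets inside the chosen region, both driven by EKF innovations) can be illustrated with a minimal sketch. It is not the construction used in the paper: the single-state Q-table, the reward defined as the negative normalized innovation squared, the epsilon-greedy exploration, and all names (`q_update`, `innovation_reward`, `rank_targets`, `train`) are assumptions made only for illustration. The simulated innovations stand in for a real EKF.

```python
import random


def innovation_reward(innovation, innovation_var):
    """Hypothetical reward: negative normalized innovation squared (scalar case).

    Assumption: a small normalized innovation means the measurement agrees
    with the filter prediction, so the region that produced it is preferred.
    """
    return -(innovation ** 2) / innovation_var


def q_update(q, action, reward, alpha=0.05, gamma=0.9):
    """Watkins-style Q-learning update on a single-state table (a list)."""
    target = reward + gamma * max(q)
    q[action] += alpha * (target - q[action])
    return q[action]


def select_region(q, epsilon, rng):
    """Epsilon-greedy choice of the observation region."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


def rank_targets(u_scores):
    """Rank target ids in the selected region from best to worst score.

    `u_scores` is a hypothetical stand-in for the U-function values.
    """
    return sorted(u_scores, key=u_scores.get, reverse=True)


def train(sigmas, steps=3000, alpha=0.05, gamma=0.9, epsilon=0.2, seed=0):
    """Learn region values from simulated scalar innovations.

    `sigmas[i]` is the assumed true innovation standard deviation in region i;
    the filter-predicted innovation variance is fixed to 1.0 for simplicity.
    """
    rng = random.Random(seed)
    q = [0.0] * len(sigmas)
    for _ in range(steps):
        region = select_region(q, epsilon, rng)
        innovation = rng.gauss(0.0, sigmas[region])  # stand-in for EKF output
        reward = innovation_reward(innovation, 1.0)
        q_update(q, region, reward, alpha, gamma)
    return q


if __name__ == "__main__":
    q_values = train([0.5, 2.0, 4.0])
    print("learned region values:", q_values)
    print("target ranking:", rank_targets({"target_a": -1.0, "target_b": -0.2}))
```

In this sketch, a region whose targets have poorly known locations produces larger innovations than the filter predicts, so its learned value drops and the camera is steered elsewhere. That mirrors, in a simplified way, the case of inaccurate prior knowledge mentioned in the abstract.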
Additional information
This paper was supported by the National Natural Science Foundation under Grant Nos. 61573059, 61525301, 61690215.
This paper was recommended for publication by Editor SUN Jian.
About this article
Cite this article
Xiong, K., Wei, C. Q-Learning-Based Target Selection for Bearings-Only Autonomous Navigation. J Syst Sci Complex 34, 1401–1425 (2021). https://doi.org/10.1007/s11424-020-9265-y
DOI: https://doi.org/10.1007/s11424-020-9265-y