
Q-Learning-Based Target Selection for Bearings-Only Autonomous Navigation

Journal of Systems Science and Complexity

Abstract

This paper presents a Q-learning-based target selection algorithm for spacecraft autonomous navigation using bearing observations of known visible targets. In the considered navigation system, the position and velocity of the spacecraft are estimated by an extended Kalman filter (EKF) that processes inter-satellite line-of-sight (LOS) vector measurements obtained by an onboard star camera. The paper focuses on adaptively selecting, at each observation period, the target for the star camera so as to improve the performance of the EKF. To this end, a Q-function is designed to select a suitable observation region, and a U-function is introduced to rank the targets within the selected region. Both functions are constructed from the sequence of innovations produced by the EKF. The effectiveness of the algorithm is illustrated via numerical simulations, which show that it outperforms the traditional target selection strategy based on the Cramér-Rao bound (CRB) when the prior knowledge of the target locations is inaccurate.
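The abstract does not give the exact definitions of the Q-function or the U-function, so the following Python sketch only illustrates the two-level idea under stated assumptions. Everything named here is hypothetical and not the authors' formulation: the class `InnovationQSelector`, the choice of state (the region observed at the previous period), the reward equal to the negative normalized innovation squared (NIS, computed as ν²/S for a scalar bearing innovation ν with predicted variance S), the running-mean NIS used as a per-target "U-value", and the toy innovation model in `simulate`, which stands in for a real EKF.

```python
import random
from collections import defaultdict


class InnovationQSelector:
    """Hypothetical sketch of two-level, innovation-driven target selection.

    Level 1: tabular Q-learning picks an observation region; the state is the
    region observed at the previous period (assumption).
    Level 2: a per-target score (here called a U-value, the running mean NIS)
    ranks targets inside the chosen region. Reward = -NIS is an assumption.
    """

    def __init__(self, regions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.regions = regions         # region id -> list of target ids
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor
        self.epsilon = epsilon         # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(float)    # (state, region) -> Q-value
        self.u = defaultdict(float)    # target id -> running mean NIS
        self.count = defaultdict(int)  # target id -> number of observations

    def select_region(self, state):
        """Epsilon-greedy choice of the next observation region."""
        ids = list(self.regions)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ids)
        return max(ids, key=lambda r: self.q[(state, r)])

    def select_target(self, region):
        """Lowest mean NIS ranks first; unseen targets score 0, so they get tried."""
        return min(self.regions[region], key=lambda t: self.u[t])

    def update(self, state, region, target, nis, next_state):
        """One Q-learning step, then an incremental-mean update of the U-value."""
        reward = -nis
        best_next = max(self.q[(next_state, r)] for r in self.regions)
        key = (state, region)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        self.count[target] += 1
        self.u[target] += (nis - self.u[target]) / self.count[target]


def nis_scalar(innovation, variance):
    """Normalized innovation squared for a scalar bearing innovation."""
    return innovation * innovation / variance


def simulate(steps=500, seed=1):
    """Toy stand-in for the EKF: each target's innovation is drawn from
    N(bias, 1); a nonzero bias mimics an inaccurate prior target location."""
    bias = {"a1": 0.0, "a2": 3.0, "b1": 3.0}  # hypothetical values
    selector = InnovationQSelector({"A": ["a1", "a2"], "B": ["b1"]}, seed=seed)
    rng = random.Random(seed + 1)
    state = "A"
    for _ in range(steps):
        region = selector.select_region(state)
        target = selector.select_target(region)
        innovation = rng.gauss(bias[target], 1.0)
        selector.update(state, region, target, nis_scalar(innovation, 1.0), region)
        state = region  # the observed region becomes the next state
    return selector
```

In this toy setup, calling `simulate()` and then setting `epsilon` to 0 should typically make the greedy policy prefer region `A` and, within it, target `a1`, whose simulated innovations are unbiased. This is only a sanity check of the sketch, not a reproduction of the paper's simulations or of its comparison against the CRB-based strategy.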



Author information


Correspondence to Kai Xiong.

Additional information

This paper was supported by the National Natural Science Foundation under Grant Nos. 61573059, 61525301, 61690215.

This paper was recommended for publication by Editor SUN Jian.


About this article


Cite this article

Xiong, K., Wei, C. Q-Learning-Based Target Selection for Bearings-Only Autonomous Navigation. J Syst Sci Complex 34, 1401–1425 (2021). https://doi.org/10.1007/s11424-020-9265-y


  • DOI: https://doi.org/10.1007/s11424-020-9265-y

Keywords

Navigation