Abstract
Urban autonomous navigation has broad application prospects. Reinforcement Learning (RL)-based navigation models can be continuously optimized through self-exploration, eliminating the need for human heuristics. However, training effective navigation models is challenging due to the dynamic nature of urban traffic and the exploration-exploitation dilemma in RL. Moreover, limited vehicle perception and traffic uncertainty introduce potential safety hazards, hampering the real-world application of RL-based navigation models. In this paper, we propose a novel end-to-end urban navigation framework with decision hindsight. Formulating the problem as a Partially Observable Markov Decision Process (POMDP), we employ a causal Transformer-based autoregressive model to process historical navigation information as supplementary observations. We then combine these historical observations with current perceptions to construct a history-feedforward state representation that enhances global awareness, improving data availability and decision predictability. Furthermore, by integrating the history-feedforward state encoder upstream, we develop an end-to-end RL framework that yields a navigation model with decision hindsight, enabling more reliable navigation. To validate the effectiveness of the proposed method, we conduct experiments on challenging urban navigation tasks in the CARLA simulator. The results demonstrate that our method achieves higher learning efficiency and better driving performance, outperforming prior methods on urban navigation benchmarks.
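To make the history-feedforward idea concrete, the sketch below shows one way a causal Transformer can autoregressively summarize past (observation, action) pairs and fuse that summary with the current perception into a single state vector for a downstream RL policy. This is a minimal illustration, not the paper's implementation: the module name `HistoryFeedforwardEncoder`, the (observation, action) tokenization, and all hyperparameters are assumptions, since the abstract does not specify the exact architecture.

```python
# Hypothetical sketch of a history-feedforward state encoder;
# architecture details are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn


class HistoryFeedforwardEncoder(nn.Module):
    """Summarizes past (observation, action) pairs with a causal
    Transformer, then fuses the summary with the current perception."""

    def __init__(self, obs_dim: int, act_dim: int, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 2, max_len: int = 32):
        super().__init__()
        # Project each history step (obs, action) into a token embedding.
        self.token_proj = nn.Linear(obs_dim + act_dim, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Fuses the history summary with the current observation.
        self.fuse = nn.Linear(d_model + obs_dim, d_model)

    def forward(self, hist_obs, hist_act, cur_obs):
        # hist_obs: (B, T, obs_dim); hist_act: (B, T, act_dim)
        # cur_obs:  (B, obs_dim)
        B, T, _ = hist_obs.shape
        tokens = self.token_proj(torch.cat([hist_obs, hist_act], dim=-1))
        tokens = tokens + self.pos_emb[:, :T]
        # Causal mask: each step may only attend to itself and its past.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.transformer(tokens, mask=mask)
        summary = h[:, -1]  # last step aggregates the whole history
        # History-feedforward state = history summary + current perception.
        return torch.relu(self.fuse(torch.cat([summary, cur_obs], dim=-1)))


# Usage: batch of 8 episodes, 16 history steps, 64-dim observations.
enc = HistoryFeedforwardEncoder(obs_dim=64, act_dim=2)
state = enc(torch.randn(8, 16, 64), torch.randn(8, 16, 2), torch.randn(8, 64))
print(state.shape)  # torch.Size([8, 128]); feed this to the RL policy
```

Under this POMDP framing, the fused vector stands in for the unobservable full state: the policy conditions on it instead of on the raw current observation alone, which is what gives the learned model its "decision hindsight".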
This work was supported by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2022LZH002.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Deng, Q., Liu, G., Li, R., Hu, Q., Zhao, Y., Li, R. (2024). End-to-End Urban Autonomous Navigation with Decision Hindsight. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1969. Springer, Singapore. https://doi.org/10.1007/978-981-99-8184-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8183-0
Online ISBN: 978-981-99-8184-7