Abstract
In recent years, cooperative perception (CP) in vehicle-to-infrastructure (V2I) scenarios has gained significant traction as a key technology in autonomous driving. In this paper, we investigate end-to-end object detection and spatiotemporal asynchrony to enhance the perception performance of autonomous vehicles. We propose a novel V2I CP framework, termed V2ICooper, designed for efficient and robust object detection and fusion. Within this framework, we design an end-to-end object detection model with a heterogeneous multi-agent middle layer (HMML) serving as its backbone module. HMML facilitates feature interaction across different levels, allowing richer features to be explored and enhancing the system's detection performance. To mitigate the impact of spatiotemporal asynchrony on the results, we introduce the spatiotemporal asynchronous fusion (SAF) method, which learns complex nonlinear mappings between input sequences and the corresponding object sequences to achieve spatiotemporal alignment. Experiments on the real-world DAIR-V2X-C dataset demonstrate that V2ICooper achieves superior accuracy and robustness in object detection. Moreover, a successful deployment of the proposed system in real-world scenarios further substantiates its effectiveness.
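To make the two abstract ideas concrete, the following is a minimal, hypothetical PyTorch sketch of (a) an HMML-style block in which vehicle-side and infrastructure-side feature maps interact at a given level, and (b) an SAF-style aligner that regresses a time-aligned feature from a history of stale infrastructure features. The paper's actual architectures are not given on this page, so all module names, tensor shapes, and fusion choices below are illustrative assumptions, not the authors' design.

# Hypothetical sketch of HMML-style cross-agent feature interaction and
# SAF-style temporal alignment. All design details are assumptions.
import torch
import torch.nn as nn

class HMMLBlock(nn.Module):
    """Assumed multi-level interaction: vehicle and infrastructure BEV
    feature maps exchange information via a shared 1x1-conv fusion,
    then each branch continues with a residual-enriched feature."""
    def __init__(self, channels: int):
        super().__init__()
        # Fuse concatenated vehicle/infrastructure features back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, veh_feat: torch.Tensor, inf_feat: torch.Tensor):
        mixed = self.act(self.fuse(torch.cat([veh_feat, inf_feat], dim=1)))
        # Each branch keeps its own features, enriched by the cross-agent mix.
        return veh_feat + mixed, inf_feat + mixed

class SAFAligner(nn.Module):
    """Assumed spatiotemporal asynchronous fusion: a GRU consumes a short
    history of (stale) infrastructure feature vectors and regresses a
    feature aligned to the vehicle's current timestamp."""
    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, inf_history: torch.Tensor) -> torch.Tensor:
        # inf_history: (batch, time_steps, feat_dim), oldest frame first.
        _, h_n = self.gru(inf_history)
        return self.head(h_n[-1])  # predicted time-aligned feature

if __name__ == "__main__":
    veh = torch.randn(2, 64, 32, 32)
    inf = torch.randn(2, 64, 32, 32)
    v_out, i_out = HMMLBlock(64)(veh, inf)
    aligned = SAFAligner(feat_dim=64)(torch.randn(2, 5, 64))
    print(v_out.shape, i_out.shape, aligned.shape)

In this reading, HMML learns what to share between heterogeneous agents at each feature level, while SAF compensates for communication latency by predicting forward in time rather than naively fusing stale features; the GRU here stands in for whatever sequence model the paper actually uses.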
Acknowledgement
This work was partially supported by the National Natural Science Foundation of China under Grant No. 62172064, and by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202100637).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yi, S., Zhang, H., Jin, F., Hu, Y., Li, R., Liu, K. (2025). V2ICooper: Toward Vehicle-to-Infrastructure Cooperative Perception with Spatiotemporal Asynchronous Fusion. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14999. Springer, Cham. https://doi.org/10.1007/978-3-031-71470-2_5
DOI: https://doi.org/10.1007/978-3-031-71470-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71469-6
Online ISBN: 978-3-031-71470-2
eBook Packages: Computer Science, Computer Science (R0)