Abstract
In recent years, cooperative perception (CP) in vehicle-to-infrastructure (V2I) scenarios has gained significant traction as a key technology in autonomous driving. In this paper, we investigate end-to-end object detection and spatiotemporal asynchrony to enhance the perception performance of autonomous vehicles. We propose a novel V2I CP framework, termed V2ICooper, designed for efficient and robust object detection and fusion. Within this framework, we design an end-to-end object detection model with a heterogeneous multi-agent middle layer (HMML) serving as its backbone module. HMML facilitates feature interaction across different levels, allowing richer features to be explored and enhancing the system's detection performance. To mitigate the impact of spatiotemporal asynchrony on the results, we introduce the spatiotemporal asynchronous fusion (SAF) method, which learns complex nonlinear mappings between input sequences and the corresponding object sequences to achieve spatiotemporal alignment. Experiments on the real-world DAIR-V2X-C dataset demonstrate that V2ICooper achieves superior accuracy and robustness in object detection. Moreover, a successful deployment of the proposed system in real-world scenarios further substantiates its effectiveness.
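To make the two abstract ideas concrete, the following is a minimal, hypothetical PyTorch sketch of (a) an HMML-style block in which vehicle-side and infrastructure-side feature maps interact at a given level, and (b) an SAF-style aligner that regresses a time-aligned feature from a history of stale infrastructure features. The paper's actual architectures are not given on this page, so all module names, tensor shapes, and fusion choices below are illustrative assumptions, not the authors' design.

# Hypothetical sketch of HMML-style cross-agent feature interaction and
# SAF-style temporal alignment. All design details are assumptions.
import torch
import torch.nn as nn

class HMMLBlock(nn.Module):
    """Assumed multi-level interaction: vehicle and infrastructure BEV
    feature maps exchange information via a shared 1x1-conv fusion,
    then each branch continues with a residual-enriched feature."""
    def __init__(self, channels: int):
        super().__init__()
        # Fuse concatenated vehicle/infrastructure features back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, veh_feat: torch.Tensor, inf_feat: torch.Tensor):
        mixed = self.act(self.fuse(torch.cat([veh_feat, inf_feat], dim=1)))
        # Each branch keeps its own features, enriched by the cross-agent mix.
        return veh_feat + mixed, inf_feat + mixed

class SAFAligner(nn.Module):
    """Assumed spatiotemporal asynchronous fusion: a GRU consumes a short
    history of (stale) infrastructure feature vectors and regresses a
    feature aligned to the vehicle's current timestamp."""
    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, inf_history: torch.Tensor) -> torch.Tensor:
        # inf_history: (batch, time_steps, feat_dim), oldest frame first.
        _, h_n = self.gru(inf_history)
        return self.head(h_n[-1])  # predicted time-aligned feature

if __name__ == "__main__":
    veh = torch.randn(2, 64, 32, 32)
    inf = torch.randn(2, 64, 32, 32)
    v_out, i_out = HMMLBlock(64)(veh, inf)
    aligned = SAFAligner(feat_dim=64)(torch.randn(2, 5, 64))
    print(v_out.shape, i_out.shape, aligned.shape)

In this reading, HMML learns what to share between heterogeneous agents at each feature level, while SAF compensates for communication latency by predicting forward in time rather than naively fusing stale features; the GRU here stands in for whatever sequence model the paper actually uses.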
Acknowledgement
This work was partially supported by the National Natural Science Foundation of China under Grant No. 62172064, and by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202100637).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yi, S., Zhang, H., Jin, F., Hu, Y., Li, R., Liu, K. (2025). V2ICooper: Toward Vehicle-to-Infrastructure Cooperative Perception with Spatiotemporal Asynchronous Fusion. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14999. Springer, Cham. https://doi.org/10.1007/978-3-031-71470-2_5
DOI: https://doi.org/10.1007/978-3-031-71470-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71469-6
Online ISBN: 978-3-031-71470-2
eBook Packages: Computer Science, Computer Science (R0)