Abstract
Vehicle-to-everything (V2X) technology has attracted growing research interest owing to the availability of roadside infrastructure perception datasets. However, these datasets focus primarily on urban intersections and lack highway scenarios. In addition, their perception tasks are mainly monocular 3D (Mono3D) detection, because synchronized data across multiple sensors is limited. To bridge this gap, we propose Highway-V2X (H-V2X), the first large-scale highway Bird’s-Eye-View (BEV) perception dataset captured by real-world sensors. The dataset covers over 100 km of highway under diverse road and weather conditions. H-V2X contains over 1.9 million fine-grained categorized samples in BEV space, captured by multiple synchronized cameras and accompanied by vector maps. We performed joint 2D-3D calibration to ensure correct projection, and human labor was employed to ensure data quality. Furthermore, we propose three tasks highly relevant to highway scenarios: BEV detection, BEV tracking, and trajectory prediction. We benchmark each task and propose novel methods that incorporate vector map information. We hope that H-V2X and the benchmark methods will facilitate research on highway BEV perception. The dataset is available at https://pan.quark.cn/s/86d19da10d18.
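The abstract credits joint 2D-3D calibration with making the camera-to-BEV projection correct. As a minimal sketch of why that calibration matters (not the authors' pipeline, which is not described here), assume a pinhole camera with intrinsics K, world-to-camera extrinsics R and t, and a locally flat road surface at Z = 0. Ground points then map to pixels through the homography H = K[r1 r2 t], and its inverse lifts a pixel (for example a vehicle's bottom-center) back into BEV coordinates. The camera height, pitch, focal length, and the helper names `roadside_camera`, `ground_homography`, and `apply_h` are all hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (avoids a NumPy dependency)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def roadside_camera(height: float, pitch: float) -> Tuple[Matrix, List[float]]:
    """Hypothetical pole-mounted camera at (0, 0, height) looking along +X and
    tilted down by `pitch` radians. World frame: X along the road, Y left, Z up.
    Camera frame: x right, y down, z forward. Returns world-to-camera R and t."""
    s, c = math.sin(pitch), math.cos(pitch)
    R = [[0.0, -1.0, 0.0],  # camera x axis (right) expressed in world coords
         [-s, 0.0, -c],     # camera y axis (down)
         [c, 0.0, -s]]      # camera z axis (optical axis)
    t = [0.0, height * c, height * s]  # t = -R @ camera_center
    return R, t


def ground_homography(K: Matrix, R: Matrix, t: List[float]) -> Matrix:
    """H = K [r1 r2 t]: maps ground-plane points (X, Y, 1) with Z = 0 to pixels."""
    Rt = [[R[r][0], R[r][1], t[r]] for r in range(3)]
    return matmul(K, Rt)


def apply_h(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Apply a homography to a 2D point and dehomogenize."""
    p = matmul(H, [[x], [y], [1.0]])
    return p[0][0] / p[2][0], p[1][0] / p[2][0]


if __name__ == "__main__":
    K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]
    R, t = roadside_camera(height=6.0, pitch=math.radians(10.0))
    H = ground_homography(K, R, t)
    H_inv = inv3(H)
    u, v = apply_h(H, 40.0, 2.0)     # ground point 40 m ahead, 2 m left -> pixel
    x, y = apply_h(H_inv, u, v)      # pixel -> BEV ground coordinates
    print(round(x, 6), round(y, 6))  # 40.0 2.0
```

The flat-ground homography is only valid for image points that actually lie on the road surface; a point on a vehicle roof or on a sloped ramp would be misplaced in BEV, which is presumably part of why a full 2D-3D calibration is needed rather than intrinsics alone.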
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Liu, C., Zhu, M., Ma, C. (2025). H-V2X: A Large Scale Highway Dataset for BEV Perception. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15059. Springer, Cham. https://doi.org/10.1007/978-3-031-73232-4_8
DOI: https://doi.org/10.1007/978-3-031-73232-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73231-7
Online ISBN: 978-3-031-73232-4
eBook Packages: Computer Science, Computer Science (R0)