DOI: 10.1145/3664647.3681128 · Research Article

FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection

Published: 28 October 2024

Abstract

Fusing the data of millimeter-wave Radar sensors and high-definition cameras has emerged as a viable approach to precise 3D object detection for roadside traffic surveillance. For roadside perception systems, earlier studies have pointed out that it is better to perform the fusion on the 2D image plane than on the BEV plane (which is popular for on-vehicle perception systems), especially when the perception range is large (e.g., >150 m). Image-plane fusion requires critical view transformations: perspective projection from the Radar's BEV to the camera's 2D plane, and its reverse, inverse perspective mapping (IPM). However, real-world issues such as uneven terrain and sensor movement degrade the precision of these transformations, impairing fusion effectiveness. To alleviate these issues, we propose a geometry-based Radar-camera fusion method on the ground, namely FARFusion V2. Specifically, we extend the ground-plane assumption in FARFusion [20] to support arbitrary ground shapes by formulating the ground height as an implicit representation based on geometric transformations. By incorporating this ground information, we can enhance the Radar data with target height measurements and then project the enhanced Radar data onto the 2D image plane to obtain more accurate depth information, thereby assisting the IPM process. A real-time transformation-parameter estimation module is further introduced to refine the view transformations. Moreover, considering the differing measurement noises of the two sensors, we introduce an uncertainty-based depth fusion strategy into the 2D fusion process to maximize the probability of obtaining the optimal depth value. Extensive experiments are conducted on our collected roadside OWL benchmark, demonstrating the excellent localization capacity of FARFusion V2 in far-range scenarios. Our method achieves an average localization accuracy of 0.771 m when the detection range is extended up to 500 m.
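The geometric core of this pipeline can be stated concretely. The sketch below is a rough illustration (not the authors' implementation) of two ideas the abstract leans on: the flat-ground special case of IPM, i.e., intersecting a camera pixel ray with a ground plane (FARFusion V2 generalizes this by replacing the plane with an implicit ground-height representation), and inverse-variance depth fusion, one standard way to realize an uncertainty-based combination of two noisy depth estimates. All symbols (K, R, t, the variances) are illustrative assumptions.

```python
import numpy as np

def ipm_pixel_to_ground(u, v, K, R, t, ground_z=0.0):
    """Back-project pixel (u, v) onto the plane z = ground_z in the world frame.

    K: 3x3 camera intrinsics; R, t: extrinsics mapping world to camera
    coordinates (x_cam = R @ x_world + t). Flat-ground special case only.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray, camera frame
    ray_world = R.T @ ray_cam                           # rotate ray into world frame
    cam_center = -R.T @ t                               # camera center in world frame
    s = (ground_z - cam_center[2]) / ray_world[2]       # ray scale at the ground plane
    return cam_center + s * ray_world

def fuse_depths(d_cam, var_cam, d_radar, var_radar):
    """Fuse two depth estimates by inverse-variance weighting.

    For independent Gaussian measurement noise, this weighting yields the
    maximum-likelihood fused depth, matching the spirit of maximizing the
    probability of obtaining the optimal depth value.
    """
    w_cam, w_radar = 1.0 / var_cam, 1.0 / var_radar
    d = (w_cam * d_cam + w_radar * d_radar) / (w_cam + w_radar)
    return d, 1.0 / (w_cam + w_radar)  # fused depth and its variance
```

For instance, fusing a camera depth of 212 m (variance 25 m²) with a Radar depth of 205 m (variance 4 m²) gives (0.04·212 + 0.25·205) / 0.29 ≈ 206.0 m: the fused estimate leans toward the lower-variance Radar measurement, which is the desired behavior at far range, where monocular depth is least reliable.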

Supplemental Material

MP4 File - ftp2691-video.mp4
Presentation video for "FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection".

References

[1]
Jie Bai, Sen Li, Han Zhang, Libo Huang, and Ping Wang. 2021. Robust target detection and tracking algorithm based on roadside radar and camera. Sensors, Vol. 21, 4 (2021), 1116.
[2]
Wentao Bao, Qi Yu, and Yu Kong. 2020. Uncertainty-based traffic accident anticipation with spatio-temporal relational learning. In Proceedings of the 28th ACM International Conference on Multimedia. 2682--2690.
[3]
R Omar Chavez-Garcia, Julien Burlet, Trung-Dung Vu, and Olivier Aycard. 2012. Frontal object perception using radar and mono-vision. In 2012 IEEE Intelligent Vehicles Symposium. IEEE, 159--164.
[4]
Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, and Feng Zhao. 2022. Graph-DETR3D: rethinking overlapping regions for multi-view 3D object detection. In Proceedings of the 30th ACM International Conference on Multimedia. 5999--6008.
[5]
Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang, Jianmin Ji, and Yu Zhang. 2021. Neighbor-vote: Improving monocular 3d object detection through neighbor distance voting. In Proceedings of the 29th ACM International Conference on Multimedia. 5239--5247.
[6]
Christophe Coué, Thierry Fraichard, Pierre Bessiere, and Emmanuel Mazer. 2002. Multi-sensor data fusion using Bayesian programming: An automotive application. In Intelligent Vehicle Symposium, 2002. IEEE, Vol. 2. IEEE, 442--447.
[7]
Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. 2021. Voxel r-cnn: Towards high performance voxel-based 3d object detection. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 1201--1209.
[8]
Yuchuan Du, Bohao Qin, Cong Zhao, Yifan Zhu, Jing Cao, and Yuxiong Ji. 2021. A novel spatio-temporal synchronization method of roadside asynchronous MMW radar-camera for sensor fusion. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, 11 (2021), 22278--22289.
[9]
Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng, Yu Zhang, Jianmin Ji, et al. 2024. Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration. arXiv preprint arXiv:2405.05589 (2024).
[10]
Yuliang Guo, Guang Chen, Peitao Zhao, Weide Zhang, Jinghao Miao, Jingao Wang, and Tae Eun Choe. 2020. Gen-lanenet: A generalized and scalable approach for 3d lane detection. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXI 16. Springer, 666--681.
[11]
Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, and Zaiqing Nie. 2024. RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception. arXiv preprint arXiv:2403.10145 (2024).
[12]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
[13]
Hung-Min Hsu, Yizhou Wang, and Jenq-Neng Hwang. 2020. Traffic-aware multi-camera tracking of vehicles based on reid and camera link model. In Proceedings of the 28th ACM International Conference on Multimedia. 964--972.
[14]
Jyh-Jing Hwang, Henrik Kretzschmar, Joshua Manela, Sean Rafferty, Nicholas Armstrong-Crews, Tiffany Chen, and Dragomir Anguelov. 2022. Cramnet: Camera-radar fusion with ray-constrained cross-attention for robust 3d object detection. In European Conference on Computer Vision. Springer, 388--405.
[15]
Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems, Vol. 30 (2017).
[16]
Du Yong Kim and Moongu Jeon. 2014. Data fusion of radar and image measurements for multi-object tracking via Kalman filtering. Information Sciences, Vol. 278 (2014), 641--652.
[17]
Youngseok Kim, Juyeb Shin, Sanmin Kim, In-Jae Lee, Jun Won Choi, and Dongsuk Kum. 2023. Crn: Camera radar net for accurate, robust, efficient 3d perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 17615--17626.
[18]
Xingchen Li, Yuxuan Xiao, Beibei Wang, Haojie Ren, Yanyong Zhang, and Jianmin Ji. 2023. Automatic targetless LiDAR--camera calibration: a survey. Artificial Intelligence Review, Vol. 56, 9 (2023), 9949--9987.
[19]
Yao Li, Jiajun Deng, Yu Zhang, Jianmin Ji, Houqiang Li, and Yanyong Zhang. 2022. EZFusion: A Close Look at the Integration of LiDAR, Millimeter-Wave Radar, and Camera for Accurate 3D Object Detection and Tracking. IEEE Robotics and Automation Letters, Vol. 7, 4 (2022), 11182--11189.
[20]
Yao Li, Yingjie Wang, Chengzhen Meng, Yifan Duan, Jianmin Ji, Yu Zhang, and Yanyong Zhang. 2024. FARFusion: A Practical Roadside Radar-Camera Fusion System for Far-Range Perception. IEEE Robotics and Automation Letters (2024), 1--8. https://doi.org/10.1109/LRA.2024.3387700
[21]
Guibiao Liao, Wei Gao, Qiuping Jiang, Ronggang Wang, and Ge Li. 2020. Mmnet: Multi-stage and multi-scale fusion network for rgb-d salient object detection. In Proceedings of the 28th ACM International Conference on Multimedia. 2436--2444.
[22]
Yang Liu, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. 2024. Echoes beyond points: Unleashing the power of raw radar data in multi-modality fusion. Advances in Neural Information Processing Systems, Vol. 36 (2024).
[23]
Xudong Lv, Boya Wang, Ziwen Dou, Dong Ye, and Shuo Wang. 2021. LCCNet: LiDAR and camera self-calibration using cost volume network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2894--2901.
[24]
Ramin Nabati and Hairong Qi. 2021. Centerfusion: Center-based radar and camera fusion for 3d object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1527--1536.
[25]
Marko Obrvan, Josip Ćesić, and Ivan Petrović. 2016. Appearance based vehicle detection by radar-stereo vision integration. In Robot 2015: Second Iberian Robotics Conference: Advances in Robotics, Volume 1. Springer, 437--449.
[26]
Zequn Qin and Xi Li. 2022. Monoground: Detecting monocular 3d objects from the ground. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3793--3802.
[27]
Haojie Ren, Sha Zhang, Sugang Li, Yao Li, Xinchen Li, Jianmin Ji, Yu Zhang, and Yanyong Zhang. 2023. TrajMatch: Toward Automatic Spatio-Temporal Calibration for Roadside LiDARs Through Trajectory Matching. IEEE Transactions on Intelligent Transportation Systems, Vol. 24, 11 (2023), 12549--12559. https://doi.org/10.1109/TITS.2023.3295757
[28]
Chuanbeibei Shi, Ganghua Lai, Yushu Yu, Mauro Bellone, and Vincenzo Lippiello. 2023. Real-Time Multi-Modal Active Vision for Object Detection on UAVs Equipped With Limited Field of View LiDAR and Camera. IEEE Robotics and Automation Letters (2023).
[29]
Fei-Yue Wang. 2010. Parallel control and management for intelligent transportation systems: Concepts, architectures, and applications. IEEE Transactions on Intelligent Transportation Systems, Vol. 11, 3 (2010), 630--638.
[30]
Lefei Wang, Zhaoyu Zhang, Xin Di, and Jun Tian. 2021. A roadside camera-radar sensing fusion system for intelligent transportation. In 2020 17th European Radar Conference (EuRAD). IEEE, 282--285.
[31]
Xiyang Wang, Chunyun Fu, Zhankun Li, Ying Lai, and Jiawei He. 2022. Deepfusionmot: A 3d multi-object tracking framework based on camera-lidar fusion with deep association. IEEE Robotics and Automation Letters, Vol. 7, 3 (2022), 8260--8267.
[32]
Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Houqiang Li, and Yanyong Zhang. 2023. Multi-modal 3d object detection in autonomous driving: a survey. International Journal of Computer Vision, Vol. 131, 8 (2023), 2122--2152.
[33]
Xinshuo Weng, Jianren Wang, David Held, and Kris Kitani. 2020. 3d multi-object tracking: A baseline and new evaluation metrics. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10359--10366.
[34]
Zizhang Wu, Guilian Chen, Yuanzhu Gan, Lei Wang, and Jian Pu. 2023. Mvfusion: Multi-view 3d object detection with semantic-aligned radar and camera fusion. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2766--2773.
[35]
Zizhang Wu, Yunzhe Wu, Xiaoquan Wang, Yuanzhu Gan, and Jian Pu. 2024. A Robust Diffusion Modeling Framework for Radar Camera 3D Object Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 3282--3292.
[36]
Shangliang Xu, Xinxin Wang, Wenyu Lv, Qinyao Chang, Cheng Cui, Kaipeng Deng, Guanzhong Wang, Qingqing Dang, Shengyu Wei, Yuning Du, et al. 2022. PP-YOLOE: An evolved version of YOLO. arXiv preprint arXiv:2203.16250 (2022).
[37]
Zhijie Yan, Pengfei Li, Zheng Fu, Shaocong Xu, Yongliang Shi, Xiaoxue Chen, Yuhang Zheng, Yang Li, Tianyu Liu, Chuxuan Li, et al. 2023. INT2: Interactive Trajectory Prediction at Intersections. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 8536--8547.
[38]
Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao Tan, and Errui Ding. 2022. Rope3d: The roadside perception dataset for autonomous driving and monocular 3d object detection task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21341--21350.
[39]
Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, et al. 2022. Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21361--21370.
[40]
Daiming Zhang, Bin Fang, Weibin Yang, Xiaosong Luo, and Yuanyan Tang. 2014. Robust inverse perspective mapping based on vanishing point. In Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC). IEEE, 458--463.
[41]
Junping Zhang, Fei-Yue Wang, Kunfeng Wang, Wei-Hua Lin, Xin Xu, and Cheng Chen. 2011. Data-driven intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems, Vol. 12, 4 (2011), 1624--1639.
[42]
Wentao Zhang, Huansheng Song, and Lichen Liu. 2023. Automatic calibration for monocular cameras in highway scenes via vehicle vanishing point detection. Journal of Transportation Engineering, Part A: Systems, Vol. 149, 7 (2023), 04023050.
[43]
Yuan Zhao, Lu Zhang, Jiajun Deng, and Yanyong Zhang. 2024. BEV-radar: bidirectional radar-camera fusion for 3D object detection. JUSTC, Vol. 54, 1 (2024), 0101--1.
[44]
Lianqing Zheng, Sen Li, Bin Tan, Long Yang, Sihan Chen, Libo Huang, Jie Bai, Xichan Zhu, and Zhixiong Ma. 2023. Rcfusion: Fusing 4d radar and camera with bird's-eye view features for 3d object detection. IEEE Transactions on Instrumentation and Measurement (2023).
[45]
Taohua Zhou, Junjie Chen, Yining Shi, Kun Jiang, Mengmeng Yang, and Diange Yang. 2023. Bridging the view disparity between radar and camera features for multi-modal fusion 3d object detection. IEEE Transactions on Intelligent Vehicles, Vol. 8, 2 (2023), 1523--1535.
[46]
Li Zhu, Fei Richard Yu, Yige Wang, Bin Ning, and Tao Tang. 2018. Big data analytics in intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems, Vol. 20, 1 (2018), 383--398.


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. 3d object detection
2. intelligent transportation system
3. sensor fusion


Funding Sources

• the National Natural Science Foundation of China

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
