Abstract
In recent years, monocular 3D target detection algorithms based on pseudo-LiDAR have achieved substantial accuracy gains on the KITTI dataset. However, the point cloud obtained by depth estimation contains a large amount of noise, which degrades detection accuracy. In this paper, we propose a monocular 3D target detection network that adaptively fuses image and pseudo-LiDAR information and is aware of the quality of its predicted boxes. First, we propose an adaptive feature fusion mechanism that uses attention to effectively fuse information from the two modalities, improving detection precision. Then, to make the network aware of the quality of each predicted box, we construct an independent confidence prediction network for the 3D detection boxes and introduce a quality perception loss to train it. Finally, experiments on the KITTI dataset show that the detection accuracy of the proposed algorithm is higher than that of other algorithms.
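The adaptive cross-modal fusion described above can be sketched as a learned attention gate that blends image features with pseudo-LiDAR features per channel. This is a minimal illustration under assumed shapes, not the paper's actual architecture: the gate weights `w`, the feature dimensions, and the `adaptive_fuse` helper are all made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(img_feat, pl_feat, w):
    """Blend two modalities with a learned per-channel attention gate.

    The gate is computed from the concatenated features, so each channel
    of the fused output is a convex combination of the two modalities.
    """
    gate = sigmoid(np.concatenate([img_feat, pl_feat], axis=-1) @ w)
    return gate * img_feat + (1.0 - gate) * pl_feat

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 64))        # image features for 4 proposals
pl = rng.standard_normal((4, 64))         # pseudo-LiDAR features, same shape
w = rng.standard_normal((128, 64)) * 0.1  # gate weights (illustrative)

fused = adaptive_fuse(img, pl, w)
print(fused.shape)  # → (4, 64)
```

In a trained network the gate weights would be learned end to end, letting the detector lean on image features where the estimated depth (and hence the pseudo-LiDAR point cloud) is noisy.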
Acknowledgments
This work was supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant No. KJ2019A0162), the Natural Science Foundation of Anhui Province, China (Grant Nos. 2108085MF197 and 1708085MF154), and the Open Research Fund of Anhui Key Laboratory of Detection Technology and Energy Saving Devices, Anhui Polytechnic University (Grant No. DTESD2020B02).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chen, J., Wang, F., Liu, F., Wang, Q. (2021). Monocular 3D Target Detection Based on Cross-Modal and Mass Perceived Loss. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_45
DOI: https://doi.org/10.1007/978-3-030-86608-2_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86607-5
Online ISBN: 978-3-030-86608-2
eBook Packages: Computer Science (R0)