
Monocular 3D Target Detection Based on Cross-Modal and Mass Perceived Loss

  • Conference paper

Biometric Recognition (CCBR 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12878)

Abstract

In recent years, monocular 3D target detection algorithms based on pseudo-LiDAR have achieved substantial accuracy improvements on the KITTI data set. However, the point clouds obtained by depth estimation contain a large amount of noise, which degrades detection accuracy. In this paper, we propose a monocular 3D target detection network that adaptively fuses image and pseudo-LiDAR information and realizes quality perception of the prediction boxes. First, we propose an adaptive feature fusion mechanism that uses attention to fuse information from the two modalities effectively, improving detection precision. Then, to make the network aware of the quality of its prediction boxes, we construct an independent 3D detection-box confidence prediction network and put forward a quality perception loss to train it. Finally, experiments on the KITTI data set show that the detection accuracy of the proposed algorithm is higher than that of competing algorithms.
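The two components described above can be sketched in a minimal NumPy illustration. This is a hedged reconstruction, not the authors' exact formulation: the gating projection `w_gate`, the gate form (sigmoid over concatenated features), and the use of 3D IoU with ground truth as the confidence target are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(f_img, f_pl, w_gate):
    """Attention-gated fusion of image and pseudo-LiDAR features.

    f_img, f_pl: (N, C) feature vectors from the two modalities.
    w_gate:      (2C, C) projection producing a per-channel gate.
    The gate g in (0, 1) decides how much of each modality to keep:
    fused = g * f_img + (1 - g) * f_pl.
    """
    g = sigmoid(np.concatenate([f_img, f_pl], axis=-1) @ w_gate)
    return g * f_img + (1.0 - g) * f_pl

def quality_perception_loss(conf_pred, iou_target):
    """Binary cross-entropy between the predicted box confidence and
    the 3D IoU of the predicted box with ground truth, so that the
    confidence branch learns to reflect actual box quality."""
    eps = 1e-7
    p = np.clip(conf_pred, eps, 1.0 - eps)
    return float(np.mean(-(iou_target * np.log(p)
                           + (1.0 - iou_target) * np.log(1.0 - p))))
```

With a zero gating projection the gate is 0.5 everywhere, so the fusion reduces to an even average of the two modalities; training the projection lets the network down-weight noisy pseudo-LiDAR channels. The loss is minimized when predicted confidence matches the box's IoU, which is the sense in which it is "quality-aware".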



Acknowledgments

This work was supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant No. KJ2019A0162), the Natural Science Foundation of Anhui Province, China (Grant Nos. 2108085MF197 and 1708085MF154), and the Open Research Fund of Anhui Key Laboratory of Detection Technology and Energy Saving Devices, Anhui Polytechnic University (Grant No. DTESD2020B02).

Author information

Correspondence to Fengsui Wang.


Copyright information

© 2021 Springer Nature Switzerland AG


Cite this paper

Chen, J., Wang, F., Liu, F., Wang, Q. (2021). Monocular 3D Target Detection Based on Cross-Modal and Mass Perceived Loss. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds.) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol. 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_45


  • DOI: https://doi.org/10.1007/978-3-030-86608-2_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86607-5

  • Online ISBN: 978-3-030-86608-2

  • eBook Packages: Computer Science (R0)
