Abstract
In recent years, monocular 3D target detection algorithms based on pseudo-LiDAR have achieved substantial accuracy gains on the KITTI dataset. However, the point cloud obtained by depth estimation contains a large amount of noise, which degrades detection accuracy. In this paper, we propose a monocular 3D target detection network that adaptively fuses image and pseudo-LiDAR information and is aware of the quality of its predicted boxes. First, we propose an adaptive feature fusion mechanism that uses attention to effectively fuse information from the two modalities, improving detection precision. Then, to make the network aware of the quality of each predicted box, we construct an independent confidence prediction network for the 3D detection boxes and introduce a quality perception loss to train it. Finally, experiments on the KITTI dataset show that the detection accuracy of the proposed algorithm is higher than that of other algorithms.
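The adaptive cross-modal fusion described above can be sketched as a learned attention gate that blends image features with pseudo-LiDAR features per channel. This is a minimal illustration under assumed shapes, not the paper's actual architecture: the gate weights `w`, the feature dimensions, and the `adaptive_fuse` helper are all made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(img_feat, pl_feat, w):
    """Blend two modalities with a learned per-channel attention gate.

    The gate is computed from the concatenated features, so each channel
    of the fused output is a convex combination of the two modalities.
    """
    gate = sigmoid(np.concatenate([img_feat, pl_feat], axis=-1) @ w)
    return gate * img_feat + (1.0 - gate) * pl_feat

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 64))        # image features for 4 proposals
pl = rng.standard_normal((4, 64))         # pseudo-LiDAR features, same shape
w = rng.standard_normal((128, 64)) * 0.1  # gate weights (illustrative)

fused = adaptive_fuse(img, pl, w)
print(fused.shape)  # → (4, 64)
```

In a trained network the gate weights would be learned end to end, letting the detector lean on image features where the estimated depth (and hence the pseudo-LiDAR point cloud) is noisy.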
Acknowledgments
This work was supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant No. KJ2019A0162), the Natural Science Foundation of Anhui Province, China (Grant Nos. 2108085MF197 and 1708085MF154), and the Open Research Fund of Anhui Key Laboratory of Detection Technology and Energy Saving Devices, Anhui Polytechnic University (Grant No. DTESD2020B02).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chen, J., Wang, F., Liu, F., Wang, Q. (2021). Monocular 3D Target Detection Based on Cross-Modal and Mass Perceived Loss. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_45
DOI: https://doi.org/10.1007/978-3-030-86608-2_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86607-5
Online ISBN: 978-3-030-86608-2
eBook Packages: Computer Science (R0)