Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation

Ji, Chaofeng; Liu, Guizhong; Zhao, Dan

doi:10.1007/s00371-022-02607-x

Original article
Published: 22 July 2022

The Visual Computer, volume 39, pages 4543–4554 (2023)

Abstract

We present a novel, high-performance framework for stereo-based 3D object detection. The framework incorporates direct instance depth estimation efficiently, which improves the accuracy of the final 3D detections. Instead of detecting objects separately in the left and right images of a stereo pair, we use a modified 2D object detector that takes only the left image as input, generates union 2D bounding boxes covering both images, and predicts the depth of the 3D box center for each object. Using the union 2D boxes, we propose a direct instance-level depth estimation network that takes the estimated center depth as guidance and predicts the depth of each pixel belonging to an object within a small search range. This improves both the efficiency and the accuracy of 3D detection. Moreover, we design an adaptive spatial feature aggregation module that weakens the effect of background points and automatically integrates important instance features to achieve accurate 3D object localization. Our method outperforms current state-of-the-art stereo-based 3D detection methods on the KITTI benchmark dataset, and it can efficiently employ a single shared model for multi-class 3D detection. Code will be available at https://github.com/xjtuwh/iDepNet/tree/master.
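The idea of searching for per-pixel depth only within a small range around a predicted center depth can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the symmetric range, the number of bins, and the use of a soft-argmin over matching costs are all assumptions made for illustration, and the costs below are toy values rather than the output of a stereo matching network.

```python
import numpy as np

def depth_candidates(center_depth, half_range=2.0, num_bins=9):
    """Evenly spaced depth hypotheses around a predicted center depth.

    Assumption: a symmetric window [center - half_range, center + half_range];
    the paper only states that the search range is small.
    """
    return np.linspace(center_depth - half_range, center_depth + half_range, num_bins)

def soft_argmin_depth(costs, candidates):
    """Expected depth under softmax(-cost), taken along the last axis.

    costs: array of shape (num_pixels, num_bins), lower means a better match.
    candidates: array of shape (num_bins,), depth hypotheses in meters.
    """
    logits = -costs
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return (probs * candidates).sum(axis=-1)

# Toy usage: one object whose predicted center depth is 20 m.
cands = depth_candidates(20.0, half_range=2.0, num_bins=9)  # 18.0, 18.5, ..., 22.0
true_depth = 20.5
# Hypothetical matching cost for a single pixel, lowest at the true depth.
costs = 50.0 * (cands - true_depth) ** 2
est = soft_argmin_depth(costs[None, :], cands)
print(est)
```

Restricting the hypotheses to a narrow window means far fewer candidates are evaluated per pixel than in a full-range search, which is one plausible reading of the efficiency gain the abstract describes.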

Funding

This work was supported by Shanxi Key Research and Development Program Grant 2018ZDCXL-GY-04–03-02.

Author information

Authors and Affiliations

Xi’an Jiaotong University, Xi’an, 710049, China
Chaofeng Ji, Guizhong Liu & Dan Zhao

Corresponding author

Correspondence to Guizhong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ji, C., Liu, G. & Zhao, D. Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation. Vis Comput 39, 4543–4554 (2023). https://doi.org/10.1007/s00371-022-02607-x

Accepted: 22 June 2022
Published: 22 July 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s00371-022-02607-x
