
Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation

  • Original article
The Visual Computer

Abstract

We present a novel, high-performance framework for stereo 3D object detection. The framework incorporates direct instance depth estimation efficiently, which improves the accuracy of the final 3D detection. Instead of detecting objects separately in the left and right images of a stereo pair, we use a modified 2D object detector that takes only the left image as input, generates union 2D bounding boxes covering both images, and predicts the depth of the 3D box center for each object. Given the union 2D boxes, we propose a direct instance-level depth estimation network that takes the estimated center depth as guidance and predicts the depth of each pixel belonging to an object within a small search range. This improves both the efficiency and the accuracy of 3D detection. Moreover, we design an adaptive spatial feature aggregation module that weakens the effect of background points and automatically integrates important instance features to achieve accurate 3D object localization. Our method outperforms current state-of-the-art stereo-based 3D detection methods on the KITTI benchmark dataset, and it can efficiently employ a shared model for 3D multi-class detection. Code will be available at https://github.com/xjtuwh/iDepNet/tree/master.
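The two ideas named in the abstract, a depth search restricted to a small range around a predicted center depth and a weighted aggregation that suppresses background points, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the half-range of 2 m, the nine depth bins, and the use of a softmax for both the depth readout and the aggregation weights are assumptions chosen only for illustration. In the actual method the scores would come from learned network branches.

```python
import math


def depth_candidates(center_depth, half_range=2.0, num_bins=9):
    """Evenly spaced depth hypotheses around the predicted center depth.

    half_range and num_bins are illustrative assumptions, not values
    taken from the paper.
    """
    step = 2.0 * half_range / (num_bins - 1)
    return [center_depth - half_range + i * step for i in range(num_bins)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_depth(candidates, scores):
    """Probability-weighted mean of the depth hypotheses for one pixel."""
    probs = softmax(scores)
    return sum(p * d for p, d in zip(probs, candidates))


def aggregate_features(features, scores):
    """Softmax-weighted sum of per-point feature vectors.

    Points with low scores (e.g. background) contribute little to the
    aggregated instance feature.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


if __name__ == "__main__":
    # Hypothetical object whose 3D box center is predicted at 20 m.
    cands = depth_candidates(20.0)      # 18.0, 18.5, ..., 22.0
    pixel_scores = [0.0] * len(cands)   # placeholder matching scores
    print(expected_depth(cands, pixel_scores))

    # Two points: one scored as foreground, one as background.
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(aggregate_features(feats, [10.0, -10.0]))
```

Restricting the hypotheses to a narrow window around the center depth keeps the number of candidates per pixel small, which is the efficiency argument made in the abstract; the weighted sum shows one simple way background points can be down-weighted.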

Funding

This work was supported by Shanxi Key Research and Development Program Grant 2018ZDCXL-GY-04–03-02.

Author information

Corresponding author

Correspondence to Guizhong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ji, C., Liu, G. & Zhao, D. Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation. Vis Comput 39, 4543–4554 (2023). https://doi.org/10.1007/s00371-022-02607-x

  • DOI: https://doi.org/10.1007/s00371-022-02607-x
