
Rotation-Robust Intersection over Union for 3D Object Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12365)

Abstract

In this paper, we propose a Rotation-robust Intersection over Union (\(\textit{RIoU}\)) for 3D object detection, which aims to learn the overlap of rotated bounding boxes. Most existing 3D object detection methods adopt a norm-based loss that regresses the parameters of a bounding box individually, which may suffer from a loss-metric mismatch due to scaling. Motivated by the IoU loss in axis-aligned 2D object detection, which is invariant to scale, our method jointly optimizes the box parameters via the \(\textit{RIoU}\) loss. To tackle the uncertain shape of the convex intersection region caused by rotation, we define a projection operation that estimates the intersection area. The computation of \(\textit{RIoU}\) and its loss function comprises only basic numerical operations, is robust to rotation, and is amenable to back-propagation. By incorporating the \(\textit{RIoU}\) loss into the conventional norm-based loss function, we enforce the network to directly optimize the \(\textit{RIoU}\). Experimental results on the KITTI, nuScenes and SUN RGB-D datasets validate the effectiveness of our proposed method. Moreover, we show that our method is suitable for detecting 2D rotated objects, such as text boxes and cluttered targets in aerial images.
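Although the paper's implementation is not reproduced here, the projection idea admits a compact, differentiable sketch. The PyTorch snippet below is a minimal illustration for rotated 2D boxes (the case that also covers the bird's-eye-view component of a 3D box): both boxes' corners are projected onto each box's local axes, the product of the two 1D overlaps upper-bounds the intersection area, and the tighter of the two bounds serves as the estimate. The (cx, cy, w, h, theta) parametrization and all function names are assumptions made for illustration, not the authors' code.

```python
import torch


def box_corners(box):
    """Corners of rotated boxes parametrized as (cx, cy, w, h, theta)."""
    cx, cy, w, h, theta = box.unbind(-1)
    cos_t, sin_t = torch.cos(theta), torch.sin(theta)
    # Half-extent vectors along the box's local x- and y-axes.
    dx = 0.5 * torch.stack([cos_t * w, sin_t * w], dim=-1)
    dy = 0.5 * torch.stack([-sin_t * h, cos_t * h], dim=-1)
    c = torch.stack([cx, cy], dim=-1)
    return torch.stack(
        [c + dx + dy, c + dx - dy, c - dx - dy, c - dx + dy], dim=-2)


def overlap_on_axis(corners_a, corners_b, axis):
    """1D overlap length of two corner sets projected onto a unit axis."""
    proj_a = (corners_a * axis.unsqueeze(-2)).sum(-1)
    proj_b = (corners_b * axis.unsqueeze(-2)).sum(-1)
    lo = torch.maximum(proj_a.min(-1).values, proj_b.min(-1).values)
    hi = torch.minimum(proj_a.max(-1).values, proj_b.max(-1).values)
    return (hi - lo).clamp(min=0.0)


def riou_2d(box_a, box_b):
    """Projection-based approximation of the IoU of two rotated boxes."""
    ca, cb = box_corners(box_a), box_corners(box_b)
    estimates = []
    for theta in (box_a[..., 4], box_b[..., 4]):
        # Project both boxes onto one box's local axes; the product of the
        # two 1D overlaps upper-bounds the true intersection area.
        u = torch.stack([torch.cos(theta), torch.sin(theta)], dim=-1)
        v = torch.stack([-torch.sin(theta), torch.cos(theta)], dim=-1)
        estimates.append(overlap_on_axis(ca, cb, u) * overlap_on_axis(ca, cb, v))
    inter = torch.minimum(estimates[0], estimates[1])  # tighter of the bounds
    area_a = box_a[..., 2] * box_a[..., 3]
    area_b = box_b[..., 2] * box_b[..., 3]
    return inter / (area_a + area_b - inter).clamp(min=1e-6)


def riou_loss(pred, target):
    """IoU-style loss that can be added to a norm-based regression loss."""
    return 1.0 - riou_2d(pred, target)
```

Because every step uses only basic numerical operations (projections, min/max, clamping), the resulting loss 1 − RIoU remains differentiable and can simply be added to a conventional norm-based regression term, in the spirit of the combination the abstract describes.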


Notes

  1. github.com/charlesq34/frustum-pointnets/.

  2. github.com/traveller59/second.pytorch.

  3. github.com/facebookresearch/votenet.

References

  1. Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027 (2019)

  2. Chabot, F., Chaouch, M., Rabarisoa, J., Teulière, C., Chateau, T.: Deep MANTA: a coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image. In: CVPR, pp. 2040–2049 (2017)

  3. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI 40(4), 834–848 (2017)

  4. Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3D object detection for autonomous driving. In: CVPR, pp. 2147–2156 (2016)

  5. Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.G., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals for accurate object class detection. In: NIPS, pp. 424–432 (2015)

  6. Chen, X., Kundu, K., Zhu, Y., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals using stereo imagery for accurate object class detection. TPAMI 40(5), 1259–1272 (2017)

  7. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: CVPR, pp. 1907–1915 (2017)

  8. Chen, Y., Liu, S., Shen, X., Jia, J.: Fast Point R-CNN. In: ICCV, pp. 9775–9784 (2019)

  9. Engelcke, M., Rao, D., Wang, D.Z., Tong, C.H., Posner, I.: Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks. In: ICRA, pp. 1355–1361 (2017)

  10. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR, pp. 3354–3361 (2012)

  11. Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448 (2015)

  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

  13. Huang, C., Zhai, S., Talbott, W., Bautista, M.A., Sun, S.Y., Guestrin, C., Susskind, J.: Addressing the loss-metric mismatch with adaptive loss alignment. In: ICML (2019)

  14. Janssens, R., Zeng, G., Zheng, G.: Fully automatic segmentation of lumbar vertebrae from CT images using cascaded 3D fully convolutional networks. In: ISBI, pp. 893–897 (2018)

  15. Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: ECCV, pp. 816–832 (2018)

  16. Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)

  17. Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)

  18. Kosiorek, A., Bewley, A., Posner, I.: Hierarchical attentive recurrent tracking. In: NIPS, pp. 3053–3061 (2017)

  19. Ku, J., Mozifian, M., Lee, J., Harakeh, A., Waslander, S.L.: Joint 3D proposal generation and object detection from view aggregation. In: IROS, pp. 1–8 (2018)

  20. Ku, J., Pon, A.D., Waslander, S.L.: Monocular 3D object detection leveraging accurate proposals and shape reconstruction. In: CVPR, pp. 11867–11876 (2019)

  21. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR, pp. 12697–12705 (2019)

  22. Li, B.: 3D fully convolutional network for vehicle detection in point cloud. In: IROS, pp. 1513–1518 (2017)

  23. Li, P., Chen, X., Shen, S.: Stereo R-CNN based 3D object detection for autonomous driving. In: CVPR, pp. 7644–7652 (2019)

  24. Li, P., Qin, T., et al.: Stereo vision-based semantic 3D object and ego-motion tracking for autonomous driving. In: ECCV, pp. 646–661 (2018)

  25. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, pp. 2117–2125 (2017)

  26. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)

  27. Liu, L., Pan, Z., Lei, B.: Learning a rotation invariant detector with rotatable bounding box. arXiv preprint arXiv:1711.09405 (2017)

  28. Liu, L., Lu, J., Xu, C., Tian, Q., Zhou, J.: Deep fitting degree scoring network for monocular 3D object detection. In: CVPR, pp. 1057–1066 (2019)

  29. Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)

  30. Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for real-time object recognition. In: IROS, pp. 922–928 (2015)

  31. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR, pp. 4293–4302 (2016)

  32. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: ICCV, pp. 9277–9286 (2019)

  33. Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum PointNets for 3D object detection from RGB-D data. In: CVPR, pp. 918–927 (2018)

  34. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR, pp. 652–660 (2017)

  35. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NIPS, pp. 5099–5108 (2017)

  36. Qin, Z., Wang, J., Lu, Y.: Triangulation learning network: from monocular to stereo 3D object detection. In: CVPR, pp. 11867–11876 (2019)

  37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)

  38. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: CVPR, pp. 658–666 (2019)

  39. Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: CVPR, pp. 770–779 (2019)

  40. Song, S., Lichtenberg, S.P., Xiao, J.: SUN RGB-D: an RGB-D scene understanding benchmark suite. In: CVPR, pp. 567–576 (2015)

  41. Tychsen-Smith, L., Petersson, L.: Improving object localization with fitness NMS and bounded IoU loss. In: CVPR, pp. 6877–6885 (2018)

  42. Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., Shen, C.: Repulsion loss: detecting pedestrians in a crowd. In: CVPR, pp. 7774–7783 (2018)

  43. Wang, Z., Jia, K.: Frustum ConvNet: sliding frustums to aggregate local point-wise features for amodal 3D object detection. arXiv preprint arXiv:1903.01864 (2019)

  44. Xia, G.S., et al.: DOTA: a large-scale dataset for object detection in aerial images. In: CVPR, pp. 3974–3983 (2018)

  45. Xu, B., Chen, Z.: Multi-level fusion based 3D object detection from monocular images. In: CVPR, pp. 2345–2353 (2018)

  46. Xu, D., Anguelov, D., Jain, A.: PointFusion: deep sensor fusion for 3D bounding box estimation. In: CVPR, pp. 244–253 (2018)

  47. Yan, Y., Mao, Y., Li, B.: SECOND: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)

  48. Yang, X., Liu, Q., Yan, J., Li, A.: R3Det: refined single-stage detector with feature refinement for rotating object. arXiv preprint arXiv:1908.05612 (2019)

  49. Yang, X., et al.: SCRDet: towards more robust detection for small, cluttered and rotated objects. In: ICCV, pp. 8232–8241 (2019)

  50. Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: ACM MM, pp. 516–520 (2016)

  51. Zhou, D., et al.: IoU loss for 2D/3D object detection. In: 3DV, pp. 85–94 (2019)

  52. Zhou, J., Lu, X., Tan, X., Shao, Z., Ding, S., Ma, L.: FVNet: 3D front-view proposal generation for real-time object detection from point clouds. arXiv preprint arXiv:1903.10750 (2019)

  53. Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: CVPR, pp. 5551–5560 (2017)

  54. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR, pp. 4490–4499 (2018)


Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.

Author information


Corresponding author

Correspondence to Jiwen Lu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1182 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, Y., Zhang, D., Xie, S., Lu, J., Zhou, J. (2020). Rotation-Robust Intersection over Union for 3D Object Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12365. Springer, Cham. https://doi.org/10.1007/978-3-030-58565-5_28


  • DOI: https://doi.org/10.1007/978-3-030-58565-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58564-8

  • Online ISBN: 978-3-030-58565-5

  • eBook Packages: Computer Science, Computer Science (R0)
