Rotation-Robust Intersection over Union for 3D Object Detection

Zheng, Yu; Zhang, Danyang; Xie, Sinan; Lu, Jiwen; Zhou, Jie

doi:10.1007/978-3-030-58565-5_28

Rotation-Robust Intersection over Union for 3D Object Detection

Yu Zheng^12,13,14,
Danyang Zhang¹²,
Sinan Xie¹²,
Jiwen Lu^12,13,14 &
…
Jie Zhou^12,13,14,15

Conference paper
First Online: 12 November 2020

3826 Accesses
13 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12365))

Abstract

In this paper, we propose a Rotation-robust Intersection over Union (\(\textit{RIoU}\)) for 3D object detection, which aims to learn the overlap of rotated bounding boxes. In most existing 3D object detection methods, the norm-based loss is adopted to individually regress the parameters of bounding boxes, which may suffer from the loss-metric mismatch due to the scaling problem. Motivated by the IoU loss in the axis-aligned 2D object detection which is invariant to the scale, our method jointly optimizes the parameters via the \(\textit{RIoU}\) loss. To tackle the uncertainty of convex caused by rotation, a projection operation is defined to estimate the intersection area. The calculation process of \(\textit{RIoU}\) and its loss function is robust to the rotation condition and feasible for back-propagation, which only comprises basic numerical operations. By incorporating the \(\textit{RIoU}\) loss with the conventional norm-based loss function, we enforce the network to directly optimize the \(\textit{RIoU}\). Experimental results on the KITTI, nuScenes and SUN RGB-D datasets validate the effectiveness of our proposed method. Moreover, we show that our method is suitable for the detection task of 2D rotated objects, such as text boxes and cluttered targets in the aerial images.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

References

Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027 (2019)
Chabot, F., Chaouch, M., Rabarisoa, J., Teuliere, C., Chateau, T.: Deep Manta: a coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image. In: CVPR, pp. 2040–2049 (2017)
Google Scholar
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI 40(4), 834–848 (2017)
Article Google Scholar
Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3D object detection for autonomous driving. In: CVPR, pp. 2147–2156 (2016)
Google Scholar
Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.G., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals for accurate object class detection. In: NIPS, pp. 424–432 (2015)
Google Scholar
Chen, X., Kundu, K., Zhu, Y., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals using stereo imagery for accurate object class detection. TPAMI 40(5), 1259–1272 (2017)
Article Google Scholar
Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: CVPR, pp. 1907–1915 (2017)
Google Scholar
Chen, Y., Liu, S., Shen, X., Jia, J.: Fast point R-CNN. In: ICCV, pp. 9775–9784 (2019)
Google Scholar
Engelcke, M., Rao, D., Wang, D.Z., Tong, C.H., Posner, I.: Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks. In: ICRA, pp. 1355–1361 (2017)
Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR, pp. 3354–3361 (2012)
Google Scholar
Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448 (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Google Scholar
Huang, C., Zhai, S., Talbott, W., Bautista, M.A., Sun, S.Y., Guestrin, C., Susskind, J.: Addressing the loss-metric mismatch with adaptive loss alignment. In: ICML (2019)
Google Scholar
Janssens, R., Zeng, G., Zheng, G.: Fully automatic segmentation of lumbar vertebrae from CT images using cascaded 3D fully convolutional networks. In: ISBI, pp. 893–897 (2018)
Google Scholar
Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 816–832. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_48
Chapter Google Scholar
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Google Scholar
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013)
Google Scholar
Kosiorek, A., Bewley, A., Posner, I.: Hierarchical attentive recurrent tracking. In: NIPS, pp. 3053–3061 (2017)
Google Scholar
Ku, J., Mozifian, M., Lee, J., Harakeh, A., Waslander, S.L.: Joint 3D proposal generation and object detection from view aggregation. In: IROS, pp. 1–8 (2018)
Google Scholar
Ku, J., Pon, A.D., Waslander, S.L.: Monocular 3D object detection leveraging accurate proposals and shape reconstruction. In: CVPR, pp. 11867–11876 (2019)
Google Scholar
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR, pp. 12697–12705 (2019)
Google Scholar
Li, B.: 3D fully convolutional network for vehicle detection in point cloud. In: IROS, pp. 1513–1518 (2017)
Google Scholar
Li, P., Chen, X., Shen, S.: Stereo R-CNN based 3D object detection for autonomous driving. In: CVPR, pp. 7644–7652 (2019)
Google Scholar
Li, P., Qin, T., et al.: Stereo vision-based semantic 3D object and ego-motion tracking for autonomous driving. In: ECCV, pp. 646–661 (2018)
Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, pp. 2117–2125 (2017)
Google Scholar
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Google Scholar
Liu, L., Pan, Z., Lei, B.: Learning a rotation invariant detector with rotatable bounding box. arXiv preprint arXiv:1711.09405 (2017)
Liu, L., Lu, J., Xu, C., Tian, Q., Zhou, J.: Deep fitting degree scoring network for monocular 3D object detection. In: CVPR, pp. 1057–1066 (2019)
Google Scholar
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., Yan, J.: FOTS: fast oriented text spotting with a unified network. In: CVPR, pp. 5676–5685 (2018)
Google Scholar
Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for real-time object recognition. In: IROS, pp. 922–928 (2015)
Google Scholar
Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR, pp. 4293–4302 (2016)
Google Scholar
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9277–9286 (2019)
Google Scholar
Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum PointNets for 3D object detection from RGB-D data. In: CVPR, pp. 918–927 (2018)
Google Scholar
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR, pp. 652–660 (2017)
Google Scholar
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NIPS, pp. 5099–5108 (2017)
Google Scholar
Qin, Z., Wang, J., Lu, Y.: Triangulation learning network: from monocular to stereo 3D object detection. In: CVPR, pp. 11867–11876 (2019)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
Google Scholar
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: CVPR, pp. 658–666 (2019)
Google Scholar
Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: CVPR, pp. 770–779 (2019)
Google Scholar
Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576 (2015)
Google Scholar
Tychsen-Smith, L., Petersson, L.: Improving object localization with fitness NMS and bounded IoU loss. In: CVPR, pp. 6877–6885 (2018)
Google Scholar
Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., Shen, C.: Repulsion loss: detecting pedestrians in a crowd. In: CVPR, pp. 7774–7783 (2018)
Google Scholar
Wang, Z., Jia, K.: Frustum convNet: sliding frustums to aggregate local point-wise features for a modal 3D object detection. arXiv preprint arXiv:1903.01864 (2019)
Xia, G.S., et al.: DOTA: a large-scale dataset for object detection in aerial images. In: CVPR, pp. 3974–3983 (2018)
Google Scholar
Xu, B., Chen, Z.: Multi-level fusion based 3D object detection from monocular images. In: CVPR, pp. 2345–2353 (2018)
Google Scholar
Xu, D., Anguelov, D., Jain, A.: PointFusion: deep sensor fusion for 3D bounding box estimation. In: CVPR, pp. 244–253 (2018)
Google Scholar
Yan, Y., Mao, Y., Li, B.: Second: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)
Article Google Scholar
Yang, X., Liu, Q., Yan, J., Li, A.: R3Det: refined single-stage detector with feature refinement for rotating object. arXiv preprint arXiv:1908.05612 (2019)
Yang, X., et al.: SCRDet: towards more robust detection for small, cluttered and rotated objects. In: ICCV, pp. 8232–8241 (2019)
Google Scholar
Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: ACM MM, pp. 516–520 (2016)
Google Scholar
Zhou, D., et al.: IoU loss for 2D/3D object detection. In: 3DV, pp. 85–94 (2019)
Google Scholar
Zhou, J., Lu, X., Tan, X., Shao, Z., Ding, S., Ma, L.: FVNet: 3D front-view proposal generation for real-time object detection from point clouds. arXiv preprint arXiv:1903.10750 (2019)
Zhou, X., et al.: East: an efficient and accurate scene text detector. In: CVPR, pp. 5551–5560 (2017)
Google Scholar
Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR, pp. 4490–4499 (2018)
Google Scholar

Download references

Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.

Author information

Authors and Affiliations

Department of Automation, Tsinghua University, Beijing, China
Yu Zheng, Danyang Zhang, Sinan Xie, Jiwen Lu & Jie Zhou
State Key Lab of Intelligent Technologies and Systems, Beijing, China
Yu Zheng, Jiwen Lu & Jie Zhou
Beijing National Research Center for Information Science and Technology, Beijing, China
Yu Zheng, Jiwen Lu & Jie Zhou
Tsinghua Shenzhen International Graduate School, Tsinghua University, Beijing, China
Jie Zhou

Authors

Yu Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Danyang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Sinan Xie
View author publications
You can also search for this author in PubMed Google Scholar
Jiwen Lu
View author publications
You can also search for this author in PubMed Google Scholar
Jie Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jiwen Lu .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1182 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zheng, Y., Zhang, D., Xie, S., Lu, J., Zhou, J. (2020). Rotation-Robust Intersection over Union for 3D Object Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12365. Springer, Cham. https://doi.org/10.1007/978-3-030-58565-5_28

Download citation

DOI: https://doi.org/10.1007/978-3-030-58565-5_28
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58564-8
Online ISBN: 978-3-030-58565-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics