
Semi-automated Generation of Accurate Ground-Truth for 3D Object Detection

  • Conference paper
Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022)

Abstract

Visual algorithms for traffic surveillance systems typically locate and observe traffic by representing all road users with 2D bounding boxes. These 2D boxes around vehicles are insufficient for deriving accurate real-world locations. However, 3D-annotated datasets for training and evaluating detectors in traffic surveillance are not available, so a new dataset for training a 3D detector is required. We propose and validate seven annotation configurations for the automated generation of 3D box annotations, using only the camera calibration, scene information (static vanishing points) and existing 2D annotations. The proposed novel Simple Box method does not require segmentation of vehicles and offers a simpler 3D-box construction that assumes a fixed, predefined vehicle width and height. The existing KM3D CNN-based 3D detection model, which directly estimates 3D boxes around vehicles in the camera image, is adopted for traffic surveillance by training it on the newly generated dataset. The KM3D detector trained with the Simple Box configuration provides the best 3D object detection results, reaching 51.9% AP3D on this data. The 3D object detector estimates accurate 3D boxes up to a distance of 125 m from the camera, with a mean middle-point error of only 0.5–1.0 m.
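To make the fixed-dimension idea concrete, the following is a minimal Python sketch of how a coarse 3D box could be constructed from a 2D detection, a camera calibration and a known driving direction. It is not the authors' implementation: the function names, the heading parameter and the default width/height priors (1.8 m and 1.5 m) are illustrative assumptions, and in the paper the orientation would instead follow from the scene's static vanishing points.

```python
import numpy as np

def backproject_to_ground(u, v, K, R, t):
    """Intersect the camera ray through pixel (u, v) with the ground
    plane z = 0 in the world frame.  Convention: x_cam = R @ X_world + t."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction, camera frame
    ray_world = R.T @ ray_cam                            # rotate into world frame
    cam_center = -R.T @ t                                 # camera centre, world frame
    s = -cam_center[2] / ray_world[2]                     # scale so that z = 0
    return cam_center + s * ray_world                     # ground-plane point

def simple_box(box2d, heading, K, R, t, width=1.8, height=1.5):
    """Hypothetical 'Simple Box'-style construction: back-project the
    bottom corners of the 2D box onto the ground plane, take the box
    length from their extent along the heading, and use fixed priors
    for vehicle width and height."""
    x1, y1, x2, y2 = box2d
    p_left = backproject_to_ground(x1, y2, K, R, t)
    p_right = backproject_to_ground(x2, y2, K, R, t)
    d = np.array([np.cos(heading), np.sin(heading), 0.0])   # driving direction
    n = np.array([-d[1], d[0], 0.0])                         # lateral direction
    length = abs(np.dot(p_right - p_left, d))                # extent along heading
    center = 0.5 * (p_left + p_right)                        # ground-plane centre
    corners = [center + sl * length * d + sw * width * n + np.array([0.0, 0.0, z])
               for sl in (-0.5, 0.5) for sw in (-0.5, 0.5) for z in (0.0, height)]
    return np.array(corners)                                 # (8, 3) world-frame corners
```

Such a sketch trades accuracy for simplicity: no vehicle segmentation is needed, only the calibration, the 2D box and dimension priors, which mirrors the motivation behind the Simple Box configuration described above.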

M. H. Zwemer and D. Scholte—These authors contributed equally to this work.



Author information


Corresponding author

Correspondence to M. H. Zwemer.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zwemer, M.H., Scholte, D., de With, P.H.N. (2023). Semi-automated Generation of Accurate Ground-Truth for 3D Object Detection. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2022. Communications in Computer and Information Science, vol 1815. Springer, Cham. https://doi.org/10.1007/978-3-031-45725-8_2


  • DOI: https://doi.org/10.1007/978-3-031-45725-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45724-1

  • Online ISBN: 978-3-031-45725-8

  • eBook Packages: Computer Science, Computer Science (R0)
