Can Existing 3D Monocular Object Detection Methods Work in Roadside Contexts? A Reproducibility Study

Barra, Silvio; Marras, Mirko; Mohamed, Sondos; Podda, Alessandro Sebastian; Saia, Roberto

doi:10.1007/978-3-031-47546-7_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14318)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Detecting 3D objects in images from urban monocular cameras is essential to enable intelligent monitoring applications for local municipalities' decision-support systems. However, existing detection methods in this domain mainly focus on autonomous driving and are limited to frontal views from vehicle-mounted sensors. In contrast, to monitor urban areas, local municipalities rely on streams collected from fixed cameras, especially at intersections and in particularly dangerous areas. Such streams are a rich source of data for applications focused on traffic patterns, road conditions, and potential hazards. In this paper, given the lack of large-scale datasets of images from roadside cameras and the time-consuming process of generating real labelled data, we first propose a synthetic dataset built with the CARLA simulator, which makes dataset creation efficient while remaining of acceptable quality. The dataset consists of 7,481 development images and 7,518 test images. Then, we reproduce state-of-the-art models for monocular 3D object detection proven to work well in autonomous driving (e.g., M3D-RPN, Monodle, SMOKE, and Kinematic) and test them on the newly generated dataset. Our results show that our dataset can serve as a reference for future experiments and that state-of-the-art models from the autonomous driving domain do not always generalize well to monocular roadside camera images. Source code and data are available at https://bit.ly/monocular-3d-odt.

Acknowledgements

We acknowledge financial support under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.5 - Call for tender No.3277 published on December 30, 2021 by the Italian Ministry of University and Research (MUR) funded by the European Union - NextGenerationEU. Project Code ECS0000038 - Project Title e.INS Ecosystem of Innovation for Next Generation Sardinia - CUP F53C22000430001- Grant Assignment Decree No. 1056 adopted on June 23, 2022 by the MUR.

Author information

Authors and Affiliations

University of Cagliari, Cagliari, Italy
Mirko Marras, Sondos Mohamed, Alessandro Sebastian Podda & Roberto Saia
University of Naples “Federico II”, Naples, Italy
Silvio Barra

Corresponding author

Correspondence to Mirko Marras.

Editor information

Editors and Affiliations

University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Sapienza University of Rome, Rome, Italy
Domenico Lembo
Roma Tre University, Rome, Italy
Carla Limongelli
National Research Council, Rome, Italy
Andrea Orlandini

About this paper

Cite this paper

Barra, S., Marras, M., Mohamed, S., Podda, A.S., Saia, R. (2023). Can Existing 3D Monocular Object Detection Methods Work in Roadside Contexts? A Reproducibility Study. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_22

DOI: https://doi.org/10.1007/978-3-031-47546-7_22
Published: 02 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science, Computer Science (R0)
