
Car depth estimation within a monocular image using a light CNN

Published in The Journal of Supercomputing

Abstract

Mobile intelligent systems that must perceive their environment and move through it need to measure its depth, so depth estimation is a pervasive problem in intelligent devices, especially self-driving cars. Self-driving cars estimate the depth of the surrounding environment and its objects using a variety of sensors. However, given the safety-critical nature of this task, several redundant depth estimation systems should be in place to minimize the possibility of error, which makes it useful to design a system that estimates depth with higher accuracy and lower computational cost. This paper presents a method for estimating the depth of cars within a monocular image. First, a light CNN (MTCNN) detects the license plates of vehicles. Then an MLP neural network, trained to capture the nonlinear perspective relation between a license plate's dimensions and its depth, estimates the depth of each car in the image from the coordinates and dimensions of its license plate bounding box.
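The intuition behind the learned mapping is the pinhole-camera relation: a license plate of fixed physical height H that appears h pixels tall under focal length f lies at depth roughly Z ≈ f·H/h, and off-axis position, viewing angle, and lens distortion make the true relation nonlinear. The following is a minimal illustrative sketch of this second stage, not the authors' released implementation: it assumes plate bounding boxes (x, y, w, h) are already available from a detector, uses scikit-learn's MLPRegressor as a stand-in for the paper's MLP, and fits hypothetical training data.

```python
# Illustrative sketch (not the authors' implementation): learn the
# nonlinear mapping from license-plate bounding boxes to car depth.
# Assumes training pairs of plate boxes (x, y, w, h in pixels) and
# ground-truth depths in metres; real training would use thousands
# of labeled examples rather than the toy data below.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: plate boxes and measured depths.
boxes = np.array([
    [512, 420, 96, 30],   # near car: large plate box
    [530, 380, 48, 15],   # mid-range car
    [540, 360, 24, 8],    # far car: small plate box
])
depths = np.array([5.0, 10.0, 20.0])  # metres

# Scale inputs, then fit a small MLP to approximate the perspective
# relation (ideal pinhole model: Z ~ f * H_plate / h_pixels).
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0),
)
model.fit(boxes, depths)

# At inference, the plate detector supplies the box for each car
# and the MLP predicts its depth.
print(model.predict([[525, 390, 60, 19]]))  # depth estimate in metres
```

In the paper's pipeline the boxes come from the light MTCNN plate detector; any detector that outputs pixel-space plate boxes could feed the same regressor.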



Availability of data and materials

All datasets used in this work are publicly available. The Python code supporting this study's findings is available at https://github.com/AM-Tighkhorshid/monocular-car-depth-estimation.



Author information


Contributions

AT contributed to conceptualization, software, and writing—original draft. SMAT contributed to conceptualization, validation, writing—review and editing. AN contributed to conceptualization, validation, writing—review and editing, and supervision.

Corresponding author

Correspondence to Amirhossein Nikoofard.

Ethics declarations

Ethical approval

Ethical approval is not applicable to this work.

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tighkhorshid, A., Tousi, S.M.A. & Nikoofard, A. Car depth estimation within a monocular image using a light CNN. J Supercomput 79, 17944–17961 (2023). https://doi.org/10.1007/s11227-023-05359-0

