Structured deep learning based object-specific distance estimation from a monocular image

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Distance estimation is a critical step in applications such as object trajectory prediction and obstacle avoidance for autonomous driving. However, distance estimation with deep learning methods has yet to attract wide attention. Traditional distance estimation algorithms based on optical principles and mathematical modeling have low accuracy in practical applications, particularly on curved or sloped road surfaces. This paper addresses the challenging distance estimation problem by developing an end-to-end structured model that directly predicts the distance of objects in a given image, replacing the traditional mathematical modeling process with a learning-based method. To facilitate research on this task, we construct extended distance datasets from the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and NYU Depth V2 (Nathan Silberman, Pushmeet Kohli, Derek Hoiem, Rob Fergus) datasets. Experimental results demonstrate that the structured learning model achieves higher accuracy than the traditional algorithm across different distance ranges and performs better on curves and ramps. Improving the performance of the underlying neural network is the direction for future work on the model.
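As a rough illustration of why optics-based estimates can degrade on slopes, the sketch below uses a textbook flat-ground pinhole model. It is not taken from the paper, whose baseline algorithm is not specified in the abstract. The focal length, camera height, and horizon row are assumed values chosen for illustration, the function names are hypothetical, and the camera is assumed to look straight ahead with no lens distortion.

```python
# Assumed camera parameters (illustrative only, not from the paper).
FOCAL_PX = 720.0      # focal length in pixels
CAM_HEIGHT_M = 1.5    # camera height above the road at the vehicle
CY = 180.0            # image row of the horizon (principal point row)


def project_row(distance_m, elevation_m):
    """Image row of a road contact point at the given forward distance.

    elevation_m is how far the road has risen at that point relative to
    the road under the camera (0.0 means perfectly flat ground).
    Rows grow downward, so nearer points appear lower in the image.
    """
    return CY + FOCAL_PX * (CAM_HEIGHT_M - elevation_m) / distance_m


def flat_ground_distance(row):
    """Invert the projection while assuming the road is flat."""
    if row <= CY:
        raise ValueError("contact point must lie below the horizon row")
    return FOCAL_PX * CAM_HEIGHT_M / (row - CY)


def estimate(true_distance_m, elevation_m):
    """Distance the flat-ground model reports for a given true scene."""
    return flat_ground_distance(project_row(true_distance_m, elevation_m))


if __name__ == "__main__":
    print(estimate(30.0, 0.0))  # flat road: the model recovers the true distance
    print(estimate(30.0, 0.5))  # road has risen 0.5 m: the model overestimates
```

Under these assumptions the flat-ground model scales the true distance d by h / (h - e), where h is the camera height and e the road elevation, so a contact point 30 m ahead on a road that has risen 0.5 m is reported at 45 m. A learned model that predicts distance directly from image content is not tied to this geometric assumption.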

Availability of data and materials

Not applicable.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Tao Lin.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Code availability

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, Y., Lin, T., Chen, B. et al. Structured deep learning based object-specific distance estimation from a monocular image. Int. J. Mach. Learn. & Cyber. 14, 4151–4161 (2023). https://doi.org/10.1007/s13042-023-01887-6

  • DOI: https://doi.org/10.1007/s13042-023-01887-6
