
Lightweight monocular depth estimation network for robotics using intercept block GhostNet

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

This study introduces a novel deep learning approach for monocular depth estimation that achieves excellent performance while using significantly fewer computational and memory resources. Our proposed method, IBG-Mono, combines an Intercept Block with cost-effective GhostNet components to efficiently extract relevant information from the input image. The Intercept Block retains low-resolution feature maps derived from the input image and integrates them with downsampled feature maps at different resolutions, while the GhostNet components let the Intercept Block process these coarse-grained feature maps efficiently and merge them seamlessly with the downsampled features. In addition, progressive downsampling maintains spatial alignment between feature maps of varying resolutions and their downsampled counterparts. We conduct extensive experiments on the NYU Depth V2 and KITTI datasets and compare our model, which has only 0.63M parameters and 0.31 GMACs, against state-of-the-art lightweight monocular depth estimation methods; the results demonstrate that our approach outperforms existing lightweight models.
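For readers who want a concrete picture of the two building blocks the abstract describes, the PyTorch sketch below is an illustrative reconstruction, not the authors' released implementation. The GhostModule follows the published GhostNet idea of generating a portion of the output channels with cheap depthwise convolutions (Han et al., CVPR 2020), while InterceptStage is a hypothetical encoder stage that fuses a progressively downsampled copy of the input image with the downsampled feature maps, in the spirit of the Intercept Block described above. All layer names, channel widths, and the fusion scheme are assumptions made for illustration.

```python
# Illustrative sketch only -- layer choices and channel widths are assumptions,
# not the IBG-Mono architecture as published.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GhostModule(nn.Module):
    """GhostNet-style block: half of the output channels come from a regular
    1x1 convolution ("primary" features), the other half from cheap 3x3
    depthwise convolutions applied to those primary features."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        primary_ch = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, out_ch - primary_ch, kernel_size=3,
                      padding=1, groups=primary_ch, bias=False),
            nn.BatchNorm2d(out_ch - primary_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)


class InterceptStage(nn.Module):
    """Hypothetical encoder stage: downsample the running features by 2 and
    fuse them (via a GhostModule) with a copy of the input image that has
    been progressively downsampled to the same spatial resolution."""

    def __init__(self, in_ch: int, out_ch: int, img_ch: int = 3):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.fuse = GhostModule(out_ch + img_ch, out_ch)

    def forward(self, feats, image):
        feats = self.down(feats)                    # halve spatial resolution
        image = F.avg_pool2d(image, kernel_size=2)  # keep the image aligned
        feats = self.fuse(torch.cat([feats, image], dim=1))
        return feats, image


if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)   # dummy RGB input
    stem = GhostModule(3, 16)
    stage = InterceptStage(16, 32)
    feats, img = stage(stem(x), x)
    print(feats.shape, img.shape)     # (1, 32, 112, 112) and (1, 3, 112, 112)
```

In this sketch the running features and the raw image are downsampled in lockstep, which is one simple way to realize the spatial alignment that the abstract attributes to progressive downsampling.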


Data availability

No datasets were generated or analysed during the current study.


Author information

Contributions

Igi Ardiyanto developed and implemented the algorithms for this work. Resha Dwika Hefni Al-Fahsi implemented the algorithms on the mobile phone. Both authors drafted the manuscript.

Corresponding author

Correspondence to Igi Ardiyanto.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ardiyanto, I., Al-Fahsi, R.D.H. Lightweight monocular depth estimation network for robotics using intercept block GhostNet. SIViP 19, 34 (2025). https://doi.org/10.1007/s11760-024-03720-1

