Abstract
This study introduces a deep learning approach to monocular depth estimation that delivers strong accuracy while using significantly less computation and memory. The proposed method, IBG-Mono, pairs an Intercept Block with cost-effective GhostNet components to extract relevant information from the input image efficiently. The Intercept Block retains low-resolution feature maps derived from the input image and integrates them with the downsampled feature maps at each resolution, while the GhostNet components allow these coarse-grained feature maps to be processed cheaply and fused seamlessly with the downsampled ones. In addition, progressive downsampling keeps the intercepted feature maps spatially aligned with the downsampled feature maps across resolutions. Extensive experiments on the NYU Depth V2 and KITTI datasets compare IBG-Mono, at only 0.63M parameters and 0.31 GMACs, against state-of-the-art lightweight monocular depth estimation methods, and the results demonstrate its superiority over existing lightweight approaches.
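To make the two building blocks concrete, the sketch below implements a Ghost module following the published GhostNet design (a cheap depthwise convolution generates "ghost" feature maps from a smaller set of intrinsic maps) together with a hypothetical `InterceptBlock` illustrating the fusion the abstract describes: the input image is downsampled to each stage's resolution and merged with that stage's feature maps. The Ghost module matches Han et al.'s formulation; `InterceptBlock`, its `lift`/`fuse` submodules, and all channel choices are illustrative assumptions inferred from the abstract, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GhostModule(nn.Module):
    """Ghost module (Han et al., CVPR 2020): a cheap depthwise convolution
    generates extra "ghost" feature maps from a smaller set of intrinsic
    maps produced by an ordinary pointwise convolution."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio          # intrinsic maps (costly conv)
        ghost_ch = out_ch - init_ch        # ghost maps (cheap depthwise conv)
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        intrinsic = self.primary(x)
        return torch.cat([intrinsic, self.cheap(intrinsic)], dim=1)

class InterceptBlock(nn.Module):
    """Hypothetical fusion stage: a low-resolution copy of the input image
    is intercepted at an encoder stage, lifted to the feature width by a
    cheap GhostModule, and fused with the spatially aligned features."""
    def __init__(self, feat_ch, img_ch=3):
        super().__init__()
        self.lift = GhostModule(img_ch, feat_ch)
        self.fuse = GhostModule(2 * feat_ch, feat_ch)

    def forward(self, feat, image):
        # Downsample the input image to the stage resolution so both
        # streams stay spatially aligned before concatenation.
        img_lr = F.interpolate(image, size=feat.shape[-2:],
                               mode='bilinear', align_corners=False)
        return self.fuse(torch.cat([feat, self.lift(img_lr)], dim=1))
```

For example, with an NYU Depth V2-sized input, `InterceptBlock(32)(torch.randn(1, 32, 60, 80), torch.randn(1, 3, 480, 640))` returns a fused 32-channel map at 1/8 resolution.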
Data availability
No datasets were generated or analysed during the current study.
Author information
Contributions
Igi Ardiyanto developed and implemented the algorithms for the work. Resha Dwika Hefni Al-Fahsi implemented the algorithms on the mobile phone. Both authors drafted the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ardiyanto, I., Al-Fahsi, R.D.H. Lightweight monocular depth estimation network for robotics using intercept block GhostNet. SIViP 19, 34 (2025). https://doi.org/10.1007/s11760-024-03720-1