Abstract
Monocular depth estimation (MDE) is critical to enabling intelligent autonomous systems and has received considerable attention in recent years. Achieving both low latency and high accuracy in MDE is desirable but difficult, especially on edge devices. In this paper, we present a novel approach to balancing speed and accuracy in MDE on edge devices. We introduce FasterMDE, an efficient and fast encoder-decoder network architecture that leverages a multiobjective neural architecture search method to find the optimal encoder structure for the target edge device. Moreover, we incorporate a neural window fully connected CRF module into the network as the decoder, enhancing fine-grained depth prediction based on coarse depth and image features. To address the issue of bad local minima in the multiobjective neural architecture search, we propose a new approach for automatically learning the weights of the subobjective loss functions based on uncertainty. We also accelerate the FasterMDE model using TensorRT and implement it on a target edge device. The experimental results demonstrate that FasterMDE achieves a better balance of speed and accuracy on the KITTI and NYUv2 datasets than previous methods. We validate the effectiveness of the proposed method through an ablation study and verify the real-time monocular depth estimation performance of FasterMDE in realistic scenarios. On the KITTI dataset, FasterMDE achieves a high frame rate of 555.55 FPS with 9.1% Abs Rel on a single NVIDIA Titan RTX GPU and 14.46 FPS on the NVIDIA Jetson Xavier NX.
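The abstract mentions automatically learning subobjective loss weights from uncertainty, but does not spell out the formulation. As a hedged illustration only, the sketch below assumes the standard homoscedastic-uncertainty weighting (a learnable log-variance `s_i` per objective, weight `exp(-s_i)`, plus `s_i` as a regularizer so no weight collapses to zero); the paper's actual formulation may differ, and the function name `combined_loss` is our own.

```python
import math

def combined_loss(sub_losses, log_vars):
    """Combine subobjective losses with uncertainty-based weights.

    sub_losses: list of scalar loss values, one per subobjective
                (e.g. an accuracy term and a latency term in a
                multiobjective architecture search).
    log_vars:   list of learnable log-variances s_i, same length.

    Each loss is weighted by exp(-s_i); adding s_i back penalizes
    inflating the variance, so the weights stay balanced instead of
    all objectives being ignored.
    """
    assert len(sub_losses) == len(log_vars)
    total = 0.0
    for loss, s in zip(sub_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total
```

In a real search, the `log_vars` would be trainable parameters updated by gradient descent alongside the architecture parameters, which is what lets the weighting adapt automatically rather than being hand-tuned.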
Data Availability
The datasets generated during the current study are available in the GitHub repository: https://github.com/douziwenhit/FasterMDE.git
Author information
Authors and Affiliations
Contributions
DouZiWen: conceptualization, methodology, validation, investigation, writing. YeDong: supervision. LiYuQi: data curation.
Corresponding author
Ethics declarations
Conflict of Interest
No potential conflict of interest was reported by the authors.
Competing Interests
The authors did not receive support from any organization for the submitted work.
Ethical and informed consent for data used
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
ZiWen, D., YuQi, L. & Dong, Y. FasterMDE: A real-time monocular depth estimation search method that balances accuracy and speed on the edge. Appl Intell 53, 24566–24586 (2023). https://doi.org/10.1007/s10489-023-04872-2