Designing and Searching for Lightweight Monocular Depth Network

Liu, Jinfeng; Kong, Lingtong; Yang, Jie

doi:10.1007/978-3-030-92273-3_39

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13111)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Depth sensing is important for some tasks in robotics and autonomous driving, and monocular depth estimation based on deep learning has become a research focus in computer vision. However, most current work pursues increasingly complex models to gain accuracy, and such models cannot achieve real-time inference on mobile or embedded systems. We therefore aim to design a lightweight model in this paper. First, we improve the state-of-the-art model FastDepth to produce FastDepthV2, which has higher accuracy and lower latency on the NYU Depth v2 dataset. Second, because designing networks by hand takes time and effort, we automate the design of lightweight monocular depth estimation models with neural architecture search (NAS). Inspired by the architecture of MobileNetV2, we use a factorized hierarchical search space for the encoder. We incorporate both the accuracy and the multiply-add operations of a model into the search objective, and we use a gradient-based reinforcement learning algorithm in the search iterations. The controller in the reinforcement learning framework converges after more than 1000 search iterations, yielding the three best-performing network architectures. Under the same training and testing conditions, two of them outperform FastDepthV2.
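The abstract does not give the exact form of the search objective. As a rough illustration only, the sketch below counts multiply-adds for a standard and a depthwise separable convolution, then combines accuracy and multiply-adds in a soft-constrained reward of the kind used by MnasNet. The function names, the target budget, and the exponent `w = -0.07` are assumptions for illustration and are not taken from the paper.

```python
def conv_madds(h_out, w_out, c_in, c_out, k, groups=1):
    """Multiply-adds of a k x k convolution with an h_out x w_out output map."""
    return h_out * w_out * (c_in // groups) * c_out * k * k


def separable_madds(h_out, w_out, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv_madds(h_out, w_out, c_in, c_in, k, groups=c_in)
    pointwise = conv_madds(h_out, w_out, c_in, c_out, 1)
    return depthwise + pointwise


def search_reward(accuracy, madds, target_madds, w=-0.07):
    """Assumed MnasNet-style reward: accuracy scaled by a multiply-add penalty.

    With w < 0, a model above the target budget is penalised and a model
    below it is mildly rewarded; at madds == target_madds the reward
    equals the accuracy.
    """
    return accuracy * (madds / target_madds) ** w


if __name__ == "__main__":
    standard = conv_madds(56, 56, 64, 128, 3)
    separable = separable_madds(56, 56, 64, 128, 3)
    print(f"standard conv:  {standard} multiply-adds")
    print(f"separable conv: {separable} multiply-adds")
    # Hypothetical accuracy and budget values, chosen only to show the trade-off.
    print(search_reward(0.77, madds=separable, target_madds=standard))
```

A controller trained with reinforcement learning would receive such a scalar reward for each sampled architecture, so a single number has to express the trade-off between accuracy and cost.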

This research is partly supported by NSFC, China (No: 61876107, U1803261).



Author information

Authors and Affiliations

Institute of Image Processing and Pattern Recognition, Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Jinfeng Liu, Lingtong Kong & Jie Yang

Corresponding author

Correspondence to Jie Yang.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Liu, J., Kong, L., Yang, J. (2021). Designing and Searching for Lightweight Monocular Depth Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_39

DOI: https://doi.org/10.1007/978-3-030-92273-3_39
Published: 05 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science, Computer Science (R0)
