Designing and Searching for Lightweight Monocular Depth Network

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13111)

Abstract

Depth sensing is important for many robotics and autonomous-driving tasks, and monocular depth estimation based on deep learning has become a research focus in computer vision. However, most current work pursues more complex models to obtain higher accuracy, and such models cannot achieve real-time inference on mobile or embedded systems. This paper therefore aims to design a lightweight model. First, we improve the state-of-the-art model FastDepth to produce FastDepthV2, which has higher accuracy and lower latency on the NYU Depth v2 dataset. Second, because designing networks by hand takes time and effort, we use neural architecture search (NAS) to automate the design of lightweight models for monocular depth estimation. Inspired by the architecture of MobileNetV2, we use a factorized hierarchical search space for the encoder. We incorporate both the accuracy and the multiply-add operations of a model into the search objective, and use a gradient-based reinforcement learning algorithm in the search iterations. The controller in the reinforcement learning framework converges after more than 1000 search iterations, and the three best-performing network architectures are obtained. Under the same training and testing conditions, two of them perform better than FastDepthV2.
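The abstract does not give the exact form of the search objective. As a rough illustration only, the sketch below shows one common way to fold accuracy and multiply-adds into a single scalar reward, borrowing the soft-constraint form and the exponent -0.07 from MnasNet. The function names, the target budget, and the choice of accuracy metric are all assumptions made for this example, not the authors' formulation.

```python
def conv_madds(h_out, w_out, c_in, c_out, k, groups=1):
    """Multiply-adds of a k x k convolution that produces an h_out x w_out map.

    groups=1 is a standard convolution; groups=c_in with c_out=c_in is a
    depthwise convolution, as used in MobileNet-style blocks.
    """
    return h_out * w_out * c_out * (c_in // groups) * k * k


def separable_madds(h_out, w_out, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv_madds(h_out, w_out, c_in, c_in, k, groups=c_in)
    pointwise = conv_madds(h_out, w_out, c_in, c_out, 1)
    return depthwise + pointwise


def search_reward(accuracy, madds, target_madds, w=-0.07):
    """Hypothetical MnasNet-style reward: accuracy * (madds / target) ** w.

    With w < 0, a model above the multiply-add budget is penalised and a
    model below it is mildly rewarded. The value of w is taken from MnasNet
    and is an assumption here.
    """
    return accuracy * (madds / target_madds) ** w


if __name__ == "__main__":
    standard = conv_madds(14, 14, 32, 64, 3)
    separable = separable_madds(14, 14, 32, 64, 3)
    print("standard conv MAdds: ", standard)
    print("separable conv MAdds:", separable)
    # Same accuracy, different cost: the cheaper layer earns the higher reward.
    print(search_reward(0.77, standard, target_madds=1_000_000))
    print(search_reward(0.77, separable, target_madds=1_000_000))
```

In a reinforcement-learning search, a scalar of this kind would be the reward returned to the controller for each sampled architecture, so the controller is pushed toward models that are both accurate and cheap.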

This research is partly supported by NSFC, China (No: 61876107, U1803261).


Corresponding author

Correspondence to Jie Yang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, J., Kong, L., Yang, J. (2021). Designing and Searching for Lightweight Monocular Depth Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_39

  • DOI: https://doi.org/10.1007/978-3-030-92273-3_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92272-6

  • Online ISBN: 978-3-030-92273-3

  • eBook Packages: Computer Science, Computer Science (R0)
