Abstract
Depth sensing is essential for many robotics and autonomous driving tasks, and monocular depth estimation based on deep learning has become a research focus in computer vision. However, most current work pursues higher accuracy through increasingly complex models, which cannot run in real time on mobile or embedded systems. This paper therefore aims to design a lightweight model. First, we improve the state-of-the-art model FastDepth to produce FastDepthV2, which achieves higher accuracy and lower latency on the NYU Depth v2 dataset. Second, because designing networks by hand takes time and effort, we use neural architecture search (NAS) to automate the design of lightweight models for monocular depth estimation. Inspired by the architecture of MobileNetV2, we adopt a factorized hierarchical search space for the encoder. We incorporate both the accuracy and the multiply-add operations of a model into the search objective, and use a gradient-based reinforcement learning algorithm in the search iterations. The controller in the reinforcement learning framework converges after more than 1000 search iterations, yielding the three best-performing network architectures. Under the same training and testing conditions, two of them outperform FastDepthV2.
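The abstract does not state how accuracy and multiply-add operations are combined into one search objective. As an illustration only, the sketch below assumes a MnasNet-style weighted product, in which the accuracy is scaled by the ratio of the model's multiply-adds to a target budget raised to a negative exponent. The function name `search_reward`, the parameter names, and the default exponent are hypothetical and are not taken from the paper.

```python
def search_reward(accuracy, mult_adds, target_mult_adds, w=-0.07):
    """Scalar reward for one sampled architecture (illustrative assumption).

    accuracy         -- validation accuracy of the sampled model, in [0, 1]
    mult_adds        -- multiply-add operations of the sampled model
    target_mult_adds -- multiply-add budget the search aims for
    w                -- negative exponent; larger magnitude penalizes cost more
    """
    if mult_adds <= 0 or target_mult_adds <= 0:
        raise ValueError("multiply-add counts must be positive")
    # A model exactly on budget keeps its accuracy as the reward;
    # heavier models are scaled down and lighter models are scaled up.
    return accuracy * (mult_adds / target_mult_adds) ** w


if __name__ == "__main__":
    # Hypothetical numbers: same accuracy, three different compute costs.
    for cost in (150e6, 300e6, 600e6):
        print(cost, search_reward(0.77, cost, target_mult_adds=300e6))
```

In a reinforcement learning search, a scalar of this kind would serve as the reward used to update the controller after each sampled architecture is trained and evaluated. A weighted product avoids having to normalize two quantities with very different scales, which is one reason it is a common choice for this kind of objective.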
This research is partly supported by NSFC, China (No: 61876107, U1803261).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, J., Kong, L., Yang, J. (2021). Designing and Searching for Lightweight Monocular Depth Network. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_39
DOI: https://doi.org/10.1007/978-3-030-92273-3_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science, Computer Science (R0)