Hierarchical Multi-scale Architecture Search for Self-supervised Monocular Depth Estimation

Ren, Jian; Xie, Jin; Jin, Zhong

doi:10.1007/978-3-031-02444-3_34

Conference paper
First Online: 10 May 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13189)

Abstract

Self-supervised learning has shown great promise in monocular depth estimation, using images as the only source of supervision. Most existing methods rely on handcrafted network architectures, whose design is usually time-consuming and not specific to the depth estimation task. To reduce the human effort in network design, we propose a hierarchical multi-scale Neural Architecture Search framework for self-supervised monocular depth estimation, named Auto-Depth. Specifically, our method uses a two-level search space: the cell level and the network level. At the cell level, we explore the possible hierarchical connections within a single cell to enhance multi-scale information representation. At the network level, we search the depth and width of the network to balance accuracy and computational complexity. Furthermore, two-level path binarization is adopted to reduce the computational cost of the search stage: it activates only one operation path among the candidate operations. Extensive experiments on the KITTI and Make3D datasets show that our searched network yields better performance than previous state-of-the-art monocular depth estimation methods.
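
The idea of path binarization can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a common scheme in which learnable architecture logits are turned into probabilities with a softmax, a one-hot binary gate is sampled from them, and only the selected candidate is evaluated. All names (`sample_binary_gate`, `mixed_op`, `CANDIDATE_OPS`) and the scalar stand-in operations are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of architecture logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_binary_gate(logits, rng):
    """Sample a one-hot gate: exactly one candidate path is switched on."""
    probs = softmax(logits)
    chosen = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return [1 if i == chosen else 0 for i in range(len(logits))]


def mixed_op(x, candidate_ops, gate):
    """Evaluate only the active path; inactive candidates are never run."""
    active = gate.index(1)
    return candidate_ops[active](x)


# Hypothetical stand-ins for candidate operations (assumption: real
# candidates would be convolutions or skip connections on feature maps).
CANDIDATE_OPS = [
    lambda x: x,        # skip connection
    lambda x: 2.0 * x,  # stand-in for one convolution type
    lambda x: x * x,    # stand-in for another convolution type
]

rng = random.Random(0)
# One gate per search level (assumed logits, for illustration only).
cell_gate = sample_binary_gate([0.1, 0.5, -0.3], rng)  # cell-level choice
network_gate = sample_binary_gate([0.0, 0.0], rng)     # network-level choice
y = mixed_op(3.0, CANDIDATE_OPS, cell_gate)
```

Because only one path is evaluated per gate, the cost of a search step in this sketch matches that of running a single candidate rather than all of them at once.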

This work was supported by the National Science Fund of China (Grant Nos. 61872188, U1713208), Shanghai Automotive Industry Science and Technology Development Foundation (No. 1917).

Author information

Authors and Affiliations

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, China
Jian Ren, Jin Xie & Zhong Jin

Corresponding author

Correspondence to Jin Xie.

Editor information

Editors and Affiliations

Korea University, Seoul, Korea (Republic of)
Christian Wallraven
Nanjing University, Nanjing, China
Qingshan Liu
Osaka University, Osaka, Japan
Hajime Nagahara

About this paper

Cite this paper

Ren, J., Xie, J., Jin, Z. (2022). Hierarchical Multi-scale Architecture Search for Self-supervised Monocular Depth Estimation. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_34

DOI: https://doi.org/10.1007/978-3-031-02444-3_34
Published: 10 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3
eBook Packages: Computer Science, Computer Science (R0)
