Hierarchical Multi-scale Architecture Search for Self-supervised Monocular Depth Estimation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13189)

Abstract

Self-supervised learning has shown great promise for monocular depth estimation, using images as the only source of supervision. Most existing methods rely on handcrafted network architectures, which are time-consuming to design and not tailored to the depth estimation task. To reduce the human effort in network design, we propose a hierarchical multi-scale Neural Architecture Search framework for self-supervised monocular depth estimation, named Auto-Depth. Specifically, our method uses a two-level search space: the cell level and the network level. At the cell level, we explore the possible hierarchical connections within a single cell to enhance its ability to represent multi-scale information. At the network level, we search the depth and width of the network to balance accuracy and computational complexity. Furthermore, we adopt two-level path binarization to reduce the computational cost of the search stage; it activates only one operation path among the candidate operations. Extensive experiments on the KITTI and Make3D datasets show that our searched network yields better performance than previous state-of-the-art monocular depth estimation methods.
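The abstract does not detail how path binarization is implemented. As a rough illustration only, the sketch below assumes that each searchable choice holds one architecture parameter per candidate operation, that a single path is sampled from a softmax over those parameters, and that only the sampled operation is evaluated. The names `softmax`, `sample_binary_gate`, `mixed_op`, the toy candidate operations, and the parameter values are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(alphas):
    """Turn architecture parameters into path probabilities."""
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def sample_binary_gate(alphas, rng):
    """Sample a one-hot gate: exactly one candidate path is active."""
    probs = softmax(alphas)
    idx = rng.choices(range(len(alphas)), weights=probs, k=1)[0]
    return [1 if i == idx else 0 for i in range(len(alphas))]


def mixed_op(x, candidate_ops, gate):
    """Evaluate only the operation whose gate is 1; the others are skipped."""
    for op, g in zip(candidate_ops, gate):
        if g == 1:
            return op(x)
    raise ValueError("gate must contain exactly one active path")


# Hypothetical toy candidates standing in for real layers (assumption).
candidate_ops = [lambda x: x, lambda x: 2 * x, lambda x: x + 1]
alphas = [0.1, 0.5, -0.3]  # assumed architecture parameters, one per candidate

rng = random.Random(0)
gate = sample_binary_gate(alphas, rng)
y = mixed_op(3.0, candidate_ops, gate)
print(gate, y)
```

Because only one candidate runs per forward pass, the cost of a search step in this sketch is that of a single path rather than of all candidates combined. Under the same assumption, a second gate of this kind could be sampled for a network-level choice such as width or depth.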

This work was supported by the National Science Fund of China (Grant Nos. 61872188, U1713208) and the Shanghai Automotive Industry Science and Technology Development Foundation (No. 1917).

Corresponding author

Correspondence to Jin Xie.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Ren, J., Xie, J., Jin, Z. (2022). Hierarchical Multi-scale Architecture Search for Self-supervised Monocular Depth Estimation. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_34

  • DOI: https://doi.org/10.1007/978-3-031-02444-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-02443-6

  • Online ISBN: 978-3-031-02444-3

  • eBook Packages: Computer Science, Computer Science (R0)
