Abstract
Monocular depth estimation (MDE) methods often trade accuracy for computational cost: they are either too expensive to run or not accurate enough. In this paper, we propose a lightweight network that accurately estimates depth maps using minimal computing resources, achieved by a compact design that reduces model complexity as far as possible. To improve the performance of this lightweight network, we adopt knowledge distillation (KD): a large network serves as an expert teacher that accurately estimates depth maps on the target domain, and the student, the lightweight network, is trained to mimic the teacher's predictions. However, this KD process can be challenging and insufficient because of the large capacity gap between the teacher and the student. To address this, we propose using auxiliary unlabeled data to guide KD, which helps bridge the gap between teacher and student and enables the student to learn more effectively from the teacher's predictions, resulting in improved data-driven learning. Experiments show that our method achieves performance comparable to state-of-the-art methods while using only 1% of their parameters, and that it outperforms previous lightweight methods in inference accuracy, computational efficiency, and generalizability.
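The teacher-student training described above can be sketched as a combined objective: a supervised term on labeled images plus a distillation term that pulls the student toward the teacher's predictions, where auxiliary unlabeled images contribute only the distillation term. The L1 form of both terms and the weight `alpha` below are illustrative assumptions, not the paper's exact losses.

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, gt_depth=None, alpha=0.5):
    """Sketch of a KD objective for depth estimation.

    student_pred / teacher_pred / gt_depth are dense depth maps (arrays).
    For labeled images, the loss mixes a supervised term against ground
    truth with a distillation term against the teacher; for auxiliary
    unlabeled images (gt_depth=None), only the teacher supervises.
    """
    # Distillation term: student mimics the teacher's prediction.
    kd_term = np.abs(student_pred - teacher_pred).mean()
    if gt_depth is None:
        # Auxiliary unlabeled image: no ground truth available.
        return kd_term
    # Labeled image: supervised term against ground-truth depth.
    sup_term = np.abs(student_pred - gt_depth).mean()
    return (1.0 - alpha) * sup_term + alpha * kd_term
```

In training, batches of auxiliary unlabeled images would be interleaved with labeled batches, so the student sees far more of the teacher's behavior than the labeled set alone provides.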
Acknowledgements
This work is supported by the Shenzhen Science and Technology Program (JSGG20220606142803007) and the funding AC01202101103 from the Shenzhen Institute of Artificial Intelligence and Robotics for Society.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, J. et al. (2023). Boosting LightWeight Depth Estimation via Knowledge Distillation. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science(), vol 14117. Springer, Cham. https://doi.org/10.1007/978-3-031-40283-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40282-1
Online ISBN: 978-3-031-40283-8