Abstract
As one of the most crucial tasks of scene perception, Monocular Depth Estimation (MDE) has developed considerably in recent years. Current MDE research focuses on the precision and speed of estimation but pays less attention to generalization across scenes. For instance, an MDE network trained on outdoor scenes achieves impressive performance outdoors but poor performance on indoor scenes, and vice versa. To tackle this problem, in this paper we propose a self-distillation MDE framework that improves generalization across different scenes. Specifically, we design a student encoder that extracts features from two datasets of indoor and outdoor scenes, respectively. We then introduce a dissimilarity loss to pull apart the encoded features of different scenes in the feature space. Finally, a decoder estimates the final depth from the encoded features. In this way, our self-distillation MDE framework can learn depth estimation on two different datasets. To the best of our knowledge, we are the first to tackle the generalization problem across datasets of different scenes in the MDE field. Experiments demonstrate that our method alleviates the degradation that occurs when an MDE network faces datasets with complex data distributions. Note that evaluating two datasets with a single network is more challenging than evaluating them with two separate networks.
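To make the training idea concrete, the following is a minimal PyTorch sketch of the setup the abstract describes: a shared student encoder processes an indoor batch and an outdoor batch, a dissimilarity loss pushes their encoded features apart, and a decoder regresses depth for both. All module definitions, tensor shapes, the cosine-based dissimilarity term, and the loss weight `lam` are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentEncoder(nn.Module):
    """Toy convolutional encoder shared by indoor and outdoor images."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)  # (B, feat_dim, H/4, W/4)


class DepthDecoder(nn.Module):
    """Toy decoder that maps encoded features back to a dense depth map."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Softplus(),  # positive depth
        )

    def forward(self, f):
        return self.net(f)


def dissimilarity_loss(feat_a, feat_b):
    """Push apart pooled indoor and outdoor features (assumed formulation).

    Uses 1 + cosine similarity of the batch-mean feature vectors, which is
    minimised when the two scene embeddings point in opposite directions.
    """
    va = feat_a.mean(dim=(0, 2, 3))
    vb = feat_b.mean(dim=(0, 2, 3))
    return 1.0 + F.cosine_similarity(va, vb, dim=0)


def training_step(encoder, decoder, indoor_rgb, indoor_gt, outdoor_rgb, outdoor_gt, lam=0.1):
    """One joint step over an indoor batch and an outdoor batch."""
    f_in, f_out = encoder(indoor_rgb), encoder(outdoor_rgb)
    d_in, d_out = decoder(f_in), decoder(f_out)
    depth_loss = F.l1_loss(d_in, indoor_gt) + F.l1_loss(d_out, outdoor_gt)
    return depth_loss + lam * dissimilarity_loss(f_in, f_out)


if __name__ == "__main__":
    enc, dec = StudentEncoder(), DepthDecoder()
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
    # Random tensors stand in for indoor (e.g. NYU-Depth-v2) and outdoor (e.g. KITTI) batches.
    x_in, y_in = torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
    x_out, y_out = torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
    loss = training_step(enc, dec, x_in, y_in, x_out, y_out)
    loss.backward()
    opt.step()
    print(float(loss))
```

In this sketch, pulling the scene embeddings apart while sharing a single encoder and decoder is what lets one network serve both indoor and outdoor data; the depth supervision term and the loss weighting are placeholders standing in for whatever the paper actually uses.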
Notes
Code will be released once the paper is accepted.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62071500, 61701313) and the Sino-German Mobility Programme M-0421.