
Learning stratified 3D reconstruction

  • Position Paper
  • Published:
Science China Information Sciences

Abstract

Stratified 3D reconstruction, a layer-by-layer reconstruction that is upgraded from projective to affine and finally to metric, is a well-known 3D reconstruction method in computer vision. It is also a key supporting technology for well-known applications such as street view, Smart3D, and oblique photogrammetry. Generally speaking, the existing computer vision methods in the literature can be roughly classified into either geometry-based approaches for spatial vision or learning-based approaches for object vision. Although deep learning has achieved tremendous success in object vision in recent years, learning 3D scene reconstruction from multiple images remains rare, if not nonexistent, apart from work on depth learning from single images. This study explores the feasibility of learning stratified 3D reconstruction from putative point correspondences across images, and assesses whether such learning can be as robust to matching outliers as the traditional geometry-based methods are. For this purpose, a special parsimonious neural network is designed. Our results show that it is indeed possible to learn a stratified 3D reconstruction from noisy image point correspondences, and the learnt reconstructions appear satisfactory, although they are not yet on a par with the state of the art in the structure-from-motion community, largely because the network lacks an explicit robust outlier detector such as random sample consensus (RANSAC). To the best of our knowledge, this study is the first attempt in the literature to learn 3D scene reconstruction from multiple images. Our results also show that how to integrate an outlier detector, implicitly or explicitly, into learning methods is a key problem to solve before learnt 3D scene structures can match those of the current geometry-based state of the art.
Otherwise, any significant advancement in learning 3D structures from multiple images seems difficult, if not impossible. We even speculate that deep learning might, by nature, be unsuitable for learning 3D structure from multiple images, or more generally, for solving spatial vision problems.
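The stratification described above can be illustrated numerically: a projective reconstruction differs from the true metric structure only by an unknown 4x4 space homography, and each upgrade step (projective to affine, affine to metric) amounts to constraining that homography. A minimal numpy sketch of this relationship, with synthetic points and a synthetic distortion (not data from the paper):

```python
import numpy as np

# A projective reconstruction X_p differs from the metric one X_m by an
# unknown invertible 4x4 homography H: X_p ~ H @ X_m. Stratified upgrading
# amounts to estimating H in stages (projective -> affine -> metric).

rng = np.random.default_rng(0)

# Ground-truth metric points as homogeneous 4-vectors (last row = 1).
X_m = np.vstack([rng.normal(size=(3, 10)), np.ones((1, 10))])

# An arbitrary projective distortion H (re-drawn until well invertible).
H = rng.normal(size=(4, 4))
while abs(np.linalg.det(H)) < 1e-3:
    H = rng.normal(size=(4, 4))

X_p = H @ X_m  # the projective reconstruction

# Given H (in practice estimated via, e.g., the plane at infinity and the
# absolute conic), the metric structure is recovered exactly:
X_rec = np.linalg.inv(H) @ X_p
X_rec /= X_rec[3]  # fix the homogeneous scale
print(np.allclose(X_rec, X_m))  # True
```

The point of the sketch is only that the ambiguity is a single global homography; the hard part of a real stratified pipeline is estimating it from image measurements.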
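Since the robustness gap is attributed largely to the absence of RANSAC, a minimal sketch of the RANSAC idea may help: hypothesize a model from a tiny random sample, score it by its consensus set, and keep the best. The toy problem below is 2D line fitting, a stand-in for the fundamental-matrix estimation used in real structure-from-motion pipelines; all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# 80 inliers on y = 2x + 1 with small noise, plus 20 gross outliers.
x = rng.uniform(-5, 5, 100)
y = 2 * x + 1 + rng.normal(0, 0.05, 100)
y[:20] += rng.uniform(5, 20, 20)  # corrupt the first 20 points upward

best_inliers = np.zeros(100, dtype=bool)
for _ in range(200):
    i, j = rng.choice(100, size=2, replace=False)  # minimal sample: 2 points
    if x[i] == x[j]:
        continue
    a = (y[j] - y[i]) / (x[j] - x[i])  # candidate slope
    b = y[i] - a * x[i]                # candidate intercept
    inliers = np.abs(y - (a * x + b)) < 0.3  # consensus set at threshold 0.3
    if inliers.sum() > best_inliers.sum():
        best_inliers = inliers

# Final least-squares refit on the largest consensus set only.
a, b = np.polyfit(x[best_inliers], y[best_inliers], 1)
print(round(a, 1), round(b, 1))  # 2.0 1.0
```

A least-squares fit on all 100 points would be pulled far off the true line by the 20 outliers; RANSAC recovers it because a single contaminated sample cannot accumulate a large consensus set.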



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61333015, 61375042, 61421004, 61573359, 61772444).

Author information


Corresponding author

Correspondence to Zhanyi Hu.


About this article


Cite this article

Dong, Q., Shu, M., Cui, H. et al. Learning stratified 3D reconstruction. Sci. China Inf. Sci. 61, 023101 (2018). https://doi.org/10.1007/s11432-017-9234-7
