Video supervised for 3D reconstruction from single image

Multimedia Tools and Applications

Abstract

As a long-standing ill-posed problem, 3D reconstruction from a single image is an important research topic in computer vision. A single image is consistent with an infinite number of possible 3D shapes, so recovering plausible object geometry from it requires a correct shape prior. The choice of supervision, and how to make better use of the training data, are therefore key issues. In this paper, we propose a framework for single-image 3D reconstruction with video supervision. On the one hand, we build a temporal network that exploits the temporal correlation of video input to generate fine 3D structure. On the other hand, we introduce knowledge distillation to transfer the shape prior extracted from the video. This mechanism ensures that the student network, which performs single-image reconstruction, can make full use of the knowledge learned by the teacher network, which receives video input. In the inference phase, the student network can be used on its own. Extensive experiments on ShapeNet show the superiority of our method.
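The abstract does not spell out the training objective. As a minimal sketch only, assuming voxel-occupancy outputs and a FitNets-style feature-matching term (Romero et al. 2015) rather than the authors' actual losses, the idea of a video-fed teacher guiding a single-image student can be written as a reconstruction loss plus a weighted distillation term, L = L_rec + λ·L_distill. Every name here (`fuse_frames`, `voxel_bce`, `feature_mse`, `student_loss`, the weight `lam`) is hypothetical, and `fuse_frames` simply averages per-frame features as a placeholder for the temporal network.

```python
import math
from typing import List, Sequence


def fuse_frames(frame_feats: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for the temporal teacher: average the per-frame
    feature vectors. A real temporal network would replace this."""
    n = len(frame_feats)
    dim = len(frame_feats[0])
    return [sum(f[i] for f in frame_feats) / n for i in range(dim)]


def voxel_bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted occupancy probabilities
    and ground-truth occupancies (0 or 1), one entry per voxel."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def feature_mse(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Mean squared distance between student and teacher feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def student_loss(student_voxels, gt_voxels, student_feat, teacher_feat, lam: float = 0.5) -> float:
    """Reconstruction loss plus a weighted distillation term (assumed form).
    Teacher features act as fixed targets; only the student would be updated."""
    return voxel_bce(student_voxels, gt_voxels) + lam * feature_mse(student_feat, teacher_feat)


# Toy usage: two video frames feed the teacher, one image feature for the student.
teacher_feat = fuse_frames([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
loss = student_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0], [2.0, 2.0], teacher_feat)
print(round(loss, 4))  # 0.4143
```

At inference time only the student's forward pass would be needed, which mirrors the abstract's point that the student network can be used independently; the teacher and its temporal fusion matter only during training.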

Figures: Fig. 1 to Fig. 11

References

  1. Barron JT, Malik J (2015) Shape, illumination, and reflectance from shading. IEEE Trans Pattern Anal Mach Intell 37(8):1670–1687

  2. Broadhurst A, Drummond T, Cipolla R (2001) A probabilistic framework for space carving. In: ICCV, pp 388–393. IEEE Computer Society

  3. Brown M, Lowe DG (2005) Unsupervised 3d object recognition and reconstruction in unordered datasets. In: 3DIM, pp 56–63. IEEE Computer Society

  4. Cadena C, Carlone L, Carrillo H, Latif Y, Scaramuzza D, Neira J, Reid I, Leonard JJ (2016) Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Trans Robot 32(6):1309–1332

  5. Chang AX, Funkhouser TA, Guibas LJ, Hanrahan P, Huang Q, Li Z, Savarese S, Savva M, Song S, Su H, Xiao J, Yi L, Yu F (2015) Shapenet: an information-rich 3d model repository. arXiv:1512.03012

  6. Choy CB, Xu D, Gwak J, Chen K, Savarese S (2016) 3d-r2n2: a unified approach for single and multi-view 3d object reconstruction. In: ECCV (8), Lecture notes in computer science, vol 9912, pp 628–644. Springer

  7. Curless B, Levoy M (1996) A volumetric method for building complex models from range images. In: SIGGRAPH, pp 303–312. ACM

  8. Dibra E, Jain H, Öztireli AC, Ziegler R, Gross MH (2017) Human shape from silhouettes using generative HKS descriptors and cross-modal neural networks. In: CVPR, pp 5504–5514. IEEE Computer Society

  9. Fan H, Su H, Guibas LJ (2017) A point set generation network for 3d object reconstruction from a single image. In: CVPR, pp 2463–2471. IEEE Computer Society

  10. Gadelha M, Maji S, Wang R (2017) 3d shape induction from 2d views of multiple objects. In: 3DV, pp 402–411. IEEE Computer Society

  11. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: AISTATS, JMLR Proceedings, vol 15, pp 315–323. JMLR.org

  12. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville AC, Bengio Y (2014) Generative adversarial nets. In: NIPS, pp 2672–2680

  13. Gou J, Yu B, Maybank SJ, Tao D (2021) Knowledge distillation: a survey. Int J Comput Vis 129(6):1789–1819

  14. Gwak J, Choy CB, Chandraker M, Garg A, Savarese S (2017) Weakly supervised 3d reconstruction with adversarial constraint. In: 3DV, pp 263–272. IEEE Computer Society

  15. Han X, Laga H, Bennamoun M (2021) Image-based 3d object reconstruction: state-of-the-art and trends in the deep learning era. IEEE Trans Pattern Anal Mach Intell 43(5):1578–1604

  16. Hartley R, Zisserman A (2006) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press

  17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778. IEEE

  18. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  19. Huang P, Matzen K, Kopf J, Ahuja N, Huang J (2018) Deepmvs: learning multi-view stereopsis. In: CVPR, pp 2821–2830. IEEE Computer Society

  20. Insafutdinov E, Dosovitskiy A (2018) Unsupervised learning of shape and pose with differentiable point clouds. In: NeurIPS, pp 2807–2817

  21. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML, vol 37, pp 448–456. JMLR.org

  22. Kar A, Häne C, Malik J (2017) Learning a multi-view stereo machine. In: NIPS, pp 365–376

  23. Kato H, Harada T (2019) Learning view priors for single-view 3d reconstruction. In: CVPR, pp 9778–9787. Computer Vision Foundation / IEEE

  24. Kato H, Ushiku Y, Harada T (2018) Neural 3d mesh renderer. In: CVPR, pp 3907–3916. IEEE Computer Society

  25. Khodatars M, Shoeibi A, Sadeghi D, Ghassemi N, Jafari M, Moridian P, Khadem A, Alizadehsani R, Zare A, Kong Y, Khosravi A, Nahavandi S, Hussain S, Acharya UR, Berk M (2021) Deep learning for neuroimaging-based diagnosis and rehabilitation of autism spectrum disorder: a review. Computers in Biology and Medicine

  26. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: ICLR (Poster)

  27. Kingma DP, Welling M (2014) Auto-encoding variational bayes. In: ICLR

  28. Laurentini A (1994) The visual hull concept for silhouette-based image understanding. IEEE Trans Pattern Anal Mach Intell 16(2):150–162

  29. Liao X, Li K, Zhu X, Liu KJR (2020) Robust detection of image operator chain with two-stream convolutional neural network. IEEE Journal of Selected Topics in Signal Processing. https://doi.org/10.1109/JSTSP.2020.3002391

  30. Liao X, Yin J, Chen M, Qin Z (2020) Adaptive payload distribution in multiple images steganography based on image texture features. IEEE Transactions on Dependable and Secure Computing. https://doi.org/10.1109/TDSC.2020.3004708

  31. Liao X, Yu Y, Li B, Li Z, Qin Z (2020) A new payload partition strategy in color image steganography. IEEE Transactions on Circuits and Systems for Video Technology. https://doi.org/10.1109/TCSVT.2019.2896270

  32. Lin C, Kong C, Lucey S (2018) Learning efficient point cloud generation for dense 3d object reconstruction. In: AAAI, pp 7114–7121. AAAI Press

  33. Lin C, Wang O, Russell BC, Shechtman E, Kim VG, Fisher M, Lucey S (2019) Photometric mesh optimization for video-aligned 3d object reconstruction. In: CVPR, pp 969–978. Computer Vision Foundation / IEEE

  34. Mandikal P, L NK, Agarwal M, Radhakrishnan VB (2018) 3d-lmnet: latent embedding matching for accurate and diverse 3d point cloud reconstruction from a single image. In: BMVC, p 55. BMVA Press

  35. Mandikal P, Radhakrishnan VB (2019) Dense 3d point cloud reconstruction using a deep pyramid network. In: WACV, pp 1052–1060. IEEE

  36. Mo K, Guerrero P, Yi L, Su H, Wonka P, Mitra NJ, Guibas LJ (2019) Structurenet: hierarchical graph networks for 3d shape generation. ACM Trans Graph 38(6):242:1–242:19

  37. Mo K, Zhu S, Chang AX, Yi L, Tripathi S, Guibas LJ, Su H (2019) Partnet: a large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding. In: CVPR, pp 909–918. Computer Vision Foundation / IEEE

  38. Ning X, Duan P, Li W, Zhang S (2020) Real-time 3d face alignment using an encoder-decoder network with an efficient deconvolution layer. IEEE Signal Processing Letters

  39. Paschalidou D, Gool LV, Geiger A (2020) Learning unsupervised hierarchical part decomposition of 3d objects from a single RGB image. In: CVPR, pp 1057–1067. IEEE

  40. Paschalidou D, Ulusoy AO, Schmitt C, Gool LV, Geiger A (2018) Raynet: learning volumetric 3d reconstruction with ray potentials. In: CVPR, pp 3897–3906. IEEE Computer Society

  41. Qi S, Ning X, Yang G, Zhang L, Long P, Cai W, Li W (2021) Review of multi-view 3d object recognition methods based on deep learning. Displays

  42. Rezende DJ, Eslami SMA, Mohamed S, Battaglia PW, Jaderberg M, Heess N (2016) Unsupervised learning of 3d structure from images. In: NIPS, pp 4997–5005

  43. Richter SR, Roth S (2015) Discriminative shape from shading in uncalibrated illumination. In: CVPR, pp 1128–1136. IEEE Computer Society

  44. Richter SR, Roth S (2018) Matryoshka networks: Predicting 3d geometry via nested shape layers. In: CVPR, pp 1936–1944. IEEE Computer Society

  45. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2015) Fitnets: hints for thin deep nets. In: ICLR (Poster)

  46. Shoeibi A, Khodatars M, Alizadehsani R, Ghassemi N, Jafari M, Moridian P, Khadem A, Sadeghi D, Hussain S, Zare A, Sani ZA, Bazeli J, Khozeimeh F, Khosravi A, Nahavandi S, Acharya UR, Shi P (2020) Automated detection and forecasting of COVID-19 using deep learning techniques: a review. arXiv:2007.10785

  47. Shoeibi A, Khodatars M, Jafari M, Moridian P, Rezaei M, Alizadehsani R, Khozeimeh F, Gorriz JM, Heras J, Panahiazar M, Nahavandi S, Acharya UR (2021) Applications of deep learning techniques for automated multiple sclerosis detection using magnetic resonance imaging: a review. Computers in Biology and Medicine. https://www.sciencedirect.com/science/article/pii/S0010482521004911

  48. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR

  49. Snavely N, Seitz SM, Szeliski R (2006) Photo tourism: exploring photo collections in 3d. ACM Trans Graph 25(3):835–846

  50. Sun X, Wu J, Zhang X, Zhang Z, Zhang C, Xue T, Tenenbaum J, Freeman WT (2018) Pix3d: dataset and methods for single-image 3d shape modeling. In: CVPR, pp 2974–2983. IEEE Computer Society

  51. Tatarchenko M, Dosovitskiy A, Brox T (2016) Multi-view 3d models from single images with a convolutional network. In: ECCV (7), Lecture notes in computer science, vol 9911, pp 322–337. Springer

  52. Tatarchenko M, Dosovitskiy A, Brox T (2017) Octree generating networks: efficient convolutional architectures for high-resolution 3d outputs. In: ICCV, pp 2107–2115. IEEE Computer Society

  53. Tulsiani S, Efros AA, Malik J (2018) Multi-view consistency as supervisory signal for learning shape and pose prediction. In: CVPR, pp 2897–2905. IEEE Computer Society

  54. Tulsiani S, Zhou T, Efros AA, Malik J (2017) Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: CVPR, pp 209–217. IEEE Computer Society

  55. Wang K, Chen K, Jia K (2019) Deep cascade generation on point sets. In: IJCAI, pp 3726–3732. ijcai.org

  56. Wang N, Zhang Y, Li Z, Fu Y, Liu W, Jiang Y (2018) Pixel2mesh: generating 3d mesh models from single RGB images. In: ECCV (11), Lecture notes in computer science, vol 11215, pp 55–71. Springer

  57. Wen C, Zhang Y, Li Z, Fu Y (2019) Pixel2mesh++: multi-view 3d mesh generation via deformation. In: ICCV, pp 1042–1051. IEEE

  58. Witkin AP (1981) Recovering surface shape and orientation from texture. Artif Intell 17(1–3):17–45

  59. Wu J, Wang Y, Xue T, Sun X, Freeman B, Tenenbaum J (2017) Marrnet: 3d shape reconstruction via 2.5d sketches. In: NIPS, pp 540–550

  60. Wu J, Zhang C, Xue T, Freeman B, Tenenbaum J (2016) Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In: NIPS, pp 82–90

  61. Wu J, Zhang C, Zhang X, Zhang Z, Freeman WT, Tenenbaum J (2018) Learning shape priors for single-view 3d completion and reconstruction. In: ECCV (11), Lecture notes in computer science, vol 11215, pp 673–691. Springer

  62. Xie H, Yao H, Sun X, Zhou S, Zhang S (2019) Pix2vox: context-aware 3d reconstruction from single and multi-view images. In: ICCV, pp 2690–2698. IEEE

  63. Xu Q, Wang W, Ceylan D, Mech R, Neumann U (2019) DISN: deep implicit surface network for high-quality single-view 3d reconstruction. In: NeurIPS, pp 490–500

  64. Yan X, Yang J, Yumer E, Guo Y, Lee H (2016) Perspective transformer nets: learning single-view 3d object reconstruction without 3d supervision. In: NIPS, pp 1696–1704

  65. Yang B, Rosa S, Markham A, Trigoni N, Wen H (2019) Dense 3d object reconstruction from a single depth view. IEEE Trans Pattern Anal Mach Intell 41(12):2820–2834

  66. Yang B, Wang S, Markham A, Trigoni N (2020) Robust attentional aggregation of deep feature sets for multi-view 3d reconstruction. Int J Comput Vis 128(1):53–73

  67. Yao Y, Schertler N, Rosales E, Rhodin H, Sigal L, Sheffer A (2020) Front2back: single view 3d shape reconstruction via front to back prediction. In: CVPR, pp 528–537. IEEE

  68. Zhu C, Xu K, Chaudhuri S, Yi R, Zhang H (2018) SCORES: shape composition with recursive substructure priors. ACM Trans Graph 37(6):211:1–211:14

Funding

This work is supported by the National Natural Science Foundation of China (Nos. 42075139, 42077232 and 61272219); the National High Technology Research and Development Program of China (No. 2007AA01Z334); the Science and Technology Program of Jiangsu Province (Nos. BE2020082, BE2010072, BE2011058 and BY2012190); the China Postdoctoral Science Foundation (No. 2017M621700); and the Innovation Fund of the State Key Laboratory for Novel Software Technology (No. ZZKT2018A09).

Author information

Corresponding author

Correspondence to Zhengxing Sun.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

About this article

Cite this article

Zhong, Y., Sun, Z., Luo, S. et al. Video supervised for 3D reconstruction from single image. Multimed Tools Appl 81, 15061–15083 (2022). https://doi.org/10.1007/s11042-022-12459-1

  • DOI: https://doi.org/10.1007/s11042-022-12459-1
