Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch

Wang, Fei

doi:10.1007/978-3-031-72904-1_4

Fei Wang¹³

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15132))

Included in the following conference series:

European Conference on Computer Vision

259 Accesses

Abstract

3D reconstruction from a sketch offers an efficient means of boosting the productivity of 3D modeling. However, such a task remains largely under-explored due to the difficulties caused by the inherent abstractive representation and diversity of sketches. In this paper, we introduce a novel deep neural network model, Sketch2Vox, for 3D reconstruction from a single monocular sketch. Taking a sketch as input, the proposed model first converts it into two different representations, i.e., a binary image and a 2D point cloud. Second, we extract semantic features from them using two newly-developed processing modules, including the SktConv module designed for hierarchical abstract features learning from the binary image and the SktMPFM designed for local and global context feature extraction from the 2D point cloud. Prior to feeding features into the 3D-decoder-refiner module for fine-grained reconstruction, the resultant image-based and point-based feature maps are fused together according to their internal correlation using the proposed cross-modal fusion attention module. Finally, we use an optimization module to refine the details of the generated 3D model. To evaluate the efficiency of our method, we collect a large dataset consisting of more than 12,000 Sketch-Voxel pairs and compare the proposed Sketch2Vox against several state-of-the-art methods. The experimental results demonstrate the proposed method is superior to peer ones with regard to reconstruction quality. The dataset is publicly available on https://drive.google.com/file/d/1aXug8PcLnWaDZiWZrcmhvVNFC4n_eAih/view?usp=sharing.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 64.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

PotC2Vox: A Point Cloud Data-Driven 3D Reconstruction Method for Single-View Images

From sketch to reality: precision-friendly 3D generation technology

Article 16 May 2024

SketchSampler: Sketch-Based 3D Reconstruction via View-Dependent Depth Sampling

References

Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
Chen, S.Y., Su, W., Gao, L., Xia, S., Fu, H.: DeepFaceDrawing: deep generation of face images from sketches. ACM Trans. Graph. (TOG) 39(4), 72 (2020)
Article MATH Google Scholar
Chen, T., Lin, L., Chen, R., Hui, X., Wu, H.: Knowledge-guided multi-label few-shot learning for general image recognition. IEEE Trans. Pattern Anal. Mach. Intell. 44(3), 1371–1384 (2022)
Article MATH Google Scholar
Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: European Conference on Computer Vision, pp. 628–644. Springer (2016)
Google Scholar
Dai, G., Xie, J., Fang, Y.: Deep correlated holistic metric learning for sketch-based 3D shape retrieval. IEEE Trans. Image Process. 27(7), 3374–3386 (2018)
Article MathSciNet MATH Google Scholar
Dai, H., Pears, N., Smith, W.A., Duncan, C.: A 3D morphable model of craniofacial shape and texture variation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3085–3093 (2017)
Google Scholar
Deng, D., Wu, H., Sun, P., Wang, R., Shi, Z., Luo, X.: A new geometric modeling approach for woven fabric based on frenet frame and spiral equation. J. Comput. Appl. Math. 329, 84–94 (2018)
Article MathSciNet MATH Google Scholar
Elhami, G., Scholefield, A.J., Vetterli, M.: Shape from bandwidth: central projection case. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1808–1812. IEEE (2020)
Google Scholar
Fu, K., Peng, J., He, Q., Zhang, H.: Single image 3D object reconstruction based on deep learning: a review. Multimedia Tools Appl. 80, 463–498 (2021)
Article MATH Google Scholar
Han, Z., Ma, B., Liu, Y.S., Zwicker, M.: Reconstructing 3D shapes from multiple sketches using direct shape optimization. IEEE Trans. Image Process. 29, 8721–8734 (2020)
Article Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Henderson, P., Ferrari, V.: Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. Int. J. Comput. Vision 128(4), 835–854 (2020)
Article MATH Google Scholar
Hong, F., Pan, L., Cai, Z., Liu, Z.: Garment4D: garment reconstruction from point cloud sequences. In: Advances in Neural Information Processing Systems, vol. 34, pp. 27940–27951 (2021)
Google Scholar
Huang, H., Kalogerakis, E., Yumer, E., Mech, R.: Shape synthesis from sketches via procedural models and convolutional networks. IEEE Trans. Visual Comput. Graphics 23(8), 2003–2013 (2016)
Article Google Scholar
Jia, D., Wei, D., Socher, R., Li, L.J., Kai, L., Li, F.F.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lun, Z., Gadelha, M., Kalogerakis, E., Maji, S., Wang, R.: 3D shape reconstruction from sketches via multi-view convolutional networks. In: 2017 International Conference on 3D Vision (3DV), pp. 67–77. IEEE (2017)
Google Scholar
Muraoroshi, W., Miyazaki, D.: Shape from shading and polarization constrained by approximate shape. In: 2021 17th International Conference on Machine Vision and Applications (MVA), pp. 1–5. IEEE (2021)
Google Scholar
Olsen, L., Samavati, F.F., Sousa, M.C., Jorge, J.A.: Sketch-based modeling: a survey. Comput. Graph. 33(1), 85–103 (2009)
Article MATH Google Scholar
Peng, K., Islam, R., Quarles, J., Desai, K.: TMVNet: using transformers for multi-view voxel-based 3D reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 222–230 (2022)
Google Scholar
Pontes, J.K., Kong, C., Sridharan, S., Lucey, S., Eriksson, A., Fookes, C.: Image2Mesh: a learning framework for single image 3D reconstruction. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11361, pp. 365–381. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20887-5_23
Chapter Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter MATH Google Scholar
Samavati, T., Soryani, M.: Deep learning-based 3D reconstruction: a survey. Artif. Intell. Rev. 56, 9175–9219 (2023)
Article MATH Google Scholar
Shi, Z., Meng, Z., Xing, Y., Ma, Y., Wattenhofer, R.: 3D-RETR: end-to-end single and multi-view 3D reconstruction with transformers. arXiv preprint arXiv:2110.08861 (2021)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Sun, J., Xie, Y., Chen, L., Zhou, X., Bao, H.: NeuralRecon: real-time coherent 3D reconstruction from monocular video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15598–15607 (2021)
Google Scholar
Tachella, J., et al.: Real-time 3D reconstruction from single-photon lidar data using plug-and-play point cloud denoisers. Nat. Commun. 10(1), 1–6 (2019)
Article MathSciNet Google Scholar
Tatarchenko, M., Richter, S.R., Ranftl, R., Li, Z., Koltun, V., Brox, T.: What do single-view 3D reconstruction networks learn? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3405–3414 (2019)
Google Scholar
Vaswani, A., et al.: Attention is all you need. arXiv arXiv:1706.03762 (2017)
Vlavianos, N., Nagakura, T.: An architectural metaverse that combines dynamic and static 3D data in XR: a case study at the monastery of simonos petra. In: Proceedings of the 26th International Conference on Cultural Heritage and New Technologies, pp. 1–6 (2021)
Google Scholar
Wallace, B., Hariharan, B.: Few-shot generalization for single-image 3D reconstruction via priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3818–3827 (2019)
Google Scholar
Wang, F., Lin, S., Luo, X., Wu, H., Wang, R., Zhou, F.: A data-driven approach for sketch-based 3D shape retrieval via similar drawing-style recommendation. Comput. Graph. Forum 36(7), 157–166 (2017)
Article MATH Google Scholar
Wang, F., et al.: SPFusionNet: sketch segmentation using multi-modal data fusion. In: 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 1654–1659. IEEE (2019)
Google Scholar
Wang, F., Tang, K., Wu, H., Zhao, B., Cai, H., Zhou, T.: SketchBodyNet: a sketch-driven multi-faceted decoder network for 3D human reconstruction. arXiv preprint arXiv:2310.06577 (2023)
Wang, J., Lin, J., Yu, Q., Liu, R., Chen, Y., Yu, S.X.: 3D shape reconstruction from free-hand sketches. arXiv preprint arXiv:2006.09694 (2020)
Wang, L., Qian, C., Wang, J., Fang, Y.: Unsupervised learning of 3D model reconstruction from hand-drawn sketches. In: Proceedings of the 26th ACM international conference on Multimedia, pp. 1820–1828 (2018)
Google Scholar
Wu, J., Zhang, C., Xue, T., Freeman, W.T., Tenenbaum, J.B.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. arXiv:1610.07584 (2016)
Wu, S., Rupprecht, C., Vedaldi, A.: Unsupervised learning of probably symmetric deformable 3D objects from images in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1–10 (2020)
Google Scholar
Wu, Z., et al.: 3D shapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912–1920 (2015)
Google Scholar
Xie, H., Yao, H., Sun, X., Zhou, S., Zhang, S.: Pix2vox: context-aware 3D reconstruction from single and multi-view images. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2020)
Google Scholar
Xie, H., Yao, H., Zhang, S., Zhou, S., Sun, W.: Pix2vox++: multi-scale context-aware 3D object reconstruction from single and multiple images. Int. J. Comput. Vision 128(12), 2919–2935 (2020)
Article MATH Google Scholar
Xing, Z., Chen, Y., Ling, Z., Zhou, X., Xiang, Y.: Few-shot single-view 3D reconstruction with memory prior contrastive network. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 55–70. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_4
Xing, Z., Li, H., Wu, Z., Jiang, Y.G.: Semi-supervised single-view 3D reconstruction via prototype shape priors. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 535–551. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19769-7_31
Yang, S., Xu, M., Xie, H., Perry, S., Xia, J.: Single-view 3D object reconstruction from shape priors in memory. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3152–3161 (2021)
Google Scholar
Yang, X., et al.: Mobile3DRecon: real-time monocular 3D reconstruction on a mobile phone. IEEE Trans. Visual Comput. Graphics 26(12), 3446–3456 (2020)
Article MATH Google Scholar
Zhang, S.H., Guo, Y.C., Gu, Q.W.: Sketch2Model: view-aware 3D modeling from single free-hand sketches. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6012–6021 (2021)
Google Scholar
Zhao, Y., et al.: Metaverse: perspectives from graphics, interactions and visualization. Vis. Inform. 6, 56–67 (2022)
Article MATH Google Scholar
Zhong, Y., Qi, Y., Gryaditskaya, Y., Zhang, H., Song, Y.Z.: Towards practical sketch-based 3D shape generation: the role of professional sketches. IEEE Trans. Circuits Syst. Video Technol. 31(9), 3518–3528 (2020)
Article MATH Google Scholar
Zhou, W., Jia, J., Huang, C., Cheng, Y.: Web3D learning framework for 3D shape retrieval based on hybrid convolutional neural networks. Tsinghua Sci. Technol. 25(1), 93–102 (2019)
Article MATH Google Scholar
Zhou, W., Jia, J., Jiang, W., Huang, C.: Sketch augmentation-driven shape retrieval learning framework based on convolutional neural networks. IEEE Trans. Visual Comput. Graphics 27(8), 3558–3570 (2020)
Article MATH Google Scholar

Download references

Acknowledgements

This study was supported by the Guangdong Provincial Science and Technology Innovation Strategy Special Project (“Big Project + Task List” Project) (STKJ2023069, STKJ202209003); Key Area Special Projects for General Colleges and Universities in Guangdong Province (2022ZDZX1007); Guangdong Basic and Applied Basic Research Foundation (2022A1515011978).

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Shantou University, Shantou, China
Fei Wang

Authors

Fei Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fei Wang .

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, F. (2025). Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_4

Download citation

DOI: https://doi.org/10.1007/978-3-031-72904-1_4
Published: 21 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72903-4
Online ISBN: 978-3-031-72904-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch