Abstract
3D reconstruction from a sketch offers an efficient way to boost the productivity of 3D modeling. However, the task remains largely under-explored because sketches are inherently abstract and highly diverse. In this paper, we introduce Sketch2Vox, a novel deep neural network model for 3D reconstruction from a single monocular sketch. The model first converts an input sketch into two representations: a binary image and a 2D point cloud. Second, it extracts semantic features from them using two newly developed processing modules: SktConv, which learns hierarchical abstract features from the binary image, and SktMPFM, which extracts local and global context features from the 2D point cloud. The resulting image-based and point-based feature maps are then fused according to their internal correlation by the proposed cross-modal fusion attention module, before being fed into the 3D decoder-refiner module for fine-grained reconstruction. Finally, an optimization module refines the details of the generated 3D model. To evaluate the effectiveness of our method, we collect a large dataset of more than 12,000 Sketch-Voxel pairs and compare Sketch2Vox against several state-of-the-art methods. The experimental results demonstrate that the proposed method outperforms its peers in reconstruction quality. The dataset is publicly available at https://drive.google.com/file/d/1aXug8PcLnWaDZiWZrcmhvVNFC4n_eAih/view?usp=sharing.
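The first step of the pipeline, turning one sketch into a binary image and a 2D point cloud, can be illustrated with a minimal NumPy sketch. The abstract does not say how strokes are detected or how points are sampled, so the darkness threshold, the uniform random sampling of stroke pixels, the fixed point count, and the [-1, 1] coordinate normalization below are all assumptions. `sketch_to_dual_representation` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def sketch_to_dual_representation(gray, threshold=0.5, num_points=256, seed=0):
    """Convert a grayscale sketch into a binary image and a 2D point cloud.

    Hypothetical assumptions (not taken from the paper): `gray` holds values
    in [0, 1] with dark strokes on a white background; pixels darker than
    `threshold` count as strokes; the point cloud is a fixed-size random
    sample of stroke-pixel coordinates normalized to [-1, 1].
    """
    gray = np.asarray(gray, dtype=np.float32)
    # Binary image: 1 where a stroke is drawn, 0 on the background.
    binary = (gray < threshold).astype(np.float32)

    # Coordinates of every stroke pixel as (row, col) pairs.
    coords = np.argwhere(binary > 0).astype(np.float32)
    if len(coords) == 0:
        raise ValueError("sketch contains no stroke pixels")

    # Draw a fixed number of stroke pixels; sample with replacement only
    # when the sketch has fewer stroke pixels than requested.
    rng = np.random.default_rng(seed)
    replace = len(coords) < num_points
    idx = rng.choice(len(coords), size=num_points, replace=replace)
    points = coords[idx]

    # Map (row, col) to (x, y) in [-1, 1]; y grows downward as in the image.
    h, w = binary.shape
    x = points[:, 1] / max(w - 1, 1) * 2.0 - 1.0
    y = points[:, 0] / max(h - 1, 1) * 2.0 - 1.0
    cloud = np.stack([x, y], axis=1)
    return binary, cloud


# Usage: a 64x64 white canvas with one horizontal stroke on row 32.
canvas = np.ones((64, 64), dtype=np.float32)
canvas[32, 10:54] = 0.0
binary, cloud = sketch_to_dual_representation(canvas, num_points=128)
print(binary.shape, int(binary.sum()), cloud.shape)  # (64, 64) 44 (128, 2)
```

Sampling a fixed number of points gives every sketch a point cloud of the same size, which makes batching for a point-based encoder straightforward; farthest-point sampling is a common alternative when the points should cover the strokes more evenly.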
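The abstract states that the image-based and point-based feature maps are fused according to their internal correlation. One common way to realize that idea is scaled dot-product cross-attention, sketched below in NumPy with image tokens as queries and point tokens as keys and values. This is an illustrative stand-in, not the paper's cross-modal fusion attention module: the projection matrices, the token counts, and the choice to concatenate (rather than add) the attended context are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_fusion(img_feats, pt_feats, w_q, w_k, w_v):
    """Fuse image tokens with point tokens via scaled dot-product cross-attention.

    img_feats: (N_img, C) image-branch tokens, e.g. a flattened CNN feature map.
    pt_feats:  (N_pt, C) point-branch tokens, one per sampled sketch point.
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical; learned in practice).
    Returns the fused tokens, shape (N_img, C + D), and the attention weights,
    shape (N_img, N_pt).
    """
    q = img_feats @ w_q                       # queries from the image branch
    k = pt_feats @ w_k                        # keys from the point branch
    v = pt_feats @ w_v                        # values from the point branch
    scores = q @ k.T / np.sqrt(q.shape[-1])   # image-to-point correlation scores
    attn = softmax(scores, axis=-1)           # each row sums to 1
    context = attn @ v                        # point context gathered per image token
    # Keep the original image features and append the attended point context.
    return np.concatenate([img_feats, context], axis=-1), attn


# Usage: 49 image tokens (a flattened 7x7 map) attend over 256 point tokens.
rng = np.random.default_rng(0)
C, D = 32, 16
img = rng.standard_normal((49, C))
pts = rng.standard_normal((256, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))
fused, attn = cross_modal_fusion(img, pts, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (49, 48) (49, 256)
```

Concatenation preserves the image-branch features unchanged and lets a downstream decoder decide how much point context to use; a residual sum after projecting the context back to `C` channels is an equally common design.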
Acknowledgements
This study was supported by the Guangdong Provincial Science and Technology Innovation Strategy Special Project (“Big Project + Task List” Project) (STKJ2023069, STKJ202209003); Key Area Special Projects for General Colleges and Universities in Guangdong Province (2022ZDZX1007); Guangdong Basic and Applied Basic Research Foundation (2022A1515011978).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, F. (2025). Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_4
DOI: https://doi.org/10.1007/978-3-031-72904-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72903-4
Online ISBN: 978-3-031-72904-1
eBook Packages: Computer Science; Computer Science (R0)