
Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15132)


Abstract

3D reconstruction from a sketch offers an efficient means of boosting the productivity of 3D modeling. However, the task remains largely under-explored because of the inherently abstract representation and the diversity of sketches. In this paper, we introduce a novel deep neural network model, Sketch2Vox, for 3D reconstruction from a single monocular sketch. Given a sketch as input, the proposed model first converts it into two different representations, i.e., a binary image and a 2D point cloud. Second, we extract semantic features from these representations using two newly developed processing modules: the SktConv module, designed for learning hierarchical abstract features from the binary image, and the SktMPFM, designed for extracting local and global context features from the 2D point cloud. Before the features are fed into the 3D-decoder-refiner module for fine-grained reconstruction, the resulting image-based and point-based feature maps are fused according to their internal correlation by the proposed cross-modal fusion attention module. Finally, we use an optimization module to refine the details of the generated 3D model. To evaluate the effectiveness of our method, we collect a large dataset of more than 12,000 Sketch-Voxel pairs and compare the proposed Sketch2Vox against several state-of-the-art methods. The experimental results demonstrate that the proposed method outperforms these methods in reconstruction quality. The dataset is publicly available at https://drive.google.com/file/d/1aXug8PcLnWaDZiWZrcmhvVNFC4n_eAih/view?usp=sharing.
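The abstract does not specify how the sketch is converted into its two representations or how the fusion attention is computed. The following is a minimal NumPy sketch, under our own assumptions, of those two ideas: binarizing a grayscale sketch, sampling a fixed-size 2D point cloud from its stroke pixels, and fusing image tokens with point tokens by standard scaled dot-product cross-attention. The threshold, point count, feature sizes, random weights, and the function names `sketch_to_binary`, `binary_to_point_cloud`, and `cross_modal_fusion` are all hypothetical; this is not the paper's SktConv, SktMPFM, or cross-modal fusion attention module.

```python
import numpy as np


def sketch_to_binary(gray, threshold=128):
    """Binarize a grayscale sketch: dark stroke pixels -> 1, background -> 0."""
    return (np.asarray(gray) < threshold).astype(np.uint8)


def binary_to_point_cloud(binary, num_points=256, rng=None):
    """Sample a fixed-size 2D point cloud (x, y) from stroke pixels, scaled to [-1, 1]."""
    rng = np.random.default_rng(0) if rng is None else rng
    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        raise ValueError("sketch contains no stroke pixels")
    # Sample with replacement only when there are fewer stroke pixels than points.
    idx = rng.choice(xs.size, size=num_points, replace=xs.size < num_points)
    h, w = binary.shape
    pts = np.stack([xs[idx] / (w - 1), ys[idx] / (h - 1)], axis=1)
    return pts * 2.0 - 1.0


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_fusion(img_feats, pt_feats, w_q, w_k, w_v):
    """Image tokens (queries) attend to point tokens (keys/values); the attended
    point features are added back to the image features as a residual."""
    q = img_feats @ w_q                                       # (N_img, d)
    k = pt_feats @ w_k                                        # (N_pt, d)
    v = pt_feats @ w_v                                        # (N_pt, C_img)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (N_img, N_pt)
    return img_feats + attn @ v                               # (N_img, C_img)


# Toy usage: a 64x64 white canvas with one vertical black stroke.
rng = np.random.default_rng(42)
gray = np.full((64, 64), 255, dtype=np.uint8)
gray[10:54, 31:33] = 0
binary = sketch_to_binary(gray)                 # 44 rows x 2 columns = 88 stroke pixels
points = binary_to_point_cloud(binary, num_points=128, rng=rng)

# Hypothetical feature sizes: 16 image tokens of width 32 (e.g. a flattened
# 4x4 CNN feature map) and 128 point tokens of width 8 (a random lift of x, y).
img_feats = rng.standard_normal((16, 32))
pt_feats = np.tanh(points @ rng.standard_normal((2, 8)))
w_q = rng.standard_normal((32, 16))
w_k = rng.standard_normal((8, 16))
w_v = rng.standard_normal((8, 32))
fused = cross_modal_fusion(img_feats, pt_feats, w_q, w_k, w_v)
print(binary.sum(), points.shape, fused.shape)  # 88 (128, 2) (16, 32)
```

Using the image tokens as queries keeps the fused map aligned with the image branch, which a voxel decoder could then consume; letting points query image features, or attending in both directions, would be equally plausible choices under these assumptions.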




Acknowledgements

This study was supported by the Guangdong Provincial Science and Technology Innovation Strategy Special Project (“Big Project + Task List” Project) (STKJ2023069, STKJ202209003); Key Area Special Projects for General Colleges and Universities in Guangdong Province (2022ZDZX1007); Guangdong Basic and Applied Basic Research Foundation (2022A1515011978).

Author information


Correspondence to Fei Wang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, F. (2025). Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_4


  • DOI: https://doi.org/10.1007/978-3-031-72904-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72903-4

  • Online ISBN: 978-3-031-72904-1

  • eBook Packages: Computer Science, Computer Science (R0)
