Abstract
The rise of artificial intelligence (AI) generated content (AIGC) has been remarkable in the language and image fields, but AI-generated three-dimensional (3D) models remain under-explored because of their complex nature and the lack of training data. The conventional approach of creating 3D content through computer-aided design (CAD) is labor-intensive and requires expertise, making it challenging for novice users. To address this issue, we propose Deep3DSketch-im, a sketch-based 3D modeling approach that produces a model from a single freehand sketch. This is a challenging task because of the sparsity and ambiguity of freehand sketches. Deep3DSketch-im improves the sketch-to-3D-model process with a novel data representation, the signed distance field (SDF), which encodes shape as an implicit continuous field rather than voxels or points, and with a specially designed neural network that captures both point and local features. Extensive experiments demonstrate the effectiveness of the approach, which achieves state-of-the-art (SOTA) performance on both synthetic and real datasets. In addition, a user study shows that users are more satisfied with the results generated by Deep3DSketch-im. We believe that Deep3DSketch-im has the potential to revolutionize 3D modeling by providing an intuitive and easy-to-use solution for novice users.
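To make the SDF representation concrete, the following is a minimal, self-contained sketch of the general idea (assuming PyTorch; the names SDFNet and sphere_sdf are illustrative toys, not the paper's architecture): a network maps a 3D coordinate to a signed distance that is negative inside the shape and positive outside, so the surface is the zero level set of a continuous field rather than a discrete voxel grid or point set.

```python
import torch
import torch.nn as nn

# Illustrative implicit SDF: f(x, y, z) -> signed distance,
# negative inside the surface, positive outside; the shape
# itself is the zero level set of f.
class SDFNet(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted signed distance
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)

# Analytic SDF of a sphere, used here only as a toy supervision signal.
def sphere_sdf(xyz: torch.Tensor, radius: float = 0.5) -> torch.Tensor:
    return xyz.norm(dim=-1, keepdim=True) - radius

model = SDFNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    pts = torch.rand(1024, 3) * 2 - 1  # sample query points in [-1, 1]^3
    loss = (model(pts) - sphere_sdf(pts)).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# A mesh can then be extracted from the zero level set, e.g., by
# querying the network on a dense grid and running marching cubes.
```

Because the field is continuous, the surface can be queried at arbitrary resolution; in a sketch-to-3D setting the query would additionally be conditioned on features extracted from the input sketch.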
Data availability
Our project data can be found at https://tianrunchen.github.io/Deep3DSketch-im. Other data that support the findings of this study are available from the corresponding authors upon reasonable request.
Author information
Contributions
Tianrun CHEN designed the research. Tianrun CHEN and Runlong CAO processed the data and performed the experiments. Tianrun CHEN drafted the paper. Zejian LI, Ying ZANG, and Lingyun SUN revised and finalized the paper.
Ethics declarations
Lingyun SUN is an editor-in-chief assistant of this special issue, and he was not involved with the peer review process of this paper. All the authors declare that they have no conflict of interest.
Additional information
Project supported by the National Key R&D Program of China (No. 2022YFB3303301), the National Natural Science Foundation of China (Nos. 62006208, 62107035, and 62207024), and the Public Welfare Research Program of Huzhou Science and Technology Bureau, China (No. 2022GZ01)
About this article
Cite this article
Chen, T., Cao, R., Li, Z. et al. Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches. Front Inform Technol Electron Eng 25, 149–159 (2024). https://doi.org/10.1631/FITEE.2300314
Key words
- Content creation
- Sketch
- Three-dimensional (3D) modeling
- 3D reconstruction
- Shape from X
- Artificial intelligence (AI)