demonstration

3D Creation at Your Fingertips: From Text or Image to 3D Assets

Authors:

Tao Mei

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 9408 - 9410

https://doi.org/10.1145/3581783.3612678

Published: 27 October 2023

Abstract

We demonstrate an automatic 3D creation system that creates realistic 3D assets solely from a text or image prompt, without requiring any specialized 3D modeling skills. Users can either describe the object they envision in natural language or upload a reference image captured with their phone. The system then generates a high-quality 3D mesh that faithfully matches the user's input. To achieve this, we propose a coarse-to-fine framework. Specifically, we first obtain a low-resolution mesh instantly from a pre-trained text- or image-conditional 3D generative model. Using this coarse mesh as the initialization, we then optimize a high-resolution textured 3D mesh with fine-grained appearance guidance from large-scale 2D diffusion models. The system produces visually pleasing results in minutes, which is significantly faster than existing methods, while ensuring that the resulting 3D assets are precisely aligned with the input text or image prompt. With these capabilities, our demonstration provides a streamlined and intuitive platform for users to incorporate 3D creation into their daily lives.
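The two-stage control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `coarse_stage`, `fine_stage`, and `toy_guidance` are hypothetical names, the coarse generator is replaced by a fixed octahedron, the high-resolution step is plain midpoint subdivision, and the 2D diffusion guidance is replaced by a simple squared-error gradient on per-vertex colors. Only the overall shape (a cheap coarse initialization followed by guided refinement at higher resolution) is taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


@dataclass
class Mesh:
    vertices: List[Vec3]
    faces: List[Face]
    colors: List[Vec3]  # one RGB triple per vertex, values in [0, 1]


def coarse_stage(prompt: str) -> Mesh:
    """Hypothetical stand-in for the pre-trained conditional 3D generator.

    A real model would condition on the prompt; this stub ignores it and
    returns a gray octahedron so the pipeline has something to refine.
    """
    vertices = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    return Mesh(vertices, faces, [(0.5, 0.5, 0.5)] * len(vertices))


def _mid(a: Vec3, b: Vec3) -> Vec3:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def subdivide(mesh: Mesh) -> Mesh:
    """Raise resolution: split every triangle into four via edge midpoints."""
    vertices, colors = list(mesh.vertices), list(mesh.colors)
    cache: Dict[Tuple[int, int], int] = {}

    def midpoint(i: int, j: int) -> int:
        key = (min(i, j), max(i, j))  # shared edges reuse one new vertex
        if key not in cache:
            vertices.append(_mid(vertices[i], vertices[j]))
            colors.append(_mid(colors[i], colors[j]))
            cache[key] = len(vertices) - 1
        return cache[key]

    faces: List[Face] = []
    for a, b, c in mesh.faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return Mesh(vertices, faces, colors)


# A guidance function maps a mesh to one color gradient per vertex.
Guidance = Callable[[Mesh], List[Vec3]]


def toy_guidance(target: Vec3) -> Guidance:
    """Assumed placeholder for diffusion guidance: gradient of the squared
    error between each vertex color and a single target color."""
    def grad(mesh: Mesh) -> List[Vec3]:
        return [tuple(2.0 * (c - t) for c, t in zip(color, target))
                for color in mesh.colors]
    return grad


def fine_stage(coarse: Mesh, guidance: Guidance,
               steps: int = 50, lr: float = 0.1) -> Mesh:
    """Start from the coarse mesh, raise resolution, then refine appearance
    by gradient descent on the guidance signal."""
    mesh = subdivide(coarse)
    for _ in range(steps):
        grads = guidance(mesh)
        mesh.colors = [tuple(c - lr * g for c, g in zip(color, grad))
                       for color, grad in zip(mesh.colors, grads)]
    return mesh


if __name__ == "__main__":
    coarse = coarse_stage("a red apple")
    fine = fine_stage(coarse, toy_guidance((1.0, 0.0, 0.0)))
    print(len(coarse.faces), "coarse faces ->", len(fine.faces), "fine faces")
```

In this sketch the coarse stage costs almost nothing, and all iterative work happens in the fine stage, which begins from a reasonable shape rather than from scratch. That mirrors the speed argument in the abstract, although the actual system's models, mesh representation, and losses are not specified on this page.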


Cited By

Kiourtis A, Mavrogiorgou A, Makridis G, Kyriazis D, Soldatos J, Fatouros G, Ntalaperas D, Papageorgiou X, Almeida B, Guedes J, Maló P, Oliveira J, Scholze S, Rosinha A, Reis J, Falsetta M. (2024). XR5.0: Human-Centric AI-Enabled Extended Reality Applications for Industry 5.0. 2024 36th Conference of Open Innovations Association (FRUCT), 314-323. Online publication date: 30-Oct-2024.
https://doi.org/10.23919/FRUCT64283.2024.10749931
Paweroi R, Köppen M. (2024). Framework for Integration of Generative AI into Metaverse Asset Creation. 2024 2nd International Conference on Intelligent Metaverse Technologies & Applications (iMETA), 27-33. Online publication date: 26-Nov-2024.
https://doi.org/10.1109/iMETA62882.2024.10808057
Chen Y, Pan Y, Yang H, Yao T, Mei T. (2024). VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4896-4905. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.00468

Index Terms

3D Creation at Your Fingertips: From Text or Image to 3D Assets
1. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia content creation


Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN: 9798400701085

DOI: 10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Demonstration

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 5
Total Downloads: 198
Downloads (last 12 months): 91
Downloads (last 6 weeks): 9

Reflects downloads up to 17 Feb 2025

Cited By (additional entries)

Mishra S, Kumar D, Grover E, Hemanth J. (2024). Realistic 3D Object Generation Using Seam Aware Landmark Detectors with Texture and Lighting. Iranian Journal of Science and Technology, Transactions of Electrical Engineering. Online publication date: 11-Dec-2024.
https://doi.org/10.1007/s40998-024-00778-y
Yang H, Chen Y, Pan Y, Yao T, Chen Z, Wu Z, Jiang Y, Mei T. (2024). DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation. Computer Vision – ECCV 2024, 162-178. Online publication date: 21-Nov-2024.
https://doi.org/10.1007/978-3-031-73202-7_10
