DOI: 10.1145/3581783.3612678
demonstration

3D Creation at Your Fingertips: From Text or Image to 3D Assets

Published: 27 October 2023

Abstract

We demonstrate an automatic 3D creation system that produces realistic 3D assets from only a text or image prompt, without requiring any specialized 3D modeling skills. Users can either describe the object they envision in natural language or upload a reference image, such as a photo taken with their phone. The system then generates a high-quality 3D mesh that faithfully matches the user's input. We propose a coarse-to-fine framework to achieve this goal. Specifically, we first obtain a low-resolution mesh almost instantly by using a pre-trained text/image-conditional 3D generative model. Using this coarse mesh as the initialization, we then optimize a high-resolution textured 3D mesh with fine-grained appearance guidance from large-scale 2D diffusion models. Our system creates visually pleasing results in minutes, which is significantly faster than existing methods. Meanwhile, the system ensures that the resulting 3D assets are precisely aligned with the input text or image prompt. With these capabilities, our demonstration provides a streamlined and intuitive platform for users to incorporate 3D creation into their daily lives.
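
The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (Mesh, coarse_stage, fine_stage, guidance_grad, create_asset) is hypothetical, the coarse generator is replaced by a fixed octahedron that ignores the prompt, and the 2D diffusion guidance is replaced by a simple loss that pulls vertices onto the unit sphere. Only the control flow (coarse initialization, then higher-resolution optimization) mirrors the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Mesh:
    vertices: list  # list of (x, y, z) float tuples
    faces: list     # list of (i, j, k) vertex-index triples


def coarse_stage(prompt: str) -> Mesh:
    """Hypothetical stand-in for a pre-trained conditional 3D generator.

    Assumption: the prompt is ignored and a fixed octahedron is returned.
    """
    verts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
             (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    return Mesh(verts, faces)


def subdivide(mesh: Mesh) -> Mesh:
    """Raise resolution: split every triangle into four via edge midpoints."""
    verts = list(mesh.vertices)
    cache = {}  # maps an undirected edge to the index of its midpoint

    def midpoint(a: int, b: int) -> int:
        key = (min(a, b), max(a, b))
        if key not in cache:
            verts.append(tuple((x + y) / 2 for x, y in zip(verts[a], verts[b])))
            cache[key] = len(verts) - 1
        return cache[key]

    faces = []
    for a, b, c in mesh.faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return Mesh(verts, faces)


def guidance_grad(p):
    """Gradient of the toy loss (|p| - 1)^2.

    Assumption: this replaces the diffusion-based appearance guidance.
    """
    r = math.sqrt(sum(c * c for c in p))
    return tuple(2 * (r - 1) * c / r for c in p)


def fine_stage(coarse: Mesh, steps: int = 50, lr: float = 0.25) -> Mesh:
    """Refine a higher-resolution mesh, initialized from the coarse one."""
    mesh = subdivide(coarse)
    for _ in range(steps):
        mesh.vertices = [
            tuple(c - lr * g for c, g in zip(p, guidance_grad(p)))
            for p in mesh.vertices
        ]
    return mesh


def create_asset(prompt: str) -> Mesh:
    """Coarse-to-fine: fast initialization followed by optimization."""
    return fine_stage(coarse_stage(prompt))
```

Calling create_asset("a wooden chair") returns a mesh with 18 vertices and 32 faces whose vertices lie on the unit sphere. In a real system the coarse stage would call a conditional 3D generative model and the gradient would come from a 2D diffusion model applied to rendered views.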

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-modal generation
    2. image-to-3d
    3. text-to-3d

    Qualifiers

    • Demonstration

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 91
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 17 Feb 2025

    Cited By

    • (2024) XR5.0: Human-Centric AI-Enabled Extended Reality Applications for Industry 5.0. 2024 36th Conference of Open Innovations Association (FRUCT), pp. 314-323. DOI: 10.23919/FRUCT64283.2024.10749931. Online publication date: 30-Oct-2024
    • (2024) Framework for Integration of Generative AI into Metaverse Asset Creation. 2024 2nd International Conference on Intelligent Metaverse Technologies & Applications (iMETA), pp. 027-033. DOI: 10.1109/iMETA62882.2024.10808057. Online publication date: 26-Nov-2024
    • (2024) VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4896-4905. DOI: 10.1109/CVPR52733.2024.00468. Online publication date: 16-Jun-2024
    • (2024) Realistic 3D Object Generation Using Seam Aware Landmark Detectors with Texture and Lighting. Iranian Journal of Science and Technology, Transactions of Electrical Engineering. DOI: 10.1007/s40998-024-00778-y. Online publication date: 11-Dec-2024
    • (2024) DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation. Computer Vision – ECCV 2024, pp. 162-178. DOI: 10.1007/978-3-031-73202-7_10. Online publication date: 21-Nov-2024
