DOI: 10.1145/3610543.3626169
research-article

CLIP-Head: Text-Guided Generation of Textured Neural Parametric 3D Head Models

Published: 28 November 2023

ABSTRACT

We propose CLIP-Head, a novel approach to text-driven generation of neural parametric 3D head models. Our method takes simple natural-language text prompts describing appearance and facial expression, and generates 3D neural head avatars with accurate geometry and high-quality texture maps. Unlike existing approaches, which use conventional parametric head models with limited control and expressiveness, we leverage Neural Parametric Head Models (NPHM), which offer disjoint latent codes for the disentangled encoding of identities and expressions. To facilitate text-driven generation, we propose two weakly supervised mapping networks that map the CLIP encoding of the input text prompt to NPHM’s disjoint identity and expression vectors. The predicted latent codes are then fed to a pre-trained NPHM network to generate the 3D head geometry. Since the NPHM mesh does not support textures, we propose a novel aligned parametrization technique, followed by text-driven generation of texture maps using a recently proposed controllable diffusion model for text-to-image synthesis. Our method can generate 3D head meshes with arbitrary appearances and a variety of facial expressions, along with photoreal texture details. We show superior performance compared with existing state-of-the-art methods, both qualitatively and quantitatively, and demonstrate potentially useful applications of our method. We have released our code at https://raipranav384.github.io/clip_head.
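
The data flow of the mapping step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the text encoder is replaced by a hypothetical stand-in (`fake_text_embedding`), the two mapping networks are untrained random MLPs, and the sizes `CLIP_DIM`, `ID_DIM`, and `EXPR_DIM` are placeholders rather than the dimensions used by CLIP-Head or NPHM. It only shows one text embedding being mapped to two separate latent codes, one for identity and one for expression.

```python
import math
import random

# All sizes are illustrative assumptions, not values from the paper.
CLIP_DIM = 512   # assumed size of the text embedding
ID_DIM = 64      # placeholder size for the identity latent code
EXPR_DIM = 32    # placeholder size for the expression latent code


def make_mlp(sizes, rng):
    """Build an untrained MLP as a list of (weights, biases) layers."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Apply each linear layer, with ReLU between layers but not after the last."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def fake_text_embedding(prompt, dim=CLIP_DIM):
    """Hypothetical stand-in for a CLIP text encoder.

    Returns a deterministic, unit-norm pseudo-random vector seeded by the prompt.
    A real pipeline would call an actual CLIP text encoder here.
    """
    rng = random.Random(prompt)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def text_to_latents(prompt, id_mapper, expr_mapper):
    """Map one text embedding to disjoint identity and expression codes."""
    embedding = fake_text_embedding(prompt)
    return mlp_forward(id_mapper, embedding), mlp_forward(expr_mapper, embedding)


rng = random.Random(0)
id_mapper = make_mlp([CLIP_DIM, 128, ID_DIM], rng)      # identity mapping network
expr_mapper = make_mlp([CLIP_DIM, 128, EXPR_DIM], rng)  # expression mapping network

z_id, z_expr = text_to_latents("an old man smiling", id_mapper, expr_mapper)
print(len(z_id), len(z_expr))  # prints "64 32"
```

In the method described above, the two predicted codes would then be passed to the pre-trained NPHM network to produce geometry; that decoder, the weak supervision used for training, and the texture stage are outside the scope of this sketch.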

Supplemental Material

CLIP-Head-Summary-Video.mp4 (mp4, 71.4 MB)

      • Published in

        SA '23: SIGGRAPH Asia 2023 Technical Communications
        November 2023
        127 pages
        ISBN: 9798400703140
        DOI: 10.1145/3610543

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 178 of 869 submissions, 20%