ABSTRACT
We propose CLIP-Head, a novel approach to text-driven generation of neural parametric 3D head models. Our method takes simple natural-language text prompts describing appearance and facial expressions, and generates 3D neural head avatars with accurate geometry and high-quality texture maps. Unlike existing approaches, which use conventional parametric head models with limited control and expressiveness, we leverage Neural Parametric Head Models (NPHM), which offer disjoint latent codes for the disentangled encoding of identity and expression. To facilitate text-driven generation, we propose two weakly supervised mapping networks that map CLIP's encoding of the input text prompt to NPHM's disjoint identity and expression vectors. The predicted latent codes are then fed to a pre-trained NPHM network to generate the 3D head geometry. Since NPHM meshes do not support textures, we propose a novel aligned parametrization technique, followed by text-driven generation of texture maps using a recently proposed controllable diffusion model for text-to-image synthesis. Our method generates 3D head meshes with arbitrary appearances and a variety of facial expressions, along with photorealistic texture details. We show superior performance over existing state-of-the-art methods, both qualitatively and quantitatively, and demonstrate potentially useful applications of our method. Our code is available at https://raipranav384.github.io/clip_head.
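The two-mapper design described above (one network per disjoint NPHM latent) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the latent dimensions (`ID_DIM`, `EXPR_DIM`), the MLP widths, and the random initialization are all assumptions, and a frozen random vector stands in for the actual CLIP text encoder.

```python
import numpy as np

CLIP_DIM = 512              # CLIP text-embedding size (ViT-B/32); assumed
ID_DIM, EXPR_DIM = 64, 32   # hypothetical NPHM identity / expression latent sizes

def make_mlp(dims, rng):
    """Randomly initialized weights for a small MLP (illustrative only)."""
    return [(rng.standard_normal((a, b)) * 0.02, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    """Plain MLP forward pass with ReLU on hidden layers."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
# Two disjoint mapping networks, one per NPHM latent space:
id_mapper   = make_mlp([CLIP_DIM, 256, ID_DIM], rng)    # text -> identity code
expr_mapper = make_mlp([CLIP_DIM, 256, EXPR_DIM], rng)  # text -> expression code

clip_embedding = rng.standard_normal(CLIP_DIM)  # stand-in for CLIP(text prompt)
z_id = forward(id_mapper, clip_embedding)
z_expr = forward(expr_mapper, clip_embedding)
# z_id and z_expr would then condition the pre-trained NPHM decoder
# to produce the head geometry.
```

Keeping the two mappers disjoint mirrors NPHM's disentangled latent design: a prompt edit that only changes the expression should only move `z_expr`, leaving identity intact.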