DOI: 10.1145/3581783.3612673

CFTF: Controllable Fine-grained Text2Face and Its Human-in-the-loop Suspect Portraits Application

Published: 27 October 2023

Abstract

Traditional controllable face generation offers control only over coarse-grained attributes such as facial features, expression and pose, or viewing angle, but many application scenarios require finer-grained control. This paper proposes CFTF, a fine-grained, controllable face generation technique. CFTF allows users to participate deeply in the face generation process through multiple rounds of language feedback: it controls not only coarse-grained attributes such as gender and viewing angle, but also fine details such as hair color, accessories, and iris color. We apply CFTF to the suspect-portrait scenario, performing multiple rounds of human-computer interaction based on an eyewitness's sketch of the suspect and descriptions of the suspect's facial features, thereby realizing "human-in-the-loop" collaborative portrait drawing.
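The abstract describes an interaction loop rather than a concrete implementation, and this page does not include the system's code. As a minimal, hypothetical sketch of such a loop, the snippet below pairs a latent-diffusion text-to-image model with sketch conditioning (in the spirit of Stable Diffusion and ControlNet-style control) and folds the eyewitness's language feedback back into the prompt on each round. The model checkpoints, file names, and feedback format are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of a human-in-the-loop, sketch-conditioned portrait loop.
# CFTF's actual architecture is not published on this page; this only
# illustrates the interaction pattern the abstract describes.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# A scribble-conditioned ControlNet keeps the layout of the eyewitness's
# sketch fixed while the text prompt controls appearance details.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

sketch = Image.open("suspect_sketch.png").convert("RGB")  # assumed input file
prompt = "photo portrait of a man, short black hair, brown iris, round glasses"

# Multi-round loop: after each render the eyewitness adds or corrects
# fine-grained attributes (hair color, accessories, iris color, ...).
for round_idx in range(5):
    portrait = pipe(prompt, image=sketch, num_inference_steps=30).images[0]
    portrait.save(f"portrait_round_{round_idx}.png")
    feedback = input("Corrections (press Enter to accept): ").strip()
    if not feedback:
        break
    prompt = f"{prompt}, {feedback}"  # naive prompt accumulation
```

A real system would need something smarter than appending feedback to the prompt, for instance an LLM that rewrites the prompt to resolve contradictions such as "black hair" followed by "blond hair", but the loop structure is the same.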

Supplemental Material

MP4 File
A presentation video for our research paper "CFTF: Controllable Fine-grained Text2Face and Its Human-in-the-loop Suspect Portraits Application". The video shows the layout of our demo pages, walks through a vivid example of the demo's functionality, and closes with a brief introduction to our technical approach.


Cited By

(2024) Applications, Challenges, and Future Directions of Human-in-the-Loop Learning. IEEE Access, 12, 75735-75760. https://doi.org/10.1109/ACCESS.2024.3401547



      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 October 2023


      Author Tags

      1. aigc
      2. diffusion model
      3. human-in-the-loop
      4. image generation

      Qualifiers

      • Demonstration

      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

      Acceptance Rates

Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
