DOI: 10.1145/3581783.3612673
Demonstration

CFTF: Controllable Fine-grained Text2Face and Its Human-in-the-loop Suspect Portraits Application

Published: 27 October 2023

ABSTRACT

Traditional controllable face generation addresses coarse-grained attributes such as facial features, expression, pose, or viewing angle, but specific application scenarios require finer-grained control. This paper proposes CFTF, a fine-grained, controllable text-to-face generation technique. CFTF allows users to participate deeply in the face generation process through multiple rounds of language feedback: it enables control not only over coarse-grained attributes such as gender and viewing angle, but also over details such as hair color, accessories, and iris color. We apply CFTF to suspect portraiture, performing multiple rounds of human-computer interaction based on an eyewitness's sketch of the suspect and descriptions of the suspect's facial features, realizing "human-in-the-loop" collaborative portrait drawing.
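The multi-round feedback loop described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `generate_face` stands in for a text-to-image model call (e.g. a latent diffusion pipeline conditioned on the eyewitness sketch), and here it simply records the accumulated prompt so the control flow is runnable.

```python
def generate_face(prompt: str) -> dict:
    """Placeholder for a text-to-face model call (hypothetical)."""
    return {"prompt": prompt}


def refine_portrait(initial_description: str, feedback_rounds: list[str]) -> dict:
    """Regenerate the portrait after each round of eyewitness feedback."""
    prompt = initial_description
    portrait = generate_face(prompt)
    for feedback in feedback_rounds:
        # Each round appends a fine-grained detail (hair color,
        # accessories, iris color, ...) and regenerates the face.
        prompt = f"{prompt}, {feedback}"
        portrait = generate_face(prompt)
    return portrait


portrait = refine_portrait(
    "male suspect, front view",
    ["short black hair", "wearing glasses", "brown iris"],
)
print(portrait["prompt"])
# male suspect, front view, short black hair, wearing glasses, brown iris
```

The key design point the abstract implies is that control accumulates across rounds rather than replacing earlier constraints, so coarse attributes fixed early (gender, viewing angle) survive later fine-grained edits.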


Supplemental Material

3581783.3612673-video.mp4 (mp4, 150.2 MB)


Published in

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

        Copyright © 2023 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%

