Box-FaceS: A Bidirectional Method for Box-Guided Face Component Editing

ABSTRACT
While the quality of face manipulation has improved tremendously, the ability to control individual face components, e.g., eyebrows, is still limited. Although existing methods support component editing with user-provided geometry guidance, such as masks or sketches, their performance depends largely on the user's painting effort. To address this issue, we propose Box-FaceS, a bidirectional method that edits face components simply by translating and zooming their bounding boxes. The framework learns an independent representation for each face component, together with a high-dimensional tensor that captures the face outline. To enable box-guided editing, we develop a novel Box Adaptive Modulation (BAM) module for the generator, which first transforms a component embedding into style parameters and then modulates the visual features inside a given box-shaped region of the face outline. A cooperative learning scheme imposes independence between the face outline and the component embeddings. As a result, a component's style can be flexibly determined by its embedding, while its position and size are controlled by the provided bounding box. Box-FaceS also learns to transfer components between two faces while maintaining the consistency of the remaining image content. In particular, Box-FaceS can generate creative faces with reasonable exaggerations, requiring neither supervision nor complex spatial morphing operations. In comparisons with state-of-the-art methods, Box-FaceS shows its superiority in component editing, both qualitatively and quantitatively. To the best of our knowledge, Box-FaceS is the first approach that can freely edit the position and shape of face components without editing face masks or sketches. Our implementation is available at https://github.com/CMACH508/Box-FaceS.
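The BAM operation described above can be illustrated with a minimal NumPy sketch. Note that this is an assumption-laden toy, not the authors' implementation: the linear "style head" (`W_style`), the per-region instance normalization, and the `(y0, x0, y1, x1)` box convention are all hypothetical choices made here for clarity.

```python
import numpy as np

def box_adaptive_modulation(features, embedding, box, rng=None):
    """Toy sketch of Box Adaptive Modulation (BAM): map a component
    embedding to per-channel scale/shift (style) parameters, then
    modulate only the features inside the given bounding box.

    features:  (C, H, W) feature map of the face outline
    embedding: (D,) component embedding
    box:       (y0, x0, y1, x1) region in feature-map coordinates
    """
    C, H, W = features.shape
    # Hypothetical affine style head: embedding -> (gamma, beta) per channel.
    # In practice this would be a learned layer; here it is random but seeded.
    rng = np.random.default_rng(0) if rng is None else rng
    W_style = rng.standard_normal((2 * C, embedding.size)) * 0.02
    gamma, beta = np.split(W_style @ embedding, 2)

    y0, x0, y1, x1 = box
    out = features.copy()
    region = features[:, y0:y1, x0:x1]
    # Normalize the boxed region per channel, then apply the style
    # parameters; features outside the box are left untouched.
    mu = region.mean(axis=(1, 2), keepdims=True)
    sd = region.std(axis=(1, 2), keepdims=True) + 1e-6
    out[:, y0:y1, x0:x1] = (gamma[:, None, None] * (region - mu) / sd
                            + beta[:, None, None])
    return out
```

Under this sketch, translating or zooming a component corresponds to calling the function with the same embedding but a different `box`, which is the editing interface the paper describes.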
Supplemental material is available for download.