Abstract
We introduce ADORE (Adaptive Diffusion Optimized Restoration), a method that addresses facial distortion in diffusion-based, language-guided image generation. ADORE adapts its restoration to the characteristics and style of each image, improving the visual fidelity of AI-generated faces, and it mitigates boundary artifacts during face-background fusion, offering a novel way to counter the instability of generative models used for image restoration. Extensive experiments validate ADORE's ability to produce high-quality, style-consistent facial restorations. ADORE also supports text-driven, fine-grained facial refinement, leveraging the underlying model's open-domain synthesis capability. As the first method tailored to improving facial quality in text-to-image generation, ADORE addresses a pressing practical issue and opens new avenues in image generation.
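The abstract's face-background fusion step, where a restored face crop is pasted back into the generated image without visible seams, can be illustrated with a generic feathered alpha-blend. This is a conceptual sketch only, not the authors' implementation: the function name `blend_face`, the mask construction, and all parameters are illustrative assumptions, and ADORE's actual fusion mechanism is described in the paper itself.

```python
import numpy as np

def blend_face(background: np.ndarray, restored_face: np.ndarray,
               box: tuple, feather: int = 8) -> np.ndarray:
    """Paste a restored face crop back into the full image, using a
    feathered alpha mask so the crop boundary fades into the background
    instead of producing a hard seam (illustrative sketch only)."""
    y0, x0 = box                       # top-left corner of the face crop
    h, w = restored_face.shape[:2]
    # Distance of each pixel from the nearest crop edge, per axis.
    ramp_y = np.minimum(np.arange(h), np.arange(h)[::-1])
    ramp_x = np.minimum(np.arange(w), np.arange(w)[::-1])
    # Mask ramps from 0 at the boundary to 1 in the interior over
    # `feather` pixels; trailing axis added for RGB broadcasting.
    mask = np.minimum.outer(ramp_y, ramp_x).astype(np.float32)
    mask = np.clip(mask / feather, 0.0, 1.0)[..., None]
    out = background.astype(np.float32).copy()
    region = out[y0:y0 + h, x0:x0 + w]
    out[y0:y0 + h, x0:x0 + w] = mask * restored_face + (1.0 - mask) * region
    return out.astype(background.dtype)
```

A simple linear feather like this avoids the hard edges that naive copy-paste produces; production systems often use Poisson (gradient-domain) blending for the same purpose.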
Acknowledgements
This work was supported in part by the project “Digital Twin Application Demonstration for New Museum Public Service Models”, a key research topic under the National Key Research and Development Program of China, Grant No. 2022YFF0904305.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Chen, H., Qi, G. (2024). ADORE: Adaptive Diffusion Optimized Restoration for AI-Generated Facial Imagery. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8551-7
Online ISBN: 978-981-99-8552-4