Abstract
We introduce ADORE (Adaptive Diffusion Optimized Restoration), a method that addresses facial distortion in diffusion-based, language-guided image generation. ADORE adapts its restoration to the characteristics and style of each image, improving the visual fidelity of AI-generated faces, and it mitigates boundary artifacts during face-background fusion, offering a novel way to counter the instability of generative models used for image restoration. Extensive experiments validate ADORE's ability to produce high-quality, style-consistent facial restorations. ADORE also supports text-driven, fine-grained facial refinement, leveraging the underlying model's open-domain synthesis capability. As the first method tailored to improving facial quality in text-to-image generation, ADORE addresses a pressing practical issue and opens new avenues in image generation.
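The abstract's face-background fusion step, where a restored face crop is pasted back into the generated image without visible seams, can be illustrated with a generic feathered alpha-blend. This is a conceptual sketch only, not the authors' implementation: the function name `blend_face`, the mask construction, and all parameters are illustrative assumptions, and ADORE's actual fusion mechanism is described in the paper itself.

```python
import numpy as np

def blend_face(background: np.ndarray, restored_face: np.ndarray,
               box: tuple, feather: int = 8) -> np.ndarray:
    """Paste a restored face crop back into the full image, using a
    feathered alpha mask so the crop boundary fades into the background
    instead of producing a hard seam (illustrative sketch only)."""
    y0, x0 = box                       # top-left corner of the face crop
    h, w = restored_face.shape[:2]
    # Distance of each pixel from the nearest crop edge, per axis.
    ramp_y = np.minimum(np.arange(h), np.arange(h)[::-1])
    ramp_x = np.minimum(np.arange(w), np.arange(w)[::-1])
    # Mask ramps from 0 at the boundary to 1 in the interior over
    # `feather` pixels; trailing axis added for RGB broadcasting.
    mask = np.minimum.outer(ramp_y, ramp_x).astype(np.float32)
    mask = np.clip(mask / feather, 0.0, 1.0)[..., None]
    out = background.astype(np.float32).copy()
    region = out[y0:y0 + h, x0:x0 + w]
    out[y0:y0 + h, x0:x0 + w] = mask * restored_face + (1.0 - mask) * region
    return out.astype(background.dtype)
```

A simple linear feather like this avoids the hard edges that naive copy-paste produces; production systems often use Poisson (gradient-domain) blending for the same purpose.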
Acknowledgements
This work was supported in part by the project “Digital Twin Application Demonstration for New Museum Public Service Models”, a key research topic under the National Key Research and Development Program of China, Grant No. 2022YFF0904305.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, J., Chen, H., Qi, G. (2024). ADORE: Adaptive Diffusion Optimized Restoration for AI-Generated Facial Imagery. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8551-7
Online ISBN: 978-981-99-8552-4