Abstract
Masked face inpainting, which aims to restore realistic facial details and complete textures, remains a challenging task. In this paper, an unsupervised masked face inpainting method based on contrastive learning and an attention mechanism is proposed. First, to remove the need for a paired training dataset, a contrastive learning framework is constructed that compares features extracted from inpainted face image patches with those from the corresponding input masked face image patches. Second, to extract more effective facial features, a feature attention module is designed that focuses on salient feature information and establishes long-range dependencies. In addition, a PatchGAN-based discriminator is refined with spectral normalization, which stabilizes training and guides the generator toward producing more realistic face images. Extensive experimental results show that, in terms of both subjective and objective evaluations as well as face recognition accuracy, our approach overall achieves better masked face inpainting results than the compared approaches.
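The abstract does not spell out the patch-wise contrastive objective, but a standard choice for comparing an inpainted patch against its corresponding input patch (positive) and other patches (negatives) is the InfoNCE loss. The NumPy sketch below is illustrative only, not the authors' implementation; the feature vectors and the temperature `tau` are assumptions.

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE loss for one query patch feature: cross-entropy over
    cosine similarities, with the matching patch as the positive class."""
    q = query / np.linalg.norm(query)
    p = positive / np.linalg.norm(positive)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    # Similarity logits: positive at index 0, negatives after it.
    logits = np.concatenate(([q @ p], n @ q)) / tau
    logits -= logits.max()  # numerical stability for the log-sum-exp
    return -logits[0] + np.log(np.exp(logits).sum())
```

Minimizing this loss pulls the inpainted patch's feature toward the feature of the same spatial location in the input, and pushes it away from features of other locations, which is what lets the network train without paired ground-truth faces.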
Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This study has been supported in part by the National Natural Science Foundation of China (62261025, 62362032, and 62262023), by the Natural Science Foundation of Jiangxi Province (20232BAB212015), and by the Xizang Autonomous Region Science and Technology Plan (XZ202303ZY0005G).
Author information
Contributions
W. Wan and S. Chen wrote the main manuscript text, Y. Zhang developed the methodology, and L. Yao prepared the figures and tables. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Ting Yao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wan, W., Chen, S., Yao, L. et al. Unsupervised masked face inpainting based on contrastive learning and attention mechanism. Multimedia Systems 30, 209 (2024). https://doi.org/10.1007/s00530-024-01411-y