Abstract
A face sketch is a concise representation of the human face, with applications in criminal investigation, biometrics, and social entertainment. Facial attributes are well known to be an underlying representation of a facial description. However, generating vivid face sketches, especially sketches with rich details, from facial attribute text remains challenging because the information carried by the text is limited. Face sketches synthesized by existing work are not realistic; in particular, the facial regions look unnatural or even distorted. We aim to alleviate this problem by introducing prior knowledge of the face, such as landmarks. This paper proposes a method called LAGAN (Landmark Aided Text to Face Sketch Generation). Specifically, we design a novel scale- and translation-invariant similarity loss based on facial landmarks. It measures the mutual similarity between the real sketch and the synthetic sketch, as well as the self-similarity based on the symmetry of face attributes. Furthermore, to counter data deficiency, we construct a novel facial attribute text to sketch dataset called TextCUFSF, built on the CUFSF face sketch dataset. Each sketch has 4 manual annotations. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method for sketch synthesis from attribute text. The code and data are available at https://github.com/chaowentao/LAGAN.
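The abstract does not give the loss formula, so the following is only a minimal sketch of what a scale- and translation-invariant landmark similarity could look like, not the authors' implementation. It assumes 2D landmarks stored as an (N, 2) array, normalization by centroid and RMS radius, a squared-distance penalty, and a vertical symmetry axis through the landmark centroid; the function names and the `left_idx`/`right_idx` pairing are hypothetical.

```python
import numpy as np

def normalize_landmarks(pts):
    """Center landmarks at their centroid and rescale to unit RMS radius.

    Assumption: this is one common way to remove translation and scale;
    the paper may normalize differently.
    """
    pts = np.asarray(pts, dtype=float)
    centered = pts - pts.mean(axis=0)
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / max(scale, 1e-8)

def mutual_similarity_loss(real_pts, fake_pts):
    """Mean squared distance between normalized real and synthetic landmarks."""
    a = normalize_landmarks(real_pts)
    b = normalize_landmarks(fake_pts)
    return float(((a - b) ** 2).sum(axis=1).mean())

def self_similarity_loss(pts, left_idx, right_idx):
    """Penalize left/right asymmetry within a single face.

    Hypothetical pairing: left_idx[i] and right_idx[i] index landmarks that
    should mirror each other about the vertical axis through the centroid.
    """
    n = normalize_landmarks(pts)
    mirrored_left = n[left_idx] * np.array([-1.0, 1.0])  # flip x only
    return float(((mirrored_left - n[right_idx]) ** 2).sum(axis=1).mean())
```

Under these assumptions, shifting or uniformly rescaling a landmark set leaves the mutual loss unchanged, and a perfectly mirror-symmetric face gives a self-similarity loss of zero.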
Acknowledgement
This work was supported by the National Key Research and Development Program of China under grant No. 2019YFC1521104 and by the Natural Science Foundation of China under grants No. 61772050 and No. 62172247.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chao, W., Chang, L., Xi, F., Duan, F. (2022). LAGAN: Landmark Aided Text to Face Sketch Generation. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13537. Springer, Cham. https://doi.org/10.1007/978-3-031-18916-6_12
DOI: https://doi.org/10.1007/978-3-031-18916-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18915-9
Online ISBN: 978-3-031-18916-6
eBook Packages: Computer Science, Computer Science (R0)