Abstract
Single-image training is an active research topic for generative adversarial networks (GANs), particularly for tasks such as image editing and image harmonization. However, existing single-image networks suffer from long training times, poor image quality, and unstable training. To address these problems, we propose a single-image generative adversarial network with a self-attention mechanism and examine how the model changes when the self-attention module is placed at different positions in the generator. We introduce spectral normalization into both the generator and the discriminator to stabilize training, and we compare the influence of the learning rate on the network. We evaluate the model on three representative datasets using visual inspection and quantitative evaluation metrics, and compare it with more advanced recent models. Experiments show that the proposed model outperforms the single-sample generative adversarial network: it reduces the Single Image Fréchet Inception Distance (SIFID) from 4.80 to 2.057 on the challenging Generation dataset, from 0.06 to 0.02 on the Places dataset, and from 0.23 to 0.04 on the LSUN dataset. Training takes one-ninth of the time required by the single-sample generative adversarial network, and the model still captures the overall structure of the single training sample.
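As a rough illustration of the spectral normalization step mentioned above, the following minimal sketch estimates the largest singular value of a weight matrix by power iteration and divides the weights by it, so that the layer's spectral norm is approximately 1. It is not the authors' implementation: the function names `spectral_norm` and `spectrally_normalize`, the iteration count, and the use of plain Python lists in place of a deep learning framework are all assumptions made for this example.

```python
import math
import random


def spectral_norm(weight, n_iter=50, seed=0):
    """Estimate the largest singular value of a 2-D matrix by power iteration.

    `weight` is a list of rows. The name, `n_iter`, and `seed` are
    illustrative choices, not taken from the paper.
    """
    rng = random.Random(seed)
    rows, cols = len(weight), len(weight[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    sigma = 0.0
    for _ in range(n_iter):
        # u = W v, rescaled to unit length
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / u_norm for x in u]
        # v = W^T u; its length converges to the largest singular value
        v = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / (sigma or 1.0) for x in v]
    return sigma


def spectrally_normalize(weight, n_iter=50):
    """Return a copy of `weight` divided by its estimated spectral norm."""
    sigma = spectral_norm(weight, n_iter)
    if sigma == 0.0:
        return [row[:] for row in weight]  # all-zero matrix: nothing to rescale
    return [[w / sigma for w in row] for row in weight]


# Toy 2x2 "layer" whose singular values are 3 and 1
W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectrally_normalize(W)
```

Framework implementations of spectral normalization typically run a single power-iteration step per training update and reuse the vector from the previous update, rather than iterating to convergence as this sketch does.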
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Chen, X., Zhao, H., Yang, D. et al. SA-SinGAN: self-attention for single-image generation adversarial networks. Machine Vision and Applications 32, 104 (2021). https://doi.org/10.1007/s00138-021-01228-z
DOI: https://doi.org/10.1007/s00138-021-01228-z