
SA-SinGAN: self-attention for single-image generation adversarial networks

Original Paper · Machine Vision and Applications

Abstract

Training a generative adversarial network on a single image is an active research topic, with applications in tasks such as image editing and image harmonization. However, existing networks suffer from long training times, poor image quality, and unstable training. To address these issues, we propose a single-image generative adversarial network with a self-attention mechanism and study how the model changes when self-attention is placed at different positions in the generator. We introduce spectral normalization into both the generator and the discriminator to stabilize training, and we compare the influence of the learning rate on the network. We evaluate the model by human visual inspection and by quantitative metrics on three representative datasets, comparing it against current state-of-the-art models. Experiments show that our model outperforms the single-image generative adversarial network SinGAN, reducing the Single Image Fréchet Inception Distance (SIFID) from 4.80 to 2.057 on the challenging Generation dataset, from 0.06 to 0.02 on the Places dataset, and from 0.23 to 0.04 on the LSUN dataset. Our model also trains in one-ninth the time of SinGAN while still capturing the overall structure of the single training sample.
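To make the two mechanisms named above concrete, the sketch below shows a SAGAN-style self-attention block and spectral normalization (via PyTorch's built-in `spectral_norm` wrapper) inside a toy single-scale generator. This is a minimal illustration under stated assumptions, not the paper's exact architecture: the channel widths, the placement of `SelfAttention` after the middle convolution block, and all helper names (`sn_conv_block`, `ToyGenerator`) are hypothetical choices made for brevity. The paper normalizes both the generator and the discriminator; only the generator side is sketched here.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class SelfAttention(nn.Module):
    """SAGAN-style self-attention over all spatial positions of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.key = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, 1))
        # Learnable gate mixing attended features back into the input.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).view(b, -1, n).permute(0, 2, 1)   # B x N x C//8
        k = self.key(x).view(b, -1, n)                      # B x C//8 x N
        v = self.value(x).view(b, c, n)                     # B x C x N
        attn = torch.softmax(torch.bmm(q, k), dim=-1)       # B x N x N
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x


def sn_conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv block whose weights are spectrally normalized to stabilize training."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_ch, out_ch, 3, padding=1)),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )


class ToyGenerator(nn.Module):
    """Single-scale generator with self-attention after the middle block."""

    def __init__(self, nf: int = 32):
        super().__init__()
        self.head = sn_conv_block(3, nf)
        self.mid = sn_conv_block(nf, nf)
        self.attn = SelfAttention(nf)  # one candidate placement among several
        self.tail = nn.Conv2d(nf, 3, 3, padding=1)

    def forward(self, noise: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.tail(self.attn(self.mid(self.head(noise)))))


if __name__ == "__main__":
    g = ToyGenerator()
    print(g(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```

Initializing `gamma` at zero is the usual trick when inserting attention into a GAN: the block acts as an identity at the start of training, and the generator gradually learns how strongly to weight the non-local features, which helps keep early training stable.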



Author information


Corresponding author

Correspondence to Hongdong Zhao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, X., Zhao, H., Yang, D. et al. SA-SinGAN: self-attention for single-image generation adversarial networks. Machine Vision and Applications 32, 104 (2021). https://doi.org/10.1007/s00138-021-01228-z

