
Generative adversarial network with hybrid attention and compromised normalization for multi-scene image conversion

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

To generate high-quality realistic images, this paper proposes an image conversion algorithm based on a hybrid attention generative adversarial network. The network consists of a generator and a discriminator, which are trained jointly through the loss function. The generator uses a three-stage structure of down-sampling, residual and up-sampling blocks, where the residual blocks employ a hybrid attention mechanism. A compromised instance and layer normalization is also proposed, in which the two normalizations are weighted by the output of a fully connected layer. A multi-scale PatchGAN is introduced as the discriminator. The proposed network produces more realistic images with a new loss function comprising four terms: generative adversarial loss, \(L_1\) regularization loss, VGG loss and feature matching loss. Experimental results demonstrate that the proposed method produces more realistic and detailed images than state-of-the-art methods.
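The abstract does not spell out the compromised normalization in detail; the following is a minimal PyTorch sketch under the assumption that the fully connected layer predicts per-channel weights that blend the instance-normalized and layer-normalized outputs, similar in spirit to adaptive layer-instance normalization. All names (`CompromisedNorm`, `cond_dim`, `rho`, the conditioning vector `s`) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class CompromisedNorm(nn.Module):
    """Blend instance and layer normalization with per-channel weights
    predicted by a fully connected layer (illustrative sketch only)."""

    def __init__(self, num_channels: int, cond_dim: int = 64, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Fully connected layer producing one blending weight per channel.
        # Conditioning on a feature vector `s` is an assumption of this sketch.
        self.fc = nn.Linear(cond_dim, num_channels)

    def forward(self, x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # Instance normalization: per-sample, per-channel spatial statistics.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)

        # Layer normalization: per-sample statistics over channels and space.
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)

        # Compromise: sigmoid keeps the per-channel weight rho in (0, 1).
        rho = torch.sigmoid(self.fc(s)).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        return rho * x_in + (1.0 - rho) * x_ln
```

Likewise, a hedged sketch of the four-term generator objective named in the abstract. The least-squares adversarial form and the `lambda_*` weights are assumptions of this sketch, not values reported by the paper.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, fake, real,
                   vgg_fake, vgg_real, d_feats_fake, d_feats_real,
                   lambda_l1=10.0, lambda_vgg=10.0, lambda_fm=10.0):
    """Adversarial + L1 + VGG perceptual + discriminator feature matching."""
    adv = F.mse_loss(d_fake_logits, torch.ones_like(d_fake_logits))
    l1 = F.l1_loss(fake, real)
    vgg = sum(F.l1_loss(f, r) for f, r in zip(vgg_fake, vgg_real))
    fm = sum(F.l1_loss(f, r) for f, r in zip(d_feats_fake, d_feats_real))
    return adv + lambda_l1 * l1 + lambda_vgg * vgg + lambda_fm * fm
```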



Acknowledgements

This work was funded by the Social Science Foundation of Shaanxi Province (Grant No. 2019H010), the New Star of Youth Science and Technology of Shaanxi Province (Grant No. 2020KJXX-007) and the Open Project Program Foundation of the Key Laboratory of Opto-Electronics Information Processing, Chinese Academy of Sciences (OEIP-O-202009). The numerical calculations in this paper were performed on the supercomputing system at the Supercomputing Center of Wuhan University.

Author information

Corresponding authors

Correspondence to Jinsheng Xiao or Yongqin Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no commercial or associative interests that represent a conflict of interest in connection with the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xiao, J., Zhang, S., Yao, Y. et al. Generative adversarial network with hybrid attention and compromised normalization for multi-scene image conversion. Neural Comput & Applic 34, 7209–7225 (2022). https://doi.org/10.1007/s00521-021-06841-7

