Abstract
Photorealistic style translation has gained significant attention in computer vision and graphics because of its potential applications in areas such as content generation, artistic expression, and image editing. In this paper, we propose an improved hybrid Generative Adversarial Network (GAN) framework for photorealistic style translation. The model aims to overcome limitations observed in the existing literature by leveraging the strengths of both GANs and autoencoders. A fast and efficient feature extractor based on EfficientNet-B2 is developed to improve the performance of the style transfer pipeline. Moreover, two loss functions are proposed that further improve the model's accuracy and realism. We conduct extensive experiments on widely used benchmark datasets and demonstrate the effectiveness of the proposed model. Finally, we analyse the results and discuss potential avenues for future research in photorealistic style translation.
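The paper's implementation is not reproduced here, but to make the backbone concrete, the following is a minimal PyTorch sketch of the kind of EfficientNet-B2 feature extractor the abstract describes: a frozen pretrained trunk that returns intermediate feature maps for downstream style transfer. The use of torchvision's pretrained model and the choice of which stages to tap (stages=(1, 2, 4, 6)) are illustrative assumptions, not details taken from the paper.

    # Sketch only (not the authors' released code): a frozen EfficientNet-B2
    # trunk used as a multi-scale feature extractor for style transfer.
    import torch
    import torchvision.models as models

    class EfficientNetB2Features(torch.nn.Module):
        """Returns intermediate feature maps from a pretrained EfficientNet-B2."""

        def __init__(self, stages=(1, 2, 4, 6)):  # tapped stages: an assumption
            super().__init__()
            weights = models.EfficientNet_B2_Weights.IMAGENET1K_V1
            backbone = models.efficientnet_b2(weights=weights)
            self.blocks = backbone.features  # sequential stages 0..8
            self.stages = set(stages)
            for p in self.parameters():      # freeze: used only as an extractor
                p.requires_grad_(False)

        def forward(self, x):
            feats = []
            for i, block in enumerate(self.blocks):
                x = block(x)
                if i in self.stages:
                    feats.append(x)
            return feats

    extractor = EfficientNetB2Features().eval()
    with torch.no_grad():
        maps = extractor(torch.randn(1, 3, 256, 256))
    print([f.shape for f in maps])  # multi-scale feature maps, coarse to fine

Such frozen multi-scale features are commonly fed to content and style losses; the paper's two proposed loss functions are not specified in the abstract, so none are sketched here.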
Acknowledgement
This research is supported by the Natural Science Foundation of China (No. 61972183).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cheng, K., Tahir, R., Wan, H. (2025). Advancements in Photorealistic Style Translation with a Hybrid Generative Adversarial Network. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15034. Springer, Singapore. https://doi.org/10.1007/978-981-97-8505-6_24
Print ISBN: 978-981-97-8504-9
Online ISBN: 978-981-97-8505-6