Abstract
Photorealistic style translation has gained significant attention in computer vision and graphics because of its potential applications in areas such as content generation, artistic expression, and image editing. In this paper, we propose an improved hybrid Generative Adversarial Network (GAN) framework for photorealistic style translation. The model aims to overcome limitations observed in the existing literature by leveraging the strengths of both GANs and autoencoders. A fast and efficient feature extractor based on EfficientNet-B2 is developed to improve the performance of the style transfer pipeline. Moreover, two loss functions are proposed that further improve the model's accuracy and realism. We conduct extensive experiments on widely used benchmark datasets and demonstrate the effectiveness of the proposed model. Finally, we analyse the results and discuss potential avenues for future research in photorealistic style translation.
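The paper's implementation is not reproduced here, but to make the backbone concrete, the following is a minimal PyTorch sketch of the kind of EfficientNet-B2 feature extractor the abstract describes: a frozen pretrained trunk that returns intermediate feature maps for downstream style transfer. The use of torchvision's pretrained model and the choice of which stages to tap (stages=(1, 2, 4, 6)) are illustrative assumptions, not details taken from the paper.

    # Sketch only (not the authors' released code): a frozen EfficientNet-B2
    # trunk used as a multi-scale feature extractor for style transfer.
    import torch
    import torchvision.models as models

    class EfficientNetB2Features(torch.nn.Module):
        """Returns intermediate feature maps from a pretrained EfficientNet-B2."""

        def __init__(self, stages=(1, 2, 4, 6)):  # tapped stages: an assumption
            super().__init__()
            weights = models.EfficientNet_B2_Weights.IMAGENET1K_V1
            backbone = models.efficientnet_b2(weights=weights)
            self.blocks = backbone.features  # sequential stages 0..8
            self.stages = set(stages)
            for p in self.parameters():      # freeze: used only as an extractor
                p.requires_grad_(False)

        def forward(self, x):
            feats = []
            for i, block in enumerate(self.blocks):
                x = block(x)
                if i in self.stages:
                    feats.append(x)
            return feats

    extractor = EfficientNetB2Features().eval()
    with torch.no_grad():
        maps = extractor(torch.randn(1, 3, 256, 256))
    print([f.shape for f in maps])  # multi-scale feature maps, coarse to fine

Such frozen multi-scale features are commonly fed to content and style losses; the paper's two proposed loss functions are not specified in the abstract, so none are sketched here.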
Acknowledgement
This research is supported by the Natural Science Foundation of China (No. 61972183).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cheng, K., Tahir, R., Wan, H. (2025). Advancements in Photorealistic Style Translation with a Hybrid Generative Adversarial Network. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15034. Springer, Singapore. https://doi.org/10.1007/978-981-97-8505-6_24
Print ISBN: 978-981-97-8504-9
Online ISBN: 978-981-97-8505-6