Abstract
Image generation has long been an important research direction in computer vision, with rich applications in virtual reality, image design, and video synthesis. In this paper, we focus on arbitrary style transfer based on semantic images at high resolution (\(512\times 1024\)). We propose a new multi-channel generative adversarial network that uses fewer parameters to generate multi-style images. The framework consists of a content feature extraction network, a style feature extraction network, and a content-style feature fusion network. Our qualitative experiments show that the proposed multi-style image generation network can efficiently generate semantic-based, high-quality images with multiple artistic styles, greater clarity, and richer details. A user preference study shows that the results generated by our method are preferred by participants, and a speed study shows that our method generates results faster than current state-of-the-art methods. We publicly release the source code of our project at https://github.com/JuanMaoHSQ/Multi-style-image-generation-based-on-semantic-image.
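The abstract describes fusing content features with style features. One common primitive for such content-style fusion is adaptive instance normalization (AdaIN), which re-scales the per-channel statistics of the content features to match those of the style features; whether the paper's fusion network uses exactly this operation is an assumption here, so the following numpy sketch is purely illustrative.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """AdaIN-style fusion: align the per-channel mean and standard
    deviation of content features to those of style features.

    content_feat, style_feat: arrays of shape (C, H, W).
    """
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # Whiten the content features, then re-color with style statistics.
    normalized = (content_feat - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))   # e.g. semantic-image features
style = rng.normal(3.0, 2.0, size=(4, 8, 8))     # e.g. artwork features
fused = adain(content, style)
# Per-channel means of the fused features now match the style features.
print(np.allclose(fused.mean(axis=(1, 2)), style.mean(axis=(1, 2)), atol=1e-3))
```

The fused tensor keeps the spatial layout of the content features (hence the semantic structure) while adopting the style's channel statistics, which is why this operation is a popular building block in arbitrary style transfer.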
Data Availability
The datasets generated and analyzed during the current study are available in the Cityscapes repository, https://www.cityscapes-dataset.com/, and the WikiArt repository, https://www.wikiart.org/.
Funding
This work is supported by the National Natural Science Foundation of China (61807002).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Y., Li, D., Li, B. et al. Multi-style image generation based on semantic image. Vis Comput 40, 3411–3426 (2024). https://doi.org/10.1007/s00371-023-03042-2