CaVIT: An integrated method for image style transfer using parallel CNN and vision transformer


Abstract

This study focuses on image style transfer: generating images in a desired style while preserving the underlying content structure. Existing models struggle to represent both content and style features accurately. To address this, we propose CaVIT, an integrated style transfer method that combines a Convolutional Neural Network (CNN) and a Vision Transformer (VIT) in parallel. The method encodes style features with a VGG-19 network extended with residual blocks for finer refinement, and introduces the PA-Trans Encoder Layer, inspired by the standard Transformer encoder layer, to encode content features efficiently while preserving the complete content structure. The fused features are then decoded into stylized images by a CNN decoder. Qualitative and quantitative evaluations show that the proposed method outperforms existing models and delivers high-quality results.
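The pipeline described above can be summarized in code. The following PyTorch sketch mirrors only what the abstract states: a VGG-19 style branch refined by residual blocks, a transformer-based content branch standing in for the PA-Trans Encoder Layer, and a CNN decoder over the fused features. The layer sizes, the cut point in VGG-19, the patch size, and the fusion-by-addition step are illustrative assumptions, not the authors' implementation.

```python
# Structural sketch of the CaVIT pipeline as described in the abstract.
# All layer sizes, the residual-block design, and the fusion step are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torchvision.models as models

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class StyleEncoder(nn.Module):
    """CNN branch: VGG-19 features (here cut at relu4_1) plus residual refinement."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=None).features  # pretrained weights assumed in practice
        self.vgg = nn.Sequential(*list(vgg.children())[:21])  # through relu4_1 (512 channels)
        self.refine = nn.Sequential(ResidualBlock(512), ResidualBlock(512))
    def forward(self, x):
        return self.refine(self.vgg(x))

class ContentEncoder(nn.Module):
    """Transformer branch; a standard encoder stands in for the PA-Trans Encoder Layer."""
    def __init__(self, dim=512, patch=8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patch embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
    def forward(self, x):
        z = self.embed(x)                      # B x C x H' x W'
        b, c, h, w = z.shape
        tokens = self.encoder(z.flatten(2).transpose(1, 2))  # B x (H'W') x C
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class Decoder(nn.Module):
    """CNN decoder: upsamples fused features back to an RGB image."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class CaVITSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.style_enc = StyleEncoder()
        self.content_enc = ContentEncoder()
        self.decoder = Decoder()
    def forward(self, content, style):
        fs = self.style_enc(style)      # style features, CNN branch
        fc = self.content_enc(content)  # content features, transformer branch
        # The fusion step is unspecified in the abstract; simple addition used here.
        fused = fc + nn.functional.interpolate(fs, size=fc.shape[-2:])
        return self.decoder(fused)

stylized = CaVITSketch()(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(stylized.shape)  # torch.Size([1, 3, 256, 256])
```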




Data Availability

The data that support the findings of this study are publicly available from the official dataset websites: https://cocodataset.org/#download (MS-COCO) and https://paperswithcode.com/dataset/wikiart (WikiArt).
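For readers reproducing the training setup, a minimal, hypothetical dataset wrapper for images downloaded from the links above might look as follows; the class and directory layout are illustrative, not part of the study.

```python
# Hypothetical loader for the data named above (MS-COCO for content images,
# WikiArt for style images). Paths are placeholders for local downloads.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class FlatImageFolder(Dataset):
    """Yields RGB images from a directory tree, e.g. coco/train2017 or wikiart/."""
    def __init__(self, root, transform=None):
        self.paths = sorted(p for p in Path(root).rglob('*')
                            if p.suffix.lower() in {'.jpg', '.jpeg', '.png'})
        self.transform = transform
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, i):
        img = Image.open(self.paths[i]).convert('RGB')
        return self.transform(img) if self.transform else img
```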


Author information

Corresponding author

Correspondence to ZaiFang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Z., Lu, S., Guo, Q. et al. CaVIT: An integrated method for image style transfer using parallel CNN and vision transformer. Appl Intell 55, 306 (2025). https://doi.org/10.1007/s10489-024-06114-5
