Abstract
Arbitrary-style-per-model (ASPM) style transfer algorithms transfer arbitrary styles with a single model. Statistics-based ASPM algorithms, represented by adaptive instance normalization (AdaIN), apply instance normalization to the target features and then perform an affine transformation on them. These algorithms are computationally efficient and easy to embed in convolutional neural networks, so they are widely used in image synthesis tasks to control the style of the resulting images. However, the style of a stylized image may be a blend of the styles of the content and style images, which suggests that these methods do not transfer styles accurately. In this work, we rethink the function of AdaIN in controlling style. We show that the role of AdaIN is to (1) give each input content image a specific optimization target, (2) dynamically set cross-channel correlations, and (3) act as a feature selector when combined with an activation function. Accordingly, we propose adaptive style modulation (AdaSM), which fully leverages these three roles and thereby enables more precise control of global style. Experimental results show that AdaSM provides superior style controllability, alleviates the style blending problem, and outperforms state-of-the-art methods in artistic style transfer tasks.
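For readers unfamiliar with the AdaIN operation the abstract refers to, the following is a minimal NumPy sketch of the standard formulation: per-channel instance normalization of the content features, followed by an affine transformation whose scale and shift are the per-channel standard deviation and mean of the style features. It illustrates AdaIN only, not the proposed AdaSM. The function name `adain`, the `(C, H, W)` feature layout, and the `eps` value are assumptions made for illustration.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Sketch of adaptive instance normalization (AdaIN).

    content, style: feature maps of shape (C, H, W); the spatial sizes
    of the two inputs may differ. The layout and eps are illustrative
    assumptions, not taken from the article.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = np.sqrt(style.var(axis=(1, 2), keepdims=True) + eps)

    # Instance normalization of the content features.
    normalized = (content - c_mean) / c_std

    # Affine transformation with the style statistics as scale and shift.
    return s_std * normalized + s_mean
```

After this operation, each channel of the output carries approximately the mean and standard deviation of the corresponding style channel, while the spatial arrangement of the content features is preserved.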
Data Availability
The datasets used during this study are available upon request to the authors.
Acknowledgements
The research was supported by the Key Laboratory of Spectral Imaging Technology, Xi’an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, the Key Laboratory of Biomedical Spectroscopy of Xi’an, the Outstanding Award for Talent Project of the Chinese Academy of Sciences, the “From 0 to 1” Original Innovation Project of the Basic Frontier Scientific Research Program of the Chinese Academy of Sciences, and the Autonomous Deployment Project of the Xi’an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences.
Ethics declarations
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhang, Y., Hu, B., Huang, Y. et al. Adaptive Style Modulation for Artistic Style Transfer. Neural Process Lett 55, 6213–6230 (2023). https://doi.org/10.1007/s11063-022-11135-7