DOI: 10.1145/3689094.3689468
research-article

Scene Classification on Fine Arts with Style Transfer

Published: 28 October 2024

Abstract

Large-scale photographic datasets such as ImageNet and Places365 have significantly improved scene classification performance on natural images. Scene classification in artistic imagery, however, remained underexplored until recently. We propose a multi-step transfer learning technique that gradually adapts scene recognition algorithms from photographs to artistic scene representations. Our experiments demonstrate that integrating a stylized version of Places365 and fine-tuning on a weakly supervised artistic scene dataset drastically increases scene recognition performance on artworks. We evaluate our method with two state-of-the-art scene recognition methods and analyze the impact of our adaptations in a series of ablation studies.
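The abstract describes the adaptation only at a high level. As an illustration of the general idea of multi-step transfer learning, here is a minimal sketch under loudly stated assumptions; it is not the authors' implementation. A toy linear softmax classifier on synthetic feature vectors stands in for the scene recognition networks; `stylize` is a hypothetical stand-in for neural style transfer that merely shifts photo features toward a synthetic "art" offset while keeping the labels; and random label noise stands in for weak supervision. All names, data, and hyperparameters (`make_split`, `ART_SHIFT`, learning rates, epoch counts) are hypothetical.

```python
import numpy as np

# Toy setup: synthetic data for illustration only, not the paper's datasets.
rng = np.random.default_rng(0)
N_FEATURES, N_CLASSES = 8, 3
CENTERS = rng.normal(scale=3.0, size=(N_CLASSES, N_FEATURES))  # one prototype per scene class
ART_SHIFT = rng.normal(scale=2.0, size=N_FEATURES)             # hypothetical "artistic domain" offset


def make_split(n, shift, label_noise=0.0):
    """Sample toy feature vectors; label_noise randomly reassigns a fraction
    of labels to mimic weak (partly wrong) supervision."""
    y = rng.integers(0, N_CLASSES, size=n)
    X = CENTERS[y] + shift + rng.normal(size=(n, N_FEATURES))
    flip = rng.random(n) < label_noise
    y_obs = np.where(flip, rng.integers(0, N_CLASSES, size=n), y)
    return X, y, y_obs


def stylize(X, strength=0.5):
    """Hypothetical stand-in for neural style transfer: moves photo features
    partway toward the art domain while the scene labels stay unchanged."""
    return X + strength * ART_SHIFT


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fine_tune(W, b, X, y, lr, epochs):
    """Full-batch gradient descent on mean cross-entropy, starting from (W, b)."""
    Y = np.eye(W.shape[1])[y]  # one-hot targets
    for _ in range(epochs):
        G = (softmax(X @ W + b) - Y) / len(X)  # gradient of the loss w.r.t. the logits
        W = W - lr * X.T @ G
        b = b - lr * G.sum(axis=0)
    return W, b


def accuracy(W, b, X, y):
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))


def multi_step_transfer(stages, n_features, n_classes):
    """Fine-tune one model through the stages in order, carrying the weights
    of each stage into the next instead of re-initialising."""
    W, b = np.zeros((n_features, n_classes)), np.zeros(n_classes)
    history = []
    for name, X, y, lr, epochs in stages:
        W, b = fine_tune(W, b, X, y, lr, epochs)
        history.append((name, accuracy(W, b, X, y)))
    return W, b, history


X_photo, y_photo, _ = make_split(600, shift=0.0)
X_sty = stylize(X_photo)  # same samples and labels, "restyled"
X_art, _, y_art_weak = make_split(150, shift=ART_SHIFT, label_noise=0.2)
X_test, y_test, _ = make_split(300, shift=ART_SHIFT)

stages = [
    ("photos", X_photo, y_photo, 0.05, 200),
    ("stylized photos", X_sty, y_photo, 0.03, 100),
    ("weakly labelled art", X_art, y_art_weak, 0.01, 100),
]
W, b, history = multi_step_transfer(stages, N_FEATURES, N_CLASSES)
for name, acc in history:
    print(f"{name}: train accuracy {acc:.2f}")
print(f"held-out toy art accuracy: {accuracy(W, b, X_test, y_test):.2f}")
```

The design choice this sketch illustrates is that each stage starts from the previous stage's weights rather than from scratch, so the stylized data can act as a bridge between the photographic and artistic domains; the decreasing learning rates are an illustrative choice, not values taken from the paper.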


Cited By

  • (2024) Recognizing sensory gestures in historical artworks. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-20502-6. Online publication date: 26-Dec-2024


    Published In

    SUMAC '24: Proceedings of the 6th workshop on the analySis, Understanding and proMotion of heritAge Contents
    October 2024
    67 pages
    ISBN:9798400712050
    DOI:10.1145/3689094
    • Program Chairs: Valerie Gouet-Brunet, Ronak Kosti, Li Weng
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. fine arts
    2. scene classification
    3. style transfer
    4. transfer learning


    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    Overall Acceptance Rate 5 of 6 submissions, 83%

    Article Metrics

    • Downloads (last 12 months): 48
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 05 Mar 2025
