Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections

  • Conference paper
Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Abstract

Art-historic documents often contain multimodal data in the form of images of artworks together with metadata, descriptions, or interpretations thereof. Most research efforts have focused on either image analysis or text analysis in isolation, since the associations between the two modalities are usually lost during digitization. In this work, we focus on aligning images and textual descriptions in art-historic digital collections. To this end, we reproduce an existing approach that learns alignments in a semi-supervised fashion. We identify several challenges in automatically aligning images and texts, specifically for the cultural heritage domain, which limit the scalability of previous works. To improve alignment performance, we introduce several enhancements that extend the existing approach and show promising results.

N. Jain and C. Bartz contributed equally.

Notes

  1. openglam.org.

  2. https://wpi.art/.

  3. https://github.com/HPI-DeepLearning/semantic_analysis_of_cultural_heritage_data.

Acknowledgement

We thank the Wildenstein Plattner Institute for providing access to their art-historic archives.

Corresponding author

Correspondence to Christian Bartz.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Jain, N., Bartz, C., Bredow, T., Metzenthin, E., Otholt, J., Krestel, R. (2021). Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_37

  • DOI: https://doi.org/10.1007/978-3-030-68796-0_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68795-3

  • Online ISBN: 978-3-030-68796-0

  • eBook Packages: Computer Science, Computer Science (R0)
