Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections

Jain, Nitisha; Bartz, Christian; Bredow, Tobias; Metzenthin, Emanuel; Otholt, Jona; Krestel, Ralf

doi:10.1007/978-3-030-68796-0_37

Nitisha Jain (ORCID: orcid.org/0000-0002-7429-7949),
Christian Bartz (ORCID: orcid.org/0000-0002-1800-0442),
Tobias Bredow,
Emanuel Metzenthin,
Jona Otholt &
Ralf Krestel (ORCID: orcid.org/0000-0002-5036-8589)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12663)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Art-historic documents often contain multimodal data in the form of images of artworks together with metadata, descriptions, or interpretations thereof. Most research efforts have focused on image analysis or text analysis in isolation, since the associations between the two modes are usually lost during digitization. In this work, we focus on the task of aligning images and textual descriptions in art-historic digital collections. To this end, we reproduce an existing approach that learns alignments in a semi-supervised fashion. We identify several challenges in automatically aligning images and texts, specific to the cultural heritage domain, which limit the scalability of previous works. To improve alignment performance, we introduce various enhancements that extend the existing approach and show promising results.

N. Jain and C. Bartz contributed equally.

Acknowledgement

We thank the Wildenstein Plattner Institute for providing access to their art-historic archives.

Author information

Authors and Affiliations

Hasso Plattner Institute, University of Potsdam, 14482, Potsdam, Germany
Nitisha Jain, Christian Bartz, Tobias Bredow, Emanuel Metzenthin, Jona Otholt & Ralf Krestel

Corresponding author

Correspondence to Christian Bartz.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Alberto Del Bimbo
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Rita Cucchiara
Department of Computer Science, Boston University, Boston, MA, USA
Stan Sclaroff
Dipartimento di Matematica e Informatica, University of Catania, Catania, Italy
Giovanni Maria Farinella
Cloud & AI, JD.COM, Beijing, China
Tao Mei
Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Marco Bertini
Computational Sciences Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Puebla, Mexico
Hugo Jair Escalante
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Roberto Vezzani

Cite this paper

Jain, N., Bartz, C., Bredow, T., Metzenthin, E., Otholt, J., Krestel, R. (2021). Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_37

DOI: https://doi.org/10.1007/978-3-030-68796-0_37
Published: 21 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68795-3
Online ISBN: 978-3-030-68796-0
eBook Packages: Computer Science, Computer Science (R0)
