HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans

Ivana Tanasijević (Department of Computer Science, Faculty of Mathematics, University of Belgrade, Belgrade, Serbia)
Gordana Pavlović-Lažetić (Department of Computer Science, Faculty of Mathematics, University of Belgrade, Belgrade, Serbia)

The Electronic Library

ISSN: 0264-0473

Article publication date: 28 October 2020

Issue publication date: 12 December 2020

Abstract

Purpose

This paper presents a methodology for automatically annotating a multimedia collection of intangible cultural heritage, consisting mostly of interviews. The assigned annotations make the collection searchable.

Design/methodology/approach

Annotation relies on automatic metadata extraction: named entities and topics are extracted from textual descriptions using a rule-based approach supported by vocabulary resources, a compiled domain-specific classification scheme and domain-oriented corpus analysis.
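As a rough illustration of vocabulary-supported, rule-based annotation, the sketch below matches a small word list and a small topic scheme against a textual description. It is not the authors' implementation: the paper encodes its rules as finite state transducers, whereas this sketch substitutes plain regular expressions, and the names `GAZETTEER`, `TOPIC_TRIGGERS` and `annotate`, along with all vocabulary entries, are hypothetical.

```python
import re

# Hypothetical vocabulary resource: entity label -> known surface forms.
GAZETTEER = {
    "LOCATION": ["Belgrade", "Serbia", "Balkans"],
}

# Hypothetical classification scheme: topic label -> trigger words.
TOPIC_TRIGGERS = {
    "music": ["song", "singing", "instrument"],
    "customs": ["wedding", "ritual", "holiday"],
}


def _compile(words):
    """Build a whole-word, case-insensitive pattern, longest entries first."""
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


ENTITY_PATTERNS = {label: _compile(ws) for label, ws in GAZETTEER.items()}
TOPIC_PATTERNS = {topic: _compile(ws) for topic, ws in TOPIC_TRIGGERS.items()}


def annotate(text):
    """Return named entities (text, label, offset) and topics found in text."""
    entities = [
        (match.group(0), label, match.start())
        for label, pattern in ENTITY_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    entities.sort(key=lambda entity: entity[2])  # order by position in text
    topics = sorted(
        topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(text)
    )
    return {"entities": entities, "topics": topics}
```

Exact string matching of this kind ignores inflection, which matters for morphologically rich languages; that is one reason a transducer-based approach backed by language-specific resources may be preferable in practice.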

Findings

The proposed methodology, applied to the cultural heritage of the Balkans, achieves an F-measure of 0.87 for named entity annotation and 0.90 for topic annotation. It encapsulates domain-specific and language-specific knowledge in collections of finite state transducers and allows further improvements.
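For readers unfamiliar with the metric, the following sketch shows how an F-measure can be computed from a set of reference annotations and a set of automatically assigned ones. The abstract does not say which variant was used, so the balanced F1 (the harmonic mean of precision and recall) is an assumption here, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compare predicted annotations with gold ones; assumes balanced F1."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    # Precision: share of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Called with four gold annotations and four predictions of which three are correct, the function yields a precision, recall and F1 of 0.75 each.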

Originality/value

Although cultural heritage plays a significant role in shaping the identity of a group or an individual, it is a specialised domain that has not yet been fully explored for many languages. The proposed methodology can be used to incorporate natural language processing techniques into digital libraries of cultural heritage.

Acknowledgements

This work has been carried out within the project III47003 of the Ministry of Science and Technological Development, Serbia.

Citation

Tanasijević, I. and Pavlović-Lažetić, G. (2020), "HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans", The Electronic Library, Vol. 38 No. 5/6, pp. 905-918. https://doi.org/10.1108/EL-03-2020-0052

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
