-
Views
-
Cite
Cite
Suleiman Odat, Tudor Groza, Jane Hunter, Extracting structured data from publications in the Art Conservation Domain, Digital Scholarship in the Humanities, Volume 30, Issue 2, June 2015, Pages 225–245, https://doi.org/10.1093/llc/fqu002
- Share Icon Share
Abstract
The most common method of publishing new discoveries about art conservation techniques and research has been through traditional full-text publications. Such corpora typically only support searching via metadata (e.g. title, authors, or keywords) and full-text. In particular, it is difficult to discover valuable information about the chemical processes, experimental results, or preservation treatments associated with the conservation of paintings from a specific genre. This article addresses this problem by focusing on the extraction of structured data (that complies with a pre-defined ontology) from a distributed corpus of publications about painting conservation. Our specific extraction method involves a unique combination of named entity recognition (using gazetteer-based and machine learning-based methods) followed by relationship extraction (using rule-based and machine learning-based methods). The resulting structured data are stored in a resource description framework triple store, and a Web-based graphical user interface enables the SPARQL querying, retrieval, and display of the search results. The results from applying our techniques to a corpus of publications on art conservation indicate that our approach achieves higher quality precision and recall in extracting named entities and relations from publications, relative to alternative existing approaches.