Extracting structured data from publications in the Art Conservation Domain

Odat, Suleiman; Groza, Tudor; Hunter, Jane

doi:10.1093/llc/fqu002

Abstract

New discoveries about art conservation techniques and research have most commonly been published as traditional full-text publications. Such corpora typically support searching only via metadata (e.g. title, authors, or keywords) and full text. In particular, it is difficult to discover valuable information about the chemical processes, experimental results, or preservation treatments associated with the conservation of paintings from a specific genre. This article addresses this problem by extracting structured data (complying with a pre-defined ontology) from a distributed corpus of publications about painting conservation. Our extraction method involves a unique combination of named entity recognition (using gazetteer-based and machine learning-based methods) followed by relationship extraction (using rule-based and machine learning-based methods). The resulting structured data are stored in a Resource Description Framework (RDF) triple store, and a Web-based graphical user interface enables SPARQL querying, retrieval, and display of the search results. Results from applying our techniques to a corpus of publications on art conservation indicate that our approach achieves higher precision and recall in extracting named entities and relations than existing alternative approaches.
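To make the pipeline described above concrete, the following is a minimal sketch of only its gazetteer-based entity recognition and rule-based relation extraction stages, written with the Python standard library. It is not the authors' implementation: the gazetteer entries, the entity types, the trigger patterns, and the `ex:` predicate names are all hypothetical, and the machine learning-based components, the ontology, and the triple store are omitted.

```python
import re
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    text: str   # surface form as it appears in the sentence
    label: str  # entity type taken from the gazetteer
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


# Hypothetical gazetteer (term -> entity type); illustrative entries only.
GAZETTEER = {
    "lead white": "Pigment",
    "linseed oil": "Binder",
    "raman spectroscopy": "Technique",
    "varnish": "Material",
}

# Hypothetical rules: (subject type, trigger pattern, object type, predicate).
# The trigger is searched in the text lying between the two mentions.
RULES = [
    ("Pigment", re.compile(r"\b(identified|detected)\b.*\b(using|by|with)\b", re.I),
     "Technique", "ex:identifiedBy"),
    ("Pigment", re.compile(r"\b(bound|mixed)\b.*\b(in|with)\b", re.I),
     "Binder", "ex:boundIn"),
]


def find_entities(sentence: str) -> List[Entity]:
    """Gazetteer lookup: case-insensitive whole-word matching of known terms."""
    found = []
    for term, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append(Entity(m.group(0), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


def extract_triples(sentence: str) -> List[Tuple[str, str, str]]:
    """Rule-based relation extraction over ordered pairs of entity mentions."""
    entities = find_entities(sentence)
    triples = []
    for i, subj in enumerate(entities):
        for obj in entities[i + 1:]:
            between = sentence[subj.end:obj.start]
            for s_type, trigger, o_type, predicate in RULES:
                if (subj.label == s_type and obj.label == o_type
                        and trigger.search(between)):
                    triples.append((subj.text, predicate, obj.text))
    return triples


example = "Lead white was identified in the ground layer using Raman spectroscopy."
print(extract_triples(example))
# [('Lead white', 'ex:identifiedBy', 'Raman spectroscopy')]
```

Each resulting (subject, predicate, object) tuple has the shape of an RDF triple, so in a fuller system such tuples could be mapped to ontology terms, loaded into a triple store, and queried with SPARQL.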
