Published June 30, 2023 | Version v1
Conference paper Open

From unstructured texts to RDF-star-based open research data queryable by references

  • 1. University of Basel, Switzerland
  • 1. University of Graz
  • 2. Belgrade Center for Digital Humanities
  • 3. Le Mans Université
  • 4. Digital Humanities im deutschsprachigen Raum

Description

Humanities textual data is full of references to persons and locations given in various languages. Researchers want to run queries that retrieve data in which a certain place or person is mentioned, irrespective of the language of the text. In this paper, I present how we automatically extract named entities (geolocation information and person references) from textual data, then homogenize and store them as Linked Open Data (LOD) with unique identifiers such as the GeoNames ID and the GND (Gemeinsame Normdatei) number. The plain references in the text are then substituted with standoff links to the corresponding RDF resources, and the textual document is stored in RDF format. This enables humanities scholars to perform advanced SPARQL queries that collect textual resources containing specific references, regardless of the language of the text. Furthermore, the relations between these named entities can be parsed from the text, based on ontology definitions, the dependency graphs of sentences, and part-of-speech (POS) tags, and added to the knowledge graph. Since the citability of information is crucial for humanities research, this workflow attaches metadata about the source document of each extracted statement to the edges of the knowledge graph using RDF-star. This allows SPARQL-star queries for documents containing a certain relationship between entities.
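
The last step of the workflow described above (annotating an extracted relation with its source document, then querying by that annotation) can be illustrated with a minimal sketch. It is not the paper's implementation: the `http://example.org/` namespace, the `bornIn` and `mentionedIn` predicates, the helper names, and the identifier values are all hypothetical placeholders. The IRI patterns for GeoNames and GND are assumed to follow the publicly used forms. The code only builds Turtle-star and SPARQL-star text with the standard library; loading it into a triple store is left out.

```python
from dataclasses import dataclass

# Hypothetical namespace for illustration only; not taken from the paper.
EX = "http://example.org/"


@dataclass(frozen=True)
class Relation:
    """One relation extracted from a text, plus the document it came from."""
    subject: str    # IRI of the subject entity
    predicate: str  # IRI of the relation
    obj: str        # IRI of the object entity
    source: str     # IRI of the source document


def geonames_iri(geoname_id: str) -> str:
    # Assumed IRI pattern for a GeoNames feature.
    return f"https://sws.geonames.org/{geoname_id}/"


def gnd_iri(gnd_number: str) -> str:
    # Assumed IRI pattern for a GND authority record.
    return f"https://d-nb.info/gnd/{gnd_number}"


def to_turtle_star(rel: Relation) -> str:
    """Serialize the relation and its provenance as two Turtle-star statements.

    The first statement asserts the triple itself. The second uses the triple
    as a quoted subject (<< ... >>) and attaches the source document to it.
    """
    triple = f"<{rel.subject}> <{rel.predicate}> <{rel.obj}>"
    return (
        f"{triple} .\n"
        f"<< {triple} >> <{EX}mentionedIn> <{rel.source}> ."
    )


def documents_query(predicate: str) -> str:
    """Build a SPARQL-star query for documents stating the given relation."""
    return (
        "SELECT ?person ?place ?doc WHERE {\n"
        f"  << ?person <{predicate}> ?place >> <{EX}mentionedIn> ?doc .\n"
        "}"
    )


if __name__ == "__main__":
    # Placeholder identifiers, not real GND or GeoNames records.
    rel = Relation(
        subject=gnd_iri("123456789"),
        predicate=EX + "bornIn",
        obj=geonames_iri("1234567"),
        source=EX + "letter-1",
    )
    print(to_turtle_star(rel))
    print(documents_query(EX + "bornIn"))
```

Because the person and the place are identified by authority IRIs rather than by their surface forms, the same query would match the relation whether the source letter was written in German, French, or Latin.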

Files

ALASSI_Sepideh_From_unstructured_texts_to_RDF_star_based_ope.pdf

Additional details

Related works

Is part of
Book: 10.5281/zenodo.7961822 (DOI)