A semantic approach for timeseries data fusion

https://doi.org/10.1016/j.compag.2019.105171
Under a Creative Commons license
open access

Highlights

  • A declarative approach for FAIR data management in environmental sciences.

  • A data acquisition framework for semantic interoperability and unit transformation.

  • Logical reasoning infers compatibility between semantically heterogeneous datasets.

  • A case study to automatically transform meteorological files of four agricultural models.

Abstract

The data deluge following the rise of the Internet of Things contributes to the creation of non-reusable data silos. Especially in the environmental sciences, syntactic and semantic heterogeneity hinders data reuse, as manual labour and domain expertise are usually required. Both the different syntaxes in which environmental timeseries are formatted and the implicit semantics used to describe them contribute to this problem. Usually, the real meaning of data is obscured by a combination of short data labels, titles and various value codes that require domain or institutional knowledge to decipher. The FAIR data principles for scientific data sharing and stewardship offer a framework based on community-adopted metadata. In this work, we present the Environmental Data Acquisition Module (EDAM), which focuses on data interoperability and reuse, and deals with syntactic and semantic heterogeneity using a template approach. Data curators draft templates that describe, in an abstract fashion, the syntax of the timeseries datasets they want to acquire or disseminate. They complement each template with a metadata file, which annotates observables and their properties (including physical quantities and units of measurement) with terms from an ontology. EDAM employs a reasoner to infer compatibility among syntactically and semantically heterogeneous datasets, and transforms timeseries, formats and units of measurement on-the-fly. Our approach uses a local ontology to store metadata about datasets, which enables EDAM to acquire and transform datasets that were originally stored with different semantics and syntaxes. We demonstrate EDAM in a case study where we transform meteorological input files of four agricultural models.
Our approach cuts across environmental data silos and facilitates timeseries reuse, as it enables users to (a) discover datasets in other formats, (b) transform them and (c) reuse them in their scientific workflows. This directly contributes to the toolshed for FAIR data management in environmental sciences. The EDAM implementation has been released under an open-source license.
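
The template-and-metadata idea described above can be illustrated with a minimal Python sketch. It is not EDAM's implementation or API: the regular-expression "template", the metadata dictionaries, the unit labels, the conversion table and the function names are all assumptions made for illustration, and the simple lookup in `compatible` only stands in for the ontology-backed reasoning the abstract describes.

```python
import re

# Hypothetical source template: a regex with named groups standing in for
# an abstract description of one data row (timestamp, temperature, rain).
NUMBER = r"-?\d+(?:\.\d+)?"
SOURCE_TEMPLATE = re.compile(
    rf"(?P<timestamp>\S+),(?P<temperature>{NUMBER}),(?P<rain>{NUMBER})"
)

# Hypothetical metadata: the unit of each observable in each dataset.
SOURCE_METADATA = {"temperature": "degF", "rain": "inch"}
TARGET_METADATA = {"temperature": "degC", "rain": "mm"}

# Hypothetical target template: the row syntax of the output format.
TARGET_TEMPLATE = "{timestamp} {temperature:.1f} {rain:.1f}"

# Assumed conversion table; a real system would derive this from an ontology.
CONVERSIONS = {
    ("degF", "degC"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("inch", "mm"): lambda v: v * 25.4,
}


def compatible(source_meta, target_meta):
    """Return True if every target observable exists in the source and
    its unit is identical or convertible (a stand-in for reasoning)."""
    for name, target_unit in target_meta.items():
        source_unit = source_meta.get(name)
        if source_unit is None:
            return False
        if source_unit != target_unit and (source_unit, target_unit) not in CONVERSIONS:
            return False
    return True


def convert(value, source_unit, target_unit):
    """Convert a value between units, leaving it unchanged if they match."""
    if source_unit == target_unit:
        return value
    return CONVERSIONS[(source_unit, target_unit)](value)


def transform(lines):
    """Parse rows with the source template, convert units according to the
    metadata, and render them with the target template. Rows that do not
    match the source template (e.g. headers) are skipped."""
    if not compatible(SOURCE_METADATA, TARGET_METADATA):
        raise ValueError("datasets are not compatible")
    rows = []
    for line in lines:
        match = SOURCE_TEMPLATE.fullmatch(line.strip())
        if match is None:
            continue
        record = {"timestamp": match["timestamp"]}
        for name, target_unit in TARGET_METADATA.items():
            value = float(match[name])
            record[name] = convert(value, SOURCE_METADATA[name], target_unit)
        rows.append(TARGET_TEMPLATE.format(**record))
    return rows
```

Under these assumptions, calling `transform(["date,temp,rain", "2019-01-01,50,1.0"])` skips the header row and renders the single data row in the target syntax with temperature in degrees Celsius and rainfall in millimetres. Keeping syntax (templates) separate from semantics (metadata) is what lets the same conversion logic serve any pair of formats.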

Keywords

Environmental timeseries
Internet of Things
Legacy data
Semantic heterogeneity
Templates
FAIR data
Reasoning
Interoperability
Data reuse
APSIM
AgMIP
DSSAT
WOFOST
