Elsevier

Computers & Geosciences

Volume 31, Issue 9, November 2005, Pages 1126-1134

Syntactic and semantic metadata integration for science data use

https://doi.org/10.1016/j.cageo.2004.12.011

Abstract

This paper proposes a novel metadata solution to allow applications to intelligently use science data in an automated fashion. The solution provides rich syntactic and semantic metadata, where the semantic metadata is linked with an ontology to define the semantic terms. This solution allows applications to exploit the syntactic metadata to read the data and the semantic metadata to infer the content and the meaning of the data. The solution presented in this paper leverages the Earth Science Markup Language for providing the syntactic metadata and adds a semantic metadata component along with links to the appropriate ontology. This new semantic component is orthogonal to the syntactic metadata, so it does not perturb the existing design. An example application was designed and built that integrates this syntactic and semantic metadata via an ontology to perform a data processing operation.

Introduction

Metadata, or data about data, for science data sets can be classified into three general categories: content, syntactic and semantic. Content or “search” metadata is a broad category describing the intrinsic content of the data. Content metadata typically describe the physical parameters or variables measured in a data set, its spatio-temporal coverage, coordinate systems used, information about the data producer, provenance and other keywords. Content metadata typically populate data catalogs and registries. Syntactic metadata describe the structure of the data file in terms of bits, bytes, data type, arrays and structures. This information is often found in README files accompanying science data. For some data formats, this information is embedded in the data file and an accompanying software library is used to create or read the files in these formats. Finally, semantic or “use” metadata provide meaning to the data, relating the content of the data file to some known context. Such semantic information may be found in documentation or publications about a data set. Current semantic web research is aimed at encoding semantic information in ontologies in order to enable more powerful and intelligent automated data search and usage.
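To make the role of syntactic metadata concrete, the sketch below shows a small machine-readable layout description driving a generic binary reader. The `SYNTAX` dictionary, the field names and the `read_records` helper are hypothetical and chosen only for illustration; they are not ESML syntax or any format discussed in the paper.

```python
import struct

# Hypothetical syntactic description: a byte order plus an ordered list of
# (field name, struct type code) pairs. This is an assumption for
# illustration, not ESML syntax.
SYNTAX = {
    "byte_order": "<",  # little-endian, no padding between fields
    "fields": [("lat", "f"), ("lon", "f"), ("temp", "h")],
}


def read_records(raw: bytes, syntax: dict) -> list[dict]:
    """Decode fixed-size binary records using only the syntactic description."""
    fmt = syntax["byte_order"] + "".join(code for _, code in syntax["fields"])
    names = [name for name, _ in syntax["fields"]]
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, raw)]


# Two sample records packed with the same layout the description declares.
raw = struct.pack("<ffh", 34.5, -86.5, 2950) + struct.pack("<ffh", 35.0, -86.0, 2975)
records = read_records(raw, SYNTAX)
```

The reader knows how to decode `temp` as a 16-bit integer, but nothing in the description says what the number means or what its units are. That gap is what semantic metadata is meant to fill.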

To accommodate the rapid growth of Earth Science data, scientific investigations require the ability to use new data sets with minimal effort. With the emergence of Semantic Web (Berners-Lee et al., 2001) concepts, intelligently automating services has become an area of active research. Extending this idea to the Earth Science domain requires solving several critical issues related to the varying levels of metadata richness associated with different geoscience data sets. This paper proposes a metadata solution that integrates syntactic and semantic metadata to automate data use. The solution assembles orthogonal yet synergistic information to provide a rich description of the syntax and semantics of scientific data sets. Furthermore, an ontology gives data analysis and other applications a context for the semantic metadata. Applications can use this rich set of metadata together with the ontology to make automated decisions that achieve data processing goals. This paper describes an example application that uses this metadata to automate and drive a useful data preprocessing capability.

The concept of integrating syntactic and semantic metadata for data use is not new (Cornillon et al., 2003). However, the solution presented in this paper is the first of its kind in Earth Science: the metadata contains both syntactic and semantic information; the semantic metadata is linked to an ontology to provide context; and an application is designed and built to use this metadata and the ontology to perform data processing.

Syntactic and semantic metadata integration issues

Large quantities of raw and processed observational data and imagery are available to researchers today. These data sets are heterogeneous and vary in metadata richness. It is essential that a syntactic and semantic metadata integration solution take existing legacy data sets into account. In order to design such a solution, one must address the following issues.

Syntactic and semantic metadata integration solution: coupling ontologies with the Earth Science Markup Language (ESML)

A solution for metadata integration for data usability is to provide orthogonal metadata components describing the syntax and semantics of a scientific data file, with semantics coupled with ontologies. Such a solution will allow applications to easily “drill down” from a general query to more detailed metadata on a data field, etc., for any particular data file. This solution is built upon ESML (Ramachandran et al., 2004), where ESML describes the syntactic metadata of a data file. Based on
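As a rough illustration of the orthogonality described above, the following sketch keeps syntactic and semantic metadata in separate structures keyed by field name and resolves a general query through a toy ontology. Every name here (`ONTOLOGY`, `SYNTACTIC`, `SEMANTIC`, `is_a`, `fields_for`, `describe`), along with the terms and values, is an assumption made for illustration; none of it reflects the actual ESML schema or the ontology used by the authors.

```python
# Toy ontology: each term maps to its parent concept (None marks a root).
ONTOLOGY = {
    "PhysicalProperty": None,
    "Temperature": "PhysicalProperty",
    "BrightnessTemperature": "Temperature",
    "Location": None,
    "Latitude": "Location",
    "Longitude": "Location",
}

# Syntactic metadata: how to read each field. Knows nothing about meaning.
SYNTACTIC = {
    "lat": {"type": "float32", "offset": 0},
    "lon": {"type": "float32", "offset": 4},
    "tb": {"type": "int16", "offset": 8},
}

# Semantic metadata: what each field means, linked to an ontology term.
# Stored separately, so adding it leaves the syntactic description untouched.
SEMANTIC = {
    "lat": {"term": "Latitude", "units": "degrees_north"},
    "lon": {"term": "Longitude", "units": "degrees_east"},
    "tb": {"term": "BrightnessTemperature", "units": "K"},
}


def is_a(term, concept, ontology):
    """Return True if `term` equals `concept` or descends from it."""
    while term is not None:
        if term == concept:
            return True
        term = ontology.get(term)
    return False


def fields_for(concept):
    """General query: which fields carry data of this concept?"""
    return sorted(
        name for name, sem in SEMANTIC.items()
        if is_a(sem["term"], concept, ONTOLOGY)
    )


def describe(name):
    """Drill down: merge both metadata views for a single field."""
    return {**SYNTACTIC[name], **SEMANTIC[name]}
```

Under these assumptions, a query for the general concept `"Temperature"` finds the `tb` field even though that field is tagged with the more specific term, and `describe("tb")` then returns both how to read the field and what it means.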

Example application

As a proof of concept, a new functional layer was added to the core ESML Library. The new ESML Level 1 (L1) library extends the functionality of the core Level 0 (L0) library by providing subsetting capabilities in addition to the capabilities to read data from a file. The subsetting functionality in the L1 library provides the scientists with the capability to reduce the size and complexity of the data. Subsetting is a crucial preprocessing step in many data analysis efforts that use large
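The sketch below illustrates one way a subsetting layer could sit on top of a reading layer: it uses semantic tags to locate the latitude and longitude fields and then keeps only the records inside a bounding box. This is a guess at the general idea, not the ESML L1 library's interface; `find_field`, `subset`, the term names and the sample records are all hypothetical.

```python
def find_field(semantic, term):
    """Return the name of the first field tagged with the given semantic term."""
    for name, sem in semantic.items():
        if sem["term"] == term:
            return name
    raise KeyError(term)


def subset(records, semantic, south, north, west, east):
    """Keep records whose location falls inside the bounding box (inclusive)."""
    lat = find_field(semantic, "Latitude")
    lon = find_field(semantic, "Longitude")
    return [
        r for r in records
        if south <= r[lat] <= north and west <= r[lon] <= east
    ]


# The field names "y" and "x" say nothing on their own; the semantic tags
# are what let the subsetter discover which fields hold coordinates.
SEMANTIC = {
    "y": {"term": "Latitude"},
    "x": {"term": "Longitude"},
    "tb": {"term": "BrightnessTemperature"},
}

records = [
    {"y": 34.5, "x": -86.5, "tb": 295.0},
    {"y": 10.0, "x": 20.0, "tb": 301.2},
    {"y": 35.0, "x": -86.0, "tb": 297.5},
]

subset_records = subset(records, SEMANTIC, 30.0, 40.0, -90.0, -80.0)
```

Because the subsetter looks fields up by meaning instead of by name, the same function would work unchanged on a data set that calls its coordinates something else, provided the semantic tags are present.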

Summary

This paper describes a solution for integrating syntactic metadata and semantic metadata coupled with ontologies to allow applications to intelligently “use” science data. The solution uses ESML as a building block for syntactic metadata and domain-driven ontologies for semantic terms. The solution is not restricted to just ESML; it can be used by other XML-based syntactic metadata solutions such as XDF and XSIL. The domain ontologies are used to provide a context for the semantic terms to

Acknowledgments

The authors acknowledge support from the Earth Science Information Partnership (ESIP) Federation award NCC8-200 and the Earth Science Technology Office award NAG5-13575, Goddard Space Science Flight Center, NASA.
