Facilitating the production of ISO-compliant metadata of geospatial datasets

https://doi.org/10.1016/j.jag.2015.08.010

Highlights

  • Facilitate the production of standardized metadata by embedding the generation of descriptions in data production workflows.

  • Link data with metadata. Metadata is permanently up-to-date and any changes in data will be automatically reflected thanks to the scheduled harvesting process.

  • The ability to automatically generate standardized metadata from the content of a harvested data-publishing server significantly facilitates maintenance and management of the description of large volumes of data.

  • The proposed approach is entirely based on an interoperable workflow using OGC standards and therefore is reusable.

Abstract

Metadata are recognized as an essential element to enable efficient and effective discovery of geospatial data published in spatial data infrastructures (SDI). However, metadata production is still perceived as a complex, tedious and time-consuming task. As a result, few metadata records are typically produced, which can seriously hinder the objective of facilitating data discovery.

In response to this issue, this paper presents a proof of concept based on an interoperable workflow between a data publication server and a metadata catalog to automatically generate ISO-compliant metadata.

The proposed approach facilitates metadata creation by embedding this task in daily data management workflows; ensures that data and metadata are permanently up-to-date; significantly reduces the obstacles to metadata production; and potentially facilitates contributions to initiatives like the Global Earth Observation System of Systems (GEOSS) by making geospatial resources discoverable.

Introduction

Spatial data infrastructures (SDI) are recognized as an effective environment for digital geospatial data production, management, analysis and diffusion (Craglia et al., 2012). The primary function of any SDI is data discovery, enabling users to search and evaluate data before downloading them (Nebert, 2005, Nogueras-Iso et al., 2005a, Nogueras-Iso et al., 2005b). The fundamental requirement for an efficient and effective data discovery mechanism is that data are properly documented with metadata stored in a catalog (Ma, 2006, Foresman, 2008). Without appropriate metadata, an SDI does not facilitate the discovery of, and access to, geospatial data (Masser, 2005).

The primary role of metadata and catalogs for data discovery is recognized in data sharing initiatives such as the Infrastructure for Spatial Information in the European Community (INSPIRE) (European Commission, 2007) and the Global Earth Observation System of Systems (GEOSS) (GEO Secretariat, 2005). This role is further reinforced by the growing momentum of Open Data access policies (Wessels et al., 2014). These policies highlight the importance of using standards to enable interoperability for both metadata description (e.g., ISO 19115-1:2014, FGDC, Dublin Core) (Diaz et al., 2012) and searching (e.g., the OGC Catalogue Service for the Web, CSW) (Nogueras-Iso et al., 2005a, Nogueras-Iso et al., 2005b). Interoperable metadata allow various systems to exchange records, ensuring that metadata can be discovered, accurately interpreted, and subsequently used or integrated into other platforms or applications (Nativi et al., 2013).
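To make the searching side concrete, the following is a minimal sketch of an OGC CSW 2.0.2 GetRecords request in key-value-pair (KVP) encoding that asks a catalog for ISO records. The endpoint URL and the function name `build_getrecords_url` are hypothetical; the parameter names come from CSW 2.0.2, but servers differ in which type names and output schemas they support.

```python
from urllib.parse import urlencode

# Namespace URI used to request ISO 19139 (gmd) records from a CSW 2.0.2 server.
ISO_OUTPUT_SCHEMA = "http://www.isotc211.org/2005/gmd"


def build_getrecords_url(endpoint: str, start: int = 1, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request URL using KVP encoding (hypothetical helper)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        # Ask for ISO metadata records rather than the default Dublin Core csw:Record.
        "typeNames": "gmd:MD_Metadata",
        "namespace": f"xmlns(gmd={ISO_OUTPUT_SCHEMA})",
        "outputSchema": ISO_OUTPUT_SCHEMA,
        "resultType": "results",
        "elementSetName": "full",
        # Paging parameters: a client can walk through a large catalog page by page.
        "startPosition": str(start),
        "maxRecords": str(max_records),
    }
    # urlencode percent-encodes values such as the namespace URIs.
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the CSW URL of an actual server.
    print(build_getrecords_url("https://example.org/geoserver/csw", max_records=5))
```

Because both the data publishing server and the catalog speak the same protocol, the same request shape can serve a user searching the catalog and a catalog harvesting another CSW endpoint.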

Despite the importance of metadata and associated catalogs, most data currently published via SDIs lack metadata (Batcheller, 2008, Batcheller et al., 2009, Trilles et al., 2014). There are several reasons for this: lack of funding (i.e., financial costs), time commitment, no perceived added value, complexity of standards, and a tedious process for creating metadata (Myroshnychenko et al., 2015, Kalantari et al., 2009, Lehmann et al., 2014, Trilles et al., 2014). Moreover, data and their descriptions (metadata) are often produced and published with different software, which duplicates the effort of entering the same information (e.g., title, abstract, keywords) and consequently causes data and metadata to become disconnected (Kalantari et al., 2010, Giuliani et al., 2013). This can be an important issue because, when a dataset is updated, the changes must also be reflected in the related metadata. Another issue related to data-metadata disconnection is that data providers are often unsure which publication workflow to follow: some publish data first and then create metadata, while others do the opposite. This confusion contributes to fragmentation, disconnection and a lack of good, reliable data documentation (Diaz et al., 2011).

Means to facilitate the production of standardized metadata and to ensure that data and metadata remain linked should benefit both data providers and users (Ellul et al., 2013, Olfat et al., 2012). Considerable research has been conducted to overcome some of these issues, and various solutions have been proposed: (1) automatic generation of standardized metadata from Earth Observation products (Yue et al., 2010), (2) automatic inventories built by scanning data folders (Moura, 2012, Prunayre and Coudert, 2013), (3) new file formats (e.g., NetCDF) in which data and metadata are stored in the same file (Lehmann et al., 2014), and (4) innovative workflows that extract information based on web services, semantic enablement or tagging (Kalantari et al., 2009, Yue et al., 2012, Florczyk et al., 2012, Manso-Callejo et al., 2010). These authors recognize the need to embed metadata production in data creation and to automate the generation of metadata where possible. Unfortunately, most of these implementations require a high level of SDI expertise to develop tailored and often complex solutions. As a result, convincing data providers to produce metadata can remain a major barrier.

Based on these considerations, the aim of this paper is to present a proof of concept using an interoperable workflow between a data publication server and a metadata catalog to: (1) automatically generate standardized descriptions of geospatial data, (2) establish a permanent link between data and metadata (i.e., changes in data are automatically reflected in the corresponding metadata), and (3) facilitate data-metadata publication workflows through a single entry point.
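The permanent link in aim (2) relies on the catalog re-harvesting the publishing server on a schedule. The reconciliation step of such a harvest run can be sketched as follows. This is a hypothetical illustration, not the code of any particular catalog's harvester; it assumes each record carries a stable identifier (for example its fileIdentifier) and a date stamp that changes whenever the underlying dataset description changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal view of a harvested metadata record."""
    identifier: str  # stable ID, e.g. the record's fileIdentifier
    date_stamp: str  # date of last change, e.g. "2015-06-01"


def plan_sync(
    harvested: List[Record], catalog: Dict[str, Record]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare one harvest run with the catalog; return IDs to insert, update, delete."""
    remote = {r.identifier: r for r in harvested}
    # Layers published on the server since the last run.
    to_insert = sorted(i for i in remote if i not in catalog)
    # Layers whose description changed on the server (date stamps differ).
    to_update = sorted(
        i for i, r in remote.items()
        if i in catalog and r.date_stamp != catalog[i].date_stamp
    )
    # Layers no longer published, so their catalog records are stale.
    to_delete = sorted(i for i in catalog if i not in remote)
    return to_insert, to_update, to_delete
```

For example, if the server now publishes `roads` (unchanged), `rivers` (modified) and `lakes` (new) while the catalog still holds a record for `old_dem`, `plan_sync` returns `(['lakes'], ['rivers'], ['old_dem'])`. Keying on identifiers rather than titles keeps the link stable even when a data provider renames a layer.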

Methodology

The proposed approach is designed to meet the following requirements:

  1. The use of a classical workflow: data providers usually store data on a server, publish them as services, then generate the proper documentation and store it in a metadata catalog (Fig. 1). From a data provider's point of view, this workflow is easier than first creating the metadata (which requires additional work and is time-consuming, monotonous and complex) and then publishing the data;

  2. The introduction by data providers of relevant

Implementation

The implementation used to validate the method is based on two components:

  • A data publishing server (e.g., GeoServer) together with its CSW extension. GeoServer is an open source web server designed to publish data from major sources (e.g., shapefile, GeoTIFF, PostGIS) using OGC standards (e.g., WMS, WFS, WCS), allowing users to share their data in an interoperable and standardized way.

  • A metadata catalog (e.g., GeoNetwork
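The underlying idea is that the publishing server already holds most discovery information (title, abstract, keywords) entered when a layer is published. As a rough illustration, the sketch below reads those fields from a WMS 1.3.0 GetCapabilities document. The sample document and the function name `extract_layers` are hypothetical, the document is heavily abbreviated, and this is not how GeoServer's CSW extension works internally; it only shows which published information can feed a metadata record.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

WMS = "http://www.opengis.net/wms"
WMS_NS = {"wms": WMS}

# Hypothetical, abbreviated WMS 1.3.0 capabilities document.
SAMPLE_CAPABILITIES = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Capability>
    <Layer>
      <Title>Example server</Title>
      <Layer queryable="1">
        <Name>demo:rivers</Name>
        <Title>Rivers</Title>
        <Abstract>Main rivers of the study area.</Abstract>
        <KeywordList>
          <Keyword>hydrography</Keyword>
          <Keyword>rivers</Keyword>
        </KeywordList>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""


def extract_layers(capabilities_xml: str) -> List[Dict[str, object]]:
    """Collect name, title, abstract and keywords of each named WMS layer."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter(f"{{{WMS}}}Layer"):
        name = layer.find("wms:Name", WMS_NS)
        # Unnamed layers only group other layers and cannot be requested, so skip them.
        if name is None:
            continue
        layers.append({
            "name": name.text,
            "title": layer.findtext("wms:Title", default="", namespaces=WMS_NS),
            "abstract": layer.findtext("wms:Abstract", default="", namespaces=WMS_NS),
            "keywords": [k.text for k in layer.findall("wms:KeywordList/wms:Keyword", WMS_NS)],
        })
    return layers


if __name__ == "__main__":
    for info in extract_layers(SAMPLE_CAPABILITIES):
        print(info)
```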

Discussion

Results show that the proposed solution is simple to implement, facilitates the automatic production of ISO-compliant metadata, embeds the generation of metadata in data providers’ workflows, and links data and metadata together. Because the workflow generates ISO 19115-1:2014 core metadata elements, the proposed approach is sufficient for the purpose of data discovery through a general description of vector, raster, and satellite imagery data. However, it cannot answer complex description
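To illustrate what core discovery elements look like in a record, the sketch below maps a layer's identifier, title, abstract and keywords into an ISO metadata fragment. It uses the ISO/TS 19139 `gmd`/`gco` XML encoding that CSW servers commonly return; a record encoded for ISO 19115-1:2014 would use the newer ISO 19115-3 schemas instead. The function name and the mapping are hypothetical, and the fragment is deliberately incomplete (it omits mandatory elements such as contact, dateStamp and the citation date), so it is not schema-valid.

```python
import xml.etree.ElementTree as ET
from typing import List

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
# Serialize with the conventional gmd/gco prefixes.
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _char(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    el = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(el, f"{{{GCO}}}CharacterString").text = text
    return el


def build_iso_record(identifier: str, title: str, abstract: str, keywords: List[str]) -> str:
    """Build a simplified, non-validating ISO 19139 fragment (hypothetical mapping)."""
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    _char(md, "fileIdentifier", identifier)
    ident = ET.SubElement(md, f"{{{GMD}}}identificationInfo")
    data_id = ET.SubElement(ident, f"{{{GMD}}}MD_DataIdentification")
    # Title lives inside the resource citation.
    citation = ET.SubElement(ET.SubElement(data_id, f"{{{GMD}}}citation"), f"{{{GMD}}}CI_Citation")
    _char(citation, "title", title)
    _char(data_id, "abstract", abstract)
    # All keywords go into a single MD_Keywords block.
    md_kw = ET.SubElement(ET.SubElement(data_id, f"{{{GMD}}}descriptiveKeywords"), f"{{{GMD}}}MD_Keywords")
    for kw in keywords:
        _char(md_kw, "keyword", kw)
    return ET.tostring(md, encoding="unicode")


if __name__ == "__main__":
    print(build_iso_record("demo-rivers", "Rivers", "Main rivers.", ["hydrography", "rivers"]))
```

Fields of this kind are enough for keyword and free-text discovery, which is consistent with the limitation noted above: richer descriptions (lineage, quality) cannot be derived from what the publishing server exposes.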

Conclusions and perspectives

Metadata are essential to enable efficient and effective data discovery, and data providers are serving increasingly large volumes of data; as a result, managing and maintaining a metadata catalog can be challenging.

The proposed approach:

  • facilitates the production of standardized metadata by embedding the generation of descriptions in data production workflows.

  • links data with metadata. Through the proposed approach, metadata is permanently up-to-date and any changes in data

Acknowledgments

The authors would like to acknowledge the European Commission “Seventh Framework Program” that funded EOPOWER (Grant Agreement no. 603500) and IASON (Grant Agreement no. 603534) projects.

We thank Martin Lacayo for reviewing an earlier draft.

The views expressed in the paper are those of the authors and do not necessarily reflect the views of the institutions they belong to.

References

  • European Commission (2007). Directive 2007/2/EC of the European Parliament and the Council of 14 March 2007...
  • Florczyk, A.J., et al. (2012). Automatic generation of geospatial metadata for web resources. Int. J. Spat. Data Infrastruct. Res.
  • Foresman, T.W. (2008). Evolution and implementation of the digital earth vision, technology and society. Int. J. Digit. Earth.
  • GEO Secretariat (2005). The Global Earth Observation System of Systems (GEOSS) 10-Year Implementation Plan:...
  • Giuliani, G., Ray, N., Lehmann, A. (2013). Building Regional Capacities for GEOSS and INSPIRE: a Journey in the Black...
  • ISO (2014). Geographic information – Metadata – Part 1: Fundamentals:...