Intelligent polar cyberinfrastructure: enabling semantic search in geospatial metadata catalogue to support polar data discovery

Wenwen LI
Vidit BHATIA
Kai CAO, Singapore Management University

Abstract

Polar regions have garnered substantial research attention in recent years because they are key drivers of the Earth’s climate, a source of rich mineral resources, and the home of a variety of marine life. Nevertheless, global warming over the past century is pushing the polar systems towards a tipping point: the systems are at high-risk from melting snow and sea ice covers, permafrost thawing, and acidification of the Arctic oceans. To increase understanding of the polar environment, the National Science Foundation established a Polar Cyberinfrastructure (CI) program, aimed at utilizing advanced software architecture to support polar data analysis and decisionmaking. At the center of this Polar CI research are data resources and data discovery components that facilitate the search and retrieval of polar data. This paper reports our development of a semantic search tool that supports the intelligent discovery of polar datasets. This tool is built on latent semantic analysis techniques, which improves search performance by identifying hidden semantic associations between terminologies used in the various datasets’ metadata. The software tool is implemented using an object-oriented design pattern and has been successfully integrated into a popular open source metadata catalog as a new semantic search support. A semantic matrix is maintained persistently within the catalogue to store the semantic associations. A dynamic update mechanism was also developed to allow automated update of semantics once more metadata are loaded into or removed from the catalog. We explored the effects of rank reduction to the effectiveness of this semantic search module and demonstrated its better performance than the traditional search techniques.