DSDQuery DSI - Querying scientific data repositories with structured operators
IEEE Conference Publication (IEEE Xplore)


Abstract:

Scientific data is often distributed through repositories that host a large number of files in formats such as NetCDF or HDF5. With recent and anticipated increases in the size of observational and simulation data, it is important to transfer only the data of interest from a large distributed dataset. Unfortunately, existing portals provide limited querying interfaces, typically a set of predefined, hard-coded subsetting options, which restricts users' querying flexibility. This paper describes a system that addresses this gap. Relational algebra is adapted to scientific array querying, which allows a subset of SQL to be used in this domain and enables nuanced subsetting conditions to be applied across a set of dataset files within a repository. A query processing algorithm extracts and collects data from the relevant datasets, based on metadata previously extracted by an automatic metadata extraction engine. Finally, the system stitches the selected data into a new structured NetCDF file that is returned as the result set, so the returned data can be used and analyzed with existing tools. An extensive evaluation shows the system's ability to handle growing data volumes and increasing numbers of files.
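The query pipeline summarized above (metadata-based selection of relevant files, evaluation of subsetting conditions, and stitching of the results into one structured output) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the metadata fields (`variables`, `time_min`, `time_max`), the single-variable query form `SELECT var WHERE time BETWEEN lo AND hi AND var > threshold`, and the in-memory dictionaries standing in for NetCDF files are all hypothetical simplifications. The stitched result is a plain dictionary along a time dimension rather than a NetCDF file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical per-file metadata, as an extraction step might record it."""
    name: str
    variables: frozenset
    time_min: int
    time_max: int


@dataclass(frozen=True)
class Query:
    """Hypothetical query: SELECT variable WHERE time BETWEEN lo AND hi AND variable > threshold."""
    variable: str
    time_lo: int
    time_hi: int
    threshold: float


def relevant_files(catalog, query):
    """Prune files using metadata only: keep files that hold the variable
    and whose time range overlaps the requested interval."""
    return [
        m for m in catalog
        if query.variable in m.variables
        and m.time_max >= query.time_lo
        and m.time_min <= query.time_hi
    ]


def run_query(catalog, datasets, query):
    """Apply the conditions to each relevant file and stitch the matches
    into a single result ordered along the time dimension."""
    rows = []
    for meta in relevant_files(catalog, query):
        data = datasets[meta.name]  # stand-in for opening the file
        for t, v in zip(data["time"], data[query.variable]):
            if query.time_lo <= t <= query.time_hi and v > query.threshold:
                rows.append((t, v))
    rows.sort()  # stitch: merge per-file results in time order
    return {"time": [t for t, _ in rows], query.variable: [v for _, v in rows]}


# Hypothetical repository of three "files" and their extracted metadata.
datasets = {
    "a.nc": {"time": [0, 1, 2], "temp": [10.0, 25.0, 30.0]},
    "b.nc": {"time": [3, 4, 5], "temp": [28.0, 15.0, 40.0]},
    "c.nc": {"time": [6, 7], "salinity": [35.0, 36.0]},
}
catalog = [
    FileMetadata(name, frozenset(k for k in d if k != "time"),
                 min(d["time"]), max(d["time"]))
    for name, d in datasets.items()
]

q = Query("temp", time_lo=1, time_hi=4, threshold=20.0)
result = run_query(catalog, datasets, q)
# result == {"time": [1, 2, 3], "temp": [25.0, 30.0, 28.0]}
```

Here `c.nc` is skipped from its metadata alone because it lacks the `temp` variable, which mirrors the idea of using previously extracted metadata to avoid reading irrelevant files; only `a.nc` and `b.nc` are scanned.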
Date of Conference: 29 October 2015 - 01 November 2015
Date Added to IEEE Xplore: 28 December 2015
Conference Location: Santa Clara, CA, USA
