ABSTRACT
Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific file repositories for efficient in-database processing and exploration.
The Data Vault facilitates scientific data analysis using high-level declarative languages, such as traditional SQL and the novel array-oriented SciQL. Data of interest are loaded from the attached repository just in time, without the need for up-front data ingestion.
The demo is built around concrete implementations of the Data Vault for two scientific use cases: seismic time series and Earth observation images. The seismic Data Vault uses queries submitted by the audience to illustrate how the Data Vault works internally, revealing the mechanisms of dynamic query plan generation and on-demand external data ingestion. The image Data Vault shows an application view from the perspective of data mining researchers.
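The just-in-time loading idea described in the abstract can be illustrated with a small sketch. This is not the Data Vault implementation, which lives inside the database system; it is a toy Python stand-in under stated assumptions. The class `LazyVault`, the one-CSV-per-station layout, and the station names are all hypothetical. The sketch catalogs a repository by file name only, ingests a file the first time a query touches it, and reuses the cached copy afterwards.

```python
import csv
import tempfile
from pathlib import Path


class LazyVault:
    """Toy stand-in for a data vault: catalogs files cheaply, loads them on demand."""

    def __init__(self, repository: Path):
        self.repository = repository
        # Cataloging reads only file names, never file contents.
        self.catalog = {p.stem: p for p in sorted(repository.glob("*.csv"))}
        self.cache = {}   # station -> samples already ingested
        self.loads = 0    # number of files actually read from disk

    def _load(self, station: str) -> list[float]:
        # Ingest a file the first time a query touches it; reuse it afterwards.
        if station not in self.cache:
            with self.catalog[station].open(newline="") as f:
                self.cache[station] = [float(row[0]) for row in csv.reader(f)]
            self.loads += 1
        return self.cache[station]

    def max_amplitude(self, station: str) -> float:
        # A "query": only the file for the requested station is ingested.
        return max(abs(v) for v in self._load(station))


def demo() -> tuple[float, int, int]:
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        # Hypothetical repository: one small CSV file per station.
        samples = {"STA1": [0.1, -2.5, 1.0], "STA2": [3.0, 0.5], "STA3": [-0.2]}
        for name, values in samples.items():
            (repo / f"{name}.csv").write_text("\n".join(map(str, values)) + "\n")

        vault = LazyVault(repo)
        peak = vault.max_amplitude("STA1")
        vault.max_amplitude("STA1")  # second query is served from the cache
        return peak, vault.loads, len(vault.catalog)


if __name__ == "__main__":
    print(demo())
```

In this sketch three files are cataloged but only one is ever read, which is the behaviour the abstract contrasts with up-front ingestion of the whole repository.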