DOI: 10.1145/2484838.2484876
research-article

Data vaults: a database welcome to scientific file repositories

Published: 29 July 2013

ABSTRACT

Efficient management and exploration of high-volume scientific file repositories have become pivotal to scientific progress. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific file repositories to efficient in-database processing and exploration.

The Data Vault supports scientific data analysis in high-level declarative languages, such as traditional SQL and the novel array-oriented SciQL. Data of interest are loaded from the attached repository just in time, with no need for up-front data ingestion.
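The just-in-time loading described above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the MonetDB Data Vault implementation: the classes `FileEntry` and `DataVault`, their methods, the `loader` callback (standing in for a format reader such as a mini-SEED parser), and the file names and station codes are all hypothetical. Attaching a file records only its metadata; sample data is read the first time a query needs it and is cached afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FileEntry:
    """Catalog row: metadata is known when the file is attached; samples are not."""
    path: str
    station: str
    samples: Optional[List[float]] = None  # filled in on first access


class DataVault:
    """Hypothetical toy vault: registers files by metadata, loads payloads just in time."""

    def __init__(self, loader: Callable[[str], List[float]]) -> None:
        self._loader = loader  # hypothetical format reader, e.g. a mini-SEED parser
        self._catalog: Dict[str, FileEntry] = {}
        self.loads = 0  # number of actual file reads performed

    def attach(self, path: str, station: str) -> None:
        # Only metadata enters the catalog; no sample data is read here.
        self._catalog[path] = FileEntry(path, station)

    def files_for(self, station: str) -> List[str]:
        # Metadata-only query: answered from the catalog without touching files.
        return [e.path for e in self._catalog.values() if e.station == station]

    def samples(self, path: str) -> List[float]:
        entry = self._catalog[path]
        if entry.samples is None:  # first touch: ingest on demand
            entry.samples = self._loader(path)
            self.loads += 1
        return entry.samples  # later touches are served from the cache

    def max_amplitude(self, station: str) -> Optional[float]:
        # Payload query: reads only the files whose metadata matches.
        return max((max(self.samples(p)) for p in self.files_for(station)), default=None)


# Usage with an in-memory stand-in for the file repository.
fake_repo = {"a.mseed": [1.0, 4.0], "b.mseed": [2.0, 9.0], "c.mseed": [7.0]}
vault = DataVault(loader=lambda p: fake_repo[p])
vault.attach("a.mseed", "STA1")
vault.attach("b.mseed", "STA1")
vault.attach("c.mseed", "STA2")
print(vault.files_for("STA1"), vault.loads)      # ['a.mseed', 'b.mseed'] 0
print(vault.max_amplitude("STA1"), vault.loads)  # 9.0 2
print(vault.max_amplitude("STA1"), vault.loads)  # 9.0 2
```

Keeping the metadata catalog separate from the payloads is what lets metadata-only queries in this sketch run without any file I/O, while payload queries pay the loading cost only once per file.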

The demo is built around concrete implementations of the Data Vault for two scientific use cases: seismic time series and Earth observation images. The seismic Data Vault runs queries submitted by the audience to illustrate how the Data Vault works internally, exposing its dynamic query plan generation and its on-demand ingestion of external data. The image Data Vault presents the application from the perspective of data mining researchers.
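The idea of dynamic query plan generation with on-demand ingestion can be sketched as follows. This is a hypothetical illustration, not the plan format or operators of any real system: the function `generate_plan`, the step names `scan_catalog`, `load` and `aggregate`, and the catalog layout are invented for the example. The plan is built only after the metadata predicate has been evaluated, and it contains a `load` step for each selected file that is not yet in the database.

```python
from typing import Callable, Dict, List, Set

# Hypothetical catalog: file path -> metadata harvested when the repository is attached.
Catalog = Dict[str, dict]


def generate_plan(catalog: Catalog, loaded: Set[str],
                  predicate: Callable[[dict], bool]) -> List[str]:
    """Build a query plan at run time, after the metadata filter is evaluated."""
    selected = sorted(p for p, meta in catalog.items() if predicate(meta))
    plan = [f"scan_catalog -> {len(selected)} file(s) selected"]
    for path in selected:
        if path not in loaded:
            plan.append(f"load({path})")  # on-demand ingestion step
    plan.append(f"aggregate({', '.join(selected)})")
    return plan


catalog = {
    "a.mseed": {"station": "STA1", "day": 1},
    "b.mseed": {"station": "STA1", "day": 2},
    "c.mseed": {"station": "STA2", "day": 1},
}
loaded = {"a.mseed"}  # pretend a.mseed was ingested by an earlier query
for step in generate_plan(catalog, loaded, lambda m: m["station"] == "STA1"):
    print(step)
# scan_catalog -> 2 file(s) selected
# load(b.mseed)
# aggregate(a.mseed, b.mseed)
```

In this sketch the set of `load` steps depends on both the metadata filter and what has already been ingested, so the plan cannot be fixed before the query arrives.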


    Published in

    SSDBM '13: Proceedings of the 25th International Conference on Scientific and Statistical Database Management
    July 2013
    401 pages
    ISBN: 9781450319218
    DOI: 10.1145/2484838

    Copyright © 2013 held by the owner/author(s).

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    Overall acceptance rate: 56 of 146 submissions, 38%
