Abstract:
TileDB is a system for managing data that are naturally represented as dense or sparse multi-dimensional arrays. TileDB's primary goal is massive scale, where data need t...Show MoreMetadata
Abstract:
TileDB is a system for managing data that are naturally represented as dense or sparse multi-dimensional arrays. TileDB's primary goal is massive scale, where data need to be stored persistently at a low cost, while allowing rapid access during parallel computation. Contrary to traditional data management systems, TileDB is an embeddable C library that can easily be integrated with various higher-level programming languages and scientific computing tools. It supports persistent storage on various backends such as POSIX filesystems, HDFS, Amazon S3, and more. TileDB has been successfully used in genomics as the storage engine of GenomicsDB, a project maintained by the Intel Health and Life Sciences group that is currently integrated with the Broad Institute's GATK4. This paper reviews the data model, architecture and key design principles of TileDB, and outlines our future vision for using TileDB in important scientific applications.
Date of Conference: 11-14 December 2017
Date Added to IEEE Xplore: 15 January 2018
ISBN Information: