ABSTRACT
In recent years there has been a huge influx of genomic data and a growing need for its analysis, yet existing genomic databases are not easily accessible. We developed a pipeline that continuously pre-processes raw human genetic data. The processed data is then stored in a cloud data lake and can be accessed through a simple and intuitive web service and API.
Index Terms
- Analyzing large-scale genomic data with cloud data lakes