Abstract
With the increasing adoption of next-generation sequencing technology in medical practice, there is a growing demand for faster data processing to gain immediate insights from a patient’s genome. Because genomic data sets are very large, processing them takes a long time and delays are common. In this paper, we show how to exploit in-memory platforms for big genomic data analysis, with a focus on the variant analysis workflow. We determine where different in-memory techniques can be applied in the workflow and explore different memory-based strategies to speed up the analysis. Our experiments show promising results and encourage further research in this area, especially given the rapid advances in memory and SSD technologies.
Acknowledgments
This publication was supported by the Saudi Human Genome Project, King Abdulaziz City for Science and Technology (KACST).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Shah, Z.A. et al. (2018). Exploiting In-memory Systems for Genomic Data Analysis. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_35
DOI: https://doi.org/10.1007/978-3-319-78723-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78722-0
Online ISBN: 978-3-319-78723-7
eBook Packages: Computer Science, Computer Science (R0)