Exploiting In-memory Systems for Genomic Data Analysis

Shah, Zeeshan Ali; El-Kalioby, Mohamed; Faquih, Tariq; Shokrof, Moustafa; Subhani, Shazia; Alnakhli, Yasser; Aljafar, Hussain; Anjum, Ashiq; Abouelhoda, Mohamed

doi:10.1007/978-3-319-78723-7_35


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10813)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering


Abstract

With the increasing adoption of next-generation sequencing technology in medical practice, there is a growing demand for faster data processing to gain immediate insights from a patient's genome. Because of the sheer volume of genomic data, processing takes a long time and delays are common. In this paper, we show how to exploit in-memory platforms for big genomic data analysis, focusing on the variant analysis workflow. We identify where different in-memory techniques can be used in the workflow and explore different memory-based strategies to speed up the analysis. Our experiments show promising results and encourage further research in this area, especially given the rapid advances in memory and SSD technologies.
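The abstract does not spell out the individual strategies. As a hypothetical illustration of one generic memory-based idea (keeping a workflow's intermediate data in RAM instead of writing it to disk between stages), the sketch below chains two toy stages: a quality filter and an annotation lookup against a dictionary that stands in for an in-memory key-value store. The record layout, the names `filter_calls`, `annotate`, `run_pipeline`, `RAW_CALLS`, `ANNOTATIONS`, and the `min_qual` threshold are all assumptions made for illustration; this is not the authors' implementation.

```python
import io
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy records: tab-separated CHROM, POS, REF, ALT, QUAL
# (a simplified VCF-like layout, not a real VCF file).
RAW_CALLS = (
    "chr1\t100\tA\tG\t50\n"
    "chr1\t200\tC\tT\t10\n"
    "chr2\t300\tG\tA\t99\n"
)

# Hypothetical annotation table held in RAM, keyed by (chrom, pos, ref, alt);
# it stands in for an in-memory key-value store.
ANNOTATIONS: Dict[Tuple[str, str, str, str], str] = {
    ("chr1", "100", "A", "G"): "GENE_X missense",
    ("chr2", "300", "G", "A"): "GENE_Y synonymous",
}


def filter_calls(src: io.StringIO, min_qual: float) -> io.StringIO:
    """Stage 1: keep records with QUAL >= min_qual; output stays in an in-memory buffer."""
    out = io.StringIO()
    for line in src:
        fields = line.rstrip("\n").split("\t")
        if float(fields[4]) >= min_qual:
            out.write(line)
    out.seek(0)  # rewind so the next stage reads from the start
    return out


def annotate(src: io.StringIO) -> Iterator[str]:
    """Stage 2: look up each variant in the in-memory annotation table."""
    for line in src:
        chrom, pos, ref, alt, _qual = line.rstrip("\n").split("\t")
        label = ANNOTATIONS.get((chrom, pos, ref, alt), "unannotated")
        yield f"{chrom}:{pos}{ref}>{alt}\t{label}"


def run_pipeline(raw: str, min_qual: float = 30.0) -> List[str]:
    # No intermediate file is written: each stage hands the next an in-memory buffer.
    return list(annotate(filter_calls(io.StringIO(raw), min_qual)))


if __name__ == "__main__":
    for row in run_pipeline(RAW_CALLS):
        print(row)
```

In a real workflow the same effect is often approximated by piping tools together or placing intermediate files on a RAM-backed file system; the sketch only shows the data-flow pattern, not any measured speed-up.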



Acknowledgments

This publication was supported by the Saudi Human Genome Project, King Abdulaziz City for Science and Technology (KACST).

Author information

Authors and Affiliations

King Faisal Specialist Hospital and Research Center (KFSHRC), Riyadh, Saudi Arabia
Zeeshan Ali Shah, Mohamed El-Kalioby, Tariq Faquih, Shazia Subhani & Mohamed Abouelhoda
Saudi Human Genome Program, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia
Zeeshan Ali Shah, Mohamed El-Kalioby, Tariq Faquih, Moustafa Shokrof, Shazia Subhani, Yasser Alnakhli, Hussain Aljafar & Mohamed Abouelhoda
Department of Computing and Mathematics, University of Derby, Derby, UK
Ashiq Anjum


Corresponding author

Correspondence to Mohamed Abouelhoda.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Shah, Z.A. et al. (2018). Exploiting In-memory Systems for Genomic Data Analysis. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_35


DOI: https://doi.org/10.1007/978-3-319-78723-7_35
Published: 28 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78722-0
Online ISBN: 978-3-319-78723-7
eBook Packages: Computer Science, Computer Science (R0)
