Exploiting In-memory Systems for Genomic Data Analysis

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2018)

Abstract

With the increasing adoption of next-generation sequencing technology in medical practice, there is a growing demand for faster data processing to gain immediate insights from the patient’s genome. Because genomic data sets are very large and have the characteristics of big data, processing takes a long time and delays are common. In this paper, we show how to exploit in-memory platforms for big genomic data analysis, with a focus on the variant analysis workflow. We determine where different in-memory techniques can be used in the workflow and explore different memory-based strategies to speed up the analysis. Our experiments show promising results and encourage further research in this area, especially given the rapid advancement of memory and SSD technologies.

Acknowledgments

This publication was supported by the Saudi Human Genome Project, King Abdulaziz City for Science and Technology (KACST).

Corresponding author

Correspondence to Mohamed Abouelhoda.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Shah, Z.A. et al. (2018). Exploiting In-memory Systems for Genomic Data Analysis. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_35

  • DOI: https://doi.org/10.1007/978-3-319-78723-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78722-0

  • Online ISBN: 978-3-319-78723-7

  • eBook Packages: Computer Science, Computer Science (R0)