Scalability of a Genomic Data Analysis in the BioTest Platform

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10192)

Abstract

The BioTest platform is dedicated to processing biomedical data that originate from various measurement techniques. These include next-generation sequencing (NGS), which attracts the attention of researchers all over the world because of its broad possibilities for determining the structure of DNA and RNA. However, the analysis of NGS data requires large amounts of disk space and is time-consuming, which makes it a challenge for data processing systems. In this paper, we analyze the possibility of scaling the BioTest platform in terms of genomic data analysis and platform architecture. Scalability tests were carried out on next-generation sequencing data and relied on methods for detecting somatic mutations and polymorphisms in human DNA. Our results show that the platform is scalable and can significantly reduce the execution time of the calculations. However, its scalability depends on the experiment methodology and on the homogeneity of the resources required by each task, which in NGS studies can be highly variable.
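
The dependence of scalability on task homogeneity can be illustrated with a small, purely hypothetical sketch. It does not reproduce the BioTest scheduler or any measurement from the paper: the task durations are invented, and the greedy longest-task-first assignment is only an assumed stand-in for a real workload manager. It compares the speedup of two workloads with the same total work, one with equal tasks and one dominated by a single long task.

```python
import heapq


def makespan(task_times, workers):
    """Wall-clock time when tasks are assigned greedily, longest first,
    to whichever worker currently has the least work (assumed policy)."""
    loads = [0.0] * workers
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):
        lightest = heapq.heappop(loads)      # least-loaded worker so far
        heapq.heappush(loads, lightest + t)  # give it the next task
    return max(loads)


def speedup(task_times, workers):
    """Serial time divided by parallel wall-clock time."""
    return sum(task_times) / makespan(task_times, workers)


# Invented durations (arbitrary units); both workloads total 80 units.
homogeneous = [10.0] * 8
heterogeneous = [45.0] + [5.0] * 7

for name, tasks in (("homogeneous", homogeneous),
                    ("heterogeneous", heterogeneous)):
    print(name, round(speedup(tasks, workers=4), 2))
```

In this toy setting the heterogeneous workload cannot finish faster than its longest task, so adding workers beyond a certain point brings no further gain, while the homogeneous workload scales with the number of workers.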

Acknowledgements

This work was partially supported by The National Centre for Research and Development grants No. PBS3/B3/32/2015 and Strategmed2/267398/4/NCBR/2015. The presented system was developed and installed on the infrastructure of the Ziemowit computer cluster (www.ziemowit.hpc.polsl.pl) in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 project and expanded in the POIG.02.03.01-00-040/13 project.

Author information

Corresponding author

Correspondence to Krzysztof Psiuk-Maksymowicz.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Psiuk-Maksymowicz, K., Mrozek, D., Jaksik, R., Borys, D., Fujarewicz, K., Swierniak, A. (2017). Scalability of a Genomic Data Analysis in the BioTest Platform. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10192. Springer, Cham. https://doi.org/10.1007/978-3-319-54430-4_71

  • DOI: https://doi.org/10.1007/978-3-319-54430-4_71

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54429-8

  • Online ISBN: 978-3-319-54430-4

  • eBook Packages: Computer Science, Computer Science (R0)
