Visualizing Next-Generation Sequencing Cancer Data Sets with Cloud Computing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10084)

Abstract

With the advent of next-generation sequencing technology, clinical data sets now contain enormous amounts of valuable genomic information related to a wide range of diseases such as cancer. These data need to be analysed, managed, stored, visualized and integrated in order to be clinically useful. However, many of the clinicians and researchers who need to interpret these data sets are not specialists in information technology and so need systems that are effective and easy to use. Herein, we present an overview of a novel cloud-computing-based research management software system for next-generation sequencing, which has simplicity, scalability, speed and reproducibility at its core. We describe a prototype that enables rapid visualization of big cancer data sets. We present preliminary results from a bioinformatics pipeline for comprehensive genome mapping analysis and visualization, created for the Sage Care project (a European Union funded cancer research project), and outline the benefits of integrating this pipeline into a graphical user interface platform such as Simplicity.

Notes

  1. Simplicity is a trademark of NSilico Lifescience Ltd.

Acknowledgements

Paul Walsh, Brian Kelly, Timm Heuss and Brendan Lawlor are investigators on Sage Care, an H2020 MSCA-funded project, grant number 644186.

Corresponding author

Correspondence to Paul Walsh.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Walsh, P., Lawlor, B., Kelly, B., Manning, T., Heuss, T., Leopold, M. (2016). Visualizing Next-Generation Sequencing Cancer Data Sets with Cloud Computing. In: Bornschlegl, M.X., Engel, F.C., Bond, R., Hemmje, M.L. (eds) Advanced Visual Interfaces. Supporting Big Data Applications. AVI-BDA 2016. Lecture Notes in Computer Science, vol 10084. Springer, Cham. https://doi.org/10.1007/978-3-319-50070-6_4

  • DOI: https://doi.org/10.1007/978-3-319-50070-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50069-0

  • Online ISBN: 978-3-319-50070-6

  • eBook Packages: Computer Science, Computer Science (R0)
