Abstract
With the advent of next-generation sequencing technology, clinical data sets now contain enormous amounts of valuable genomic information related to a wide range of diseases, such as cancer. These data need to be analysed, managed, stored, visualized and integrated in order to be clinically useful. However, many of the clinicians and researchers who need to interpret these data sets are not specialists in information technology, and so need systems that are effective and easy to use. Herein, we present an overview of a novel cloud-computing-based research management software system for next-generation sequencing, which has simplicity, scalability, speed and reproducibility at its core. We describe a prototype that enables rapid visualization of large cancer data sets. We present preliminary results from a bioinformatics pipeline for comprehensive genome mapping analysis and visualization in the Sage Care project, a European Union funded cancer research project, and outline the benefits of integrating this pipeline into a graphical user interface platform such as Simplicity.
Notes
1. Simplicity is a trademark of NSilico Lifescience Ltd.
Acknowledgements
Paul Walsh, Brian Kelly, Timm Heuss and Brendan Lawlor are investigators on Sage Care, an H2020 MSCA funded project, grant number 644186.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Walsh, P., Lawlor, B., Kelly, B., Manning, T., Heuss, T., Leopold, M. (2016). Visualizing Next-Generation Sequencing Cancer Data Sets with Cloud Computing. In: Bornschlegl, M.X., Engel, F.C., Bond, R., Hemmje, M.L. (eds) Advanced Visual Interfaces. Supporting Big Data Applications. AVI-BDA 2016. Lecture Notes in Computer Science, vol 10084. Springer, Cham. https://doi.org/10.1007/978-3-319-50070-6_4
DOI: https://doi.org/10.1007/978-3-319-50070-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50069-0
Online ISBN: 978-3-319-50070-6
eBook Packages: Computer Science, Computer Science (R0)