Abstract
The growing volume of biological data, including macromolecular data describing 3D protein structures, encourages the scientific community to turn to cloud computing resources to process and analyze the data on a large scale. This applies, among the many analytical processes performed in bioinformatics, to protein structure alignment and similarity searching. In this paper, we present a parameter sweep-based approach to scheduling the computations involved in massive 3D protein structure alignments performed with the Cloud4PSi system running on the Microsoft Azure public cloud.
Acknowledgments
This work was supported by the National Centre for Research and Development, grant No. PBS3/B3/32/2015, and by Microsoft Research within the Microsoft Azure for Research grant.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mrozek, D., Kłapciński, A., Małysiak-Mrozek, B. (2017). Orchestrating Task Execution in Cloud4PSi for Scalable Processing of Macromolecular Data of 3D Protein Structures. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10192. Springer, Cham. https://doi.org/10.1007/978-3-319-54430-4_69
DOI: https://doi.org/10.1007/978-3-319-54430-4_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54429-8
Online ISBN: 978-3-319-54430-4
eBook Packages: Computer Science, Computer Science (R0)