Orchestrating Task Execution in Cloud4PSi for Scalable Processing of Macromolecular Data of 3D Protein Structures

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10192)

Abstract

The growing amount of biological data, including macromolecular data describing 3D protein structures, encourages the scientific community to turn to Cloud computing resources to process and analyze these data on a large scale. This applies, among the many analytical processes performed in bioinformatics, to protein structure alignment and similarity searching. In this paper, we present a parameter sweep-based approach to scheduling computations for massive 3D protein structure alignments performed with the Cloud4PSi system running on the Microsoft Azure public cloud.
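As a rough illustration of what a parameter sweep means in this setting, the sketch below enumerates one task per combination of query structure, alignment algorithm, and range of database structures, then spreads the tasks over a fixed number of workers. It is a minimal sketch under stated assumptions, not the Cloud4PSi implementation: the function names, the placeholder query and algorithm identifiers, the chunking of the database into index ranges, and the round-robin assignment are all hypothetical and are not taken from the paper.

```python
from itertools import product


def sweep_tasks(queries, algorithms, n_structures, chunk_size):
    """Enumerate one task per (query, algorithm, database chunk) combination.

    Hypothetical helper: the database is assumed to be addressable by
    index, and each chunk is a half-open range [start, end).
    """
    chunks = [
        (start, min(start + chunk_size, n_structures))
        for start in range(0, n_structures, chunk_size)
    ]
    return [
        {"query": q, "algorithm": a, "range": c}
        for q, a, c in product(queries, algorithms, chunks)
    ]


def assign_round_robin(tasks, n_workers):
    """Distribute tasks over workers in round-robin order (an assumed policy)."""
    queues = [[] for _ in range(n_workers)]
    for i, task in enumerate(tasks):
        queues[i % n_workers].append(task)
    return queues


# Placeholder identifiers, chosen only for illustration.
tasks = sweep_tasks(["query_1"], ["alg_a", "alg_b"], n_structures=10, chunk_size=4)
queues = assign_round_robin(tasks, n_workers=3)
```

Because every task is independent of the others, a sweep like this can in principle be scaled by adding workers; how Cloud4PSi actually partitions and schedules the work is described in the paper itself.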

Acknowledgments

This work was supported by the National Centre for Research and Development (grant No. PBS3/B3/32/2015) and by Microsoft Research within the Microsoft Azure for Research grant.

Corresponding author

Correspondence to Dariusz Mrozek.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mrozek, D., Kłapciński, A., Małysiak-Mrozek, B. (2017). Orchestrating Task Execution in Cloud4PSi for Scalable Processing of Macromolecular Data of 3D Protein Structures. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10192. Springer, Cham. https://doi.org/10.1007/978-3-319-54430-4_69

  • DOI: https://doi.org/10.1007/978-3-319-54430-4_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54429-8

  • Online ISBN: 978-3-319-54430-4

  • eBook Packages: Computer Science, Computer Science (R0)
