DOI: 10.1145/3235830.3235837

Protein Secondary Structure Analysis in the Cloud

Published: 23 September 2018

Abstract

Many biological problems, such as finding recurring geometrical patterns in the secondary structures of protein pairs, are often solved by parallel applications running on HPC systems, whose powerful architectures and high CPU counts yield good performance. Recently, cloud computing has emerged as a convenient environment for deploying certain types of parallel applications. This work examines Cross Motif Search, an application that has been successfully executed in parallel on on-premise clusters and HPC systems, and studies its porting to a cloud environment. The work uses profiling and analytical modelling to predict communication overhead. While profiling gives unreliable estimates, model-based predictions and actual data are in good agreement, thanks to the simple communication pattern embedded in the application. Overall, Cross Motif Search has a viable implementation in the cloud.
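To make the modelling idea concrete, the following is a minimal sketch of how an analytical latency-bandwidth (Hockney-style) cost model can predict communication overhead for a master-worker exchange pattern of the kind common in such parallel applications. The function names, parameter values, and the serialized-exchange assumption are illustrative, not the paper's actual model or measured figures.

```python
# Hedged sketch: predicting communication overhead with a simple
# latency-bandwidth (Hockney-style) analytical model.
# All names and parameter values below are illustrative assumptions,
# not measurements or code from the paper.

def ptp_time(msg_bytes, alpha, beta):
    """Predicted point-to-point time: startup latency plus
    per-byte transfer time (alpha + beta * message size)."""
    return alpha + beta * msg_bytes

def master_worker_overhead(n_workers, task_bytes, result_bytes,
                           tasks_per_worker, alpha, beta):
    """Total predicted communication time for a master that sends
    task descriptions to each worker and collects results back,
    assuming the exchanges are serialized at the master."""
    per_task = ptp_time(task_bytes, alpha, beta) + \
               ptp_time(result_bytes, alpha, beta)
    return n_workers * tasks_per_worker * per_task

# Hypothetical cloud-network parameters: ~50 microseconds latency,
# ~10 Gbit/s effective bandwidth.
alpha = 50e-6              # seconds per message (latency)
beta = 1.0 / (10e9 / 8)    # seconds per byte (inverse bandwidth)
print(master_worker_overhead(n_workers=32, task_bytes=4096,
                             result_bytes=65536,
                             tasks_per_worker=100,
                             alpha=alpha, beta=beta))
```

Because such a model has only two parameters per link, it can be calibrated from a handful of ping-pong measurements and then extrapolated, which is why a simple, regular communication pattern makes model-based prediction reliable.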


Cited By

  • (2019) Optimized cloud-based scheduling for protein secondary structure analysis. The Journal of Supercomputing 75(7), 3499–3520. https://doi.org/10.1007/s11227-019-02859-w

Published In

PBio 2018: Proceedings of the 6th International Workshop on Parallelism in Bioinformatics
September 2018
70 pages

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cloud Infrastructure
  2. HPC
  3. Parallel Architecture
  4. Performance Prediction
  5. Protein Secondary Structure

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PBio 2018

Acceptance Rates

PBio 2018 Paper Acceptance Rate: 7 of 9 submissions, 78%
Overall Acceptance Rate: 7 of 9 submissions, 78%
