High Performance Parallel Computing with Clouds and Cloud Technologies

Ekanayake, Jaliya; Fox, Geoffrey

doi:10.1007/978-3-642-12636-9_2

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (LNICST, volume 34)

Included in the following conference series:

International Conference on Cloud Computing

Abstract

Infrastructure services (Infrastructure-as-a-Service) provided by cloud vendors allow any user to provision a large number of compute instances fairly easily. Whether these virtual resources are leased from public clouds or allocated from private clouds, using them for data- or compute-intensive analyses requires parallel runtimes on which such applications can be implemented. Among the many parallelizable problems, most “pleasingly parallel” applications can be implemented fairly easily with MapReduce technologies such as Hadoop, CGL-MapReduce, and Dryad. However, many scientific applications with complex communication patterns still require the low-latency communication mechanisms and the rich set of communication constructs offered by runtimes such as MPI. In this paper, we first discuss large-scale data analysis using different MapReduce implementations and then present a performance analysis of high-performance parallel applications on virtualized resources.

Author information

Authors and Affiliations

School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
Jaliya Ekanayake & Geoffrey Fox

Editor information

Editors and Affiliations

International Research Institute on Autonomic Network Computing (IRINAC), Menradstr. 2, 80634, Munich, Germany
Dimiter R. Avresky
LAAS-CNRS, 7 Avenue du Colonel Roche, 31077, Toulouse Cedex 4, France
Michel Diaz
Leibniz-Rechenzentrum, Boltzmannstr. 1, 85748, Garching, Germany
Arndt Bode
Dipartimento di Informatica e Sistematica, Università di Roma La Sapienza, 00185, Roma, Italy
Bruno Ciciani
IBM Research Laboratory, Haifa, Israel
Eliezer Dekel

About this paper

Cite this paper

Ekanayake, J., Fox, G. (2010). High Performance Parallel Computing with Clouds and Cloud Technologies. In: Avresky, D.R., Diaz, M., Bode, A., Ciciani, B., Dekel, E. (eds) Cloud Computing. CloudComp 2009. Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, vol 34. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12636-9_2

DOI: https://doi.org/10.1007/978-3-642-12636-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12635-2
Online ISBN: 978-3-642-12636-9
eBook Packages: Computer Science, Computer Science (R0)
