Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform

Expósito, Roberto R.; Taboada, Guillermo L.; Ramos, Sabela; González-Domínguez, Jorge; Touriño, Juan; Doallo, Ramón

doi:10.1007/s10723-013-9250-y

Published: 01 March 2013

Journal of Grid Computing, Volume 11, pages 613–631 (2013)

Abstract

Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine its suitability for I/O-intensive applications. The evaluation has been carried out at different layers using representative benchmarks in order to evaluate the low-level cloud storage devices available in Amazon EC2 (ephemeral disks and Elastic Block Store (EBS) volumes), both on local and distributed file systems. In addition, several I/O interfaces commonly used by scientific workloads (POSIX, MPI-IO and HDF5) have been assessed. Furthermore, the scalability of a representative parallel I/O code has been analyzed at the application level, taking into account both performance and cost metrics. The analysis of the experimental results has shown that the available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists to significantly increase (by up to several times) the performance of I/O-intensive applications in the Amazon EC2 cloud. An example of an optimal configuration that can maximize I/O performance in this cloud is the use of a RAID 0 of 2 ephemeral disks, TCP with a 9,000-byte MTU, NFS async and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.
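As a rough illustration of the kind of low-level measurement the abstract describes, the following minimal sketch times a sequential write through the POSIX interface and reports the throughput. It is not the benchmark used in the paper: the function name `sequential_write_mb_per_s`, its parameters and the default sizes are assumptions chosen for illustration only.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=8, block_kb=1024):
    """Write total_mb of zeros in block_kb chunks and return MB/s.

    Hypothetical helper for illustration; not the paper's benchmark.
    """
    block = b"\0" * (block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n_blocks):
            written = 0
            # os.write may write fewer bytes than requested, so loop.
            while written < len(block):
                written += os.write(fd, block[written:])
        os.fsync(fd)  # include the flush to the device in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    megabytes = n_blocks * block_kb / 1024
    return megabytes / max(elapsed, 1e-9)  # guard against a zero interval


if __name__ == "__main__":
    # Point `directory` at the mount under test (for example a local
    # disk or an NFS share) to compare storage back ends.
    rate = sequential_write_mb_per_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Running the same function against directories on different devices or file systems gives a first-order comparison, although small files such as the default here mostly exercise the page cache; a realistic test would use sizes well above the available memory.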

Author information

Authors and Affiliations

Department of Electronics and Systems, University of A Coruña, A Coruña, Spain
Roberto R. Expósito, Guillermo L. Taboada, Sabela Ramos, Jorge González-Domínguez, Juan Touriño & Ramón Doallo

Corresponding author

Correspondence to Roberto R. Expósito.

About this article

Cite this article

Expósito, R.R., Taboada, G.L., Ramos, S. et al. Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform. J Grid Computing 11, 613–631 (2013). https://doi.org/10.1007/s10723-013-9250-y

Received: 25 June 2012
Accepted: 25 January 2013
Published: 01 March 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10723-013-9250-y
