Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform

Journal of Grid Computing

Abstract

Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine their suitability for I/O-intensive applications. The evaluation has been carried out at different layers using representative benchmarks. At the lowest level, it covers the cloud storage devices available in Amazon EC2, namely ephemeral disks and Elastic Block Store (EBS) volumes, on both local and distributed file systems. In addition, several I/O interfaces commonly used by scientific workloads (POSIX, MPI-IO and HDF5) have been assessed. Furthermore, the scalability of a representative parallel I/O code has been analyzed at the application level, taking into account both performance and cost metrics. The experimental results show that the available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists significantly increase (up to several times) the performance of I/O-intensive applications in the Amazon EC2 cloud. An example of an optimal configuration that can maximize I/O performance in this cloud is a RAID 0 of 2 ephemeral disks, TCP with a 9,000-byte MTU, NFS async and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.
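The configuration recommended above stripes two ephemeral disks as a RAID 0 array. The Python sketch below shows the standard RAID 0 address mapping, which splits a large sequential request across both disks so that they can serve it in parallel. It is illustrative only. The 64 KiB chunk size and the function names `raid0_map` and `disks_touched` are hypothetical. Real software RAID (for example Linux md) also adds metadata and tuning options that the sketch ignores.

```python
from __future__ import annotations


def raid0_map(logical_offset: int, chunk_size: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset in a RAID 0 volume to (disk index, byte offset on that disk).

    Hypothetical helper: assumes a fixed chunk size and disks numbered 0..num_disks-1.
    """
    if logical_offset < 0 or chunk_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; chunk_size and num_disks must be > 0")
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    disk = chunk_index % num_disks     # consecutive chunks rotate round-robin across disks
    stripe = chunk_index // num_disks  # number of complete stripes before this chunk
    return disk, stripe * chunk_size + within_chunk


def disks_touched(offset: int, length: int, chunk_size: int, num_disks: int) -> set[int]:
    """Return the disks that a contiguous request [offset, offset + length) spans."""
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    if last_chunk - first_chunk + 1 >= num_disks:
        return set(range(num_disks))  # request covers a full stripe: every disk is involved
    return {c % num_disks for c in range(first_chunk, last_chunk + 1)}


if __name__ == "__main__":
    CHUNK = 64 * 1024  # hypothetical 64 KiB chunk size
    print(raid0_map(3 * CHUNK + 10, CHUNK, 2))          # chunk 3 -> disk 1, stripe 1
    print(sorted(disks_touched(0, 1 << 20, CHUNK, 2)))  # a 1 MiB request spans both disks
```

A request that fits within a single chunk still hits only one disk. Striping should therefore help mainly with large or concurrent I/O. The chunk size is a tuning parameter that this abstract does not specify.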

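The same recommended configuration uses TCP with a 9,000-byte (jumbo frame) MTU instead of the standard 1,500 bytes. The Python sketch below works through the arithmetic for a bulk transfer under the following stated assumptions:

- IPv4 and TCP headers carry no options (40 bytes in total).
- Each frame adds 38 bytes of Ethernet overhead on the wire: a 14-byte header, 4-byte FCS, 8-byte preamble/SFD and 12-byte inter-frame gap.

The function names are hypothetical. The sketch does not model why jumbo frames performed better in the paper's measurements.

```python
import math

# Assumptions (not from the paper): IPv4 + TCP headers without options,
# and standard per-frame Ethernet overhead on the wire.
IP_TCP_HEADERS = 20 + 20          # bytes per packet
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-size TCP segments (one per frame) needed to carry the payload."""
    mss = mtu - IP_TCP_HEADERS  # maximum segment size without TCP options
    return math.ceil(payload_bytes / mss)


def wire_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that are application payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {frames_needed(1 << 20, mtu)} frames per MiB, "
              f"wire efficiency {wire_efficiency(mtu):.3f}")
```

Under these assumptions, a 1 MiB transfer needs 719 frames at a 1,500-byte MTU and 118 frames at 9,000 bytes. Wire efficiency rises only from about 94.9% to 99.1%. This suggests that the larger MTU mainly reduces the number of packets each host must process, roughly six times fewer, rather than the header bytes sent.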


Author information

Corresponding author

Correspondence to Roberto R. Expósito.

About this article

Cite this article

Expósito, R.R., Taboada, G.L., Ramos, S. et al. Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform. J Grid Computing 11, 613–631 (2013). https://doi.org/10.1007/s10723-013-9250-y

  • DOI: https://doi.org/10.1007/s10723-013-9250-y
