Abstract
As the number of nodes in high-performance computing (HPC) systems increases, collective I/O becomes an important issue, and I/O aggregators are key to its performance. When an HPC system uses non-exclusive scheduling, MPI jobs can be assigned a different number of CPU cores per node; consequently, the I/O aggregators experience disparities in their workloads and communication costs. Because communication behavior is influenced by the sequence of the I/O aggregators and by the number of CPU cores in neighboring nodes, changing the order of the nodes affects the communication costs of collective I/O. Few studies, however, have addressed how to determine an appropriate node sequence. In this study, we found that an ill-chosen node order increases the communication costs of collective I/O. To address this problem, we propose heuristic methods that regulate the node sequence, and we develop a prediction function that estimates MPI-IO performance under the proposed heuristics. Performance measurements indicate that the proposed scheme prevents the performance degradation of collective I/O. For instance, on a multi-core cluster with the Lustre file system, the scheme improved the read bandwidth of the MPI-Tile-IO benchmark by 7.61% to 17.21% and its write bandwidth by 17.05% to 26.49%.
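To make the aggregator mechanism concrete, the sketch below shows a minimal MPI-IO collective write in C. It assumes a ROMIO-based MPI-IO implementation, where the standard cb_nodes and cb_config_list hints control how many aggregators are used and on which hosts they are placed; the host names and counts here are placeholders, and the sketch only illustrates the interface through which aggregator placement can be influenced, not the paper's heuristic scheme itself.

```c
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File   fh;
    MPI_Info   info;
    MPI_Offset offset;
    int        rank;
    const int  COUNT = 1024;            /* doubles written per process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(COUNT * sizeof(double));
    for (int i = 0; i < COUNT; i++)
        buf[i] = (double)rank;

    /* ROMIO hints that steer two-phase collective I/O:
     * cb_nodes sets the number of I/O aggregators, and
     * cb_config_list pins aggregators to specific hosts, so the host
     * order given here plays the role of the node sequence studied in
     * the paper (node names below are hypothetical). */
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_nodes", "2");
    MPI_Info_set(info, "cb_config_list", "node03:1,node01:1");

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Each rank writes one contiguous block; the collective call lets
     * the aggregators gather these pieces and issue large, contiguous
     * requests to the parallel file system. */
    offset = (MPI_Offset)rank * COUNT * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Under non-exclusive scheduling, the hosts named in cb_config_list may hold different numbers of the job's processes, which is precisely the setting in which the paper's node-ordering heuristics aim to reduce the aggregators' communication costs.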
Cite this article
Cha, K., Maeng, S. Reducing communication costs in collective I/O in multi-core cluster systems with non-exclusive scheduling. J Supercomput 61, 966–996 (2012). https://doi.org/10.1007/s11227-011-0669-2