Abstract
This work presents Dynamic-CoMPI, an optimization of MPI communications that combines two techniques to reduce the impact of communications and non-contiguous I/O requests in parallel applications. These techniques are independent of the application and complementary to each other. The first technique, called Locality-Aware strategy for Two-Phase I/O (LA-Two-Phase I/O), is an optimization of the Two-Phase collective I/O technique from ROMIO. To increase the locality of file accesses, LA-Two-Phase I/O solves the Linear Assignment Problem (LAP) to find an optimal I/O data communication schedule. The main purpose of this technique is to reduce the number of communications involved in collective I/O operations. The second technique, called Adaptive-CoMPI, is based on run-time compression of the MPI messages exchanged by applications. Both techniques can be applied to any application, since both are transparent to the user. Dynamic-CoMPI has been validated using several MPI benchmarks and real HPC applications. The results show that, for many of the considered scenarios, significant reductions in execution time are achieved by reducing the size and number of messages. Additional benefits of our approach are a reduction of the total communication time and of network contention, thus enhancing not only performance but also scalability.
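To illustrate the LAP formulation mentioned in the abstract, the sketch below shows how assigning file domains to aggregator processes so as to maximize data locality reduces to a linear assignment problem. This is a hypothetical toy (brute force over permutations, pure Python); the actual LA-Two-Phase I/O implementation uses an efficient LAP solver, and the matrix values here are made up for illustration.

```python
from itertools import permutations

def best_assignment(locality):
    """Toy LAP solver: locality[p][d] = bytes process p already holds
    of file domain d. Returns the process-to-domain mapping that
    maximizes total locally held data (i.e., minimizes communication)."""
    n = len(locality)
    best_score, best_perm = -1, None
    for perm in permutations(range(n)):  # perm[p] = domain given to process p
        score = sum(locality[p][perm[p]] for p in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(best_perm), best_score

# 3 processes x 3 file domains; entries are locally held bytes (illustrative)
locality = [
    [90, 10,  0],   # P0 mostly holds data of domain 0
    [ 5, 80, 15],   # P1 mostly holds data of domain 1
    [ 0, 20, 70],   # P2 mostly holds data of domain 2
]
assignment, total = best_assignment(locality)
# Here the identity assignment [0, 1, 2] maximizes locality (total 240),
# so each aggregator serves the domain it already holds most of.
```

Brute force is exponential and only viable for tiny examples; polynomial-time algorithms such as shortest augmenting path solve the same problem at scale.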
Filgueira, R., Carretero, J., Singh, D.E. et al. Dynamic-CoMPI: dynamic optimization techniques for MPI parallel applications. J Supercomput 59, 361–391 (2012). https://doi.org/10.1007/s11227-010-0440-0