
Dynamic-CoMPI: dynamic optimization techniques for MPI parallel applications

The Journal of Supercomputing
Abstract

This work presents an optimization of MPI communications, called Dynamic-CoMPI, which uses two techniques to reduce the impact of communications and non-contiguous I/O requests in parallel applications. These techniques are independent of the application and complementary to each other. The first technique is an optimization of the Two-Phase collective I/O technique from ROMIO, called Locality-Aware strategy for Two-Phase I/O (LA-Two-Phase I/O). In order to increase the locality of file accesses, LA-Two-Phase I/O employs the Linear Assignment Problem (LAP) to find an optimal I/O data communication schedule. The main purpose of this technique is to reduce the number of communications involved in collective I/O operations. The second technique, called Adaptive-CoMPI, is based on run-time compression of the MPI messages exchanged by applications. Both techniques can be applied to any application, since both are transparent to the user. Dynamic-CoMPI has been validated using several MPI benchmarks and real HPC applications. The results show that, for many of the considered scenarios, important reductions in execution time are achieved by reducing the size and number of messages. Additional benefits of our approach are the reduction of the total communication time and of network contention, thus enhancing not only performance but also scalability.
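To make the first technique concrete, the sketch below shows how a Linear Assignment Problem solver can map file domains to aggregator processes so that each aggregator is assigned the domain for which it already holds the most data locally, which is the locality objective LA-Two-Phase I/O optimizes. This is a minimal illustrative sketch, not the paper's implementation: the input matrix, the helper name schedule_aggregators, and the use of SciPy's linear_sum_assignment solver are all assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # LAP solver

def schedule_aggregators(data_local):
    """Assign one file domain to each aggregator process.

    data_local[p][d] = bytes of file domain d that process p already
    holds locally (hypothetical input, gathered before the I/O phase).
    Maximizing total locality minimizes the data exchanged during the
    shuffle phase of Two-Phase I/O.
    """
    cost = -np.asarray(data_local)            # negate: LAP solvers minimize
    procs, domains = linear_sum_assignment(cost)
    return dict(zip(domains.tolist(), procs.tolist()))  # domain -> process

# Example: 3 processes, 3 file domains (byte counts are made up)
locality = [[90, 10,  0],
            [ 5, 80, 15],
            [ 0, 20, 70]]
print(schedule_aggregators(locality))  # {0: 0, 1: 1, 2: 2}
```

Solving the LAP once per collective operation yields a schedule in which most of each aggregator's data is already local, so fewer messages are needed during redistribution.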
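The second technique compresses MPI message payloads at run time, transparently to the application. The wrapper below sketches that idea under stated assumptions: where Adaptive-CoMPI decides adaptively whether and how to compress each message, this sketch substitutes a fixed size threshold and zlib; mpi4py, the THRESHOLD value, and the send_adaptive/recv_adaptive names are illustrative, not the paper's API.

```python
import zlib
from mpi4py import MPI  # assumes an MPI implementation plus mpi4py

THRESHOLD = 4096  # hypothetical cutoff: tiny messages are not worth compressing

def send_adaptive(comm, buf: bytes, dest: int, tag: int = 0):
    """Send buf, compressing it only when compression actually pays off."""
    if len(buf) >= THRESHOLD:
        packed = zlib.compress(buf)
        if len(packed) < len(buf):            # only ship the smaller payload
            comm.send((True, packed), dest=dest, tag=tag)
            return
    comm.send((False, buf), dest=dest, tag=tag)

def recv_adaptive(comm, source: int, tag: int = 0) -> bytes:
    """Receive a message sent with send_adaptive and restore the raw bytes."""
    was_compressed, payload = comm.recv(source=source, tag=tag)
    return zlib.decompress(payload) if was_compressed else payload

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0 and comm.Get_size() > 1:
        send_adaptive(comm, b"x" * 100_000, dest=1)
    elif comm.Get_rank() == 1:
        assert recv_adaptive(comm, source=0) == b"x" * 100_000
```

Because the decision is made inside the messaging layer, the application code is unchanged, which is what makes the technique transparent to users.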

Author information

Corresponding author

Correspondence to Jesús Carretero.

About this article

Cite this article

Filgueira, R., Carretero, J., Singh, D.E. et al. Dynamic-CoMPI: dynamic optimization techniques for MPI parallel applications. J Supercomput 59, 361–391 (2012). https://doi.org/10.1007/s11227-010-0440-0
