OpenMP compiler for distributed memory architectures

Abstract

OpenMP is an emerging industry standard for shared memory architectures. Although OpenMP offers ease of use and supports incremental parallelization, message passing remains the most widely used programming model for distributed memory architectures, and effectively extending OpenMP to such architectures is an active research topic. This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures. Based on the "partially replicating shared arrays" memory model, we propose an inter-procedural algorithm for shared array recognition, an optimization technique based on producer/consumer relationships, and a communication generation technique for nonlinear references. We evaluate performance on nine benchmarks covering computational fluid dynamics, integer sorting, molecular dynamics, earthquake simulation, and computational chemistry. The average scalability of the KLCoMP versions is close to that of the MPI versions. We compare the performance of our translated programs with versions generated for Omni+SCASH, LLCoMP, and OpenMP(Purdue), and find that applications translated by KLCoMP, especially irregular applications, achieve better performance than the other versions.

References

  1. OpenMP Architecture Review Board. OpenMP Application Program Interface, version 2.5, 2005

  2. Sato M, Satoh S, Kusano K, et al. Design of OpenMP compiler for an SMP cluster. In: Proc. of the 1st European Workshop on OpenMP. Berlin: Springer, 1999. 32–39

  3. Costa J J, Cortes T, Martorell X, et al. Running OpenMP applications efficiently on an everything-shared SDSM. J Parall Distrib Comput, 2006, 66: 647–658

  4. Min S J, Eigenmann R. Combined compile-time and runtime-driven, pro-active data movement in software DSM systems. In: Proc. of Seventh Workshop on Languages, Compilers, and Run-time Support for Scalable Systems, Houston, Texas, 2004. 1–6

  5. Lu H H. Quantifying the performance differences between PVM and TreadMarks. J Parall Distrib Comput, 1997, 43: 65–78

  6. Basumallik A, Min S, Eigenmann R. Programming distributed memory systems using OpenMP. In: Proc. of International Parallel and Distributed Processing Symposium. New York: IEEE Press, 2007. 1–8

  7. Basumallik A, Eigenmann R. Towards automatic translation of OpenMP to MPI. In: Proc. of the 19th Annual International Conference on Supercomputing. New York: ACM Press, 2005. 189–198

  8. Basumallik A, Eigenmann R. Optimizing irregular shared-memory applications for distributed-memory systems. In: Proc. of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York: ACM Press, 2006. 119–128

  9. MPICH2, version 1.0.7. http://www.mcs.anl.gov/research/projects/mpich2/, March 21, 2008

  10. Dorta A, Lopez P, Sande F. Basic skeletons in llc. Parall Comput, 2006, 32: 491–506

  11. Eigenmann R, Hoeflinger J, Kuhn R H, et al. Is OpenMP for Grids? In: Proc. of International Parallel and Distributed Processing Symposium. New York: IEEE Press, 2002. 171–178

  12. Jeun W C, Kee Y S, Ha S. Improving performance of OpenMP for SMP clusters through overlapped page migrations. In: Proc. of International Workshop on OpenMP, Reims, France, 2006

  13. Eachempati D, Huang L, Chapman B M. Strategies and implementation for translating OpenMP code for clusters. In: Proc. of High Performance Computing and Communications. Berlin: Springer, 2007. 420–431

  14. Jin H, Frumkin M, Yan J. The OpenMP implementation of NAS parallel benchmarks and its performance. Technical Report NAS-99-011, 1999

  15. Aslot V, Domeika M, Eigenmann R. SPEComp: A new benchmark suite for measuring parallel computer performance. In: Proc. of the Workshop on OpenMP Applications and Tools. Berlin: Springer, 2001. 1–10

  16. COSMIC group, University of Maryland. COSMIC software for irregular applications. http://www.cs.umd.edu/projects/osmic/software.html

  17. Brooks B R, Bruccoleri R E, Olafson B D, et al. A program for macromolecular energy, minimization, and dynamics calculations. J Comp Chem, 1983, 4: 187–217

  18. Brandes T. ADAPTOR Users Guide, Fraunhofer Gesellschaft, Augustin, Germany, 2004

  19. Petersen P, Padua D A. Static and dynamic evaluation of data dependence analysis techniques. IEEE Trans Parall Distrib Syst, 1996, 7: 1121–1132

  20. Brezany P, Dang M. CHAOS+ Runtime Library. Internal Report, Institute for Software Technology and Parallel Systems, University of Vienna, September 1997

  21. Strout M M, Kreaseck B, Hovland P D. Data-flow analysis for MPI programs. In: Proc. of the 2006 International Conference on Parallel Processing, Columbus, Ohio, USA, 2006. 175–184

  22. Wang J, Hu C J, Zhang J L, et al. An optimized strategy for collective communication in data parallelism (in Chinese). Chinese J Comput, 2008, 2: 318–328

  23. Engelen R, Birch J, Shou Y, et al. A unified framework for nonlinear dependence testing and symbolic analysis. In: Proc. of the ACM International Conference on Supercomputing. New York: ACM Press, 2004. 106–115

  24. Li Z. Array privatization for parallel execution of loops. In: Proc. of the ACM International Conference on Supercomputing. New York: ACM Press, 1992. 313–322

  25. Haghighat M R, Polychronopoulos C D. Symbolic analysis for parallelizing compilers. ACM Trans Program Lang Syst, 1996, 18: 477–518

  26. Hu C, Li J, Wang J, et al. Communication generation for irregular parallel applications. In: Proc. of IEEE International Symposium on Parallel Computing in Electrical Engineering. New York: IEEE Press, 2006. 263–270

  27. Wang J, Hu C, Zhang J, et al. OpenMP extensions for irregular parallel applications on clusters. In: Proc. of International Workshop on OpenMP. Lecture Notes in Computer Science 4935. Berlin: Springer, 2007. 101–111

  28. Tseng E, Gaudiot J. Communication generation for aligned and cyclic(k) distributions using integer lattice. IEEE Trans Parallel Distrib Syst, 1999, 10: 136–146

  29. Ojima Y, Sato M, Harada H, et al. Performance of cluster-enabled OpenMP for the SCASH software distributed shared memory system. In: Proc. of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), Tokyo, Japan, 2003. 450–456

Author information

Correspondence to Jue Wang.

About this article

Cite this article

Wang, J., Hu, C., Zhang, J. et al. OpenMP compiler for distributed memory architectures. Sci. China Inf. Sci. 53, 932–944 (2010). https://doi.org/10.1007/s11432-010-0074-0
