Abstract
OpenMP is an emerging industry standard for shared memory architectures. Although OpenMP offers ease of use and supports incremental parallelization, message passing remains the most widely used programming model for distributed memory architectures, and how to effectively extend OpenMP to distributed memory architectures has been an active research topic. This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures. Based on the "partially replicating shared arrays" memory model, we propose an algorithm for shared array recognition based on inter-procedural analysis, an optimization technique based on the producer/consumer relationship, and a communication generation technique for nonlinear references. We evaluate performance on nine benchmarks covering computational fluid dynamics, integer sorting, molecular dynamics, earthquake simulation, and computational chemistry. The average scalability achieved by the KLCoMP versions is close to that achieved by the MPI versions. We also compare the performance of our translated programs with that of versions generated for Omni+SCASH, LLCoMP, and OpenMP(Purdue), and find that parallel applications (especially irregular applications) translated by KLCoMP achieve better performance than the other versions.
Wang, J., Hu, C., Zhang, J. et al. OpenMP compiler for distributed memory architectures. Sci. China Inf. Sci. 53, 932–944 (2010). https://doi.org/10.1007/s11432-010-0074-0