Abstract
The OpenMP Application Program Interface supports parallel programming on scalable shared-memory symmetric multiprocessor (SMP) machines by providing simple work-sharing directives for C/C++ and Fortran, from which the compiler generates thread-parallel programs. However, the lack of language features for exploiting data locality often results in poor performance, since non-uniform memory access times on scalable SMP machines cannot be neglected. HPF, the de facto standard for data-parallel programming, offers a rich set of data distribution directives for exploiting data locality, but has mainly been targeted towards distributed-memory machines. In this paper we describe an optimized execution model for HPF programs on SMP machines that avails itself of the OpenMP mechanisms for work sharing and thread parallelism while exploiting data locality based on user-specified distribution directives. This execution model has been implemented in the ADAPTOR HPF compilation system, and experimental results verify the efficiency of the chosen approach.
The work described in this paper was supported by NEC Europe Ltd. as part of the ADVICE project in cooperation with the NEC C&C Research Laboratories.
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Benkner, S., Brandes, T. (2000). Exploiting Data Locality on Scalable Shared Memory Machines with Data Parallel Programs. In: Bode, A., Ludwig, T., Karl, W., Wismüller, R. (eds) Euro-Par 2000 Parallel Processing. Euro-Par 2000. Lecture Notes in Computer Science, vol 1900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44520-X_90
Print ISBN: 978-3-540-67956-1
Online ISBN: 978-3-540-44520-3