Abstract
Effective utilization of symmetric shared-memory multiprocessors (SMPs) is predicated on the development of efficient parallel code. Unfortunately, efficient parallelism is not always easy for the programmer to identify. Worse, exploiting such parallelism may directly conflict with optimizations affecting per-processor utilization (e.g., loop reordering to improve data locality). Here, we present our experience with a loop-level parallel compiler optimization for SMPs proposed by McKinley [6]. The algorithm uses dependence analysis and a simple model of the target machine to transform nested loops. The goal of the approach is to promote efficient execution of parallel loops by exposing sources of large-grain parallel work while maintaining per-processor locality. We implement the optimization within the Scale compiler framework and analyze the performance of multiprocessor code produced for three microbenchmarks.
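The tension between parallel grain and locality described above can be illustrated with a small sketch. It is not taken from the paper or from Scale: the recurrence, the function names (`sweep_original`, `sweep_column`, `sweep_interchanged`) and the use of a thread pool are all assumptions chosen for illustration. In the nest below the `i` loop carries a dependence while the `j` iterations are independent, so interchanging the loops moves the parallel loop outward and gives each worker a whole column of work. In a row-major array layout that same interchange makes the inner loop walk down a column, which is the kind of locality cost such an optimization has to weigh.

```python
from concurrent.futures import ThreadPoolExecutor


def sweep_original(a, b):
    """Reference nest: i outer (carries the dependence), j inner (independent)."""
    n, m = len(a), len(a[0])
    out = [row[:] for row in a]  # work on a copy; inputs stay untouched
    for i in range(1, n):
        for j in range(m):
            # Row i reads row i-1, so iterations of i must run in order.
            out[i][j] = out[i - 1][j] + b[i][j]
    return out


def sweep_column(out, b, j):
    """One large-grain unit of work: the full i recurrence for column j."""
    for i in range(1, len(out)):
        out[i][j] = out[i - 1][j] + b[i][j]


def sweep_interchanged(a, b, workers=4):
    """Interchanged nest: j outer and distributed across workers, i inner.

    Columns never read or write each other's elements, so they can be
    processed in any order. Threads are used only to show the work
    decomposition; this is not a performance measurement.
    """
    out = [row[:] for row in a]
    m = len(out[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda j: sweep_column(out, b, j), range(m)))
    return out
```

Both versions compute the same array, which is what the legality of the interchange amounts to here. Whether the interchanged form is actually faster on a real SMP depends on the memory layout and cache behaviour, which this sketch does not model.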
References
[1] C. Aoki, P. Damron, K. Goebel, V. Grover, X. Kong, M. Lai, K. Subramanian, P. Tirumalai, and J. Wang. A parallelizing compiler for UltraSPARC. 1996.
[2] B. Chandramouli, J.B. Carter, W.C. Hsieh, and S.A. McKee. A cost framework for evaluating integrated restructuring optimizations. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 131–141, Spain, September 2001.
[3] K. Kennedy and K.S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the ACM International Conference on Supercomputing, pages 323–334, Washington, DC, July 1992.
[4] S. Leung. Array restructuring for cache locality. Technical Report UW-CSE-96-08-01, University of Washington, Department of Computer Science, August 1996.
[5] W. Li and K. Pingali. Access normalization: Loop restructuring for NUMA compilers. ACM Transactions on Computer Systems, 11(4):353–375, November 1993.
[6] K.S. McKinley. A compiler optimization algorithm for shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 9(8):769–787, August 1998.
[7] K.S. McKinley, S. Carr, and C. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424–453, July 1996.
[8] M.E. Wolf and M.S. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, NY, 1991.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Johnson, G.S., Sethumadhavan, S. (2003). Compiler Directed Parallelization of Loops in Scale for Shared-Memory Multiprocessors. In: Sloot, P.M.A., Abramson, D., Bogdanov, A.V., Gorbachev, Y.E., Dongarra, J.J., Zomaya, A.Y. (eds) Computational Science — ICCS 2003. ICCS 2003. Lecture Notes in Computer Science, vol 2659. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44863-2_93
DOI: https://doi.org/10.1007/3-540-44863-2_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40196-4
Online ISBN: 978-3-540-44863-1
eBook Packages: Springer Book Archive