Compiler Directed Parallelization of Loops in Scale for Shared-Memory Multiprocessors

Johnson, Gregory S.; Sethumadhavan, Simha

doi:10.1007/3-540-44863-2_93

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2659)

Included in the following conference series:

International Conference on Computational Science

Abstract

Effective utilization of symmetric shared-memory multiprocessors (SMPs) is predicated on the development of efficient parallel code. Unfortunately, efficient parallelism is not always easy for the programmer to identify. Worse, exploiting such parallelism may directly conflict with optimizations affecting per-processor utilization (e.g., loop reordering to improve data locality). Here, we present our experience with a loop-level parallel compiler optimization for SMPs proposed by McKinley [6]. The algorithm uses dependence analysis and a simple model of the target machine to transform nested loops. The goal of the approach is to promote efficient execution of parallel loops by exposing sources of large-grain parallel work while maintaining per-processor locality. We implement the optimization within the Scale compiler framework, and analyze the performance of multiprocessor code produced for three microbenchmarks.
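
As an illustration of the dependence-analysis step mentioned in the abstract, the following minimal sketch uses the textbook distance-vector formulation: a loop is parallel if it carries no dependence, and a loop permutation is legal if every permuted distance vector stays lexicographically non-negative. This is not the McKinley algorithm or the Scale implementation; the function names (`carried_level`, `parallel_loops`, `is_legal`, `legal_orders_with_outer_parallel`) are hypothetical, and the machine model that weighs locality against parallelism is deliberately left out.

```python
from itertools import permutations


def carried_level(dist):
    """Return the index of the loop that carries this dependence
    (the first nonzero distance), or None if it is loop-independent."""
    for level, d in enumerate(dist):
        if d != 0:
            return level
    return None


def parallel_loops(dists, depth):
    """Loops (by nesting level, 0 = outermost) that carry no dependence."""
    carried = {carried_level(d) for d in dists}
    return [k for k in range(depth) if k not in carried]


def is_legal(dists, perm):
    """A permutation is legal if no permuted distance vector becomes
    lexicographically negative (first nonzero entry < 0)."""
    for d in dists:
        permuted = [d[p] for p in perm]
        level = carried_level(permuted)
        if level is not None and permuted[level] < 0:
            return False
    return True


def legal_orders_with_outer_parallel(dists, depth):
    """Enumerate legal loop orders whose outermost loop is parallel,
    i.e. candidates for large-grain parallel work."""
    result = []
    for perm in permutations(range(depth)):
        if not is_legal(dists, perm):
            continue
        permuted = [tuple(d[p] for p in perm) for d in dists]
        if 0 in parallel_loops(permuted, depth):
            result.append(perm)
    return result


# Assumed example nest: for i: for j: a[i][j] = a[i-1][j] + b[i][j]
# The single dependence has distance vector (1, 0): carried by the i loop.
example = [(1, 0)]
print(parallel_loops(example, 2))                    # inner j loop is parallel
print(legal_orders_with_outer_parallel(example, 2))  # interchange moves j outward
```

In this assumed example, interchanging the loops puts the parallel `j` loop outermost, but in a row-major layout it also makes the inner loop stride through memory. That is the kind of conflict between parallelism and locality the abstract describes, and a cost model of the target machine would be needed to choose between the two orders.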

References

[1] C. Aoki, P. Damron, K. Goebel, V. Grover, X. Kong, M. Lai, K. Subramanian, P. Tirumalai, and J. Wang. A parallelizing compiler for UltraSPARC. 1996.
[2] B. Chandramouli, J.B. Carter, W.C. Hsieh, and S.A. McKee. A cost framework for evaluating integrated restructuring optimizations. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 131–141, Spain, September 2001.
[3] K. Kennedy and K.S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the ACM International Conference on Supercomputing, pages 323–334, Washington, DC, July 1992.
[4] S. Leung. Array restructuring for cache locality. Technical Report UW-CSE-96-08-01, University of Washington, Department of Computer Science, August 1996.
[5] W. Li and K. Pingali. Access normalization: Loop restructuring for NUMA compilers. ACM Transactions on Computer Systems, 11(4):353–375, November 1993.
[6] K.S. McKinley. A compiler optimization algorithm for shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 9(8):769–787, August 1998.
[7] K.S. McKinley, S. Carr, and C. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424–453, July 1996.
[8] M.E. Wolf and M.S. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, NY, 1991.

Author information

Authors and Affiliations

Department of Computer Sciences & Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, 78712, USA
Gregory S. Johnson
Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 78712, USA
Simha Sethumadhavan

Editor information

Editors and Affiliations

Informatics Institute, Section of Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Peter M. A. Sloot
School of Computer Science and Software Engineering, Monash University, Wellington Road, Clayton, VIC 3800, Australia
David Abramson
Institute for High-Performance Computing and Information Systems, Fontanka emb. 6, St. Petersburg, 191187, Russia
Alexander V. Bogdanov & Yuriy E. Gorbachev
Computer Science Dept., University of Tennessee and Oak Ridge National Laboratory, 1122 Volunteer Blvd., Knoxville, TN, 37996-3450, USA
Jack J. Dongarra
School of Information Technologies, CISCO Systems, The University of Sydney, Madsen Building F09, Sydney, NSW, 2006, Australia
Albert Y. Zomaya

About this paper

Cite this paper

Johnson, G.S., Sethumadhavan, S. (2003). Compiler Directed Parallelization of Loops in Scale for Shared-Memory Multiprocessors. In: Sloot, P.M.A., Abramson, D., Bogdanov, A.V., Gorbachev, Y.E., Dongarra, J.J., Zomaya, A.Y. (eds) Computational Science — ICCS 2003. ICCS 2003. Lecture Notes in Computer Science, vol 2659. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44863-2_93

DOI: https://doi.org/10.1007/3-540-44863-2_93
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40196-4
Online ISBN: 978-3-540-44863-1
eBook Packages: Springer Book Archive
