ABSTRACT
Existing work on parallelizing complicated reductions and scans has focused mainly on formalism and has rarely dealt with implementation. To bridge this gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can handle complicated loops that existing compiler techniques cannot parallelize. Moreover, we have refined the framework with two sets of techniques: one broadens its applicability by extracting max-operators automatically, and the other improves the performance of parallelized programs by eliminating redundancy. We have implemented our framework and these techniques as a parallelizer in a compiler. Experiments on examples that existing compilers cannot parallelize demonstrate the scalability of programs parallelized by our implementation.
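The sketch below illustrates, under simplifying assumptions, the basic idea behind parallelization via matrix multiplication; it is not the paper's framework, and the names (Aff, compose, product) are illustrative only. A first-order linear recurrence x = a[i]*x + b[i] corresponds to multiplying the vector (x, 1) by the matrix [[a[i], b[i]], [0, 1]]; because matrix multiplication is associative, the product of all these matrices can be computed by a tree-shaped reduction whose halves are independent and could run in parallel.

```c
/* Minimal sketch (not the paper's implementation) of parallelization via
 * matrix multiplication for the sequential loop
 *
 *     for (i = 0; i < n; i++) x = a[i] * x + b[i];
 */
#include <stdio.h>

typedef struct { double a, b; } Aff;   /* compact form of [[a, b], [0, 1]] */

/* Compose two affine maps, f applied first: (g . f)(x) = g.a*(f.a*x + f.b) + g.b */
static Aff compose(Aff f, Aff g) {
    Aff h = { g.a * f.a, g.a * f.b + g.b };
    return h;
}

/* Product of the matrices for indices [lo, hi).  The two recursive calls are
 * independent, so they could be dispatched to separate threads or tasks. */
static Aff product(const double *a, const double *b, int lo, int hi) {
    if (hi - lo == 1) { Aff m = { a[lo], b[lo] }; return m; }
    int mid = lo + (hi - lo) / 2;
    Aff left  = product(a, b, lo, mid);
    Aff right = product(a, b, mid, hi);
    return compose(left, right);      /* earlier indices applied first */
}

int main(void) {
    double a[] = { 2.0, 0.5, 3.0, 1.0 };
    double b[] = { 1.0, 4.0, -2.0, 0.5 };
    int n = 4;
    double x0 = 1.0;

    /* Sequential reference. */
    double x = x0;
    for (int i = 0; i < n; i++) x = a[i] * x + b[i];

    /* Same result via the (parallelizable) matrix product. */
    Aff m = product(a, b, 0, n);
    double y = m.a * x0 + m.b;

    printf("sequential = %g, via matrices = %g\n", x, y);
    return 0;
}
```

The same associativity argument carries over to other semirings; for instance, recurrences involving max (as in the max-operator extraction mentioned above) can be treated as matrix products over the max-plus semiring.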