Global register allocation for SIMD multiprocessors

Hao, Benjamin; Pearson, David; Zippel, Richard

doi:10.1007/BF02943131

Published: May 1996

Volume 11, pages 222–236 (1996)

Journal of Computer Science and Technology

Abstract

It is relatively clear how to map regular, repetitive or grid-oriented computations onto SIMD architectures. It is not so clear, however, how to do this for irregular computations, even though there may be a significant amount of intrinsic parallelism in branch-free code. We study compilation techniques for this type of code when targeted to SIMD computers and illustrate their use on a simple model architecture.

In this paper, we present global register allocation, one of the compilation techniques we have developed for SIMD computers, and demonstrate that it can effectively allocate registers when parallelizing irregular computations in branch-free code. This technique is an extension and modification of the graph-coloring approach to register allocation used by sequential compilers. Our performance results validate our method.
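The abstract builds on the graph-coloring register allocation used by sequential compilers. As a point of reference, here is a minimal sketch of that sequential baseline only: Chaitin-style simplify/select over an interference graph, using Briggs-style optimistic spilling. It is not the authors' SIMD extension, which the abstract does not describe. The function name `color_interference_graph` and the dictionary-of-sets graph representation are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: each variable maps to the set of variables
# it interferes with (i.e. is live at the same time as). Must be symmetric.
Graph = Dict[str, Set[str]]


def color_interference_graph(graph: Graph, k: int) -> Dict[str, Optional[int]]:
    """Assign each variable one of k registers (0..k-1), or None if spilled.

    Sketch of the classic sequential approach (simplify/select), not the
    SIMD-specific algorithm from the paper.
    """
    # Work on a copy so the caller's graph is left untouched.
    work = {v: set(ns) for v, ns in graph.items()}
    stack: List[str] = []

    # Simplify: repeatedly remove a node with fewer than k neighbours;
    # such a node can always be colored once its neighbours are.
    while work:
        candidate = next((v for v in sorted(work) if len(work[v]) < k), None)
        if candidate is None:
            # No trivially colorable node: optimistically push the
            # highest-degree node as a potential spill (Briggs-style).
            candidate = max(sorted(work), key=lambda v: len(work[v]))
        stack.append(candidate)
        for n in work[candidate]:
            work[n].discard(candidate)
        del work[candidate]

    # Select: pop nodes and give each the lowest register not used by an
    # already-colored neighbour; if none is free, mark it as spilled.
    assignment: Dict[str, Optional[int]] = {}
    while stack:
        v = stack.pop()
        used = {assignment[n] for n in graph[v] if assignment.get(n) is not None}
        free = [r for r in range(k) if r not in used]
        assignment[v] = free[0] if free else None
    return assignment


if __name__ == "__main__":
    # a, b, c interfere pairwise (a triangle); d interferes only with a.
    g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(color_interference_graph(g, 3))  # every variable gets a register
    print(color_interference_graph(g, 2))  # the triangle forces one spill
```

With three registers the whole graph colors; with two, the triangle a-b-c cannot be colored and one variable is left unassigned for spilling. Pushing a blocked node optimistically instead of spilling it immediately lets the select phase succeed whenever neighbours happen to share a register.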

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, 14853, Ithaca, NY, U.S.A.
Benjamin Hao, David Pearson & Richard Zippel

Additional information

Research supported in part by the Advanced Research Projects Agency of the Department of Defense under ONR Contract N00014-92-J-1989, by ONR Contract N0014-92-J-1839, by United States-Israel Binational Science Foundation Grant 92-00234, and in part by the U.S. Army Research Office through the Mathematical Science Institute of Cornell University.

Benjamin HAO received his Ph.D. degree from the Computer Science Department of Cornell University. He received his B.S. degree in electrical engineering and computer science from the University of California at Berkeley. Mr. Hao worked as a technical staff member in Sun Microsystems' advanced development group from 1988 to 1991. His research interests include parallel computing, distributed computing, computer hardware design, and multimedia.

David PEARSON was born in Medina, NY on December 3, 1954. He received his A.B. degree from Dartmouth College in 1975 and is currently pursuing a Ph.D. degree in computer science at Cornell University. Mr. Pearson worked as a system programmer for Data General, was a network designer for Dartmouth, and helped found True Basic, Inc. where he served as the Vice-President of R&D from 1983 to 1988. His research interests include parallel computing and the theory of algorithms.

Richard ZIPPEL received his Ph.D. from MIT for research in symbolic computation and randomized algorithms. During this period he was one of the main authors of the symbolic computing system Macsyma. After joining the faculty at MIT he led a group doing research in VLSI design, VLSI CAD and computer architecture. Among the fruits of this research were the database accelerator architecture and the first university-level course in memory design. He then joined Symbolics, Inc. as a Technical Director and led its parallel computing effort. Since joining the Computer Science Department at Cornell University, he has been doing research in programming languages, symbolic computation and collaborative engineering.

About this article

Cite this article

Hao, B., Pearson, D. & Zippel, R. Global register allocation for SIMD multiprocessors. J. of Comput. Sci. & Technol. 11, 222–236 (1996). https://doi.org/10.1007/BF02943131

Received: 15 July 1995
Issue Date: May 1996
DOI: https://doi.org/10.1007/BF02943131
