Compiler-controlled multithreading for lenient parallel languages

Schauser, Klaus Erik; Culler, David E.; von Eicken, Thorsten

doi:10.1007/3540543961_4


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 523)

Included in the following conference series:

Conference on Functional Programming Languages and Computer Architecture


Abstract

Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide.

Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication takes place through message sends and split-phase memory accesses. Messages and memory replies are received by compiler-generated message handlers, which rapidly integrate these events with thread scheduling.
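The interplay of threads, entry synchronization, message handlers, and frame-level scheduling described above can be sketched in a few lines of Python. This is a minimal single-processor simulation under stated assumptions, not TAM itself. The names Frame, Machine, post, ifetch, and run are hypothetical. Memory replies are simulated with a local queue. The scheduling hierarchy is reduced to a single rule: a frame runs all of its enabled threads before the processor switches to another frame. A thread with an entry count runs only after it has been posted that many times, so synchronization happens only at thread entry.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical names throughout (Frame, Machine, post, ifetch, run):
# an illustrative simulation, not TAM's actual instruction set.
Thread = Callable[["Frame"], None]
Inlet = Callable[["Frame", int], None]


class Frame:
    """Activation frame: local slots, entry counts and a continuation vector."""

    def __init__(self, machine: "Machine", entry_counts: Dict[Thread, int]) -> None:
        self.machine = machine
        self.slots: Dict[str, int] = {}
        self.counts = dict(entry_counts)  # posts still needed before a thread may run
        self.cv: List[Thread] = []  # continuation vector: this frame's enabled threads
        self.queued = False  # is the frame already on the ready-frame queue?

    def post(self, thread: Thread) -> None:
        """Enable a thread; a synchronizing thread waits until its count reaches 0."""
        if thread in self.counts:
            self.counts[thread] -= 1
            if self.counts[thread] > 0:
                return  # synchronization happens only here, at thread entry
        self.cv.append(thread)
        if not self.queued:
            self.queued = True
            self.machine.ready_frames.append(self)


class Machine:
    """One simulated processor with split-phase memory and a message queue."""

    def __init__(self) -> None:
        self.ready_frames: Deque[Frame] = deque()
        self.memory: Dict[int, int] = {}
        self.messages: Deque[Tuple[Inlet, Frame, int]] = deque()

    def ifetch(self, addr: int, frame: Frame, inlet: Inlet) -> None:
        """Split-phase read: queue the reply for an inlet and return at once."""
        self.messages.append((inlet, frame, self.memory[addr]))

    def run(self) -> None:
        while self.ready_frames or self.messages:
            # Message handlers (inlets) store incoming values and post threads.
            while self.messages:
                inlet, frame, value = self.messages.popleft()
                inlet(frame, value)
            if self.ready_frames:
                frame = self.ready_frames.popleft()
                # Run every enabled thread of this frame before switching frames,
                # so related threads execute close together in time.
                while frame.cv:
                    frame.cv.pop()(frame)
                frame.queued = False


def example() -> int:
    """Compute memory[0] + memory[1] using two split-phase fetches."""
    m = Machine()
    m.memory[0], m.memory[1] = 40, 2

    def add_thread(fr: Frame) -> None:
        fr.slots["result"] = fr.slots["a"] + fr.slots["b"]

    def inlet_a(fr: Frame, value: int) -> None:
        fr.slots["a"] = value
        fr.post(add_thread)

    def inlet_b(fr: Frame, value: int) -> None:
        fr.slots["b"] = value
        fr.post(add_thread)

    def start(fr: Frame) -> None:
        m.ifetch(0, fr, inlet_a)  # issue both reads; neither blocks the thread
        m.ifetch(1, fr, inlet_b)

    frame = Frame(m, entry_counts={add_thread: 2})
    frame.post(start)
    m.run()
    return frame.slots["result"]
```

In this example, start issues both fetches and finishes without waiting. Each reply is handled by an inlet that stores the value and posts add_thread. Because add_thread has an entry count of 2, it runs exactly once, after the second reply has arrived.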

To compile Id90 for TAM, we employ a new parallel intermediate form, dual-graphs, with distinct control and data arcs. This provides a clean framework for partitioning the program into threads, scheduling threads, and managing registers under asynchronous execution. The compilation process is described and preliminary measurements of its effectiveness are discussed. Dynamic execution measurements are obtained through a second compilation step, which translates TAM code into instrumented native code for existing machines. These measurements show that the cost of compiler-controlled multithreading is within a small factor of the cost of control flow in sequential languages.
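Partitioning a program into threads can be illustrated by grouping instructions according to the split-phase responses they transitively depend on. Each resulting group can run to completion once started, and it needs synchronization only at its entry. The sketch below is a simplified heuristic assumed for illustration. It is not necessarily the partitioning algorithm used in the paper, and it collapses the dual-graph's separate control and data arcs into one list of dependence arcs. The names partition, arcs, and split_phase are hypothetical. The sketch requires Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical sketch: arcs are (src, dst) dependences; split_phase holds the
# nodes whose values arrive later through an inlet (e.g. a fetch response).
Key = FrozenSet[str]


def partition(
    nodes: Iterable[str],
    arcs: Iterable[Tuple[str, str]],
    split_phase: Set[str],
) -> Tuple[Dict[Key, List[str]], Dict[Key, int]]:
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for src, dst in arcs:
        preds[dst].add(src)

    # Dependence set: the split-phase responses each node transitively waits on.
    order = list(TopologicalSorter(preds).static_order())
    deps: Dict[str, Key] = {}
    for n in order:
        if n in split_phase:
            deps[n] = frozenset({n})
        else:
            deps[n] = frozenset().union(*(deps[p] for p in preds[n]))

    # Nodes sharing a dependence set form one thread, kept in topological
    # order, so a thread never has to wait once it has started.
    threads: Dict[Key, List[str]] = {}
    for n in order:
        threads.setdefault(deps[n], []).append(n)

    # Entry count: one post per distinct predecessor thread, plus one per
    # response inlet; synchronization occurs only at the thread's start.
    entry: Dict[Key, int] = {}
    for key, members in threads.items():
        sources: Set[object] = set()
        for n in members:
            if n in split_phase:
                sources.add(("inlet", n))  # enabled by the response's inlet
            else:
                sources.update(deps[p] for p in preds[n] if deps[p] != key)
        entry[key] = len(sources)
    return threads, entry


if __name__ == "__main__":
    nodes = ["start", "fa", "fb", "ra", "rb", "add", "store"]
    arcs = [("start", "fa"), ("start", "fb"), ("fa", "ra"), ("fb", "rb"),
            ("ra", "add"), ("rb", "add"), ("add", "store")]
    threads, entry = partition(nodes, arcs, split_phase={"ra", "rb"})
    for key, members in threads.items():
        print(sorted(key), members, "entry count:", entry[key])
```

In the example, both operands of a + b come from split-phase fetches (fa/ra and fb/rb). The add and store instructions land in one thread with an entry count of 2, which receives one post from each response. The instructions that issue the fetches form an initial thread with an entry count of 0.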



Author information

Authors and Affiliations

Computer Science Division, University of California, Berkeley
Klaus Erik Schauser, David E. Culler & Thorsten von Eicken


Editor information

John Hughes


About this paper

Cite this paper

Schauser, K.E., Culler, D.E., von Eicken, T. (1991). Compiler-controlled multithreading for lenient parallel languages. In: Hughes, J. (eds) Functional Programming Languages and Computer Architecture. FPCA 1991. Lecture Notes in Computer Science, vol 523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540543961_4


DOI: https://doi.org/10.1007/3540543961_4
Published: 06 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54396-1
Online ISBN: 978-3-540-47599-6
eBook Packages: Springer Book Archive
