Abstract
Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide.
Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses. Messages and memory replies are received by compiler-generated message handlers which rapidly integrate these events with thread scheduling.
To compile Id90 for TAM, we employ a new parallel intermediate form, dual-graphs, with distinct control and data arcs. This provides a clean framework for partitioning the program into threads, scheduling threads, and managing registers under asynchronous execution. The compilation process is described and preliminary measurements of its effectiveness are discussed. Dynamic execution measurements are obtained via a second compilation step, which translates TAM into native code for existing machines with instrumentation incorporated. These measurements show that the cost of compiler-controlled multithreading is within a small factor of the cost of control flow in sequential languages.
Preview
Unable to display preview. Download preview PDF.
References
R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and B. Smith. The Tera Computer System. In Proc. of the 1990 Int. Conf. on Supercomputing, pages 1–6, Amsterdam, 1990.
Arvind, D. E. Culler, and G. K. Maa. Assessing the Benefits of Fine-Grain Parallelism in Dataflow Programs. The Int. Journal of Supercomputer Applications, 2(3), November 1988.
Arvind and K. Ekanadham. Future Scientific Programming on Parallel Machines. Journal of Parallel and Distributed Computing, 5(5):460–493, October 1988.
Arvind, S. K. Heller, and R. S. Nikhil. Programming Generality and Parallel Computers. In Proc. of the Fourth Int. Symp. on Biological and Artificial Intelligence Systems, pages 255–286. ESCOM (Leider), Trento, Italy, September 1988.
Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR — Conf. 1987 on Par. Proc. in Science and Eng., Bonn-Bad Godesberg, W. Germany, June 1987.
A. Agarwal, B. Lim, D. Kranz, and J. Kubiatowicz. APRIL: A Processor Architecture for Multiprocessing. In Proc. of the 17th Ann. Int. Symp. on Comp. Arch., pages 104–114, Seattle, Washington, May 1990.
Arvind, R. S. Nikhil, and K. K. Pingali. I-Structures: Data Structures for Parallel Computing. Technical Report CSG Memo 269, MIT Lab for Comp. Sci., 545 Tech. Square, Cambridge, MA, February 1987. (Also in Proc. of the Graph Reduction Workshop, Santa Fe, NM. October 1986.).
P. J. Burns, M. Christon, R. Schweitzer, O. M. Lubeck, H. J. Wasserman, M. L. Simmons, and D. V. Pryor. Vectorization of Monte-Carlo Particle Transport: An Architectural Study using the LANL Benchmark “Gamteb”. In Proc. Supercomputing '89. IEEE Computer Society and ACM SIGARCH, New York, NY, November 1989.
R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An Efficient Method of Computing Static Single Assignment Form. In Proc. of the 16th Annual ACM Symp. on Principles of Progr. Lang., pages 25–35, Los Angeles, January 1989.
W. P. Crowley, C. P. Hendrickson, and T. E. Rudy. The SIMPLE code. Technical Report UCID 17715, Lawrence Livermore Laboratory, February 1978.
D. Culler, A. Sah, K. Schauser, T. von Eicken, and J. Wawrzynek. Fine-grain Parallelism with Minimal Hardware Support: A Compiler-Controlled Threaded Abstract Machine. In Proc. of 4th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa-Clara, CA, April 1991. (Also available as Technical Report UCB/CSD 91/591, CS Div., University of California at Berkeley).
D. E. Culler. Managing Parallelism and Resources in Scientific Dataflow Programs. Technical Report 446, MIT Lab for Comp. Sci., March 1990. (PhD Thesis, Dept. of EECS, MIT).
V. G. Grafe and J. E. Hoch. The Epsilon-2 Hybrid Dataflow Architecture. In Proc. of Compcon90, pages 88–93, San Francisco, CA, March 1990.
R. H. Halstead, Jr. Multilisp: A Language for Concurrent Symbolic Computation. ACM Transactions on Programming Languages and Systems, 7(4):501–538, October 1985.
R. H. Halstead, Jr. and T. Fujita. MASA: a Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proc. of the 15th Int. Symp. on Comp. Arch., pages 443–451, Hawaii, May 1988.
R. A. Iannucci. Toward a Dataflow/von Neumann Hybrid Architecture. In Proc. 15th Int. Symp. on Comp. Arch., pages 131–140, Hawaii, May 1988.
R. S. Nikhil and Arvind. Can Dataflow Subsume von Neumann Computing? In Proc. of the 16th Annual Int. Symp. on Comp. Arch., Jerusalem, Israel, May 1989.
R. S. Nikhil. Id (Version 90.0) Reference Manual. Technical Report CSG Memo, to appear, MIT Lab for Comp. Sci., 545 Tech. Square, Cambridge, MA, 1990.
R. S. Nikhil. The Parallel Programming Language Id and its Compilation for Parallel Machines. In Proc. Workshop on Massive Parallelism, Amalfi, Italy, October 1989. Academic Press, 1991. Also: CSG Memo 313, MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139, USA.
G. M. Papadopoulos and D. E. Culler. Monsoon: an Explicit Token-Store Architecture. In Proc. of the 17th Annual Int. Symp. on Comp. Arch., Seattle, Washington, May 1990.
G. M. Papadopoulos and K. R. Traub. Multithreading: A Revisionist View of Dataflow Architectures. In Proc. of the 18th Int. Symp. on Comp. Arch., pages 342–351, Toronto, Canada, May 1991.
A. Sah. Parallel Language Support for Shared memory multiprocessors. Master's thesis, Computer Science Div., University of California at Berkeley, May 1991.
K. E. Schauser. Compiling Dataflow into Threads. Technical report, Computer Science Div., University of California, Berkeley CA 94720, 1991. (MS Thesis, Dept. of EECS, UCB).
B. Smith. Keynote Address. Proc. of the 17th Annual Int. Symp. on Comp. Arch., May 1990.
S. Sakai, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. An Architecture of a Dataflow Single Chip Processor. In Proc. of the 16th Annual Int. Symp. on Comp. Arch., pages 46–53, Jerusalem, Israel, June 1989.
K. R. Traub. A Compiler for the MIT Tagged-Token Dataflow Architecture. Technical Report TR-370, MIT Lab for Comp. Sci., 545 Tech. Square, Cambridge, MA, August 1986. (MS Thesis, Dept. of EECS, MIT).
K. R. Traub. Sequential Implementation of Lenient Programming Languages. Technical Report TR-417, MIT Lab for Comp. Sci., 545 Tech. Square, Cambridge, MA, September 1988. (PhD Thesis, Dept. of EECS, MIT).
T. von Eicken, K. E. Schauser, and D. E. Culler. TL0: An Implementation of the TAM Threaded Abstract Machine, Version 2.1. Technical Report, Computer Science Div., University of California at Berkeley, 1991.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schauser, K.E., Culler, D.E., von Eicken, T. (1991). Compiler-controlled multithreading for lenient parallel languages. In: Hughes, J. (eds) Functional Programming Languages and Computer Architecture. FPCA 1991. Lecture Notes in Computer Science, vol 523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540543961_4
Download citation
DOI: https://doi.org/10.1007/3540543961_4
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54396-1
Online ISBN: 978-3-540-47599-6
eBook Packages: Springer Book Archive