Abstract
The architecture of future high-performance computer systems will respond both to the possibilities offered by technology and to the increasing demand for programmability. Multithreaded processing element architectures are a promising alternative to RISC architecture and its multiple-instruction-issue extensions such as VLIW, superscalar, and superpipelined architectures.
This paper presents an overview of multithreaded computer architectures and the technical issues affecting their prospective evolution. We introduce the basic concepts of multithreaded computer architecture and describe several architectures representative of the design space for multithreaded, parallel computers. We review design issues for multithreaded processing elements intended for use as the node processor of parallel computers for scientific computing. These include the question of choosing an appropriate program execution model, the organization of the processing element to achieve good utilization of major resources, support for fine-grain interprocessor communication and global memory access, compiling machine code for multithreaded processors, and the challenge of implementing virtual memory in large-scale multiprocessor systems.
Copyright information
© 1994 Springer Science+Business Media New York
Cite this chapter
Dennis, J.B., Gao, G.R. (1994). Multithreaded Architectures: Principles, Projects, and Issues. In: Iannucci, R.A., Gao, G.R., Halstead, R.H., Smith, B. (eds) Multithreaded Computer Architecture: A Summary of the State of the ART. The Springer International Series in Engineering and Computer Science, vol 281. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2698-8_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6161-9
Online ISBN: 978-1-4615-2698-8