Hardware Support for Multithreaded Execution of Loops with Limited Parallelism

Dimitriou, Georgios; Polychronopoulos, Constantine

doi:10.1007/11573036_59

Georgios Dimitriou & Constantine Polychronopoulos

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3746)

Abstract

Loop scheduling on multithreaded processors differs significantly from loop scheduling on other parallel processors. Sharing hardware resources imposes new scheduling limitations, but it also allows faster communication across threads. We present Coral 2000, a multithreaded processor model with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We tested and evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. On loops that exhibit limited parallelism, we obtained speedups of up to 30% over highly optimized superblock-based schedules.

This work was supported in part by the National Science Foundation, the Office of Naval Research, a research grant from the National Security Agency, and a research donation from Intel Corp. Most development and testing were done on the computers of the National Center for Supercomputing Applications at the University of Illinois.

Author information

Authors and Affiliations

Dept. of Computer & Communications Engineering, University of Thessaly, Volos, 38221, Greece
Georgios Dimitriou
Dept. of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
Constantine Polychronopoulos

Editor information

Editors and Affiliations

Department of Computer and Communication Engineering, University of Thessaly, Glavani 37, 382 21, Volos, Greece
Panayiotis Bozanis
Department of Computer and Communication Engineering, University of Thessaly, 382 21, Volos, Greece
Elias N. Houstis

Cite this paper

Dimitriou, G., Polychronopoulos, C. (2005). Hardware Support for Multithreaded Execution of Loops with Limited Parallelism. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_59

DOI: https://doi.org/10.1007/11573036_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29673-7
Online ISBN: 978-3-540-32091-3
eBook Packages: Computer Science, Computer Science (R0)