Abstract
Loop scheduling on multithreaded processors differs significantly from loop scheduling on other parallel processors. Sharing hardware resources imposes new scheduling limitations, but it also allows faster communication across threads. We present a multithreaded processor model, Coral 2000, with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We tested and evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. On loops that exhibit limited parallelism, we obtained speedups of up to 30% over highly optimized superblock-based schedules.
This work was supported in part by the National Science Foundation, the Office of Naval Research, a research grant from the National Security Agency, and a research donation from Intel Corp. Most development and testing were done on the computers of the National Center for Supercomputing Applications at the University of Illinois.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dimitriou, G., Polychronopoulos, C. (2005). Hardware Support for Multithreaded Execution of Loops with Limited Parallelism. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_59
DOI: https://doi.org/10.1007/11573036_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29673-7
Online ISBN: 978-3-540-32091-3
eBook Packages: Computer Science (R0)