Hardware Support for Multithreaded Execution of Loops with Limited Parallelism

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3746)

Abstract

Loop scheduling for multithreaded processors differs significantly from loop scheduling for other parallel processors. Sharing hardware resources imposes new scheduling limitations, but it also allows faster communication across threads. We present a multithreaded processor model, Coral 2000, with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We tested and evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. On loops that exhibit limited parallelism, we obtained speedups of up to 30% over highly optimized superblock-based schedules.

This work was supported in part by the National Science Foundation, the Office of Naval Research, a research grant from the National Security Agency, and a research donation from Intel Corp. Most development and testing were done on the computers of the National Center for Supercomputing Applications at the University of Illinois.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dimitriou, G., Polychronopoulos, C. (2005). Hardware Support for Multithreaded Execution of Loops with Limited Parallelism. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_59

  • DOI: https://doi.org/10.1007/11573036_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29673-7

  • Online ISBN: 978-3-540-32091-3

  • eBook Packages: Computer Science, Computer Science (R0)
