Supporting Microthread Scheduling and Synchronisation in CMPs

International Journal of Parallel Programming

Chip multiprocessors (CMPs) hold great promise for achieving scalability in future systems. Microthreaded CMPs add a means of exploiting legacy code in such systems. Under this model, compilers generate parametric concurrency from sequential source code, and, given a scalable implementation, that concurrency can be used to optimise operational parameters such as power and performance over many orders of magnitude. This paper shows scalability in performance, in power and, most importantly, in silicon implementation, which is its main contribution. The microthread model requires dynamic register allocation and a hardware scheduler that supports hundreds of microthreads per processor. To support the model fully, the scheduler must handle thread creation, context switching and thread rescheduling on every machine cycle, which is a significant challenge. Scalable implementations of these support structures are given, and the feasibility of large-scale CMPs is investigated through detailed area estimates of the structures.

Author information

Corresponding author

Correspondence to Chris Jesshope.

About this article

Cite this article

Bell, I., Hasasneh, N. & Jesshope, C. Supporting Microthread Scheduling and Synchronisation in CMPs. Int J Parallel Prog 34, 343–381 (2006). https://doi.org/10.1007/s10766-006-0017-y

  • DOI: https://doi.org/10.1007/s10766-006-0017-y
