Throughput-Oriented Multicore Processors

Multicore Processors and Systems

Part of the book series: Integrated Circuits and Systems ((ICIR))

Abstract

Many important commercial server applications are throughput-oriented. Chip multiprocessors (CMPs) are well suited to these workloads, because the multiple processors on the chip can service incoming requests independently. To date, most CMPs have been built from a small number of high-performance superscalar processor cores. However, most commercial applications exhibit high cache miss rates, large memory footprints, and low instruction-level parallelism, which leads to poor utilization of these CMPs. An alternative approach is to build a throughput-oriented, multithreaded CMP from a much larger number of simpler processor cores. This chapter explores the tradeoffs involved in building such a simple-core CMP. Two case studies, the Niagara and Niagara 2 CMPs from Sun Microsystems, illustrate how simple-core CMPs are built in practice and how they compare with CMPs built from more traditional high-performance superscalar processor cores. The case studies show that simple-core CMPs can have a significant performance/watt advantage over complex-core CMPs.
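The utilization argument in the abstract can be made concrete with a toy model. The sketch below is not taken from the chapter: it assumes each thread alternates between a fixed run of compute cycles and a fixed cache-miss stall, and that a multithreaded core can overlap one thread's stall with other threads' compute. The function names and all parameter values are hypothetical, chosen only for illustration.

```python
def core_utilization(threads: int, compute_cycles: float, miss_cycles: float) -> float:
    """Fraction of cycles a core's pipeline is busy in a toy multithreading model.

    Assumption (not from the chapter): every thread computes for
    `compute_cycles`, then stalls for `miss_cycles` on a cache miss.
    With several threads, stalls are hidden until the pipeline saturates.
    """
    if threads < 1 or compute_cycles <= 0 or miss_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, miss_cycles >= 0")
    return min(1.0, threads * compute_cycles / (compute_cycles + miss_cycles))


def chip_throughput(cores: int, threads_per_core: int, peak_ipc: float,
                    compute_cycles: float, miss_cycles: float) -> float:
    """Aggregate instructions per cycle for the whole chip under the same toy model."""
    busy = core_utilization(threads_per_core, compute_cycles, miss_cycles)
    return cores * peak_ipc * busy


# Hypothetical configurations: a few wide single-threaded cores versus
# many simple multithreaded cores, both running a miss-heavy workload.
complex_chip = chip_throughput(cores=2, threads_per_core=1, peak_ipc=3.0,
                               compute_cycles=50, miss_cycles=150)
simple_chip = chip_throughput(cores=8, threads_per_core=4, peak_ipc=1.0,
                              compute_cycles=50, miss_cycles=150)
```

In this model a single-threaded core is idle whenever its thread waits on memory, while four threads per core are enough to keep a simple pipeline busy for the chosen numbers. Real designs depend on many effects the model ignores, such as cache contention, memory bandwidth, and power.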

Correspondence to James Laudon.

Copyright information

© 2009 Springer-Verlag US

About this chapter

Cite this chapter

Laudon, J., Golla, R., Grohoski, G. (2009). Throughput-Oriented Multicore Processors. In: Keckler, S., Olukotun, K., Hofstee, H. (eds) Multicore Processors and Systems. Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0263-4_7

  • DOI: https://doi.org/10.1007/978-1-4419-0263-4_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0262-7

  • Online ISBN: 978-1-4419-0263-4

  • eBook Packages: Computer Science, Computer Science (R0)
