DOI: 10.1145/1128022.1128023 (ACM conference proceedings: CF)

Chip multiprocessing and the Cell Broadband Engine

Published: 03 May 2006

ABSTRACT

Chip multiprocessing has become an exciting new direction for system designers to deliver increased performance by exploiting CMOS scaling. We discuss key design decisions facing the system architect of a chip multiprocessor and describe how these choices were made in the design of the Cell Broadband Engine.

An important decision is whether to base system performance on thread-level parallelism alone, or to complement thread-level parallelism with other forms of parallelism. Depending on workload characteristics, providing parallelism at the processor core level may increase overall system efficiency.

Parallelism is also key to utilizing available memory bandwidth more efficiently, by overlapping and interleaving multiple accesses to system memory. By interleaving the access streams of multiple threads, memory-level parallelism can be increased to allow better memory interface utilization. In addition, compute-transfer parallelism (CTP) offers a new form of parallelism, in which memory transfers are initiated under software control without stalling the requesting thread.

We describe how the Cell Broadband Engine™ uses parallelism at all levels of the system abstraction to deliver a quantum leap in application performance, and how the Cell Synergistic Memory Flow engine exploits compute-transfer parallelism by providing efficient block transfer capabilities.
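To make the idea of compute-transfer parallelism concrete, the following is a minimal timing sketch, not taken from the paper. It assumes an idealized model in which every block has a fixed transfer cost and a fixed compute cost (arbitrary time units), two buffers are available, and there is no contention on the memory interface. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def serial_time(n_blocks, transfer, compute):
    """Thread stalls on every transfer: fetch a block, then process it."""
    return n_blocks * (transfer + compute)


def double_buffered_time(n_blocks, transfer, compute):
    """Transfer of block i+1 overlaps with computation on block i.

    Assumes two buffers and one outstanding transfer at a time.
    """
    if n_blocks == 0:
        return 0
    # The first transfer cannot be hidden, and the last computation has no
    # transfer left to overlap with. In between, each step costs whichever
    # of the two activities is slower.
    return transfer + (n_blocks - 1) * max(transfer, compute) + compute


if __name__ == "__main__":
    n, t, c = 8, 3, 5  # hypothetical: 8 blocks, transfer 3 units, compute 5 units
    print("serial:         ", serial_time(n, t, c))           # 64
    print("double-buffered:", double_buffered_time(n, t, c))  # 43
```

Under these assumptions the transfer latency is almost entirely hidden whenever compute time per block is at least as large as transfer time; when transfers dominate, the run becomes bound by the memory interface instead.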
