ABSTRACT
Chip multiprocessing has become an exciting new direction for system designers to deliver increased performance by exploiting CMOS scaling. We discuss key design decisions facing the system architect of a chip multiprocessor and describe how these choices were made in the design of the Cell Broadband Engine.

An important decision is whether to base system performance on thread-level parallelism alone, or to complement thread-level parallelism with other forms of parallelism. Depending on workload characteristics, providing parallelism at the processor core level may increase overall system efficiency.

Parallelism is also key to utilizing available memory bandwidth more efficiently, by overlapping and interleaving multiple accesses to system memory. By interleaving the access streams of multiple threads, memory-level parallelism can be increased to allow better memory interface utilization. In addition, compute-transfer parallelism (CTP) offers a new form of parallelism in which memory transfers are initiated under software control without stalling the requesting thread.

We describe how the Cell Broadband Engine™ uses parallelism at all levels of the system abstraction to deliver a quantum leap in application performance, and how the Cell Synergistic Memory Flow engine exploits compute-transfer parallelism by providing efficient block transfer capabilities.
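The idea behind compute-transfer parallelism can be illustrated with a minimal double-buffering sketch. This is not Cell code and does not use any Cell API: a single worker thread stands in for a block-transfer engine, and the names `fetch_block`, `sum_of_squares`, and `BLOCK` are hypothetical, chosen only for this illustration. The point is that the next block transfer is issued before computation on the current block begins, so the requesting thread waits only when it actually needs the data.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # elements per block transfer (illustrative size, an assumption)


def fetch_block(memory, start):
    """Hypothetical stand-in for an asynchronous block transfer from system memory."""
    return memory[start:start + BLOCK]


def sum_of_squares(memory):
    """Process `memory` block by block, overlapping compute with the next fetch."""
    total = 0
    # One worker plays the role of the transfer engine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        # Start the first transfer before any computation.
        pending = transfer_engine.submit(fetch_block, memory, 0)
        for start in range(0, len(memory), BLOCK):
            current = pending.result()  # wait only when the data is needed
            nxt = start + BLOCK
            if nxt < len(memory):
                # Issue the next transfer, then compute on the current block.
                pending = transfer_engine.submit(fetch_block, memory, nxt)
            total += sum(x * x for x in current)
    return total
```

Calling `sum_of_squares(list(range(10)))` processes three blocks (4, 4, and 2 elements), with each fetch after the first issued while the previous block is still being computed on. A real block-transfer engine would move data without occupying a processor thread; the worker thread here only models the ordering of issue, compute, and wait.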