Abstract
Stream processors, like other multicore architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather–compute–scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors need only minimal control logic, which fetches medium- and coarse-grained instructions and executes them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
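The gather–compute–scatter form mentioned above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`gather`, `compute`, `scatter`, `run_stream`), the `strip` parameter, and the use of a Python list as a stand-in for software-managed on-chip memory are all assumptions made for illustration.

```python
# Illustrative sketch only: all names here are hypothetical, and Python lists
# stand in for off-chip memory and the software-managed on-chip buffer.

def gather(memory, indices):
    """Bulk-load the records named by `indices` into a local buffer."""
    return [memory[i] for i in indices]

def compute(kernel, local_in):
    """Apply the kernel to each record; iterations are independent,
    so real hardware could run them in parallel across many ALUs."""
    return [kernel(x) for x in local_in]

def scatter(memory, indices, local_out):
    """Bulk-store the results from the local buffer back to memory."""
    for i, y in zip(indices, local_out):
        memory[i] = y

def run_stream(kernel, src, dst, in_idx, out_idx, strip=4):
    """Process the stream in strips that fit the (assumed) local buffer size."""
    for start in range(0, len(in_idx), strip):
        local_in = gather(src, in_idx[start:start + strip])      # explicit load
        local_out = compute(kernel, local_in)                    # on-chip work only
        scatter(dst, out_idx[start:start + strip], local_out)    # explicit store
    return dst

src = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dst = [0.0] * len(src)
result = run_stream(lambda x: x * x, src, dst,
                    in_idx=[5, 4, 3, 2, 1, 0],
                    out_idx=[0, 1, 2, 3, 4, 5])
```

In this sketch every transfer between memory and the local buffer is an explicit, coarse-grained step issued by software, while the kernel touches only local data; that separation is the property the streaming model relies on.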
Acknowledgments
We would like to thank Steve Keckler for his insightful comments as well as the contributions of Jung Ho Ahn, Nuwan Jayasena, and Brucek Khailany. In addition, we are grateful to the entire Imagine and Merrimac teams and the projects’ sponsors.
Imagine was supported by a Sony Stanford Graduate Fellowship, an Intel Foundation Fellowship, the Defense Advanced Research Projects Agency under ARPA order E254 and monitored by the Army Intelligence Center under contract DABT63-96-C0037 and by ARPA order L172 monitored by the Department of the Air Force under contract F29601-00-2-0085.
The Merrimac Project was supported by the Department of Energy ASCI Alliances Program, Contract LLNL-B523583, with Stanford University as well as the NVIDIA Graduate Fellowship program.
Portions of this chapter are reprinted with permission from the following sources:
- U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens, “Programmable Stream Processors,” IEEE Computer, August 2003 (© 2003 IEEE).
- J. H. Ahn, W. J. Dally, B. K. Khailany, U. J. Kapasi, and A. Das, “Evaluating the Imagine Stream Architecture,” In Proceedings of the 31st Annual International Symposium on Computer Architecture (© 2004 IEEE).
- Stream Processors Inc., “Stream Processing: Enabling the New Generation of Easy to Use, High-Performance DSPs,” White Paper (© 2007 Stream Processors Inc.).
- B. K. Khailany, T. Williams, J. Lin, E. P. Long, M. Rygh, D. W. Tovey, and W. J. Dally, “A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing,” Solid-State Circuits, IEEE Journal, 43(1):202–213, 2008 (© 2008 IEEE).
- J. H. Ahn, M. Erez, and W. J. Dally, “Tradeoff between Data-, Instruction-, and Thread-level Parallelism in Stream Processors,” In Proceedings of the 21st ACM International Conference on Supercomputing (ICS’07), June 2007 (DOI 10.1145/1274971.1274991). © 2007 ACM, Inc. Included here by permission.
Copyright information
© 2009 Springer-Verlag US
About this chapter
Cite this chapter
Erez, M., Dally, W.J. (2009). Stream Processors. In: Keckler, S., Olukotun, K., Hofstee, H. (eds) Multicore Processors and Systems. Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0263-4_8
DOI: https://doi.org/10.1007/978-1-4419-0263-4_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0262-7
Online ISBN: 978-1-4419-0263-4
eBook Packages: Computer Science, Computer Science (R0)