Stream Processors

Erez, Mattan; Dally, William J.

doi:10.1007/978-1-4419-0263-4_8


Part of the book series: Integrated Circuits and Systems (ICIR)


Abstract

Stream processors, like other multicore architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicitly parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather–compute–scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors need only minimal control logic, which fetches medium- and coarse-grained instructions and executes them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
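The gather–compute–scatter form mentioned in the abstract can be illustrated with a small sketch. This is only a software analogy under stated assumptions, not the chapter's programming system: the names `gather`, `scatter`, `run_stream`, and the `block` parameter are hypothetical, a Python list stands in for off-chip memory, and a short local list stands in for software-managed on-chip storage.

```python
from typing import Callable, List, Sequence


def gather(memory: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Bulk-copy the selected elements of 'off-chip' memory into a local buffer."""
    return [memory[i] for i in indices]


def scatter(memory: List[float], indices: Sequence[int], values: Sequence[float]) -> None:
    """Bulk-write a local buffer back to the selected 'off-chip' locations."""
    for i, v in zip(indices, values):
        memory[i] = v


def run_stream(
    memory: List[float],
    in_idx: Sequence[int],
    out_idx: Sequence[int],
    kernel: Callable[[float], float],
    block: int = 4,  # hypothetical size of the local (on-chip) buffer
) -> None:
    """Process the input stream one block at a time: gather, compute, scatter."""
    for start in range(0, len(in_idx), block):
        # Gather: one coarse-grained transfer into local storage.
        local_in = gather(memory, in_idx[start:start + block])
        # Compute: the kernel touches only local data; each element is independent.
        local_out = [kernel(x) for x in local_in]
        # Scatter: one coarse-grained transfer back to memory.
        scatter(memory, out_idx[start:start + block], local_out)


# Assumed layout: the first 8 slots hold inputs, the last 8 receive outputs.
memory = [float(i) for i in range(8)] + [0.0] * 8
run_stream(memory, list(range(8)), list(range(8, 16)), lambda x: 2.0 * x + 1.0)
```

In this sketch all data movement is explicit in the program: the kernel never addresses `memory` directly, which mirrors the software-controlled transfers to and from on-chip memory that the abstract describes.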

Acknowledgments

We would like to thank Steve Keckler for his insightful comments, and Jung Ho Ahn, Nuwan Jayasena, and Brucek Khailany for their contributions. In addition, we are grateful to the entire Imagine and Merrimac teams and the projects’ sponsors.

Imagine was supported by a Sony Stanford Graduate Fellowship, an Intel Foundation Fellowship, the Defense Advanced Research Projects Agency under ARPA order E254 and monitored by the Army Intelligence Center under contract DABT63-96-C0037 and by ARPA order L172 monitored by the Department of the Air Force under contract F29601-00-2-0085.

The Merrimac Project was supported by the Department of Energy ASCI Alliances Program, Contract LLNL-B523583, with Stanford University as well as the NVIDIA Graduate Fellowship program.

Portions of this chapter are reprinted with permission from the following sources:

U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens, “Programmable Stream Processors,” IEEE Computer, August 2003 (©2003 IEEE).
J. H. Ahn, W. J. Dally, B. K. Khailany, U. J. Kapasi, and A. Das, “Evaluating the Imagine Stream Architecture,” in Proceedings of the 31st Annual International Symposium on Computer Architecture (© 2004 IEEE).
Stream Processors Inc., “Stream Processing: Enabling the New Generation of Easy to Use, High-Performance DSPs,” White Paper (© 2007 Stream Processors Inc.).
B. K. Khailany, T. Williams, J. Lin, E. P. Long, M. Rygh, D. W. Tovey, and W. J. Dally, “A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing,” Solid-State Circuits, IEEE Journal, 43(1):202–213, 2008 (© 2008 IEEE).
J. H. Ahn, M. Erez, and W. J. Dally, “Tradeoff between Data-, Instruction-, and Thread-level Parallelism in Stream Processors,” in Proceedings of the 21st ACM International Conference on Supercomputing (ICS’07), June 2007 (DOI 10.1145/1274971.1274991). © 2007 ACM, Inc. Included here by permission.

Author information

Authors and Affiliations

The University of Texas at Austin, 1 University Station, 78712, Austin, TX, USA
Mattan Erez
Stanford University, Stanford, CA, USA
William J. Dally

Corresponding author

Correspondence to Mattan Erez.

Editor information

Editors and Affiliations

College of Natural Sciences, University of Texas, Austin, University Station 1, Austin, 78712-0233, U.S.A.
Stephen W. Keckler
Dept. Electrical Engineering, Stanford University, Stanford, 94305-9510, U.S.A.
Kunle Olukotun
IBM Software Group, Burnet Rd. 11501, Austin, 78758, U.S.A.
H. Peter Hofstee

About this chapter

Cite this chapter

Erez, M., Dally, W.J. (2009). Stream Processors. In: Keckler, S., Olukotun, K., Hofstee, H. (eds) Multicore Processors and Systems. Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0263-4_8

DOI: https://doi.org/10.1007/978-1-4419-0263-4_8
Published: 03 August 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0262-7
Online ISBN: 978-1-4419-0263-4
eBook Packages: Computer Science, Computer Science (R0)
