Bounds on memory bandwidth in streamed computations

McKee, Sally A.; Wulf, Wm. A.; Landon, Trevor C.

doi:10.1007/BFb0020457

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 966)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

The growing disparity between processor and memory speeds has caused memory bandwidth to become the performance bottleneck for many applications. In particular, this performance gap severely impacts stream-orientated computations such as (de)compression, encryption, text searching, and scientific (vector) processing. This paper looks at streaming computations and derives analytic upper bounds on the bandwidth attainable from a class of access reordering schemes. We compare these bounds to the simulated performance of a particular dynamic access ordering scheme, the Stream Memory Controller (SMC). We are building the SMC, and where possible we relate our analytic bounds and simulation data to the simulation performance of the hardware. The results suggest that the SMC can deliver nearly the full attainable bandwidth with relatively modest hardware costs.

Author information

Authors and Affiliations

University of Virginia, Charlottesville, VA 22903, USA
Sally A. McKee, Wm. A. Wulf & Trevor C. Landon

Editor information

Seif Haridi, Khayri Ali, Peter Magnusson

About this paper

Cite this paper

McKee, S.A., Wulf, W.A., Landon, T.C. (1995). Bounds on memory bandwidth in streamed computations. In: Haridi, S., Ali, K., Magnusson, P. (eds) EURO-PAR '95 Parallel Processing. Euro-Par 1995. Lecture Notes in Computer Science, vol 966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020457

DOI: https://doi.org/10.1007/BFb0020457
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60247-7
Online ISBN: 978-3-540-44769-6
eBook Packages: Springer Book Archive
