Abstract
The growing disparity between processor and memory speeds has made memory bandwidth the performance bottleneck for many applications. This gap is especially severe for stream-oriented computations such as (de)compression, encryption, text searching, and scientific (vector) processing. This paper examines streaming computations and derives analytic upper bounds on the bandwidth attainable from a class of access reordering schemes. We compare these bounds to the simulated performance of a particular dynamic access ordering scheme, the Stream Memory Controller (SMC). We are building the SMC, and where possible we relate our analytic bounds and simulation data to the simulated performance of the hardware. The results suggest that the SMC can deliver nearly the full attainable bandwidth at relatively modest hardware cost.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McKee, S.A., Wulf, W.A., Landon, T.C. (1995). Bounds on memory bandwidth in streamed computations. In: Haridi, S., Ali, K., Magnusson, P. (eds) EURO-PAR '95 Parallel Processing. Euro-Par 1995. Lecture Notes in Computer Science, vol 966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020457
DOI: https://doi.org/10.1007/BFb0020457
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60247-7
Online ISBN: 978-3-540-44769-6
eBook Packages: Springer Book Archive