Increasing memory bandwidth for vector computations

McKee, Sally A.; Moyer, Steven A.; Wulf, Wm. A.; Hitchcock, Charles

doi:10.1007/3-540-57840-4_26

Sally A. McKee¹, Steven A. Moyer¹, Wm. A. Wulf¹ & Charles Hitchcock²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 782)

Abstract

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high-performance microprocessors to vector-like algorithms, including the “Grand Challenge” scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components “on the other side of the cache”; they should not be treated as uniform access-time RAM. This paper describes the use of hardware-assisted access ordering, a technique that combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. Our simulations show significant speedup on important scientific kernels.
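
The idea of access ordering can be illustrated with a toy cost model. The sketch below is not the authors' simulator or hardware design; it assumes a single bank of page-mode DRAM in which an access to the currently open page is cheaper than one that opens a new page. The page size, the cycle costs, the buffer depth, and all function names are hypothetical values chosen only for illustration. It compares the order in which a loop such as y[i] = a*x[i] + y[i] would naturally issue its accesses with an order that groups accesses to each vector into bursts.

```python
# Toy model of access ordering on one page-mode DRAM bank.
# All constants are hypothetical, chosen only to illustrate the effect.
PAGE_WORDS = 512   # assumed DRAM page size, in words
HIT_COST = 1       # assumed cycles for an access to the open page
MISS_COST = 4      # assumed cycles for an access that opens a new page


def cost(addresses):
    """Total cycles to service word addresses in the given order."""
    open_page = None
    cycles = 0
    for addr in addresses:
        page = addr // PAGE_WORDS
        cycles += HIT_COST if page == open_page else MISS_COST
        open_page = page
    return cycles


def natural_order(x_base, y_base, n):
    """Accesses in the order the loop y[i] = a*x[i] + y[i] issues them."""
    order = []
    for i in range(n):
        order += [x_base + i, y_base + i, y_base + i]  # read x, read y, write y
    return order


def reordered(x_base, y_base, n, depth):
    """The same accesses, grouped into per-vector bursts of up to `depth`."""
    order = []
    for start in range(0, n, depth):
        block = range(start, min(start + depth, n))
        order += [x_base + i for i in block]  # burst of x reads
        order += [y_base + i for i in block]  # burst of y reads
        order += [y_base + i for i in block]  # burst of y writes
    return order


if __name__ == "__main__":
    nat = natural_order(0, 4096, 256)
    reo = reordered(0, 4096, 256, depth=32)
    assert sorted(nat) == sorted(reo)  # same set of accesses, different order
    print("natural order:", cost(nat), "cycles")
    print("reordered:    ", cost(reo), "cycles")
```

Under these assumed costs the interleaved order pays the page-opening penalty on nearly every switch between x and y, while the grouped order pays it only once per burst. The real benefit depends on the actual memory devices and buffer sizes, which this sketch does not model.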

Author information

Steven A. Moyer
Present address: Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322
Charles Hitchcock
Present address: Fostex R&D, 2 Buck Rd., Suite 2, Hanover, NH 03755

Authors and Affiliations

Department of Computer Science, University of Virginia, Thornton Hall, Charlottesville, VA 22903
Sally A. McKee, Steven A. Moyer & Wm. A. Wulf
Thayer School of Engineering, Dartmouth College, Hanover, NH 03755
Charles Hitchcock

Editor information

Jürg Gutknecht

Cite this paper

McKee, S.A., Moyer, S.A., Wulf, W.A., Hitchcock, C. (1994). Increasing memory bandwidth for vector computations. In: Gutknecht, J. (eds) Programming Languages and System Architectures. Lecture Notes in Computer Science, vol 782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57840-4_26

DOI: https://doi.org/10.1007/3-540-57840-4_26
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57840-6
Online ISBN: 978-3-540-48356-4
eBook Packages: Springer Book Archive
