Missing the memory wall: the case for processor/memory integration

Authors:
Ashley Saulsbury, Swedish Institute of Computer Science
Fong Pong, Sun Microsystems Computer Corporation
Andreas Nowatzyk, Sun Microsystems Computer Corporation

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture, May 1996, Pages 90–101. https://doi.org/10.1145/232973.232984

Published: 01 May 1996

ABSTRACT

Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening gap between CPU and main memory speeds. Yet, many large applications do not operate well on these systems and are limited by the memory subsystem performance.

This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and complexity. Based on a design study using the next generation 0.25µm, 256Mbit dynamic random-access memory (DRAM) process and on the analysis of existing machines, we show that processor memory integration can be used to build competitive, scalable and cost-effective MP systems.

We present results from execution driven uni- and multi-processor simulations showing that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor. In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.
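As an illustration of the last point, the following is a minimal sketch of a direct-mapped cache with long lines backed by a small fully associative victim cache. It is not the authors' simulator: the class name, the default sizes (16 lines of 512 bytes, 4 victim entries), and the LRU replacement policy in the victim cache are all assumptions chosen for the example, and the model tracks only block residency, not DRAM column-buffer timing.

```python
from collections import OrderedDict


class VictimCachedDirectMapped:
    """Toy direct-mapped cache with a small fully associative victim cache.

    All sizes are illustrative assumptions, not figures from the paper.
    """

    def __init__(self, num_lines=16, line_bytes=512, victim_entries=4):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.victim_entries = victim_entries
        self.lines = [None] * num_lines   # block number resident in each line
        self.victim = OrderedDict()       # evicted blocks, oldest first (LRU)

    def access(self, addr):
        """Return 'hit', 'victim_hit', or 'miss' for a byte address."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        resident = self.lines[index]
        if resident == block:
            return "hit"
        # Main-cache miss: check whether the block was recently evicted.
        if block in self.victim:
            del self.victim[block]
            outcome = "victim_hit"
        else:
            outcome = "miss"
        # The displaced line moves into the victim cache.
        if resident is not None:
            self.victim[resident] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest victim
        self.lines[index] = block
        return outcome


# Two addresses that map to the same line and keep displacing each other.
cache = VictimCachedDirectMapped()
conflict = cache.num_lines * cache.line_bytes
trace = [cache.access(a) for a in (0, conflict, 0, conflict)]
```

In this toy model, alternating between two addresses that map to the same line misses only on the first touch of each; the later accesses are served from the victim cache. Setting `victim_entries=0` turns every access in the same trace into a miss.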

Index Terms

1. Computing methodologies
  1. Modeling and simulation
2. Hardware


Published in

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture
May 1996, 318 pages
ISBN: 0897917863
DOI: 10.1145/232973
Chairman: Jean-Loup Baer, Univ. of Washington, Seattle

ACM SIGARCH Computer Architecture News, Volume 24, Issue 2
Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
May 1996, 303 pages
ISSN: 0163-5964
DOI: 10.1145/232974
Chairman: Jean-Loup Baer, Univ. of Washington, Seattle

Copyright © 1996 Authors
Publisher: Association for Computing Machinery, New York, NY, United States