research-article

Design and management of 3D-stacked NUCA cache for chip multiprocessors

Authors:
Jongpil Jung, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Kyungsu Kang, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Chong-Min Kyung, Korea Advanced Institute of Science and Technology, Daejeon, South Korea

GLSVLSI '11: Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI, May 2011, Pages 91–96, https://doi.org/10.1145/1973009.1973028

Published: 02 May 2011

ABSTRACT

Power and delay caused by long on-chip interconnections are becoming major issues in chip multiprocessor design. Both network-on-chip (NoC) and three-dimensional integration are promising ways to mitigate the interconnection problem. In this paper, we explore the design of a 3D-stacked non-uniform cache architecture (NUCA) with an on-chip network. In addition, this paper investigates the problem of partitioning a shared L2 cache among multiple concurrently executing applications in order to improve system performance in terms of instructions per cycle. The proposed design is evaluated in an integrated power, performance, and temperature simulator. Experimental results show that the proposed method enhances system performance by 23.3% and reduces energy consumption by 17.9% for a 16-core processor system compared to a conventional design.
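
The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of the general problem it names: dividing the ways of a shared L2 cache among concurrently running applications. It assumes a simple greedy policy that gives each way to the application with the largest marginal gain in hits. The function name `partition_ways`, the `hits_per_way` profile, and all numbers are hypothetical and are not taken from the paper. A NUCA-aware scheme would presumably also weigh bank access latency, which this sketch ignores.

```python
from typing import Dict, List


def partition_ways(hits_per_way: Dict[str, List[int]], total_ways: int) -> Dict[str, int]:
    """Greedily assign cache ways to applications.

    hits_per_way[app][k] is the (hypothetical) number of extra hits the
    application gains from its (k+1)-th way. Each way goes to the
    application whose next way yields the largest gain. Greedy selection
    maximizes total hits only when gains diminish with each added way.
    """
    alloc = {app: 0 for app in hits_per_way}
    for _ in range(total_ways):
        best_app, best_gain = None, -1
        for app, gains in hits_per_way.items():
            k = alloc[app]
            # An application with no profile data left gains nothing more.
            gain = gains[k] if k < len(gains) else 0
            if gain > best_gain:
                best_app, best_gain = app, gain
        alloc[best_app] += 1
    return alloc


# Illustrative marginal-hit profiles for two applications (made-up numbers).
profile = {
    "app_a": [50, 30, 10, 5],
    "app_b": [40, 35, 20, 1],
}
allocation = partition_ways(profile, total_ways=4)
print(allocation)
```

With these made-up profiles, each application ends up with two of the four ways, because after the first two ways the remaining gains of both applications drop below what the other still offers.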

Index Terms

Design and management of 3D-stacked NUCA cache for chip multiprocessors
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory


Published in
GLSVLSI '11: Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
May 2011
496 pages
ISBN: 9781450306676
DOI: 10.1145/1973009
General Chairs: David Atienza (EPFL, Switzerland), Yuan Xie (Penn State University, USA)
Program Chairs: Jose L. Ayala (Federal University of Pernambuco, Brazil), Ken Stevens (University of Utah, USA)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3D IC
L2 cache
cache partitioning
energy
performance
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 312 of 1,156 submissions, 27%