ABSTRACT
Power and delay induced by long on-chip interconnections are becoming major issues in chip multiprocessor design. Both network-on-chip (NoC) and three-dimensional (3D) integration are promising ways to mitigate the interconnection problem. In this paper, we explore the design of a 3D-stacked non-uniform cache architecture (NUCA) with an on-chip network. In addition, this paper investigates the problem of partitioning the shared L2 cache among multiple concurrently executing applications in order to improve system performance in terms of instructions per cycle (IPC). The proposed design is evaluated in an integrated power, performance, and temperature simulator. Experimental results show that, for a 16-core processor system, the proposed method improves system performance by 23.3% and reduces energy consumption by 17.9% compared with a conventional design.
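The abstract does not spell out how the shared L2 is divided among co-running applications. As a hedged illustration only, the sketch below shows one common way to partition a shared cache to raise total IPC: greedy marginal-utility allocation, in the spirit of utility-based cache partitioning (UCP). It is not the authors' method. The allocation unit (cache ways or NUCA banks), the per-application IPC-versus-capacity curves, the function name `greedy_partition`, and the `min_units` parameter are all hypothetical.

```python
from typing import List, Sequence


def greedy_partition(ipc_curves: Sequence[Sequence[float]], total_units: int,
                     min_units: int = 1) -> List[int]:
    """Hypothetical greedy partitioner for a shared cache.

    ipc_curves[i][k] is an assumed (e.g. profiled or monitored) IPC estimate
    for application i when it owns k allocation units (ways or NUCA banks),
    for k = 0..total_units. Returns the number of units given to each app.
    """
    n = len(ipc_curves)
    if n * min_units > total_units:
        raise ValueError("not enough units for the per-application minimum")
    for curve in ipc_curves:
        if len(curve) < total_units + 1:
            raise ValueError("each curve needs an entry for 0..total_units units")

    # Every application starts with its guaranteed minimum share.
    alloc = [min_units] * n
    remaining = total_units - n * min_units

    # Hand out the remaining units one at a time to whichever application
    # gains the most IPC from one more unit (ties go to the lowest index).
    for _ in range(remaining):
        gains = [ipc_curves[i][alloc[i] + 1] - ipc_curves[i][alloc[i]]
                 for i in range(n)]
        best = max(range(n), key=lambda i: gains[i])
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    # Made-up curves: app A is cache-sensitive, app B saturates sooner.
    app_a = [0.5, 0.8, 1.0, 1.15, 1.25, 1.30, 1.32, 1.33, 1.34]
    app_b = [0.4, 0.6, 0.72, 0.80, 0.84, 0.86, 0.87, 0.875, 0.88]
    print(greedy_partition([app_a, app_b], 8))  # prints [5, 3]
```

The one-unit-at-a-time greedy choice maximizes the summed IPC when each curve is concave (diminishing returns). For curves with plateaus followed by jumps it can stop early, which is why UCP uses a lookahead variant and estimates the curves online with utility monitors rather than assuming them known.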
Index Terms
- Design and management of 3D-stacked NUCA cache for chip multiprocessors