High-performance processor design based on 3D on-chip cache

https://doi.org/10.1016/j.micpro.2016.07.009

Highlights

  • We implement a high-performance processor architecture based on a 3D on-chip cache, using 3D integration technology.

  • We simulate the performance of the 3D processor and 3D cache at different technology nodes, using 3D Cacti tools and theoretical algorithms.

  • The 3D processor and 3D cache show clear improvements in storage-system power consumption, processor access time and cycle time, and critical-path delay.

Abstract

Interconnect has become one of the main concerns in current and future microprocessor designs, in terms of both performance and power consumption. Three-dimensional (3D) integration technology, with its ability to shorten wire length, is a promising way to mitigate interconnect-related issues. In this paper we implement a novel high-performance processor architecture based on a 3D on-chip cache to show the potential performance and power benefits achievable through 3D integration technology. We separate the cache module from the other logic modules and stack the 3D cache with the processor, which reduces global interconnect length and power consumption and improves access speed. The performance of the 3D processor and 3D cache at different technology nodes is simulated using 3D Cacti tools and theoretical algorithms. The results show that, compared with the 2D design, the power consumption of the storage system is reduced by about 50%, and the access time and cycle time of the processor improve by 18.57% and 21.41%, respectively. The critical-path delay is reduced by up to 81.17%.

Introduction

With continued technology scaling and rising integration density, interconnect has emerged as the major source of delay and power consumption. Reducing interconnect delay and power consumption is of paramount importance for deep-submicron design, particularly for high-density interconnect layout [1], [2]. 3D integration technology, which uses TSVs (through-silicon vias) to transfer signals, is a promising solution for overcoming the obstacles to technology scaling [3], [4], thereby offering an opportunity to improve circuit performance, especially for processors [5], [6].

Despite the merits mentioned above, few works have explored 3D processor architectures that stack memory with the processor logic module. In [7], a processor-DRAM stack is investigated, and 3D integration technology is shown to effectively reduce inter-module interconnect length and power consumption. The footprint and latency of a 3D microprocessor using DRAM as cache are studied in [8]. P. Reed et al. [9] studied a 3D integrated memory-processor and further analyzed the sense amplifier. K. Puttaswamy [10] investigated a 3D stacked register file and cache in a high-performance microprocessor architecture. However, most of these works simply treat the 3D memory and the processor as separate levels, or only stack an entire memory on another memory to increase cache capacity. In addition, few works focus on the overall performance of the 3D processor.

In this paper, we first explore a high-performance processor architecture based on a 3D on-chip cache. The 3D on-chip cache is vertically stacked with the logic module of the processor into a single chip using TSV technology. We then carry out a comprehensive analysis of the delay, power consumption and footprint overhead of the 3D integrated processor using the Elmore delay model and the 3D Cacti tool.
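The excerpt names the Elmore delay model but does not give the exact formulation used. As a minimal sketch, assuming the standard RC-ladder form of the Elmore delay, the following Python shows why shortening a wire pays off more than linearly. The function names and the per-unit-length values `r_per_len` and `c_per_len` are hypothetical and are not taken from the paper.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[i] is the series resistance of segment i and capacitances[i]
    is the capacitance at the node after it. Each capacitor is charged
    through all resistance upstream of it, so the delay is
    sum_i C_i * (R_1 + ... + R_i).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        delay += upstream_r * c
    return delay


def wire_delay(length, r_per_len, c_per_len, segments=100):
    """Approximate a uniform wire as `segments` equal RC sections."""
    r = r_per_len * length / segments
    c = c_per_len * length / segments
    return elmore_delay_ladder([r] * segments, [c] * segments)


# Hypothetical unit values: only the ratio between the two delays matters.
long_wire = wire_delay(2.0, r_per_len=1.0, c_per_len=1.0)
short_wire = wire_delay(1.0, r_per_len=1.0, c_per_len=1.0)
print(long_wire / short_wire)  # close to 4: delay grows with length squared
```

Under this model, halving the length of a uniform wire cuts its delay to roughly a quarter, which is the motivation for folding long global wires across stacked layers.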

The rest of the paper is organized as follows. Section 2 briefly reviews 3D technology. Section 3 introduces the basic design principles of the 3D on-chip cache processor and describes the 3D cache design structure. Section 4 presents the theoretical algorithms and discusses the experimental results. The last section concludes the paper and outlines future work.

Section snippets

Background

3D integration refers to a variety of technologies that provide electrical connectivity to allow multiple layers of active silicon to be stacked one on top of the other. Each stratum is an active device layer and is processed independently. The layers are integrated with TSVs, which are short, fast and dense, allowing an inter-layer bandwidth that cannot be matched by other existing technologies such as MCMs or SiPs. For example, state-of-the-art TSV

3D processor architecture

Fig. 2a shows the basic structure of a conventional 3D stacked processor [12]. The processor is divided into memory layers (die #2) and a logic unit layer (die #1), which are connected by vertical interconnects. Vertically stacking multiple strata in this way can reduce the processor footprint, the global interconnect length, the system power consumption and the critical-path delay.

We propose a novel processor architecture which stacks the 3D on-chip cache and logic

Analysis and theoretical algorithms

The footprint of the cache is calculated by:

$$S = L \times W$$

$$S' = \frac{S}{N_x \times N_y}$$

where $S$ is the footprint of the 2D cache, $L$ and $W$ are the length and width of the 2D cache, respectively, $S'$ is the footprint of the 3D cache, and $N_x$ and $N_y$ are the numbers of times the bit lines and word lines are split.
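As a small worked example of the footprint relation, the sketch below assumes that the 3D footprint is the 2D footprint divided by the product of the bit-line and word-line split counts. The function names and the dimensions are hypothetical and chosen only for illustration.

```python
def footprint_2d(length, width):
    """Footprint S = L * W of a planar (2D) cache."""
    return length * width


def footprint_3d(length, width, nx, ny):
    """Footprint S' = S / (Nx * Ny) of the folded (3D) cache.

    nx and ny are the numbers of times the bit lines and word lines
    are split (assumed to be integers >= 1).
    """
    if nx < 1 or ny < 1:
        raise ValueError("split factors must be at least 1")
    return footprint_2d(length, width) / (nx * ny)


# Hypothetical dimensions in millimetres, not values from the paper.
s_2d = footprint_2d(4.0, 2.0)
s_3d = footprint_3d(4.0, 2.0, nx=2, ny=2)
print(s_2d, s_3d)  # prints "8.0 2.0"
```

With both split counts equal to 2, the per-layer footprint drops to a quarter of the planar one under this assumption.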

It is assumed that $l$ and $l'$ are the lengths of an internal interconnect path in the 2D and 3D processor, respectively. They are written as:

$$l = l_1 + l_2 + l_3$$

$$l' = \frac{l_1}{N_x} + \frac{l_2}{N_y} + l_v$$

where $l_1$ is the overlap length along the cache x-axis, $l_2$ is the overlap length along the cache y-axis, $l_3$ is
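The two path-length expressions can be compared numerically with a short sketch. It assumes the 3D length divides the x and y segments by the split counts and adds a term `lv`, taken here to be the vertical (TSV) segment; the excerpt does not define `lv` or `l3`, so that reading, the function names and all numbers are hypothetical.

```python
def path_length_2d(l1, l2, l3):
    """Interconnect path length in the 2D processor: l = l1 + l2 + l3."""
    return l1 + l2 + l3


def path_length_3d(l1, l2, nx, ny, lv):
    """Interconnect path length in the 3D processor: l' = l1/Nx + l2/Ny + lv.

    lv is assumed here to be the length of the vertical segment.
    """
    if nx < 1 or ny < 1:
        raise ValueError("split factors must be at least 1")
    return l1 / nx + l2 / ny + lv


# Hypothetical lengths in millimetres, not values from the paper.
l_2d = path_length_2d(l1=2.0, l2=1.0, l3=0.5)
l_3d = path_length_3d(l1=2.0, l2=1.0, nx=2, ny=2, lv=0.05)
print(f"2D path: {l_2d:.2f} mm, 3D path: {l_3d:.2f} mm")
print(f"reduction: {100 * (1 - l_3d / l_2d):.1f}%")
```

The vertical term stays small next to the planar segments it replaces, so the total path shrinks even though a new segment is added.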

Conclusion

In this work we propose a 3D processor architecture that uses a 3D on-chip cache to replace the traditional 2D cache, addressing the issue of on-chip cache design in 3D integrated processor structures. The architecture reduces the processor area, improves performance metrics such as access time, delay and power consumption, and overcomes the long-interconnect obstacles of 2D processors. As a result, the access time of the 3D processor improves by over 18.57% relative to the baseline case, and the area

References (18)

  • K. Puttaswamy et al.

    Implementing register files for high-performance microprocessors in a die-stacked (3D) technology, Emerging VLSI Technologies and Architectures, 2006

  • K. Puttaswamy et al.

    Implementing caches in a 3D technology for high performance processors, Computer Design: VLSI in Computers and Processors, 2005

  • K. Zoschke et al.

    TSV based silicon interposer technology for wafer level fabrication of 3D SiP modules

  • S. Das et al.

    Timing, energy, and thermal performance of three-dimensional integrated circuits

  • B. Black et al.

    3D processing technology and its impact on iA32 microprocessors, Computer Design: VLSI in Computers and Processors, 2004

  • G.H. Loh

    3D-stacked memory architectures for multi-core processors, ACM SIGARCH Computer Architecture News

    IEEE Comput. Soc.

    (2008)

  • S.S. Chen et al.

    Processor and DRAM integration by TSV-based 3-D stacking for power-aware SOCs

  • D. Jevdjic et al.

    Die-stacked DRAM caches for servers: Hit ratio, latency, or bandwidth? Have it all with footprint cache

    ACM SIGARCH Comput. Arch. News. ACM

    (2013)

  • P. Reed et al.

    Design aspects of a microprocessor data cache using 3D die interconnect technology, Integrated Circuit Design and Technology, 2005

There are more references available in the full text version of this article.