Sustainable disturbance crosstalk mitigation in deeply scaled phase-change memory

https://doi.org/10.1016/j.suscom.2020.100410

Highlights

  • Experiments show fewer disturbed cells, leading to fewer extra write/read operations.

  • Performance and endurance improve by 47% and 42%, respectively, over the state-of-the-art.

  • Operational energy is reduced by 36% with only a ∼1% increase in embodied energy.

  • The added embodied energy is offset within 60 days for a high-performance system and within four years for a desktop system.

  • For most workloads, the increased lifetime alone recoups the embodied energy.

Abstract

While Phase Change Memory (PCM) is gaining popularity in next-generation systems as an alternative to conventional DRAM and Flash, write disturbance due to crosstalk remains a significant challenge to PCM performance and reliability. To address this concern, we propose a multi-tiered compression (MTC) technique combined with a novel encoding that decreases the probability of write disturbance. MTC compresses a high proportion (>94%) of cachelines by a small, predictable amount (e.g., 40 or 56 bits of a 512-bit block) while maintaining the data similarity vital for optimizing PCM writes. By detecting data patterns prone to crosstalk, we use encoding, correction pointers, and a hybrid approach to reduce the incidence of write disturbance, and we use the bits reclaimed by MTC to store the encoding. Thus, our approach requires only five additional auxiliary bits per 512-bit cacheline, minimizing the embodied (fabrication) energy overhead of write disturbance mitigation. Additionally, through a multi-objective optimization, dramatic endurance improvements are possible while retaining nearly all of the write disturbance improvement.

Our technique improves performance and endurance by 47% and 42%, respectively, and reduces write energy by 36% versus the state-of-the-art, with only a nominal (circa 1%) increase in embodied energy. When the full life-cycle impact is considered, the additional area of our MTC-with-encoding approach is offset by use-phase benefits within 60 days for a high-performance system and within four years of use for a desktop system. In nearly all studied scenarios, the improved lifetime provided by our MTC and encoding approach easily recoups the additional upfront manufacturing energy.

Introduction

New memory technologies continue to emerge with promises of improved density, non-volatility, energy efficiency, and performance. Of course, each new technology also comes with unique challenges that must be addressed to reach its full potential. Phase Change Memory (PCM) is likely the most mature of these technologies and is receiving considerable attention as a useful element in tiered memory solutions. Compared to DRAM, PCM has a competitive access latency, with performance overheads of only around 20%, while offering considerable advantages in density [1] and static energy. PCM has recently been commercialized by Micron and Intel as a “3D XPoint” memory sold under the commercial name Optane Persistent Memory.

Like many of its emerging memory competitors, PCM still faces a number of challenges that limit its viability as a large-scale Flash and/or DRAM replacement. Recent studies show that scaling PCM to technologies below 20 nm significantly reduces the reliability of memory cells [2] due to crosstalk. Scaling packs more cells into the same die area and shrinks the spacing between bitlines and wordlines. When cells are placed in closer proximity, the heat used to change the state of one cell can affect its neighbors, both along the same wordline and between adjacent wordlines. This phenomenon, referred to as write disturbance [2], is amplified by scaling. While write disturbance can be mitigated by repeated write attempts, this creates several problematic behaviors. For each memory write, write disturbance errors within the wordline must be corrected immediately by rewriting the disturbed cells, requiring additional read and rewrite operations. Furthermore, the rewritten cells may themselves cause additional write disturbance, leading to many write-read iterations during a single memory write operation. Beyond the cost of correcting bits within the same word, bits in other words along neighboring bitlines may also be affected, which can require considerable additional reads/writes and cause subsequent performance delays and energy overheads.
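To make this cascade concrete, the following toy Python sketch models the verify-and-correct loop described above; the nearest-neighbor rule and the disturbance probability are illustrative assumptions, not the paper's device model.

    import random

    DISTURB_PROB = 0.05  # assumed per-neighbor disturbance probability

    def write_word(old, new):
        """Write `new` over `old`, rewriting disturbed cells until the word
        verifies. Returns how many write iterations the single store took."""
        stored = list(old)
        to_write = {i for i, (a, b) in enumerate(zip(old, new)) if a != b}
        iterations = 0
        while to_write:
            iterations += 1
            disturbed = set()
            for i in to_write:
                stored[i] = new[i]
                # Programming cell i may flip its immediate wordline neighbors.
                for j in (i - 1, i + 1):
                    if 0 <= j < len(stored) and j not in to_write:
                        if random.random() < DISTURB_PROB:
                            stored[j] ^= 1
                            disturbed.add(j)
            # Any cell this pass disturbed must itself be rewritten next pass,
            # and that rewrite can disturb further cells in turn.
            to_write = {j for j in disturbed if stored[j] != new[j]}
        return iterations

    random.seed(1)
    print(write_word([0] * 16, [1, 0] * 8))  # can exceed 1 iteration per store

Even this simplified model shows why a single store can balloon into several read-verify-rewrite rounds once cells are packed closely enough to disturb one another.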

Several techniques have been proposed to minimize disturbance due to crosstalk. One form of crosstalk occurs between bits activated in the selected wordline; we refer to this as wordline crosstalk. DIN attempts to reduce wordline crosstalk through compression and encoding [2]. Another form occurs in the memory rows neighboring the active wordline; we refer to this as bitline crosstalk. The authors of [3] use extra storage dedicated to fault tolerance to store disturbed data from the neighboring rows. ADAM [4] attempts to address both bitline and wordline crosstalk by interleaving data in adjacent rows with dummy data to prevent useful data from being disturbed. Unfortunately, these previously proposed techniques typically require either significant compression or the introduction of additional fault tolerance bits to be effective.
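As a rough illustration of the interleaving idea attributed to ADAM [4], the sketch below places a dummy cell between every pair of data cells so that an aggressor's immediate neighbors carry no useful data; the exact layout and granularity here are assumptions for illustration.

    def interleave(data_bits, dummy=0):
        """Lay out data cells with sacrificial dummy cells between them."""
        row = []
        for b in data_bits:
            row.extend([b, dummy])  # disturbance lands on the dummy neighbor
        return row

    def deinterleave(row):
        return row[::2]             # recover data; dummy cells are discarded

    data = [1, 0, 1, 1]
    assert deinterleave(interleave(data)) == data

The obvious cost is density: in this naive form, half the cells store no data, which is exactly the kind of overhead the approach proposed in this paper tries to avoid.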

One way to address this concern is to reclaim space for the encoding using a lightweight compression scheme that recovers a small amount of space in a high percentage of cases, paired with a fault tolerance technique that needs only that small reclaimed space to store its encoding bits. The alternative is to add dedicated storage for fault tolerance, which increases costs, increases environmental impacts, and reduces overall sustainability [5]. Prior work, such as WLCRC [6], shows that for a wide range of typical workloads, only 30% of cacheline writes to memory can be compressed using lightweight compression approaches such as ADAM. Thus, a technique requiring compression to mitigate bitline and wordline crosstalk would be ineffective for more than 70% of the data blocks written to memory in typical applications. WLCRC takes a first step toward broader compression coverage: it achieves a small compression ratio but can be applied to 90% of memory blocks written to memory. In this paper, we propose a multi-tiered compression (MTC) technique that compresses more than 94% of memory writes. MTC reclaims at least as many bits as WLCRC (40 bits) and can often reclaim 40% more space (56 bits) without introducing extra complexity while achieving this better compression coverage. Additionally, MTC retains data similarity between accesses, which ensures techniques such as differential write remain effective. We explore two approaches that use the bits reclaimed by MTC to mitigate write disturbance. First, we use a word-level encoding that reduces the number of aggressor cells that disturb neighboring victim cells. Second, we use pointers to track the locations of the most impactful aggressor cells, i.e., the aggressor cells with the largest number of neighboring victim cells. The aggressors with the highest disturbance probability are inverted so that their cells do not change state, and a pointer records which locations were inverted so the original value can be recovered (see the sketch below). Additionally, we propose a hybrid approach that combines encoding and pointers to minimize both wordline and bitline write disturbance.
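The following hedged sketch illustrates the pointer idea in isolation: the k aggressors with the most victim neighbors keep their old value (so differential write skips them, avoiding the heat pulse), and their positions are logged so reads can re-invert them. The victim-counting rule and the choice of k are illustrative assumptions, not the exact mechanism evaluated later.

    def mitigate_with_pointers(old, new, k=2):
        """Invert the k most impactful aggressor bits; return codeword + pointers."""
        aggressors = [i for i in range(len(new)) if old[i] != new[i]]

        def victims(i):
            # Neighbors that keep their value are the cells at risk of disturbance.
            return sum(1 for j in (i - 1, i + 1)
                       if 0 <= j < len(new) and old[j] == new[j])

        worst = sorted(aggressors, key=victims, reverse=True)[:k]
        encoded = list(new)
        for i in worst:
            encoded[i] ^= 1          # now equals old[i]: no write, no heat
        return encoded, worst        # pointer locations go in auxiliary bits

    def recover(encoded, pointers):
        data = list(encoded)
        for i in pointers:
            data[i] ^= 1             # undo the inversions on read
        return data

    old = [0, 0, 0, 0, 0, 0, 0, 0]
    new = [1, 0, 1, 0, 0, 1, 1, 0]
    enc, ptrs = mitigate_with_pointers(old, new)
    assert recover(enc, ptrs) == new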

In particular, this paper makes the following contributions:

  • We characterize realistic workloads and use them to stress our multi-tiered compression technique and examine its suitability to reduce write disturbance in deeply-scaled PCM.

  • We propose a new data encoding with a cost function that is tuned to reduce both bitline and wordline write disturbance.

  • We propose a hybrid encoding-with-pointers technique that most effectively filters out the aggressor cells that disturb potential victim cells.

  • We discuss the holistic sustainability of a hybrid approach to write disturbance mitigation for techniques that introduce additional storage for fault tolerance.

  • We provide additional cost function optimization studies, which further enhance the effectiveness of the multi-tiered compression technique.

The remainder of the paper is organized as follows: Section 2 provides relevant background and related work on memory semiconductor sustainability, PCM, and write disturbance. Section 3 introduces the experimental settings used in our evaluation. We describe and characterize MTC for representative workloads in Section 4 and then present the application of coset encoding and pointers for write disturbance minimization in Section 5. Section 6 introduces our combined compression and encoding approach. A multi-objective approach for minimizing energy and improving endurance is presented in Section 7. Section 8 evaluates the efficiency of our approach against state-of-the-art approaches. Finally, we conclude the paper in Section 9.

Section snippets

Background and related work

To place the problem of write disturbance in the context of system sustainability requires a background discussion of several factors. PCM dynamic (operational) energy is dominated by its write operation; thus, write disturbance has a significant impact on this element of energy consumption. However, semiconductors consume a significant portion of their lifetime energy before being put into service. This embodied energy is the energy required to manufacture the device. The lifetime of PCM, …
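The trade-off described here reduces to a simple break-even calculation: how long must a device's operational savings run before they repay the extra embodied energy of any added area? The sketch below uses the 36% write-energy reduction reported in the abstract; all other numbers are placeholders, not values from the paper.

    def breakeven_days(embodied_overhead_j, write_energy_j_per_day,
                       reduction=0.36):
        """Days until per-day write-energy savings repay added embodied energy."""
        return embodied_overhead_j / (write_energy_j_per_day * reduction)

    # A write-heavy server amortizes far faster than a lightly used desktop.
    print(breakeven_days(5.0e4, write_energy_j_per_day=2.5e3))  # ~55.6 days
    print(breakeven_days(5.0e4, write_energy_j_per_day=1.0e2))  # ~1389 days (~3.8 years)

This is why the same circa 1% embodied overhead can pay for itself in about two months on a high-performance system yet take years on a desktop.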

Experimental setup

To study our newly proposed approaches, we used a trace-driven simulator to perform experimental tests. The input traces to our simulator were collected with Virtutech Simics [21]. It is widely assumed that PCM employs differential write [22], i.e., writing bits only when the value differs from the previously stored value. To evaluate this, for each memory write transaction our traces store both the value to be stored and the value to be overwritten in order to compute the differential …
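Because each trace record carries both the old and new cacheline values, the simulator can account for differential writes directly; a minimal sketch of that accounting:

    def bits_to_program(old_line: int, new_line: int) -> int:
        """Cells actually programmed under differential write: the Hamming
        distance between the old and new cacheline values."""
        return bin(old_line ^ new_line).count("1")

    assert bits_to_program(0b1010, 0b1001) == 2  # only two cells change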

Multi-tiered compression

Typically, several adjacent data elements (words) of a given size form a 64-byte cacheline. For example, a cacheline may encompass eight 64-bit double-precision floating-point values, sixteen 32-bit integers, or thirty-two 16-bit floating-point values. Significant similarities among adjacent data elements stored in on-chip caches and off-chip memories have been observed in prior work [23], [24], [20], [25], [26], [27]. The basis of so-called “in-memory compression” algorithms is in leveraging this …
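As a generic illustration (not MTC itself, whose tiers are developed in this section), the sketch below shows how word-level similarity lets a lightweight scheme reclaim a small, predictable number of bits: if every 64-bit word in a line shares the first word's upper bits, those bits need to be stored only once.

    def reclaimable_bits(words, shared_high_bits=8):
        """Bits freed in a 512-bit line of 64-bit words when all words share
        the base word's top `shared_high_bits` bits (an assumed tier)."""
        base = words[0] >> (64 - shared_high_bits)
        if all(w >> (64 - shared_high_bits) == base for w in words):
            return shared_high_bits * (len(words) - 1)  # top bits of 7 words freed
        return 0

    line = [0x1122334455667700 + i for i in range(8)]  # similar adjacent words
    print(reclaimable_bits(line))  # 56 bits reclaimed from this 512-bit line

Reclaiming a small, near-guaranteed amount (here 56 bits) across >94% of lines is far more useful for storing encoding metadata than occasionally achieving a large compression ratio.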

Coset coding versus pointer approach

Coset encoding uses a translation function to map each data block into multiple codeword candidates. The codeword candidate that minimizes a cost function is selected and written to the system, and auxiliary bits record which translation function was used. To develop a coset approach that minimizes write disturbance, we can take advantage of write disturbance asymmetry in PCM, as some cells have a high probability of disturbance while others are “safe.” Specifically, the cost …
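A minimal coset-coding sketch follows, with XOR masks standing in for the translation functions and a deliberately toy cost function; the paper's actual cost function targets disturbance-prone patterns rather than raw bit flips.

    MASKS = [0x0000, 0xFFFF, 0xAAAA, 0x5555]  # 2 auxiliary bits pick a mask

    def cost(codeword, old):
        # Toy cost: cells that must actually be programmed.
        return bin(codeword ^ old).count("1")

    def encode(data, old):
        best = min(range(len(MASKS)), key=lambda k: cost(data ^ MASKS[k], old))
        return data ^ MASKS[best], best       # codeword + auxiliary bits

    def decode(codeword, aux):
        return codeword ^ MASKS[aux]          # XOR is its own inverse

    cw, aux = encode(0b1010101010101010, old=0x0000)
    assert decode(cw, aux) == 0b1010101010101010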

Combined compression and encoding

While the coset encoding of Section 5 outperforms ADAM and 4pointers, it incurs more than 6% storage overhead. Our goal is to achieve the same write disturbance mitigation as coset encoding while incurring minimal or negligible area, and consequently embodied energy, overhead. Note that disturbance crosstalk occurs in both the data bits and the auxiliary bits; therefore, removing the auxiliary storage overhead can also lead to further aggressor cell reduction.

We showed in Section 4 that …

Multi-objective optimization

In Section 5, the evaluation for selecting the best coset candidate used a cost function focused on minimizing the potential write disturbance errors that lead to extra writes. While this strategy provided the best probability of reducing extra writes, in many instances it may degrade endurance. For example, if the number of potential write disturbance errors was lower for the complement of a data block, the complement was written instead. However, due to data locality we expect a …
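A hedged sketch of the multi-objective idea: score each coset candidate by both its expected disturbance and the bit flips that wear cells out, so the selected codeword trades a little disturbance mitigation for much better endurance. The linear weighting and the toy disturbance metric are assumptions for illustration.

    def multi_objective_cost(candidate, old, disturb_cost, alpha=0.5):
        """Blend a disturbance estimate with bit flips (the endurance term)."""
        flips = bin(candidate ^ old).count("1")
        return alpha * disturb_cost(candidate) + (1 - alpha) * flips

    # Toy disturbance metric: count adjacent pairs of set (aggressor) bits.
    adjacent_aggressors = lambda w: bin(w & (w >> 1)).count("1")
    print(multi_objective_cost(0b0110, 0b0000, adjacent_aggressors))  # 1.5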

Evaluation

In this section, we assess the efficiency of various iso-area approaches. Specifically, we compare our proposed approach, which uses only 5 auxiliary bits per 512-bit cacheline, against ADAM [4], which uses 1 auxiliary bit per 512-bit cacheline (both <1% overhead). We also consider the impact of adding 32 auxiliary bits while employing compression, to compare our proposed approach against 4pointers and “CosetCoding” at the same area overheads. Note that 32 auxiliary bits provides the two …
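The iso-area pairing rests on simple per-line overhead arithmetic, reproduced here for reference:

    LINE = 512
    print(f"ours:  {5 / LINE:.2%}")   # 5 aux bits  -> 0.98%
    print(f"ADAM:  {1 / LINE:.2%}")   # 1 aux bit   -> 0.20%
    print(f"32b:   {32 / LINE:.2%}")  # 32 aux bits -> 6.25% (the >6% in Section 6)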

Conclusion

Deep scaling of PCM (e.g., below 22 nm) increases crosstalk among cells and leads to reliability concerns such as write disturbance [2]. In this paper, we propose a hybrid approach that reduces the probability of write disturbance in order to minimize its performance and energy overheads, while also improving the overall write energy and endurance of the PCM system. In particular, we explore a coset and pointer technique to reduce the aggressor cells in close proximity to potential victim cells …

Author contributions

Seyed Mohammad Seyedzadeh: Conceptualization (IGSC submission), Data Curation (IGSC submission), Formal Analysis (IGSC submission), Investigation (IGSC submission), Methodology (IGSC submission), Writing – original draft

Donald Kline, Jr: Conceptualization (new material after IGSC), Data Curation (new material after IGSC), Formal Analysis (new material after IGSC), Investigation (new material after IGSC), Methodology (new material after IGSC), Writing – original draft, Writing – review and editing

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgements

This work was supported by NSF Graduate Research Fellowship Award 1747452, and by SHREC industry and agency members and the I/UCRC Program of the National Science Foundation under Grant CNS-1738783.

References (34)

  • J. Zhang et al.

    Data block partitioning methods to mitigate stuck-at faults in limited endurance memories

    IEEE Trans. Very Large Scale Integr. (VLSI) Syst.

    (2018)
  • B.C. Lee et al.

    Architecting phase change memory as a scalable DRAM alternative

    ISCA

    (2009)
  • L. Jiang et al.

    Mitigating write disturbance in super-dense phase change memories

    DSN

    (2014)
  • R. Wang et al.

    SD-PCM: constructing reliable super dense phase change memory under write disturbance

    ASPLOS

    (2015)
  • S. Swami et al.

    ADAM: architecture for write disturbance mitigation in scaled phase change memory

    DATE

    (2018)
  • D. Kline et al.

    Sustainable IC design and fabrication

    IGSC

    (2017)
  • S. Seyedzadeh et al.

    Enabling fine-grain restricted coset coding through word-level compression for PCM

    HPCA

    (2018)
  • M.A. Yao et al.

    Comparative assessment of life cycle assessment methods used for personal computers

    Environ. Sci. Technol.

    (2010)
  • Apple Inc

    Environmental Report

    (2015)
  • S.B. Boyd

    Life-Cycle Assessment of Semiconductors

    (2012)
  • S.B. Boyd et al.

    Life-cycle energy demand and global warming potential of computational logic

    Environ. Sci. Technol.

    (2009)
  • ISO

    Environmental Management – Life Cycle Assessment – Requirements and Guidelines

    (2006)
  • UNEP/SETAC

    Life Cycle Approaches: The Road From Analysis to Practice

    (2005)
  • C.F. Murphy et al.

    Development of parametric material, energy, and emission inventories for wafer fabrication in the semiconductor industry

    Environ. Sci. Technol.

    (2003)
  • S.W. Jones

    Understanding the Costs of MEMS Products

    (2009)
  • I. Bayram et al.

    Modeling STT-RAM fabrication cost and impacts in NVSim

    IGSC

    (2016)
  • T. Nirschl et al.

    Write strategies for 2- and 4-bit multi-level phase-change memory

    IEDM

    (2007)