DOI:10.1145/2485922.2485961
research-article

Zombie memory: Extending memory lifetime by reviving dead blocks

Published: 23 June 2013

ABSTRACT

Zombie is an endurance management framework that enables a variety of error correction mechanisms to extend the lifetimes of memories that suffer from bit failures caused by wearout, such as phase-change memory (PCM). Zombie supports both single-level cell (SLC) and multi-level cell (MLC) variants. It extends the lifetime of blocks in working memory pages (primary blocks) by pairing them with spare blocks, i.e., working blocks in pages that have been disabled due to exhaustion of a single block's error correction resources, which would be 'dead' otherwise. Spare blocks adaptively provide error correction resources to primary blocks as failures accumulate over time. This reduces the waste caused by early block failures, making working blocks in discarded pages a useful resource. Even though we use PCM as the target technology, Zombie applies to any memory technology that suffers stuck-at cell failures.

This paper describes the Zombie framework, a combination of two new error correction mechanisms (ZombieXOR for SLC and ZombieMLC for MLC) and the extension of two previously proposed SLC mechanisms (ZombieECP and ZombieERC). The result is a 58% to 92% improvement in endurance for Zombie SLC memory and an even more impressive 11x to 17x improvement for ZombieMLC, both with performance overheads of only 0.1% when memories using prior error correction mechanisms reach end of life.
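The abstract does not spell out how a spare block lends correction capacity to a primary block. As an illustration only, the toy Python model below assumes a cell-wise XOR pairing: each data bit is stored as the XOR of one primary cell and one spare cell, so a stuck-at cell in either block can be compensated by writing its partner. The `Block` class, `paired_write`, and `paired_read` are hypothetical names introduced for this sketch, and the encoding is an assumption, not the paper's specification of ZombieXOR.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """A block of single-bit cells; `stuck` maps a cell index to its stuck-at value."""
    size: int
    stuck: Dict[int, int] = field(default_factory=dict)
    cells: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.cells:
            # Healthy cells start at 0; stuck cells hold their stuck-at value.
            self.cells = [self.stuck.get(i, 0) for i in range(self.size)]

    def write_cell(self, i: int, bit: int) -> None:
        # A stuck cell keeps its value; a healthy cell takes the new bit.
        if i not in self.stuck:
            self.cells[i] = bit


def paired_write(primary: Block, spare: Block, data: List[int]) -> bool:
    """Store data so that data[i] == primary.cells[i] ^ spare.cells[i].

    Assumed encoding (illustrative). Returns False as soon as a position
    cannot be encoded, i.e. both cells are stuck and their XOR is wrong.
    Positions before the failing one may already have been rewritten.
    """
    for i, bit in enumerate(data):
        if i not in primary.stuck:
            primary.write_cell(i, bit ^ spare.cells[i])
        elif i not in spare.stuck:
            spare.write_cell(i, bit ^ primary.cells[i])
        elif primary.cells[i] ^ spare.cells[i] != bit:
            return False
    return True


def paired_read(primary: Block, spare: Block) -> List[int]:
    """Recover the data as the cell-wise XOR of the two blocks."""
    return [p ^ s for p, s in zip(primary.cells, spare.cells)]


# Demo: each block has stuck cells, yet the pair can still store the word.
primary = Block(8, stuck={1: 1, 5: 0})
spare = Block(8, stuck={2: 1, 5: 0})
word = [1, 0, 1, 1, 0, 0, 1, 0]
print(paired_write(primary, spare, word), paired_read(primary, spare) == word)
```

In this model the pair fails only when both cells at the same position are stuck and their XOR disagrees with the data bit, which is why a block from a discarded page can still be useful as a spare.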

    • Published in

      ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
      June 2013
      686 pages
      ISBN:9781450320795
      DOI:10.1145/2485922
      • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
        ISCA '13
        June 2013
        666 pages
        ISSN:0163-5964
        DOI:10.1145/2508148

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ISCA '13 paper acceptance rate: 56 of 288 submissions, 19%. Overall acceptance rate: 543 of 3,203 submissions, 17%.
