ABSTRACT
Zombie is an endurance management framework that enables a variety of error correction mechanisms to extend the lifetimes of memories that suffer from bit failures caused by wearout, such as phase-change memory (PCM). Zombie supports both single-level cell (SLC) and multi-level cell (MLC) variants. It extends the lifetime of blocks in working memory pages (primary blocks) by pairing them with spare blocks: blocks that still work but sit in pages that were disabled because a single block exhausted its error correction resources, and that would otherwise be 'dead'. Spare blocks adaptively provide error correction resources to primary blocks as failures accumulate over time. This reduces the waste caused by early block failures, making working blocks in discarded pages a useful resource. Although we use PCM as the target technology, Zombie applies to any memory technology that suffers stuck-at cell failures.
This paper describes the Zombie framework, a combination of two new error correction mechanisms (ZombieXOR for SLC and ZombieMLC for MLC) and the extension of two previously proposed SLC mechanisms (ZombieECP and ZombieERC). The result is a 58% to 92% improvement in endurance for Zombie SLC memory and an even more impressive 11x to 17x improvement for ZombieMLC, both with performance overheads of only 0.1% when memories using prior error correction mechanisms reach end of life.
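The abstract does not spell out how a spare block corrects errors in a primary block. As a rough illustration only, the sketch below assumes a cell-by-cell pairing in which the logical bit is the XOR of a primary cell and its spare cell, so a stuck-at cell on one side can be masked by writing the other side. The names `write_paired` and `read_paired` and the stuck-at model (`None` for a healthy cell, `0` or `1` for a stuck cell) are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

# Assumed fault model: None means the cell is healthy, 0 or 1 means it is
# permanently stuck at that value.
Stuck = Optional[int]


def write_paired(
    data: List[int],
    primary_stuck: List[Stuck],
    spare_stuck: List[Stuck],
) -> Optional[Tuple[List[int], List[int]]]:
    """Choose cell values so that primary[i] ^ spare[i] == data[i].

    Returns None when a pair of stuck cells cannot encode the wanted bit.
    """
    primary: List[int] = []
    spare: List[int] = []
    for bit, p_stuck, s_stuck in zip(data, primary_stuck, spare_stuck):
        if p_stuck is None and s_stuck is None:
            # Both cells healthy: keep the data in the primary cell.
            p, s = bit, 0
        elif s_stuck is None:
            # Primary cell stuck: the spare cell compensates.
            p, s = p_stuck, p_stuck ^ bit
        elif p_stuck is None:
            # Spare cell stuck: the primary cell compensates.
            p, s = s_stuck ^ bit, s_stuck
        else:
            # Both stuck: usable only if their XOR already equals the bit.
            if p_stuck ^ s_stuck != bit:
                return None
            p, s = p_stuck, s_stuck
        primary.append(p)
        spare.append(s)
    return primary, spare


def read_paired(primary: List[int], spare: List[int]) -> List[int]:
    """Recover the logical bits from a primary/spare pair."""
    return [p ^ s for p, s in zip(primary, spare)]


data = [1, 0, 1, 1]
stored = write_paired(data, [0, None, None, 1], [None, None, 1, None])
recovered = read_paired(*stored) if stored is not None else None
unrecoverable = write_paired([1], [0], [0])
```

In this toy model a pair fails only when both cells at the same position are stuck and their XOR disagrees with the bit being written, which is consistent with the abstract's point that otherwise-discarded working blocks can absorb failures in primary blocks.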
Zombie memory: Extending memory lifetime by reviving dead blocks (ISCA '13)