ABSTRACT
This paper proposes implicit-storing, a technique that extends the logical capacity of a memory array without increasing its physical capacity: the implicitly stored bits are not written to the array but are inferred from the array's error-correction-codes. Implicit-storing is related to error-code-tagging, a technique that encodes invariant attributes of a location in the error-correction-codes instead of storing them in the memory array, and that distinguishes faults in the data from mismatches in those attributes. Both error-code-tagging and implicit-storing reduce code strength, because they encode additional information in a code that was meant to protect only the data.
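To make the mechanism concrete, here is a minimal Python sketch. It assumes an extended Hamming SEC-DED code over 8 data bits plus one implicit bit. The implicit bit occupies one code position, is never written to the array, and is read back as 0. These parameters and all helper names (`store`, `load`, `IMPLICIT_POS`, etc.) are illustrative assumptions, not the paper's construction.

If the implicit bit was 1, the decoder sees an apparent single-bit error at that position and thereby infers the bit. The 13 stored bits are exactly what a conventional SEC-DED code for 8 data bits needs. The sketch also shows the code-strength reduction: when the implicit bit is 1, a single real fault becomes detected but uncorrectable.

```python
# Sketch (assumed parameters): implicit-storing one bit in a SEC-DED code.
N = 13                                   # Hamming positions 1..13
PARITY_POS = (1, 2, 4, 8)                # check bits at powers of two
MSG_POS = [p for p in range(1, N + 1) if p not in PARITY_POS]  # 9 message bits
IMPLICIT_POS = MSG_POS[-1]               # hypothetical: position 13 holds the implicit bit


def syndrome(bits):
    """XOR of the positions of all set bits in positions 1..N."""
    s = 0
    for pos in range(1, N + 1):
        if bits[pos]:
            s ^= pos
    return s


def encode(msg):
    """Return bits[0..N]; bits[0] is the overall parity for double-error detection."""
    bits = [0] * (N + 1)
    for pos, b in zip(MSG_POS, msg):
        bits[pos] = b
    s = syndrome(bits)
    for p in PARITY_POS:                 # choose check bits so the syndrome becomes 0
        bits[p] = 1 if s & p else 0
    bits[0] = sum(bits[1:]) % 2          # make overall parity even
    return bits


def decode(bits):
    """Return (status, corrected_bits, error_pos)."""
    bits = list(bits)
    s, parity = syndrome(bits), sum(bits) % 2
    if s == 0 and parity == 0:
        return "ok", bits, None
    if parity == 1 and s <= N:           # odd flip count: assume a single error at s
        bits[s] ^= 1
        return "corrected", bits, s
    # even flips with a nonzero syndrome, or an out-of-range syndrome: detected only
    return "uncorrectable", bits, None


def store(data, implicit_bit):
    """Encode 8 data bits plus the implicit bit, then drop the implicit bit."""
    word = encode(list(data) + [implicit_bit])
    return [b for i, b in enumerate(word) if i != IMPLICIT_POS]   # 13 stored bits


def load(stored):
    """Read the missing position as 0, decode, and infer the implicit bit."""
    bits = stored[:IMPLICIT_POS] + [0] + stored[IMPLICIT_POS:]
    status, fixed, pos = decode(bits)
    if status == "corrected" and pos == IMPLICIT_POS:
        status = "ok"                    # not a fault: the implicit bit was 1
    return status, [fixed[p] for p in MSG_POS[:-1]], fixed[IMPLICIT_POS]


data = [1, 0, 1, 1, 0, 0, 1, 0]
print(load(store(data, 1)))      # ('ok', [1, 0, 1, 1, 0, 0, 1, 0], 1)
w = store(data, 1)
w[3] ^= 1                        # one real fault while the implicit bit is 1
print(load(w)[0])                # uncorrectable
```

The choice of reading the missing position as 0 keeps the decoder unchanged apart from re-inserting one bit and reinterpreting a correction at that position. In this sketch, that is the kind of small change to the encode/decode logic the approach relies on.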
Redundant-encoding-of-attributes is introduced to improve the strength of a code by encoding the same information in multiple codewords of a cache or memory. We demonstrate how EREA and IREA, two derivatives of redundant-encoding, alleviate the code-strength reduction experienced by error-code-tagging and implicit-storing, respectively.
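The benefit of redundant encoding can be sketched under similar assumptions. In this hypothetical illustration (not the paper's IREA scheme), every 8-bit word of a line carries the same implicit attribute bit in an extended Hamming SEC-DED code, and that bit is never stored. Words that decode with a zero syndrome reveal the attribute. A faulty word is then decoded with that value filled in, so an error that its own codeword could only detect becomes correctable. The code layout and helper names (`encode`, `decode`, `read_line`) are assumptions.

```python
# Hypothetical sketch: one implicit attribute bit shared by all words of a line.
N, PARITY = 13, (1, 2, 4, 8)
MSG = [p for p in range(1, N + 1) if p not in PARITY]   # 8 data bits + attribute
IMP = MSG[-1]                                            # assumed attribute position


def syndrome(bits):
    s = 0
    for pos in range(1, N + 1):
        if bits[pos]:
            s ^= pos
    return s


def encode(data, attr):
    """SEC-DED encode data plus attr; the attribute position is not stored."""
    bits = [0] * (N + 1)
    for pos, b in zip(MSG, list(data) + [attr]):
        bits[pos] = b
    s = syndrome(bits)
    for p in PARITY:
        bits[p] = 1 if s & p else 0
    bits[0] = sum(bits[1:]) % 2
    return [b for i, b in enumerate(bits) if i != IMP]


def decode(stored, attr):
    """Decode with the attribute position filled in as attr.
    Returns (data, error_pos) on success, or None if uncorrectable."""
    bits = stored[:IMP] + [attr] + stored[IMP:]
    s, parity = syndrome(bits), sum(bits) % 2
    if s == 0 and parity == 0:
        return [bits[p] for p in MSG[:-1]], None
    if parity == 1 and s <= N:
        bits[s] ^= 1
        return [bits[p] for p in MSG[:-1]], s
    return None


def read_line(words):
    """Infer the shared attribute from clean words, then decode every word."""
    votes = []
    for w in words:
        for guess in (0, 1):
            r = decode(w, guess)
            if r is not None and r[1] is None:   # zero syndrome only for the true value
                votes.append(guess)
    if not votes:
        return None
    attr = max(set(votes), key=votes.count)      # majority among clean words
    decoded = [decode(w, attr) for w in words]
    if any(r is None for r in decoded):
        return None
    return attr, [r[0] for r in decoded]


datas = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1],
         [1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
line = [encode(d, 1) for d in datas]   # the attribute is 1 for the whole line
line[2][5] ^= 1                        # a single fault in one stored word
print(decode(line[2], 0))              # None: alone, fault and attribute collide
print(read_line(line) == (1, datas))   # True
```

Here the third word alone cannot tell its fault apart from the attribute. The other three words settle the attribute, after which the fault is corrected as an ordinary single-bit error.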
Implementing the proposed methods requires only minor modifications to the encoding and decoding logic of the baseline error-correction scheme used in this work. The paper also discusses several applications of the proposed schemes to demonstrate their usefulness.
Index Terms
- Implicit-storing and redundant-encoding-of-attribute information in error-correction-codes