Abstract
Reliability is a critical issue for memories. Radiation particles that hit the device can cause errors in some cells, which can lead to data corruption. To avoid this problem, memories are protected with per-word error correction codes (ECCs). Typically, single-error correction and double-error detection (SEC-DED) codes are used. As technology scales, radiation-induced errors in memories tend to affect more than one cell, which is known as a multiple cell upset (MCU). To ensure that only a single cell is affected in each word, interleaving is used. With interleaving, cells that belong to the same word are placed far enough apart that an MCU will affect at most one cell of each word. Interleaving significantly increases the cost of the device. Moreover, determining the interleaving distance (ID) required to prevent MCUs from causing double errors is not trivial. Typically, accelerated radiation experiments with a limited number of particle hits are used. They provide a lower bound on the required ID, but larger MCUs may occur with a low probability. Even if the percentage of such large MCUs is very low, their impact on reliability can be significant. This article presents a technique to mitigate the effects on memory reliability of large MCUs, that is, those that exceed the ID. The proposed approach is able to correct most double errors caused by large MCUs by exploiting the locality of the errors within an MCU.
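To make the interleaving argument concrete, the following is a minimal Python sketch of one memory row under simple modular bit interleaving. The mapping (physical column c holds bit c // ID of word c % ID), the function names, and the restriction to a horizontal MCU that stays within a single row are illustrative assumptions; this is not the article's memory layout or its proposed correction technique. The sketch only counts which logical bits of each word an MCU spanning consecutive columns flips.

```python
from collections import defaultdict


def physical_to_logical(col: int, interleaving_distance: int) -> tuple[int, int]:
    """Map a physical column of a row to (word index, bit index).

    Assumption: simple modular interleaving, so column `col` stores bit
    `col // ID` of word `col % ID` and consecutive bits of one word are
    `ID` columns apart.
    """
    return col % interleaving_distance, col // interleaving_distance


def errors_per_word(mcu_start: int, mcu_length: int,
                    interleaving_distance: int) -> dict[int, list[int]]:
    """Return, for each word hit by a horizontal MCU, the flipped bit positions.

    The MCU is modeled as `mcu_length` consecutive cells starting at
    physical column `mcu_start` (assumed to stay inside the row).
    """
    hits: dict[int, list[int]] = defaultdict(list)
    for col in range(mcu_start, mcu_start + mcu_length):
        word, bit = physical_to_logical(col, interleaving_distance)
        hits[word].append(bit)  # bits are appended in increasing order
    return dict(hits)


def sec_ded_correctable(hits: dict[int, list[int]]) -> bool:
    """True if every word has at most one flipped bit (SEC-DED can correct it)."""
    return all(len(bits) <= 1 for bits in hits.values())


def double_errors_are_adjacent(hits: dict[int, list[int]]) -> bool:
    """True if every word with exactly two flipped bits has them at b and b + 1."""
    return all(len(bits) != 2 or bits[1] - bits[0] == 1 for bits in hits.values())


if __name__ == "__main__":
    ID = 4
    small = errors_per_word(mcu_start=5, mcu_length=3, interleaving_distance=ID)
    large = errors_per_word(mcu_start=5, mcu_length=6, interleaving_distance=ID)
    print(small, sec_ded_correctable(small))
    print(large, sec_ded_correctable(large), double_errors_are_adjacent(large))
```

With ID = 4, a three-cell MCU starting at column 5 flips a single bit in each of words 1, 2 and 3, which a per-word SEC-DED code can correct. A six-cell MCU starting at the same column exceeds the ID, so words 1 and 2 each suffer a double error that SEC-DED can only detect. In this simplified model the two flipped bits of such a word always sit at adjacent logical positions (b and b + 1), which illustrates one form of the error locality the abstract refers to.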
Index Terms
- Mitigating the effects of large multiple cell upsets (MCUs) in memories