Application-Level Error Measurements for Network Processors

Arindam MALLIK
Matthew C. WILDRICK
Gokhan MEMIK

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E88-D, No.8, pp.1870-1877
Publication Date: 2005/08/01
Online ISSN: 
DOI: 10.1093/ietisy/e88-d.8.1870
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Recent Advances in Circuits and Systems--Part 2)
Category: Communications and Wireless Systems
Keyword: fault tolerance, network processors


Summary: 
Faults in computer systems can occur for a variety of reasons, including internal effects such as coupling and external effects such as alpha particles. As we move towards smaller manufacturing technologies, the probability of an error in a single transistor is likely to increase. Even if this probability remains the same, the probability of a fault in a processor will increase linearly with the growth in the number of transistors per chip. In many systems, an error has a binary effect, i.e., the output is either correct or erroneous. Networking systems, however, exhibit different properties. For example, even when a portion of the code behaves incorrectly due to a fault, the application can still work correctly. Measuring the effects of faults on network processor applications therefore requires new metrics. In particular, hardware faults need to be measured in the context of their effect on application behavior. In this paper, we highlight essential application properties and data structures that can be used to measure the error behavior of network processors. Using these metrics, we study the error behavior of seven representative networking applications under different cache access fault probabilities. With this study, we hope to bridge the gap between circuit-level phenomena and their impact on application behavior.
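
The distinction between a hardware fault and an application-level error can be illustrated with a small simulation. The sketch below is not the authors' experimental setup; the forwarding table, the 8-bit word width, the single-bit-flip fault model, and the names read_with_fault and run_forwarding are all assumptions made for illustration. It injects a bit flip into each simulated cache read with a given probability and counts how many of those faults actually change the forwarding decision.

```python
import random


def read_with_fault(value, fault_prob, rng, width=8):
    """Return (value, faulted); with probability fault_prob one random bit is flipped.

    Hypothetical fault model: a single bit flip in a `width`-bit cached word.
    """
    if rng.random() < fault_prob:
        return value ^ (1 << rng.randrange(width)), True
    return value, False


def run_forwarding(num_packets, fault_prob, seed=0):
    """Forward packets through a toy table and count faults vs. misrouted packets."""
    rng = random.Random(seed)
    # Hypothetical table: 256 destination keys mapped onto 4 output ports.
    table = [key % 4 for key in range(256)]
    faulty_reads = 0
    misrouted = 0
    for _ in range(num_packets):
        key = rng.randrange(256)
        stored, faulted = read_with_fault(table[key], fault_prob, rng)
        # Only the low two bits select the port, so flips in the
        # upper bits are masked and never reach the application.
        port = stored & 0b11
        faulty_reads += faulted
        misrouted += port != table[key]
    return faulty_reads, misrouted


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1):
        faults, errors = run_forwarding(10_000, p)
        print(f"fault_prob={p}: faulty reads={faults}, misrouted packets={errors}")
```

In this toy model the number of misrouted packets never exceeds the number of faulty reads, because a flipped bit outside the port field leaves the forwarding decision unchanged. That gap is the kind of behavior an application-level metric captures and a binary correct/erroneous view does not.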

