Fault-tolerant systems with concurrent error-locating capability

  • Correspondence
Journal of Computer Science and Technology

Abstract

Fault-tolerant systems are widely used in military, industrial and commercial applications. Most of these systems are built with multiple-modular redundancy or error-control coding techniques. They need fault-tolerance-specific components (such as voters, switchers, encoders, or decoders) to implement error-detecting or error-correcting functions. However, the problem of detecting, locating or correcting errors in these fault-tolerance-specific components themselves has not yet been solved properly, and this greatly affects the dependability of the whole fault-tolerant system. This paper presents a theory of robust fault-masking digital circuits for characterizing fault-tolerant systems capable of concurrent error location, together with a new scheme for dual-modular redundant systems with the partially robust fault-masking property. A basic robust fault-masking circuit is composed of a basic functional circuit and an error-locating corrector. Such a circuit is capable of both concurrent error correction and concurrent error location. Under this circuit model, in a partially robust fault-masking dual-modular redundant system, the basic functional circuit consists of two redundant modules based on alternating-complementary logic, and an error-correction-specific circuit called the alternating-complementary corrector serves as the error-locating corrector. The performance of the scheme (such as hardware complexity and time delay) is analyzed.
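The abstract does not give the design of the alternating-complementary corrector, so the following is only a rough behavioral sketch of the general idea, not the authors' circuit. It assumes the classical alternating-logic property: a self-dual function f satisfies f(NOT x) = NOT f(x), so a module that is evaluated on an input and then on its complement should produce complementary outputs. All names here (`majority`, `stuck_at`, `locate_and_correct`) are hypothetical and chosen for illustration.

```python
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]
Module = Callable[[Bits], int]


def complement(bits: Bits) -> Bits:
    """Bitwise complement of an input vector."""
    return tuple(1 - b for b in bits)


def majority(bits: Bits) -> int:
    """Majority of an odd number of bits; a self-dual function."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def stuck_at(value: int) -> Module:
    """Hypothetical fault model: the module output is stuck at `value`."""
    return lambda bits: value


def locate_and_correct(
    module_a: Module, module_b: Module, x: Bits
) -> Tuple[Optional[int], Tuple[str, ...]]:
    """Assumed behavior of an error-locating corrector for two modules.

    Each module is evaluated on x and then on the complement of x.
    A module whose two outputs are not complementary is flagged as
    faulty, and the output is taken from the other module.
    Returns (corrected output or None, names of suspected modules).
    """
    a1, a2 = module_a(x), module_a(complement(x))
    b1, b2 = module_b(x), module_b(complement(x))
    a_ok = a2 == 1 - a1
    b_ok = b2 == 1 - b1

    if a_ok and b_ok:
        if a1 == b1:
            return a1, ()           # both modules agree and alternate
        return None, ("A", "B")     # disagreement this check cannot locate
    if a_ok:
        return a1, ("B",)           # B failed to alternate: mask it
    if b_ok:
        return b1, ("A",)           # A failed to alternate: mask it
    return None, ("A", "B")         # neither module alternates


if __name__ == "__main__":
    print(locate_and_correct(majority, majority, (1, 0, 1)))
    print(locate_and_correct(majority, stuck_at(0), (1, 0, 1)))
```

In this sketch a single stuck-at output is both masked and attributed to a module, which is the combination of concurrent error correction and concurrent error location that the abstract describes. Faults inside the corrector itself are not modeled here.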

References

  1. Hu M. Computer Fault Tolerant Techniques. Beijing: China Railway Press, 1995. (in Chinese)

  2. Lala P K. Fault-Tolerant and Fault-Testable Hardware Design. NJ: Prentice-Hall, 1985.

  3. Rao T R N, Fujiwara E. Error-Control Coding for Computer Systems. NJ: Prentice Hall, 1989.

  4. Lo J C, Kitakami M, Fujiwara E. Reliable logic circuits with byte error control codes: A feasibility study. In Proc. IEEE 1996 Int. Symp. Defect and Fault Tolerance in VLSI Systems, Boston, Oct., 1996, pp.286–294.

  5. Barbour A E, Wojcik A S. A general, constructive approach to fault-tolerant design using redundancy. IEEE Trans. Computers, 1989, 38(1): 15–29.

  6. Lo J C. Highly reliable systems with differential built-in current sensors. In Proc. IEEE 1998 Int. Symp. Defect and Fault Tolerance in VLSI Systems, Austin, Nov., 1998, pp.261–269.

  7. Schwab T E, Yau S S. An algebraic model of fault-masking logic circuits. IEEE Trans. Computers, 1983, 32(9): 809–825.

  8. Stroud C E, Tannehill J K. Applying built-in self-test to majority voting fault tolerant circuits. In Proc. 16th IEEE VLSI Test Symposium, Monterey, Apr., 1998, pp.303–308.

  9. Gaitanis N. Design of TSC N-modular redundancy systems. In Proc. 2nd Int. Conf. Supercomputing, Vol. III, San Francisco, May, 1987, pp.238–244.

  10. Gaitanis N. The design of TSC error C/D circuits for SEC/DED codes. IEEE Trans. Computers, 1988, 37(3): 258–265.

  11. Gaitanis N. The design of totally self-checking TMR fault-tolerant systems. IEEE Trans. Computers, 1988, 37(11): 1450–1454.

  12. Gaitanis N, Paschalis A, Gizopoulos D, Kostarakis P. A new totally self-checking reconfigurable duplication system. In Proc. Int. Workshop on Computer-Aided Design, Test, and Evaluation for Dependability, Beijing: International Academic Publishers, July, 1996, pp.264–268.

  13. Jiang J H, Shi H N, Min Y H, Zhao X D. A novel NMR structure with concurrent output error location capability. In Proc. 1999 Pacific Rim Int. Symp. Dependable Computing, Los Alamitos: IEEE Computer Society, Hong Kong, Dec., 1999, pp.32–39.

  14. Jiang J H. Alternating-complementary locator and its use for error location in dual-modular redundancy with comparison structure. Journal of Computer Research and Development, 2001, 38(6): 754–764. (in Chinese)

  15. Jiang J H, Hu M. The extended self-checking properties of alternating-complementary logic systems. In Proc. Int. Workshop on Computer-Aided Design, Test, and Evaluation for Dependability, Beijing: International Academic Publishers, July, 1996, pp.258–263.

  16. Lubaszewski M, Courtois B. A reliable fail-safe system. IEEE Trans. Computers, 1998, 47(2): 236–241.

  17. Jiang J H, Min Y H, Shi H B. The concepts and basic structure of concurrent error location for digital circuits. Journal of Computer Research and Development, 2000, 37(5): 532–542. (in Chinese)

  18. Jiang J H, Min Y H, Shi H B. A theory of extended fault-masking digital circuits with concurrent error detection capability. In Proc. 6th Int. Conf. Computer-Aided Design and Computer Graphics, Shanghai: Wen Hui Publishers, Dec., 1999, pp.696–697.

  19. Liu M Y, Zhang D X, Ye M L, Li Y. The Theory of High-Level Synthesis for Application Specific Integrated Circuits. Beijing: Beijing Institute of Technology Publishing House, 1998. (in Chinese)

Author information

Corresponding author

Correspondence to JianHui Jiang.

Additional information

This work was supported by the Shanghai Academic Young Teachers Foundation of the Shanghai Education Commission under Grant No. 95QD18, and is now supported by the National Natural Science Foundation of China under Grant Nos. 90207021, 69733010 and 69873010.

JIANG JianHui received his B.E., M.E. and Ph.D. degrees in traffic information engineering and control from Shanghai Tiedao University (which merged into Tongji University in April 2000) in 1985, 1988, and 1999, respectively. In September 2000, he joined Fudan University as a part-time Postdoctoral Research Fellow. He is currently a professor of computer science and technology at Tongji University. His research interests include fault-tolerant computing, digital system design and testing, hardware and software codesign, performance evaluation of computer systems, and distributed computing.

MIN YingHua graduated from the Department of Mathematics, Jilin University, in 1962, and visited several US universities over a number of years. He is a professor of computer science at the Institute of Computing Technology, Chinese Academy of Sciences, a guest professor at Hunan University, and the Chair of the Technical Committee on Fault-Tolerant Computing, China Computer Federation. His research interests include IC design and test, fault-tolerant computing, and software reliability. He is a Fellow of the IEEE and a member of the ACM.

PENG ChengLian graduated from the Department of Mathematics, Fudan University, in 1964. He is currently a professor of computer science and technology at Fudan University. His research interests include CAD of digital systems, fault-tolerant computing and embedded computing.

About this article

Cite this article

Jiang, J., Min, Y. & Peng, C. Fault-tolerant systems with concurrent error-locating capability. J. Comput. Sci. & Technol. 18, 190–200 (2003). https://doi.org/10.1007/BF02948884

  • DOI: https://doi.org/10.1007/BF02948884
