A fault-tolerant approach to test control utilizing dual-redundant processors
Introduction
The goal of the research described herein was to define a less expensive yet still simple approach to fault-tolerant test management and data acquisition. In many developmental and industrial environments, a need exists for test management software that can maintain data acquisition and output integrity in the event of a test control computer failure. Loss of output signals to the test hardware can often damage the system under test or require repeating an entire test at substantial expense. Current redundancy methods typically involve three or more voting computers, often running separately developed software, interconnected by specially designed circuitry. Examples are described by Bolduc [1] and IEEE [2]. Such systems are usually designed for one specific application and are therefore expensive and time-consuming to deploy. This research investigates methods for achieving an acceptable level of fault tolerance in custom-developed new systems while reducing the number of computers needed and without requiring completely independently developed software. Analytical methods previously developed by Wang et al. [3] and Wattanapongsakorn and Levitan [4] are used to assess the reliability of the chosen approach and the quantitative economic benefit attainable. A proof-of-concept system has been developed using off-the-shelf personal computers connected through a serial interface. It consists of two computers, a primary and a hot standby, connected through their existing two-channel analog cards to a system-under-test emulator. No known test automation system embodies all of these features at an equivalent cost. The emulator in the proof-of-concept system is dynamically representative of the NASA Tether Deployment Test Facility as used in testing the ProSEDS tethered satellite deployer controller.
Section snippets
Background
To provide an understanding of the capabilities and drawbacks of current test control systems, this section surveys designs presently in widespread use. Several applications that could benefit from fault-tolerant control systems with new features (including lower cost) have been identified, and mutation testing for achieving higher reliability without increasing cost is discussed.
An automation and control systems literature survey revealed few examples offering fault
Description of research
To obtain the highest level of test control reliability at minimal cost, an innovative approach to achieving fault tolerance utilizing readily available off-the-shelf technologies in the implementation has been developed. (NOTE: the redundant software is not itself off-the-shelf; however, the hardware employed and the software development environments employed are off-the-shelf. This should be distinguished from the Simplex architecture, where the redundant software itself is often
The proof-of-concept implementation
The validity and practicality of the proposed fault-tolerant approach have been verified by analysis of test data acquired from a prototype constructed in accordance with the block diagram in Fig. 1. In this implementation, two desktop computers are used, with the output of each analog board fed into two analog inputs on the analog card of a hardware-under-test emulator. The emulator receives control signals from both systems, switching to the backup if so directed or in response to a loss of primary
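The switching behaviour described above can be illustrated with a minimal sketch. It is not the prototype's code: the names ControllerSample, select_output, and timeout_s are hypothetical, and it assumes that "loss of primary" can be detected as a primary output that has not been refreshed within a timeout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ControllerSample:
    """Latest reading from one controller's analog output (hypothetical structure)."""
    value: float   # commanded output level, e.g. in volts
    age_s: float   # seconds since this controller last refreshed its output


def select_output(primary: ControllerSample,
                  backup: ControllerSample,
                  switch_requested: bool = False,
                  timeout_s: float = 0.1) -> Tuple[str, float]:
    """Return (source, value) for the control signal the emulator should act on.

    Assumption: the primary is treated as lost when its output is older
    than timeout_s. The real detection criterion is not given in the text.
    """
    primary_lost = primary.age_s > timeout_s
    if switch_requested or primary_lost:
        return ("backup", backup.value)
    return ("primary", primary.value)
```

Calling select_output with a fresh primary sample returns the primary value; a stale primary sample or an explicit switch request returns the backup value instead. A real selector would probably also latch onto the backup after a switch rather than re-evaluating every cycle, which this sketch does not attempt.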
Fault tolerance cost/benefit comparative analysis
The only disadvantage of fault-tolerant test control systems relative to single computer controllers is the higher cost. A cost/benefit analysis [21] has been performed to determine which applications might benefit from a fault-tolerant approach and the relative cost savings attainable. This analysis compares the reliability of the proposed dual-redundant approach with that of a triple-modular-redundant design, illustrating that the former is almost as dependable as the latter. Parameters
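To give a feel for this kind of comparison, the sketch below evaluates two textbook reliability expressions. These are not the formulas or parameters of the analysis in [21]; they assume independent failures, identical per-computer reliability r, a perfect voter for the triplex case, and a hypothetical fault-detection coverage for the duplex case.

```python
def tmr_reliability(r: float) -> float:
    """Triple modular redundancy: at least 2 of 3 units work (perfect voter assumed)."""
    return 3 * r**2 - 2 * r**3


def duplex_reliability(r: float, coverage: float) -> float:
    """Primary plus hot standby.

    The system survives if the primary works, or if the primary fails,
    the failure is detected (probability `coverage`), and the standby works.
    """
    return r + coverage * (1 - r) * r


# Illustrative sweep; the r and coverage values are arbitrary, not measured data.
for r in (0.90, 0.99):
    for c in (0.90, 0.99):
        print(f"r={r:.2f} coverage={c:.2f} "
              f"duplex={duplex_reliability(r, c):.6f} tmr={tmr_reliability(r):.6f}")
```

Under these simplified assumptions, which configuration comes out ahead depends on both r and the assumed coverage, so the quality of fault detection in the duplex system is the parameter to examine first.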
Proof-of-concept testing by bug injection in system under test
Extensive testing of the proof-of-concept prototype has been performed by applying primarily the “weak mutation” method discussed in Section 2 to the system-under-test software model. The development environments employed in implementation of the prototype and model have excellent interactive GUI and debugging facilities, making examination of internal data structures required for weak mutation testing a simple matter. This approach is advantageous in several respects, offering greater
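As a rough illustration of the weak mutation idea, the sketch below compares the internal result of a single mutated component with that of the original immediately after it executes, rather than waiting for a difference in final program output. The integration step, the mutant, and the helper weakly_killed are hypothetical and are not taken from the system-under-test model.

```python
from typing import Callable, Iterable, Tuple

Step = Callable[[float, float, float], float]


def original_step(position: float, rate: float, dt: float) -> float:
    """Original component: advance position by rate * dt."""
    return position + rate * dt


def mutant_step(position: float, rate: float, dt: float) -> float:
    """Mutant: the '+' operator has been replaced by '-'."""
    return position - rate * dt


def weakly_killed(original: Step, mutant: Step,
                  test_inputs: Iterable[Tuple[float, float, float]],
                  tol: float = 1e-9) -> bool:
    """Return True if any test input makes the component's immediate result differ."""
    for args in test_inputs:
        if abs(original(*args) - mutant(*args)) > tol:
            return True
    return False
```

A test set containing only a zero rate, such as [(0.0, 0.0, 0.1)], does not kill this mutant, while [(0.0, 1.0, 0.1)] does. This is the sense in which surviving mutants point to gaps in the test inputs.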
Conclusions
As a result of the analysis and testing described herein, it is possible to conclude that duplex systems running newly developed software that incorporate real-time self-testing and mutual reasonableness checking of outputs represent a viable alternative to more expensive triplex systems, especially for supervised or non-life-critical applications or as an upgrade alternative for existing non-fault-tolerant control systems. A proof-of-concept system employing the simple fault-tolerant approach
Acknowledgements
The work presented in this paper would not have been possible without the assistance of several people whose contributions deserve to be recognized. We would also like to thank the National Aeronautics and Space Administration for providing inspiration and financial support; in particular, the Propulsive Small Expendable Deployer System (ProSEDS) ground testing project at Marshall Space Flight Center. We would also like to thank Dr. Nellie Maulsby and Ms. Cynthia McPherson for their assistance
References (22)
- X-33 redundancy management system. IEEE Aerospace Electron Syst Mag (2001)
- Smith TJ, Yelverton JN. Processor architecture for fault tolerant avionics systems. In: IEEE/AIAA digital avionics...
- et al. Determining redundancy levels for fault tolerant real-time systems. IEEE Trans Comput (1995)
- Wattanapongsakorn N, Levitan S. Integrating dependability analysis into the real-time design process. In: Proceedings...
- Sha L, Goodenough JB, Pollak B. Simplex architecture: meeting the challenges of using COTS in high-reliability systems....
- Sha L, Rajkumar R, Gagliardi M. Evolving dependable real-time systems. Technical Report CMU/SEI-95-TR-005, Carnegie...
- Seto D, Sha L. A case study on analytical analysis of the inverted pendulum real-time control system. Technical Report...
- Alliance Systems Inc. 3501 East Plano Parkway, Plano, TX....
- GE-Fanuc Automation Information Center....
- Honeywell Automation Safety Systems....