The design and analysis of AVTMR (all voting triple modular redundancy) and dual–duplex system

https://doi.org/10.1016/j.ress.2004.08.012

Abstract

In this paper, we design an AVTMR (All Voting Triple Modular Redundancy) system and a dual–duplex system, both of which are fault tolerant, and compare the two in terms of RAMS (Reliability, Availability, Maintainability and Safety) and MTTF (Mean Time To Failure).

The AVTMR system is built around a triplicated voter and the dual–duplex system around a comparator; both are based on the MC68000. To evaluate the systems, Markov models are constructed for reliability, availability, safety and MTTF, and the RELEX 6.0 tool is used to calculate the failure rates of the electrical components according to MIL-HDBK-217F.

The results show that both systems are more dependable than a single system, and that either AVTMR or dual–duplex can be selected to suit a specific application. In particular, because both offer better RAMS than a single system, they can be applied to life-critical systems such as airplanes and high-speed railways.

Introduction

As industry develops, fault-tolerant systems with high reliability and availability are increasingly required. Developing such systems requires the study of failures, and it is known that fault, error, and failure are closely related: a fault can lead to an error, and vice versa [1]. Faults have therefore been studied in order to improve system reliability and prevent system failure. Two approaches have emerged: fault avoidance and fault tolerance. Fault avoidance aims to build a fault-free system through testing and by improving the quality of electronic components. Because components may develop faults over time, this technique is very difficult to apply, and complete fault avoidance may be impossible for some systems. With fault tolerance, normal operation continues even when a fault occurs, so a fault-tolerant system is more effective in cost and development effort than a fault-avoidance system. A fault-tolerant system usually relies on redundancy, which allows a fault to occur without stopping normal operation. The redundancy techniques are hardware redundancy, software redundancy, time redundancy and information redundancy.

A hardware fault-tolerant system is better suited to time-critical applications than a software fault-tolerant one. A hardware fault-tolerant system votes on or compares data at the address-bus or data-bus level, whereas a software fault-tolerant system performs these functions at the system level.

For application to commercial airplanes, NASA developed FTMP [2], which uses the hardware fault-tolerant technique, and SIFT [3], which uses the software fault-tolerant technique.

The systems proposed in this paper use hardware redundancy, which is classified as passive, active or hybrid. Passive hardware redundancy masks faults: a fault is not detected, yet the system continues to operate correctly. Active hardware redundancy provides fault detection, fault location and fault recovery. Hybrid hardware redundancy combines elements of both.

The AVTMR and dual–duplex systems in this paper both have active hardware redundancy characteristics [6].

The proposed AVTMR system combines fault masking with fault detection, which makes it possible to identify which fault the system has. It is therefore not a fully active system, but it has the characteristics of one.

The proposed dual–duplex system uses active hardware redundancy. Active hardware redundancy systems are divided into cold standby, hot standby and warm standby systems according to the status of system operation [8].

Among these, the hot standby system has the fastest reconfiguration time. Standby systems with a comparator are widely used where high reliability, availability and safety are required [9].

The AVTMR and dual–duplex systems are compared with a single system. To evaluate them, the failure rates of the electrical components used in each system are calculated with RELEX 6.0 [5]. The Markov model equations are solved in Matlab and Mathematica to evaluate RAMS (Reliability, Availability, Maintainability and Safety) and MTTF.

The failure-rate calculation for the electrical components is based on the MIL-HDBK-217F standard [4]. Each of the designed systems is based on the MC68000 [7].

Section snippets

AVTMR system design

Fault-tolerant design techniques comprise passive, active and hybrid hardware redundancy. The AVTMR system uses passive hardware redundancy with fault masking and detection. When a single fault is injected into the system, AVTMR, which has a majority voter, still operates correctly, because the voter compares its three inputs and outputs the majority value. So, if the AVTMR system has one fault, the fault is masked and
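The masking behaviour of a majority voter can be illustrated in software. The following is a minimal sketch, assuming a bitwise 2-out-of-3 vote over integer bus words; the function names `majority_vote` and `vote_with_detection` are hypothetical and do not come from the paper, whose voter is implemented in hardware.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three bus words (illustrative only)."""
    return (a & b) | (b & c) | (a & c)


def vote_with_detection(a: int, b: int, c: int):
    """Return the voted word and the indices of modules that disagree with it."""
    voted = majority_vote(a, b, c)
    disagreeing = [i for i, word in enumerate((a, b, c)) if word != voted]
    return voted, disagreeing


# Module 1 delivers a corrupted word; the vote masks it and flags the module.
voted, faulty = vote_with_detection(0x1234, 0x1235, 0x1234)
print(hex(voted), faulty)  # prints: 0x1234 [1]
```

With a single faulty module the voted word equals the two agreeing inputs, so the fault is masked, while the list of disagreeing modules gives a simple form of detection.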

Design of dual–duplex system

The dual–duplex system is a hot standby system in which two MC68000 CPUs on one board operate from a common clock. The data of each CPU are compared at each ‘read’ or ‘write’ point by a comparator implemented with exclusive-OR logic in an ALTERA device (EPM7128LC84). The dual–duplex system comprises two dual systems. The dual CPU board is shown in Fig. 3. The board is VMEbus compatible and carries both CPUs. Address bus, data bus and control bus are compared in
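The exclusive-OR comparison can be sketched in a few lines. This is an illustration only, assuming a 16-bit word and a hypothetical function name `compare_bus`; the actual comparator is hardware logic in the programmable device and is not reproduced here.

```python
BUS_WIDTH = 16  # assumed width of the compared word, for illustration


def compare_bus(word_a: int, word_b: int, width: int = BUS_WIDTH):
    """XOR two bus words; any set bit in the result marks a disagreement."""
    mask = (1 << width) - 1
    diff = (word_a ^ word_b) & mask
    return diff != 0, diff


# The two CPUs disagree in bit 1, so the comparator raises a mismatch.
print(compare_bus(0x00FF, 0x00FD))  # prints: (True, 2)
```

A mismatch flag of this kind is what would trigger a switchover to the standby pair in a hot standby arrangement.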

The calculation of the failure rate

The failure rate is the most important quantity for evaluating reliability, availability, safety and MTTF (Mean Time To Failure). It is given by Eq. (1).

The failure rate is calculated according to MIL-HDBK-217F, and the RELEX tool is used for the calculation (Table 1).

$$\lambda = \pi_L \pi_Q \left(C_1 \pi_T + C_2 \pi_E\right) \pi_P \quad \text{failures per million hours} \tag{1}$$

The failure rates of commercial and MILSPEC components are calculated, and the system evaluation is compared for each set of failure rates.
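As a worked illustration of Eq. (1), the sketch below evaluates the expression for one part and converts the result to an MTTF. The parameter names simply mirror the symbols in Eq. (1); the numeric values are made up for the example and are not taken from Table 1 or from the handbook. The MTTF conversion assumes a constant failure rate, for which MTTF = 1/λ.

```python
def part_failure_rate(pi_l, pi_q, c1, pi_t, c2, pi_e, pi_p):
    """Evaluate Eq. (1); the result is in failures per million hours."""
    return pi_l * pi_q * (c1 * pi_t + c2 * pi_e) * pi_p


def mttf_hours(failure_rate_per_million_hours):
    """MTTF in hours for a constant failure rate (MTTF = 1 / lambda)."""
    return 1e6 / failure_rate_per_million_hours


# Hypothetical factor values, chosen only to make the arithmetic easy to check.
lam = part_failure_rate(pi_l=1.0, pi_q=2.0, c1=0.5, pi_t=1.0,
                        c2=0.25, pi_e=2.0, pi_p=1.0)
print(lam, mttf_hours(lam))  # prints: 2.0 500000.0
```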

The calculated failure rate

System modeling

The Markov modeling technique is used to evaluate the dual–duplex system. Markov modeling provides a probabilistic model of the system in terms of its state transitions. The state transitions leading to system failure are represented in a discrete-time model, from which system reliability and availability are evaluated.

In this paper, the state transitions are constructed under two assumptions for Markov modeling.

  • (1) Only one failure will occur at a time.

  • (2) The system starts in the perfect operation where all of the system's modules are
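To make the discrete-time approach concrete, here is a minimal sketch of a three-state Markov chain for a generic two-module standby system with fault coverage. It is not the paper's model: the states, the parameter names `lam`, `dt` and `coverage`, and the transition probabilities are assumptions chosen for illustration. It does follow the two assumptions above, with at most one failure per time step and a start in the fully operational state.

```python
def step(state, matrix):
    """Advance the state probability vector by one time step."""
    n = len(state)
    return [sum(state[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def reliability_curve(lam, dt, coverage, steps):
    """Reliability after 0..steps time steps of a two-module standby system.

    States: 0 = both modules good, 1 = one module good, 2 = system failed.
    lam is the per-module failure rate and dt the step length, so
    p = lam * dt is the per-step failure probability (2 * p must be <= 1).
    """
    p = lam * dt
    matrix = [
        # A covered fault moves to state 1; an uncovered fault fails the system.
        [1 - 2 * p, 2 * p * coverage, 2 * p * (1 - coverage)],
        [0.0, 1 - p, p],
        [0.0, 0.0, 1.0],
    ]
    state = [1.0, 0.0, 0.0]  # assumption (2): start fully operational
    curve = [1.0]
    for _ in range(steps):
        state = step(state, matrix)
        curve.append(state[0] + state[1])  # probability of not having failed
    return curve


low = reliability_curve(lam=1e-4, dt=1.0, coverage=0.5, steps=1000)
high = reliability_curve(lam=1e-4, dt=1.0, coverage=0.99, steps=1000)
print(low[-1] < high[-1])  # prints: True
```

In this toy chain, raising the fault coverage raises the reliability at every later time step, which is the kind of sensitivity a coverage-based comparison examines.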

Reliability

The reliability of the single system (SS), AVTMR (All Voting Triple Modular Redundancy) and dual–duplex (DD) system is shown in Figs. 11 and 12. In the figures, m denotes a system built from military components and c one built from commercial components. In both figures, the reliability of the military-component system is higher than that of the commercial one.

The difference between Figs. 11 and 12 is the fault coverage. In the dual–duplex system, if a fault is detected, the system switches over to the standby system. The AVTMR system, however, does not give a serious
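For orientation, the standard textbook comparison between a single module and an idealized TMR arrangement can be computed directly. This is a sketch under stated assumptions, not a reproduction of Figs. 11 and 12: it assumes exponentially distributed failures with rate `lam`, a perfect voter and no repair, for which R_single(t) = exp(-lam t) and R_TMR(t) = 3R^2 - 2R^3. The rate used below is hypothetical.

```python
import math


def r_single(lam, t):
    """Reliability of one module with constant failure rate lam."""
    return math.exp(-lam * t)


def r_tmr(lam, t):
    """Reliability of ideal TMR: at least 2 of 3 modules good, perfect voter."""
    r = r_single(lam, t)
    return 3 * r ** 2 - 2 * r ** 3


lam = 1e-4  # hypothetical failures per hour
for t in (1000, 5000, 20000):
    print(t, round(r_single(lam, t), 4), round(r_tmr(lam, t), 4))
```

Under these assumptions TMR is more reliable than a single module only while the module reliability stays above 0.5; for long missions without repair the redundant arrangement falls below the single module.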

Conclusion

In this paper, a single system, an AVTMR system and a dual–duplex system are designed and compared in terms of RAMS (Reliability, Availability, Maintainability and Safety).

Overall, if the dual–duplex system has high fault coverage, it shows the best RAMS characteristics, since its standby capability makes good system quality easy to achieve. However, the dual–duplex system requires a great deal of development time and money. For example, the dual–duplex system needs more electrical

References (11)

  • B.W. Johnson

    Design and analysis of fault tolerant digital systems

    (1989)
  • A.L. Hopkins et al.

    FTMP—a highly reliable fault-tolerant multiprocessor for aircraft

    Proc IEEE

    (1978)
  • J.H. Wensley

    SIFT: design and analysis of a fault tolerant computer for aircraft control

    Proc IEEE

    (1978)
  • Military handbook 217F. USA: Department of...
  • RELEX 6.0 user guide. USA: RELEX Corporation;...
There are more references available in the full text version of this article.
