Optimal sequencing of warm standby elements

https://doi.org/10.1016/j.cie.2013.05.001

Highlights

  • Mission cost and reliability model for warm standby systems is developed.

  • Time-to-failure distributions of elements are taken into account.

  • The method uses discrete approximation of the distributions.

  • An algorithm for optimal sequencing of standby elements is suggested.

Abstract

Warm standby redundancy has been used as an effective design technique for improving the reliability of a system while achieving a compromise between the restoration cost and the operation cost of standby elements. This paper considers the optimal standby element sequencing problem (SESP) for 1-out-of-N: G heterogeneous warm-standby systems. Given the desired redundancy level and a fixed set of element choices, the objective of the optimal system design is to select the initiation sequence of the system elements so as to minimize the expected mission cost of the system while providing a certain level of system reliability. Based on a discrete approximation of time-to-failure distributions of the system elements, the system reliability and expected mission cost are evaluated using an iterative procedure. A genetic algorithm is used as an optimization tool for solving the formulated SESP for 1-out-of-N: G warm-standby systems with non-identical elements. As illustrated through examples, results generated using the suggested methodology can facilitate the system reliability versus cost trade-off study, which can further assist in decision making about the best standby policy for fault-tolerant system designs.

Introduction

Different types of standby redundancy techniques, typically hot, cold and warm standby, have been used in various important applications to achieve fault-tolerance and high system reliability (Gnedenko et al., 1969, Johnson, 1989, Shen and Xie, 1991). Examples of applications include satellite systems, power systems, aerospace systems, telecommunication systems, and distributed computing systems (Amari and Pham, 2010, Amari et al., 2012, Pandey et al., 1996, Pham et al., 1995, Sklaroff, 1976).

In hot-standby systems, a standby element works in synchrony with the primary online unit and is ready to take over at any time. On the one hand, hot standby redundancy can provide fast restoration in the case of failures. On the other hand, maintaining the redundant elements in the hot operational state is usually costly as the working elements consume energy and materials. In addition, because the hot standby elements are fully exposed to working stresses, they can fail even before they are used. This standby technique is generally used for applications where the recovery time is critical. For cold-standby systems, a standby element is unpowered and does not operate until needed to replace a faulty on-line unit (Van Gemund and Reijns, 2012, Xing et al., 2012b). As compared to hot-standby, cold standby elements require long restoration delays when they are needed to operate as a substitute for a failed online element. But keeping the redundant elements in the cold standby mode is almost costless. Also, since the cold-standby elements are shielded from the working stresses associated with system operation, their failure rates can be assumed to be zero. The cold-standby technique is commonly used in applications where energy consumption is critical.

The warm-standby technique is a compromise between the hot and cold techniques in terms of fast recovery (i.e., low restoration cost) and energy conservation (i.e., low operational cost): an element in the standby mode is partially powered and partially exposed to operational stresses (Ruiz-Castro and Fernández-Villodre, 2012, Zhang et al., 2006). Therefore, the failure rate of a warm-standby element is typically less than its full operational failure rate. In other words, for warm-standby systems, the standby elements have time-dependent failure behavior; they have different failure rates before and after they are used to replace the on-line faulty units (Amari and Pham, 2010, Amari et al., 2012).

One example of a warm standby system is a set of redundant hard disks used to replace failed disks in a storage system. The spare disks are spinning and are thus exposed to operational stresses. On the other hand, the warm standby disks do not provide access to information, so their positioning mechanisms are idle, which makes the disks less failure prone in the standby mode than in the operation mode. Another example is a power plant in which extra generating units wait in the standby mode. The standby units can fail, but their failure rates and exploitation costs are lower than those of the primary unit, which works under full load and consumes more energy and materials. Wireless sensor networks also use warm standby redundancy to balance energy consumption against the recovery time needed for switching a sleeping backup sensor to the operation mode.

Both hot and cold standby are essentially special cases of the warm-standby model. Therefore, in this work, we focus on the generic warm-standby systems.

In a 1-out-of-N: G warm standby system with non-identical elements, the order in which the standby elements are initiated heavily affects the system reliability as well as the mission cost (associated with energy consumption, standby and operation maintenance, startup cost, etc.). Therefore, it is important to solve the optimal standby element sequencing problem (SESP), especially given the strong competitive pressure to provide an economical system design with limited resources. Given the desired redundancy level (i.e., the value of N) and a fixed set of element choices, the objective of the SESP is to select the initiation sequence of the system elements so as to minimize the expected system mission cost while providing a desired level of system reliability. In this paper, we first formulate and solve the SESP for warm-standby systems using an iterative numerical method integrated with a genetic algorithm. The suggested numerical method, which is based on a discrete approximation of time-to-failure distributions of the system elements, is used for evaluating the system reliability and expected mission cost. The method simplifies the similar universal generating function approach first suggested in (Levitin & Amari, 2010). The genetic algorithm is utilized as an optimization tool for solving the formulated optimization problem for the 1-out-of-N: G heterogeneous warm-standby system.

Considerable research efforts have been dedicated to formulating and solving optimization problems such as redundancy allocation and reliability allocation problems for standby systems (e.g., Gen and Yun, 2006, Kuo et al., 2001, Kuo and Wan, 2007). For example, exact methods like dynamic programming and integer programming were proposed for solving the redundancy allocation problem (RAP) of 1-out-of-N: G homogeneous hot-standby series-parallel systems, where one type of elements can be substituted only by the same type of elements (Fyffe et al., 1968, Misra and Sharma, 1991). Meta-heuristic methods like genetic algorithm and Tabu search were proposed to solve the RAP for 1-out-of-N: G or K-out-of-N: G heterogeneous hot-standby series-parallel systems, where one type of elements can be substituted with a different type of functionally-equivalent elements to achieve fault tolerance (Chen and You, 2005, Chia and Smith, 2004, Coit and Smith, 1996, Onishi et al., 2007). In (Coit & Smith, 2002), the genetic algorithm was also used to solve the RAP for 1-out-of-N: G hot-standby series-parallel systems with uncertain component Weibull scale parameters. In (Coit, 2001), an integer programming-based method was proposed for solving the redundancy optimization problem for 1-out-of-N: G cold-standby series-parallel systems; it was further extended for series-parallel systems with combined redundancies where each subsystem involves either hot or cold standby redundancy in (Coit, 2003, Coit and Liu, 2000). Another work that addressed series-parallel systems with combined redundancies is (Chambari, Rahmati, Najafi, & Karimi, 2012), where the RAP was solved in bi-objective reliability-cost formulation using multi-objective version of the genetic algorithm. 
In (Zhao & Liu, 2005), a hybrid algorithm based on the genetic algorithm and fuzzy theory was proposed to solve the redundancy optimization problem for 1-out-of-N: G cold-standby series-parallel systems with fuzzy element time-to-failure. While most optimization efforts have addressed hot or cold standby systems, there are few works on the optimization of warm-standby systems. For example, in (Amari & Dill, 2010) a binary integer programming algorithm was proposed for solving the redundancy optimization problem of K-out-of-N: G warm-standby series-parallel systems. In (Tannous, Xing, Peng, Xie, & Ng, 2011) two optimization methodologies, based respectively on the genetic algorithm and integer programming, were investigated for solving the RAP of warm-standby systems.

All the above-mentioned works dealing with standby redundancy optimization considered the system reliability-cost relationship, where the cost of each element choice was given as a constant and the system mission cost was modeled as a linear combination of the costs of the elements selected for the system design. In practice, however, the element operation cost, and thus the whole mission cost, depends on factors such as the energy and materials consumed over the lifetime of the elements used. Moreover, the mission cost may depend on the initiation sequence of the standby elements when they are not identical. None of the existing works has addressed practical modeling of the mission cost that accounts for the lifetime or time-to-failure of elements, or has considered the optimal SESP that minimizes the expected mission cost for warm-standby systems.

The remainder of the paper is organized as follows. Section 2 describes the system under consideration and presents the suggested iterative numerical method for evaluating the system reliability and expected mission cost of 1-out-of-N: G heterogeneous warm standby systems. Section 3 presents examples and results illustrating the proposed numerical method. Section 4 presents the formulation of the considered optimization problem and solutions to example problems. Lastly, Section 5 gives conclusions as well as directions for future work.

The model

The system consists of N elements with one being primary online and the other N − 1 elements waiting in the warm standby mode before their initiation (transfer from the warm standby to the operation mode). Specifically, the first element is initiated at the beginning of the mission. The N − 1 standby elements are in the warm standby mode from the beginning of the mission and are initiated in a predetermined order; a standby element is initiated immediately after the failure of the previous element.
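The behavior described above can be illustrated with a small Monte Carlo simulation. This is only a sketch under stated assumptions and is not the iterative discrete procedure used in the paper: it assumes that an element ages at a reduced rate `d` (a hypothetical deceleration factor between 0 and 1) while in warm standby and at rate 1 in operation, that an element found failed in standby is skipped without startup cost, and that unused elements keep accruing standby cost until they fail or the mission ends. All parameter names and values are hypothetical.

```python
import random

def simulate(sequence, elements, tau, runs=20000, seed=1):
    """Estimate (reliability, expected mission cost) for one initiation sequence."""
    rng = random.Random(seed)
    successes = 0
    total_cost = 0.0
    for _ in range(runs):
        t = 0.0           # time at which the next element must take over
        cost = 0.0
        survived = False  # True once some element operates until tau
        for j in sequence:
            e = elements[j]
            # Lifetime measured on the element's operation-mode age scale.
            x = rng.weibullvariate(e["scale"], e["shape"])
            d = e["d"]
            # Calendar time at which the element would fail if left in standby.
            standby_fail = x / d if d > 0 else float("inf")
            if survived:
                # Mission already secured: only standby cost accrues.
                cost += e["c_standby"] * min(tau, standby_fail)
                continue
            if standby_fail <= t:
                # Assumption: failed in standby before being needed, so it is skipped.
                cost += e["c_standby"] * standby_fail
                continue
            cost += e["c_standby"] * t + e["c_start"]
            fail_time = t + (x - d * t)  # remaining life is spent at full rate
            if fail_time >= tau:
                cost += e["c_op"] * (tau - t)
                survived = True
            else:
                cost += e["c_op"] * (fail_time - t)
                t = fail_time
        successes += survived
        total_cost += cost
    return successes / runs, total_cost / runs

# Hypothetical elements (not the paper's data); d = 1 mimics hot, d = 0 cold standby.
elements = {
    0: {"scale": 200.0, "shape": 1.2, "d": 0.3, "c_standby": 1.0, "c_op": 8.0, "c_start": 50.0},
    1: {"scale": 150.0, "shape": 1.0, "d": 0.5, "c_standby": 0.5, "c_op": 5.0, "c_start": 30.0},
    2: {"scale": 300.0, "shape": 1.5, "d": 0.2, "c_standby": 2.0, "c_op": 12.0, "c_start": 80.0},
}
r_a, c_a = simulate([0, 1, 2], elements, tau=400.0)
r_b, c_b = simulate([2, 1, 0], elements, tau=400.0)
```

Comparing `(r_a, c_a)` with `(r_b, c_b)` shows how the same set of elements can yield different reliability and cost estimates depending on the initiation order.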

Example of Mission Reliability and Expected Cost Evaluation

Consider a 1-out-of-10 warm standby system consisting of elements characterized by Weibull time-to-failure distributions with parameters presented in Table 1. The operation cost (per time unit) for WS and operation modes and the startup cost are also presented in this table. The mission time is τ = 400. The obtained system reliability and expected mission cost for elements initiated in the increasing numerical order are R = 0.907 and C = 16176, respectively.
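The evaluation method relies on a discrete approximation of the time-to-failure distributions. The exact discretization is not shown in this excerpt, so the following is a minimal sketch of one common approach: split the mission time into `m` equal intervals and assign to each interval the probability that the element fails within it. The Weibull parameters used here are hypothetical and are not taken from Table 1.

```python
import math

def weibull_cdf(t, scale, shape):
    """CDF of a two-parameter Weibull distribution: 1 - exp(-(t/scale)^shape)."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def discretize(cdf, mission_time, m):
    """Return (pmf, survival).

    pmf[i] is the probability of failure in the i-th interval of width
    mission_time / m; survival is the probability of outliving the mission.
    """
    delta = mission_time / m
    pmf = [cdf((i + 1) * delta) - cdf(i * delta) for i in range(m)]
    survival = 1.0 - cdf(mission_time)
    return pmf, survival

# Hypothetical element: scale 300, shape 1.5, mission time 400, 100 intervals.
pmf, survival = discretize(lambda t: weibull_cdf(t, scale=300.0, shape=1.5), 400.0, 100)
```

A larger `m` gives a finer approximation at the price of more work in any procedure that iterates over the intervals.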

As mentioned in Introduction, both hot and

Problem formulation

The optimal element sequencing problem for the warm standby system is formulated as follows. Find the initiation sequence s(1), s(2), …, s(N) of the elements that minimizes the expected mission cost subject to providing a desired level R* of the system reliability:

$$\min C \quad \text{s.t.} \quad R \ge R^{*}.$$

Finding the optimal element initiation sequence is a complicated combinatorial optimization problem having N! possible solutions. An exhaustive examination of all these solutions is not realistic for a large number of
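For very small N the constrained problem can still be solved by enumeration, which makes the formulation concrete. The sketch below is not the genetic algorithm used in the paper; it simply enumerates all N! sequences and keeps the cheapest one that meets the reliability constraint. The function `toy_evaluate` is a hypothetical placeholder with no physical meaning; in practice it would be replaced by a real reliability and cost evaluation.

```python
import itertools
import math

def best_sequence(n, evaluate, r_min):
    """Enumerate all n! sequences and return (cost, reliability, sequence)
    for the cheapest sequence with reliability >= r_min, or None if no
    sequence is feasible."""
    best = None
    for seq in itertools.permutations(range(n)):
        r, c = evaluate(seq)
        if r >= r_min and (best is None or c < best[0]):
            best = (c, r, seq)
    return best

def toy_evaluate(seq):
    """Hypothetical stand-in for a reliability/cost model (arbitrary numbers)."""
    weights = [0.9, 0.6, 0.3]
    costs = [5.0, 3.0, 1.0]
    r = sum(weights[j] / (pos + 1) for pos, j in enumerate(seq)) / 1.5
    c = sum(costs[j] * (len(seq) - pos) for pos, j in enumerate(seq))
    return r, c

result = best_sequence(3, toy_evaluate, r_min=0.8)

# Size of the search space for N = 10 elements.
search_space = math.factorial(10)
```

With N = 10 the search space already contains 3,628,800 sequences, each requiring a full reliability and cost evaluation, which is why a heuristic search becomes attractive as N grows.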

Conclusions and future work

Both fast recovery and low energy and material consumption (or, in general, low operation cost) are important design objectives for fault-tolerant systems in many application areas such as power systems and telecommunication systems. The warm-standby model provides an effective way of trading off these two conflicting design goals. In heterogeneous 1-out-of-N: G warm-standby systems with non-identical elements, the order in which the elements are initiated affects the system mission cost

Acknowledgement

This research is partially supported by the Natural Science Foundation of China (No. 61170042), the Fundamental Research Funds for the Central Universities (No. ZYGX2011Z001), and the Key Technology R&D Program of Jiangsu Province (No. BE2012029).

References (37)

  • T. Zhang et al. Availability and reliability of k-out-of-(M+N): G warm standby systems. Reliability Engineering and System Safety (2006)
  • R. Zhao et al. Standby redundancy optimization problems with fuzzy lifetimes. Computers and Industrial Engineering (2005)
  • A.A. Alhadeed et al. Optimal simple step-stress plan for cumulative exposure model using log-normal distribution. IEEE Transactions on Reliability (2005)
  • Amari, S. V. & Dill, G. (2010). Redundancy optimization problem with warm-standby redundancy. In Proceedings annual...
  • Amari, S. V., Misra, K. B. & Pham, H. (2008). Tampered failure rate load-sharing systems: Status and perspectives. In...
  • S.V. Amari et al. A new insight into k-out-of-n warm standby model. International Journal of Performability Engineering (2010)
  • S.V. Amari et al. Reliability characteristics of k-out-of-n warm standby systems. IEEE Transactions on Reliability (2012)
  • L.Y. Chia et al. An ant colony optimization algorithm for the redundancy allocation problem (RAP). IEEE Transactions on Reliability (2004)
    This manuscript was processed by Area Editor Min Xie.
