Trivariate Bernoulli distribution with application to software fault tolerance

Annals of Operations Research

Abstract

The widespread reliance on software for mission- and life-critical applications makes the reliability of these systems essential. Techniques such as fault tolerance have been proposed to achieve the highest levels of software reliability. However, the fault tolerance paradigm suffers from the risk of correlated failures, where a majority of the software versions fail on the same input, leading to system failure. This paper derives a trivariate Bernoulli distribution to quantify the negative impact of correlated failures on the reliability of fault-tolerant software composed of highly reliable versions. An experiment based on early empirical research demonstrates the capacity of the distribution to conduct reliability assessment for many combinations of the version reliabilities and correlations. The results indicate that correlated failures detract from system reliability, but that the resulting reliability is often still higher than that of a system consisting of the single most reliable version.

Abbreviations

TVB: Trivariate Bernoulli distribution

NVP: \(N\)-version programming

\(Y\): Sum of uncorrelated Poisson random variables

\(X(\lambda )\): Poisson random variable with rate \(\lambda \)

\(Z_{i}(p_{i})\): Bernoulli random variable with success probability \(p_{i}\,(1\le i\le 3)\)

\({\mathbf{p}}\): Vector of multivariate Bernoulli success probabilities

\(I_{\{0\}}(\cdot )\): Indicator function of zero

\(\rho _{i,j}\): Correlation between Bernoulli variables \(Z_{i}\) and \(Z_j\)

\(\rho _{i,j}^{+}\): Upper bound on correlation between \(Z_{i}\) and \(Z_j\)

\(\varvec{\Sigma }\): Correlation matrix consisting of entries \(\rho _{i,j}\)

\(\alpha _{i,j}\): Poisson rate encoding correlation between \(Z_{i}\) and \(Z_j\)

\(\varvec{\alpha }(k)\): Matrix of Poisson encoding in iteration \(k\)

\(\beta _{i,j}\): Notational simplification \((\beta _{i,j}=\exp (\alpha _{i,j}))\)

\(\beta _{i,j}^{(1)}\): Entry \(\beta _{i,j}\) of matrix encoding in first iteration

\(\varGamma \): Set of systems \(\varGamma =\{\mathtt{series}, \mathtt{parallel}, \mathtt{two}\, \mathtt{out}\, \mathtt{of}\, \mathtt{three}\}\)

\(R_{\gamma }\): \(s\)-Expected reliability of system \(\gamma \in \varGamma \)

References

  • Alger, L., & Lala, J. (1986). A real time operating system for a nuclear power plant computer. In Proceedings of IEEE Real-Time Systems Symposium (pp. 244–248). New Orleans, LA.

  • Avizienis, A. (1985). The \(n\)-version approach to fault-tolerant software. IEEE Transactions on Software Engineering, 11(12), 1491–1501.

  • Avizienis, A., Lyu, M., & Schutz, W. (1988). In search of effective diversity: A six-language study of fault-tolerant flight control software. In Proceedings of international symposium on fault tolerant computing (FTC 88) (pp. 15–22).

  • Eckhardt, D., & Lee, L. (1985). A theoretical basis for the analysis of multiversion software subject to coincident errors. IEEE Transactions on Software Engineering, 11(12), 1511–1517.

  • Eckhardt, D., Caglayan, A., Knight, J., Lee, L., McAllister, D., Vouk, M., et al. (1991). An experimental evaluation of software redundancy as a strategy for improving reliability. IEEE Transactions on Software Engineering, 17(7), 692–702.

  • Feller, W. (1968). An introduction to probability theory and its applications (3rd ed.). New York, NY: Wiley.

  • Fiondella, L. (2010). Reliability and sensitivity analysis of coherent systems with negatively correlated component failures. International Journal of Reliability, Quality and Safety Engineering, 17(5), 505–529.

  • Fisher, R. (1924). On a distribution yielding the error functions of several well known statistics. In Proceedings of the international congress of mathematics, Toronto (Vol. 2, pp. 805–813).

  • Johnson, N., Kotz, S., & Balakrishnan, N. (1997). Discrete multivariate distributions. Series in probability and statistics. New York, NY: Wiley.

  • Knight, J., & Leveson, N. (1986). An experimental evaluation of the assumption of independence in multi-version programming. IEEE Transactions on Software Engineering, 12(1), 96–109.

  • Knuth, D. (1997). Seminumerical algorithms (3rd ed., Vol. 2). Reading, MA: Addison Wesley.

  • Lai, C., & Xie, M. (2006). Stochastic ageing and dependence for reliability. New York, NY: Springer.

  • Littlewood, B. (1996). The impact of diversity upon common mode failures. Reliability Engineering and System Safety, 51(1), 101–113.

  • Littlewood, B., & Miller, D. (1989). Conceptual modeling of coincident failures in multiversion software. IEEE Transactions on Software Engineering, 15(12), 1596–1614.

  • Littlewood, B., Popov, P., & Strigini, L. (2001). Modeling software design diversity: a review. ACM Computing Surveys, 33(2), 177–208.

  • Lyu, M., & He, Y. (1993). Improving the n-version programming process through the evolution of a design paradigm. In IEEE Transactions on Reliability (Vol. 42, pp. 179–189).

  • Musa, J. (1994). Sensitivity of field failure intensity to operational profile errors. In Proceedings of international symposium on software reliability engineering (ISSRE 94) (pp. 334–337).

  • Musa, J., Fuoco, G., Irving, N., Kropfl, D., & Juhlin, B. (1996). The operational profile. In Handbook of software reliability engineering (pp. 167–216). New York, NY: McGraw-Hill.

  • Park, C., Park, T., & Shin, D. (1996). A simple method for generating correlated binary variates. The American Statistician, 50(4), 306–310.

  • Prentice, R. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81(394), 321–327.

  • Scott, R., Gault, J., & McAllister, D. (1987). Fault-tolerant software reliability modeling. IEEE Transactions on Software Engineering, 13(5), 582–592.

  • Shapiro, A. (2005). An ultra reliability project for NASA. In Proceedings of IEEE aerospace conference, Big Sky, MT (pp. 1–12).

  • Singpurwalla, N. (2006). Reliability and risk: A Bayesian perspective. Series in probability and statistics. New York, NY: Wiley.

  • Tang, D., & Iyer, R. (1992). Analysis and modeling of correlated failures in multicomputer systems. IEEE Transactions on Computers, 41(5), 567–577.

  • Teng, X., & Pham, H. (2002). A software-reliability growth model for n-version programming systems. IEEE Transactions on Reliability, 51(3), 311–321.

Author information

Corresponding author

Correspondence to Lance Fiondella.

Appendices

Appendix

Proof of Theorem 1

Without loss of generality, let \( p_{1}> p_{2}> p_{3}\). It follows directly from Eqs. (16) and (21) that \(\alpha _{1,1}<\alpha _{2,2}<\alpha _{3,3}\) and that

$$\begin{aligned} \alpha _{i,j}<\alpha _{i,i} \,\text{ and }\, \alpha _{i,j}<\alpha _{j,j},\quad \forall i\ne j. \end{aligned}$$
(22)

These inequalities reduce the number of cases to the eight given in Table 1. The first six correspond to permutations of \(\alpha _{1,2}, \alpha _{1,3}\), and \(\alpha _{2,3}\) followed by \(\alpha _{1,1}<\alpha _{2,2}<\alpha _{3,3}\). In the other two cases, \(\alpha _{1,1}<\alpha _{2,3}\) and the beginning of the sequence is \(\alpha _{1,2}<\alpha _{1,3}\) or \(\alpha _{1,3}<\alpha _{1,2}\). The proof of Case I, where

$$\begin{aligned} \alpha _{1,2}<\alpha _{1,3}<\alpha _{2,3}<\alpha _{1,1}<\alpha _{2,2}<\alpha _{3,3} \end{aligned}$$
(23)

proceeds as follows.

Proof

Using the algorithm of Park et al. (1996), write the initial \(\alpha _{i,j}\) in matrix form as

$$\begin{aligned} \varvec{\alpha }(0)=\left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{1,1} &{} \alpha _{1,2} &{} \alpha _{1,3} \\ &{} \alpha _{2,2} &{} \alpha _{2,3} \\ &{} &{} \alpha _{3,3} \\ \end{array}\right) \!. \end{aligned}$$
(24)

We know from Eq. (23) that \(\alpha _{1,2}\) is the minimum element, so we subtract it from each entry of Eq. (24), producing

$$\begin{aligned} \varvec{\alpha }(1)=\left( \begin{array}{c@{\quad }c@{\quad }c} (\alpha _{1,1}-\alpha _{1,2}) &{} 0 &{} (\alpha _{1,3}-\alpha _{1,2}) \\ &{} (\alpha _{2,2}-\alpha _{1,2}) &{} (\alpha _{2,3}-\alpha _{1,2}) \\ &{} &{} (\alpha _{3,3}-\alpha _{1,2}) \\ \end{array} \right) \!. \end{aligned}$$
(25)

Subtracting \(\alpha _{1,2}\) preserves the order of the nonzero entries in Eq. (25), so that \((\alpha _{1,3}-\alpha _{1,2})<(\alpha _{2,3} -\alpha _{1,2})<(\alpha _{1,1}-\alpha _{1,2}) <(\alpha _{2,2}-\alpha _{1,2})<(\alpha _{3,3}-\alpha _{1,2})\), which implies that the entry \((\alpha _{1,3}-\alpha _{1,2})\) in position \(\alpha _{1,3}(1)\) is the smallest value remaining. Thus, we subtract \((\alpha _{1,3}-\alpha _{1,2})\) from \(\alpha _{1,1}(1), \alpha _{1,3}(1)\), and \(\alpha _{3,3}(1)\), producing

$$\begin{aligned} \varvec{\alpha }(2)=\left( \begin{array}{c@{\quad }c@{\quad }c} (\alpha _{1,1}-\alpha _{1,3}) &{} 0 &{} 0 \\ &{} (\alpha _{2,2}-\alpha _{1,2}) &{} (\alpha _{2,3}-\alpha _{1,2}) \\ &{} &{} (\alpha _{3,3}-\alpha _{1,3}) \\ \end{array} \right) \!. \end{aligned}$$
(26)

The algorithm (Park et al. 1996) requires that all off-diagonal elements \(\alpha _{i,j}, i< j\), be eliminated before the elements \(\alpha _{i,i}\) on the diagonal in order to encode the correlations successfully. Because \((\alpha _{2,3}-\alpha _{1,2})<(\alpha _{2,2}-\alpha _{1,2})\) follows from the initial ordering, the only remaining requirement is that \((\alpha _{2,3}-\alpha _{1,2})<(\alpha _{3,3}-\alpha _{1,3})\), which is equivalent to

$$\begin{aligned}&(\alpha _{2,3}+\alpha _{1,3}-\alpha _{1,2})<\alpha _{3,3}\nonumber \\ \Leftrightarrow \quad&\exp (\alpha _{2,3}+\alpha _{1,3}-\alpha _{1,2})<\exp (\alpha _{3,3})\nonumber \\ \Leftrightarrow \quad&\frac{\beta _{2,3}\beta _{1,3}}{\beta _{1,2}}<\frac{1}{ p_{3}}\nonumber \\ \Leftrightarrow \quad&p_{3}<\frac{\beta _{1,2}}{\beta _{1,3}\beta _{2,3}}=\frac{\beta _{1,2}^{2}}{\prod _{i<j}\beta _{i,j}}. \end{aligned}$$
(27)

Repeating this derivation for each of the eight cases given in Table 1 reveals that the term in the numerator on the right-hand side of Eq. (27) is the smallest \(\beta _{i,j}\) of the initial sequence, \(\beta _{i,j}^{(1)}\). Note that the denominator of Eq. (27) has been rewritten as a product to avoid enumerating the three specific forms of the denominator that occur when the numerator is \(\beta _{1,2}, \beta _{1,3}\), or \(\beta _{2,3}\). It also follows that, for each of the eight cases, the subscript of \(p\) in Eq. (27) is equal to the third variable index \(k\) not appearing in \(\beta _{i,j}^{(1)}\). These two generalizations lead to the bound provided in Eq. (17).
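The generalized bound can be checked numerically. The following is a minimal Python sketch, assuming the bound takes the form \(p_{k}<(\beta _{i,j}^{(1)})^{2}/\prod _{i<j}\beta _{i,j}\) as described in the paragraph above; the function name `satisfies_bound`, the dictionary layout, and the numerical values are hypothetical choices made for illustration only.

```python
import math

def satisfies_bound(p, beta):
    """Check p_k < (smallest beta)^2 / (product of all three betas).

    p    : dict {1: p1, 2: p2, 3: p3} of success probabilities
    beta : dict {(1, 2): b12, (1, 3): b13, (2, 3): b23}
    Returns (ok, k, bound), where k is the index absent from the smallest beta.
    """
    (i, j), beta_min = min(beta.items(), key=lambda item: item[1])
    k = ({1, 2, 3} - {i, j}).pop()          # third index not in beta^(1)
    bound = beta_min ** 2 / math.prod(beta.values())
    return p[k] < bound, k, bound

# Illustrative (assumed) values that follow the Case I ordering of Eq. (23).
p_ok = {1: 0.9, 2: 0.8, 3: 0.7}
beta_ok = {(1, 2): math.exp(0.01), (1, 3): math.exp(0.02), (2, 3): math.exp(0.03)}
ok, k, bound = satisfies_bound(p_ok, beta_ok)

# Illustrative (assumed) values for which the bound is violated.
p_bad = {1: 0.9, 2: 0.85, 3: 0.8}
beta_bad = {(1, 2): math.exp(0.001), (1, 3): math.exp(0.10), (2, 3): math.exp(0.15)}
bad, k_bad, bound_bad = satisfies_bound(p_bad, beta_bad)
```

Working with \(\beta _{i,j}\) rather than \(\alpha _{i,j}\) keeps the check in the same form as the last line of Eq. (27).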

To complete the proof, assume that Eq. (27) is satisfied and subtract \((\alpha _{2,3}-\alpha _{1,2})\) from \(\alpha _{2,2}(2), \alpha _{2,3}(2)\), and \(\alpha _{3,3}(2)\), producing

$$\begin{aligned} \varvec{\alpha }(3)=\left( \begin{array}{c@{\quad }c@{\quad }c} (\alpha _{1,1}-\alpha _{1,3}) &{} 0 &{} 0 \\ &{} (\alpha _{2,2}-\alpha _{2,3}) &{} 0 \\ &{} &{} (\alpha _{3,3}+\alpha _{1,2}-\alpha _{2,3}-\alpha _{1,3}) \\ \end{array} \right) \!. \end{aligned}$$
(28)

Now the on-diagonal elements can be eliminated in isolation, so do this in the order \(\alpha _{1,1}(3), \alpha _{2,2}(3), \alpha _{3,3}(3)\).
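The elimination steps of Eqs. (24)–(28) can be traced in code. The sketch below is one reading of the procedure as it is used in this proof, not a reproduction of the implementation of Park et al. (1996): it repeatedly takes the smallest positive entry, grows the index set greedily while all entries among its members remain positive, and subtracts. The function name `park_decompose`, the tolerance `tol`, the 0-based indices, and the numerical values are assumptions made for illustration.

```python
import math

def park_decompose(alpha, tol=1e-12):
    """Decompose a symmetric alpha matrix into (rate, members) steps.

    Each step is the rate of one independent Poisson variable together with
    the 0-based indices of the Bernoulli variables that share it.
    Raises ValueError if a diagonal entry is exhausted while an off-diagonal
    entry in its row is still positive.
    """
    n = len(alpha)
    a = [list(row) for row in alpha]        # work on a copy
    steps = []
    while True:
        positive = [(a[i][j], i, j)
                    for i in range(n) for j in range(i, n) if a[i][j] > tol]
        if not positive:
            return steps
        rate, r, s = min(positive)          # smallest remaining entry
        if a[r][r] <= tol or a[s][s] <= tol:
            raise ValueError("diagonal eliminated before off-diagonal entry")
        members = {r, s}
        for k in range(n):                  # grow the set while entries stay positive
            if k not in members and all(a[k][i] > tol for i in members | {k}):
                members.add(k)
        for i in members:
            for j in members:
                a[i][j] -= rate
        steps.append((rate, sorted(members)))

# Assumed values following the Case I ordering; alpha_ii = -ln(p_i).
p = (0.9, 0.8, 0.7)
a12, a13, a23 = 0.01, 0.02, 0.03
alpha0 = [[-math.log(p[0]), a12, a13],
          [a12, -math.log(p[1]), a23],
          [a13, a23, -math.log(p[2])]]
steps = park_decompose(alpha0)
```

For these values the six steps follow the order derived above: one rate shared by all three variables, two pairwise rates, and then the three diagonal remainders.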

Table 3 shows the symbolic value subtracted in each iteration. These values are the rate parameters of independent Poisson random variables \(Y_{l}, 1\le l\le 6\). The sets indicate the subset of correlated Bernoulli variables to which these independent Poisson variables belong.

Thus, the expressions for the success of the three correlated Bernoulli variables are

$$\begin{aligned} Z_{1}&= I_{\{0\}}\left( Y_{1}+Y_{2}+Y_{4}\right) \\ Z_{2}&= I_{\{0\}}\left( Y_{1}+Y_{3}+Y_{5}\right) \\ Z_{3}&= I_{\{0\}}\left( Y_{1}+Y_{2}+Y_{3}+Y_{6}\right) \end{aligned}$$

Here, the indicator function \(I_{\{0\}}(\cdot )=1\) if the sum of the outcomes of the \(Y_{l}\) in \(Z_{i}\) equals zero. Hence \(Z_{1}=1\) if and only if \(Y_{1}=Y_{2}=Y_{4}=0\), \(Z_{2}=1\) if and only if \(Y_{1}=Y_{3}=Y_{5}=0\), and \(Z_{3}=1\) if and only if \(Y_{1}=Y_{2}=Y_{3}=Y_{6}=0\). Table 4 shows the probability that variable \(Y_{l}=0\), which is defined as \(m_{l}:=\Pr \{Y_{l}=0\}=\exp (-\lambda _{l})\).
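Tables 3 and 4 survive in this text only as captions. Under the Case I ordering of Eq. (23), the entries implied by Eqs. (24)–(28), by \(\beta _{i,j}=\exp (\alpha _{i,j})\), and by \(\exp (-\alpha _{i,i})=p_{i}\) can be reconstructed as follows. This is an inference from the derivation rather than a copy of the published tables, and the symbol \(S_{l}\) for the index set of \(Y_{l}\) is introduced here for illustration.

$$\begin{aligned} \begin{array}{c@{\quad }l@{\quad }l@{\quad }l} l &{} \lambda _{l} &{} S_{l} &{} m_{l}=\exp (-\lambda _{l}) \\ 1 &{} \alpha _{1,2} &{} \{1,2,3\} &{} 1/\beta _{1,2} \\ 2 &{} \alpha _{1,3}-\alpha _{1,2} &{} \{1,3\} &{} \beta _{1,2}/\beta _{1,3} \\ 3 &{} \alpha _{2,3}-\alpha _{1,2} &{} \{2,3\} &{} \beta _{1,2}/\beta _{2,3} \\ 4 &{} \alpha _{1,1}-\alpha _{1,3} &{} \{1\} &{} p_{1}\beta _{1,3} \\ 5 &{} \alpha _{2,2}-\alpha _{2,3} &{} \{2\} &{} p_{2}\beta _{2,3} \\ 6 &{} \alpha _{3,3}+\alpha _{1,2}-\alpha _{2,3}-\alpha _{1,3} &{} \{3\} &{} p_{3}\beta _{1,3}\beta _{2,3}/\beta _{1,2} \\ \end{array} \end{aligned}$$

Each \(Z_{i}\) is then the indicator that every \(Y_{l}\) with \(i\in S_{l}\) equals zero, which agrees with the three expressions for \(Z_{1}\), \(Z_{2}\), and \(Z_{3}\) given above.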

The only outcome that contributes to the correlated outcome \(E[Z_{1}Z_{2}Z_{3}]\), where all three experiments are successful, is the uncorrelated outcome where all six independent Poisson variables are zero. Thus, the probability of this outcome is

$$\begin{aligned} E[Z_{1}Z_{2}Z_{3}]&= \textstyle {\prod \limits _{i=1}^{6}}m_{i}\nonumber \\&= \frac{1}{\beta _{1,2}}\times \frac{\beta _{1,2}}{\beta _{1,3}}\times \frac{\beta _{1,2}}{\beta _{2,3}}\nonumber \\&\quad \times \, p_{1}\beta _{1,3}\times p_{2}\beta _{2,3}\times \frac{ p_{3}\beta _{1,3}\beta _{2,3}}{\beta _{1,2}}\nonumber \\&= p_{1} p_{2} p_{3}\beta _{1,3}\beta _{2,3}\nonumber \\&= \frac{\prod _{i=1}^{3} p_{i}\prod _{i<j}\beta _{i,j}}{\beta _{1,2}}. \end{aligned}$$
(29)
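Eq. (29) can be verified by brute force. The sketch below enumerates the \(2^{6}\) zero/nonzero patterns of the independent variables \(Y_{l}\), using the six factors \(m_{l}\) that appear in the second and third lines of Eq. (29) and the membership implied by the expressions for \(Z_{1}\), \(Z_{2}\), and \(Z_{3}\). It applies to Case I only; the function name `joint_pmf_case_one` and the numerical values are hypothetical.

```python
import itertools
import math

def joint_pmf_case_one(p, beta):
    """Exact joint pmf of (Z1, Z2, Z3) under the Case I construction.

    p    : (p1, p2, p3)
    beta : (b12, b13, b23), with b12 the smallest
    """
    p1, p2, p3 = p
    b12, b13, b23 = beta
    # m[l] = Pr{Y_l = 0}, the six factors of Eq. (29)
    m = [1 / b12, b12 / b13, b12 / b23,
         p1 * b13, p2 * b23, p3 * b13 * b23 / b12]
    # 0-based indices of the Y_l that must be zero for Z_i = 1
    needs = [(0, 1, 3), (0, 2, 4), (0, 1, 2, 5)]
    pmf = {z: 0.0 for z in itertools.product((0, 1), repeat=3)}
    for zero in itertools.product((True, False), repeat=6):
        prob = math.prod(m[l] if zero[l] else 1 - m[l] for l in range(6))
        z = tuple(int(all(zero[l] for l in idx)) for idx in needs)
        pmf[z] += prob
    return pmf

# Assumed values following the Case I ordering of Eq. (23).
p = (0.9, 0.8, 0.7)
beta = (math.exp(0.01), math.exp(0.02), math.exp(0.03))
pmf = joint_pmf_case_one(p, beta)
all_succeed = pmf[(1, 1, 1)]   # compare with p1*p2*p3*b13*b23 from Eq. (29)
```

Only the indicator \(Y_{l}=0\) matters for each \(Z_{i}\), so the Poisson counts themselves never need to be sampled.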

The remaining seven outcomes follow from Eqs. (29) and (12) by taking expectations. For example,

$$\begin{aligned} E[Z_{1}Z_{2}\overline{Z}_{3}]&= E[Z_{1}Z_{2}-Z_{1}Z_{2}Z_{3}]\nonumber \\&= p_{1} p_{2}\beta _{1,2}-\frac{\prod _{i=1}^{3} p_{i}\prod _{i<j}\beta _{i,j}}{\beta _{1,2}}, \end{aligned}$$
(30)

and the cases \(E[Z_{1}\overline{Z}_{2}Z_{3}]\) and \(E[\overline{Z}_{1}Z_{2}Z_{3}]\) are symmetric to Eq. (30). Similarly,

$$\begin{aligned} E[\overline{Z}_{1}\overline{Z}_{2}Z_{3}]&= E[Z_{3}]-E[Z_{1}Z_{3}]-E[Z_{2}Z_{3}]+E[Z_{1}Z_{2}Z_{3}]\nonumber \\&= p_{3}-p_{1}p_{3}\beta _{1,3}-p_{2}p_{3}\beta _{2,3}\nonumber \\&\quad +\,\frac{\prod _{i=1}^{3} p_{i}\prod _{i<j}\beta _{i,j}}{\beta _{1,2}}, \end{aligned}$$
(31)

and the cases \(E[Z_{1}\overline{Z}_{2}\overline{Z}_{3}]\) and \(E[\overline{Z}_{1}Z_{2}\overline{Z}_{3}]\) are symmetric to Eq. (31). The final outcome where all three experiments result in failure is

$$\begin{aligned} E[\overline{Z}_{1}\overline{Z}_{2}\overline{Z}_{3}]&= 1-\sum _{i=1}^{3}E[Z_{i}]+\sum _{i<j}E[Z_{i}Z_{j}]-E[Z_{1}Z_{2}Z_{3}]\nonumber \\&= 1-\sum _{i=1}^{3} p_{i}+\sum _{i<j} p_{i} p_{j}\beta _{i,j}-\frac{\prod _{i=1}^{3} p_{i}\prod _{i<j}\beta _{i,j}}{\beta _{1,2}}. \end{aligned}$$
(32)

Repeating this derivation for Cases II–VIII, given in Table 1, shows that, in general, \(\beta ^{(1)}_{i,j}\) takes the place of \(\beta _{1,2}\) as the denominator of the term \(\frac{\prod _{i=1}^{3} p_{i}\prod _{i<j}\beta _{i,j}}{\beta _{1,2}}\) in Eqs. (29)–(32). This completes the proof. \(\square \)
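The eight probabilities of Eqs. (29)–(32) can be assembled into a reliability calculation. The sketch below is illustrative rather than the authors' code: it uses the smallest \(\beta _{i,j}\) as the denominator, as stated above, and assumes the usual structure functions for the three systems in \(\varGamma \) (series: all three versions succeed; parallel: at least one succeeds; two out of three: at least two succeed). The function names and numerical values are hypothetical.

```python
import math

def tvb_pmf(p, beta):
    """Joint pmf of (Z1, Z2, Z3) from Eqs. (29)-(32).

    p    : dict {1: p1, 2: p2, 3: p3}
    beta : dict {(1, 2): b12, (1, 3): b13, (2, 3): b23}
    """
    e = {pair: p[pair[0]] * p[pair[1]] * b for pair, b in beta.items()}  # E[Zi Zj]
    e123 = (math.prod(p.values()) * math.prod(beta.values())
            / min(beta.values()))                                        # Eq. (29)
    pmf = {(1, 1, 1): e123,
           (1, 1, 0): e[(1, 2)] - e123,                                  # Eq. (30)
           (1, 0, 1): e[(1, 3)] - e123,
           (0, 1, 1): e[(2, 3)] - e123,
           (0, 0, 1): p[3] - e[(1, 3)] - e[(2, 3)] + e123,               # Eq. (31)
           (0, 1, 0): p[2] - e[(1, 2)] - e[(2, 3)] + e123,
           (1, 0, 0): p[1] - e[(1, 2)] - e[(1, 3)] + e123}
    pmf[(0, 0, 0)] = 1 - sum(p.values()) + sum(e.values()) - e123        # Eq. (32)
    return pmf

def system_reliability(pmf):
    """Reliability of the three systems under the assumed structure functions."""
    return {"series": pmf[(1, 1, 1)],
            "parallel": 1 - pmf[(0, 0, 0)],
            "two out of three": sum(v for z, v in pmf.items() if sum(z) >= 2)}

# Assumed values following the Case I ordering of Eq. (23).
p = {1: 0.9, 2: 0.8, 3: 0.7}
beta = {(1, 2): math.exp(0.01), (1, 3): math.exp(0.02), (2, 3): math.exp(0.03)}
pmf = tvb_pmf(p, beta)
rel = system_reliability(pmf)
```

A negative entry in `pmf` would indicate that the chosen \(p_{i}\) and \(\beta _{i,j}\) do not define a valid distribution, so it is worth checking the entries before using the reliabilities.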

Table 3 Minimum and sets
Table 4 Poisson probabilities

Cite this article

Fiondella, L., Zeephongsekul, P. Trivariate Bernoulli distribution with application to software fault tolerance. Ann Oper Res 244, 241–255 (2016). https://doi.org/10.1007/s10479-015-1798-4

  • DOI: https://doi.org/10.1007/s10479-015-1798-4
