A new extended Birnbaum–Saunders regression model for lifetime modeling

doi:10.1016/j.csda.2013.02.025

Computational Statistics & Data Analysis

Volume 64, August 2013, Pages 34-50

https://doi.org/10.1016/j.csda.2013.02.025

Abstract

A new class of extended Birnbaum–Saunders regression models is introduced. It can be applied to censored data and be used more effectively in survival analysis and fatigue life studies. Maximum likelihood estimation of the model parameters with censored data as well as influence diagnostics for the new regression model are investigated. The normal curvatures for studying local influence are derived under various perturbation schemes and a martingale-type residual is considered to assess departures from the extended Birnbaum–Saunders error assumption as well as to detect outlying observations. Further, a test of homogeneity of the shape parameters of the new regression model is proposed. Two real data sets are analyzed for illustrative purposes.

Introduction

The two-parameter Birnbaum–Saunders (BS) distribution, also known as the fatigue life distribution, was introduced by Birnbaum and Saunders (1969) and has received considerable attention in recent years. It was originally derived from a model for a physical fatigue process where dominant crack growth causes failure. It was later derived by Desmond (1985) using a biological model which followed from relaxing some of the assumptions originally made by Birnbaum and Saunders (1969). The relationship between the BS distribution and the inverse Gaussian distribution was investigated by Desmond (1986) who demonstrated that the BS distribution is an equal-weight mixture of an inverse Gaussian distribution and its complementary reciprocal. For book treatments of inverse Gaussian and BS distributions and their relationships, see Marshall and Olkin (2007, Chapter 13) and especially Saunders (2007, Chapter 10). More recently, Jones (2012) also discussed the relationship between the BS and the inverse Gaussian distributions.

The cumulative distribution function of a random variable $T$ with BS distribution, say $T \sim BS(α, η)$, is $G(t) = Φ(v)$, with $t > 0$, where $Φ(\cdot)$ is the standard normal cumulative distribution function, $v = v(t) = ρ(t/η)/α$, $ρ(z) = z^{1/2} - z^{-1/2}$, and $α > 0$ and $η > 0$ are the shape and scale parameters, respectively. The shape of the hazard function of the BS distribution is discussed in Kundu et al. (2008). The authors showed that the hazard function is not monotone and is unimodal for all ranges of the parameter values. Some interesting results on improved statistical inference as well as interval estimation for the BS distribution may be found in Wu and Wong (2004), Lemonte et al. (2007, 2008) and Wang (2012). The BS distribution has been applied in a wide variety of fields. For applications of the BS distribution, see, for example, Balakrishnan et al. (2007) in reliability and Leiva et al. (2008, 2009) in other fields. It is worthwhile to mention that there has been a great deal of progress recently in developing statistical methodology for the BS model and its generalizations. Notable contributions include those of Professor Narayanaswamy Balakrishnan (http://www.math.mcmaster.ca/bala/bala.html) and co-workers, and Professor Victor Leiva (http://staff.deuv.cl/leiva/) and co-workers.
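As an illustration only, the BS cumulative distribution function defined above can be evaluated numerically as follows. This is a minimal sketch and not code from the paper; the function names `rho` and `bs_cdf` are hypothetical. SciPy's `fatiguelife` distribution uses the same parameterization with `c` equal to $α$ and `scale` equal to $η$, so it serves as a cross-check.

```python
import numpy as np
from scipy.stats import fatiguelife, norm


def rho(z):
    """rho(z) = z^{1/2} - z^{-1/2}, defined for z > 0."""
    z = np.asarray(z, dtype=float)
    return np.sqrt(z) - 1.0 / np.sqrt(z)


def bs_cdf(t, alpha, eta):
    """BS cumulative distribution function G(t) = Phi(rho(t / eta) / alpha), t > 0."""
    v = rho(np.asarray(t, dtype=float) / eta) / alpha
    return norm.cdf(v)


# Since rho(1) = 0, G(eta) = Phi(0), so the scale parameter eta is the median.
print(bs_cdf(2.0, alpha=0.5, eta=2.0))

# Cross-check against SciPy's fatigue-life distribution at an arbitrary point.
print(bs_cdf(1.3, alpha=0.7, eta=2.0), fatiguelife.cdf(1.3, 0.7, scale=2.0))
```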

On the basis of the scheme proposed by Marshall and Olkin (1997), Lemonte (2013) introduced a quite flexible distribution which can be used to model failure times for materials subject to fatigue and lifetime data. The author called the new distribution the Marshall–Olkin extended Birnbaum–Saunders (MOEBS) distribution. Hereafter, the random variable $T$ is said to have a MOEBS distribution with shape parameters $α > 0$ and $λ > 0$, and scale parameter $η > 0$, say $T \sim MOEBS(λ, α, η)$, if its cumulative function is given by $G(t) = \frac{Φ(v)}{1 - \bar{λ} Φ(-v)}, \quad t > 0, \qquad (1)$ where $\bar{λ} = 1 - λ$. The survival function is $S(t) = λ Φ(-v) / [1 - \bar{λ} Φ(-v)]$, whereas the probability density function corresponding to (1) takes the form $g(t) = λ κ(α, η) t^{-3/2} (t + η) [1 - \bar{λ} Φ(-v)]^{-2} \exp\{-τ(t/η) / (2 α^{2})\}$, where $κ(α, η) = \exp(α^{-2}) / (2 α \sqrt{2 π η})$ and $τ(z) = z + z^{-1}$. It can be shown that if $T \sim MOEBS(λ, α, η)$, then $k T \sim MOEBS(λ, α, k η)$ for $k > 0$; that is, the class of MOEBS distributions is closed under scale transformations. The two-parameter BS distribution arises from (1) when $λ = 1$, that is, $BS(α, η) = MOEBS(1, α, η)$.

Rieck and Nedelman (1991) proposed a log-linear regression model based on the BS distribution. They showed that if $T \sim BS(α, η)$, then $Y = \log(T)$ is sinh-normal (SN) distributed with shape, location and scale parameters given by $α$, $μ = \log(η)$ and $σ = 2$, respectively; that is, the log-BS (LBS) distribution is a special case of the SN distribution introduced by them and, in this case, the notation $Y \sim LBS(α, μ)$ is used. The SN distribution is symmetric, can have kurtosis greater or smaller than that of the normal model, and can also be bimodal. Their regression model has received significant attention from many researchers over the last few years. For some recent references about the BS regression model, the reader is referred to Desmond et al. (2008), Xiao et al. (2010), Lemonte et al. (2010), Lemonte (2011), Lemonte and Ferrari (2011a, 2011b, 2011c), Qu and Xie (2011) and Li et al. (2012), among others.
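The link between the BS and SN distributions follows in a few lines from the BS cumulative function $G(t) = Φ(ρ(t/η)/α)$ given earlier. The short derivation below is a sketch added for clarity; it uses only the definitions in the text together with the standard SN cumulative function $Φ\{(2/α) \sinh((y - μ)/σ)\}$.

```latex
\begin{align*}
\Pr(Y \le y) &= \Pr(T \le e^{y}) = \Phi\!\left(\frac{\rho(e^{y}/\eta)}{\alpha}\right), \\
\rho(e^{y}/\eta) &= e^{(y-\mu)/2} - e^{-(y-\mu)/2} = 2\sinh\!\left(\frac{y-\mu}{2}\right), \qquad \mu = \log(\eta), \\
\Pr(Y \le y) &= \Phi\!\left(\frac{2}{\alpha}\,\sinh\!\left(\frac{y-\mu}{\sigma}\right)\right), \qquad \sigma = 2 .
\end{align*}
```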

Some generalizations of the log-linear BS regression model have been proposed in the statistical literature. For example, some efforts can be found in the works by Barros et al. (2008), Lemonte and Cordeiro (2009), Santana et al. (2011), Lemonte (2012), Desmond et al. (2012) and Villegas et al. (2011). Barros et al. (2008) introduced the generalized BS regression model based on the BS- $t_{ν}$ distribution (that is, based on the BS Student- $t$ model with $ν$ degrees of freedom), Lemonte and Cordeiro (2009) proposed a non-linear BS regression model, Santana et al. (2011) and Lemonte (2012) introduced the skewed BS regression model, whereas Villegas et al. (2011) and Desmond et al. (2012) studied a mixed log-linear model based on the BS distribution.

In this paper, in addition to the existing generalizations of the BS regression model, we propose the extended BS regression model based on the MOEBS distribution; that is, we introduce a new class of lifetime regression models in which the errors follow the log-MOEBS distribution. The main motivation for introducing this new class is to give practitioners a new BS regression model for practical applications. Moreover, the formulas related to the new regression model are manageable and, with modern computing resources and their numerical capabilities, the proposed model may prove to be a useful addition to the arsenal of applied statisticians. Additionally, the new model is quite flexible and can be widely applied in analyzing lifetime data. Further, we provide two applications to real data sets which show that the new regression model yields a better fit than the usual BS regression model. Furthermore, the new extended BS regression model can be used for modeling censored data as well as data without censoring. Censored data are very common in lifetime studies because of time limits and other restrictions on data collection. In an engineering life test experiment, for example, it is usually not feasible to continue experimentation until all items under study have failed. In a survival study, patients may be lost to follow-up, and data analysis is usually done before all patients have reached the event of interest. The partial information contained in a censored observation is just a lower bound on the lifetime. Reliability studies usually finish before all units have failed, even when accelerated tests are used. This is a special source of difficulty in the analysis of reliability data. Such data are said to be right-censored, and they arise when some units are still running at the time of the data analysis, are removed from test before they fail, or fail from an extraneous cause. We refer the reader to Gijbels (2010) for a recent overview on censored data.

It is nowadays widespread practice, after modeling, to check the model assumptions and conduct diagnostic studies in order to detect possible influential observations that may distort the results of the analysis. Diagnostic analysis is an efficient way to detect influential observations. The first technique developed to assess the individual impact of cases on the estimation process is, perhaps, case deletion, which became a very popular tool. Cook (1977) presents a thorough development of case deletion diagnostics for a general statistical model. Case deletion is an example of global influence analysis; that is, the effect of an observation is assessed by completely removing it. However, case deletion excludes all information from an observation, so we can hardly say whether this observation has some influence on a specific aspect of the model. To overcome this problem, one can resort to the local influence approach, in which one investigates the model sensitivity under small perturbations. In this context, Cook (1986) proposed a general framework to detect influential observations which gives a measure of this sensitivity under small perturbations of the data or of the model. Many applications of the local influence method may be found in the statistical literature for various models and under different perturbation schemes; see, for instance, Espinheira et al. (2008), Vasconcellos and Fernandez (2009), Patriota et al. (2010), Lemonte and Patriota (2011), Zevallos et al. (2012) and Matos et al. (2013), among others. In this paper, we also propose a similar methodology to detect influential subjects in the new extended BS regression model. In particular, we obtain explicit formulas for Cook’s (1986) normal curvature measure under three perturbation schemes.

The paper unfolds as follows. The log-MOEBS distribution is proposed in Section 2. In Section 3, we introduce the extended BS regression model and discuss estimation of the model parameters. Specifically, we compute the maximum likelihood estimating equations by assuming random censoring. In Section 4, the normal curvatures of local influence are derived under various perturbation schemes and a kind of deviance residual is proposed to assess departures from the underlying log-MOEBS distribution as well as to detect outlying observations. In Section 5, we propose a likelihood ratio statistic for testing the homogeneity of the shape parameters. Two real data illustrations are considered in Section 6. The paper ends with some concluding remarks in Section 7.

Section snippets

The log-MOEBS distribution

Let $T$ be a random variable having the MOEBS cumulative function (1). The random variable $Y = \log(T)$ has a log-MOEBS (LMOEBS) distribution. After some algebra, the survival function, the cumulative function and the density function of $Y$, parameterized in terms of $μ = \log(η)$, can be expressed, respectively, as $S(y) = \frac{λ Φ(-ξ_{2})}{1 - \bar{λ} Φ(-ξ_{2})}, \quad F(y) = \frac{Φ(ξ_{2})}{1 - \bar{λ} Φ(-ξ_{2})}, \quad y \in \mathbb{R},$ and $f(y) = \frac{λ ξ_{1} ϕ(ξ_{2})}{2 [1 - \bar{λ} Φ(-ξ_{2})]^{2}}, \quad y \in \mathbb{R},$ where $ϕ(\cdot)$ is the standard normal density function, $ξ_{1} = \frac{2}{α} \cosh\left(\frac{y - μ}{2}\right)$ and $ξ_{2} = \frac{2}{α} \sinh\left(\frac{y - μ}{2}\right)$. Evidently, the density

The model and estimation

The extended BS regression model (that is, the LMOEBS regression model) is defined in the form $y_{i} = x_{i}^{⊤} β + ε_{i}, \quad i = 1, \dots, n, \qquad (4)$ where $y_{i}$ is the observed log-lifetime or log-censoring time for the $i$th individual, $x_{i} = (x_{i1}, \dots, x_{ip})^{⊤}$ is a vector of known explanatory variables associated with $y_{i}$, $β = (β_{1}, \dots, β_{p})^{⊤}$ is a $p$-vector (where $p < n$ is fixed) of unknown regression parameters to be estimated, and $ε_{i} \sim LMOEBS(λ, α, 0)$. It is also assumed that the random errors $ε_{i}$ are independent and identically distributed.
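To make the estimation problem concrete, the sketch below fits the model to simulated right-censored data. It is not the author's Ox code. It assumes the standard likelihood for non-informative random censoring, in which an observed log-lifetime contributes the LMOEBS density and a censored one contributes the LMOEBS survival function. It also uses SciPy's BFGS with numerical gradients rather than analytical derivatives. All function names, the log-parameterization of $λ$ and $α$, the starting values and the simulation settings are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def _xi(e, alpha):
    """xi_1 and xi_2 evaluated at the residual e = y - mu."""
    return (2.0 / alpha) * np.cosh(e / 2.0), (2.0 / alpha) * np.sinh(e / 2.0)


def lmoebs_logpdf(e, lam, alpha):
    """Log of f(y) = lam xi_1 phi(xi_2) / (2 [1 - (1 - lam) Phi(-xi_2)]^2)."""
    xi1, xi2 = _xi(e, alpha)
    denom = 1.0 - (1.0 - lam) * norm.cdf(-xi2)
    return np.log(lam) + np.log(xi1 / 2.0) + norm.logpdf(xi2) - 2.0 * np.log(denom)


def lmoebs_logsf(e, lam, alpha):
    """Log of S(y) = lam Phi(-xi_2) / [1 - (1 - lam) Phi(-xi_2)]."""
    _, xi2 = _xi(e, alpha)
    denom = 1.0 - (1.0 - lam) * norm.cdf(-xi2)
    return np.log(lam) + norm.logcdf(-xi2) - np.log(denom)


def lmoebs_rvs(size, lam, alpha, rng):
    """Inverse-CDF sampling: solve F(e) = u for Phi(xi_2), then invert sinh."""
    u = rng.uniform(size=size)
    p = lam * u / (1.0 - (1.0 - lam) * u)
    return 2.0 * np.arcsinh(alpha * norm.ppf(p) / 2.0)


def neg_loglik(theta, y, X, delta):
    """Negative censored log-likelihood; theta = (beta, log lam, log alpha)."""
    p = X.shape[1]
    beta, lam, alpha = theta[:p], np.exp(theta[p]), np.exp(theta[p + 1])
    e = y - X @ beta
    # delta = 1 marks an observed failure, delta = 0 a right-censored time.
    ll = np.where(delta == 1, lmoebs_logpdf(e, lam, alpha), lmoebs_logsf(e, lam, alpha))
    return -np.sum(ll)


# Simulated data (illustrative settings only).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
log_t = X @ np.array([1.0, 0.5]) + lmoebs_rvs(n, lam=2.0, alpha=0.8, rng=rng)
log_c = rng.normal(loc=2.5, scale=1.0, size=n)   # random log-censoring times
y = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(int)

# Least-squares start for beta; lam = alpha = 1 on the log scale.
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
theta0 = np.concatenate([beta0, [0.0, 0.0]])
res = minimize(neg_loglik, theta0, args=(y, X, delta), method="BFGS")

print("beta:", res.x[:2], "lambda:", np.exp(res.x[2]), "alpha:", np.exp(res.x[3]))
```

Optimizing over $\log λ$ and $\log α$ keeps both shape parameters positive without constrained optimization; estimates on the original scale are recovered by exponentiating.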

Diagnostic analysis

Since regression models are sensitive to the underlying model assumptions, performing a sensitivity analysis is generally strongly advisable. In order to assess the sensitivity of the maximum likelihood estimates of the parameters of the regression model (4), the local influence method is carried out under three perturbation schemes. In order to assess departures from the underlying LMOEBS distribution as well as to detect outlying observations, a kind of deviance residual will be considered.
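For orientation, the normal curvature in the local influence framework of Cook (1986) has the general form below. This is the standard expression, not the model-specific formulas derived in the paper; the symbols $ω$ (perturbation vector), $ω_{0}$ (no-perturbation point), $Δ$, $\ddot{L}$ and the unit-length direction $d$ are notation assumed here, with $θ$ collecting all model parameters.

```latex
\[
C_{d}(\theta) = 2\,\bigl| d^{\top} \Delta^{\top} \ddot{L}^{-1} \Delta\, d \bigr|,
\qquad
\Delta = \left.\frac{\partial^{2} \ell(\theta \mid \omega)}{\partial \theta\, \partial \omega^{\top}}\right|_{\theta = \hat{\theta},\, \omega = \omega_{0}},
\qquad
\ddot{L} = \left.\frac{\partial^{2} \ell(\theta)}{\partial \theta\, \partial \theta^{\top}}\right|_{\theta = \hat{\theta}} .
\]
```

The direction of largest curvature is the eigenvector associated with the largest absolute eigenvalue of $Δ^{⊤} \ddot{L}^{-1} Δ$; an index plot of its components is the usual way to flag observations that are jointly influential under a given perturbation scheme.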

Testing the homogeneity of the shape parameters

In the extended BS regression model introduced in Section 3, the homogeneity of the shape parameters $λ$ and $α$ is a standard assumption. This assumption, however, is not necessarily appropriate, because the actual shape parameters of the response variable $y_{i}$ may be related to the $i$th observation. In this case, inference would be much more difficult to deal with. Hence, this assumption usually needs to be checked. In this section, we consider a likelihood ratio (LR) test statistic to verify the homogeneity of the

Real data illustrations

In this section, we use two real data sets to show the flexibility and applicability of the extended BS regression model in practice. We will consider real data with and without censoring. All the computations presented in this section were done using the Ox matrix programming language (Doornik, 2009), which is freely distributed for academic purposes and available at http://www.doornik.com. The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method with analytical derivatives through the subroutine

Concluding remarks

The BS distribution has many attractive properties and has found several applications in the literature including lifetime, survival and environmental data analysis. It has received significant attention over the last few years and some generalizations and extensions of this distribution have been proposed by many researchers. Based on the BS distribution, Rieck and Nedelman (1991) introduced the BS regression model, which has been studied by several authors. Their regression model is becoming

Acknowledgments

The author thanks the associate editor and three anonymous referees for useful suggestions and comments that aided in improving the first version of the manuscript. The author gratefully acknowledges grants from FAPESP (Brazil).

References (56)

P.S. Chan et al., Point and interval estimation for extreme-value regression model under Type-II censoring, Computational Statistics and Data Analysis (2008)
A.F. Desmond et al., A mixed effects log-linear model based on the Birnbaum–Saunders distribution, Computational Statistics and Data Analysis (2012)
P.L. Espinheira et al., Influence diagnostics in beta regression, Computational Statistics and Data Analysis (2008)
M.C. Jones, Relationships between distributions with certain symmetries, Statistics and Probability Letters (2012)
D. Kundu et al., On the hazard function of Birnbaum–Saunders distribution and associated inference, Computational Statistics and Data Analysis (2008)
V. Leiva et al., Influence diagnostics in log-Birnbaum–Saunders regression models with censored data, Computational Statistics and Data Analysis (2007)
A.J. Lemonte et al., Birnbaum–Saunders nonlinear regression models, Computational Statistics and Data Analysis (2009)
A.J. Lemonte et al., Improved statistical inference for the two-parameter Birnbaum–Saunders distribution, Computational Statistics and Data Analysis (2007)
A.J. Lemonte et al., Size and power properties of some tests in the Birnbaum–Saunders regression model, Computational Statistics and Data Analysis (2011)
A.J. Lemonte et al., Signed likelihood ratio tests in the Birnbaum–Saunders regression model, Journal of Statistical Planning and Inference (2011)
A.J. Lemonte et al., Improved likelihood inference in Birnbaum–Saunders regressions, Computational Statistics and Data Analysis (2010)
A.P. Li et al., Diagnostic analysis for heterogeneous log-Birnbaum–Saunders regression models, Statistics and Probability Letters (2012)
A.P. Li et al., Diagnostics for a class of survival regression models with heavy-tailed errors, Computational Statistics and Data Analysis (2012)
L.A. Matos et al., Influence diagnostics in linear and nonlinear mixed-effects models with censored data, Computational Statistics and Data Analysis (2013)
A.G. Patriota et al., Influence diagnostics in a multivariate normal regression model with general parameterization, Statistical Methodology (2010)
K.L.P. Vasconcellos et al., Influence analysis with homogeneous linear restrictions, Computational Statistics and Data Analysis (2009)
B.X. Wang, Generalized interval estimation for the Birnbaum–Saunders distribution, Computational Statistics and Data Analysis (2012)
J. Wu et al., Improved interval estimation for the two-parameter Birnbaum–Saunders distribution, Computational Statistics and Data Analysis (2004)
Q. Xiao et al., Estimation of the Birnbaum–Saunders regression model with current status data, Computational Statistics and Data Analysis (2010)
F.C. Xie et al., Diagnostics analysis for log-Birnbaum–Saunders regression models, Computational Statistics and Data Analysis (2007)
M. Zevallos et al., A note on influence diagnostics in AR(1) time series models, Journal of Statistical Planning and Inference (2012)
A.C. Atkinson, Two graphical displays for outlying and influential observations in regression, Biometrika (1981)
N. Balakrishnan et al., Acceptance sampling plans from truncated life tests from generalized Birnbaum–Saunders distribution, Communications in Statistics - Simulation and Computation (2007)
W.E. Barlow et al., Residuals for relative risk regression, Biometrika (1988)
M. Barros et al., A new class of survival regression models with heavy-tailed errors: robustness and diagnostics, Lifetime Data Analysis (2008)
Z.W. Birnbaum et al., A new family of life distributions, Journal of Applied Probability (1969)
R.D. Cook, Detection of influential observation in linear regression, Technometrics (1977)
R.D. Cook, Assessment of local influence, Journal of the Royal Statistical Society B (1986)

