A new extended Birnbaum–Saunders regression model for lifetime modeling

https://doi.org/10.1016/j.csda.2013.02.025

Abstract

A new class of extended Birnbaum–Saunders regression models is introduced. It can be applied to censored data and be used more effectively in survival analysis and fatigue life studies. Maximum likelihood estimation of the model parameters with censored data as well as influence diagnostics for the new regression model are investigated. The normal curvatures for studying local influence are derived under various perturbation schemes and a martingale-type residual is considered to assess departures from the extended Birnbaum–Saunders error assumption as well as to detect outlying observations. Further, a test of homogeneity of the shape parameters of the new regression model is proposed. Two real data sets are analyzed for illustrative purposes.

Introduction

The two-parameter Birnbaum–Saunders (BS) distribution, also known as the fatigue life distribution, was introduced by Birnbaum and Saunders (1969) and has received considerable attention in recent years. It was originally derived from a model for a physical fatigue process where dominant crack growth causes failure. It was later derived by Desmond (1985) using a biological model which followed from relaxing some of the assumptions originally made by Birnbaum and Saunders (1969). The relationship between the BS distribution and the inverse Gaussian distribution was investigated by Desmond (1986) who demonstrated that the BS distribution is an equal-weight mixture of an inverse Gaussian distribution and its complementary reciprocal. For book treatments of inverse Gaussian and BS distributions and their relationships, see Marshall and Olkin (2007, Chapter 13) and especially Saunders (2007, Chapter 10). More recently, Jones (2012) also discussed the relationship between the BS and the inverse Gaussian distributions.

The cumulative distribution function of a random variable $T$ with BS distribution, say $T \sim \mathrm{BS}(\alpha, \eta)$, is $G(t) = \Phi(v)$, with $t > 0$, where $\Phi(\cdot)$ is the standard normal cumulative function, $v = v(t) = \rho(t/\eta)/\alpha$, $\rho(z) = z^{1/2} - z^{-1/2}$, and $\alpha > 0$ and $\eta > 0$ are the shape and scale parameters, respectively. The shape of the hazard function of the BS distribution is discussed in Kundu et al. (2008). The authors showed that the hazard function is not monotone and is unimodal for all ranges of the parameter values. Some interesting results on improved statistical inference as well as interval estimation for the BS distribution may be found in Wu and Wong (2004), Lemonte et al. (2007, 2008) and Wang (2012). The BS distribution has been applied in a wide variety of fields; see, for example, Balakrishnan et al. (2007) in reliability and Leiva et al. (2008, 2009) in other fields. It is worthwhile to mention that there has been a great deal of progress recently in developing statistical methodology for the BS model and its generalizations. Notable contributions include those of Professor Narayanaswamy Balakrishnan (http://www.math.mcmaster.ca/bala/bala.html) and co-workers, and Professor Victor Leiva (http://staff.deuv.cl/leiva/) and co-workers.
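As a minimal sketch of the definition above, the BS cumulative function can be evaluated with the Python standard library alone. The function names `std_normal_cdf` and `bs_cdf` are hypothetical and chosen only for illustration.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal cumulative function Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_cdf(t: float, alpha: float, eta: float) -> float:
    """BS cumulative function G(t) = Phi(v), with v = rho(t/eta)/alpha."""
    if t <= 0.0:
        return 0.0  # the distribution lives on t > 0
    z = t / eta
    v = (math.sqrt(z) - 1.0 / math.sqrt(z)) / alpha  # rho(z) = z^(1/2) - z^(-1/2)
    return std_normal_cdf(v)
```

Since $\rho(1) = 0$, the sketch returns 0.5 at $t = \eta$, so the scale parameter is also the median. Replacing $t$ by $\eta^2/t$ flips the sign of $v$, which gives $G(t) + G(\eta^2/t) = 1$.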

On the basis of the scheme proposed by Marshall and Olkin (1997), Lemonte (2013) introduced a quite flexible distribution which can be used to model failure times for materials subject to fatigue and lifetime data. The author called the new distribution the Marshall–Olkin extended Birnbaum–Saunders (MOEBS) distribution. Hereafter, the random variable $T$ is said to have a MOEBS distribution with shape parameters $\alpha > 0$ and $\lambda > 0$, and scale parameter $\eta > 0$, say $T \sim \mathrm{MOEBS}(\lambda, \alpha, \eta)$, if its cumulative function is given by $$G(t) = \frac{\Phi(v)}{1 - \bar{\lambda}\,\Phi(-v)}, \quad t > 0, \tag{1}$$ where $\bar{\lambda} = 1 - \lambda$. The survival function is $S(t) = \lambda\Phi(-v)/[1 - \bar{\lambda}\Phi(-v)]$, whereas the probability density function corresponding to (1) takes the form $$g(t) = \lambda\,\kappa(\alpha, \eta)\, t^{-3/2}(t + \eta)\,[1 - \bar{\lambda}\Phi(-v)]^{-2} \exp\{-\tau(t/\eta)/(2\alpha^2)\},$$ where $\kappa(\alpha, \eta) = \exp(\alpha^{-2})/(2\alpha\sqrt{2\pi\eta})$ and $\tau(z) = z + z^{-1}$. It can be shown that if $T \sim \mathrm{MOEBS}(\lambda, \alpha, \eta)$, then $kT \sim \mathrm{MOEBS}(\lambda, \alpha, k\eta)$ for $k > 0$; that is, the class of MOEBS distributions is closed under scale transformations. The two-parameter BS distribution arises from (1) when $\lambda = 1$; that is, $\mathrm{BS}(\alpha, \eta) = \mathrm{MOEBS}(1, \alpha, \eta)$.

Rieck and Nedelman (1991) proposed a log-linear regression model based on the BS distribution. They showed that if $T \sim \mathrm{BS}(\alpha, \eta)$, then $Y = \log(T)$ is sinh-normal (SN) distributed with shape, location and scale parameters given by $\alpha$, $\mu = \log(\eta)$ and $\sigma = 2$, respectively; that is, the log-BS (LBS) distribution is a special case of the SN distribution introduced by them and, in this case, the notation $Y \sim \mathrm{LBS}(\alpha, \mu)$ is used. The SN distribution is symmetric, can have greater or smaller kurtosis than the normal model, and can be bimodal. Their regression model has received significant attention from many researchers over the last few years. For some recent references on the BS regression model, the reader is referred to Desmond et al. (2008), Xiao et al. (2010), Lemonte et al. (2010), Lemonte (2011), Lemonte and Ferrari (2011a, 2011b, 2011c), Qu and Xie (2011) and Li et al. (2012), among others.

Some generalizations of the log-linear BS regression model have been proposed in the statistical literature. For example, some efforts can be found in the works by Barros et al. (2008), Lemonte and Cordeiro (2009), Santana et al. (2011), Lemonte (2012), Desmond et al. (2012) and Villegas et al. (2011). Barros et al. (2008) introduced the generalized BS regression model based on the BS-$t_\nu$ distribution (that is, based on the BS Student-$t$ model with $\nu$ degrees of freedom), Lemonte and Cordeiro (2009) proposed a non-linear BS regression model, Santana et al. (2011) and Lemonte (2012) introduced the skewed BS regression model, whereas Villegas et al. (2011) and Desmond et al. (2012) studied a mixed log-linear model based on the BS distribution.

In this paper, in addition to the existing generalizations of the BS regression model, we propose the extended BS regression model based on the MOEBS distribution; that is, we introduce a new class of lifetime regression models in which the errors follow the log-MOEBS distribution. The main motivation for introducing this new class is to give practitioners a new BS regression model for practical applications. Moreover, the formulas related to the new regression model are manageable and, with modern computer resources and their numerical capabilities, the proposed model may prove to be a useful addition to the arsenal of applied statisticians. Additionally, the new model is quite flexible and can be widely applied in analyzing lifetime data. Further, we provide two applications to real data sets which show that the new regression model yields a better fit than the usual BS regression model. Furthermore, the new extended BS regression model can be used for modeling censored data as well as data without censoring. Censored data are very common in lifetime studies because of time limits and other restrictions on data collection. In an engineering life test experiment, for example, it is usually not feasible to continue experimentation until all items under study have failed. In a survival study, patients may be lost to follow-up, and the data analysis is usually done before all patients have reached the event of interest. The partial information contained in a censored observation is just a lower bound on the lifetime. Reliability studies usually finish before all units have failed, even when accelerated tests are used. This is a special source of difficulty in the analysis of reliability data. Such data are said to be right-censored; they arise when some units are still running at the time of the data analysis, are removed from test before they fail, or fail from an extraneous cause. We refer the reader to Gijbels (2010) for a recent overview of censored data.

It is nowadays widespread practice, after modeling, to check the model assumptions and conduct diagnostic studies in order to detect possible influential observations that may distort the results of the analysis. Diagnostic analysis is an efficient way to detect influential observations. The first technique developed to assess the individual impact of cases on the estimation process is, perhaps, case deletion, which became a very popular tool. Cook (1977) presents a great development of case deletion diagnostics for a general statistical model. Case deletion is an example of a global influence analysis; that is, the effect of an observation is assessed by completely removing it. However, case deletion excludes all information from an observation, and we can hardly say whether this observation has some influence on a specific aspect of the model. To overcome this problem, one can resort to the local influence approach, in which one investigates the model sensitivity under small perturbations. In this context, Cook (1986) proposed a general framework to detect influential observations which gives a measure of this sensitivity under small perturbations of the data or the model. Many applications of the local influence method may be found in the statistical literature for various models and under different perturbation schemes; see, for instance, Espinheira et al. (2008), Vasconcellos and Fernandez (2009), Patriota et al. (2010), Lemonte and Patriota (2011), Zevallos et al. (2012) and Matos et al. (2013), among others. In this paper, we also propose a similar methodology to detect influential subjects in the new extended BS regression model. In particular, we obtain explicit formulas for Cook's (1986) normal curvature measure under three perturbation schemes.
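For orientation, the general form of Cook's (1986) normal curvature is recalled here. This is the standard expression from the local influence literature, not the explicit formulas derived in the paper. The symbols $\ell(\theta \mid \omega)$, $\ddot{L}$, $\Delta$ and $d$ are notation assumed for this sketch.

```latex
\begin{align*}
  % likelihood displacement under a perturbation vector \omega
  % (\omega_0 denotes the no-perturbation vector)
  LD(\omega) &= 2\{\ell(\hat{\theta}) - \ell(\hat{\theta}_{\omega})\}, \\
  % normal curvature of the influence graph in the unit direction d
  C_d(\theta) &= 2\,\bigl| d^{\top} \Delta^{\top} \ddot{L}^{-1} \Delta\, d \bigr|,
      \qquad \|d\| = 1, \\
  \ddot{L} &= \left.\frac{\partial^2 \ell(\theta)}
      {\partial\theta\,\partial\theta^{\top}}\right|_{\theta = \hat{\theta}},
  \qquad
  \Delta = \left.\frac{\partial^2 \ell(\theta \mid \omega)}
      {\partial\theta\,\partial\omega^{\top}}\right|_{\theta = \hat{\theta},\ \omega = \omega_0}.
\end{align*}
```

The direction $d_{\max}$ of largest curvature is the eigenvector associated with the largest absolute eigenvalue of $\Delta^{\top}\ddot{L}^{-1}\Delta$. An index plot of its components is the usual way to flag observations that are jointly influential under a given perturbation scheme.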

The paper unfolds as follows. The log-MOEBS distribution is proposed in Section 2. In Section 3, we introduce the extended BS regression model and discuss estimation of the model parameters. Specifically, we compute the maximum likelihood estimating equations by assuming random censoring. In Section 4, the normal curvatures of local influence are derived under various perturbation schemes and a kind of deviance residual is proposed to assess departures from the underlying log-MOEBS distribution as well as to detect outlying observations. In Section 5, we propose a likelihood ratio statistic for testing the homogeneity of the shape parameters. Two real data illustrations are considered in Section 6. The paper ends with some concluding remarks in Section 7.

The log-MOEBS distribution

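Let $T$ be a random variable having the MOEBS cumulative function (1). The random variable $Y = \log(T)$ has a log-MOEBS (LMOEBS) distribution. After some algebra, the survival function, the cumulative function and the density function of $Y$, parameterized in terms of $\mu = \log(\eta)$, can be expressed, respectively, as $$S(y) = \frac{\lambda\,\Phi(-\xi_2)}{1 - \bar{\lambda}\,\Phi(-\xi_2)}, \quad F(y) = \frac{\Phi(\xi_2)}{1 - \bar{\lambda}\,\Phi(-\xi_2)}, \quad y \in \mathbb{R},$$ $$f(y) = \frac{\lambda\,\xi_1\,\phi(\xi_2)}{2\,[1 - \bar{\lambda}\,\Phi(-\xi_2)]^2}, \quad y \in \mathbb{R},$$ where $\phi(\cdot)$ is the standard normal density function, $$\xi_1 = \frac{2}{\alpha}\cosh\!\left(\frac{y - \mu}{2}\right), \quad \xi_2 = \frac{2}{\alpha}\sinh\!\left(\frac{y - \mu}{2}\right).$$ Evidently, the density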

The model and estimation

The extended BS regression model (that is, the LMOEBS regression model) is defined in the form $$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \quad i = 1, \ldots, n, \tag{4}$$ where $y_i$ is the observed log-lifetime or log-censoring time for the $i$th individual, $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^{\top}$ is a vector of known explanatory variables associated with $y_i$, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^{\top}$ is a $p$-vector (where $p < n$ and it is fixed) of unknown regression parameters to be estimated and $\varepsilon_i \sim \mathrm{LMOEBS}(\lambda, \alpha, 0)$. It is also assumed that the errors $\varepsilon_i$ are independent and identically distributed.
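To make the estimation problem concrete, the following is a minimal sketch of the log-likelihood for this model under non-informative right censoring. An observed failure contributes $\log f(y_i)$ and a censored observation contributes $\log S(y_i)$, with location $\mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}$ and the LMOEBS density and survival function taken with denominator $1 - \bar{\lambda}\Phi(-\xi_2)$. The function name `lmoebs_loglik`, the parameter ordering and the censoring indicator `delta` (1 = failure observed, 0 = censored) are assumptions made for illustration; the paper's own estimating equations are not shown in this excerpt.

```python
import math


def _upper_tail(x: float) -> float:
    """Phi(-x) = 1 - Phi(x), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def lmoebs_loglik(params, y, X, delta):
    """Censored log-likelihood; params = (lam, alpha, beta_1, ..., beta_p).

    y     : observed log-lifetimes or log-censoring times
    X     : list of covariate rows x_i (each of length p)
    delta : 1 if the failure was observed, 0 if right-censored (assumed coding)
    """
    lam, alpha, *beta = params
    if lam <= 0.0 or alpha <= 0.0:
        return -math.inf  # outside the parameter space
    total = 0.0
    for yi, xi, di in zip(y, X, delta):
        mu = sum(b * x for b, x in zip(beta, xi))      # linear predictor x_i' beta
        xi1 = 2.0 / alpha * math.cosh((yi - mu) / 2.0)
        xi2 = 2.0 / alpha * math.sinh((yi - mu) / 2.0)
        s0 = _upper_tail(xi2)                          # Phi(-xi2)
        denom = 1.0 - (1.0 - lam) * s0
        if di == 1:
            # log f(y) = log(lam) + log(xi1) + log(phi(xi2)) - log(2) - 2*log(denom)
            log_phi = -0.5 * math.log(2.0 * math.pi) - 0.5 * xi2 * xi2
            total += (math.log(lam) + math.log(xi1) + log_phi
                      - math.log(2.0) - 2.0 * math.log(denom))
        else:
            # log S(y) = log(lam) + log(Phi(-xi2)) - log(denom)
            total += math.log(lam) + math.log(s0) - math.log(denom)
    return total
```

In practice, the negative of this function would be handed to a numerical optimizer, for example a quasi-Newton routine such as BFGS. Optimizing over $\log\lambda$ and $\log\alpha$ is one simple way to keep both shape parameters positive.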

Diagnostic analysis

Since regression models are sensitive to the underlying model assumptions, a sensitivity analysis is strongly advisable. To assess the sensitivity of the maximum likelihood estimates of the parameters of the regression model (4), the local influence method is carried out under three perturbation schemes. To assess departures from the underlying LMOEBS distribution and to detect outlying observations, a kind of deviance residual is considered.
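The exact residual used in the paper is not given in this excerpt. A common construction for censored data, assumed here, starts from the martingale residual $r_{M_i} = \delta_i + \log S(y_i; \hat{\theta})$ and applies the deviance-type transformation $r_{D_i} = \mathrm{sign}(r_{M_i})\{-2[r_{M_i} + \delta_i \log(\delta_i - r_{M_i})]\}^{1/2}$. The sketch below uses the LMOEBS survival function with denominator $1 - \bar{\lambda}\Phi(-\xi_2)$; the function names and the indicator `delta` (1 = failure observed, 0 = censored) are hypothetical.

```python
import math


def martingale_residual(y, mu, lam, alpha, delta):
    """r_M = delta + log S(y), with S the LMOEBS survival function at location mu."""
    xi2 = 2.0 / alpha * math.sinh((y - mu) / 2.0)
    s0 = 0.5 * math.erfc(xi2 / math.sqrt(2.0))   # Phi(-xi2)
    surv = lam * s0 / (1.0 - (1.0 - lam) * s0)
    return delta + math.log(surv)


def deviance_type_residual(r_m, delta):
    """sign(r_M) * sqrt(-2 * [r_M + delta * log(delta - r_M)])."""
    inner = r_m
    if delta == 1:
        inner += math.log(1.0 - r_m)             # requires r_M < 1, i.e. S(y) < 1
    # guard against a tiny negative argument caused by rounding near r_M = 0
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), r_m)
```

Under this construction, an observation censored at the fitted median has $S = 1/2$, which gives $r_M = \log(1/2)$ and a transformed residual of $-\sqrt{2\log 2}$. Residuals of this type are usually inspected with index plots or normal probability plots with simulated envelopes.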

Testing the homogeneity of the shape parameters

In the extended BS regression model introduced in Section 3, the homogeneity of the shape parameters λ and α is a standard assumption. This assumption, however, is not necessarily appropriate, because the actual shape parameters of the response variable yi may be related to the ith observation. In this case, inference becomes much more difficult. Hence, this assumption usually needs to be checked. In this section, we consider an LR test statistic to verify the homogeneity of the

Real data illustrations

In this section, we use two real data sets to show the flexibility and applicability of the extended BS regression model in practice. We will consider real data with and without censoring. All the computations presented in this section were done using the Ox matrix programming language (Doornik, 2009), which is freely distributed for academic purposes and available at http://www.doornik.com. The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method with analytical derivatives through the subroutine

Concluding remarks

The BS distribution has many attractive properties and has found several applications in the literature including lifetime, survival and environmental data analysis. It has received significant attention over the last few years and some generalizations and extensions of this distribution have been proposed by many researchers. Based on the BS distribution, Rieck and Nedelman (1991) introduced the BS regression model, which has been studied by several authors. Their regression model is becoming

Acknowledgments

The author thanks the associate editor and three anonymous referees for useful suggestions and comments that aided in improving the first version of the manuscript. The author gratefully acknowledges grants from FAPESP (Brazil).

References (56)

  • A.J. Lemonte et al. (2010). Improved likelihood inference in Birnbaum–Saunders regressions. Computational Statistics and Data Analysis.
  • A.P. Li et al. (2012). Diagnostic analysis for heterogeneous log-Birnbaum–Saunders regression models. Statistics and Probability Letters.
  • A.P. Li et al. (2012). Diagnostics for a class of survival regression models with heavy-tailed errors. Computational Statistics and Data Analysis.
  • L.A. Matos et al. (2013). Influence diagnostics in linear and nonlinear mixed-effects models with censored data. Computational Statistics and Data Analysis.
  • A.G. Patriota et al. (2010). Influence diagnostics in a multivariate normal regression model with general parameterization. Statistical Methodology.
  • K.L.P. Vasconcellos et al. (2009). Influence analysis with homogeneous linear restrictions. Computational Statistics and Data Analysis.
  • B.X. Wang (2012). Generalized interval estimation for the Birnbaum–Saunders distribution. Computational Statistics and Data Analysis.
  • J. Wu et al. (2004). Improved interval estimation for the two-parameter Birnbaum–Saunders distribution. Computational Statistics and Data Analysis.
  • Q. Xiao et al. (2010). Estimation of the Birnbaum–Saunders regression model with current status data. Computational Statistics and Data Analysis.
  • F.C. Xie et al. (2007). Diagnostics analysis for log-Birnbaum–Saunders regression models. Computational Statistics and Data Analysis.
  • M. Zevallos et al. (2012). A note on influence diagnostics in AR(1) time series models. Journal of Statistical Planning and Inference.
  • A.C. Atkinson (1981). Two graphical displays for outlying and influential observations in regression. Biometrika.
  • N. Balakrishnan et al. (2007). Acceptance sampling plans from truncated life tests from generalized Birnbaum–Saunders distribution. Communications in Statistics - Simulation and Computation.
  • W.E. Barlow et al. (1988). Residuals for relative risk regression. Biometrika.
  • M. Barros et al. (2008). A new class of survival regression models with heavy-tailed errors: robustness and diagnostics. Lifetime Data Analysis.
  • Z.W. Birnbaum et al. (1969). A new family of life distributions. Journal of Applied Probability.
  • R.D. Cook (1977). Detection of influential observation in linear regression. Technometrics.
  • R.D. Cook (1986). Assessment of local influence. Journal of the Royal Statistical Society B.