An empirical study of a test for polynomial relationships in randomly right censored regression models

https://doi.org/10.1016/j.csda.2007.03.005

Abstract

In this paper, a test statistic based on the local polynomial smoothing technique is constructed to test polynomial relationships in randomly right censored regression models. Two bootstrap procedures, namely the residual-based bootstrap and the naive bootstrap, are suggested to derive the p-value of the test. Simulations are conducted to assess empirically the performance of the two bootstrap procedures. The results demonstrate that the residual-based bootstrap performs much better than the naive bootstrap, and that the test works satisfactorily when its p-value is derived with the residual-based bootstrap. Although the limiting distribution of the test statistic and the consistency of the bootstrap approximations remain to be investigated, the simulation results indicate that the proposed test may be of practical use. As a real example, the proposed test is applied to the Stanford heart transplant data.

Introduction

In recent years, there has been a great deal of interest in the regression analysis of randomly right censored data, particularly in survival analysis for clinical trials, where patients often survive beyond the end of the trial period or are lost to follow-up. A general way to model such a situation is to introduce a censoring variable in addition to the response and explanatory variables. More precisely, let $Y_1, Y_2, \ldots, Y_n$ be an independent sample from an unknown lifetime distribution function $F(x)$, and let $X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})^T$ $(i = 1, 2, \ldots, n)$ be the associated observable covariates. Let $C_1, C_2, \ldots, C_n$ be an independent sample from a so-called censoring distribution function $G(x)$. We then observe $(Z_i, X_i, \delta_i)$ $(i = 1, 2, \ldots, n)$, where $Z_i = \min(Y_i, C_i)$ and $\delta_i = I(Y_i \le C_i)$, with $I(\cdot)$ the indicator function. It is further assumed that $G(x)$ satisfies $\tau_G \ge \tau_F$, where $\tau_G = \inf\{x : G(x) = 1\}$ and $\tau_F = \inf\{x : F(x) = 1\}$.

Suppose that $(Y_i, X_i)$ $(i = 1, 2, \ldots, n)$ satisfy

$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are error terms, generally assumed to be independent and identically distributed random variables with mean zero and common variance $\sigma^2$, and $m(\cdot)$ is an unknown regression function. As usual, the main problem in right censored regression analysis is to determine $m(\cdot)$ from the data $(Z_i, X_i, \delta_i)$ $(i = 1, 2, \ldots, n)$. Although some nonparametric smoothing techniques have been discussed for fitting the model (see, for example, Fan and Gijbels, 1994; Wang and Zheng, 1997; Wang and Li, 2002), the parametric regression model, in which the form of $m(\cdot)$ is known except for some parameters, is still one of the most commonly used models for analyzing randomly right censored data because of its simplicity and wide applicability (see Miller, 1976; Buckley and James, 1979; Koul et al., 1981; Leurgans, 1987; Stute, 1993; He and Huang, 2003; Li and Wang, 2003).
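To make the observation scheme concrete, the following Python sketch simulates data $(Z_i, X_i, \delta_i)$ from the model above with a single covariate. It is illustrative only: the function name `simulate_censored`, the uniform design, the normal errors and the normal censoring distribution (drawn independently of $(X, Y)$) are assumptions chosen for the example, not the designs used in the paper's simulations.

```python
import numpy as np

def simulate_censored(n, m, sigma=0.5, cens_mean=3.0, cens_sd=1.0, seed=None):
    """Draw (Z_i, X_i, delta_i) from Y = m(X) + eps under random right censoring.

    Illustrative assumptions: X ~ Uniform(0, 1), eps ~ N(0, sigma^2),
    and C ~ N(cens_mean, cens_sd^2) drawn independently of (X, Y).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    y = m(x) + rng.normal(0.0, sigma, size=n)   # latent responses Y_i (not observed)
    c = rng.normal(cens_mean, cens_sd, size=n)  # censoring variables C_i
    z = np.minimum(y, c)                        # observed Z_i = min(Y_i, C_i)
    delta = (y <= c).astype(int)                # delta_i = I(Y_i <= C_i)
    return z, x, delta

if __name__ == "__main__":
    z, x, delta = simulate_censored(200, lambda t: 1.0 + 2.0 * t, seed=1)
    print(f"censoring proportion: {1 - delta.mean():.2f}")
```

Only `z`, `x` and `delta` are returned, mirroring the fact that the latent $Y_i$ and $C_i$ are not observed.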

However, as with complete data analyzed through a parametric regression model, if the assumed regression relationship deviates seriously from the true structure of the data, the conclusions will be misleading. Therefore, developing statistical tests of parametric regression relationships is also an important issue in randomly right censored regression analysis. To the best of our knowledge, Kim (1993) constructed a generalized Pearson statistic for this problem and studied its large-sample behavior. Nikabadza and Stute (1997) developed a method that transforms the general model check into one for which asymptotically distribution-free full model checks are available. Stute et al. (2000) extended the test based on the empirical process of the regressors marked by the residuals, originally proposed for general regression, to right censored regression, and derived the asymptotic distribution of the underlying marked empirical process.

Motivated by the nonparametric regression methods frequently used to check parametric regression relationships with complete data (see, for example, Azzalini and Bowman, 1993; Härdle and Mammen, 1993; Jayasuriya, 1996; Fan and Gijbels, 1996; Fan et al., 2001; Mei et al., 2003), we propose in this paper a relatively simple test of a polynomial regression model for randomly right censored data. Using suitably transformed data, the test statistic is constructed by comparing the residual sums of squares obtained from fitting, respectively, a polynomial regression model and a nonparametric model. Two bootstrap procedures, namely the residual-based bootstrap and the naive bootstrap, are suggested to derive the p-value of the test. Simulation results demonstrate that the residual-based bootstrap approximates the null distribution of the test statistic more satisfactorily than the naive bootstrap, and that the test with the residual-based bootstrap is quite powerful for testing polynomial relationships in randomly right censored regression. Although this paper is only an empirical study and a theoretical proof of the validity of the bootstrap approximations remains to be given, the simulation results suggest that the proposed test may be of practical use.
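The comparison of residual sums of squares can be sketched as follows. This is a minimal Python illustration under stated assumptions: it takes already-transformed responses as given, uses a local linear smoother with a Gaussian kernel and a fixed bandwidth `h` as the nonparametric fit, and uses the hypothetical form $T = (\mathrm{RSS}_0 - \mathrm{RSS}_1)/\mathrm{RSS}_1$, where $\mathrm{RSS}_0$ comes from the polynomial fit and $\mathrm{RSS}_1$ from the smoother. The exact statistic of the paper is defined in Section 2 and may differ, for example in its scaling; the function names are hypothetical.

```python
import numpy as np

def local_linear_fit(x, y, h):
    """Local linear estimate of E[y | x] at each design point (Gaussian kernel)."""
    d = x[None, :] - x[:, None]              # d[j, i] = x_i - x_j
    w = np.exp(-0.5 * (d / h) ** 2)          # kernel weights around each x_j
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    t0, t1 = (w * y).sum(1), (w * d * y).sum(1)
    # Intercept of the weighted least-squares line centered at x_j.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)

def rss_test_statistic(x, y, degree, h):
    """Hypothetical T = (RSS0 - RSS1) / RSS1: polynomial (null) vs local linear fit."""
    coef = np.polyfit(x, y, degree)          # least-squares polynomial under H0
    rss0 = np.sum((y - np.polyval(coef, x)) ** 2)
    rss1 = np.sum((y - local_linear_fit(x, y, h)) ** 2)
    return (rss0 - rss1) / rss1

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 50)
    print(rss_test_statistic(x, np.sin(2 * np.pi * x), degree=1, h=0.05))
```

Large values of $T$ indicate that the nonparametric fit explains substantially more variation than the polynomial, that is, evidence against the polynomial null model. The local linear smoother reproduces straight lines exactly, which is one reason it is a natural alternative fit here.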

The remainder of this paper is organized as follows. In Section 2, a test statistic is constructed from an analysis-of-variance viewpoint, using the local polynomial smoothing technique, to check a polynomial relationship for right censored data. Section 3 presents two bootstrap procedures for deriving the p-value of the test. Simulations assessing the performance of the test are reported in Section 4, and the Stanford heart transplant data are analyzed in Section 5.


Construction of test statistic

For simplicity and notational convenience, we restrict the discussion to the case of a univariate explanatory variable. When right censored data $(Z_i, X_i, \delta_i)$ $(i = 1, 2, \ldots, n)$ are available instead of complete data $(Y_i, X_i)$ $(i = 1, 2, \ldots, n)$, the commonly used approach to fitting a regression relationship between the response variable $Y$ and the explanatory variable $X$ is first to transform the incomplete data in an appropriate way. In this respect, Buckley and James (1979) as well as Koul et al. (1981) have proposed some
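As one example of such a transformation, the synthetic-data approach of Koul et al. (1981) replaces each $Z_i$ by $Y_i^* = \delta_i Z_i / \{1 - \hat G(Z_i-)\}$, where $\hat G$ is the Kaplan-Meier estimator of the censoring distribution. When $C$ is independent of $(X, Y)$ and $G$ is known, $E(\delta Z / \{1 - G(Z-)\} \mid X) = E(Y \mid X)$, so the synthetic responses have the same regression function as the uncensored ones. The sketch below is an assumption about which transformation is used, since the snippet above does not show the one adopted in this paper; the function names are hypothetical.

```python
import numpy as np

def censoring_survival_left(z, delta):
    """Kaplan-Meier estimate of 1 - G(t-), the censoring survival function just
    before t, evaluated at each observed Z_i. Censorings (delta == 0) are the
    'events'; the risk set at time t is every observation with Z >= t."""
    z = np.asarray(z, dtype=float)
    cens = 1 - np.asarray(delta)
    times = np.unique(z)                                    # sorted distinct times
    at_risk = np.array([np.sum(z >= t) for t in times])
    n_cens = np.array([np.sum(cens[z == t]) for t in times])
    surv_after = np.cumprod(1.0 - n_cens / at_risk)         # 1 - G_hat(t)
    surv_before = np.concatenate(([1.0], surv_after[:-1]))  # 1 - G_hat(t-)
    return surv_before[np.searchsorted(times, z)]

def ksv_transform(z, delta):
    """Synthetic responses delta_i * Z_i / (1 - G_hat(Z_i-)) (Koul et al., 1981)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta)
    surv = censoring_survival_left(z, delta)
    # Censored observations (delta == 0) are mapped to 0.
    return np.where(delta == 1, z / surv, 0.0)

print(ksv_transform([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
# -> [1.0, 0.0, 4.5, 6.0] (up to floating-point rounding)
```

Uncensored responses beyond a censoring time are inflated by $1/\{1 - \hat G(Z_i-)\}$ to compensate for the censored observations set to zero; with no censoring the transformation returns $Z_i$ unchanged.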

Calculation of the p-value

To calculate $p$ in (12), we must first obtain the null distribution of the test statistic $T$. We may surmise that $T$ is asymptotically $\chi^2$ distributed, since its form is similar to that of the generalized likelihood ratio test statistic proposed by Fan et al. (2001) for nonparametric regression with complete data. Note that the asymptotic null distribution of their test statistic was proved to be $\chi^2$ with
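A residual-based bootstrap p-value for a statistic of this kind could be computed along the following lines. This Python sketch rests on assumptions not taken from the paper: it treats the transformed responses as given; it uses a hypothetical statistic $T = (\mathrm{RSS}_0 - \mathrm{RSS}_1)/\mathrm{RSS}_1$ with a Gaussian-kernel local linear smoother and a fixed bandwidth `h`; it generates bootstrap responses as the null polynomial fit plus resampled, centered null residuals; and it estimates $p$ as the proportion of bootstrap statistics at least as large as the observed one. The helper names are hypothetical, and the procedure of Section 3 may differ in detail.

```python
import numpy as np

def _local_linear(x, y, h):
    # Local linear smoother with a Gaussian kernel, evaluated at the design points.
    d = x[None, :] - x[:, None]               # d[j, i] = x_i - x_j
    w = np.exp(-0.5 * (d / h) ** 2)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    t0, t1 = (w * y).sum(1), (w * d * y).sum(1)
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)

def _statistic(x, y, degree, h):
    # Hypothetical RSS-ratio statistic; also returns the fitted null polynomial.
    null_fit = np.polyval(np.polyfit(x, y, degree), x)
    rss0 = np.sum((y - null_fit) ** 2)
    rss1 = np.sum((y - _local_linear(x, y, h)) ** 2)
    return (rss0 - rss1) / rss1, null_fit

def residual_bootstrap_pvalue(x, y, degree, h, n_boot=500, seed=None):
    """Return (T_obs, bootstrap p-value) under the polynomial null model."""
    rng = np.random.default_rng(seed)
    t_obs, null_fit = _statistic(x, y, degree, h)
    resid = y - null_fit
    resid = resid - resid.mean()   # centering (a no-op up to rounding with an intercept)
    exceed = 0
    for _ in range(n_boot):
        # Bootstrap responses generated under H0: null fit + resampled residuals.
        y_star = null_fit + rng.choice(resid, size=len(y), replace=True)
        t_star, _ = _statistic(x, y_star, degree, h)
        exceed += t_star >= t_obs
    return t_obs, exceed / n_boot

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 100)
    y = 2 * np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 100)
    print(residual_bootstrap_pvalue(x, y, degree=1, h=0.1, n_boot=200, seed=1))
```

Generating the bootstrap responses from the null fit is what makes the resampled statistics mimic the null distribution of $T$; resampling the raw data without imposing the null would not do so in general.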

Simulation studies

Since the validity of the bootstrap procedures for approximating the null distribution of the test statistic remains to be investigated, simulations are conducted in this section to assess the performance of the proposed test method.

Analysis of the Stanford heart transplant data

In this section, we apply the proposed test to the Stanford heart transplant data.

The Stanford heart transplant program began in October 1967. By February 1980, 184 patients had received heart transplants, a few of them more than once. The final data contain the censored or uncensored survival times of these patients as of February 1980 and their ages at the time of their first transplant. The original data can be found in Miller and Halpern (1982). This data set has been widely

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 10531030 and 60675013). The authors would like to thank the associate editor and two anonymous referees for their invaluable suggestions, which led to a substantial improvement of this paper.

References (37)

  • J. Fan et al.

    Censored regression: local linear approximations and their applications

    J. Amer. Statist. Assoc.

    (1994)
  • J. Fan et al.

    Local Polynomial Modelling and its Applications

    (1996)
  • J. Fan et al.

    Profile likelihood inferences on semiparametric varying-coefficient partially linear models

    Bernoulli

    (2005)
  • J. Fan et al.

    Nonparametric inferences for additive models

    J. Amer. Statist. Assoc.

    (2005)
  • J. Fan et al.

    Generalized likelihood ratio statistics and Wilks phenomenon

    Ann. Statist.

    (2001)
  • B. Grund et al.

    Semiparametric lack-of-fit tests in an additive hazard regression model

    Statist. Comput.

    (2001)
  • W. Härdle et al.

    Comparing nonparametric versus parametric regression fits

    Ann. Statist.

    (1993)
  • W. Härdle et al.

    Bootstrap simultaneous error bars for nonparametric regression

    Ann. Statist.

    (1991)