A goodness-of-fit test for variable-adjusted models

https://doi.org/10.1016/j.csda.2019.01.018

Abstract

This research proposes a projection-based test for checking a parametric single-index regression structure in variable-adjusted models. An adaptive-to-model strategy is employed, which helps the proposed test maintain the significance level better and makes it more powerful than existing tests. Under mild conditions, the proposed test asymptotically behaves like a test for the classical regression setup without distortion errors in the observations. Numerical studies with simulated and real data are conducted to examine the finite-sample performance of the test.

Introduction

Regression models are widely used to describe the relationship between a response variable $Y$ and a $p$-dimensional predictor $X=(X_1,\ldots,X_p)^\top$. Generally, the response $Y$ and the predictor $X$ are assumed to be observable. In some cases, however, obtaining the true values of $Y$ and $X$ is expensive or impossible, while surrogates $\tilde Y$ and $\tilde X$ are available. This situation motivates a number of measurement error models. A classical structure assumes that the surrogate is the sum of the unobservable variable and a measurement error; Carroll et al. (2006), Cheng and Van Ness (1999) and Fuller (2009) are comprehensive references. Motivated by a real dataset, Şentürk and Müller (2005b) introduced another type of measurement error model, in which the observed response and predictors are distorted multiplicatively rather than additively. For that dataset, they investigated a linear relationship between the fibrinogen level and the serum transferrin level in haemodialysis patients. Because both levels are measured with a confounding effect of body mass index, they fitted a model in which this effect is multiplicative through an unknown function of body mass index. This measurement error model has since been extended in several directions. For example, Cui et al. (2009) used a nonlinear regression model to relate the glomerular filtration rate to the serum creatinine level and, treating body surface area as the source of distortion, studied a variable-adjusted nonlinear regression model.

Consider a general variable-adjusted regression model in which the response $Y$, the predictor $X$ and their surrogates $\tilde Y$, $\tilde X$ are related by

$$
Y=\mu(X)+\varepsilon,\qquad \tilde Y=\psi(U)\,Y,\qquad \tilde X=(\tilde X_1,\ldots,\tilde X_d,X_{d+1},\ldots,X_p)^\top,\quad \tilde X_r=\phi_r(U)\,X_r,\ r=1,\ldots,d. \tag{1}
$$

Here $d\le p$, $U$ is an observable confounder, and $\psi(u)$ and $\phi_r(u)$ are unknown distorting functions. Existing work on this model has mainly focused on estimation. A natural idea is to estimate the true values of $Y$ and $X$ by adjusting the observed surrogates $\tilde Y$ and $\tilde X$, and then to estimate $\mu(X)$ from the adjusted values $\hat Y$ and $\hat X$; see Nguyen and Şentürk (2008), Cui et al. (2009), Zhang et al. (2012a), Zhang et al. (2013) and Delaigle et al. (2016), among others. Goodness-of-fit testing for this model has received less attention. Zhang et al. (2015) proposed a $\sqrt{n}$-consistent test based on a residual-marked empirical process, but it requires a time-consuming bootstrap procedure and a user-specified weight function. Zhao and Xie (2018) developed a local smoothing test for variable-adjusted models that is simple and easy to implement. The cost is that it can only detect local alternatives that differ from the null hypothesis at the rate $n^{-1/2}h^{-p/4}$, which is slower than the rate of the test in Zhang et al. (2015) and deteriorates quickly as the dimension $p$ grows. Consequently, when the predictor dimension is large or the sample size is moderate, this test may perform poorly.
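To make the distortion mechanism concrete, here is a minimal simulation sketch of the variable-adjusted model above. Everything specific in it is a hypothetical choice, not taken from the paper: the distorting functions psi(u) = 1 + 0.5 cos(2πu) and phi(u) = 1 + 0.3 sin(2πu) (shared by all distorted predictors), the Uniform[0, 1] law for U, Gaussian X and ε, and the helper name `simulate`. The two functions were picked only because they are positive and average to one when U is uniform on [0, 1].

```python
import numpy as np

def psi(u):
    # Hypothetical response distortion: positive, and E[psi(U)] = 1 for U ~ Uniform[0, 1].
    return 1.0 + 0.5 * np.cos(2.0 * np.pi * u)

def phi(u):
    # Hypothetical predictor distortion shared by X_1, ..., X_d: positive, E[phi(U)] = 1.
    return 1.0 + 0.3 * np.sin(2.0 * np.pi * u)

def simulate(n, p, d, mu, sigma=0.5, seed=0):
    """Draw n observations from the variable-adjusted model.

    Returns the observed (Y_tilde, X_tilde, U) and, for checking only, the latent (Y, X).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    U = rng.uniform(0.0, 1.0, size=n)            # observable confounder, independent of X and eps
    eps = rng.normal(scale=sigma, size=n)
    Y = mu(X) + eps                              # Y = mu(X) + eps
    Y_tilde = psi(U) * Y                         # multiplicative distortion of the response
    X_tilde = X.copy()
    X_tilde[:, :d] = phi(U)[:, None] * X[:, :d]  # distort X_1..X_d; X_{d+1}..X_p observed exactly
    return Y_tilde, X_tilde, U, Y, X

# Usage: a linear single-index mean with p = 4 predictors, the first d = 2 distorted.
beta = np.array([1.0, -1.0, 0.5, 0.0])
Y_tilde, X_tilde, U, Y, X = simulate(500, 4, 2, lambda x: x @ beta)
```

Only `Y_tilde`, `X_tilde` and `U` would be available to an analyst; the latent `Y` and `X` are returned purely so the distortion can be checked.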

In the case of no distortion error, projection-based methods have been widely used to develop tests of dimension reduction type. These methods can be traced back to Zhu and Li (1998), who were motivated by projection pursuit (see Huber, 1985). Later developments include Escanciano (2006), Stute et al. (2008), Lavergne and Patilea (2008, 2012) and Guo et al. (2016); a related reference is Delgado and Escanciano (2016). Guo et al. (2016) proposed an adaptive-to-model test that substantially improves the performance of local smoothing tests. Motivated by this work, we combine the model adaptation strategy with a construction tailored to variable-adjusted models to develop a projection-pursuit test. To exploit the dimension reduction structure of the hypothesized model, we check whether the model is single-index, with hypotheses

$$
H_0:\ P\{\mu(X)=m(\beta_0^\top X,\gamma_0)\}=1\quad\text{for some }\theta_0=(\beta_0^\top,\gamma_0^\top)^\top\in\Theta\subset\mathbb{R}^{p+p'},
$$

$$
H_1:\ P\{\mu(X)=m(\beta^\top X,\gamma)\}<1\quad\text{for all }\theta=(\beta^\top,\gamma^\top)^\top\in\Theta\subset\mathbb{R}^{p+p'},
$$

where $m$ is a given function and $\beta$ and $\gamma$ are parameters of dimensions $p$ and $p'$, respectively. From the viewpoint of sufficient dimension reduction (SDR; Cook, 2009), a general alternative regression model can be written as $\mu(X)=G(B^\top X)$, where $G(\cdot)$ is an unknown function different from $m(\beta_0^\top x,\gamma_0)$ and $B$ is a $p\times q$ matrix (or vector) with an unknown number $q$ of columns. Note that $1\le q\le p$, and usually $q$ is much smaller than $p$. If $q<p$, the alternative model has a dimension reduction structure; when $q=p$, it is simply a general nonparametric model. To study the asymptotic properties of the proposed test, we consider the following sequence of models:

$$
H_{1n}:\ Y=m(\beta_0^\top X,\gamma_0)+C_n\,\Delta(B^\top X)+\varepsilon.
$$

The case $C_n=0$ corresponds to the null hypothesis $H_0$, and the alternative holds when $C_n\neq 0$. When $C_n$ is a fixed nonzero constant, $H_{1n}$ is a global alternative under $H_1$; when $C_n\to 0$, $H_{1n}$ specifies a sequence of local alternatives.
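The local-alternative sequence $H_{1n}$ can be sketched as a response generator in which the single knob $C_n$ moves between the null and the alternatives. The null model $m(t,\gamma)=\gamma+t$, the deviation $\Delta(v)=\|v\|^2$, the noise level and the function names are hypothetical choices for illustration; the paper's simulation designs are not shown in this snippet.

```python
import numpy as np

def m_linear(t, gamma):
    # Hypothetical null model m(t, gamma) = gamma + t (a linear single-index model).
    return gamma + t

def delta_quadratic(v):
    # Hypothetical deviation Delta(v) = ||v||^2, evaluated row-wise.
    return np.sum(v ** 2, axis=1)

def response_h1n(X, beta0, gamma0, B, Cn, sigma=0.5, seed=0):
    """Draw Y = m(beta0^T X, gamma0) + Cn * Delta(B^T X) + eps.

    Cn = 0 gives H0; a fixed Cn != 0 gives a global alternative;
    Cn -> 0 as n grows gives a sequence of local alternatives.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    index = X @ beta0                   # beta0^T x_i
    proj = (X @ B).reshape(n, -1)       # B^T x_i, shape (n, q); B may be a vector (q = 1)
    eps = rng.normal(scale=sigma, size=n)
    return m_linear(index, gamma0) + Cn * delta_quadratic(proj) + eps

# Usage: a local alternative at the rate Cn = n^{-1/2} h^{-1/4} with a hypothetical h = n^{-1/5}.
n = 400
X = np.random.default_rng(1).normal(size=(n, 4))
beta0 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # q = 2 directions
h = n ** (-1 / 5)
Cn = n ** -0.5 * h ** -0.25
y_local = response_h1n(X, beta0, 0.5, B, Cn)
```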
The test of Zhao and Xie (2018) can only detect local alternatives converging to the null model at rates $C_n$ for which $C_n^{-1}n^{-1/2}h^{-p/4}$ is bounded or tends to zero. Thus, $n^{-1/2}h^{-p/4}$ is the fastest rate at which their test can detect local alternatives. We will show that the proposed test can detect local alternatives converging at the rate $C_n=n^{-1/2}h^{-1/4}$ and is consistent for any $C_n$ with $C_n n^{1/2}h^{1/4}\to\infty$. In addition, following the arguments of Guo et al. (2016), the key to making the proposed test adaptive to the underlying model is an estimate $\hat B$ that converges to $\pm\beta_0/\|\beta_0\|$ under $H_0$ and to $B$ when $H_1$ is true. In this paper, we use the sufficient dimension reduction technique proposed by Zhang et al. (2012b) to obtain $\hat B$, and we systematically investigate its asymptotic properties under local alternatives. Details are given in the following sections.
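The gap between the two detection rates is easy to see numerically. The sketch below evaluates $n^{-1/2}h^{-p/4}$ and $n^{-1/2}h^{-1/4}$ for a hypothetical sample size $n=1000$ and a common bandwidth $h=0.2$. Holding $h$ fixed across $p$ is a simplification made only to isolate the factor $h^{-p/4}$; in practice the two tests would use different bandwidths.

```python
def zx_rate(n, h, p):
    # Local-alternative rate n^{-1/2} h^{-p/4} of the local smoothing test of Zhao and Xie (2018).
    return n ** -0.5 * h ** (-p / 4)

def adaptive_rate(n, h):
    # Rate n^{-1/2} h^{-1/4} stated for the proposed adaptive-to-model test.
    return n ** -0.5 * h ** -0.25

n, h = 1000, 0.2  # hypothetical sample size and common bandwidth
for p in (1, 2, 5, 10):
    print(f"p={p:2d}  Zhao-Xie rate={zx_rate(n, h, p):.4f}  adaptive rate={adaptive_rate(n, h):.4f}")
```

With these numbers the two rates coincide at $p=1$ (about 0.047), while at $p=10$ the factor $h^{-p/4}=5^{2.5}$ pushes the Zhao and Xie rate above 1, so alternatives of that size would not shrink at all.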

The paper is organized as follows. Section 2 describes the testing problem for variable-adjusted models and proposes an adaptive-to-model test procedure. Section 3 presents the large-sample properties of the proposed test. Sections 4 and 5 report simulation results and a real data application, respectively, to illustrate the method. The assumptions and proofs are deferred to Appendix A.


Test statistic construction

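To identify model (1), we assume that $\varepsilon$, $X$ and $U$ are mutually independent, $U\in[0,1]$, and $\psi(U)$ and $\phi_r(U)$ are positive functions satisfying

$$
E[\psi(U)]=1,\qquad E[\phi_r(U)]=1,\qquad r=1,\ldots,d.
$$

This means that there is no distortion effect on average, analogous to the condition $E[U]=0$ for the classical additive measurement error model $W=X+U$. Because of independence and positivity, $E[|\tilde Y|\mid U=u]=\psi(u)E[|Y|]$ and $E[|\tilde Y|]=E[\psi(U)]E[|Y|]=E[|Y|]$, and similarly for each $\tilde X_r$. Hence, for $r=1,\ldots,d$,

$$
\psi(u)=\frac{E[|\tilde Y|\mid U=u]}{E[|Y|]}=\frac{E[|\tilde Y|\mid U=u]}{E[|\tilde Y|]},\qquad
\phi_r(u)=\frac{E[|\tilde X_r|\mid U=u]}{E[|X_r|]}=\frac{E[|\tilde X_r|\mid U=u]}{E[|\tilde X_r|]}.
$$

Assume the observed data {(ỹi,x̃i,ui),i=1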
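The ratio identities suggest a plug-in calibration: smooth $|\tilde Y|$ and $|\tilde X_r|$ over $U$, divide by their sample means, and then undo the distortion. The following is a minimal sketch, assuming a Gaussian-kernel Nadaraya-Watson smoother, a hypothetical default bandwidth `h=0.1`, and hypothetical helper names `nw_smooth` and `calibrate`; the paper's own estimator and bandwidth choice are not shown in this snippet.

```python
import numpy as np

def nw_smooth(u_eval, u, z, h):
    """Nadaraya-Watson estimate of E[z | U = u] at each point of u_eval (Gaussian kernel)."""
    w = np.exp(-0.5 * ((u_eval[:, None] - u[None, :]) / h) ** 2)  # (m, n) kernel weights
    return (w @ z) / w.sum(axis=1)

def calibrate(y_tilde, x_tilde, u, d, h=0.1):
    """Plug-in estimates of psi(u_i) and phi_r(u_i) from the ratio identities,
    followed by the adjusted values y_hat = y_tilde / psi_hat and
    x_hat_r = x_tilde_r / phi_r_hat for the first d (distorted) columns."""
    abs_y = np.abs(y_tilde)
    psi_hat = nw_smooth(u, u, abs_y, h) / abs_y.mean()   # E[|Y~| | U = u_i] / mean |Y~|
    y_hat = y_tilde / psi_hat
    x_hat = np.array(x_tilde, dtype=float)               # copy; undistorted columns stay as observed
    for r in range(d):
        abs_x = np.abs(x_tilde[:, r])
        phi_hat = nw_smooth(u, u, abs_x, h) / abs_x.mean()
        x_hat[:, r] = x_tilde[:, r] / phi_hat
    return y_hat, x_hat, psi_hat
```

The adjusted values `y_hat` and `x_hat` can then stand in for the unobserved $(Y, X)$ when fitting the null model. The sketch forms an $n\times n$ kernel matrix, which is acceptable for a few thousand observations but would need binning or a faster smoother for larger samples.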

Asymptotic properties

In this section, we derive the asymptotic null distribution of the proposed test statistic $V_n(\hat\theta_n,\hat B)$ and prove its consistency under alternatives. Because the asymptotic properties of $\hat\theta_n$ and $\hat B$ affect the behavior of the test, we also investigate the asymptotic decomposition of $\hat\theta_n-\theta_0$ and discuss the consistency of $\hat B(\hat q)$ under local alternatives.
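The exact form of $V_n(\hat\theta_n,\hat B)$ is not shown in this snippet. For orientation only, here is a sketch of the kind of adaptive-to-model statistic used by Guo et al. (2016): a Zheng-type U-statistic in which residuals are smoothed over the projected predictors $\hat B^\top x$ instead of over $x$, so the kernel dimension is $\hat q$ rather than $p$. The residual input, the product Gaussian kernel, the bandwidth and the function name are assumptions, and this is not claimed to be the statistic of this paper.

```python
import numpy as np

def projected_zheng_statistic(resid, X, B_hat, h):
    """Zheng-type U-statistic smoothed over the projected predictors B_hat^T x:

        V = 1 / (n (n - 1)) * sum_{i != j} e_i e_j K_h(B_hat^T (x_i - x_j)),

    with K_h(v) = (2 pi)^{-q/2} exp(-||v / h||^2 / 2) / h^q and q = number of columns of B_hat.
    """
    n, p = X.shape
    Z = X @ np.asarray(B_hat, dtype=float).reshape(p, -1)  # projected predictors, shape (n, q)
    q = Z.shape[1]
    D = (Z[:, None, :] - Z[None, :, :]) / h                 # scaled pairwise differences, (n, n, q)
    K = np.exp(-0.5 * np.sum(D ** 2, axis=2)) / ((2.0 * np.pi) ** (q / 2) * h ** q)
    np.fill_diagonal(K, 0.0)                                 # exclude the i == j terms
    return float(resid @ K @ resid) / (n * (n - 1))

# Usage with stand-in residuals; under H0 the estimated direction would have q_hat = 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
resid = rng.normal(size=200)  # stand-in for residuals of the fitted null model on adjusted data
B_hat = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
stat = projected_zheng_statistic(resid, X, B_hat, h=200 ** (-1 / 5))
```

Because the kernel acts on $\hat B^\top x$, a one-dimensional $\hat B$ under the null keeps the smoothing one-dimensional, which is the mechanism behind the $n^{-1/2}h^{-1/4}$ rate discussed in the introduction.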

If all elements of $X$ are contaminated by distortion errors, Theorem 1 of Cui et al. (2009) shows that the nonlinear least squares estimator with

Numerical studies

This section presents three examples to check the finite-sample performance of the proposed test $T_n$. Example 1 uses a simple linear model as the null hypothesis to assess the effect of dimensionality. In this example, we compare the proposed test with the test of Zhao and Xie (2018), denoted $T_n^{ZX}$, which is not model-adaptive. In Example 2, we conduct a simulation study to illustrate that the pollution ratio of the predictors will not affect the consistency rate of the proposed

A real data example

In this section, we use the Boston house-price dataset (Harrison and Rubinfeld, 1978) to illustrate the proposed test. The dataset contains information about houses and their owners in the Boston area and is available at http://lib.stat.cmu.edu/datasets. Şentürk and Müller (2005a) analyzed the correlation between the median house price ($\tilde Y$) and the per capita crime rate by town ($\tilde X$), with the confounding effect of the proportion of the population with lower educational status ($U$). Delaigle et al. (2016)

References (36)

  • Cook, R.D., et al. (2002). Dimension reduction for conditional mean in regression. Ann. Statist.
  • Cook, R.D., et al. (1991). Comment. J. Amer. Statist. Assoc.
  • Cui, X., et al. (2009). Covariate-adjusted nonlinear regression. Ann. Statist.
  • Delaigle, A., et al. (2016). Nonparametric covariate-adjusted regression. Ann. Statist.
  • Escanciano, J.C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory.
  • Fuller, W.A. (2009). Measurement Error Models, Vol. 305.
  • Guo, X., et al. (2016). Model checking for parametric single-index models: a dimension reduction model-adaptive approach. J. R. Stat. Soc. Ser. B Stat. Methodol.
  • Huber, P.J. (1985). Projection pursuit. Ann. Statist.
1 Chuanlong Xie is an assistant professor at Jinan University, Guangzhou, China.

2 Lixing Zhu is a chair professor in the Department of Mathematics at Hong Kong Baptist University, Hong Kong, and a professor in the School of Statistics at Beijing Normal University, Beijing, China. Lixing Zhu’s research was supported by a grant from the University Grants Council of Hong Kong, Hong Kong, China and a grant from the Natural Science Foundation of China (11671042).
