A goodness-of-fit test for variable-adjusted models

doi:10.1016/j.csda.2019.01.018

Computational Statistics & Data Analysis

Volume 138, October 2019, Pages 27-48

https://doi.org/10.1016/j.csda.2019.01.018

Abstract

This research provides a projection-based test for checking the parametric single-index regression structure in variable-adjusted models. An adaptive-to-model strategy is employed, which helps the proposed test maintain the significance level better and makes it more powerful than existing tests. Under mild conditions, the proposed test behaves asymptotically like a test for the classical regression setup without distortion errors in the observations. Numerical studies with simulated and real data examine the finite-sample performance of the test.

Introduction

Regression models are widely used to describe the relationship between a response variable $Y$ and a $p$-dimensional predictor $X = (X_{1}, \dots, X_{p})^{\top}$. Generally, both $Y$ and $X$ are assumed to be observable. In some cases, however, obtaining the true values of $Y$ and $X$ is expensive or impossible, while surrogates $\tilde{Y}$ and $\tilde{X}$ are available. This situation motivates a variety of measurement error models. In the classical structure, the surrogate is the sum of the unobservable variable and a measurement error; Carroll et al. (2006), Cheng and Van Ness (1999) and Fuller (2009) are comprehensive references. Motivated by a real dataset, Şentürk and Müller (2005b) introduced another type of measurement error model, in which the observed response and predictors are distorted multiplicatively rather than additively. For that dataset, they investigated a linear relationship between the fibrinogen level and the serum transferrin level in haemodialysis patients. Because both levels are confounded by body mass index, they fitted a model in which this confounding effect is multiplicative through an unknown function of body mass index. This measurement error model has since been extended in several directions. For example, Cui et al. (2009) used a nonlinear regression model to relate the glomerular filtration rate to the serum creatinine level and, treating body surface area as the distorting variable, studied a variable-adjusted nonlinear regression model.

Consider a general variable-adjusted regression model in which the response $Y$, the predictor $X$ and their surrogates $\tilde{Y}$, $\tilde{X}$ are related by

$$Y = \mu(X) + \varepsilon, \qquad \tilde{Y} = \psi(U)\, Y, \qquad \tilde{X} = (\tilde{X}_{1}, \dots, \tilde{X}_{d}, X_{d+1}, \dots, X_{p})^{\top}, \tag{1}$$

with $\tilde{X}_{r} = \phi_{r}(U) X_{r}$ for $r = 1, \dots, d$. Here $d \leq p$, $U$ is an observable confounder, and $\psi(\cdot)$ and $\phi_{r}(\cdot)$ are unknown distorting functions. Most existing work on this model focuses on estimation. A natural idea is to estimate the true values of $Y$ and $X$ by adjusting the observed surrogates $\tilde{Y}$ and $\tilde{X}$. One then estimates $\mu(X)$ from the adjusted variables $\hat{Y}$ and $\hat{X}$; see, for example, Nguyen and Şentürk (2008), Cui et al. (2009), Zhang et al. (2012a), Zhang et al. (2013) and Delaigle et al. (2016). Goodness-of-fit testing for this model, by contrast, has received less attention. Zhang et al. (2015) proposed a $\sqrt{n}$-consistent test based on a residual-marked empirical process. Their test requires a time-consuming bootstrap procedure and a user-specified weight function. Zhao and Xie (2018) developed a local smoothing test for variable-adjusted models that is simple and easy to implement. The price is that it can only detect local alternatives converging to the null hypothesis at the rate $n^{-1/2} h^{-p/4}$, where $h$ is the kernel bandwidth. This rate is slower than the rate $n^{-1/2}$ of the test in Zhang et al. (2015) and deteriorates quickly as the dimension $p$ grows. Consequently, when the predictor dimension is large or the sample size is moderate, this test may not work well.
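
The data-generating mechanism in (1) can be sketched as follows. The specific choices are hypothetical and made only for illustration:

- a linear $\mu(X) = \beta^{\top} X$;
- Gaussian $X$ and $\varepsilon$;
- $U$ uniform on $[0, 1]$;
- $\psi(u) = 1 + 0.5\cos(2\pi u)$;
- a common $\phi_{r}(u) = 1 + 0.6(u - 0.5)$ for the first $d$ predictors.

Both distorting functions are positive and have mean one when $U$ is uniform on $[0, 1]$. The function name `simulate_variable_adjusted` is an invented helper, not code from the paper.

```python
import numpy as np


def simulate_variable_adjusted(n, beta, d, rng):
    """Draw one sample from a hypothetical instance of model (1).

    Illustrative assumptions (not taken from the paper):
      mu(X) = beta^T X, X ~ N(0, I_p), eps ~ N(0, 0.5^2), U ~ Uniform[0, 1],
      psi(u) = 1 + 0.5 cos(2 pi u), phi_r(u) = 1 + 0.6 (u - 0.5) for r <= d.
    Returns the observed triple (Y_tilde, X_tilde, U) and the latent (Y, X).
    """
    p = beta.shape[0]
    u = rng.uniform(0.0, 1.0, size=n)
    x = rng.standard_normal((n, p))
    eps = rng.normal(0.0, 0.5, size=n)
    y = x @ beta + eps                          # Y = mu(X) + eps, never observed
    psi = 1.0 + 0.5 * np.cos(2.0 * np.pi * u)   # response distortion psi(U) > 0
    phi = 1.0 + 0.6 * (u - 0.5)                 # predictor distortion phi_r(U) > 0
    y_tilde = psi * y                           # Y_tilde = psi(U) Y
    x_tilde = x.copy()
    x_tilde[:, :d] *= phi[:, None]              # only X_1, ..., X_d are distorted
    return y_tilde, x_tilde, u, y, x


rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0, 0.5, 0.0])
y_tilde, x_tilde, u, y, x = simulate_variable_adjusted(500, beta, d=2, rng=rng)
```

Only `y_tilde`, `x_tilde` and `u` would be available to the analyst; `y` and `x` are returned solely so that adjusted variables can be compared with the truth.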

In the absence of distortion errors, projection-based methods have been widely used to construct tests that exploit dimension reduction. These methods can be traced back to Zhu and Li (1998), who were motivated by projection pursuit (see Huber, 1985). Later developments include Escanciano (2006), Stute et al. (2008), Lavergne and Patilea (2008, 2012) and Guo et al. (2016); a related reference is Delgado and Escanciano (2016). Guo et al. (2016) proposed an adaptive-to-model test that substantially improves the performance of local smoothing tests. Motivated by this work, we combine the model-adaptation strategy with a construction tailored to variable-adjusted models to develop a projection-pursuit test. To exploit the dimension reduction structure of the hypothesized model, we check whether the model is single-index. The hypotheses are

$$H_{0}: P\{\mu(X) = m(\beta_{0}^{\top} X, \gamma_{0})\} = 1 \quad \text{for some } \theta_{0} = (\beta_{0}; \gamma_{0}) \in \Theta \subset \mathbb{R}^{p + p'}$$

versus

$$H_{1}: P\{\mu(X) = m(\beta^{\top} X, \gamma)\} < 1 \quad \text{for all } \theta = (\beta; \gamma) \in \Theta \subset \mathbb{R}^{p + p'},$$

where $m$ is a known function and $\beta$ and $\gamma$ are $p$- and $p'$-dimensional parameter vectors, respectively. From the viewpoint of sufficient dimension reduction (SDR; Cook, 2009), a general alternative regression model can be written as $\mu(X) = G(B^{\top} X)$. Here $G(\cdot)$ is an unknown function different from $m(\beta_{0}^{\top} X, \gamma_{0})$, and $B$ is a $p \times q$ matrix with an unknown number $q$ of columns (a vector when $q = 1$). Note that $1 \leq q \leq p$, and usually $q$ is much smaller than $p$. If $q < p$, the alternative model has a dimension reduction structure; when $q = p$, it is simply a general nonparametric model.
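
As a hypothetical illustration of this structure (not an example taken from the paper), suppose the null model is linear, $m(\beta^{\top} X) = \beta^{\top} X$, with no extra parameter $\gamma$ ($p' = 0$). Suppose further that the true regression function adds a quadratic term in a second direction $\beta_{1}$ that is not proportional to $\beta_{0}$:

$$
\begin{aligned}
\mu(X) &= \beta_{0}^{\top} X + a\,(\beta_{1}^{\top} X)^{2}, \qquad a \neq 0,\\
&= G(B^{\top} X), \qquad B = (\beta_{0}, \beta_{1}) \in \mathbb{R}^{p \times 2}, \qquad G(t_{1}, t_{2}) = t_{1} + a\, t_{2}^{2}.
\end{aligned}
$$

Here $q = 2$, so for $p \geq 3$ the alternative has a dimension reduction structure. Setting $a = 0$ recovers the null model, where the relevant direction reduces to $\beta_{0}$ alone and $q = 1$.
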

To study the asymptotic properties of the proposed test, we consider the sequence of models

$$H_{1n}: Y = m(\beta_{0}^{\top} X, \gamma_{0}) + C_{n} \Delta(B^{\top} X) + \varepsilon.$$

The case $C_{n} = 0$ corresponds to the null hypothesis $H_{0}$, and the alternative holds when $C_{n} \neq 0$. When $C_{n}$ is a fixed nonzero constant, $H_{1n}$ is a global alternative under $H_{1}$; when $C_{n} \to 0$, $H_{1n}$ specifies a sequence of local alternatives. The test of Zhao and Xie (2018) can detect only those local alternatives for which $C_{n} n^{1/2} h^{p/4}$ does not tend to zero. Thus $n^{-1/2} h^{-p/4}$ is the fastest rate at which local alternatives can converge to the null model and still be detected by their test. We will show that the proposed test can detect local alternatives converging at the rate $C_{n} = n^{-1/2} h^{-1/4}$ and is consistent for any $C_{n} \gg n^{-1/2} h^{-1/4}$. Moreover, following the arguments in Guo et al. (2016), the key to making the proposed test adaptive to the underlying model is an estimate $\hat{B}$ that converges to $\pm \beta_{0} / \|\beta_{0}\|$ under $H_{0}$ and to $B$ under $H_{1}$. In this paper, we obtain $\hat{B}$ with the sufficient dimension reduction technique proposed by Zhang et al. (2012b). We also systematically investigate its asymptotic properties under the local alternatives. Details are given below.
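
The following sketch makes the gap between the two detection rates concrete. It evaluates $n^{-1/2} h^{-p/4}$ (Zhao and Xie, 2018) and $n^{-1/2} h^{-1/4}$ (the rate stated above for the proposed test) for a few dimensions. The values $n = 400$ and $h = 0.5$ are arbitrary illustrative choices, and the same $h$ is used for both rates purely for comparison. The helper `detection_rates` is hypothetical.

```python
def detection_rates(n: int, h: float, p: int) -> tuple[float, float]:
    """Return the fastest detectable local-alternative rates (smaller is better).

    zx:       n^{-1/2} h^{-p/4}, the rate quoted for the local smoothing test
              of Zhao and Xie (2018); it grows with the dimension p.
    proposed: n^{-1/2} h^{-1/4}, the rate quoted for the proposed test;
              it does not depend on p.
    """
    zx = n ** -0.5 * h ** (-p / 4)
    proposed = n ** -0.5 * h ** (-1 / 4)
    return zx, proposed


# Illustrative values only: n = 400 observations and a common bandwidth h = 0.5.
for p in (1, 2, 4, 8):
    zx, proposed = detection_rates(n=400, h=0.5, p=p)
    print(f"p={p}: Zhao-Xie {zx:.4f}  proposed {proposed:.4f}")
# p=1: Zhao-Xie 0.0595  proposed 0.0595
# p=2: Zhao-Xie 0.0707  proposed 0.0595
# p=4: Zhao-Xie 0.1000  proposed 0.0595
# p=8: Zhao-Xie 0.2000  proposed 0.0595
```

With $p = 1$ the two rates coincide. As $p$ grows, the local smoothing rate worsens geometrically in $h^{-1/4}$, which is the dimensionality effect described above.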

The paper is organized as follows. Section 2 describes the testing problem for variable-adjusted models and proposes an adaptive-to-model test procedure. Section 3 presents the large-sample properties of the proposed test. Sections 4 and 5 illustrate the method with simulation results and a real data application, respectively. The assumptions and proofs are deferred to Appendix A.

Test statistic construction

To identify model (1), we make three assumptions. First, $\varepsilon$, $X$ and $U$ are mutually independent. Second, $U \in [0, 1]$. Third, $\psi(U)$ and $\phi_{r}(U)$ are positive functions satisfying

$$E[\psi(U)] = 1, \qquad E[\phi_{r}(U)] = 1, \qquad r = 1, \dots, d.$$

These conditions mean that there is no distortion effect on average, analogous to the condition $E[U] = 0$ for the classical additive measurement error model $W = X + U$. Under these assumptions, for $r = 1, \dots, d$,

$$\psi(u) = \frac{E[|\tilde{Y}| \mid U = u]}{E[|Y|]} = \frac{E[|\tilde{Y}| \mid U = u]}{E[|\tilde{Y}|]}, \qquad \phi_{r}(u) = \frac{E[|\tilde{X}_{r}| \mid U = u]}{E[|X_{r}|]} = \frac{E[|\tilde{X}_{r}| \mid U = u]}{E[|\tilde{X}_{r}|]}.$$

These identities hold because $|\tilde{Y}| = \psi(U)|Y|$ with $Y$ independent of $U$. Hence $E[|\tilde{Y}| \mid U = u] = \psi(u) E[|Y|]$ and $E[|\tilde{Y}|] = E[\psi(U)] E[|Y|] = E[|Y|]$. The argument for $\phi_{r}$ is the same. Assume the observed data $(\tilde{y}_{i}, \tilde{x}_{i}, u_{i})$, $i = 1$
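
The identities above suggest a plug-in calibration. One estimates $E[|\tilde{Y}| \mid U = u]$ by kernel smoothing, divides by the sample mean of $|\tilde{Y}|$, and rescales each observation; each distorted predictor is treated the same way. The sketch below assumes a Nadaraya-Watson smoother with a Gaussian kernel and a fixed, hand-picked bandwidth. The snippet does not show which estimator or bandwidth the paper uses, so `nw_smooth` and `calibrate` are hypothetical helpers for illustration. The simulated distortions are likewise illustrative choices that satisfy the positivity and mean-one conditions.

```python
import numpy as np


def nw_smooth(u_eval, u, v, bw):
    """Nadaraya-Watson estimate of E[v | U = u] at each point of u_eval (Gaussian kernel)."""
    w = np.exp(-0.5 * ((u_eval[:, None] - u[None, :]) / bw) ** 2)
    return (w @ v) / w.sum(axis=1)


def calibrate(y_obs, x_obs, u, d, bw=0.05):
    """Undo multiplicative distortion via psi(u) = E[|Y~| | U=u] / E[|Y~|].

    The first d columns of x_obs are treated as distorted and calibrated with
    the analogous identity for phi_r; the remaining columns are left untouched.
    """
    psi_hat = nw_smooth(u, u, np.abs(y_obs), bw) / np.mean(np.abs(y_obs))
    y_hat = y_obs / psi_hat
    x_hat = np.array(x_obs, dtype=float)          # copy so x_obs is not modified
    for r in range(d):
        a = np.abs(x_obs[:, r])
        x_hat[:, r] = x_obs[:, r] / (nw_smooth(u, u, a, bw) / np.mean(a))
    return y_hat, x_hat, psi_hat


# Illustrative data satisfying the identifiability conditions (hypothetical choices).
rng = np.random.default_rng(1)
n = 2000
u = rng.uniform(size=n)
x = rng.standard_normal((n, 3))
y = x @ np.array([1.0, -1.0, 0.5]) + 0.5 * rng.standard_normal(n)
psi_true = 1.0 + 0.5 * np.cos(2.0 * np.pi * u)    # positive, mean 1 over U ~ U[0, 1]
phi_true = 1.0 + 0.6 * (u - 0.5)                  # positive, mean 1 over U ~ U[0, 1]
y_obs = psi_true * y
x_obs = x.copy()
x_obs[:, 0] *= phi_true                           # d = 1: only X_1 is distorted

y_hat, x_hat, psi_hat = calibrate(y_obs, x_obs, u, d=1)
print(np.mean(np.abs(psi_hat - psi_true)))        # mean absolute error of psi_hat
```

Absolute values are used because $E[Y]$ or $E[X_{r}]$ may be zero, as for the centred Gaussian predictors here, in which case a ratio of plain means would be undefined.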

Asymptotic properties

In this section, we derive the asymptotic null distribution of the proposed test statistic $V_{n} ({\hat{θ}}_{n}, \hat{B})$ and prove its consistency under alternatives. As the asymptotic properties of ${\hat{θ}}_{n}$ and $\hat{B}$ will affect the behavior of the test, we also investigate the asymptotic decomposition of ${\hat{θ}}_{n} - θ_{0}$ and discuss the consistency of $\hat{B} (\hat{q})$ under local alternatives.

If all components of $X$ are contaminated by distortion errors, Theorem 1 in Cui et al. (2009) shows that the nonlinear least squares estimator with

Numerical studies

This section presents three examples that examine the finite-sample performance of the proposed test $T_{n}$. Example 1 uses a simple linear model as the null model to assess the effect of dimensionality. In this example, we compare the proposed test with the test of Zhao and Xie (2018), denoted $T_{n}^{ZX}$, which does not have the model-adaptation property. In Example 2, we conduct a simulation study to illustrate that the pollution ratio of the predictors will not affect the consistency rate of the proposed

A real data example

In this section, we use the Boston house-price dataset (Harrison and Rubinfeld, 1978) to illustrate the proposed test. The dataset contains information about houses and their owners in the Boston area and is available at http://lib.stat.cmu.edu/datasets. Şentürk and Müller (2005a) analyzed the correlation between the median house price ($\tilde{Y}$) and the per capita crime rate by town ($\tilde{X}$). They treated the proportion of the population of lower educational status ($U$) as the confounder. Delaigle et al. (2016)

References

Delgado, M.A., et al. (2016). Distribution-free tests of conditional moment inequalities. J. Statist. Plann. Inference.

Harrison, D., et al. (1978). Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag.

Lavergne, P., et al. (2008). Breaking the curse of dimensionality in nonparametric testing. J. Econometrics.

Zhang, J., et al. (2015). Checking the adequacy for a distortion errors-in-variables parametric regression model. Comput. Statist. Data Anal.

Zhang, J., et al. (2012). Nonlinear models with measurement errors subject to single-indexed distortion. J. Multivariate Anal.

Zhang, J., et al. (2012). On a dimension reduction regression with covariate adjustment. J. Multivariate Anal.

Zhao, J., et al. (2018). A nonparametric test for covariate-adjusted models. Statist. Probab. Lett.

Carroll, R.J., et al. (2006). Measurement Error in Nonlinear Models: A Modern Perspective.

Cheng, C.-L., et al. (1999). Statistical Regression with Measurement Error.

Cook, R.D. (2009). Regression Graphics: Ideas for Studying Regressions Through Graphics, Vol. 482.

Cook, R.D., et al. (2002). Dimension reduction for conditional mean in regression. Ann. Statist.

Cook, R.D., et al. (1991). Comment. J. Amer. Statist. Assoc.

Cui, X., et al. (2009). Covariate-adjusted nonlinear regression. Ann. Statist.

Delaigle, A., et al. (2016). Nonparametric covariate-adjusted regression. Ann. Statist.

Escanciano, J.C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory.

Fuller, W.A. (2009). Measurement Error Models, Vol. 305.

Guo, X., et al. (2016). Model checking for parametric single-index models: a dimension reduction model-adaptive approach. J. R. Stat. Soc. Ser. B Stat. Methodol.

Huber, P.J. (1985). Projection pursuit. Ann. Statist.

¹: Chuanlong Xie is an assistant professor at Jinan University, Guangzhou, China.

²: Lixing Zhu is a chair professor in the Department of Mathematics at Hong Kong Baptist University, Hong Kong, and a professor in the School of Statistics at Beijing Normal University, Beijing, China. Lixing Zhu's research was supported by a grant from the University Grants Council of Hong Kong, Hong Kong, China, and a grant from the Natural Science Foundation of China (11671042).