Semiparametric function-on-function quantile regression model with dynamic single-index interactions

doi:10.1016/j.csda.2023.107727

Computational Statistics & Data Analysis

Volume 182, June 2023, 107727


Abstract

In this paper we propose a new semiparametric function-on-function quantile regression model with time-dynamic single-index interactions. The model flexibly accounts for the nonlinear, time-dynamic interaction effects of multivariate longitudinal/functional covariates on the longitudinal response, so that most existing quantile regression models for longitudinal data are special cases of it. We approximate the bivariate nonparametric coefficient functions by tensor product B-splines, and employ a check loss minimization approach to estimate the bivariate coefficient functions and the index parameter vector. Under some mild conditions, we establish the asymptotic normality of the estimated single-index coefficients using a projection orthogonalization technique, and obtain the convergence rates of the estimated bivariate coefficient functions. Furthermore, we propose a score test to examine whether there are interaction effects between the covariates. The finite sample performance of the proposed method is illustrated by Monte Carlo simulations and an empirical data analysis.

Introduction

The single-index varying coefficient model (SVCM) has attracted much attention since it was proposed by Xia and Li (1999). The model is given by $Y = \sum_{k = 1}^{q} g_{k} (X^{⊤} β_{0}) Z_{k} + ε,$ where Y is the response and ${(X^{⊤}, Z^{⊤})}^{⊤}$ is a vector of covariates consisting of a p-dimensional vector $X = {(X_{1}, \dots, X_{p})}^{⊤}$ and a q-dimensional vector $Z = {(Z_{1}, \dots, Z_{q})}^{⊤}$ with $Z_{1} = 1$. As a semiparametric regression model, SVCM is a popular way to accommodate multivariate covariates while retaining model flexibility. There is an extensive literature on statistical inference for SVCM. For example, Fan et al. (2003) proposed an efficient backfitting algorithm to estimate SVCM. Xue and Wang (2012) developed statistical inference for SVCM using the empirical likelihood method. Ma and Song (2015) studied a more general single-index varying coefficient model in which $β_{0}$ can differ across the functions $g_{k}$. To assess how multiple environmental factors act jointly to modify individual genetic risk of complex disease, Liu et al. (2016) considered a partial linear varying multi-index coefficient model which includes SVCM as a special case. Zhao et al. (2017b) and Lv and Li (2022) studied quantile regression for SVCM.

The aforementioned works studied SVCM in the context of independent observations. As far as we know, there is limited literature on SVCM for longitudinal/functional data. However, with modern technology for data collection and storage, functional data have become increasingly available in many scientific fields, such as meteorology, chemistry, economics and epidemiology, and have thus gained considerable attention in the literature in recent years; see Wang et al. (2016), Zhu et al. (2019) and Li et al. (2021). Recently, to capture the dynamic interaction effects between population aging and socio-economic variables on the COVID-19 mortality rate, Liu et al. (2021) extended SVCM to the longitudinal data framework, and proposed a novel semiparametric single-index varying coefficient mean regression model with longitudinal response data: $Y (t) = \sum_{k = 1}^{q} g_{k} (X^{⊤} β_{0}, t) Z_{k} + ε (t), \qquad (1)$ where $g_{k} (\cdot, \cdot)$ are unknown bivariate coefficient functions and the random error $ε (\cdot)$ is assumed to be a stochastic process with mean zero. Model (1) is a natural extension of SVCM: through time-dynamic bivariate coefficient functions, it allows SVCM to hold at any given time t. It also covers many existing semiparametric models; see Jiang and Wang (2011), Zhu et al. (2012), and Li et al. (2017) for details on the relationship between model (1) and existing models.

Mean regression provides only limited information about the conditional distribution of the response given the covariates, and is well known to be very sensitive to heavy-tailed error distributions and to outliers in the response measurements. In contrast, quantile regression is more robust to outliers and heavy-tailed distributions. More importantly, quantile regression can produce a more complete description of the conditional response distribution, and uncover different structural relationships between the response and covariates at the upper or lower tails, which is often of significant interest in a variety of fields (Chen and Müller, 2012; Schaumburg, 2012; Zhang, 2018; Zhu et al., 2022). Hence, in this paper, we focus on the following quantile regression model: $Y (t) = \sum_{k = 1}^{q} g_{τ k} (X^{⊤} (t) β_{τ 0}, t) Z_{k} (t) + e_{τ} (t), \qquad (2)$ where $X (t) = {(X_{1} (t), \dots, X_{p} (t))}^{⊤}$, $Z (t) = {(Z_{1} (t), \dots, Z_{q} (t))}^{⊤}$ with $Z_{1} (t) \equiv 1$, and $P (e_{τ} (t) \leq 0 | X (t), Z (t), t) = τ$, for $τ \in (0, 1)$. We call (2) the semiparametric function-on-function quantile regression model with dynamic single-index interactions. Model (2) includes many existing quantile regression models for longitudinal data as special cases; see Wang et al. (2009), Leng and Zhang (2014), Zhao et al. (2017a), and Lin et al. (2020).
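As a short worked step, the quantile restriction on the error can be restated in terms of the response. Writing $Q_{τ} (\cdot | \cdot)$ for a τth conditional quantile (this notation is an assumption introduced here, not taken from the text), the condition $P (e_{τ} (t) \leq 0 | X (t), Z (t), t) = τ$ says that zero is a τth conditional quantile of $e_{τ} (t)$. Since the sum in the model is fixed once $X (t)$, $Z (t)$ and t are given, it follows that

$Q_{τ} (Y (t) | X (t), Z (t), t) = \sum_{k = 1}^{q} g_{τ k} (X^{⊤} (t) β_{τ 0}, t) Z_{k} (t).$

In other words, the coefficient functions and the index parameter describe the τth conditional quantile of the response rather than its conditional mean.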

To estimate the index parameter vector $β_{τ 0}$ as well as the bivariate coefficient functions $g_{τ k}$ in model (2), we first use tensor product B-splines to approximate the bivariate coefficient functions, and then employ a check loss minimization approach to estimate them. Theoretically, our study is complicated due to the challenge of dealing with the time-dynamic single-index semiparametric structure and the nonsmooth loss function. Under some mild conditions, we establish the asymptotic normality of the estimated single-index coefficients, and obtain the convergence rates of the estimated bivariate coefficient functions. In addition, we propose a score test to examine whether there exist interaction effects between the covariates.
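The two ingredients named above, a tensor product B-spline approximation and the check loss, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`check_loss`, `bspline_basis`, `clamped_knots`, `tensor_basis`, `objective`), the cubic degree, the equally spaced knots, and the rescaling of both arguments to [0, 1] are all choices made here for illustration. The check loss itself is the standard $ρ_{τ} (u) = u (τ - I (u < 0))$.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * np.where(u < 0, tau - 1.0, tau)

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval so the right endpoint is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, t.size - 1 - d))
        for j in range(t.size - 1 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - t[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (t[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

def clamped_knots(n_interior, degree=3):
    """ASSUMPTION: equally spaced interior knots on [0, 1], repeated boundary knots."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])

def tensor_basis(u, t, knots_u, knots_t, degree=3):
    """Row-wise tensor product of the bases in the index u and the time t."""
    Bu = bspline_basis(u, knots_u, degree)
    Bt = bspline_basis(t, knots_t, degree)
    return np.einsum("ij,ik->ijk", Bu, Bt).reshape(Bu.shape[0], -1)

def objective(theta, y, design, tau):
    """Average check loss of the residuals y - design @ theta."""
    return check_loss(y - design @ theta, tau).mean()

# Usage: a bivariate function g(u, t) is approximated by design @ theta.
rng = np.random.default_rng(0)
u, t = rng.uniform(size=200), rng.uniform(size=200)
knots = clamped_knots(2)
design = tensor_basis(u, t, knots, knots)
```

With two interior knots and cubic splines each univariate basis has six functions, so `design` has 36 columns. For a fixed index parameter the objective is convex in `theta` and could be handed to a linear programming or quantile regression solver; the choice of solver is left open here.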

The main contributions of our work, which are also the key differences between our paper and Liu et al. (2021), are three-fold. First, we extend model (1) to the quantile regression framework. Quantile regression is a valuable alternative to mean regression for analyzing longitudinal data. Compared to mean regression, quantile regression for model (1) is more technically challenging and has not yet been considered in the literature. Second, we consider the sparse and unbalanced longitudinal data case. Liu et al. (2021) limited their discussion to the dense and balanced longitudinal response data case. However, longitudinal data are often highly unbalanced because they are collected at irregular and possibly subject-specific time points. Third, we study a function-on-function model in which the response and covariates are all longitudinal/functional data. Liu et al. (2021) focused on a function-on-scalar model in which the covariates are scalar variables.

The rest of this paper is organized as follows. In Section 2, we introduce the estimation procedure, and establish the asymptotic properties of the resulting estimators. We propose a score test for testing the bivariate coefficient functions in Section 3. Section 4 gives the estimation algorithm and the tuning parameter selection method. Monte Carlo studies and analysis of a real dataset are presented in Sections 5 and 6, respectively. Some concluding remarks are given in Section 7. All proofs are deferred to Appendix A.

Section snippets

Estimation method

To simplify notation, we omit the subscript τ from $g_{τ k}$, $β_{τ 0}$ and $e_{τ}$ in model (2) wherever clear from the context, but we should bear in mind that those quantities are τ-specific. Assume that we have a random sample with n subjects from model (2). For the ith subject, $i = 1, \dots, n$, the response $Y_{i} (t)$ and the covariates $\{X_{i} (t), Z_{i} (t)\}$ are collected at time points $t_{i j}, j = 1, \dots, m_{i}$. Here $m_{i}$ is the total number of observations for the ith subject. Without loss of generality, we assume that $t_{i j} \in [0, 1]$ for all i

Hypothesis test

In this section, we propose a rank-score-based procedure for testing whether there is an interaction effect between $X (t)$ and $Z_{k} (t)$. Without loss of generality, we consider testing the null hypothesis that the first $1 \leq q_{1} < q$ bivariate coefficient functions $g_{k} (X^{⊤} (t) β_{0}, t)$ do not change with the index $X^{⊤} (t) β_{0}$: $H_{0} : g_{k} (X^{⊤} (t) β_{0}, t) = ζ_{k} (t), k = 1, \dots, q_{1},$ for all $t \in [0, 1]$, where $ζ_{k}$ are functions that depend only on t, versus the alternative hypothesis $H_{1} : \text{one or more functions } g_{k} (X^{⊤} (t) β_{0}, t), k = 1, \dots, q_{1}, \text{ depend on } X^{⊤} (t) β_{0}.$
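To make the null hypothesis concrete, substituting $H_{0}$ into model (2) (with the subscript τ omitted) splits the sum into a part that varies with time only and a part that still depends on the index. This is a direct substitution rather than a statement taken from the text:

$Y (t) = \sum_{k = 1}^{q_{1}} ζ_{k} (t) Z_{k} (t) + \sum_{k = q_{1} + 1}^{q} g_{k} (X^{⊤} (t) β_{0}, t) Z_{k} (t) + e (t).$

Under $H_{0}$ the first $q_{1}$ terms are ordinary time-varying coefficients, so $X (t)$ does not modify the effects of $Z_{1} (t), \dots, Z_{q_{1}} (t)$ on the response.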

Estimation algorithm

Minimizing the objective function (4) with respect to all unknown quantities is usually computationally difficult, because it requires constrained nonlinear programming. For this reason, we adopt a profile iterative procedure to estimate $β_{0}$ and $g_{k}$, for $k = 1, \dots, q$. The estimation algorithm is described in detail as follows.

Step 0 Obtain an initial estimator ${\hat{β}}^{(0)}$ of $β_{0}$, with ${‖ {\hat{β}}^{(0)} ‖}_{2} = 1$ and ${\hat{β}}_{1}^{(0)} > 0$. The details on how to get an initial estimator of $β_{0}$ can be found in Appendix A.

Step 1 Given $β_{(- 1)}$ , the
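The two constraints imposed on the initial estimator in Step 0, unit Euclidean norm and a positive first component, can be enforced on any nonzero candidate vector as in the sketch below. The helper name `normalize_index` is hypothetical, and how the candidate vector is obtained is not shown here because the source defers it to Appendix A.

```python
import numpy as np

def normalize_index(beta):
    """Rescale beta to unit Euclidean norm with a positive first component."""
    beta = np.asarray(beta, dtype=float)
    norm = np.linalg.norm(beta)
    if norm == 0.0:
        raise ValueError("beta must be nonzero")
    if beta[0] == 0.0:
        raise ValueError("first component must be nonzero to satisfy beta_1 > 0")
    beta = beta / norm
    # Flip the overall sign if needed so that the first component is positive.
    return beta if beta[0] > 0 else -beta

beta_init = normalize_index([-1.0, -2.0, -3.0])
```

Rescaling or flipping the sign of the index vector only relabels the first argument of the coefficient functions, which is why constraints of this kind are commonly used to make a single-index parameter identifiable.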

Simulation studies

In this section, we conduct simulation studies to illustrate the performance of our proposed methodology. We generate the sample data from model $Y_{i j} = \sum_{k = 1}^{3} g_{k} (X_{i j}^{⊤} β, t_{i j}) Z_{i j k} + e_{i j}, i = 1, \dots, n, j = 1, \dots, m_{i},$ where $β = {(β_{1}, β_{2}, β_{3})}^{⊤} = {(1, 2, 3)}^{⊤} / \sqrt{14}$, $g_{1} (X_{i j}^{⊤} β, t_{i j}) = (1 + 2 γ \sin (X_{i j}^{⊤} β)) \cos (0.3 t_{i j})$, $g_{2} (X_{i j}^{⊤} β, t_{i j}) = (1 + γ \sin (X_{i j}^{⊤} β)) t_{i j}^{2}$, $g_{3} (X_{i j}^{⊤} β, t_{i j}) = (1 + 0.5 X_{i j}^{⊤} β) \sin (t_{i j})$, with $γ \geq 0$, and $Z_{i j 1} = 1$, $(Z_{i j 2}, Z_{i j 3}) \sim N (0, Σ)$ with $Σ = {({0.5}^{| j - k |})}_{1 \leq j, k \leq 2}$, $X_{i j} = {(X_{i j 1}, X_{i j 2}, X_{i j 3})}^{⊤}$, and $X_{i j k}$ are independent of each other, with $X_{i j k} \sim \text{Uniform} (0, 1)$,
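The part of this design that is stated above can be sketched in Python as follows. The text is cut off before it specifies the number of observations per subject, the distribution of the time points, and the error distribution, so the choices below for $m_{i}$, $t_{i j}$ and $e_{i j}$ are placeholders assumed for illustration only; the function name `simulate` and the default `gamma=0.5` are likewise hypothetical.

```python
import numpy as np

def simulate(n, gamma=0.5, seed=0):
    """Draw one data set from the stated design; returns a dict of stacked arrays."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)
    Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])  # (0.5^{|j-k|}), 1 <= j, k <= 2
    out = {"id": [], "t": [], "X": [], "Z": [], "Y": []}
    for i in range(n):
        m_i = int(rng.integers(5, 11))           # ASSUMPTION: 5 to 10 visits per subject
        t = np.sort(rng.uniform(0.0, 1.0, m_i))  # ASSUMPTION: uniform times on [0, 1]
        X = rng.uniform(0.0, 1.0, size=(m_i, 3))
        Z = np.column_stack([
            np.ones(m_i),
            rng.multivariate_normal(np.zeros(2), Sigma, size=m_i),
        ])
        index = X @ beta
        g = np.column_stack([
            (1 + 2 * gamma * np.sin(index)) * np.cos(0.3 * t),
            (1 + gamma * np.sin(index)) * t ** 2,
            (1 + 0.5 * index) * np.sin(t),
        ])
        e = rng.standard_normal(m_i)             # ASSUMPTION: placeholder N(0, 1) errors
        out["id"].append(np.full(m_i, i))
        out["t"].append(t)
        out["X"].append(X)
        out["Z"].append(Z)
        out["Y"].append((g * Z).sum(axis=1) + e)
    data = {name: np.concatenate(parts) for name, parts in out.items()}
    data["beta"] = beta
    return data

data = simulate(n=20)
```

Setting `gamma=0` removes the dependence of $g_{1}$ and $g_{2}$ on the index while $g_{3}$ still depends on it. The placeholder errors have median zero, so they match the quantile restriction of the model only at $τ = 0.5$; other quantile levels would require shifting the errors so that their τth quantile is zero.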

Real data analysis

In this section we apply the proposed methodology to analyze a CD4 data set from a Multi-center AIDS Cohort Study. HIV destroys CD4 cells, which play a vital role in the immune system. The CD4 cell count is thus a good biomarker indicating the health status of HIV infected patients. In this data set, 2376 CD4 observations on 369 subjects were made. Each patient's CD4 cell number was measured repeatedly from 3 years before to 6 years after seroconversion. To see more descriptions related to this

Discussion

In this article, we propose a new semiparametric function-on-function quantile regression model with time-dynamic single-index interactions, which includes many existing quantile regression models for longitudinal data as special cases. The asymptotic properties of the proposed estimators and asymptotic null distribution of the test statistic are established under some regularity conditions. In addition, Monte Carlo simulations and real data analysis indicate that the proposed method performs

Acknowledgements

The authors thank Professor Byeong Park, the Associate Editor and two anonymous referees for their constructive comments, which have greatly improved this paper. Zhu's research was partially supported by the National Natural Science Foundation of China (No. 12201218). Zhang's research was partially supported by the Key University Science Research Project of Jiangsu Province (No. 21KJB110023). Li's research was partially supported by the US National Institutes of Health (No. 5R21AG058198).

References (37)

X. He et al. On parameters of increasing dimensions. J. Multivar. Anal. (2000)

F. Lin et al. Weighted quantile regression in varying-coefficient model with longitudinal data. Comput. Stat. Data Anal. (2020)

S. Ma et al. Partially linear single index models for repeated measurements. J. Multivar. Anal. (2014)

J. Schaumburg. Predicting extreme value at risk: nonparametric quantile regression with refinements from extreme value theory. Comput. Stat. Data Anal. (2012)

W. Zhao et al. GEE analysis for longitudinal single-index quantile regression. J. Stat. Plan. Inference (2017)

H. Zhu et al. Estimation and testing for partially functional linear errors-in-variables models. J. Multivar. Anal. (2019)

J. Chen et al. Semiparametric GEE analysis in partially linear single-index models for longitudinal data. Ann. Stat. (2015)

K. Chen et al. Conditional quantile analysis when covariates are functions, with application to growth data. J. R. Stat. Soc., Ser. B, Stat. Methodol. (2012)

X. Cui et al. The EFM approach for single-index models. Ann. Stat. (2011)

C. de Boor. A Practical Guide to Splines (2001)

J. Fan et al. Adaptive varying-coefficient linear models. J. R. Stat. Soc., Ser. B, Stat. Methodol. (2003)

X. Feng et al. Estimation and testing of varying coefficients in quantile regression. J. Am. Stat. Assoc. (2016)

P. Hall et al. On the distribution of a studentized quantile. J. R. Stat. Soc. B (1988)

W. Hendricks et al. Hierarchical spline models for conditional quantiles and the demand for electricity. J. Am. Stat. Assoc. (1992)

J.Z. Huang et al. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat. Sin. (2004)

C.-R. Jiang et al. Functional single index models for longitudinal data. Ann. Stat. (2011)

K. Knight. Limiting distributions for L1 regression estimators under general conditions. Ann. Stat. (1998)

C. Leng et al. Smoothing combined estimating equations in quantile regression for longitudinal data. Stat. Comput. (2014)
