Semiparametric function-on-function quantile regression model with dynamic single-index interactions

https://doi.org/10.1016/j.csda.2023.107727

Abstract

In this paper we propose a new semiparametric function-on-function quantile regression model with time-dynamic single-index interactions. The model flexibly accommodates nonlinear time-dynamic interaction effects of multivariate longitudinal/functional covariates on the longitudinal response, and most existing quantile regression models for longitudinal data are special cases of it. We approximate the bivariate nonparametric coefficient functions by tensor product B-splines and employ a check loss minimization approach to estimate the bivariate coefficient functions and the index parameter vector. Under some mild conditions, we establish the asymptotic normality of the estimated single-index coefficients using a projection orthogonalization technique, and obtain the convergence rates of the estimated bivariate coefficient functions. Furthermore, we propose a score test to examine whether there are interaction effects between the covariates. The finite sample performance of the proposed method is illustrated by Monte Carlo simulations and an empirical data analysis.

Introduction

The single-index varying coefficient model (SVCM) has attracted much attention since it was proposed by Xia and Li (1999). The model is given by
$$Y=\sum_{k=1}^{q} g_k(X^\top\beta_0)Z_k+\varepsilon,$$
where $Y$ is the response and $(X^\top,Z^\top)^\top$ is a vector of covariates consisting of a $p$-dimensional vector $X=(X_1,\dots,X_p)^\top$ and a $q$-dimensional vector $Z=(Z_1,\dots,Z_q)^\top$ with $Z_1=1$. As a semiparametric regression model, SVCM is a popular way to accommodate multivariate covariates while retaining model flexibility. There is an extensive literature on statistical inference for SVCM. For example, Fan et al. (2003) proposed an efficient backfitting algorithm to estimate SVCM. Xue and Wang (2012) developed statistical inference for SVCM using the empirical likelihood method. Ma and Song (2015) studied a more general single-index varying coefficient model in which the index vector $\beta_0$ may differ across the functions $g_k$. To assess how multiple environmental factors act jointly to modify individual genetic risk of complex disease, Liu et al. (2016) considered a partial linear varying multi-index coefficient model that includes SVCM as a special case. Zhao et al. (2017b) and Lv and Li (2022) studied quantile regression for SVCM.

The aforementioned works studied SVCM for independent observations. As far as we know, the literature on SVCM for longitudinal/functional data is limited. However, with modern technology for data collection and storage, functional data have become increasingly available in many scientific fields, such as meteorology, chemistry, economics and epidemiology, and have thus gained considerable attention in recent years; see Wang et al. (2016), Zhu et al. (2019) and Li et al. (2021). Recently, to capture the dynamic interaction effects between population aging and socio-economic variables on the COVID-19 mortality rate, Liu et al. (2021) extended SVCM to the longitudinal data framework and proposed a novel semiparametric single-index varying coefficient mean regression model with longitudinal response data:
$$Y(t)=\sum_{k=1}^{q} g_k(X^\top\beta_0,t)Z_k+\varepsilon(t), \tag{1}$$
where the $g_k(\cdot,\cdot)$ are unknown bivariate coefficient functions and the random error $\varepsilon(\cdot)$ is assumed to be a mean-zero stochastic process. Model (1) naturally extends SVCM by letting it hold at any given time $t$ through time-dynamic bivariate coefficient functions, and it covers many existing semiparametric models; see Jiang and Wang (2011), Zhu et al. (2012) and Li et al. (2017) for details on the relationship between model (1) and existing models.

Mean regression provides only limited information about the conditional distribution of the response given the covariates, and it is well known to be very sensitive to heavy-tailed error distributions and outliers in the response. In contrast, quantile regression is more robust to outliers and heavy-tailed distributions. More importantly, quantile regression gives a more complete description of the conditional response distribution and can uncover different structural relationships between the response and covariates in the upper or lower tails, which is often of significant interest in a variety of fields (Chen and Müller, 2012; Schaumburg, 2012; Zhang, 2018; Zhu et al., 2022). Hence, in this paper, we focus on the following quantile regression model:
$$Y(t)=\sum_{k=1}^{q} g_{\tau k}\big(X(t)^\top\beta_{\tau 0},t\big)Z_k(t)+e_\tau(t), \tag{2}$$
where $X(t)=(X_1(t),\dots,X_p(t))^\top$, $Z(t)=(Z_1(t),\dots,Z_q(t))^\top$ with $Z_1(t)\equiv 1$, and $P\{e_\tau(t)\le 0\mid X(t),Z(t),t\}=\tau$ for $\tau\in(0,1)$. We call (2) the semiparametric function-on-function quantile regression model with dynamic single-index interactions. Model (2) includes many existing quantile regression models for longitudinal data as special cases; see Wang et al. (2009), Leng and Zhang (2014), Zhao et al. (2017a) and Lin et al. (2020).
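The condition $P\{e_\tau(t)\le 0\mid\cdot\}=\tau$ is what makes check loss minimization the natural fitting criterion: the $\tau$-th quantile minimizes the expected check loss $\rho_\tau(u)=u\{\tau-I(u<0)\}$. The sketch below is an illustration only (Python/NumPy, hypothetical function names); it evaluates $\rho_\tau$ and confirms numerically that minimizing the empirical check loss over a constant recovers a sample $\tau$-quantile.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def constant_quantile_fit(y, tau):
    """Return the constant c minimising sum_i rho_tau(y_i - c).

    The objective is convex and piecewise linear in c with kinks at the
    data points, so some data point attains the minimum; we search them.
    """
    y = np.asarray(y, dtype=float)
    losses = np.array([check_loss(y - c, tau).sum() for c in y])
    return y[np.argmin(losses)]


print(constant_quantile_fit([5, 1, 4, 2, 3], 0.5))  # sample median: 3.0
```

In model (2) the constant is replaced by $\sum_k g_{\tau k}(X(t)^\top\beta_{\tau 0},t)Z_k(t)$, but the loss being minimized is the same.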

To estimate the index parameter vector $\beta_{\tau 0}$ and the bivariate coefficient functions $g_{\tau k}$ in model (2), we first approximate the bivariate coefficient functions by tensor product B-splines and then estimate them by check loss minimization. The theoretical analysis is complicated by the time-dynamic single-index semiparametric structure and the nonsmooth loss function. Under some mild conditions, we establish the asymptotic normality of the estimated single-index coefficients and obtain the convergence rates of the estimated bivariate coefficient functions. In addition, we propose a score test to examine whether there are interaction effects between the covariates.
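A tensor product B-spline approximation writes each coefficient function as $g_k(u,t)\approx\sum_{l}\sum_{m}\theta_{klm}B_l(u)B_m(t)$, so for a fixed index vector the model is linear in the spline coefficients $\theta_{klm}$ and the check loss problem becomes an ordinary linear quantile regression on the columns $B_l(X(t)^\top\beta)B_m(t)Z_k(t)$. The minimal sketch below builds that basis with the Cox-de Boor recursion. The cubic degree, the interior knots and the range used for the index $u$ are assumptions for illustration; the paper's choices are not given in this excerpt, and the helper names are hypothetical.

```python
import numpy as np


def bspline_basis(x, interior_knots, degree=3, lo=0.0, hi=1.0):
    """Evaluate every B-spline basis function of the given degree at x.

    Boundary knots lo and hi are repeated (degree + 1) times; the result
    has shape (len(x), len(interior_knots) + degree + 1).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    knots = np.concatenate([
        np.full(degree + 1, lo),
        np.asarray(interior_knots, dtype=float),
        np.full(degree + 1, hi),
    ])
    n_int = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval [k_j, k_{j+1}).
    B = np.zeros((x.size, n_int))
    for j in range(n_int):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Assign x == hi to the last non-degenerate interval so [lo, hi] is covered.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[x == hi, last] = 1.0
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, n_int - d))
        for j in range(n_int - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_next[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = B_next
    return B


def tensor_product_basis(u, t, knots_u, knots_t, u_range, degree=3):
    """Row-wise tensor product W[i, (l, m)] = B_l(u_i) * B_m(t_i), t in [0, 1]."""
    Bu = bspline_basis(u, knots_u, degree, lo=u_range[0], hi=u_range[1])
    Bt = bspline_basis(t, knots_t, degree, lo=0.0, hi=1.0)
    return np.einsum("il,im->ilm", Bu, Bt).reshape(len(Bu), -1)
```

Because each univariate basis sums to one on its range, every row of the tensor product basis also sums to one, which is a convenient sanity check.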

The main contributions of our work, which are also the key differences between our paper and Liu et al. (2021), are threefold. First, we extend model (1) to the quantile regression framework. Quantile regression is a valuable alternative to mean regression for analyzing longitudinal data; compared with mean regression, quantile regression for model (1) is more technically challenging and has not yet been considered in the literature. Second, we consider sparse and unbalanced longitudinal data, whereas Liu et al. (2021) limited their discussion to dense and balanced longitudinal response data. In practice, longitudinal data are often highly unbalanced because they are collected at irregular and possibly subject-specific time points. Third, we study a function-on-function model in which the response and the covariates are all longitudinal/functional data, whereas Liu et al. (2021) focused on a function-on-scalar model in which the covariates are scalar variables.

The rest of this paper is organized as follows. In Section 2, we introduce the estimation procedure, and establish the asymptotic properties of the resulting estimators. We propose a score test for testing the bivariate coefficient functions in Section 3. Section 4 gives the estimation algorithm and the tuning parameter selection method. Monte Carlo studies and analysis of a real dataset are presented in Sections 5 and 6, respectively. Some concluding remarks are given in Section 7. All proofs are deferred to Appendix A.

Section snippets

Estimation method

To simplify notation, we omit the subscript $\tau$ from $g_{\tau k}$, $\beta_{\tau 0}$ and $e_\tau$ in model (2) wherever it is clear from the context, but these quantities remain $\tau$-specific. Assume that we have a random sample of $n$ subjects from model (2). For the $i$th subject, $i=1,\dots,n$, the response $Y_i(t)$ and the covariates $\{X_i(t),Z_i(t)\}$ are collected at time points $t_{ij}$, $j=1,\dots,m_i$, where $m_i$ is the total number of observations for the $i$th subject. Without loss of generality, we assume that $t_{ij}\in[0,1]$ for all $i$

Hypothesis test

In this section, we propose a rank-score-based procedure for testing whether there is an interaction effect between $X(t)$ and $Z_k(t)$. Without loss of generality, we test the null hypothesis that the first $q_1$ ($1\le q_1<q$) bivariate coefficient functions $g_k(X(t)^\top\beta_0,t)$ do not change with the index $X(t)^\top\beta_0$:
$$H_0: g_k(X(t)^\top\beta_0,t)=\zeta_k(t),\quad k=1,\dots,q_1,\ \text{for all } t\in[0,1],$$
where the $\zeta_k$ are functions of $t$ only, versus the alternative hypothesis
$$H_1: \text{one or more of the functions } g_k(X(t)^\top\beta_0,t),\ k=1,\dots,q_1,\ \text{depend on } X(t)^\top\beta_0.$$
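Rank-score tests in quantile regression are typically built from the quantile score $\psi_\tau(r)=\tau-I(r<0)$ of residuals $r$ from a fit under $H_0$; under the null these scores have mean zero, so a large weighted sum of them signals misfit. The sketch below (hypothetical names) computes $\psi_\tau$ and the unnormalized score vector $S=\sum_{i,j}\psi_\tau(r_{ij})a_{ij}$, where the rows $a_{ij}$ are user-supplied test directions, for instance spline terms in the index that are dropped under $H_0$. It is a generic building block only: the paper's standardization of $S$ and the null distribution of its statistic are not given in this excerpt and are not reproduced here.

```python
import numpy as np


def quantile_score(residuals, tau):
    """Quantile score psi_tau(r) = tau - 1{r < 0}, elementwise."""
    r = np.asarray(residuals, dtype=float)
    return tau - (r < 0).astype(float)


def score_vector(residuals, A, tau):
    """Unnormalised score S = sum_i psi_tau(r_i) * a_i.

    residuals: residuals from the fit under H0, shape (N,).
    A: rows a_i are hypothetical test directions, shape (N, d).
    """
    psi = quantile_score(residuals, tau)
    return psi @ np.asarray(A, dtype=float)
```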

Estimation algorithm

Minimizing objective function (4) with respect to all unknown quantities simultaneously is computationally difficult because it requires constrained nonlinear programming. We therefore adopt a profile iterative procedure to estimate $\beta_0$ and $g_k$, $k=1,\dots,q$. The estimation algorithm is as follows.

Step 0. Obtain an initial estimator $\hat\beta^{(0)}$ of $\beta_0$ with $\|\hat\beta^{(0)}\|_2=1$ and $\hat\beta^{(0)}_1>0$. Details on how to obtain the initial estimator of $\beta_0$ are given in Appendix A.

Step 1. Given $\beta^{(1)}$, the
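Step 0 imposes $\|\beta\|_2=1$ and $\beta_1>0$ because the single index is identifiable only up to scale and sign: rescaling $\beta$ by any $c\neq 0$ can be absorbed into the unknown $g_k$. A minimal sketch (hypothetical helper name) of mapping any nonzero candidate vector onto this constraint set, as one might do after each update of $\beta$, is:

```python
import numpy as np


def normalize_index(beta):
    """Rescale beta so that ||beta||_2 = 1 and beta[0] > 0 (identifiability)."""
    beta = np.asarray(beta, dtype=float)
    norm = np.linalg.norm(beta)
    if norm == 0:
        raise ValueError("index vector must be nonzero")
    if beta[0] == 0:
        raise ValueError("first component must be nonzero to fix the sign")
    # Flip the sign if needed so the first component is positive.
    return np.sign(beta[0]) * beta / norm
```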

Simulation studies

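In this section, we conduct simulation studies to illustrate the performance of the proposed methodology. We generate the sample data from the model
$$Y_{ij}=\sum_{k=1}^{3} g_k(X_{ij}^\top\beta,t_{ij})Z_{ijk}+e_{ij},\quad i=1,\dots,n,\ j=1,\dots,m_i,$$
where $\beta=(\beta_1,\beta_2,\beta_3)^\top=(1,2,3)^\top/\sqrt{14}$, $g_1(X_{ij}^\top\beta,t_{ij})=\{1+2\gamma\sin(X_{ij}^\top\beta)\}\cos(0.3t_{ij})$, $g_2(X_{ij}^\top\beta,t_{ij})=\{1+\gamma\sin(X_{ij}^\top\beta)\}t_{ij}^2$, $g_3(X_{ij}^\top\beta,t_{ij})=(1+0.5X_{ij}^\top\beta)\sin(t_{ij})$, with $\gamma\ge 0$, and $Z_{ij1}=1$, $(Z_{ij2},Z_{ij3})^\top\sim N(0,\Sigma)$ with $\Sigma=(0.5^{|j-k|})_{1\le j,k\le 2}$, $X_{ij}=(X_{ij1},X_{ij2},X_{ij3})^\top$, and the $X_{ijk}$ are independent of each other with $X_{ijk}\sim\text{Uniform}(0,1)$,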
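The design above can be sketched as a data generator. The index vector, the three coefficient functions, the distribution of $Z_{ij}$ and of $X_{ijk}$ follow the text; the number of observations per subject, the observation times and the error distribution are cut off in this excerpt, so the choices below ($m_i$ uniform on $\{5,\dots,10\}$, $t_{ij}\sim\text{Uniform}(0,1)$, and $e_{ij}=\varepsilon_{ij}-\Phi^{-1}(\tau)$ with standard normal $\varepsilon_{ij}$, so that $P(e_{ij}\le 0)=\tau$) are assumptions, not the paper's settings. The function name and its parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate(n=100, gamma=0.5, tau=0.5, m_range=(5, 10), seed=0):
    """Draw one data set from the simulation design (partly ASSUMED).

    From the text: beta, g_1-g_3, Z_ij1 = 1, (Z_ij2, Z_ij3) ~ N(0, Sigma)
    with Sigma = (0.5^|j-k|), and X_ijk iid Uniform(0, 1).
    ASSUMED: m_i uniform on {5, ..., 10}, t_ij iid Uniform(0, 1), and
    e_ij = eps_ij - Phi^{-1}(tau) with eps_ij iid N(0, 1), so P(e_ij <= 0) = tau.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)  # unit-norm index vector
    Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
    q_tau = NormalDist().inv_cdf(tau)
    ids, ts, Xs, Zs, ys = [], [], [], [], []
    for i in range(n):
        m_i = int(rng.integers(m_range[0], m_range[1] + 1))
        t = np.sort(rng.uniform(0.0, 1.0, size=m_i))
        X = rng.uniform(0.0, 1.0, size=(m_i, 3))
        Z = np.column_stack([np.ones(m_i),
                             rng.multivariate_normal(np.zeros(2), Sigma, size=m_i)])
        u = X @ beta  # single index X_ij^T beta
        g = np.column_stack([
            (1 + 2 * gamma * np.sin(u)) * np.cos(0.3 * t),
            (1 + gamma * np.sin(u)) * t ** 2,
            (1 + 0.5 * u) * np.sin(t),
        ])
        e = rng.standard_normal(m_i) - q_tau
        ids.append(np.full(m_i, i))
        ts.append(t)
        Xs.append(X)
        Zs.append(Z)
        ys.append((g * Z).sum(axis=1) + e)
    return {"id": np.concatenate(ids), "t": np.concatenate(ts),
            "X": np.vstack(Xs), "Z": np.vstack(Zs), "y": np.concatenate(ys)}
```

Setting `gamma=0` makes $g_1$ and $g_2$ functions of $t$ only, while $g_3$ still depends on the index, which is the kind of configuration a test of interaction effects for the first two coefficient functions would target.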

Real data analysis

In this section, we apply the proposed methodology to a CD4 data set from the Multicenter AIDS Cohort Study. HIV destroys CD4 cells, which play a vital role in the immune system, so the CD4 cell count is a good biomarker of the health status of HIV-infected patients. The data set contains 2376 CD4 measurements on 369 subjects. Each patient's CD4 cell count was measured repeatedly from 3 years before to 6 years after seroconversion. For more descriptions related to this

Discussion

In this article, we propose a new semiparametric function-on-function quantile regression model with time-dynamic single-index interactions, which includes many existing quantile regression models for longitudinal data as special cases. The asymptotic properties of the proposed estimators and the asymptotic null distribution of the test statistic are established under some regularity conditions. In addition, Monte Carlo simulations and real data analysis indicate that the proposed method performs

Acknowledgements

The authors thank Professor Byeong Park, the Associate Editor and two anonymous referees for their constructive comments, which have greatly improved this paper. Zhu's research was partially supported by the National Natural Science Foundation of China (No. 12201218). Zhang's research was partially supported by the Key University Science Research Project of Jiangsu Province (No. 21KJB110023). Li's research was partially supported by the US National Institutes of Health (No. 5R21AG058198).

References (37)

  • J. Fan et al. Adaptive varying-coefficient linear models. J. R. Stat. Soc., Ser. B, Stat. Methodol. (2003)
  • X. Feng et al. Estimation and testing of varying coefficients in quantile regression. J. Am. Stat. Assoc. (2016)
  • P. Hall et al. On the distribution of a studentized quantile. J. R. Stat. Soc., Ser. B (1988)
  • W. Hendricks et al. Hierarchical spline models for conditional quantiles and the demand for electricity. J. Am. Stat. Assoc. (1992)
  • J.Z. Huang et al. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat. Sin. (2004)
  • C.-R. Jiang et al. Functional single index models for longitudinal data. Ann. Stat. (2011)
  • K. Knight. Limiting distributions for L1 regression estimators under general conditions. Ann. Stat. (1998)
  • C. Leng et al. Smoothing combined estimating equations in quantile regression for longitudinal data. Stat. Comput. (2014)