Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: J Comput Graph Stat. 2021 Aug 4;31(1):219–230. doi: 10.1080/10618600.2021.1950006

Fast Univariate Inference for Longitudinal Functional Models

Erjia Cui 1, Andrew Leroux 2, Ekaterina Smirnova 3, Ciprian M Crainiceanu 4
PMCID: PMC9197085  NIHMSID: NIHMS1761166  PMID: 35712524

Abstract

We propose fast univariate inferential approaches for longitudinal Gaussian and non-Gaussian functional data. The approach consists of three steps: (1) fit massively univariate pointwise mixed effects models; (2) apply any smoother along the functional domain; and (3) obtain joint confidence bands using analytic approaches for Gaussian data or a bootstrap of study participants for non-Gaussian data. Methods are motivated by two applications: (1) Diffusion Tensor Imaging (DTI) measured at multiple visits along the corpus callosum of multiple sclerosis (MS) patients; and (2) physical activity data measured by body-worn accelerometers for multiple days. An extensive simulation study indicates that model fitting and inference are accurate and much faster than existing approaches. Moreover, the proposed approach was the only one that was computationally feasible for the physical activity data application. Methods are accompanied by R software, though the method is “read-and-use”, as it can be implemented by any analyst who is familiar with mixed effects model software.

Keywords: longitudinal functional data, mixed model, DTI, wearable devices

1. Introduction

Longitudinal high dimensional data have become ubiquitous. For example, a Diffusion Tensor Imaging (DTI) study (Greven et al., 2010; Goldsmith et al., 2011, 2012; Scheipl et al., 2015) collected fractional anisotropy (FA) at multiple locations along the corpus callosum at multiple visits for each study participant. FA is a measure of water diffusion direction that is thought to be associated with white matter integrity and myelination. Figure 1 displays the FA measures along the corpus callosum for two study participants (left and right panels). Each curve represents data collected during a particular visit (color-coded according to visit number). The visit time, covariates, and health outcomes are collected at each visit but are not displayed; for more details see Goldsmith et al. (2012). Visual inspection of the left panel does not indicate an obvious temporal trend across visits, while each FA profile exhibits substantial within-visit noise. The FA profiles are less noisy for the second study participant (right panel), though the much lower FA values at the second visit could indicate substantial technical or biological variability. Given such data, the scientific goal is to quantify how FA at each location of the corpus callosum changes between visits and how these changes are associated with study-participant characteristics. To conceptualize the data structure, we denote by Yij(s) the FA value for study participant i at visit j at time tij and location sS of the corpus callosum.

Fig. 1.

Fig. 1

The fractional anisotropy (FA) tract profiles for the corpus callosum (functional domain) of two study participants in the DTI study. Left panel: ID 2017. Right Panel: ID 2085. For each study participant, each curve represents the tract profiles at one longitudinal visit. The visit number is color coded.

Another example is the National Health and Nutrition Examination Survey (NHANES) (Leroux et al., 2019; Smirnova et al., 2019; Cui et al., 2020), where minute-level activity counts (AC), a proprietary measure of physical activity (PA), were collected for up to seven consecutive days by hip-worn accelerometers. Data can be further transformed at the minute level to an active/inactive indicator depending on whether the observed AC is above/below a threshold. Here we use the threshold of 100, which was proposed by Matthews et al. (2008) for the NHANES hip accelerometry data and has been used extensively in the literature (Koster et al., 2012). Therefore, the data are binary (active/inactive), functional (minute-level measurements during the day), and multilevel (multiple days). To conceptualize the data structure, Yij(s) denotes the binary indicator of whether a study participant i was active on day j at minute sS={1,,1440}. For the purpose of this paper, we are interested in studying the association between Yij(s) and sex, age, day number from the beginning of the study, and day of the week.

Both these examples contain functional data measured at multiple visits, while the sample size in NHANES is over 10 times larger than the DTI study. Such data structures generalize standard longitudinal data, as instead of observing one scalar variable at each visit, one observes a high dimensional function. Functional data methods are sometimes used to model scalar longitudinal data (Yao et al., 2005), but longitudinal functional data has a more complex structure due to the large number of observations with complex correlations at each visit. Methods have been proposed for functional data with complex correlation structures (Guo, 2002; Morris and Carroll, 2006; Greven et al., 2010; Zipunnikov et al., 2014; Scheipl et al., 2015; Brockhaus et al., 2015; Scheipl et al., 2016; Shou et al., 2015; Zhu et al., 2019). In particular, Scheipl et al. (2015, 2016) proposed an inferential framework and associated software for correlated functional responses based on additive mixed models. This is an important step forward, though the method cannot currently handle very large data sets. For example, it takes more than 24 hours on a regular laptop (2.7 GHz Dual-Core Intel Core i5, 8GB RAM) to fit a functional random intercept model with I = 1000 subjects and an average of 5 visits per subject. The approach of Goldsmith et al. (2015) runs into similar scaling up problems. Indeed, they reported a total computation time of 10 days in their applications using a dataset with fewer than 600 subjects, 5 visits per subject, and 144 observations per curve. Applying either method to our NHANES example is computationally impractical.

Thus, we conclude that this area of research is still in its initial stages of statistical development. Indeed, there is an increased need for methods that: (1) are scalable both in terms of number of study participants and of the dimension of the functional data; (2) can be applied to Gaussian and non-Gaussian data; and (3) preserve the interpretability of standard mixed effects models. To address this need, we propose fast univariate inferential (FUI) approaches for longitudinal functional data of any type. The approach consists of three steps: (1) fit massively univariate pointwise mixed effects models; (2) apply a smoother along the functional domain; and (3) obtain joint confidence bands using analytic approaches for Gaussian data or a bootstrap of study participants for non-Gaussian data. The first two steps are conceptually similar with Fan and Zhang (2000) and Reiss et al. (2017) for function-on-scalar regressions. However, to the best of our knowledge, these approaches did not model correlated functional responses or provided joint inference that accounts for this correlation. The methods proposed by Park et al. (2018) are the closest to our methods, though there are important differences. First, they estimate the fixed effects under independence across- and within-visits. We use mixed effects models across visits. Second, because Park et al. (2018) rely on fixed effects, their approach cannot be used to predict visit-specific functional effects or missing data. Third, their approach was developed for a narrower set of models.

The remainder of the paper is organized as follows. Section 2 introduces the longitudinal functional model. Section 3 describes two approaches for fixed effects inference, one analytic for Gaussian data and one based on the bootstrap of study participants for any type of data. We further discuss in Section 4 a simple and flexible simulation-based approach to obtain joint confidence bands. Section 5 presents the simulation results and comparisons with existing methods. Section 6 describes the applications of our model on DTI and NHANES study. We close with a discussion in Section 7. All code for model implementation, simulation and application is available on the supplementary material.

2. Massively Univariate Longitudinal Functional Model

Assume that data is of the type Yij(s) on a grid {s1, s2, …, sL} of the compact functional domain S. Data can be Gaussian or non-Gaussian, i =1, 2, …, I is the index of the study participant, and j =1, 2, …, Ji is the index of the longitudinal visit at time tij. In addition to the functional outcomes, Yij(s), Xij=[Xij1,Xij2,,Xijp]Tp are the fixed and Zij=[Zij1,Zij2,,Zijq]Tq are the random effects variables. We posit the following marginal three-step inferential approach:

First step:

At each location slS, l = 1,2,…, L, fit a separate pointwise generalized linear mixed model (GLMM) Yij(sl) ~ EF{μij(sl)}, where EF denotes the exponential family distribution, μij (sl) is the conditional mean, and

ηij(sl)=g{μij(sl)}=XijTβ(sl)+ZijTui(sl). (1)

Here g(·) is a link function and ui (sl) is a q × 1 dimensional vector of random effects that depends on the location. Denote the estimates of fixed effects as β(s1), …, β(sL) and of the linear predictors as η˜ij(s1),,η˜ij(sL) obtained from these univariate GLMMs. We refer to this as massively univariate analysis because a GLMM is fit at every location sl, l = 1, …, L, where L can be very large (hence, the use of the word “ massive”).

Second step:

Smooth the estimated fixed effects β(s1), …, β(sL) and/or linear predictors η˜ij(s1),,η˜ij(sL) along the functional domain. Denote these smooth estimators by {β(s),sS} and {η^ij(s),sS}, respectively. This can use any smoother that is or is not data adaptive, including not smoothing and taking the average over all locations.

Third step:

Obtain joint confidence bands for fixed effects parameters and/or linear predictors using analytic approaches for Gaussian data or a bootstrap of study participants for Gaussian and non-Gaussian data.

The key insight of our method is to decompose the complex correlation structure into longitudinal and functional directions. The first step of the analysis is to use the familiar univariate GLMMs, a procedure that can be easily implemented using parallel computing. This substantially reduces the computational burden of existing methods. The third step allows for joint inferences that take into account within- and between-visit correlations. This approach is not limited to estimating β(s) and ηij(s) and can be used for any measures of interest, including random effects, quantiles, and group means. We discuss the fixed effects inference of our approach below.

3. Fixed Effects Inference

Developing a principled statistical inferential framework for fixed effects in longitudinal functional models is difficult. Several approaches that account for the complex within- and between-study participant correlations exist and include different Bayesian approaches (Morris and Carroll, 2006; Morris et al., 2006; Zhu et al., 2011; Goldsmith et al., 2015; Zhang et al., 2016). Unfortunately, many of these methods require specialized software, are slow, and do not scale up with the number of study participants and dimension of the functional domain. Our proposed approach is philosophically closer to the methods proposed in Crainiceanu et al. (2012) and Park et al. (2018), which use bootstrap of study participants. For Gaussian functional data we provide an analytic solution, while for all types of data we propose a bootstrap approach.

3.1. Analytic Inference for Gaussian Functional Data

For Gaussian functional data the pointwise linear mixed model has the form

Yij(sl)=XijTβ(sl)+ZijTui(sl)+ϵij(sl). (2)

For the ith study participant at slS data are {Yi(sl), Xi, Zi}, where Yi(sl)=[Yi1(sl),YiJi(sl)], Xi=[Xi1,,XiJi], and Zi=[Zi1,,ZiJi]. The observations for the entire study population are then denoted as {Y(sl), X, Z}, where Y(sl) = [Y1(sl), …, YI (sl)]T, X =[X1, …, XI]T, and Z=diag(Z1T,,ZIT). The matrix notation of equation (2) is Y(sl) = (sl) + Zu(sl) + ϵ(sl), where u(sl) = [u1(sl)T, …, uI(sl)T]T and ϵ(s)=[ϵ11(s),ϵ12(s),,ϵIJI(s)]T are mutually independent random coefficients and errors with a joint multivariate Gaussian distribution. The pointwise estimator of the fixed effects is β(sl) = {XTV−1(sl)X}−1XTV−1(sl)Y(sl), where V(sl) = ZH(sl)ZT + R(sl) and H(sl) and R(sl) are covariance matrices of u(sl) and ϵ(sl), respectively. These fixed effects estimators are correlated across sl, which needs to be taken into account when conducting joint inference, including when building joint confidence bands or conducting multiple testing. This correlation is modeled intrinsically by assuming Cov{u(sl1),u(sl2)}=G(sl1,sl2) and Cov{ϵ(sl1),ϵ(sl2)}=0 for all sl1sl2. Thus, the covariance of the raw estimates at slS is Var{β(sl)} = {XTV−1(sl)X}−1, and between sl1 and sl2S is

Cov{β(sl1),β(sl2)}={XTV1(sl1)X}1XTV1(sl1)W(sl1,sl2)V1(sl2)X{XTV1(sl2)X}1. (3)

Here W(sl1,sl2)=ZG(sl1,sl2)ZT.

Estimates of H(sl) and R(sl) can be obtained directly from the mixed effects model software. An additional smoothing approach, for example using penalized splines (Ruppert et al., 2003), can be applied to each entry of these matrices along the functional domain. Estimations of the G(sl1,sl2) is not as intuitive. However, the method of moments (MoM) procedure introduced in Greven et al. (2010) provides the blueprint for our procedure. More precisely, for any sl1sl2

E[{Yik(sl1)XikTβ(sl1)}{Yij(sl2)XijTβ(sl2)}]=v=1qt=1qZijvZiktCov{uit(sl1),uiv(sl2)}, (4)

for any j, k =1, …, Ji. This suggests a simple approach: regressing linearly the residual products {Yik(sl1)XikTβ(sl1)}{Yij(sl2)XijTβ(sl2)} onto the covariates {ZijvZikt : j, k =1, …, Ji}. A bivariate smoother, for example, the fast bivariate P-splines (Xiao et al., 2013), could be used to further reduce the variability of the covariance estimators. This approach does not guarantee that the resulting H(sl), R(sl), and G(sl1,sl2) are positive definite. This is handled by trimming the negative eigenvalues of these estimators at 0, as suggested in the literature (Yao et al., 2003; Hall et al., 2008; Greven et al., 2010).

While the model allows different smoothers in the second step, we provide the closed form solution for penalized splines. For simplicity, we focus on the rth fixed effect βr (s). Denoted by βr=[β˜r(s1),,β˜r(sL)]T the raw estimates of βr (s) obtained from the first step. Let Br = [br1, …, brK] be the K-dimensional spline basis matrix, where brk = [brk(s1), …, brk(sL)]T, k = 1, …, K. K is usually chosen to be much smaller than L. Given a smoothing parameter λr, penalty matrix Pr, define Sr=Br(BrTBr+λrPr)1BrT. The smoothed estimator of the rth fixed effect is βr = Srβr. The covariance matrix of βr is Cov(βr) = SrCov(βr)Sr, a sandwich smoother of Cov(βr) obtained from the first step.

This inferential approach is explicit when working with linear mixed effects models with Gaussian random effects and errors. While the inferential approach described in this section involves additional notation, the computations are straightforward. This analytic inferential solution offers the potential for substantially reducing the computational burden of performing bootstrap inference for large-scale Gaussian functional data and performs well in our simulation study (close to nominal coverage for 95% confidence bands). For outcomes with general (Gaussian and non-Gaussian) distributional assumptions we now discuss the bootstrap of study participants as a general solution.

3.2. Nonparametric Bootstrap Approach

Bootstrapping functional data is a practical approach for fixed effects inference (Cuevas et al., 2006; Crainiceanu et al., 2012). For complex correlated functional data, Park et al. (2018) proposed both study participant and residual bootstrap. Here we focus only on the study participant bootstrap approach because the residual bootstrap is not defined for generalized outcomes. The approach is described in Algorithm 1.

Algorithm 1:

Nonparametric Bootstrap for fixed effects inference

Data: {Y(sl), l = 1,…,L}, X, Z
Result: Var(β(sl)), l = 1,…,L.
for b = 1,…,B do
1. Re-sample / subject indices from {1,…,I} with replacement. Denote the vector of re-sampled indices as M(b);
2. For the i′th element of M(b),i′ = 1,…,I, include all observations of the corresponding subject in the bootstrap sample. Denote the bth bootstrap sample as {{YM(b)(sl),l=1,,L},XM(b),ZM(b)};
3. Fit the model in Section 2 using the bth bootstrap sample. Derive the fixed effects estimates {β(sl)(b), l = 1,…,L};
end
4. For l = 1,…,L, derive Var(β(sl)) from B bootstrap estimates {β(sl)(1),…,β(sl)(B)}. In practice we calculate the sample variance and use it as the estimator.

3.3. Extension to Random Effects Inference

The proposed inference procedure has a natural extension to random effects and visit-specific predictions. We provide a brief introduction for this framework. Given equation (2) and notations introduced in Section 3.1, the pointwise BLUP of the random effects is u(sl) = H(sl)ZTV−1(sl){Y(sl) − (sl)}. Without loss of generality assume R(sl) = σ2(sl)I. The uncertainty of u(sl) can be measured through the conditional variance (Morris, 1983) as Var{(u(sl)|Y(sl)} = σ2(sl){H(sl) − H(sl)ZTV−1(sl)ZH(sl)}. The uncertainty of random effects between locations can be measured similarly using the empirical Bayes estimator.

4. Joint Confidence Bands

Inference for functional data has a natural connection with the problem of multiple testing due to the inherent correlations in the observed data along the functional domain. The pointwise confidence bands described in Sections 3.1 and 3.2 do not provide any information about joint coverage probabilities over the entire domain and are, in fact, valid when averaged across the functional domain. The Bonferonni correction (Bonferroni, 1936) is exact for independent data, but is inappropriate for functional data, which is correlated and can have arbitrary sampling density. For example, a functional domain could be sampled at one hundred or one million equally spaced points. In both situations the point estimators would barely change, whereas the joint Bonferonni corrected confidence intervals would depend on the choice of number of samples within the functional domain. At one extreme, if the number of points is allowed to go to infinity, the length of the joint confidence intervals diverges to infinity. This is unacceptable and cannot be addressed by changing the testing criterion to, say, the Benjamini-Hochberg false discovery rate (Benjamini and Hochberg, 1995). The reason is that both methods are overly conservative when the test statistics are highly correlated. As functional data can be sampled densely and correlations between observations increase with sampling density, the probability of failing to reject the null hypothesis using these corrections rapidly approaches one as the data are more densely sampled. An alternative was proposed by Cox and Lee (2008) for functional data based on the Westfall-Young randomization method (Westfall and Young, 1993).

The construction of joint confidence bands in the context of functional data analysis has been studied using various approaches including local linear estimators (Degras, 2011), piecewise constant splines (Ma et al., 2012), and polynomial splines (Cao et al., 2012). However, most of these methods assume independence between curves, which is not the setting considered here. Crainiceanu et al. (2012) and Park et al. (2018) proposed the bootstrap of study participants as a general inferential procedure for functional data with arbitrarily complex functional correlation structures. Here we follow a similar approach.

Based on a bootstrap of study participants we obtain the following estimators βr=[β^r(s1),,β^r(sL)]T and Var(βr), where {βr(s),sS} is the rth functional fixed effect. Let Ns be the sample size of simulated data. Our approach requires simulations from the multivariate normal distribution N{βr,Var(βr)}. When the dimension L of the functional domain is large, this can be quite slow, but the problem can be addressed using a PCA decomposition of the bootstrap samples βr(1), …, βr(B); for details see Algorithm 2.

Algorithm 2:

Level α joint confidence bands of β^r(s)

Data: βr,Var(βr), βr(1),…, βr(B), Ns.
Result: Joint confidence bands of {β^r(s),sS}.
1. Perform Functional Principal Component Analysis (FPCA) on [βr(1),…,βr(B)]T. Derive the mean function μ = [μ(s1),…,μ(sL)]T, eigenvalues λ1,…,λL and eigenfunctions ψ1,…,ψL, where ψk = [ψk(s1),…,ψk(sL)]T, k = 1,…,L;
for n = 1,…, Ns do
2. Simulate ξnk~N(0,λk) for k = 1,…,KT. Calculate βr,n=μ+k=1KTξnkψk;
3. Calculate un=maxslS{|βr,nβr|/diag(Var(βr))};
end
4. Obtain q1−α, the (1−α) empirical quantile of {u1,,uNs};
5. The joint confidence interval at slS is calculated as β^r(sl)±q1αVar(βr)(l,l). The upper and lower bounds of the joint confidence bands can be smoothed.

The number of functional principal component basis KT in Step 2 is selected based on proportion of variance explained, as suggested in the FPCA literature, and usually does not exceed 100 when using a 95% variance explained threshold. As a result, this modified algorithm reduces the dimension of simulation from potentially large L to a more acceptable KT; see supplementary material for a comparison of computing time between this method and direct simulations from multivariate normal distribution.

5. Simulations

An extensive simulation study is used to assess: (1) the performance of the estimators and the pointwise/joint confidence bands; (2) how methods compare to existing approaches. The R code for the simulation study is provided in the supplementary material.

5.1. Simulation Setup

We simulate functional responses on an equally-spaced grid of S=[0,1] with length L. For simplicity, we fix p = 2, the number of fixed effects for each point s on the functional domain and q = 1, the number of random effects for each point on the functional domain. Therefore, Xij = [1, Xij1]T and ui (s) = ui(s), though the approach is designed for much larger p and q. For subject i at visit j the data generating model is

ηij(s)=g{μij(s)}=β0(s)+Xij1β1(s)+ui(s),sS.

The fixed effects covariates are simulated as Xij1~N(0,4), while the random effects are simulated as ui(s) = ci1ψ1(s) + ci2ψ2(s). We use the scaled orthonormal functions ψ1(s) ∝ 1.5−sin(2πs)−cos(2πs) and ψ2(s) ∝ sin(4πs) to capture the individual-level fluctuations. The random coefficients are generated from ci1~N(0,2σB2) and ci2~N(0,σB2), respectively. Here σB2 is determined by the relative importance of random effects SNRB, as described below. We consider the following simulation scenarios:

  1. Simulation parameters
    • Distribution of the functional responses: (a) Gaussian, (b) binary.
    • Functional fixed effects β(s):
    • S1: β0(s) = −0.15−0.1*sin(2πs)−0.1*cos(2πs), β1(s)=120ϕ(s0.60.0225);
    • S2: β0(s) = 0.53 + 0.06sin(3πs)−0.03sin(6.5πs), β1(s)=160ϕ(s0.20.12)+1200ϕ(s0.350.12)1250ϕ(s0.650.062)+160ϕ(s10.072).
  2. Sample size parameters
    • Number of subjects: I ∈{50,100,200,400}.
    • Mean number of visits per subject: J ∈{5,10,20,40}. For subject i the number of visits Ji is drawn from Poisson(J) with a minimum of 1 visit.
    • Dimension of the functional domain: L∈{50,100,200,400}.
  3. Signal-noise parameters
    • Relative importance of random effects: SNRB ∈{0.5, 1}. Here SNRB is the standard deviation of the fixed effects functions divided by the standard deviation of the random effects functions; see Scheipl et al. (2015) for detailed descriptions of this parameter.
    • Signal-to-noise ratio: SNRϵ ∈{0.5,1} (Gaussian response only). Here SNRϵ is the standard deviation of the linear predictors divided by the standard deviation of the noise σϵ.

In terms of fixed effects, the S1 functions are similar to those used by Goldsmith et al. (2015), are smooth and easy-to-estimate. The S2 functions have more complex shapes and are designed to approximate the estimated effects in our DTI application. To be specific, β0(s) mimics a typical fractional anisotropy (FA) trajectory on the corpus callosum in DTI while β1(s) represents a non-monotonic effect of scan date on the FA.

For each scenario we conducted 200 simulations. Considering all combinations of parameters would have been impossible even for our extensive computing resources. Instead, for each combination of simulation parameters (denoted by “Gaussian S1”, “Gaussian S2”, “Binary S1”, “Binary S2” in the simulation results), we: (1) fix the sample size parameters at their baseline (I = 50, J = 5, L = 50), then change the signal-noise parameters; and (2) fix the signal-noise parameters at their baseline (SNRB = 0.5, SNRϵ =1), then change the sample size parameter one at a time while fixing the other two sample size parameters at baseline. For example, we fix J = 5, L = 50 to evaluate model performance for a different number of study participants, I.

5.2. Comparisons to Existing Methods

Many methods have been developed to model correlated functional responses, including Functional Additive Mixed Models (FAMM) (Scheipl et al., 2015, 2016), Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis (GenMFPCA) (Goldsmith et al., 2015), Functional Linear Array Model (FLAM) (Brockhaus et al., 2015), Wavelet-based Functional Mixed Models (WFMM) (Morris and Carroll, 2006), FMEM (Yuan et al., 2014; Zhu et al., 2019). As described in Scheipl et al. (2015), FAMM outperformed Bayesian WFMM for smooth curves. For FLAM, Brockhaus et al. (2015) reported a similar estimation accuracy with FAMM in their simulations. Although inference for FLAM could potentially be conducted via subject-level bootstrap, this implementation was not available in the FDboost package (Brockhaus et al., 2017). For FMEM we could not identify general purpose software. Therefore, we compare FAMM and GenMFPCA, which have well-documented and easy-to-use implementations for inference.

Our method (FUI) is implemented in the lfosr3s() function in the supplementary material, which implements univariate GLMMs and then applies a penalized cubic spline smoother. Results are highly robust to the choice of smoother. For FAMM we use the pffr() function from refund package (Goldsmith et al., 2020) in R. We used 15 and 20 cubic B-splines bases with first order difference penalty for the population average and global functional intercept respectively; see bs.yindex and bs.int arguments in the pffr() function. We have increased the number of bases from the default to increase the performance of FAMM.

Because Goldsmith et al. (2015) reported considerably larger computing time than FAMM, comparisons with GenMFPCA are restricted to smaller sample sizes. In addition, GenMFPCA software is only applicable to binary response. We have attempted to manually implement GenMFPCA for Gaussian responses, but, probably due to our sub-optimal implementation, our implementation was quite slow. To ensure a fair comparison of computing time, we focus on comparing with GenMFPCA only for binary response. The results are shown in the supplementary material.

5.3. Model Evaluation Criteria

We compare the performance of each method (FUI, FAMM, GenMFPCA) with respect to: (1) accuracy in estimating fixed effects; (2) inference on fixed effects; and (3) computational efficiency.

Accuracy of fixed effects estimation was assessed using the integrated squared error (ISE) of fixed effects, defined as ISEk=01(β^k(s)βk(s))2ds, k = 0, 1. The mean integrated squared error (MISE) is calculated by averaging ISE across simulated datasets. We show ISE of β1(s), as similar results were obtained for β0(s).

Inferential performance was assessed by calculating the empirical coverage probability of 95% pointwise confidence bands at each location, then taking the average along the functional domain. For FUI, we also report the empirical coverage probability of 95% joint confidence bands proposed in Section 4. For FUI we use analytic inference (mean±2sd) for Gaussian responses and bootstrap inference for other families of distributions. In the nonparametric bootstrap, we have used the formula mean±2.2sd instead of mean±2sd. This is a simple solution that provides remarkably good results and seems to account for the extra within-subject variability that may be missed by conducting a bootstrap of study participants.

For each scenario, we do 200 simulations on the Joint High Performance Computing Exchange (JHPCE) Cluster with 1 core per simulation. The computing time of each method is obtained under different scenarios.

5.4. Simulation Results: Signal-to-noise parameters

The simulation results for different signal-noise parameters are shown in Figure 2 under the scenario with the smallest number of subjects, mean number of visits per subject, and dimension of the observed functional responses (I = 50, J = 5, L = 50). We only display results for the Gaussian response, as similar results were obtained for binary responses. Left two panels: fixed effect is S1. Right two panels: fixed effect is S2. The MISE decreases as signal increases, either by increasing SNRB or SNRϵ. In addition, these two parameters exhibit similar scaling behavior using both estimation methods. Specifically, increasing SNRB (or SNRϵ) from 0.5 to 1 decreases MISE by about 60% in S1 and about 50% in S2 using either FUI or FAMM. For this small sample size, the coverage is close to the nominal level and the computing time for both methods is almost identical (not displayed).

Fig. 2.

Fig. 2

Estimation accuracy for FUI (red) and FAMM (blue) under different relative importance of random effects (SNRB, x axis) and signal-to-noise ratios (SNRϵ, labels in the gray-shaded area of each panel). Functional response is Gaussian; parameters: I = 50, J = 5, L = 50. Left two panels: S1. Right two panels: S2.

5.5. Simulation Results: Sample Size Parameters

The simulation results for different sample size parameters are shown in Figure 3. As results tend to be quite consistent, we display the results for Gaussian outcomes and fixed effects S1 (denoted by “Gaussian S1” in the title of each panel). Results for other combinations are in the supplementary material. The baseline setting is I = 50, J = 5, L = 50, SNRB = 0.5, SNRϵ = 1. All other parameters are fixed at their baseline values when one sample size parameter is changed. Left column: number of subjects (I). Middle column: mean number of visits per subject (J). Right column: dimension of the functional domain (L). The ISE and computing time for 200 simulations are displayed in the top and bottom row, respectively. The inference results are shown in Table 1. For FUI, we report the empirical coverage probability of both joint (denoted as “Coverage (Joint)”) and pointwise (denoted as “Coverage (Pointwise)”) 95% confidence bands. For FAMM, we report the coverage of the pointwise confidence bands (denoted as “Coverage”).

Fig. 3.

Fig. 3

Estimation accuracy (top row) and computing time (bottom row) for FUI (red) and FAMM (blue) from 200 simulations. Response is Gaussian and the true fixed effects functions are S1. The baseline setting is I = 50, J = 5, L = 50, SNRB = 0.5, SNRϵ =1. All other parameters are fixed at their baseline values when one sample size parameter is changed. Left column: number of subjects (I). Middle column: mean number of visits per subject (J). Right column: dimension of the functional domain (L).

Table 1.

Empirical coverage probability of 95% joint and pointwise confidence bands using FUI and 95% pointwise confidence bands using FAMM from 200 simulations. Response is Gaussian and the true fixed effects functions are S1. The pointwise confidence band is constructed as mean±2sd and the joint is mean ±q0.975 ×sd. The baseline setting is I = 50, J = 5, L = 50, SNRB = 0.5, SNRϵ =1. All other parameters are fixed at their baseline values when one sample size parameter is changed.

Method Type Number of subjects (l)
50 100 200 400
FUI Coverage (Joint) 0.93 0.96 0.94 0.95
Coverage (Pointwise) 0.94 0.95 0.94 0.95
FAMM Coverage 0.96 0.96 0.96 0.94
Method Type Mean number of visits per subject (J)
5 10 20 40
FUI Coverage (Joint) 0.93 0.95 0.97 0.96
Coverage (Pointwise) 0.94 0.95 0.95 0.95
FAMM Coverage 0.96 0.96 0.96 0.96
Method Type Dimension of the functional domain (L)
50 100 200 400
FUI Coverage (Joint) 0.93 0.94 0.94 0.94
Coverage (Pointwise) 0.94 0.94 0.95 0.95
FAMM Coverage 0.96 0.96 0.96 0.96

As the number of subjects increases, the MISE for both FUI and FAMM decreases. The estimation accuracy of the two methods is similar, and the coverage of the confidence bands, both joint and pointwise for FUI and pointwise for FAMM, reach their nominal level. However, FAMM computing time increases substantially when the number of subjects increases (see bottom left panel). Indeed, the median computing time exhibits a second-order polynomial shape in number of subjects, with median computing time exceeding 10 hours on the cluster when I = 400. In addition, the memory usage for FAMM increases substantially despite the use of the efficient mgcv::bam implementation. For example, we were not able to perform simulations for FAMM when I = 1000, as both memory (more than 40 GB RAM) and computing time (unknown) exceeded our extensive resources. In contrast, FUI required only 40 seconds for I = 500 study participants and there are no problems with fitting even for I = 10000.

As the mean number of visits per subject (second column of panels) and dimension of the functional domain (third column of panels) increase, both methods display similar estimation accuracy. The confidence bands of both methods, including joint and pointwise confidence bands for FUI and pointwise confidence bands for FAMM, also have good coverage close to the nominal level (middle and bottom blocks of Table 1). For I = 50, L = 50 FAMM requires similar computation time (3 to 7 minutes) for 5 to 40 visits per study participant, while FUI takes on average less than 1 minute in all scenarios. For I = 50, J = 5 computing time of FUI increases with the dimension of the functional domain (bottom right panel) while FAMM remains unaffected. This increase is expected as we run a GLMM at every location and the time for these GLMMs simply add up. However, our method is easy to parallelize, which would reduce the fitting time to the the time it would take to fit a single GLMM. To the best of our knowledge, FAMM does not currently have a parallel implementation.

The computation time advantages of FUI should not be surprising given the way mgcv is used to estimate functional models. Specifically, random effects are incorporated by fully constructing the random effects design matrix and applying ridge penalties. Therefore, for a model with a subject-specific functional random intercept, u0i (s), the design matrix adds kb columns, where kb is the number of spline bases used to represent the functional random intercept. This is not a problem when I is in the range of 50 to 100, but it becomes problematic when I > 200. Take our physical activity data application for example where L = 1440, J = 7 and I = 1680 and consider a simple functional model β0(s) + ui0(s). Assume that the population mean function β0(s) uses 20 B-spline basis functions (the default for FAMM). The design matrix without random effects is 1440×7×1680 = 16,934,400 rows and 20 columns. Assume that the model ui0(s) uses 15 B-spline basis functions (kb = 15). Then the full design matrix is as large as 16,934,400×25,220, since 20+ (15×1680) = 25,220, amounting to over 400 billion elements. This explains why FAMM runs into substantial computational challenges when the number of study participants increases. The mgcv package does have method for handling large datasets through the mgcv::bam function which avoids constructing and performing computations on the full design matrix. However, even with these added efficiencies FAMM runs into substantial computational challenges. This could be addressed in the future, but our current solution provides a practical, “read-and-use”, stable alternative for a moderate to large number of study participants.

5.6. Simulation Summary

Our method achieves similar accuracy with the state-of-the-art FAMM method for fixed effects under different simulation settings, including different signal-to-noise parameters and sample size parameters. FUI is much faster than FAMM when the number of subjects is large. To the best of our knowledge, FUI is the first inferential method that is demonstrated to work with over 1500 study participants. The reason for implementing such approaches is practical, as many datasets, including our NHANES application, contain such sample sizes. Both joint and pointwise confidence bands of FUI exhibit good coverage to the nominal level. For both FUI and FAMM, estimation accuracy is affected by the change of signal-noise parameters and sample size parameters. These results for FAMM are consistent with the simulation results reported in Scheipl et al. (2015).

6. Applications

In this section, we apply our method to the motivating examples introduced in Section 1.

6.1. DTI Study

Multiple sclerosis (MS) is an autoimmune mediated disease that affects the central nervous system (CNS) and can lead to substantial motor and cognitive disability. While the exact cause of MS remains unknown, modern neuroimaging has played a crucial role in the diagnosis and management of MS. A promising imaging technique is Diffusion Tensor Imaging based on Magnetic Resonance Imaging (DTI-MRI or, shorter, DTI). DTI provides measures of water diffusion in the brain, which are thought to be associated with white matter integrity. Fractional anisotropy (FA) is a measure of diffusion anisotropy derived from DTI. A zero value of FA corresponds to perfectly isotropic diffusion (water diffuses unrestricted in all directions), while a value of one of FA corresponds to perfectly anisotropic diffusion (water diffuses only in one direction). Values of FA fall somewhere within the (0, 1) range with higher values corresponding to more anisotropic (more organized) water diffusion. FA can be calculated at every location in the brain.

Here we focus on the FA calculated along the corpus callosum, a nerve tract connecting the left and right cerebral hemispheres; see Goldsmith et al. (2011) and Greven et al. (2010) for in-depth descriptions of the data. For our purposes, the data set consists of 142 study participants (42 healthy individuals and 100 MS patients). For healthy individuals there is only one visit, whereas for MS patients there are multiple visits with an average of 3.4 and a maximum of 8 visits per MS patient. There were a total of 382 visits across MS patients and healthy individuals. Corpus callosum is a three-dimensional C-shaped nerve fiber bundle that connects the left and right brain hemispheres. For the purpose of this application, several landmarks were manually identified on the brain image and FA was calculated as an average FA at 93 locations along the corpus callosum. Thus, the data consist of a 382 × 93 dimensional matrix, where each row corresponds to a brain image visit and each column corresponds to a particular location in the corpus callosum, resulting in a total of 35526 observations. The study participant ID, age, sex and date of scan information are also available for each study participant at each visit.

For illustration purposes, we are interested in quantifying the association between age, sex and date of scan with FA measurements along the corpus callosum. Using the notation introduced in Section 2, the longitudinal functional responses are denoted by Yij(s), and are the observed FA values along the equally-spaced grid of sS={1,93}. For the ith individual at the jth visit, the fixed effects Xij =[1, Xij1, Xij2, Xij3, Xij4]T where Xij1 is a binary indicator of case (1 for MS patients and 0 for healthy individuals), Xij2 is the date of scan (converted into year unit and treated as numeric), Xij3 is a binary indicator of sex (there were no self-identified non-binary participants in this study) and Xij4 is the age at baseline scan (in years). For each location we fit a random intercept and slope model

Yij(s)=β0(s)+Xij1β1(s)+Xij2β2(s)+Xij3β3(s)+Xij4β4(s)+ui0(s)+ui1(s)Xij2+ϵij(s),

where (ui0(s), ui1(s)) have a joint zero-mean bivariate Normal distribution independent of ϵij(s), which are iid N{0, σ2(s)}. This standard linear mixed effects model is fit 93 times, once for each location s. After pointwise fitting, a penalized spline smoother is used to smooth each βl(·), l = 0,1,2,3,4 coefficient separately. All five smoothers use a cubic basis with 15 equally spaced knots and REML estimation of the smoothing parameter. The pointwise and joint confidence intervals are obtained as described in Section 3 and Section 4.

Figure 4 displays the point estimators of the fixed effects parameters (dashed blue lines) for the intercept, case, scan date, sex, and age (five panels from left to right), respectively. The dark gray regions correspond to the 95% pointwise confidence bands, while the light gray regions correspond to the 95% joint confidence bands. The intercept estimator is consistent with the geometry of the corpus callosum and previously published literature. Compared with healthy individuals, MS patients of the same sex and age at the same date of scan have significantly lower FA at most locations along the corpus callosum. This result may indicate lower anisotropy corresponds among MS patients, which may be consistent with brain micro-structure damage. The middle panel shows a highly significant increase in the FA as a function of scan date at most locations of the corpus callosum. To the best of our knowledge, there is no biological plausible reason for such an increase in anisotropy. Therefore, the result may correspond to the change in technology and software, which led to a large, deterministic, increase in measured FA. The effects of sex and age are not statistically significant at any point along the corpus callosum.

Fig. 4.

Fig. 4

Fixed effects estimates (dashed blue line), 95% pointwise confidence intervals (dark gray shaded area), and 95% joint confidence intervals (light gray shaded area) in the DTI study. Panels from left to right: intercept, case, date of scan, sex, age at baseline.

Figure 5 displays the data (left panels), together with the pointwise estimators (middle panels) and smoothed estimators along the functional domain (right panels). The first and second rows correspond to study participants ID 2017 and 2085, respectively. For study participant ID 2017, the point estimators are consistent with substantial reduction in the visit-to-visit variability. Unsurprisingly, after smoothing (right-top panel) visit-specific profiles are slightly smoother along the functional domain, but with a similar reduced visit-to-visit variability. Comparing the middle and right top panels indicates that the pointwise linear mixed effects models did the “heavy lifting”, while the functional smoothing led to mostly cosmetic changes. This need not be the case in general when the noise and correlation structures could be quite different. Results are similar for study participant ID 2085. These results suggest that: (1) there is a statistically significant, but small fixed effect for the date of the visit; and (2) much of the observed variability is due to visit-to-visit fluctuations in FA trajectories (measurement error); and (3) the effect of scan date is largely contained in the fixed effects. The results about the decomposition of the observed residual variability after accounting for fixed effects are consistent with the literature. Indeed, Greven et al. (2010) showed that only 2 to 3% of the observed variability can be attributed to the longitudinal functional slope.

Fig. 5.

Fig. 5

Fractional anisotropy (FA) tract profiles and estimated predictors for two study participants (first row: ID 2017, second row: ID 2085). First column: FA tract profiles for the corpus callosum over multiple visits. Second column: pointwise estimated predictor η˜ij. Third column: smoothed estimated predictor η^ij of the pointwise predictors.

6.2. NHANES Study

The National Health and Nutrition Examination Survey (NHANES) is a large cohort study conducted by the US Centers for Disease Control (CDC) in two-year waves to assess the health and nutritional status of the US population. The objectively measured physical activity (PA) data were collected using hip-worn accelerometers on study participants in the 2003–2006 waves. The accelerometry data are publicly available as minute-level activity counts (AC), a proprietary measure of PA, and can be accessed in an analysis-ready format through the R rnhanesdata package (Leroux et al., 2019). Specifically, the accelerometry data were collected on 14631 individuals in the NHANES 2003–2004 and 2005–2006 waves. In this study, we focus on individuals with age between 18 and 30 at the time of accelerometer wear. In addition, we exclude individuals who had less than 3 days of data with at least 10 hours of estimated wear time or were labeled as poor data quality by NHANES. The number of available days vary between individuals with a maximum of 7. The final data include 1680 individuals with 8765 days, each with 1440 observations per day for a total of 12621600 minute-level observations.

We would like to investigate whether being non-sedentary is associated with gender, age, and day of the week (e.g., Monday, Tuesday). For the jth day of the ith study participant, the longitudinal functional response Yij(s) is now a binary indicator, which equals to 1 if the AC at minute s ∈ {1, …, 1440} exceeds 100 and 0 if not. The fixed effect Xij=[1,Xij1,Xij2,Xij3,Xij4T]T where Xij1 = j is the day number, Xij2 is a binary indicator of sex (female=1), Xij3 is the age, and Xij4 is a 6 × 1 binary vector indicating the day of the week of day j with order {Mon,Tue, …, Sat}. For example, for study participant i if day 3 is Tuesday, then the second element of Xi34 is 1 while all others are 0; if day 3 is Sunday, all elements in Xi34 are 0. Denote f(s) = [f1(s), …, f6(s)]T. At every minute s of the day we fit a random intercept and slope model

logit{Pr(Yij(s)=1|Xij,ui)}=β0(s)+Xij1β1(s)+Xij2β2(s)+Xij3β3(s)+Xij4Tf(s)+ui0(s)+ui1(s)j,

where [ui0(s), ui1(s)]T ~ N{0, Σu(s)}. This GLMM is fit 1440 times at every location s. The same penalized spline smoother using cubic basis with 15 equally spaced knots and REML estimated smoothing parameter is applied to smooth estimated coefficients from pointwise fits separately.

Figure 6 displays the estimated coefficients together with the 95% pointwise (dark gray shaded area) and joint (light gray shaded area) confidence bands based on 100 bootstrap replicates. The shape of the functional intercept is consistent with the published literature and indicates less activity during the night, a sharp increase in the morning, sustained activity during the day and a reduction of activity in late evening. The effect of sex in this age group (18 to 30) is statistically significant throughout most of the day with the exception of the late afternoon/early evening period (≈ 4–10PM). The sex effect is strongest during the late evening/early morning hours, when, on average, females are less active. This result could correspond to more restful sleep, more sleep, or higher compliance to study protocol for women. The effect of age is also highly significant during the night and early to late morning indicating that older study participants in this age group (18 to 30) tend to have less activity during the night and more activity in the morning and early afternoon. These findings are consistent with those reported by Varma et al. (2017).

Fig. 6.

Fig. 6

Estimated coefficients from the NHANES data application. Smoothed coefficient estimates are denoted using blue dashed lines. Pointwise and joint 95% confidence intervals are shown as the dark and light gray shaded area, respectively.

The fixed effect of day number (β^1(s)) indicates that individuals are slightly more likely to be active during the nighttime hours and less likely to be active during normal waking hours as a function of day, though the nighttime effect is only pointwise significant during the period roughly corresponding to 4AM-6AM, but not significant when considering the joint confidence bands. This suggests that there may be a small “habituation effect”, particularly during the daytime. Habituation effects was proposed as a potential psychological effect of increasing PA at the beginning of wearing a device merely by its presence. Compared to Sundays (the reference category), weekdays correspond to lower levels of activity during the predawn hours (12AM-4AM), higher levels activity in the morning (6AM-11AM), and about the same levels of activity during the afternoon and evening hours. Fridays and Saturdays correspond to more activity in the evening than Sundays. Saturdays tend to have lower activity in the morning compared to weekdays, but more activity than Sundays. These results are consistent with previous findings that individuals tend to be less active during the night and more active during the day on weekdays. These differences are likely due to social behaviors on the weekends and obligations related to school and/or work during the weekdays.

Computationally, the initial model was fit in 672 minutes, with bootstrapping requiring ~ 67200 additional minutes (1120 hours); results are reported on a standard laptop. However, because of the parallel nature of our method, each location-specific fit can be estimated separately and combined at the end. This would reduce the computation time by 3 orders of magnitude, as most computational time is taken by fitting 1440 univariate GLMMs. A fully parallel implementation the entire procedure would take ~ 30 seconds for one model fit and ~1 hour for the inferential procedure. Though this may seem like a long time, we are not aware of any other methods that could fit such a model for this large longitudinal functional data set.

7. Discussion

We have introduced a fast univariate inferential approach for longitudinal functional models, a computationally efficient method for quantifying the association between covariates and a broad family of longitudinal functional outcomes. The model is estimated using a three-step procedure: (1) fit a series of separate standard longitudinal mixed models; (2) smooth estimators along the functional domain; and (3) construct pointwise and joint confidence bands using analytic approaches for Gaussian data or a nonparametric bootstrap of study participants for any type of data. The proposed method is highly computationally efficient because the first step can be parallelized to allow fitting large high-dimensional datasets. The second step is actually optional and one can either smooth or not smooth the resulting coefficients. Building joint confidence bands is a crucial component for conducting joint inference and performing testing multiplicity adjustment. Another major advantage of the proposed approach is its conceptual simplicity and availability in the R software. Most importantly, methods are “read-and-use”, meaning that data scientists with a working knowledge of GLMMs can easily implement and apply our procedures.

The most important methodological contribution of this paper is to provide practical methods for building pointwise and joint confidence bands for very large longitudinal functional datasets. Simulation results suggest that our method achieves similar estimation accuracy and nominal coverage compared with existing methods, while the computation is much faster when the number of subjects is large (> 100).

Our work is not without limitations. First, the smoothing parameter selection assumes that the residuals of the raw estimated fixed effects around the true coefficient along the functional domain are independent. Second, changing the quantile of the confidence bands from 2 to 2.2 (lengthening the confidence bands by 10%) for bootstrap inference works well in our simulation study, but a more rigorous procedure and associated simulations may be necessary. Third, our method is only applicable to concurrent functional models, and can only take into account functional covariates that are measured on the same grid as the functional responses. Fourth, we focus on the fixed effects inference in this paper. While the inference for visit-specific predictions and other metrics falls into a similar framework, as introduced in Section 3.3, the extension is nontrivial and exceeds the scope of the current paper.

Supplementary Material

Supp 1

ACKNOWLEDGEMENTS

The project described was supported by Award Number R01 NS060910 from the National Institute of Neurological Disorders and Stroke, and training grant T32 AG000247 from the National Institute on Aging. The MRI/DTI data were collected at Johns Hopkins University and the Kennedy-Krieger Institute.

Footnotes

SUPPLEMENTARY MATERIALS

FUI_supp.pdf

Comparison of the computing time of two methods to obtain joint confidence bands, and additional simulation results. (.pdf file)

code.zip

The R code for model implementation and simulation. (.zip file)

Contributor Information

Erjia Cui, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA.

Andrew Leroux, Department of Biostatistics and Informatics, University of Colorado, USA.

Ekaterina Smirnova, Department of Biostatistics, Virginia Commonwealth University, USA.

Ciprian M. Crainiceanu, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA

References

  1. Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57(1):289–300. [Google Scholar]
  2. Bonferroni C (1936). Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8:3–62. [Google Scholar]
  3. Brockhaus S, Rügamer D, and Greven S (2017). Boosting functional regression models with fdboost. arXiv preprint arXiv:1705.10662. [Google Scholar]
  4. Brockhaus S, Scheipl F, Hothorn T, and Greven S (2015). The functional linear array model. Statistical Modelling, 15(3):279–300. [Google Scholar]
  5. Cao G, Yang L, and Todem D (2012). Simultaneous inference for the mean function based on dense functional data. Journal of Nonparametric Statistics, 24(2):359–377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cox DD and Lee JS (2008). Pointwise testing with functional data using the westfall–young randomization method. Biometrika, 95(3):621–634. [Google Scholar]
  7. Crainiceanu CM, Staicu A-M, Ray S, and Punjabi N (2012). Bootstrap-based inference on the difference in the means of two correlated functional processes. Statistics in Medicine, 31(26):3223–3240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cuevas A, Febrero M, and Fraiman R (2006). On the use of the bootstrap for estimating functions with functional data. Computational Statistics & Data Analysis, 51(2):1063–1074. [Google Scholar]
  9. Cui E, Crainiceanu CM, and Leroux A (2020). Additive functional cox model. Journal of Computational and Graphical Statistics, pages 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Degras DA (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, pages 1735–1765. [Google Scholar]
  11. Fan J and Zhang J-T (2000). Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2):303–322. [Google Scholar]
  12. Goldsmith J, Bobb J, Crainiceanu CM, Caffo B, and Reich D (2011). Penalized functional regression. Journal of Computational and Graphical Statistics, 20(4):830–851. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Goldsmith J, Crainiceanu CM, Caffo B, and Reich D (2012). Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. Journal of the Royal Statistical Society: Series C (Applied Statistics), 61(3):453–469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Goldsmith J, Scheipl F, Huang L, Wrobel J, Di C, Gellar J, Harezlak J, McLean MW, Swihart B, Xiao L, Crainiceanu C, and Reiss PT (2020). refund: Regression with Functional Data. R package version 0.1–23. [Google Scholar]
  15. Goldsmith J, Zipunnikov V, and Schrack J (2015). Generalized multilevel function-on-scalar regression and principal component analysis. Biometrics, 71(2):344–353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Greven S, Crainiceanu C, Caffo BS, and Reich DS (2010). Longitudinal functional principal component analysis. Electronic Journal of Statistics, pages 1022–1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Guo W (2002). Functional mixed effects models. Biometrics, 58(1):121–128. [DOI] [PubMed] [Google Scholar]
  18. Hall P, Müller H-G, and Yao F (2008). Modelling sparse generalized longitudinal observations with latent gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4):703–723. [Google Scholar]
  19. Koster A, Caserotti P, Patel KV, Matthews CE, Berrigan D, Van Domelen DR, Brychta RJ, Chen KY, and Harris TB (2012). Association of sedentary time with mortality independent of moderate to vigorous physical activity. PloS One, 7(6):e37696. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Leroux A, Di J, Smirnova E, Mcguffey EJ, Cao Q, Bayatmokhtari E, Tabacu L, Zipunnikov V, Urbanek JK, and Crainiceanu C (2019). Organizing and analyzing the activity data in nhanes. Statistics in Biosciences, 11(2):262–287. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Ma S, Yang L, and Carroll RJ (2012). A simultaneous confidence band for sparse longitudinal regression. Statistica Sinica, 22:95. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Matthews CE, Chen KY, Freedson PS, Buchowski MS, Beech BM, Pate RR, and Troiano RP (2008). Amount of time spent in sedentary behaviors in the united states, 2003–2004. American Journal of Epidemiology, 167(7):875–881. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Morris CN (1983). Parametric empirical bayes inference: theory and applications. Journal of the American statistical Association, 78(381):47–55. [Google Scholar]
  24. Morris JS, Arroyo C, Coull BA, Ryan LM, Herrick R, and Gortmaker SL (2006). Using wavelet-based functional mixed models to characterize population heterogeneity in accelerometer profiles: a case study. Journal of the American Statistical Association, 101(476):1352–1364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Morris JS and Carroll RJ (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2):179–199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Park SY, Staicu A-M, Xiao L, and Crainiceanu CM (2018). Simple fixed-effects inference for complex functional models. Biostatistics, 19(2):137–152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Reiss PT, Huang L, Wu P-S, Chen H, and Colcombe S (2017). Pointwise influence matrices for functional-response regression. Biometrics, 73(4):1092–1101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Ruppert D, Wand MP, and Carroll RJ (2003). Semiparametric regression. Number 12. Cambridge university press. [Google Scholar]
  29. Scheipl F, Gertheiss J, Greven S, et al. (2016). Generalized functional additive mixed models. Electronic Journal of Statistics, 10(1):1455–1492. [Google Scholar]
  30. Scheipl F, Staicu A-M, and Greven S (2015). Functional additive mixed models. Journal of Computational and Graphical Statistics, 24(2):477–501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Shou H, Zipunnikov V, Crainiceanu CM, and Greven S (2015). Structured functional principal component analysis. Biometrics, 71(1):247–257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Smirnova E, Leroux A, Cao Q, Tabacu L, Zipunnikov V, Crainiceanu C, and Urbanek JK (2019). The predictive performance of objective measures of physical activity derived from accelerometry data for 5-year all-cause mortality in older adults: National health and nutritional examination survey 2003–2006. The Journals of Gerontology: Series A. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Varma VR, Dey D, Leroux A, Di J, Urbanek J, Xiao L, and Zipunnikov V (2017). Re-evaluating the effect of age on physical activity over the lifespan. Preventive Medicine, 101:102–108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Westfall PH and Young SS (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment, volume 279. John Wiley & Sons. [Google Scholar]
  35. Xiao L, Li Y, and Ruppert D (2013). Fast bivariate p-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B (Statistical Methodology), pages 577–599. [Google Scholar]
  36. Yao F, Müller H-G, Clifford AJ, Dueker SR, Follett J, Lin Y, Buchholz BA, and Vogel JS (2003). Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate. Biometrics, 59(3):676–685. [DOI] [PubMed] [Google Scholar]
  37. Yao F, Müller H-G, and Wang J-L (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470):577–590. [Google Scholar]
  38. Yuan Y, Gilmore JH, Geng X, Martin S, Chen K, Wang J. l., and Zhu H (2014). Fmem: Functional mixed effects modeling for the analysis of longitudinal white matter tract data. NeuroImage, 84:753–764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Zhang L, Baladandayuthapani V, Zhu H, Baggerly KA, Majewski T, Czerniak BA, and Morris JS (2016). Functional car models for large spatially correlated functional datasets. Journal of the American Statistical Association, 111(514):772–786. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Zhu H, Brown PJ, and Morris JS (2011). Robust, adaptive functional regression in functional mixed model framework. Journal of the American Statistical Association, 106(495):1167–1179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Zhu H, Chen K, Luo X, Yuan Y, and Wang J-L (2019). Fmem: Functional mixed effects models for longitudinal functional responses. Statistica Sinica, 29(4):2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Zipunnikov V, Greven S, Shou H, Caffo B, Reich DS, and Crainiceanu C (2014). Longitudinal high-dimensional principal components analysis with application to diffusion tensor imaging of multiple sclerosis. The Annals of Applied Statistics, 8(4):2175. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp 1

RESOURCES