Repeated measures analysis for functional data

https://doi.org/10.1016/j.csda.2011.06.007

Abstract

Most traditional statistical methods are being adapted to the Functional Data Analysis (FDA) context. This paper investigates repeated measures analysis, which deals with the k-sample problem when the data come from the same subjects. Both parametric and nonparametric approaches are considered. Asymptotic, permutation and bootstrap approximations for the distribution of the test statistic are developed. To explore the statistical power of the proposed methods in different scenarios, a Monte Carlo simulation study is carried out. The results suggest that the studied methodology can detect small differences between curves even with small sample sizes.

Highlights

► The repeated measures methodology is generalized to the Functional Data Analysis context. ► Both parametric and nonparametric techniques are considered. ► Bootstrap, permutation and asymptotic approximations are explored. ► A Monte Carlo simulation study is carried out to assess the statistical power.

Introduction

Advances in computational and analytical techniques have allowed many processes to be continuously monitored. As a direct consequence, the amount of data to be analyzed has increased and new statistical methods must be developed. Functional data analysis (FDA) deals with the statistical study of samples of random functions. The books by Ramsay and Silverman (1997, 2002, 2005) have helped to popularize FDA techniques; they offer a broad perspective of the available methods and a number of appealing case studies and practical methodologies. The book by Ferraty and Vieu (2006) offers a nonparametric perspective on FDA. Many standard statistical methods are being adapted for functional data, for example principal component analysis (see Boente and Fraiman, 2000 and references therein, or Berrendero et al., 2011 for a recent contribution), discriminant analysis (Ferraty and Vieu, 2003) and regression (see, for example, Cuevas et al., 2002, among others). Techniques for testing homogeneity in high-dimensional and functional frameworks have also been considered. Mas (2007) proposed a test for the regression operator in a linear model with functional inputs, and Jiofack and Nkiet (2009) studied the equality of functional means. These last two papers considered random variables defined on an abstract probability space with values in an infinite-dimensional, real, separable Hilbert space. The traditional k-sample problems, such as the parametric ANOVA (see Cuevas et al., 2004; Cuesta and Febrero, 2010 and references therein) and its nonparametric counterpart (Delicado, 2007), have also been considered.

FDA is strongly linked with the better-known area of longitudinal data analysis (LDA). Both fields even shared a special issue of the journal Statistica Sinica. Although both methodologies are devoted to analyzing data collected over time on the same subjects, FDA and LDA are also intrinsically different (Davidian et al., 2004).

Longitudinal data arise in follow-up studies (common in the biomedical sciences), which usually require a few measurements of the variables of interest for each individual over the study period. They are often treated with multivariate parametric techniques that study the variation of the means over time, controlling for a number of covariates. In contrast, functional data are frequently recorded by mechanical instruments (more common in engineering and the physical sciences, although also in an increasing number of biomedical problems), which collect many repeated measurements per subject. Their basic units of study are complex objects such as curves (most commonly), images or shapes; the information over time for the same individual is considered jointly. Conceptually, functional data can be regarded as sample paths of a continuous stochastic process (Valderrama, 2007), and studying the covariance structure is the usual focus of FDA. In addition, the infinite-dimensional structure of functional data makes the links with standard nonparametric statistics (in particular with smoothing techniques) especially strong (González-Manteiga and Vieu, 2007).

Despite these differences, which mainly concern the viewpoints and ways of thinking about the data in the two fields, Zhao et al. (2004) connected them and, illustrating the ideas with a gene expression study, presented LDA from the FDA viewpoint.

This paper deals with the problem of comparing two (or more) functions from a paired design (classical repeated measures analysis). This situation occurs in a variety of problems. The most direct application is the comparison of biomedical parameters measured over a period of time on the same patients under different conditions. For instance, in pharmacokinetics (the study of the absorption, distribution and elimination of drugs), the concentration versus time curves for different drugs are compared on the same individuals. Formally, we have $X_{i,j}(t)$, $i \in \{1,\dots,n\}$, $j \in \{1,\dots,k\}$, $t \in [a,b]$, that is, $n \times k$ trajectories from $n$ (random) subjects, drawn from $L^2$-processes $X_{j,\bullet}$ ($1 \le j \le k$) such that $E[X_{j,\bullet}(t)] = m_j(t)$, and we want to test the null hypothesis $H_0: m_1(t) = \cdots = m_k(t)\ (= m(t))\ \forall t \in [a,b]$. Although in FDA the data are considered as curves, in practice they are invariably recorded discretely, and only finitely many values of each trajectory are known. From these values, the trajectories are usually estimated by an interpolation or smoothing method (typically based on local polynomials or splines) to obtain the smooth curves to which functional data analysis methods are applied. Hall and Van Keilegom (2007) suggested ways of pre-processing the data so as to minimize the effects of smoothing in two-sample tests. We assume that the values of the trajectories are known at a large number of arbitrary points $t_l$ ($1 \le l \le H$), which are not necessarily the same for the different curves. It is worth mentioning that the effect of the interpolation method (and of the known points being different) is negligible for sufficiently large $H$.

The rest of the paper is organized as follows. In Section 2, additivity is assumed and the usual parametric repeated measures test (for two-sample problems) is developed for functional data. Theorem 1 gives a not fully asymptotic distribution for the considered statistic (based on the $L^2$-norm). This result can be used in practice by arguing as in Cuevas et al. (2004) and approximating the P-value by the Monte Carlo method. Section 3 is devoted to the nonparametric approach; a bootstrap test and a permutation test are explored (also for the two-sample case). The bootstrap procedure uses an auxiliary statistic whose distribution is similar (asymptotically equal) to that of the original statistic under the null, whether or not the data satisfy the null hypothesis. Its main particularity is that the null hypothesis is used to compute the bootstrap value rather than to draw the bootstrap samples. A Monte Carlo simulation study is carried out in Section 4. Finally, Section 5 gives some indications for the k-sample case.
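The discretization step described above can be sketched as follows. This is only a minimal illustration, not the authors' code: it assumes plain linear interpolation (the text mentions local-polynomial or spline methods as the usual choices), and the helper names `to_common_grid` and `l2_sq_norm`, the grid size and the toy curves are all hypothetical.

```python
import numpy as np

def to_common_grid(t_obs, x_obs, grid):
    """Linearly interpolate one trajectory, observed at its own points t_obs, onto grid."""
    order = np.argsort(t_obs)          # np.interp requires increasing abscissae
    return np.interp(grid, t_obs[order], x_obs[order])

def l2_sq_norm(values, grid):
    """Trapezoidal approximation of the integral of values(t)^2 over the grid."""
    sq = values ** 2
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)      # common evaluation grid on [a, b] = [0, 1]

# Two toy curves observed at different, irregular points (H = 500 each).
t1 = rng.uniform(0.0, 1.0, 500)
t2 = rng.uniform(0.0, 1.0, 500)
x1 = to_common_grid(t1, np.sin(2 * np.pi * t1), grid)
x2 = to_common_grid(t2, np.sin(2 * np.pi * t2), grid)

# Both curves sample the same function, so their squared L2 distance should be small.
dist_sq = l2_sq_norm(x1 - x2, grid)
```

With a dense set of observation points, the squared distance between the two interpolated versions of the same function is close to zero, which is the sense in which the interpolation effect becomes negligible for large $H$.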

Two-sample case: Methodology

For independent and homoscedastic samples, it is assumed that each trajectory (we can assume, without loss of generality, that $[0,1]$ is the interval considered) has the form $X_{i,j}(t) = m_j(t) + e_{i,j}(t)$, $t \in [0,1]$, where the $e_{i,j}(t)$, $t \in [0,1]$ ($1 \le i \le n$, $1 \le j \le k$), are centered (zero-mean) random functions. Cuevas et al. (2004) computed the ratio of between-sample to within-sample variability and proposed the following functional version of the classical F-ratio of the ANOVA model, $F_N = \sum_{j=1}^{k} n_j \int (X_{j,\bullet}(t) - X_{\bullet,\bullet}(t))^2\,dt/(k$

Non-parametric approach

The additivity assumption is not adequate for a number of functions, for instance when working with densities. Density functions must satisfy particular conditions (being nonnegative and integrating to 1 over $\mathbb{R}$) that make fixed structures inappropriate. In this section, from a nonparametric approach, we propose two different resampling plans which do not require any prior assumptions.

Although the use of the bootstrap on the paired-sample problem is straightforward (even in FDA context) in order
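Since the details of the two resampling plans are not available in this snippet, the following is only a hedged sketch of how paired resampling could look, not the authors' procedure. It assumes the two-sample statistic is $n$ times the integrated squared difference of the two mean curves (the form implied by the identity $C_n = 2C_n(2)$ stated for the k-sample case). The bootstrap resamples subjects with replacement and uses the null only when computing the bootstrap value, by centering at the observed mean difference. The permutation plan swaps the two curves within each subject at random. All function names and the toy data are hypothetical.

```python
import numpy as np

def integrate(values, grid):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid), axis=-1)

def c_stat(diff, grid, center=0.0):
    """n * integral of (mean paired difference - center)^2; diff has shape (n, H)."""
    n = diff.shape[0]
    return n * integrate((diff.mean(axis=0) - center) ** 2, grid)

def bootstrap_pvalue(x1, x2, grid, n_boot=999, seed=0):
    """Resample subjects (pairs) with replacement; H0 enters only through the centering."""
    rng = np.random.default_rng(seed)
    diff = x1 - x2
    n = diff.shape[0]
    observed = c_stat(diff, grid)
    mean_diff = diff.mean(axis=0)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[b] = c_stat(diff[idx], grid, center=mean_diff)
    return (1 + np.sum(boot >= observed)) / (n_boot + 1)

def permutation_pvalue(x1, x2, grid, n_perm=999, seed=0):
    """Swap the two curves within each subject at random (sign flip of the difference)."""
    rng = np.random.default_rng(seed)
    diff = x1 - x2
    n = diff.shape[0]
    observed = c_stat(diff, grid)
    perm = np.empty(n_perm)
    for b in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        perm[b] = c_stat(signs * diff, grid)
    return (1 + np.sum(perm >= observed)) / (n_perm + 1)

# Toy paired data (assumed for illustration): n = 25 subjects, shared subject effect.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 101)
n = 25
subject = rng.normal(size=(n, 1)) * np.ones_like(grid)
noise1 = 0.2 * rng.normal(size=(n, 1)) * np.cos(np.pi * grid)
noise2 = 0.2 * rng.normal(size=(n, 1)) * np.cos(np.pi * grid)
x1 = np.sin(np.pi * grid) + subject + noise1
x2_null = np.sin(np.pi * grid) + subject + noise2        # same mean function
x2_alt = np.sin(np.pi * grid) + 0.5 + subject + noise2   # mean shifted by 0.5

p_boot_alt = bootstrap_pvalue(x1, x2_alt, grid)
p_perm_alt = permutation_pvalue(x1, x2_alt, grid)
p_boot_null = bootstrap_pvalue(x1, x2_null, grid)
```

Working with the paired differences keeps the within-subject dependence intact in both plans, because each resampled unit is a whole subject rather than a single curve.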

Monte Carlo simulation study

In order to investigate the behavior of the proposed method, as usual, a Monte Carlo simulation study was carried out. Two different problems have been considered. In the first one, four different functions were proposed. Fig. 2 shows the different shapes of these curves:

1. $m_{0,1}(t) = \sqrt{6t/\pi}\, e^{-6t}\, I_{[0,1]}(t)$
2. $m_{1,1}(t) = \sqrt{13t/(2\pi)}\, e^{-13t/2}\, I_{[0,1]}(t)$
3. $m_{2,1}(t) = \sqrt{11t/(2\pi)}\, e^{-11t/2}\, I_{[0,1]}(t)$
4. $m_{3,1}(t) = \sqrt{5}\, t^{2/3}\, e^{-7t}\, I_{[0,1]}(t)$

The procedure to compute each artificial trajectory (sample sizes of $n = 25, 35, 50$ are considered), $X_i(t)$

Considerations for the k-sample case

There exist several ways to generalize $C_n$ to the k-sample case. Perhaps the most direct is the one employed by Kiefer (1959) and, more recently, by Martínez-Camblor and de Uña-Álvarez (2009), given by $C_n(k) = \sum_{j=1}^{k} n \int_0^1 \left(X_{\bullet}(t+(j-1)) - \bar{X}(t)\right)^2 dt$, where $X_{\bullet}(t) = n^{-1}\sum_{i=1}^{n} X_i(t)$ with $t \in [0,k]$ and $\bar{X}(t) = k^{-1}\sum_{j=1}^{k} X_{\bullet}(t+(j-1))$ with $t \in [0,1]$ and $1 \le j \le k$ (note that $C_n = 2C_n(2)$).
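The k-sample statistic above can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: instead of concatenating each subject's k curves on $[0,k]$, the data are stored as an array of shape `(n, k, H)` with every curve evaluated on a common grid over $[0,1]$, and the two-sample statistic is assumed to be $n$ times the integrated squared difference of the two mean curves. The function names and the toy data are hypothetical.

```python
import numpy as np

def integrate(values, grid):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid), axis=-1)

def c_n_k(x, grid):
    """k-sample statistic; x has shape (n, k, H), x[i, j] is subject i's j-th curve on [0, 1]."""
    n = x.shape[0]
    sample_means = x.mean(axis=0)             # shape (k, H): mean curve of each sample
    grand_mean = sample_means.mean(axis=0)    # shape (H,): average of the k mean curves
    return float(n * np.sum(integrate((sample_means - grand_mean) ** 2, grid)))

def c_n(x1, x2, grid):
    """Assumed two-sample statistic: n * integral of the squared mean paired difference."""
    n = x1.shape[0]
    return float(n * integrate((x1.mean(axis=0) - x2.mean(axis=0)) ** 2, grid))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
x = rng.normal(size=(30, 2, grid.size))       # toy data: n = 30 subjects, k = 2 curves each

two_sample = c_n(x[:, 0], x[:, 1], grid)
k_sample = c_n_k(x, grid)
```

For k = 2 each mean curve deviates from the grand mean by half the difference between the two mean curves, so the two squared deviations add up to half the squared difference; this is why `two_sample` agrees with `2 * k_sample` up to rounding.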

It is not difficult to develop a similar result to Theorem 1 for the above statistic. Arguing as in Martínez-Camblor et al.

Conclusions

Functional data analysis (FDA) has been the focus of numerous interesting works over the last few years. Because the number of areas in which this kind of data appears is increasing, interest in FDA is growing and, recently, most of the usual statistical methods have been adapted to this context (the references are countless, including special issues of several journals such as Computational Statistics, Journal of Multivariate Analysis and Computational Statistics and Data Analysis; see for example

Acknowledgments

The first author is especially grateful to Susana Diaz-Coto, whose patient ear and wise counsel have improved this work. We are also grateful to the Editor and the anonymous reviewers, whose comments and suggestions have improved the paper.

References (40)

  • O. Rosen et al. A Bayesian regression model for multivariate functional data. Computational Statistics & Data Analysis (2009)
  • R.J. Adler. An introduction to continuity, extrema and related topics for general Gaussian processes
  • M.A. Arcones et al. On the bootstrap of U and V statistics. Annals of Statistics (1992)
  • J. Barrientos-Marin et al. Locally modelled regression and functional data. Journal of Nonparametric Statistics (2010)
  • P. Billingsley. Convergence of Probability Measures (1968)
  • G. Boente et al. Kernel-based functional principal components. Annals of Statistics (2000)
  • F. Burba et al. k-Nearest neighbor method in functional nonparametric regression. Journal of Nonparametric Statistics (2009)
  • J.A. Cuesta et al. A simple multiway ANOVA for functional data. Test (2010)
  • A. Cuevas et al. Linear functional regression: the case of fixed design and functional response. Canadian Journal of Statistics (2002)
  • M. Davidian et al. Introduction: Emerging issues in longitudinal and functional data analysis. Statistica Sinica (2004)