Extremal dependence measure for functional data
Introduction
We first concisely state the main contributions of the paper, with the caveat that detailed definitions and formulations are provided in the sections that follow. Consider a sample of functions , such that each of them has the same distribution as . The Karhunen–Loève expansion is . The functions are the functional principal components (FPCs) and the random variables are their scores. We want to estimate the extremal dependence of and . We define a measure of such dependence, which we denote by . We then define an estimator of and formulate conditions under which it is consistent (Theorem 1) and asymptotically normal (Theorem 2). The main difficulty is that the population scores are not observable.
This paper thus makes a contribution at the nexus of functional data analysis (FDA) and extreme value theory (EVT). We assume that the reader is familiar with mathematical foundations of functional data analysis and central principles of extreme value theory. The FDA background given in Chapters 2 and 3 of [17] is sufficient, and more detailed treatment is provided in [18]. Recent advances in FDA are surveyed in [15], [1], and [5].
Chapters 2 and 6 of [31] provide sufficient background in extreme value theory. Other references are cited when needed. We assume that all functions are elements of the space , where the measure space is such that , with the usual inner product, is a separable Hilbert space. This will be ensured if the measure on is σ-finite and defined on a countably generated σ-algebra; see, e.g., Proposition 3.4.5 in [2]. In particular, can be taken to be a complete separable metric space (Polish space).
Suppose are mean zero iid functions in with , and denote by a generic random function with the same distribution as each . A main dimension reduction tool of functional data analysis is to project the infinite dimensional functions onto a finite dimensional subspace spanned by the FPCs. We now recall the required definitions. Consider the population covariance operator of , defined by The eigenfunctions of are the FPCs, denoted by , i.e., , where the are the eigenvalues of . The FPCs lead to the commonly used Karhunen–Loève expansion The FPCs and the eigenvalues are estimated by and , which are solutions to the equations where is the sample covariance operator defined by Each curve can then be approximated by a linear combination of a finite set of the estimated FPCs , i.e., , where the are the sample scores. Each quantifies the contribution of the curve to the shape of the curve . Thus, the vector of the sample scores, , encodes the shape of to a good approximation.
To illustrate, Fig. 1 displays the first three sample FPCs, , for intraday return curves , for Walmart stock from July 5, 2006 to December 30, 2011. These data are described in detail in Section 2 of the supplement. The curves show how the return on an investment changes throughout a trading day; two examples are shown in Fig. 2. The curve is a monotonic trend throughout the day. If the score corresponding to it is large, trading in this stock on a given day was dominated by a systematic increase (or decline if the score is negative) in the price of the stock. Notice the gradually decreasing slope of , which reflects the well-known fact that the most intense trading takes place after the opening of the trading floor. The second FPC, , has a large score if there is a significant reversal in investor sentiment during a given trading day. These observations are illustrated in Fig. 2.
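As a rough numerical illustration of the projection step described above, the following Python sketch computes estimated FPCs, eigenvalues, and sample scores for curves observed on a grid. The function name `sample_fpca`, the equally spaced grid, the Riemann-sum approximation of the inner product, and the simulated Gaussian curves are all assumptions made for this illustration; they are not taken from the paper or its supplement.

```python
import numpy as np

def sample_fpca(curves, grid_step, n_components):
    """Estimated FPCs and sample scores for curves on an equally spaced grid.

    curves: array of shape (N, T), one discretized function per row.
    grid_step: spacing of the grid, used as the Riemann-sum weight.
    """
    X = curves - curves.mean(axis=0)      # center the sample
    N = X.shape[0]
    cov = X.T @ X / N                     # discretized sample covariance kernel
    # Discretized eigenproblem of the sample covariance operator.
    eigvals, eigvecs = np.linalg.eigh(cov * grid_step)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]
    # Rescale so that each estimated FPC has unit L2 norm on the grid.
    fpcs = eigvecs[:, order].T / np.sqrt(grid_step)
    # Sample scores: inner products of centered curves with estimated FPCs.
    scores = X @ fpcs.T * grid_step
    return lam, fpcs, scores

# Simulated curves (hypothetical data): two smooth components plus noise.
rng = np.random.default_rng(0)
N, T = 200, 50
h = 1.0 / T
t = (np.arange(T) + 0.5) * h
curves = (
    2.0 * rng.standard_normal((N, 1)) * np.sqrt(2) * np.sin(2 * np.pi * t)
    + 1.0 * rng.standard_normal((N, 1)) * np.sqrt(2) * np.cos(2 * np.pi * t)
    + 0.1 * rng.standard_normal((N, T))
)
lam, fpcs, scores = sample_fpca(curves, h, n_components=3)
```

In this discretized setting the sample covariance matrix of the scores is diagonal, with the estimated eigenvalues on the diagonal, so sample scores of different FPCs are uncorrelated by construction.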
The main interest in this paper is the estimation of extremal dependence between the scores corresponding to different FPCs. Extremal dependence is a tendency of large values of one component to be coupled with large values of another component of a random vector. In the context of our Walmart stock example, extremal dependence between the first and second scores indicates that an extremely high monotonic trend and a pronounced reversion tend to occur simultaneously. We assess extremal dependence of the scores by means of the extremal dependence measure (EDM), which is constructed based on the theory of heavy-tailed regularly varying random vectors. There has been considerable research on quantifying the tail dependence between extreme values in a heavy-tailed random vector. [23], [24], [25] defined the coefficient of tail dependence, which was later generalized to the extremogram by [7]. While these approaches are essentially based on the exponent measure of a random vector, the EDM is defined in terms of the spectral measure. The EDM was introduced by [30] and further investigated by [22]. Important related papers are [14] and [3].
In this paper, we quantify extremal dependence of scores using the EDM. To estimate the EDM of population scores, we consider an extension of the estimator proposed by [22]. It is important to emphasize that in our functional setting, the estimator can only be computed using the sample scores , not the population scores because the are unobservable. Establishing large sample properties of any estimator based on sample scores requires taking the effect of the estimation of the scores into account. Since the estimator in (3) depends on the whole sample , the vectors are no longer independent, even if are iid functions. They form a triangular array of dependent identically distributed vectors of dimension . We also note that the population scores satisfy if and the sample correlation of the sample scores and is also zero. However, the correlation is a measure of the overall dependence, and there may be strong dependence, e.g., between the positive parts and ; in particular, there may be extremal dependence in specific quadrants. Another point to keep in mind is that for regularly varying observations, zero covariance does not imply independence.
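To make the idea of an EDM-type estimate concrete, the following Python sketch computes a simple empirical version for nonnegative pairs: it keeps the k observations with the largest radius and averages the product of their angular components. This is only a minimal sketch under stated assumptions. The function name `edm_estimate`, the Euclidean norm, the choice of k, and the simulated Pareto data are illustrative choices, and the code is not claimed to coincide with the estimator analyzed in this paper, which is an extension of the one in [22] and is computed from sample scores.

```python
import math
import random

def edm_estimate(xs, ys, k):
    """Average of (x/r)*(y/r) over the k pairs with the largest radius r.

    Assumes nonnegative inputs (e.g. positive parts of scores) and uses
    the Euclidean norm r = sqrt(x^2 + y^2); both are assumptions here.
    """
    pairs = [(x, y, math.hypot(x, y)) for x, y in zip(xs, ys)]
    pairs.sort(key=lambda p: p[2], reverse=True)   # largest radii first
    top = pairs[:k]
    return sum((x / r) * (y / r) for x, y, r in top) / k

# Simulated heavy-tailed data (hypothetical): Pareto variables with index 2.
random.seed(1)
n, k = 5000, 50
xs = [random.paretovariate(2.0) for _ in range(n)]
ys_indep = [random.paretovariate(2.0) for _ in range(n)]

edm_indep = edm_estimate(xs, ys_indep, k)   # independent components
edm_dep = edm_estimate(xs, xs, k)           # fully dependent components
```

With the Euclidean norm, the product of the angular components is at most 1/2, attained when both components are equal, so the fully dependent pair gives the largest possible value, while independent heavy-tailed components give a smaller one because their extremes tend to lie near the axes.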
The remainder of the paper is organized as follows. In Section 2, we introduce preliminaries on multivariate regular variation and the EDM, and extend the concept of the EDM to multivariate data. Our main large sample results are presented in Section 3, which deals with the EDM for scores of functional observations. Section 4 presents a number of preliminary results. These results allow us to streamline the exposition of the proofs of the results of Section 3, which are presented in Section 5.
The paper is accompanied by online Supplementary Material, which contains several sections. Section 1 explains how to normalize tail indexes of components of multivariate vectors. This is a well-researched topic in EVT, but may be less known in the FDA community, so we provide a brief account needed to understand the application in Section 2 of the supplement. Sections 2 and 3 present, respectively, an application to functional return data and a simulation study. Section 4 contains additional tables discussed in Section 3.
We hope that this work will be received with some interest by researchers working in two exciting and dynamic fields: functional data analysis and extreme value theory.
Multivariate regular variation and the EDM
We start by introducing multivariate regular variation for random vectors with positive components because the extremal dependence measure (EDM) was defined in such a context. Following [31], we denote by the nonnegative orthant compactified at infinity. We denote by the space of Radon measures on , and by the vague convergence in . An -valued random vector with distribution function is regularly varying with index , , if there exist a sequence
The EDM for scores of functional data
In this section, we consider the estimation of the EDM of scores of functional data. Following the framework introduced in Section 1, recall that are mean zero iid functions in with , and that each admits the Karhunen–Loève expansion (2). The unknown population scores in (2) are estimated by the sample scores , where the are estimators of the FPCs . We introduce the following random variables:
Preliminary results
We collect several preliminary results in this section so that the proofs in Section 5 are not burdened with technical details and readers can follow the main flow of the argument there.
The first lemma follows from Lemma 3.7 of [20] and is needed to prove Lemma 2.
Lemma 1 Suppose random variables , , satisfy and . Then, .
In the following lemma, we present a sufficient condition to guarantee the convergence between random measures defined on a
Proofs of the results of Section 3
Proof of Proposition 3 First, note that and iff . Observe that, for any set in , To prove the regular variation of in , we will apply Theorem 2.3 of [26]. To do this, we must show that the are continuity sets of , i.e., . The verification uses the same idea as the proof of Proposition 3.1 of [21], except that we work with a different projection and its relevant set
Acknowledgments
This research was partially supported by the United States NSF grants DMS-1923142, DMS-1914882 and DMS-2123761.
References (33)
- et al. Recent advances in functional data analysis and high-dimensional statistics. J. Multivariate Anal. (2019)
- A partial overview of the theory of statistics with functional data. J. Statist. Plann. Inference (2014)
- et al. More limit theory for the sample correlation function of moving averages. Stoch. Process. Appl. (1985)
- et al. An introduction to recent advances in high/infinite dimensional statistics. J. Multivariate Anal. (2016)
- Measure Theory (2013)
- et al. Decompositions of dependence for high-dimensional extremes. Biometrika (2019)
- et al. Kernel estimates of the tail index of a distribution. Ann. Statist. (1985)
- et al. The sample autocorrelations of heavy-tailed processes with applications to ARCH. Ann. Statist. (1998)
- et al. The extremogram: A correlogram for extreme events. Bernoulli (2009)
- et al. Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Probab. (1985)
- Limit theory for the sample covariance and correlation functions of moving averages. Ann. Statist.
- Modelling Extremal Events for Insurance and Finance
- Functional peaks-over-threshold analysis for complex extreme events
- High-dimensional peaks-over-threshold inference. Biometrika
- Multivariate max-stable spatial processes. Biometrika
- On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Statist.
1. Both authors, Mihyun Kim and Piotr Kokoszka, contributed equally to this work.