Conditional independence testing via weighted partial copulas

https://doi.org/10.1016/j.jmva.2022.105120

Abstract

The test statistic proposed in this paper is an explicit Cramér–von Mises transformation of a certain weighted partial copula function. The regions of rejection are computed using a bootstrap procedure which mimics conditional independence by generating samples from the product measure of the estimated conditional marginals. Under certain (high-level) conditions (on the estimated conditional marginals), rates of convergence for the weighted partial copula process and the test statistic as well as the weak convergence under the null of the normalized test statistic are established. These high-level conditions on the estimated margins are shown to be valid in a variety of examples ranging from nonparametric kernel to linear quantile regression estimates. Finally, an experimental section demonstrates that the proposed test has competitive power compared to recent state-of-the-art methods such as kernel-based tests.

Introduction

Let $(Y_1,Y_2,X)\in\mathbb{R}\times\mathbb{R}\times\mathbb{R}^d$ be a triple of continuous random variables. We say that $Y_1$ and $Y_2$ are conditionally independent given $X$ if, for all $(y_1,y_2,x)\in\mathbb{R}\times\mathbb{R}\times\mathbb{R}^d$, $\Pr(Y_1\le y_1,Y_2\le y_2\mid X=x)=\Pr(Y_1\le y_1\mid X=x)\Pr(Y_2\le y_2\mid X=x)$. This property is denoted by $Y_1\perp\!\!\!\perp Y_2\mid X$ and, roughly speaking, it means that for a given value of $X$, the knowledge of $Y_1$ does not provide any further information on $Y_2$ (and vice versa). Determining conditional independence has become in recent years a fundamental question in statistics and machine learning. For instance, it plays a key role in defining graphical models [1], [39]; see also [45] for a study specific to cellular networks. Moreover, the concept of conditional independence lies at the core of sufficient dimension reduction methods [42] and is useful for variable selection in regression [41]. Finally, conditional independence is relevant in many application fields such as economics [32] or psychometrics [2].
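The definition above can be illustrated with a small simulation. The sketch below is only an illustration under assumed choices: the Gaussian common-cause model, the function names and the use of residual correlation are not taken from the paper. In this Gaussian linear setting, a vanishing correlation between the residuals corresponds to conditional independence given $X$; outside that setting it is only a necessary condition.

```python
import numpy as np

def simulate_common_cause(n=5000, seed=0):
    """Draw (X, Y1, Y2) where Y1 and Y2 both depend on X but not on each other."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y1 = x + rng.normal(size=n)  # the two noise terms are independent
    y2 = x + rng.normal(size=n)
    return x, y1, y2

def residual_after_regressing_on(x, y):
    """Remove the linear effect of x from y (least squares with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

x, y1, y2 = simulate_common_cause()
marginal_corr = np.corrcoef(y1, y2)[0, 1]
partial_corr = np.corrcoef(residual_after_regressing_on(x, y1),
                           residual_after_regressing_on(x, y2))[0, 1]
print(f"marginal correlation: {marginal_corr:.3f}")
print(f"correlation given X:  {partial_corr:.3f}")
```

Here $Y_1$ and $Y_2$ are clearly dependent marginally, because they share the cause $X$, while the dependence disappears once $X$ is accounted for.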

The approach taken in this paper is related to the well-studied problem of (unconditional) independence testing, in which, inspired by [35], rank-based statistics have received increasing interest [13], [52], [53], [54]. Because they do not depend on the marginals, rank-based statistics have become a key tool for modeling the joint distribution of random variables without being affected by their margins. A natural quantity that measures the dependency between random variables is the empirical copula function, defined as the estimated joint cumulative distribution function of the ranks. It has been used with success in independence testing [22], [23] and we refer to [20], [55] for additional details on the estimation of the copula function. A common difficulty when using the empirical copula is that the distribution (even the asymptotic one) of the resulting statistic is unknown, which makes it difficult to compute the quantiles used to build the rejection regions. To overcome this issue, bootstrap procedures have been of prime interest because they make it possible to approximate the distribution of the statistic using simulated data [8], [20], [49].
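As a rough illustration of the rank-based approach described above, the following sketch computes an empirical copula from rescaled ranks and a Cramér–von Mises type distance to the independence copula $(a,b)\mapsto ab$. It is a minimal sketch, not the statistic studied in this paper: the function names, the rescaling of ranks by $n+1$ and the grid approximation of the integral are all assumptions made for the example.

```python
import numpy as np

def pseudo_observations(sample):
    """Rescaled ranks in (0, 1); assumes no ties (continuous data)."""
    n = len(sample)
    ranks = np.argsort(np.argsort(sample)) + 1
    return ranks / (n + 1)

def empirical_copula(u1, u2, grid):
    """C_n(a, b) = (1/n) * #{i : u1_i <= a, u2_i <= b} for a, b in grid."""
    below1 = (u1[:, None] <= grid[None, :]).astype(float)  # shape (n, m)
    below2 = (u2[:, None] <= grid[None, :]).astype(float)  # shape (n, m)
    return below1.T @ below2 / len(u1)                      # shape (m, m)

def cvm_independence_statistic(y1, y2, grid_size=50):
    """n times the grid average of (C_n(a, b) - a * b)^2."""
    grid = (np.arange(grid_size) + 0.5) / grid_size
    c_n = empirical_copula(pseudo_observations(y1),
                           pseudo_observations(y2), grid)
    return len(y1) * np.mean((c_n - np.outer(grid, grid)) ** 2)

rng = np.random.default_rng(0)
a = rng.normal(size=500)
print(cvm_independence_statistic(a, rng.normal(size=500)))            # independent pair
print(cvm_independence_statistic(a, a + 0.1 * rng.normal(size=500)))  # dependent pair
```

Because only ranks enter the computation, the statistic is unchanged by strictly increasing transformations of either variable, which is the margin-free property mentioned above.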

The conditional copula of $Y_1$ and $Y_2$ given $X$ is defined in the same way as the copula of $Y_1$ and $Y_2$ but uses the conditional distribution of $Y_1$ and $Y_2$ given $X$ in place of the joint distribution of $Y_1$ and $Y_2$. Compared to the copula, the conditional copula captures the conditional dependency between random variables and is thus useful to build conditional dependency measures [15], [26]. Therefore, as in the case of independence testing, the conditional copula appears to be a relevant tool for building a statistical test of conditional independence. This has been pointed out as an “interesting open issue” in [62, Section 4].
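For concreteness, the conditional copula can be written out as follows. The symbols are assumptions made for this illustration: $H(\cdot\mid x)$ stands for the joint conditional distribution function of $(Y_1,Y_2)$ given $X=x$, $F_j(\cdot\mid x)$ for the conditional margin of $Y_j$, and $F_j^{-}(\cdot\mid x)$ for its generalized inverse. With continuous conditional margins, the standard conditional version of Sklar's theorem gives the following characterization.

```latex
\[
  C(u_1,u_2\mid x)
  = H\bigl(F_1^{-}(u_1\mid x),\,F_2^{-}(u_2\mid x)\,\big|\,x\bigr),
  \qquad (u_1,u_2)\in[0,1]^2,
\]
\[
  Y_1 \perp\!\!\!\perp Y_2 \mid X
  \iff
  C(u_1,u_2\mid x)=u_1u_2
  \quad\text{for all } (u_1,u_2)\in[0,1]^2 \text{ and all } x \text{ in the support of } X .
\]
```

In words, conditional independence holds exactly when every conditional copula reduces to the independence copula, which is why the conditional copula is a natural starting point for a test.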

In this work, a new statistical test procedure, called the weighted partial copula test, is investigated to assess conditional independence. The proposed approach follows from using the weighted partial copula, an integral transform allowing a simple characterization of conditional independence. Given estimators of the conditional marginals of $Y_1$ and $Y_2$ given $X$, the empirical weighted partial copula is introduced to estimate the weighted partial copula and the test statistic results from an easy-to-compute Cramér–von Mises transformation. The use of the weighted partial copula is motivated by the conditional moment restrictions literature (see [40] and the references therein) in which integrated criteria, similar to the weighted partial copula, have been frequently used. Those criteria are interesting because even when they involve local estimates converging at a slower rate than 1/n, their convergence rates are in many cases in 1/n.

Inspired by the independence testing literature [3], [38], the quantiles are computed using a bootstrap procedure which generates bootstrap samples from the product of the marginal estimators to mimic the null hypothesis. Thanks to this bootstrap procedure, the weighted partial copula test can be performed with any marginal estimates, provided that one can generate random variables from these margins.
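The bootstrap idea described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' implementation: the Gaussian linear-regression margin (with an intercept), the choice to keep the covariates $X_i$ fixed across bootstrap draws, the `placeholder_statistic` (which is not the weighted partial copula statistic) and all function names are hypothetical choices made for the example.

```python
import numpy as np

def fit_gaussian_linear_margin(x, y):
    """Hypothetical margin estimate: Y | X = x ~ N(b0 + x'b, s^2), least squares fit."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma = resid.std(ddof=design.shape[1])
    return design @ coef, sigma  # fitted conditional means, residual scale

def placeholder_statistic(x, y1, y2):
    """Stand-in statistic (NOT the weighted partial copula statistic):
    sqrt(n) * |correlation| of the standardized residuals of both margins."""
    m1, s1 = fit_gaussian_linear_margin(x, y1)
    m2, s2 = fit_gaussian_linear_margin(x, y2)
    r = np.corrcoef((y1 - m1) / s1, (y2 - m2) / s2)[0, 1]
    return np.sqrt(len(y1)) * abs(r)

def bootstrap_p_value(x, y1, y2, statistic=placeholder_statistic,
                      n_boot=200, seed=0):
    """Approximate the null distribution by sampling from the product of margins."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y1, y2)
    m1, s1 = fit_gaussian_linear_margin(x, y1)
    m2, s2 = fit_gaussian_linear_margin(x, y2)
    exceed = 0
    for _ in range(n_boot):
        # Assumption: covariates are kept fixed; the two responses are drawn
        # independently of each other given each X_i, which mimics the null.
        y1_star = m1 + s1 * rng.normal(size=len(y1))
        y2_star = m2 + s2 * rng.normal(size=len(y2))
        # The statistic refits the margins on every bootstrap sample.
        exceed += int(statistic(x, y1_star, y2_star) >= observed)
    return (1 + exceed) / (1 + n_boot)
```

Calling `bootstrap_p_value(x, y1, y2)` returns an approximate p-value. Any other margin estimate could be substituted as long as it offers a way to draw from the fitted conditional distribution, which is the only requirement stated above.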

The theoretical results of the paper are as follows. Under the uniform convergence of the estimated marginals, convergence rates are established for the empirical weighted partial copula and for the resulting test statistic. These results are interesting because they make it possible, in principle, to use any reasonable estimates of the margins when running the test. Moreover, under an additional assumption on the estimated quantiles, which is satisfied for standard parametric estimates such as those of linear quantile regression [36], the weak convergence of the test statistic is obtained.

Nonparametric testing for conditional independence between continuous variables has received increasing interest over the past few years [43]. Some of the existing approaches are based on comparing the (estimated) conditional distributions involved in the definition of conditional independence. The probability distributions can be compared using their conditional characteristic functions as in Su and White [59] and Wang et al. [63], their conditional densities as proposed in [60], or their conditional copulas as studied in [7]. Unfortunately, the estimation of these conditional quantities is subject to the well-known curse of dimensionality, i.e., the convergence rates are badly affected by the dimension of the conditioning variable. As a consequence, the power of these tests rapidly deteriorates if the conditioning variable has a large dimension. Note also that Bergsma [4] uses partial copulas to derive the test statistic. Unfortunately, partial copulas fail to capture the whole conditional distribution, so the resulting test targets a null hypothesis much larger than conditional independence. Other approaches rely on the characterization of conditional independence using cross-covariance operators defined on reproducing kernel Hilbert spaces [21]. Extending the Hilbert–Schmidt independence criterion proposed in Gretton et al. [27], Zhang et al. [65] define a kernel-based conditional independence test (KCI-test) by estimating the cross-covariance operator (see also [58]). A surge of recent research [16], [51], [56] has focused on testing conditional independence using permutation-based tests. The work of [9] has led to many conditional independence tests depending on the availability of an approximation to the distribution of $Y_j\mid X$, $j=1,2$, such as the conditional permutation test proposed in [5].

While it is impossible to claim the superiority of our approach compared to the existing methods, we may emphasize several notable advantages:

  • (flexibility) Any marginal estimate can be used to compute the test statistic. Since the test accuracy depends on the approximation quality of the marginal distribution, the test shall benefit from a precise estimation procedure of the margins.

  • (bootstrap quantiles) A nice feature of the proposed framework is that the quantiles of the test can be approximated using the bootstrap as soon as random generation from each margin is available.

  • (convergence rate and dimension) In certain situations (parametric and smoothed nonparametric marginal estimates), the rate of convergence of the test statistic under the null is 1/n and hence is free from the dimension of the covariates (whereas this dimension impacts most nonparametric procedures). This property suggests that the proposed test has reasonable power in the multidimensional context.

The outline is as follows. In Section 2, we introduce the weighted partial copula test and provide implementation details regarding the bootstrap procedure. In Section 3, we state the main theoretical results. In Section 4, the proposed approach is compared to several competitors on simulated datasets. The mathematical proofs are gathered in the Appendix.

Section snippets

Set-up and definitions

Let $f_{X,Y}$ be the density function (with respect to the Lebesgue measure) of the random triple $(X,Y)=(X,Y_1,Y_2)\in\mathbb{R}^d\times\mathbb{R}\times\mathbb{R}$. Let $f_X$ and $S_X$ denote the density and the support of $X$, respectively. The conditional cumulative distribution function of $Y$ given $X=x$ is given by $y\mapsto H(y\mid x)$ for $x\in S_X$. The generalized inverse of a univariate distribution function $F$ is defined as $F^{-}(u)=\inf\{y\in\mathbb{R}:F(y)\ge u\}$, for all $u\in[0,1]$, with the convention that $\inf\emptyset=+\infty$. Since $H(\cdot\mid x)$ is a continuous bivariate cumulative distribution

Consistency of the empirical weighted partial copula and of the test statistic

The results of the section are provided in a general setting in which the estimates of the conditional margins are left unspecified but should satisfy a particular condition, namely the uniform consistency. We give some examples after the statements of the main results. Let $\mathbb{P}$ denote the probability measure on the underlying probability space associated to the whole sequence $(X_i,Y_i)_{i=1,2,\ldots}$. In what follows $(r_n)_{n\ge 1}$ denotes a positive sequence. We write $Z_n=O_{\mathbb{P}}(r_n)$ if $Z_n/r_n$ is a tight sequence of

Numerical experiments

In this section, we apply the proposed copula test to synthetic and real data to evaluate its performance. The weighted partial copula test is put to work with two different margin estimates. As a first approach, we consider the following parametric estimate of the conditional margins: $\hat F_{n,j}^{(\mathrm{LR})}(y\mid x)=\Phi_{(x^{\top}\hat\beta_j,\hat\sigma)}(y)$, $y\in\mathbb{R}$, where $\Phi_{(m,\sigma)}$ denotes the Gaussian cumulative distribution function with mean $m$ and variance $\sigma^2$. This estimate will be referred to as the linear regression (LR) estimate because β

Conclusion

In this work, we have developed a test of conditional independence between two continuous variables $Y_1$ and $Y_2$ given a third continuous variable $X$. The test is based on the weighted partial copula, introduced in the paper, and can be implemented with any conditional marginal estimate as long as one can generate random variables under this marginal estimate. This last requirement makes it possible to rely on a simple bootstrap strategy for computing the quantiles of the test. From the theoretical

CRediT authorship contribution statement

Pascal Bianchi: Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing. Kevin Elgui: Methodology, Software, Supervision, Visualization, Writing – original draft, Writing – review & editing. François Portier: Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing.

Acknowledgment

The PhD thesis of Kevin Elgui was funded partly by the Sigfox company (Toulouse, France).

References (65)

  • Rémillard B. et al. Testing for equality between two copulas. J. Multivariate Anal. (2009)
  • Su L. et al. A consistent characteristic function-based test for conditional independence. J. Econometrics (2007)
  • Wenocur R.S. et al. Some special Vapnik-Chervonenkis classes. Discrete Math. (1981)
  • Bach F.R. et al. Learning graphical models with Mercer kernels
  • Bell R.C. et al. Conditional independence in a clustered item test. Appl. Psychol. Meas. (1988)
  • Bergsma W. Nonparametric testing of conditional independence by means of the partial copula (2010)
  • Berrett T.B. et al. The conditional permutation test for independence while controlling for confounders. J. R. Stat. Soc. Ser. B Stat. Methodol. (2019)
  • Biau G. et al. Lectures on the Nearest Neighbor Method. Vol. 246 (2015)
  • Bouezmarni T. et al. Nonparametric copula-based test for conditional independence with applications to Granger causality. J. Bus. Econom. Statist. (2012)
  • Candes E. et al. Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. J. R. Stat. Soc. Ser. B Stat. Methodol. (2018)
  • Dabrowska D.M. Uniform consistency of the kernel conditional Kaplan-Meier estimate. Ann. Statist. (1989)
  • Dawid A.P. Conditional independence in statistical theory. J. R. Stat. Soc. Ser. B Stat. Methodol. (1979)
  • Derumigny A. et al. About tests of the “simplifying” assumption for conditional copulas. Depend. Model. (2017)
  • Derumigny A. et al. Conditional empirical copula processes and generalized dependence measures (2020)
  • Doran G. et al. A permutation-based kernel conditional independence test
  • Durante F. et al. Principles of Copula Theory (2015)
  • Einmahl U. et al. An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theoret. Probab. (2000)
  • Fan J. et al.
  • Fermanian J.-D. et al. Weak convergence of empirical copula processes. Bernoulli (2004)
  • Fukumizu K. et al. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res. (2004)
  • Genest C. et al. Test of independence and randomness based on the empirical copula process. Test (2004)
  • Gijbels I. et al. Estimation of a copula when a covariate affects only marginal distributions. Scand. J. Stat. (2015)