Elsevier

Signal Processing

Volume 92, Issue 8, August 2012, Pages 1767-1778

Consistency and asymptotic normality of FastICA and bootstrap FastICA

https://doi.org/10.1016/j.sigpro.2011.11.025

Abstract

Independent component analysis (ICA) is possibly the most widespread approach to solve the blind source separation problem. Many different algorithms have been proposed, together with several highly successful applications. There is also an extensive body of work on the theoretical foundations and limits of the ICA methodology.

One practical concern about the use of ICA with real-world data is the robustness of its estimates. Slight variations in the estimates may stem from the inherently stochastic nature of the algorithms used or from deviations from the theoretical assumptions. To address this problem, different approaches have been proposed, most of which are based on multiple runs of ICA algorithms with bootstrap.

Here we show the consistency and asymptotic normality of FastICA and bootstrap FastICA, based on empirical process theory, including Z-estimators and Hoeffding's inequality. These results give theoretical grounds for the robust use of FastICA in a multiple-run, bootstrapped and randomly initialized manner. In this framework, it is also possible to assess the convergence of the algorithm through a normality test.

Introduction

In recent years, blind source separation (BSS) has become a mainstream topic in signal and image processing, with independent component analysis (ICA) as possibly the most widespread approach to solving it. The considerable number of publications and of dedicated conferences and workshops on these topics attests to their scientific spread. Furthermore, it is believed that the basic theoretical foundations of ICA, as well as several of its applications, are rather well understood. A multitude of algorithms has been proposed and carefully analyzed (cf. [14], [6], [4]). In particular, for the FastICA algorithm (cf. [13], [15]), which is the main topic of this paper, theoretical limits have been presented earlier (cf. [16], [13], [19]).

One persistent concern for the practical use of ICA with real data is the robustness of the estimated sources. Repeated use of most ICA algorithms results in slight variations in the estimated components. This may have a variety of causes, including the inherently stochastic nature of the ICA implementation or departures from the ideal theoretical assumptions [23]. To assess the robustness of the source estimation, several methods have been proposed, often based on a bootstrapped analysis of the estimated components [8] or on a bagging approach to FastICA (cf. [15]), a widely used algorithmic implementation of ICA. In [10], the multiple runs of FastICA are clustered by a self-organizing map, revealing common properties of the many estimated sources. In [23], using a different clustering strategy, the estimation variability itself is treated as a source of information on the robustness of the estimates.

We have shown experimentally that a multiple run of FastICA in a bagging manner [2] can lead to very interesting new insights into the analysis of independent components of functional magnetic resonance images (fMRI, [12]) (cf., [21], [10]). Nevertheless, in spite of such positive experimental outcomes, there has been no clear study of the theoretical validity of a bootstrapping approach to ICA algorithms.

In the present paper, we show the asymptotic normality of FastICA and bootstrapped FastICA, using empirical process theory, in particular Z-estimators and their bootstrapped versions [20]. We first review the basic theoretical background behind FastICA, followed by a thorough theoretical study of the consistency and asymptotic normality of the algorithm. Experimental illustrations of these properties are then shown, using real fMRI data. The paper concludes with a brief discussion of the limitations and implications of the current study. Note that, although we focus on the FastICA algorithm, we believe that several of the considerations made throughout the paper can be extended to other ICA methods. Such an extension is, however, beyond the scope of the current paper.

In the following, we introduce the notation used throughout the paper. We assume $x_1,\dots,x_n$ are independent and identically distributed by some joint distribution $P$, $x_i \in \mathcal{X} \subseteq \mathbb{R}^d$, satisfying the ICA model defined by $x = As$, where $s = (s_1,\dots,s_d)$ with $s_i$ and $s_j$ statistically independent for $i \neq j$, and $A \in \mathbb{R}^{d \times d}$ is full rank. Let us denote the covariance matrix of $x$ by $\Sigma_d = \mathrm{cov}\{x\}$. Then, we define the whitened random vector $z = \Sigma_d^{-1/2} x$, which has identity covariance matrix. In general, the empirical estimate will be denoted by $\hat{\theta}_n$ or $\hat{w}_n$. Convergence in distribution is denoted by $\rightsquigarrow$, whereas convergence in probability is denoted by $\xrightarrow{P}$. The bootstrap sample and corresponding estimate carry superscript $*$, e.g. $z_i^*$, $\hat{\theta}_n^*$ for original samples $z_i$ and sample estimates $\hat{\theta}_n$. A parameter with subscript “0”, e.g. $w_0$, denotes the true parameter. $Ef := \int_{\mathcal{X}} f \, dP$ and $E_n f := (1/n)\sum_{i=1}^{n} f(x_i)$, where $E_n = (1/n)\sum_{i=1}^{n} \delta_{x_i}(\cdot)$ is the empirical measure, which puts mass $1/n$ on each sample. The $r$-norm is defined by $\|f\|_r := (E|f|^r)^{1/r}$. $L_r(P)$ is the space defined by the norm $\|\cdot\|_r$. The operator $\mathrm{diag}[\cdot]$ returns a diagonal matrix with its argument on the diagonal elements. The Lipschitz norm is denoted by $|\cdot|_L$. $o_P$ stands for stochastic order and $O$ is the numerical order. The operator norm is denoted by $\|\cdot\|_o$ and is defined by $\|A\|_o = \sup_{\|t\|_2 = 1} \|At\|_2$.
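As an illustration of the whitening step in the notation above, the following is a minimal NumPy sketch, not taken from the paper. It assumes Laplace-distributed sources, a randomly drawn mixing matrix, and an empirical covariance estimate in place of the population covariance; it also centers the data first, which the notation above leaves implicit. The function name `whiten` is hypothetical.

```python
import numpy as np

def whiten(x):
    """Return z = Sigma^{-1/2} (x - mean), with samples stored as rows of x."""
    xc = x - x.mean(axis=0)                       # centering (an assumption of this sketch)
    sigma = np.cov(xc, rowvar=False)              # empirical covariance, d x d
    eigval, eigvec = np.linalg.eigh(sigma)        # sigma is symmetric positive definite
    sigma_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return xc @ sigma_inv_sqrt                    # sigma_inv_sqrt is symmetric

rng = np.random.default_rng(0)
s = rng.laplace(size=(5000, 3))    # independent non-Gaussian sources (assumed)
A = rng.normal(size=(3, 3))        # mixing matrix, full rank with probability one
x = s @ A.T                        # each row is x_i = A s_i
z = whiten(x)
cov_z = np.cov(z, rowvar=False)    # should be the identity up to rounding
```

After whitening, the remaining mixing is orthogonal, which is the setting assumed in the review of FastICA that follows.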

Section snippets

FastICA algorithm and previous convergence results

Let us assume that the whitened samples $\{z_i\}_{i=1}^{n}$ satisfy the ICA model with an orthogonal mixing matrix. It has been shown that the ICA model is identifiable up to rotation and scaling; for details see [6]. One way of deriving an algorithm for ICA is to search for a suitable vector $w \in \mathbb{R}^d$ which maximizes the non-Gaussianity of $w^\top z$. Intuitively, by the Central Limit Theorem, maximum non-Gaussianity corresponds to a maximum degree of separation. This framework resembles Projection Pursuit [11].
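The search for a maximally non-Gaussian projection can be sketched with the standard one-unit fixed-point update from the FastICA literature, $w \leftarrow E_n[z\,g(w^\top z)] - E_n[g'(w^\top z)]\,w$ followed by normalization. The snippet above does not state the update rule or the nonlinearity, so the choice $g = \tanh$, the stopping rule, the simulated Laplace sources and the function name `fastica_one_unit` are all assumptions of this sketch.

```python
import numpy as np

def fastica_one_unit(z, w_init, max_iter=200, tol=1e-12):
    """One-unit fixed-point iteration with g = tanh on whitened rows of z."""
    w = w_init / np.linalg.norm(w_init)
    for _ in range(max_iter):
        y = z @ w                                   # projections w^T z_i
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2                      # derivative of tanh
        w_new = (z * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        # w and -w describe the same component, so compare up to sign
        converged = 1.0 - abs(w_new @ w) < tol
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(0)
n = 20000
s = rng.laplace(scale=1 / np.sqrt(2), size=(n, 2))   # unit-variance sources (assumed)
theta = 0.6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # orthogonal mixing matrix
z = s @ Q.T                                          # white at the population level
w_hat = fastica_one_unit(z, rng.normal(size=2))
alignment = np.abs(Q.T @ w_hat)    # one entry should be close to 1
```

The estimate is only defined up to sign and up to the choice of source, which is why the check compares `w_hat` against every column of `Q` in absolute value.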

Main results

The main focus in this paper is to show that FastICA is a consistent estimator and it asymptotically converges to a normal random vector. The consistency of the estimator is similar to a weak law of large numbers [20], where we need to show that the estimator converges in probability to a nearby local or global solution as the number of samples increases. For the asymptotic normality we will show that the difference between the random estimator and the true solution, or true mapping, converges

Experimental results

To illustrate the theoretical implications in practice, a series of experiments was performed with both artificially generated and real-world data. The robust ICA approach used in the experiments was introduced in [23] and is implemented in the Arabica toolbox [22]. Using the toolbox, FastICA can be run multiple times, with varying initial conditions and bootstrap sampling.
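The multiple-run idea can be sketched as follows. This is not the Arabica toolbox and does not reproduce its clustering; it is a minimal NumPy illustration that assumes simulated Laplace sources, a tanh nonlinearity, a fixed number of iterations per run, and a crude grouping of runs by similarity to the first run. All function and variable names are hypothetical.

```python
import numpy as np

def one_unit_fastica(z, w, n_iter=100):
    """Standard tanh fixed-point iteration for a single unit vector w."""
    w = w / np.linalg.norm(w)
    for _ in range(n_iter):
        g = np.tanh(z @ w)
        w = (z * g[:, None]).mean(axis=0) - (1.0 - g ** 2).mean() * w
        w /= np.linalg.norm(w)
    return w

def bootstrap_runs(z, n_runs, rng):
    """Run FastICA n_runs times, each on a bootstrap resample with a random start."""
    n, d = z.shape
    estimates = np.empty((n_runs, d))
    for b in range(n_runs):
        idx = rng.integers(0, n, size=n)           # draw n rows with replacement
        estimates[b] = one_unit_fastica(z[idx], rng.normal(size=d))
    return estimates

rng = np.random.default_rng(1)
n = 5000
s = rng.laplace(scale=1 / np.sqrt(2), size=(n, 2))   # unit-variance sources (assumed)
theta = 0.6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # orthogonal mixing matrix
z = s @ Q.T

W = bootstrap_runs(z, n_runs=50, rng=rng)
sim = W @ W[0]                                   # cosine similarity to the first run
same = np.abs(sim) > 0.9                         # runs that found the same component
aligned = W[same] * np.sign(sim[same])[:, None]  # remove the sign ambiguity
mean_dir = aligned.mean(axis=0)
mean_dir /= np.linalg.norm(mean_dir)
spread = aligned.std(axis=0)                     # run-to-run variability per coordinate
```

The aligned estimates in `aligned` are the kind of sample to which a normality test could be applied, in the spirit of the convergence assessment mentioned in the abstract.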

In both simulated and real cases, an initial test with 100 runs of FastICA was performed without bootstrap, in order to

Discussion

ICA algorithms, and FastICA in particular, have been successfully applied in practice to source separation of many data types, e.g. biomedical signals, audio signals, and hyperspectral images [5]. Yet, when applying the algorithm twice to the same data, with different initializations or data subsamplings, one may encounter variations in the estimated sources. This perturbation phenomenon is due to the nonlinear and non-convex random mapping nature of the algorithm. To address this

References (23)

  • Cited by (14)

    • Directed acyclic graph based information shares for price discovery

      2022, Journal of Economic Dynamics and Control
      Citation Excerpt :

      The asymptotic normality of the ICA estimates have been already proven for a variety of different optimization procedures. A comprehensive theoretical discussion on the statistical properties of the FastICA estimator can be found in Reyhani et al. (2012). It should be mentioned that also other studied proposed to use non-Normal distributions to identify structural shocks in SVAR models (Gouriéroux et al., 2017; Lanne and Lütkepohl, 2010; Lanne et al., 2017) by assuming specific density functions for the shocks.

    • Application of underdetermined blind source separation in ultra-wideband communication signals

      2013, Journal of China Universities of Posts and Telecommunications