Consistency and asymptotic normality of FastICA and bootstrap FastICA

doi:10.1016/j.sigpro.2011.11.025

Signal Processing

Volume 92, Issue 8, August 2012, Pages 1767-1778

https://doi.org/10.1016/j.sigpro.2011.11.025

Abstract

Independent component analysis (ICA) is possibly the most widespread approach to solve the blind source separation problem. Many different algorithms have been proposed, together with several highly successful applications. There is also an extensive body of work on the theoretical foundations and limits of the ICA methodology.

One practical concern about the use of ICA with real-world data is the robustness of its estimates. Slight variations in the estimates may stem from the inherent stochastic nature of the algorithms used or from deviations from the theoretical assumptions. To overcome this problem, different approaches have been proposed, most of which are based on multiple runs of ICA algorithms with bootstrap.

Here we show the consistency and asymptotic normality of FastICA and bootstrap FastICA, based on empirical process theory, including Z-estimators and Hoeffding's inequality. These results give theoretical grounds for the robust use of FastICA in a multiple-run, bootstrapped and randomly initialized manner. In this framework, it is also possible to assess the convergence of the algorithm through a normality test.

Introduction

In recent years, blind source separation (BSS) has become a mainstream topic in signal and image processing, with independent component analysis (ICA) as possibly the most widespread approach to solving it. The considerable number of publications and dedicated conferences and workshops on these topics attests to their scientific spread. Furthermore, the basic theoretical foundations of ICA, as well as many of its applications, are believed to be rather well understood. A multitude of algorithms has been proposed and carefully analyzed (cf., [14], [6], [4]). In particular, for the FastICA algorithm (cf., [13], [15]), which is the main topic of this paper, theoretical limits have been presented earlier (cf., [16], [13], [19]).

One persistent concern for the practical use of ICA with real data is the robustness of the estimated sources. Repeated use of most ICA algorithms results in slight variations in the estimated components. These variations may stem from several factors, including the possibly inherent stochastic nature of the ICA implementation, or departures from the ideal theoretical assumptions [23]. To assess the robustness of the source estimation, several methods have been proposed, often based on a bootstrapped analysis of the estimated components [8], or on a bagging approach to FastICA (cf., [15]), a widely used algorithmic implementation of ICA. In [10], the multiple runs of FastICA are clustered by a self-organizing map, revealing common properties of the many estimated sources. In [23], using a different clustering strategy, the estimation variability itself is addressed as a source of information on the robustness of the estimates.

We have shown experimentally that multiple runs of FastICA in a bagging manner [2] can lead to new insights into the analysis of independent components of functional magnetic resonance images (fMRI, [12]) (cf., [21], [10]). Nevertheless, in spite of such positive experimental outcomes, there has been no clear study of the theoretical validity of a bootstrapping approach to ICA algorithms.

In the present paper, we show the asymptotic normality of FastICA and bootstrapped FastICA, using methods from empirical process theory, in particular Z-estimators and their bootstrapped versions [20]. We first review the basic theoretical background behind FastICA, followed by a thorough theoretical study of the consistency and asymptotic normality of the algorithm. Experimental illustrations of these properties are then shown, using real fMRI data. The paper concludes with a brief discussion of the limitations and implications of the current study. Although we focus on the FastICA algorithm, we believe that several of the considerations made throughout the paper can be extended to other ICA methods. Such an extension is, however, beyond the scope of the current paper.

In the following, we introduce the notation used throughout the paper. We assume that $x_{1}, \dots, x_{n}$ are independent and identically distributed according to some joint distribution $P$, with $x_{i} \in X \subset \mathbb{R}^{d}$, satisfying the ICA model $x = A s$. Here $s = (s_{1}, \dots, s_{d})^{⊤}$ with $s_{i}$ and $s_{j}$ statistically independent for $i \neq j$, and $A \in \mathbb{R}^{d \times d}$ has full rank. We denote the covariance matrix of $x$ by $Σ_{d} = \operatorname{cov}(x)$. We then define the whitened random vector $z = Σ_{d}^{- 1 / 2} x$, which has identity covariance matrix. In general, an empirical estimate is denoted by ${\hat{θ}}_{n}$ or ${\hat{w}}_{n}$. Convergence in distribution is denoted by $⇝$, whereas convergence in probability is denoted by $\overset{P}{\to}$. A bootstrap sample and the corresponding estimate carry the superscript $*$, e.g. $z_{i}^{*}$ and ${\hat{θ}}_{n}^{*}$ for original samples $z_{i}$ and sample estimates ${\hat{θ}}_{n}$. A parameter with subscript “0”, e.g. $w_{0}$, denotes the true parameter. We write $E f ≔ \int_{X} f \, dP$ and $E_{n} f ≔ (1 / n) \sum_{i = 1}^{n} f (x_{i})$, where $E_{n} = (1 / n) \sum_{i = 1}^{n} δ_{x_{i}} (\cdot)$ is the empirical measure, which puts mass $1 / n$ on each sample. The $r$-norm is defined by $∥ f ∥_{r} ≔ (E | f |^{r})^{1 / r}$, and $L_{r}(P)$ is the space defined by the norm $∥ \cdot ∥_{r}$. The operator $\operatorname{diag} [\dots]$ returns a diagonal matrix with its arguments as the diagonal elements. The Lipschitz norm is denoted by $| \cdot |_{L}$. The symbol $o_{P}$ denotes stochastic order and $O$ denotes numerical order. The operator norm is denoted by $∥ \cdot ∥_{o}$ and is defined by $∥ A ∥_{o} = \sup_{∥ t ∥_{2} = 1} ∥ A t ∥_{2}$.
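As a rough illustration of the whitening step $z = Σ_{d}^{- 1 / 2} x$, the following Python sketch replaces $Σ_{d}$ by the empirical covariance. The function name `whiten`, the centering of the data, and the synthetic Laplace sources are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def whiten(x):
    """Return whitened samples; rows of x are the observations x_i.

    Assumption: the data are centered first and the empirical covariance
    (normalized by 1/n) stands in for the population covariance.
    """
    xc = x - x.mean(axis=0)
    sigma = np.cov(xc, rowvar=False, bias=True)
    # Symmetric inverse square root via the eigendecomposition of sigma.
    eigval, eigvec = np.linalg.eigh(sigma)
    sigma_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    # sigma_inv_sqrt is symmetric, so each row becomes Sigma^{-1/2} x_i.
    return xc @ sigma_inv_sqrt

# Synthetic data following x = A s with independent non-Gaussian sources.
rng = np.random.default_rng(0)
s = rng.laplace(size=(5000, 3))
A = rng.normal(size=(3, 3))      # hypothetical mixing matrix
x = s @ A.T                      # row i holds (A s_i)^T
z = whiten(x)
print(np.cov(z, rowvar=False, bias=True).round(3))
```

By construction, the empirical covariance of `z` is the identity matrix up to floating-point error.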

Section snippets

FastICA algorithm and previous convergence results

Let us assume that the whitened samples $\{z_{i}\}_{i = 1}^{n}$ satisfy the ICA model with an orthogonal mixing matrix. It has been shown that the ICA model is identifiable up to rotation and scaling; for details see [6]. One way to derive an algorithm for ICA is to search for a vector $w \in \mathbb{R}^{d}$ that maximizes the non-Gaussianity of $w^{⊤} z$. Intuitively, by the Central Limit Theorem, maximum non-Gaussianity corresponds to a maximum degree of separation. This framework resembles Projection Pursuit [11].
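The search for a direction $w$ of maximal non-Gaussianity can be sketched with the commonly used one-unit fixed-point update $w \leftarrow E_{n}[z \, g(w^{⊤} z)] - E_{n}[g'(w^{⊤} z)] \, w$ followed by normalization. The snippet above does not state the exact update or nonlinearity analyzed in the paper, so the choice $g = \tanh$, the function name `fastica_one_unit`, the stopping rule, and the synthetic sources below are all assumptions.

```python
import numpy as np

def fastica_one_unit(z, w_init, max_iter=200, tol=1e-8):
    """One-unit fixed-point iteration on whitened rows z, with g = tanh."""
    w = w_init / np.linalg.norm(w_init)
    for _ in range(max_iter):
        y = z @ w
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Empirical version of E[z g(w'z)] - E[g'(w'z)] w.
        w_new = (z * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        # The sign of w is not identifiable, so compare up to sign.
        converged = 1.0 - abs(w_new @ w) < tol
        w = w_new
        if converged:
            break
    return w

# Two unit-variance non-Gaussian sources mixed by a rotation, so the
# mixtures are (approximately) white without an explicit whitening step.
rng = np.random.default_rng(1)
n = 20000
s = np.column_stack([
    rng.laplace(scale=1 / np.sqrt(2), size=n),
    rng.uniform(-np.sqrt(3), np.sqrt(3), size=n),
])
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
z = s @ Q.T
w = fastica_one_unit(z, rng.normal(size=2))
# Absolute overlap of the estimate with each true source direction.
print(np.abs(Q.T @ w))
```

If the iteration has separated one source, one of the two printed overlaps is close to 1 and the other close to 0; which source is found depends on the random initialization.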

Main results

The main focus of this paper is to show that FastICA is a consistent estimator and that it is asymptotically normal. The consistency of the estimator is similar to a weak law of large numbers [20]: we need to show that the estimator converges in probability to a nearby local or global solution as the number of samples increases. For the asymptotic normality we will show that the difference between the random estimator and the true solution, or true mapping, converges

Experimental results

To illustrate the theoretical implications in practice, a series of experiments were performed with both artificially generated and real-world data. The robust ICA approach used in the experiments was introduced in [23] and is implemented in the Arabica toolbox [22]. Using the toolbox, FastICA can be run multiple times, with varying initial conditions and bootstrap sampling.
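The multiple-run procedure with random initial conditions and bootstrap sampling can be sketched in Python as follows. This is not the Arabica toolbox and does not reproduce its interface; the function names, the tanh nonlinearity, the fixed iteration count, the synthetic sources, and the crude overlap threshold used in place of the clustering of [23] are all assumptions made for illustration.

```python
import numpy as np

def one_unit(z, w, n_iter=100):
    """A fixed number of one-unit fixed-point iterations (g = tanh)."""
    w = w / np.linalg.norm(w)
    for _ in range(n_iter):
        g = np.tanh(z @ w)
        w = (z * g[:, None]).mean(axis=0) - (1.0 - g ** 2).mean() * w
        w /= np.linalg.norm(w)
    return w

def bootstrap_runs(z, n_runs, rng):
    """Run the estimator on bootstrap resamples with random initial vectors."""
    n, d = z.shape
    estimates = np.empty((n_runs, d))
    for b in range(n_runs):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        # Assumption: the resample is not re-whitened before estimation.
        estimates[b] = one_unit(z[idx], rng.normal(size=d))
    return estimates

# Unit-variance independent sources with identity mixing, so z is
# approximately white already.
rng = np.random.default_rng(2)
n = 5000
z = np.column_stack([
    rng.laplace(scale=1 / np.sqrt(2), size=n),
    rng.uniform(-np.sqrt(3), np.sqrt(3), size=n),
])
W = bootstrap_runs(z, n_runs=50, rng=rng)

# Crude grouping: keep the runs that found the same component as the
# first run, and flip signs so that they all point the same way.
ref = W[0]
overlap = W @ ref
same = np.abs(overlap) > 0.9
aligned = W[same] * np.sign(overlap[same])[:, None]
spread = aligned.std(axis=0)
print(same.sum(), spread)
```

The rows of `aligned` are bootstrap replicates of one estimated direction. Their spread gives a rough picture of the estimation variability, and they could be passed to a normality test of one's choice to check the behavior suggested by the asymptotic results.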

In both simulated and real cases, an initial test with 100 runs of FastICA was performed without bootstrap, in order to

Discussion

ICA algorithms, and FastICA in particular, have been successfully applied in practice to source separation for many data types, e.g. biomedical data, audio signals, and hyperspectral images [5]. Yet, when applying the algorithm twice to the same data, with different initializations or data subsamplings, one may encounter variations in the estimated sources. This perturbation phenomenon is due to the nonlinear and non-convex random mapping nature of the algorithm. To address this

References (23)

P. Comon
Independent component analysis, a new concept?
Signal Processing
(1994)
S. Harmeling et al.
Injecting noise for analysing the stability of ICA components
Signal Processing
(2004)
J. Himberg et al.
Validating the independent components of neuroimaging time series via clustering and visualization
NeuroImage
(2004)
A. Hyvärinen et al.
Independent component analysis: algorithms and applications
Neural Networks
(2000)
J. Ylipaavalniemi et al.
Dependencies between stimuli and spatially independent fMRI sources: towards brain correlates of natural stimuli
NeuroImage
(2009)
J. Ylipaavalniemi et al.
Analyzing consistency of independent components: an fMRI illustration
NeuroImage
(2008)
T.W. Anderson
An Introduction to Multivariate Statistical Analysis
(2003)
L. Breiman
Bagging predictors
Machine Learning
(1996)
G. Cheng et al.
Bootstrap consistency for general semiparametric m-estimation
The Annals of Statistics
(2010)
A. Cichocki et al.
Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications
(2003)

Cited by (14)

Directed acyclic graph based information shares for price discovery
2022, Journal of Economic Dynamics and Control
Citation Excerpt :
The asymptotic normality of the ICA estimates has already been proven for a variety of different optimization procedures. A comprehensive theoretical discussion on the statistical properties of the FastICA estimator can be found in Reyhani et al. (2012). It should be mentioned that other studies have also proposed using non-Normal distributions to identify structural shocks in SVAR models (Gouriéroux et al., 2017; Lanne and Lütkepohl, 2010; Lanne et al., 2017) by assuming specific density functions for the shocks.
The possibility to measure the contribution of agents and exchanges to the price formation process in financial markets acquired increasing importance in the literature. In this paper I propose to exploit a data-driven approach to identify structural vector error correction models (SVECM) typically used for price discovery. Exploiting the non-Normal distributions of the variables under consideration, I propose a variant of the widespread Information Share measure, which I will refer to as the Directed Acyclic Graph based-Information Shares(DAG-IS), which can identify the leaders and the followers in the price formation process through the exploitation of a causal discovery algorithm well established in the area of machine learning. The approach will be illustrated from a semi-parametric perspective, solving the identification problem with no need to increase the computational complexity which usually arises when working at incredibly short time scales. Finally, an empirical application on IBM intraday data will be provided.
Performance analysis for complex-valued FastICA and its improvement based on the Tukey M-estimator
2021, Digital Signal Processing: A Review Journal
Independent component analysis (ICA) is increasingly utilized in modern digital signal processing. Complex-valued FastICA, a fast fixed-point algorithm for ICA, is one of the most non-trivial algorithms for solving ICA problems in the complex domain. Hitherto, there have been several attempts to give a performance analysis for complex-valued FastICA. Rigorous theoretical analysis, however, still has room for further improvement. Consequently, the purposes of this paper are threefold: Firstly, the uniformity of the complex-valued FastICA estimator is constructed for the first time. Secondly, the stability of the complex-valued ICA algorithm is rigorously deduced based on the augmented generating matrix. Meanwhile, the local convergence of the complex-valued FastICA algorithm is derived based on circular source signals. Finally, to improve the separation performance, we select a novel alternative nonlinearity based on the Tukey M-estimator in the complex-valued FastICA algorithm. Further, we prove the existence of a locally optimal solution and the stability of the complex ICA problem based on the Tukey M-estimator. Simulations are presented to demonstrate the accuracy of our analysis. Additionally, the experimental results with synthetic data and a complex-valued wind signal show the superiority of the improved method.
Statistical inference for independent component analysis: Application to structural VAR models
2017, Journal of Econometrics
The well-known problem of non-identifiability of structural VAR models disappears if the structural shocks are independent and if at most one of them is Gaussian. In that case, the relevant estimation technique is the Independent Component Analysis (ICA). Since the introduction of ICA by Comon (1994), various semi-parametric estimation methods have been proposed for “orthogonalizing” the error terms. These methods include pseudo maximum likelihood (PML) approaches and recursive PML. The aim of our paper is to derive the asymptotic properties of the PML approaches, in particular to study their consistency. We conduct Monte Carlo studies exploring the relative performances of these methods. Finally, an application based on real data shows that structural VAR models can be estimated without additional identification restrictions in the non-Gaussian case and that the usual restrictions can be tested.
Application of underdetermined blind source separation in ultra-wideband communication signals
2013, Journal of China Universities of Posts and Telecommunications
Aiming at the estimation of the number of sources and the mixing matrix, and at the separation of mixed signals in the underdetermined case, the article puts forward a method of underdetermined blind source separation (UBSS) with an application to ultra-wideband (UWB) communication signals. The method is based on the sparse characteristic of UWB communication signals in the time domain. First, the single-source area is found by calculating the ratio of observed sampling points. Then an algorithm called the Hough-windowed method is introduced to estimate the number of sources and the mixing matrix. Finally, the mixed signals are separated using a method based on amended subspace projection. The simulation results indicate that the proposed method can separate UWB communication signals successfully, estimate the mixing matrix with higher accuracy and separate the mixed signals with higher gain compared with other conventional algorithms. At the same time, the method shows higher stability and better noise immunity.
Calibration and Validation of Macroeconomic Simulation Models by Statistical Causal Search
2023, SSRN
Estimating the higher-order co-moment with non-Gaussian components and its application in portfolio selection
2022, Statistics
