
Free Component Analysis: Theory, Algorithms and Applications

Published in Foundations of Computational Mathematics

Abstract

We describe a method for unmixing mixtures of freely independent random variables in a manner analogous to the independent component analysis (ICA)-based method for unmixing independent random variables from their additive mixtures. Random matrices play the role of free random variables in this context, so the method we develop, which we call free component analysis (FCA), unmixes matrices from additive mixtures of matrices. Thus, while the mixing model is standard, the novelty and difference in unmixing performance come from the introduction of new statistical criteria, derived from free probability theory, that quantify freeness analogously to how kurtosis and entropy quantify independence. We describe the theory, the various algorithms, and compare FCA to vanilla ICA, which does not account for spatial or temporal structure. We highlight why the statistical criteria make FCA also vanilla despite its matricial underpinnings and show that FCA performs comparably to, and sometimes better than, (vanilla) ICA in every application, such as image and speech unmixing, where ICA has been known to succeed. Our computational experiments suggest that not-so-random matrices, such as images and short-time Fourier transform matrices of waveforms, are (closer to being) freer “in the wild” than we might have theoretically expected.




Notes

  1. Here \({{\widehat{F}}}(\cdot )\) is either the (self-adjoint or rectangular) free kurtosis, the free entropy or a higher (than fourth)-order (even-valued) free cumulant. See Table 2.

References

  1. Almeida, L.B.: MISEP–Linear and nonlinear ICA based on mutual information. Journal of Machine Learning Research 4(Dec), 1297–1318 (2003)


  2. Anderson, G.W., Farrell, B.: Asymptotically liberating sequences of random unitary matrices. Advances in Mathematics 255, 381–413 (2014)


  3. Arora, S., Ge, R., Moitra, A., Sachdeva, S.: Provable ica with unknown gaussian noise, with implications for gaussian mixtures and autoencoders. In: Advances in Neural Information Processing Systems, pp. 2375–2383 (2012)

  4. Barry, D., Coyle, E., Fitzgerald, D., Lawlor, R.: Single channel source separation using short-time independent component analysis. In: Audio Engineering Society Convention 119 (2005). Audio Engineering Society

  5. Bell, A.J., Sejnowski, T.J.: The “independent components” of natural scenes are edge filters. Vision research 37(23), 3327–3338 (1997)


  6. Benaych-Georges, F.: Rectangular random matrices, entropy, and fisher’s information. Journal of Operator Theory, 371–419 (2009a)

  7. Benaych-Georges, F.: Rectangular random matrices, related convolution. Probability Theory and Related Fields 144(3-4), 471–515 (2009b)


  8. Benaych-Georges, F.: Rectangular random matrices, related convolution. Probability Theory and Related Fields 144(3-4), 471–515 (2009c)


  9. Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal processing 81(11), 2353–2362 (2001)


  10. Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research 15, 1455–1459 (2014)


  11. Brakel, P., Bengio, Y.: Learning independent features with adversarial nets for non-linear ica. arXiv preprint arXiv:1710.05050 (2017)

  12. Cardoso, J.-F.: High-order contrasts for independent component analysis. Neural computation 11(1), 157–192 (1999)


  13. Casey, M.A., Westner, A.: Separation of mixed audio sources by independent subspace analysis. In: ICMC, pp. 154–161 (2000)

  14. Cébron, G., Dahlqvist, A., Male, C.: Universal constructions for spaces of traffics. arXiv preprint arXiv:1601.00168 (2016)

  15. Chen, A., Bickel, P.J.: Efficient independent component analysis. The Annals of Statistics 34(6), 2825–2855 (2006)


  16. Chissom, B.S.: Interpretation of the kurtosis statistic. The American Statistician 24(4), 19–22 (1970)


  17. Chistyakov, G., Götze, F.: Characterization problems for linear forms with free summands. arXiv preprint arXiv:1110.1527 (2011)

  18. Comon, P.: Independent component analysis, a new concept? Signal processing 36(3), 287–314 (1994)


  19. Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic press, Cambridge, MA (2010)


  20. Cornish, E.A., Fisher, R.A.: Moments and cumulants in the specification of distributions. Revue de l’Institut international de Statistique, 307–320 (1938)

  21. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley & Sons, Hoboken, NJ (2012)


  22. Cruces, S., Castedo, L., Cichocki, A.: Robust blind source separation algorithms using cumulants. Neurocomputing 49(1-4), 87–118 (2002)


  23. Davies, M.E., James, C.J.: Source separation using single channel ica. Signal Processing 87(8), 1819–1832 (2007)


  24. De Lathauwer, L., Castaing, J., Cardoso, J.-F.: Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing 55(6), 2965–2973 (2007)


  25. Edelman, A., Rao, N.R.: Random matrix theory. Acta Numerica 14, 233–297 (2005)


  26. Eriksson, J., Koivunen, V.: Blind identifiability of class of nonlinear instantaneous ICA models. In: 2002 11th European Signal Processing Conference, pp. 1–4 (2002). IEEE

  27. Eriksson, J., Koivunen, V.: Identifiability, separability, and uniqueness of linear ica models. IEEE signal processing letters 11(7), 601–604 (2004)


  28. Frieze, A., Jerrum, M., Kannan, R.: Learning linear transformations. In: Proceedings of 37th Conference on Foundations of Computer Science, pp. 359–368 (1996). IEEE

  29. Gao, P., Chang, E.-C., Wyse, L.: Blind separation of fetal ecg from single mixture using svd and ica. In: Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint, vol. 3, pp. 1418–1422 (2003). IEEE

  30. Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset (2007)

  31. Haykin, S., Chen, Z.: The cocktail party problem. Neural computation 17(9), 1875–1902 (2005)


  32. Hiai, F., Petz, D.: The Semicircle Law, Free Random Variables and Entropy. Mathematical Surveys and Monographs, vol. 77, p. 376. American Mathematical Society, Providence, RI (2000)

  33. Hoyer, P.O., Hyvärinen, A.: Independent component analysis applied to feature extraction from colour and stereo images. Network: computation in neural systems 11(3), 191–210 (2000)


  34. Hyvarinen, A.J., Morioka, H.: Nonlinear ICA of temporally dependent stationary sources. (2017). Proceedings of Machine Learning Research

  35. Hyvarinen, A.: A family of fixed-point algorithms for independent component analysis. In: 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 3917–3920 (1997a). IEEE

  36. Hyvarinen, A.: One-unit contrast functions for independent component analysis: A statistical analysis. In: Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, pp. 388–397 (1997b). IEEE

  37. Hyvarinen, A.: Fast and robust fixed-point algorithms for Independent Component Analysis. IEEE transactions on Neural Networks 10(3), 626–634 (1999)


  38. Hyvarinen, A., Morioka, H.: Unsupervised feature extraction by time-contrastive learning and nonlinear ICA. In: Advances in Neural Information Processing Systems, pp. 3765–3773 (2016)

  39. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural networks 13(4-5), 411–430 (2000)


  40. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis vol. 46. John Wiley & Sons, Hoboken, NJ (2004)


  41. Hyvarinen, A., Sasaki, H., Turner, R.E.: Nonlinear ICA using auxiliary variables and generalized contrastive learning. arXiv preprint arXiv:1805.08651 (2018)

  42. Ilmonen, P., Nordhausen, K., Oja, H., Ollila, E.: A new performance index for ica: properties, computation and asymptotic analysis. In: International Conference on Latent Variable Analysis and Signal Separation, pp. 229–236 (2010). Springer

  43. Lee, T.-W.: Independent Component Analysis. In: Independent Component Analysis, pp. 27–66. Springer, Boston (1998)

  44. Lee, T.-W., Girolami, M., Bell, A.J., Sejnowski, T.J.: A unifying information-theoretic framework for independent component analysis. Computers & Mathematics with Applications 39(11), 1–21 (2000)


  45. Lehner, F.: Cumulants in noncommutative probability theory i. noncommutative exchangeability systems. Mathematische Zeitschrift 248(1), 67–100 (2004)


  46. Male, C.: Traffic distributions and independence: permutation invariant random matrices and the three notions of independence. arXiv preprint arXiv:1111.4662 (2011)

  47. Meyer, C.D., Stewart, G.W.: Derivatives and perturbations of eigenvectors. SIAM Journal on Numerical Analysis 25(3), 679–691 (1988)


  48. Mika, D., Budzik, G., Jozwik, J.: Single channel source separation with ica-based time-frequency decomposition. Sensors 20(7), 2019 (2020)


  49. Mingo, J.A., Speicher, R.: Free Probability and Random Matrices vol. 35. Springer, New York (2017)


  50. Mitsui, Y., Kitamura, D., Takamichi, S., Ono, N., Saruwatari, H.: Blind Source Separation based on independent low-rank matrix analysis with sparse regularization for time-series activity. In: Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference On, pp. 21–25 (2017). IEEE

  51. Mogensen, P.K., Riseth, A.N.: Optim: A mathematical optimization package for Julia. Journal of Open Source Software 3(24), 615 (2018). https://doi.org/10.21105/joss.00615


  52. Nadakuditi, R.R., Wu, H.: lingluanwh/FCA.jl: a blind source separation package based on the random matrix theory and free probability (2019). https://doi.org/10.5281/zenodo.2655944

  53. Nica, A., Speicher, R.: Lectures on the Combinatorics of Free Probability vol. 13. Cambridge University Press, Cambridge (2006)


  54. Oja, E., Yuan, Z.: The fastica algorithm revisited: Convergence analysis. IEEE Transactions on Neural Networks 17(6), 1370–1381 (2006)


  55. Pearson, K.: Liii. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2(11), 559–572 (1901)


  56. Pourazad, M., Moussavi, Z., Farahmand, F., Ward, R.: Heart sounds separation from lung sounds using independent component analysis. In: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pp. 2736–2739 (2006). IEEE

  57. Smith, P.J.: A recursive formulation of the old problem of obtaining moments from cumulants and vice versa. The American Statistician 49(2), 217–218 (1995)


  58. Speicher, R.: Multiplicative functions on the lattice of non-crossing partitions and free convolution. Mathematische Annalen 298(1), 611–628 (1994)


  59. Voiculescu, D.: Limit laws for random matrices and free products. Inventiones mathematicae 104(1), 201–220 (1991)


  60. Voiculescu, D.: The analogues of entropy and of fisher’s information measure in free probability theory, i. Communications in mathematical physics 155(1), 71–92 (1993)


  61. Voiculescu, D.: The analogues of entropy and of fisher’s information measure in free probability theory, ii. Inventiones mathematicae 118(1), 411–440 (1994)


  62. Voiculescu, D.: Operations on certain non-commutative operator-valued random variables, in recent advances in operator algebras. Astérisque 232, 243–275 (1995)


  63. Voiculescu, D.: The analogues of entropy and of fisher’s information measure in free probability theory, iv: maximum entropy and freeness, in free probability theory. Fields Inst. Commun. 12, 293–302 (1997)



Acknowledgements

We thank Peter Bickel for inspiring us to revisit FCA via a serendipitous meeting at the Santa Fe Institute in December 2015. That meeting, and his remarks on ICA and all the ways in which it is natural, provided the spark for us spending the rest of that workshop and the following month thinking about all the ways that FCA was natural for random matrices and images. We implemented our first FCA algorithm soon thereafter and leaned into the theory after getting, and being overjoyed by, the image separation results in Fig. 3d. We thank Arvind Prasadan for his detailed comments and suggestions on earlier versions of this manuscript. We are grateful to Alfred Hero for his suggestion to try the denoising simulation in Fig. 3a, which brought into sharp focus for us for the first time that FCA could do (much) better than ICA. (This was a simulation we had been avoiding till then because we feared the opposite!) This work has benefited from Roland Speicher’s many insightful comments and suggestions and from Octavio Arizmendi Echegaray’s remarks that made us better understand the underlying free probabilistic structures that made some of the FCA identifiability-related questions fundamentally different than their ICA counterparts. This research was supported by ONR grant N00014-15-1-2141, DARPA Young Faculty Award D14AP00086 and ARO MURI W911NF-11-1-039. A Julia implementation of the FCA algorithm, as well as code to reproduce the simulations and figures in this paper, is available on GitHub [52].

Author information

Corresponding author

Correspondence to Hao Wu.

Additional information

Communicated by Rachel Ward.


Appendices

What is Freeness of Random Variables?

The goal of this section is to introduce the freeness of non-commutative random variables. We first discuss independence (freeness) in the context of scalar probability, free probability for self-adjoint (non-commutative) random variables, and free probability for rectangular (non-commutative) random variables, respectively. We focus on the behavior of the (free) cumulants and (free) entropy of independent (free) random variables, which are the basis of ICA (FCA). The connection between independent random matrices and free random variables is given at the end.

For a detailed introduction of free probability, readers are referred to [32, 49, 53].

1.1 Prologue: What is Independence of Commuting Random Variables?

Here, we briefly review statistical independence in scalar probability. We state the behavior of the cumulants and entropy of independent random variables, which are the basis of ICA. At the end, we discuss the unique role that Gaussian random variables play in ICA.

1.1.1 Mixed Moments Point of View

Let I denote an index set, and \((x_i)_{i\in I}\) denote random variables. They are independent if for any \(n \in {\mathbb {N}}\) and \(m_1, \ldots , m_n \ge 0\),

$$\begin{aligned} {{\,\mathrm{{\mathbb {E}}}\,}}[x_{i(1)}^{m_1} \ldots x_{i(n)}^{m_n}] = {{\,\mathrm{{\mathbb {E}}}\,}}[x_{i(1)}^{m_1}] \ldots {{\,\mathrm{{\mathbb {E}}}\,}}[x_{i(n)}^{m_n}]. \end{aligned}$$

if \(i(j) \in I\), \(j = 1,\ldots, n\) are all distinct. An alternative definition is that, for any polynomials \(P_1,\ldots, P_n\) of one variable,

$$\begin{aligned} {{\,\mathrm{{\mathbb {E}}}\,}}[P_1(x_{i(1)})\ldots P_n(x_{i(n)})] = 0 \end{aligned}$$
(A1)

if \({{\,\mathrm{{\mathbb {E}}}\,}}[P_j(x_{i(j)})] = 0\) for all \(j = 1, \ldots , n\) and \(i(j) \in I\), \(j = 1,\ldots n\) are all distinct.

1.1.2 Cumulants: Kurtosis and Higher Order—Independent Additivity

The (joint) cumulants of n random variables \(a_1, \ldots , a_n\) are defined by

$$\begin{aligned} \begin{aligned} c_n(a_1, \ldots , a_n) = \sum _{\pi } (|{\pi }| - 1)! (-1)^{|{\pi }| - 1} \prod _{B \in \pi }{{\,\mathrm {{\mathbb {E}}}\,}}\left[ \prod _{i \in B} a_i\right] , \end{aligned} \end{aligned}$$
(A2)

where \(\pi \) runs through all partitions of \(\{1, \ldots , n\}\) and B runs through all blocks of the partition \(\pi \). Equivalently, \(\{c_n\}_{n \ge 1}\) is defined through

$$\begin{aligned} \begin{aligned} {{\,\mathrm{{\mathbb {E}}}\,}}[a_1 \ldots a_n] = \sum _{\pi } \prod _{B \in \pi } c_{|{B}|} (a_i:i\in B). \end{aligned} \end{aligned}$$
(A3)

The reason that ICA adopts an optimization problem involving cumulants is the following property: if \((x_i)_{i \in I}\) are independent, then for any \(n\in {\mathbb {N}}\)

$$\begin{aligned} c_n(x_{i(1)}, \ldots , x_{i(n)}) = 0 \end{aligned}$$
(A4)

whenever there exist \(1\le \ell , k\le n\) with \(i(\ell ) \ne i(k)\). That is, any mixed cumulant involving two (or more) independent random variables is zero. Adopt the notation

$$\begin{aligned} c_n(x) := c_n(x, \ldots , x). \end{aligned}$$

A quick consequence of (A4) is that for independent \(x_1\) and \(x_2\),

$$\begin{aligned} c_n(x_1 + x_2) = c_n(x_1) + c_n(x_2). \end{aligned}$$
(A5)
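To make the additivity (A5) concrete, here is a minimal numerical sketch (our own illustration, not part of the paper): for zero-mean samples, the fourth cumulant is \(c_4(x) = {{\,\mathrm{{\mathbb {E}}}\,}}[x^4] - 3{{\,\mathrm{{\mathbb {E}}}\,}}[x^2]^2\), and it is approximately additive over independent samples.

```julia
using Statistics, Random

# Empirical fourth cumulant (kurtosis) of a zero-mean sample:
# c4(x) = E[x^4] - 3 E[x^2]^2.
c4(x) = mean(x .^ 4) - 3 * mean(x .^ 2)^2

Random.seed!(1)
n  = 10^6
x1 = rand(n) .- 0.5        # centered uniform (negative kurtosis)
x2 = randn(n) .^ 2 .- 1    # centered chi-squared (positive kurtosis)

# Additivity (A5): for independent samples, c4(x1 + x2) ≈ c4(x1) + c4(x2).
println(c4(x1 .+ x2), "  vs  ", c4(x1) + c4(x2))
```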

1.1.3 Entropy: Independent Additivity

For random variables \(x_1,\ldots , x_n\) with joint distribution \(f(x_1,\ldots ,x_n)\), the (joint) entropy is defined by [21]

$$\begin{aligned} h(x_1,\ldots ,x_n) = -\int f(\alpha _1,\dots ,\alpha _n) \log f(\alpha _1,\ldots ,\alpha _n) \mathrm {d} \alpha _1\ldots \mathrm {d} \alpha _n. \end{aligned}$$
(A6)

The joint entropy of a set of variables is less than or equal to the sum of the individual entropies of the variables in the set,

$$\begin{aligned} \begin{aligned} h(x_1,\ldots ,x_n) \le h(x_1) + \cdots + h(x_n). \end{aligned} \end{aligned}$$
(A7)

In particular, the equality in (A7) holds if and only if \(x_1,\ldots , x_n\) are independent. Therefore, entropy is regarded as a measure of independence and thus can be used in ICA.

We also recall another useful property of entropy. For random vectors \(\varvec{x}, \varvec{y}\) satisfying the linear relation \(\varvec{y} = \varvec{A} \varvec{x}\) with \(\varvec{A}\) non-singular, we have that

$$\begin{aligned} h(y_1, \ldots , y_n) = h(x_1, \ldots , x_n) + \log |{\mathrm {det}\mathbf {A}}|.\end{aligned}$$
(A8)

In particular, the entropy is invariant under orthogonal linear transformations.
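As a standard illustration of (A7) and (A8) (a well-known fact, not stated in the paper): a jointly Gaussian vector with covariance \(\varvec{\Sigma }\) has entropy

$$\begin{aligned} h(x_1,\ldots ,x_n) = \frac{1}{2} \log \left( (2\pi e)^n \det \varvec{\Sigma }\right) , \end{aligned}$$

and \(\varvec{y} = \varvec{A}\varvec{x}\) is Gaussian with covariance \(\varvec{A}\varvec{\Sigma }\varvec{A}^T\), so the entropy changes by \(\frac{1}{2}\log \det (\varvec{A}\varvec{\Sigma }\varvec{A}^T) - \frac{1}{2}\log \det \varvec{\Sigma } = \log |{\det \varvec{A}}|\), consistent with (A8); for orthogonal \(\varvec{A}\) the entropy is unchanged.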

1.1.4 Why Gaussians cannot be Unmixed: Gaussians have Zero Higher-Order Cumulants

In ICA, the optimization problem typically used finds the independent directions by maximizing the (absolute) kurtosis (fourth cumulant). However, all cumulants of order larger than 2 vanish for Gaussian random variables. Thus, kurtosis-based ICA fails to unmix Gaussian random variables. ICA based on entropy also fails to unmix Gaussian random variables, as nontrivial mixtures of independent Gaussian random variables can still be independent Gaussians. On the other hand, it was shown that this is the only case where ICA does not work [18]. A result of this kind is called an identifiability condition.

1.2 Freeness of Self-Adjoint Random Variables

We first introduce the definition of a probability space for non-commutative random variables. The starting point is a unital algebra of non-commutative variables.

Definition 5

Let \({\mathcal {X}}\) be a vector space over \({\mathbb {C}}\) equipped with a product \(\cdot : {\mathcal {X}}\times {\mathcal {X}}\mapsto {\mathcal {X}}\). Denoting the vector space addition by \(+\), we call \({\mathcal {X}}\) an algebra if, for all \(a, b, c\in {\mathcal {X}}\) and \(\alpha \in {\mathbb {C}}\),

  1. (a)

    \(a(bc) = (ab)c\),

  2. (b)

    \(a(b + c) = ab + ac\),

  3. (c)

    \(\alpha (ab) = (\alpha a)b = a(\alpha b)\).

We call \({\mathcal {X}}\) a unital algebra if there is a unital element \(1_{\mathcal {X}}\) such that, for all \(a \in {\mathcal {X}}\)

$$\begin{aligned} a = a 1_{\mathcal {X}}= 1_{\mathcal {X}}a. \end{aligned}$$
(A9)

An algebra \({\mathcal {X}}\) is called a \(*\)-algebra if it is also endowed with an antilinear \(*\)-operation \({\mathcal {X}}\ni a \mapsto a^* \in {\mathcal {X}}\), such that \((\alpha a)^* = {\bar{\alpha }} a^*\), \((a^*)^* = a\) and \((ab)^* = b^*a^*\) for all \(\alpha \in {\mathbb {C}}\), \(a, b \in {\mathcal {X}}\).

Note that \(ab = ba\) does not necessarily hold for general \(a,b \in {\mathcal {X}}\), i.e., they are non-commutative.

Definition 6

A (non-commutative) \(*\)-probability space \(({\mathcal {X}}, \varphi )\) consists of a unital \(*\)-algebra and a linear functional \(\varphi : {\mathcal {X}}\rightarrow {\mathbb {C}}\), which serves as the “expectation.” We also require that \(\varphi \) satisfies

  1. (a)

    (positive) \(\varphi (aa^*) \ge 0\) for all \(a \in {\mathcal {X}}\).

  2. (b)

    (tracial) \(\varphi (ab) = \varphi (ba)\) for all \(a, b \in {\mathcal {X}}\).

  3. (c)

    \(\varphi (1_{\mathcal {X}}) = 1\).

The elements \(a \in {\mathcal {X}}\) are called non-commutative random variables. (We may omit the word non-commutative if there is no ambiguity.) Given a series of random variables \(x_1, \ldots , x_k \in {\mathcal {X}}\), for any choice of \(n \in {\mathbb {N}}\), \(i(1),\ldots ,i(n) \in [1..k]\) and \(\epsilon _1, \ldots , \epsilon _n \in \{1, *\}\), \(\varphi (x_{i(1)}^{\epsilon _1}\ldots x^{\epsilon _n}_{i(n)})\) is a mixed moment of \(\{x_i\}_{i = 1}^k\). The collection of all moments is called the joint distribution of \(x_1,\ldots , x_k\).

The moments of general random variables can be complex-valued; self-adjoint random variables, which are defined below, necessarily have real-valued moments and will be the object of our study.

Definition 7

Let \(({\mathcal {X}}, \varphi )\) be a non-commutative probability space. An element \(a \in {\mathcal {X}}\) is self-adjoint if \(a = a^*\). In particular, the moments of self-adjoint elements are real (see Remark 1.2 in [53]).

The counterpart of independence in free probability is free independence, or simply freeness. We now consider the freeness of self-adjoint random variables from the same perspectives as in Sect. A.1.

1.2.1 Mixed Moments Point of View

The following official definition of freeness should be compared with (A1).

Definition 8

Let \(({\mathcal {X}}, \varphi )\) be a non-commutative probability space and fix a positive integer \(n \ge 1\).

For each \(i \in I\), let \({\mathcal {X}}_i \subset {\mathcal {X}}\) be a unital subalgebra. The subalgebras \(({\mathcal {X}}_i)_{i \in I}\) are called freely independent (or simply free), if for all \(k \ge 1\)

$$\begin{aligned} \varphi (x_1\ldots x_k) = 0 \end{aligned}$$

whenever \(\varphi (x_j) = 0\) for all \(j = 1, \ldots , k,\) and neighboring elements are from different subalgebras, i.e., \(x_j \in {\mathcal {X}}_{i(j)}\), \(i(1) \ne i(2), i(2) \ne i(3),\ldots , i(k-1)\ne i(k)\).

In particular, a series of elements \((x_i)_{i \in I}\) are called free if the subalgebras generated by \(x_i\) and \(x_i^*\) are free.
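A standard computation contrasting freeness with classical independence (see, e.g., [53]): if a and b are free and centered, then the alternating word abab satisfies the condition in Definition 8, so

$$\begin{aligned} \varphi (abab) = 0, \end{aligned}$$

whereas for classically independent commuting random variables \({{\,\mathrm{{\mathbb {E}}}\,}}[abab] = {{\,\mathrm{{\mathbb {E}}}\,}}[a^2]{{\,\mathrm{{\mathbb {E}}}\,}}[b^2]\), which is nonzero in general. Thus, freeness is a genuinely different notion of independence, adapted to non-commuting variables.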

1.2.2 Free Cumulants: Free Additivity

The analogue of cumulants for non-commutative random variables is the family of free cumulants, proposed by Roland Speicher [53, 58].

The notion of a non-crossing partition underlies free probability and free cumulants.

Definition 9

(Non-crossing Partition, Definition 9.1 of [53]) Consider set \(S = [1..n]\).

  1. (a)

We call \(\pi = \{V_1, \ldots , V_r\}\) a partition of the set S if and only if \(V_i\) (\(1\le i \le r\)) are pairwise disjoint, non-void subsets of S such that \(\cup _{i =1}^r V_{i} = S\). We call \(V_1, \ldots , V_r\) the blocks of \(\pi \). Given two elements \(a, b \in S\), we write \(a \sim _\pi b\) if a and b belong to the same block of \(\pi \).

  2. (b)

    A partition \(\pi \) of the set S is called non-crossing if there does not exist any \(a_1< b_1< a_2 < b_2 \) in S such that \(a_1 \sim _\pi a_2 \not \sim b_1 \sim _{\pi } b_2\).

  3. (c)

    The set of all non-crossing partitions of S is denoted by NC(n).
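For concreteness, the pair partition \(\{\{1,3\},\{2,4\}\}\) of [1..4] is crossing, while \(\{\{1,4\},\{2,3\}\}\) is non-crossing. The following hypothetical Julia helper (our own sketch; it is not part of the paper or of FCA.jl) checks Definition 9(b) directly.

```julia
# Decide whether a partition of {1,...,n}, given as a vector of blocks,
# is non-crossing in the sense of Definition 9(b).
function is_noncrossing(blocks::Vector{Vector{Int}})
    n = maximum(maximum.(blocks))
    label = zeros(Int, n)                      # label[i] = block containing i
    for (k, B) in enumerate(blocks), i in B
        label[i] = k
    end
    for a1 in 1:n, b1 in a1+1:n, a2 in b1+1:n, b2 in a2+1:n
        if label[a1] == label[a2] && label[b1] == label[b2] && label[a1] != label[b1]
            return false                       # crossing a1 < b1 < a2 < b2 found
        end
    end
    return true
end

is_noncrossing([[1, 3], [2, 4]])   # false: {1,3} and {2,4} cross
is_noncrossing([[1, 4], [2, 3]])   # true: nested blocks do not cross
```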

Definition 10

Given a \(*\)-probability space \(({\mathcal {X}}, \varphi )\), the free cumulants refer to a family of multilinear functionals \(\{\kappa _m: {\mathcal {X}}^m \mapsto {\mathbb {C}}\}_{m \ge 1}\). Here, the multilinearity means that \(\kappa _m\) is linear in one variable when others hold constant, i.e., for any \(\alpha , \beta \in {\mathbb {C}}\) and \(a, b \in {\mathcal {X}}\),

$$\begin{aligned} \kappa _m(\ldots , \alpha a + \beta b, \ldots ) = \alpha \kappa _m(\ldots , a, \ldots ) + \beta \kappa _m(\ldots , b, \ldots ). \end{aligned}$$
(A10)

Explicitly, for \(a_1, \ldots , a_n \in {\mathcal {X}}\), their mixed free cumulant is defined through (cf. (A3))

$$\begin{aligned} \begin{aligned} \varphi (a_1\ldots a_n) = \sum _{\pi \in NC(n)} \prod _{B \in \pi } \kappa _{|{B}|}\left( a_i:i\in B \right) . \end{aligned} \end{aligned}$$
(A11)

Equivalently (cf. (A2)),

$$\begin{aligned} \kappa _n(a_1, \ldots , a_n) = \sum _{\pi \in NC(n)} \mu (\pi , \varvec{1}_n) \prod _{B \in \pi } \varphi \left( \prod _{i \in B} a_i\right) , \end{aligned}$$
(A12)

where \(\mu \) is the Möbius function on NC(n).

Example 1

We have that

$$\begin{aligned} \kappa _1(a_1) = \varphi (a_1), \end{aligned}$$
$$\begin{aligned} \kappa _2(a_1, a_2) = \varphi (a_1a_2) - \varphi (a_1)\varphi (a_2), \end{aligned}$$
$$\begin{aligned} \kappa _3(a_1,a_2,a_3)&= \varphi (a_1a_2a_3) - \varphi (a_1)\varphi (a_2a_3) - \varphi (a_2)\varphi (a_1a_3) \\&- \varphi (a_3)\varphi (a_1a_2) + 2 \varphi (a_1)\varphi (a_2)\varphi (a_3). \end{aligned}$$
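Continuing the example for a centered variable (a fact used implicitly in (A34) below): if \(\varphi (a) = 0\), then (A11) gives \(\varphi (a^4) = \kappa _4(a) + 2\kappa _2(a)^2\), the factor 2 counting the two non-crossing pair partitions \(\{\{1,2\},\{3,4\}\}\) and \(\{\{1,4\},\{2,3\}\}\) (the crossing pairing \(\{\{1,3\},\{2,4\}\}\) is excluded). Since \(\kappa _2(a) = \varphi (a^2)\) in this case,

$$\begin{aligned} \kappa _4(a) = \varphi (a^4) - 2\varphi (a^2)^2 . \end{aligned}$$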

Recall that in the scalar probability, mixed cumulants of independent random variables vanish (see (A4)). The same holds for the free cumulants in the free probability.

Theorem 12

(Theorem 11.16 of [53]) Let \(({\mathcal {X}},\varphi )\) be a non-commutative probability space with associated free cumulants \((\kappa _\ell )_{\ell \in {\mathbb {N}}}\). Consider random variables \((x_i)_{i\in I}\) and assume that they are freely independent. Then, for all \(n \ge 2\) and \(i(1), \ldots , i(n) \in I\), we have \(\kappa _n(x_{i(1)},\ldots ,x_{i(n)}) = 0\) whenever there exist \(1\le l,k \le n\) with \(i(l)\ne i(k)\).

With the above theorem, one can easily show the free additivity of free cumulants.

Proposition 13

Consider a non-commutative probability space \(({\mathcal {X}}, \varphi )\). For a self-adjoint random variable \(a \in {\mathcal {X}}\), set

$$\begin{aligned} \kappa _m(a):= \kappa _m(a,a,\ldots ,a). \end{aligned}$$
(A13)
  1. (a)

    For any \(m \ge 1\) and \(\alpha \in {\mathbb {C}}\), we have that

    $$\begin{aligned} \kappa _m(\alpha a) = \alpha ^m \kappa _m(a). \end{aligned}$$
    (A14)

    This immediately follows from the multilinearity of free cumulants (see (A10)).

  2. (b)

    (Free additivity, Proposition 12.3 in [53]) For any \(m \ge 1\), if \(a, b \in {\mathcal {X}}\) are freely independent, then

    $$\begin{aligned} \kappa _m(a + b) = \kappa _m(a) + \kappa _m(b). \end{aligned}$$
    (A15)

    The above equation should be compared with (A5).

1.2.3 Free Entropy: Free Additivity

For non-commutative random variables, the free entropy was introduced by Voiculescu [60, 61, 63]. Here, we provide a brief introduction; readers are referred to Section 6 of [32] for further details.

We first examine the Boltzmann–Gibbs formula of classical entropy. The idea is that the entropy of a “macrostate” is proportional to the logarithm of its probability, which is determined by the count of associated “microstates.” Mathematically, the association is defined through an appropriate distance, and the probability of a “macrostate” is given by the volume of all close “microstates.” This motivates the following formulation of scalar entropy.

Let a be a random variable supported in a finite interval \([-R, R]\), then its entropy is a limit of log volumes:

$$\begin{aligned} h(a)= & {} \lim _{\begin{array}{c} r\rightarrow \infty \\ \epsilon \rightarrow 0 \end{array}} \lim _{N \rightarrow \infty } \frac{1}{N} \log \lambda _N \nonumber \\&\times \,\left( \{ x \in [-R, R]^N : \left|m_k(\delta _N(x)) - m_k(a)\right|\le \epsilon , k \le r\}\right) , \end{aligned}$$
(A16)

where \(\lambda _N\) is the N-dimensional Lebesgue measure, \(m_k\) denotes the kth moment and \(\delta _N(x)\) is the atomic measure \((\delta (x_1) + \delta (x_2) + \cdots + \delta (x_N)) / N\) serving as the “microstate.” Here, the volume is the Lebesgue measure of the set of \(x \in {\mathbb {R}}^N\) whose corresponding atomic measure approximates a up to the rth moment. One then takes a normalized limit, improving the approximation, to obtain the entropy.

In free probability, the moments are evaluated using the functional \(\varphi (\cdot )\). Due to the non-commutative nature of matrices and the fact that free independence occurs asymptotically among large random matrices (see Sect. A.4), one can adopt self-adjoint matrices as “microstates.” We then arrive at the following definition of free entropy.

Definition 11

Let \(M_N({\mathbb {C}})^{sa}\) denote the set of all \(N \times N\) self-adjoint matrices and \({{\,\mathrm{\mathrm{tr}}\,}}(\cdot ):= \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(\cdot )\) the normalized trace. Given a \(*\)-probability space \(({\mathcal {X}}, \varphi )\), a self-adjoint element \(a \in {\mathcal {X}}\), \(N, r \in {\mathbb {N}}\), \(\epsilon > 0\) and \(R > 0\), we define the set

$$\begin{aligned} \begin{aligned} \Gamma (a; R, N, r, \epsilon ) = \{A \in M_N({\mathbb {C}})^{sa}: \left\| A \right\| _2 \le R, |{{{\,\mathrm {\mathrm {tr}}\,}}(A^k) - \varphi (a^k)}| \le \epsilon , k \le r\}. \end{aligned} \end{aligned}$$

Recall that there is a natural linear bijection between \(M_N({\mathbb {C}})^{sa}\) and \({\mathbb {R}}^{N^2}\); let \(\Lambda _N\) denote the measure on \(M_N({\mathbb {C}})^{sa}\) induced by the Lebesgue measure on \({\mathbb {R}}^{N^2}\). The free entropy of a is then defined by

$$\begin{aligned} \chi (a) = \sup _{R > 0} \lim _{\begin{array}{c} r\rightarrow \infty \\ \epsilon \rightarrow 0 \end{array}} \limsup _{N \rightarrow \infty }\left[ \frac{1}{N^2} \log \Lambda _N\left( \Gamma (a; R, N, r, \epsilon ) \right) + \frac{1}{2} \log N\right] . \end{aligned}$$
(A17)

One can extend the above definition to the multivariate case. For self-adjoint elements \(a_1, \ldots , a_s\in {\mathcal {X}}\), define the set

$$\begin{aligned} \Gamma (a_1,\ldots , a_s;&R, N, r, \epsilon ) = \{(A_1,\ldots , A_s) \in (M_N({\mathbb {C}})^{sa})^{s}: \left\| A_i \right\| _2 \le R, \\&|{{{\,\mathrm {\mathrm {tr}}\,}}(A_{i_1} \ldots A_{i_k}) - \varphi (a_{i_1} \ldots a_{i_k})}| \le \epsilon \text { for all }1 \le i_1, \ldots , i_k \le s, k \le r\}, \end{aligned}$$

the joint free entropy is then given by

$$\begin{aligned} \begin{aligned}&\chi (a_1, \ldots , a_s) = \\&\sup _{R > 0} \lim _{\begin{array}{c} r\rightarrow \infty \\ \epsilon \rightarrow 0 \end{array}} \limsup _{N \rightarrow \infty }\left[ \frac{1}{N^2} \log \Lambda _N^{\otimes s}\left( \Gamma (a_1,\ldots , a_s; R, N, r, \epsilon ) \right) + \frac{s}{2} \log N\right] . \end{aligned} \end{aligned}$$
(A18)

The free entropy shares similar properties with the scalar entropy.

Proposition 14

Let \(\varvec{x}= (x_1,\ldots ,x_s)^T\) where \(x_i\) are self-adjoint non-commutative random variables. Let \(\varvec{O}(s)\) denote the set of \(s\times s\) orthogonal matrices. Then, for any \(\varvec{Q}= (q_{ij})_{i,j=1}^s\in \varvec{O}(s)\),

$$\begin{aligned} \chi \left( (\varvec{Q}\varvec{x})_1,\ldots ,(\varvec{Q}\varvec{x})_s\right) = \chi (x_1,\ldots ,x_s). \end{aligned}$$
(A19)

That is, the free entropy is invariant under the orthogonal transformation (cf. (A8)).

Proof

This proposition is a special case of a more general result: for any matrix \(\varvec{A} \in {\mathbb {R}}^{s \times s}\), we have (see Corollary 6.3.2 in [32])

$$\begin{aligned} \begin{aligned} \chi \left( (\mathbf {A} \mathbf {x})_1,\ldots ,(\mathbf {A} \mathbf {x})_s\right) = \chi (x_1,\ldots ,x_s) + \log |{\det \mathbf {A}}|. \end{aligned} \end{aligned}$$
(A20)

Now, for \(\varvec{Q}\in \varvec{O}(s)\), \(\varvec{Q}^T\varvec{Q}= \varvec{I}\), thus

$$\begin{aligned} (\det \varvec{Q})^2 = \det \varvec{Q}^T \det \varvec{Q}= \det (\varvec{Q}^T \varvec{Q}) = \det \varvec{I} = 1. \end{aligned}$$
(A21)

That is, \(|{\det \mathbf {Q}}| = 1\) and thus \(\log |{\det \mathbf {Q}}| = 0\). Now, set \(\varvec{A} = \varvec{Q}\) in (A20), we obtain (A19). \(\square \)

The following proposition is the analogue of (A7) for free entropy.

Proposition 15

Let \(x_1,\ldots ,x_s\) be self-adjoint non-commutative random variables, then

$$\begin{aligned} \chi (x_1,\ldots ,x_s) \le \sum _{i = 1}^s\chi (x_i). \end{aligned}$$
(A22)

Assume further that \(\chi (x_i) > -\infty \) for \(i = 1,\ldots ,s\); then equality in (A22) holds if and only if \(x_1,\ldots ,x_s\) are freely independent.

Proof

The proof of the inequality can be found in Proposition 6.1.1 of [32]. The equivalence between equality and free independence is Theorem 6.4.1 of [32]. \(\square \)

1.2.4 Analogue of Gaussian Random Variables in Free Probability: The Free Semicircular Element

The analogous element to a Gaussian random variable in a \(*\)-probability space is a semicircular element. Recall that the Gaussian random variable is characterized by vanishing cumulants of order higher than 2; the semicircular elements can be defined in a similar manner.

Definition 12

Given a \(*\)-probability space \(({\mathcal {X}}, \varphi )\), we call a random variable \(a \in {\mathcal {X}}\) a semicircular element if

$$\begin{aligned} \kappa _m(a) \equiv 0, \qquad \text {for }m \ge 3, \end{aligned}$$
(A23)

and \(\kappa _2(a) > 0\) (such that a is not constant).
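For reference (a standard fact, see, e.g., [49, 53]): a semicircular element with \(\kappa _1(a) = 0\) and \(\kappa _2(a) = \sigma ^2\) has the semicircle spectral distribution

$$\begin{aligned} \mathrm {d}\mu (t) = \frac{1}{2\pi \sigma ^2} \sqrt{4\sigma ^2 - t^2}\, \mathrm {d}t, \qquad t \in [-2\sigma , 2\sigma ], \end{aligned}$$

which is also the limiting eigenvalue distribution of the Wigner matrices discussed in Sect. A.4.1.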

1.3 Freeness of Non-self-adjoint Random Variables

We briefly introduce the mathematical preliminaries for a rectangular probability space. We omit some technicalities, which are beyond the scope of this paper. For a thorough introduction, readers are referred to [6, 7].

Consider a \(*\)-probability space \(({\mathcal {X}}, \varphi )\) with two nonzero self-adjoint projections \(p_1,p_2\) that are mutually orthogonal (i.e., \(p_ip_j = 0\) for \(i \ne j\)) and satisfy \(p_1 + p_2 = 1_{\mathcal {X}}\). Then, any element \(a \in {\mathcal {X}}\) can be represented in the following block form

$$\begin{aligned} a = \begin{bmatrix} a_{11} &{} a_{12} \\ a_{21} &{} a_{22}\end{bmatrix}, \end{aligned}$$
(A24)

where \(a_{ij} = p_i a p_j\) for \(i,j = 1,2\), and we define \({\mathcal {X}}_{ij} := p_i {\mathcal {X}}p_j\). Note that each \({\mathcal {X}}_{kk}\) is a subalgebra, and we equip it with the functional \(\varphi _k = \frac{1}{\rho _k} \varphi \vert _{{\mathcal {X}}_{kk}}\), where \(\rho _k := \varphi (p_k)\). That is,

$$\begin{aligned} \varphi _1(a) = \frac{1}{\rho _1} \varphi (a^{11}), \text {where }a^{11} = \begin{bmatrix} a_{11} &{} 0 \\ 0 &{} 0\end{bmatrix}, \end{aligned}$$
(A25)

and similarly for \(\varphi _2\). The functionals \(\varphi _k\), \(k = 1, 2\), are tracial in the sense that \(\varphi _k(p_k) = 1\) and, for all i, j and \(x \in {\mathcal {X}}_{ij}\), \(y \in {\mathcal {X}}_{ji}\),

$$\begin{aligned} \rho _i \varphi _i(xy) = \rho _j \varphi _j(yx). \end{aligned}$$
(A26)

Definition 13

Such a family \(({\mathcal {X}}, p_1, p_2, \varphi _1, \varphi _2)\) is called a \((\rho _1, \rho _2)\)-rectangular probability space. We call \(a \in {\mathcal {X}}_{12} = p_1 {\mathcal {X}}p_2\) a rectangular random variable.

Remark 7

If a is a rectangular element, then in the matrix decomposition (A24), only \(a_{12}\) is nonzero. Later, in Sect. A.4.2, we will model rectangular matrices by embedding them into \(a_{12}\) of rectangular random variables.

For such a rectangular probability space, the linear span of \(p_1, p_2\) is denoted by \({\mathcal {D}}\); it is a finite-dimensional subalgebra. Define \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(a) = \sum _{i = 1}^2 \varphi _i(a_{ii})p_i\). It can be checked that \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(1_{\mathcal {X}}) = 1_{\mathcal {X}}\) and that \(\forall (d, a, d') \in {\mathcal {D}}\times {\mathcal {X}}\times {\mathcal {D}}\), \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(dad') = d{{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(a)d'\). The map \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(\cdot )\) is regarded as the conditional expectation from \({\mathcal {X}}\) onto \({\mathcal {D}}\).

We now consider the freeness in rectangular probability space.

1.3.1 Mixed Moments Point of View

The following definition of freeness should be compared with (A1) and Definition 8.

Definition 14

Given a rectangular probability space and the subalgebra \({\mathcal {D}}\) with corresponding conditional expectation \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}\), a family \(({\mathcal {X}}_i)_{i\in I}\) of subalgebras containing \({\mathcal {D}}\) is said to be free with amalgamation over \({\mathcal {D}}\) (or simply free when there is no ambiguity) if, for all \(k \ge 1\),

$$\begin{aligned} {{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(x_1 \ldots x_k) = 0 \end{aligned}$$
(A27)

whenever \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(x_j) = 0\) for all \(j = 1, \ldots , k\), and neighboring elements are from different subalgebras, i.e., \(x_j \in {\mathcal {X}}_{i(j)}\), \(i(1) \ne i(2), i(2) \ne i(3),\ldots , i(k-1)\ne i(k)\). In particular, a family of rectangular random variables \(\{x_i\}_{i\in I}\) are called free if the subalgebras generated by \({\mathcal {D}}\), \(x_i\), and \(x_i^*\) are free.

1.3.2 Rectangular Free Cumulants: Free Additivity

The free cumulants are also defined for rectangular probability space [6, 7].

Definition 15

(Analogue of cumulants in a rectangular probability space) Given a \((\rho _1,\rho _2)\)-rectangular probability space \(({\mathcal {X}}, p_1, p_2, \varphi _1, \varphi _2)\), for any \(m \ge 1\), we denote the mth tensor product of \({\mathcal {X}}\) over \({\mathcal {D}}\) by \({\mathcal {X}}^{\otimes _{{\mathcal {D}}^m}}\). We recall a family of linear functions \(\{\kappa _{m}: {\mathcal {X}}^{\otimes _{{\mathcal {D}}^m}} \mapsto {\mathbb {C}}\}_{m \ge 1}\) introduced in [7] (denoted \(c^{(1)}\) there; see Section 3.1 of [7]). By linearity, we mean that for \(m \ge 1\), any \(a, b \in {\mathcal {X}}\) and any \(\alpha , \beta \in {\mathbb {C}}\),

$$\begin{aligned} \kappa _m(\cdots \otimes (\alpha a + \beta b) \otimes \cdots ) = \alpha \kappa _m(\cdots \otimes a\otimes \cdots ) + \beta \kappa _m(\cdots \otimes b\otimes \cdots ).\nonumber \\ \end{aligned}$$
(A28)

For convenience, we call \(\{\kappa _{m}\}_{m \ge 1}\) rectangular free kurtosis (or kurtosis when there is no ambiguity). For each \(m \ge 1\) and any rectangular random variable a, we put

$$\begin{aligned} \kappa _{2m}(a) := \kappa _{2m}(a \otimes a^* \otimes \cdots \otimes a \otimes a^*). \end{aligned}$$
(A29)

We consider only even orders, since odd-order cumulants vanish for all rectangular elements.

Remark 8

In [6, 7], the free cumulants refer to a family of linear functions from \({\mathcal {X}}^{\otimes _{{\mathcal {D}}^m}}\) to \({\mathcal {D}}\). The rectangular cumulants used throughout this paper are the coefficients of \(p_1\) in these \({\mathcal {D}}\)-valued cumulants.

The following vanishing lemma holds for the rectangular cumulants defined as in above.

Theorem 16

(Vanishing of mixed cumulants, Theorem 2.1 of [6]) A family \((x_i)_{i \in I}\) of elements in \({\mathcal {X}}\) is free with amalgamation over \({\mathcal {D}}\) if and only if for all \(n \ge 2\), and \(i(1), \ldots , i(n) \in I\), we have \(\kappa _n(x_{i(1)}\otimes \cdots \otimes x_{i(n)}) = 0\) whenever there exists \(1\le l, k\le n\) with \(i(l) \ne i(k)\).

Consequently, the analogue of Proposition 13 also holds in the rectangular case with the rectangular free kurtosis defined in (A29). The analogue of (A15) for the rectangular free kurtosis follows from equation (10) in [7], and the analogue of (A14) is a direct consequence of (A28).

1.3.3 Rectangular Free Entropy: Free Additivity

The free entropy \(\chi \) for a rectangular free probability space was introduced in [6]. The idea is similar to the self-adjoint case: one adopts rectangular matrices as “microstates” and uses the conditional expectation \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(\cdot )\) to evaluate moments. Readers are referred to Section 5.1 of [6] for a precise definition.

The analogues of Propositions 14 and 15 also hold for the rectangular free entropy. The orthogonal invariance of the rectangular free entropy is a direct consequence of Corollary 5.11 of [6], while Proposition 5.3, Theorem 5.7 and Corollary 5.16 of [6] together prove the analogue of Proposition 15 in the rectangular case.

1.3.4 Analogue of Gaussian Random Variables in Rectangular Free Probability: The Free Poisson Element

Definition 16

Given a rectangular probability space, a rectangular random variable \(a \in {\mathcal {X}}_{12}\) is a free Poisson element if

$$\begin{aligned} \kappa _{2m}(a) \equiv 0, \qquad \text {for }m \ge 2. \end{aligned}$$
(A30)

1.4 When are Random Matrices (Asymptotic) Free?

Here, we describe free probability in the context of random matrices and give explicit formulas for the free kurtosis and free entropy as functions of the input matrices.

1.4.1 Symmetric Random Matrix

Given \(N > 0\), we consider the algebra consisting of all \(N \times N\) matrices with entries in the space of real scalar random variables \(L^{2}(\Sigma , P)\):

$$\begin{aligned} {\mathcal {X}}= M_N(L^2(\Sigma , P)) \end{aligned}$$
(A31)

and for any \(\varvec{X} \in {\mathcal {X}}\), the functional \(\varphi \) on it is

$$\begin{aligned} \varphi (\varvec{X}) = \frac{1}{N} {{\,\mathrm{{\mathbb {E}}}\,}}[ {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X})]. \end{aligned}$$
(A32)

Denote the conjugate transpose by \(*\). Then, \(({\mathcal {X}}, \varphi )\) is a \(*\)-probability space.

We recall the notion of convergence in distribution and the definition of asymptotic free independence [53].

Definition 17

(Asymptotic free independence) Let \(({\mathcal {X}}_N , \varphi _N)~(N \in \mathbb {N})\) and \(({\mathcal {X}}, \varphi )\) be non-commutative probability spaces. Let I be an index set and consider, for each \(i \in I\), random variables \(a_i(N) \in {\mathcal {X}}_N\) and \(a_i \in {\mathcal {X}}\). We say that \((a_i(N))_{i \in I}\) converges in distribution toward \((a_i)_{i \in I}\) if each joint moment of \((a_i(N))_{i \in I}\) converges toward the corresponding joint moment of \((a_i)_{i \in I}\), i.e., for all \(n \in {\mathbb {N}}\) and all \(i(1), \ldots , i(n) \in I\),

$$\begin{aligned} \lim _{N \rightarrow \infty } \varphi _N(a_{i(1)}(N)\ldots a_{i(n)}(N)) = \varphi (a_{i(1)}\ldots a_{i(n)}). \end{aligned}$$
(A33)

Furthermore, we say that \((a_i(N))_{i \in I}\) are asymptotically free if they converge in distribution to a limit \((a_i)_{i\in I}\) that is free in \(({\mathcal {X}},\varphi )\).

A pair of symmetric (Hermitian) random matrices with isotropically random eigenvectors that are independent of the eigenvalues (and each other) are asymptotically free [53].

Given the \(*\)-probability space \(({\mathcal {X}}, \varphi (\cdot ))\) defined above, recall the free kurtosis defined in (14). For a self-adjoint random matrix \(\varvec{X} \in {\mathcal {X}}\) with \(\varphi (\varvec{X}) = 0\), the free kurtosis is explicitly given by

$$\begin{aligned} \kappa _4(\varvec{X}) = \frac{1}{N} {{\,\mathrm{{\mathbb {E}}}\,}}[{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^4)] - 2\left( \frac{1}{N} {{\,\mathrm{{\mathbb {E}}}\,}}[{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^2)]\right) ^2. \end{aligned}$$
(A34)

Also, denoting the eigenvalue density function of \(\varvec{X}\) by \(\mu (x)\), the free entropy is given by [32]

$$\begin{aligned} \begin{aligned} \chi (\mathbf {X}) = \int \int \log |{x - y}| \mathrm {d} \mu (x) \mathrm {d} \mu (y). \end{aligned} \end{aligned}$$
(A35)

For a large class of random matrices \(\varvec{X}\), the free kurtosis and entropy concentrate around a deterministic value when N is large. For example, if \(\varvec{X}\) is a Wigner or Wishart matrix, then \(\mathrm {Var}[\kappa _4(\varvec{X})] \rightarrow 0\) and \(\mathrm {Var}[\chi (\varvec{X})] \rightarrow 0\) as \(N \rightarrow \infty \). Thus, a single sample gives an accurate empirical estimate. Given a realization x of a random matrix \(\varvec{X}\) with \({{\,\mathrm{{\mathbb {E}}}\,}}\left[ {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X})\right] = 0\), the empirical free kurtosis is

$$\begin{aligned} {\widehat{\kappa }}_4(x) = \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(x^4) - 2\left( \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(x^2)\right) ^2. \end{aligned}$$
(A36)

Also, the empirical free entropy is given by

$$\begin{aligned} \begin{aligned} {\widehat{\chi }}(x) = \frac{1}{N(N - 1)}\sum _{i \ne j} \log |{\lambda _i - \lambda _j}|, \end{aligned} \end{aligned}$$
(A37)

where \(\lambda _i\) denotes the eigenvalue of x.
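The formulas (A36) and (A37) translate directly into code. The Julia sketch below (our own illustration; it is not the authors' FCA.jl implementation [52]) computes the empirical free kurtosis and free entropy of a symmetric matrix and checks the approximate additivity (A15) on two independent Wishart-type matrices, whose eigenvectors are isotropically random.

```julia
using LinearAlgebra, Random

# Empirical free kurtosis (A36) and empirical free entropy (A37) of a
# symmetric matrix x with Tr(x) ≈ 0.
free_kurtosis(x) = tr(x^4) / size(x, 1) - 2 * (tr(x^2) / size(x, 1))^2

function free_entropy(x)
    λ = eigvals(Symmetric(Matrix(x)))
    N = length(λ)
    sum(log(abs(λ[i] - λ[j])) for i in 1:N, j in 1:N if i != j) / (N * (N - 1))
end

Random.seed!(0)
N, M = 400, 800
g1, g2 = randn(N, M), randn(N, M)
x1 = Symmetric(g1 * g1' / M - I)   # centered Wishart-type matrix
x2 = Symmetric(g2 * g2' / M - I)   # an independent copy

# x1 and x2 are approximately free, so the free kurtosis is approximately
# additive, cf. (A15):
println(free_kurtosis(x1 + x2), "  vs  ", free_kurtosis(x1) + free_kurtosis(x2))
println(free_entropy(x1))
```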

1.4.2 Rectangular Random Matrix

Consider a rectangular random matrix of size \(N \times M\), and assume that \(N \le M\). In [7], the author embedded a \(N \times M\) matrix into the top right block of a \((N + M) \times (N + M)\) “extension matrix.” The algebra of all \((N + M) \times (N + M)\) random matrices together with this block structure is defined as a rectangular probability space \((\mathbb M_{N + M}(L^2(\Sigma , {\mathbb {P}})), \mathrm {diag}(I_N, 0_M), \mathrm {diag}(0_N, I_M), \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}, \frac{1}{M} {{\,\mathrm{\mathrm{Tr}}\,}})\) [7].

We recall the following definition of asymptotic freely independence in rectangular probability space [6].

Definition 18

(Asymptotic free independence) Let for each \(N \in {\mathbb {N}}\), \(({\mathcal {X}}_N, p_1(N), p_2(N), \varphi _{1,N}, \varphi _{2,N})\) be a \((\rho _{1,N}, \rho _{2,N})\)-rectangular probability space such that

$$\begin{aligned} (\rho _{1,N}, \rho _{2,N}) \rightarrow (\rho _1, \rho _2), \qquad N \rightarrow \infty . \end{aligned}$$

Let I be an index set and consider for each \(i \in I\) random variables \(a_i(N) \in {\mathcal {X}}_N\). We say that \((a_i(N))_{i\in I}\) converges in \({\mathcal {D}}\)-distribution toward \((a_i)_{i\in I}\) for some random variables \(a_i \in {\mathcal {X}}\) in some \((\rho _1, \rho _2)\)-probability space \(({\mathcal {X}},p_1, p_2,\varphi _1, \varphi _2)\) if the \({\mathcal {D}}\)-distribution converge pointwise.

Furthermore, we say \((a_i(N))_{i \in I}\) are asymptotically free \((N \rightarrow \infty )\), if the limits \((a_i)_{i\in I}\) are free in \(({\mathcal {X}},p_1, p_2,\varphi _1, \varphi _2)\).

Independent bi-unitarily invariant rectangular random matrices with converging singular value distributions are asymptotically free [6, 7].

Following (15), the free kurtosis for a single \(N \times M\) random matrices \(\varvec{X}\) is given by

$$\begin{aligned} \kappa _4(\varvec{X}) = \frac{1}{N}{{\,\mathrm{{\mathbb {E}}}\,}}[{{\,\mathrm{\mathrm{Tr}}\,}}((\varvec{X} \varvec{X}^H)^2)] - (1 + \frac{N}{M}) \left( \frac{1}{N} {{\,\mathrm{{\mathbb {E}}}\,}}[{{\,\mathrm{\mathrm{Tr}}\,}}((\varvec{X} \varvec{X}^H))]\right) ^2. \end{aligned}$$
(A38)

Denoting the probability density function of eigenvalues of \(\varvec{X} \varvec{X}^H\) by \(\mu (x)\), setting \(\alpha = \frac{N}{N + M}\) and \(\beta = \frac{M}{N + M}\), the free entropy is given by [6]

$$\begin{aligned} \begin{aligned} \chi (\mathbf {X}) = \alpha ^2\int \int \log |{x - y}|\mathrm {d}\mu (x) \mathrm {d}\mu (y) + (\beta - \alpha )\alpha \int \log x \mathrm {d}\mu (x). \end{aligned} \end{aligned}$$
(A39)

Again, empirical statistics over a single sample of large dimension give an accurate estimate of limit value. Given a realization x of a rectangular random matrix \(\varvec{X}\), the empirical free kurtosis is given by

$$\begin{aligned} {\widehat{\kappa }}_4(x) = \frac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(( x x^H)^2) - (1 + \frac{N}{M}) \left( \frac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(x x^H)\right) ^2. \end{aligned}$$
(A40)

The empirical free entropy is given by

$$\begin{aligned} \begin{aligned} {\widehat{\chi }} (x) = \frac{\alpha ^2}{N(N - 1)} \sum _{i\ne j} \log |{\lambda _i - \lambda _j}| + \frac{(\beta - \alpha )\alpha }{N} \sum _{i = 1}^N \log \lambda _i, \end{aligned} \end{aligned}$$
(A41)

where \(\lambda _i\) denote the eigenvalue of \(xx^H\).
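Analogously, (A40) and (A41) can be computed directly from the singular values of x. The following Julia sketch (again our own illustration, not the FCA.jl code [52]) implements both statistics for an \(N \times M\) matrix with \(N \le M\).

```julia
using LinearAlgebra

# Empirical rectangular free kurtosis (A40) and free entropy (A41).
function rect_free_kurtosis(x)
    N, M = size(x)
    s = x * x'
    tr(s^2) / N - (1 + N / M) * (tr(s) / N)^2
end

function rect_free_entropy(x)
    N, M = size(x)
    α, β = N / (N + M), M / (N + M)
    λ = eigvals(Symmetric(x * x'))             # squared singular values of x
    pairs = sum(log(abs(λ[i] - λ[j])) for i in 1:N, j in 1:N if i != j)
    α^2 * pairs / (N * (N - 1)) + (β - α) * α * sum(log.(λ)) / N
end

# For an i.i.d. Gaussian N × M matrix scaled by 1/√M, the rectangular free
# kurtosis is ≈ 0, consistent with the free Poisson element of Definition 16:
rect_free_kurtosis(randn(200, 400) / sqrt(400))
```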

Proof of Propositions 6 and 7

We prove Propositions 6 and 7 for the covariance matrix in the rectangular case. The self-adjoint case can be proved with straightforward modifications.

1.1 Proof of Proposition 6

By Remark 1.2 of [53], for any random variable a, \(\varphi (a^*) = \overline{\varphi (a)}\). Thus,

$$\begin{aligned} \begin{aligned} \overline{[\varvec{C}_{\varvec{z} \varvec{z}}]}_{ij}&= \overline{\varphi _1({{\tilde{z}}}_i {{\tilde{z}}}_j^*)} \\&= \varphi _1(({{\tilde{z}}}_i {{\tilde{z}}}_j^*)^*)\\&= \varphi _1({{\tilde{z}}}_j{{\tilde{z}}}_i^*)= [\varvec{C}_{\varvec{z} \varvec{z}}]_{ji}. \end{aligned} \end{aligned}$$
(B42)

Therefore, \(\varvec{C}_{\varvec{z} \varvec{z}}\) is Hermitian.

We now show that \(\varvec{C}_{\varvec{z} \varvec{z}}\) is positive semi-definite. As \(\varphi \) is a linear functional, for any row vector \(\varvec{\alpha }= [\alpha _1, \ldots , \alpha _s]\),

$$\begin{aligned} \varvec{\alpha }\varvec{C}_{\varvec{z} \varvec{z}} \varvec{\alpha }^H = \varphi ((\sum _{i = 1}^s\alpha _i {{\tilde{z}}}_i) (\sum _{i = 1}^s\alpha _i {{\tilde{z}}}_i)^*) \ge 0 \end{aligned}$$
(B43)

where we used that \(\varphi (\cdot )\) is positive. This completes the proof.

1.2 Proof of Proposition 7

Since \(\varvec{z} = \varvec{A} \varvec{x}\) and \(\varvec{C}_{xx} = \varvec{I}\),

$$\begin{aligned} \varvec{C}_{\varvec{z}\varvec{z}} = \varvec{A} \varvec{C}_{\varvec{x}\varvec{x}} \varvec{A}^H = \varvec{A}\varvec{A}^H. \end{aligned}$$

Since \(\varvec{A}\) is assumed to be real and non-singular, \(\varvec{C}_{\varvec{z}\varvec{z}}\) is real and positive definite.

Proofs of the Main Results

1.1 Proof of Theorem 1

The proof of Theorem 1 relies on the free additivity of free cumulants, for which readers are referred to Proposition 13 (and its rectangular analogue in Sect. A.3.2).

1.1.1 Proof of Theorem 1(a)

Set \(\varvec{g} = \varvec{Q}^T \varvec{w}\), then \(\varvec{w}= \varvec{Q}\varvec{g}\). As \(\varvec{x}\) and \(\varvec{y}\) are related via (12), we have that

$$\begin{aligned} \varvec{w}^T \varvec{y} = \varvec{w}^T \varvec{Q}\varvec{x} = (\varvec{Q}^T \varvec{w})^T \varvec{x} =\varvec{g}^T\varvec{x}. \end{aligned}$$
(C44)

Adopt the notation \(\varvec{g} = (g_1,\ldots ,g_s)^T\). Since the \(x_i\) are freely independent, using (A15) we have that

$$\begin{aligned} \kappa _4(\varvec{g}^T\varvec{x}) = \kappa _4\left( \sum _{i = 1}^sg_ix_i\right) = \sum _{i = 1}^s\kappa _4(g_i x_i). \end{aligned}$$
(C45)

By (A14), \(\kappa _4(g_ix_i) = g_i^4 \kappa _4(x_i)\) for \(i = 1, \ldots , s\); thus, the above equation becomes

$$\begin{aligned} \kappa _4(\varvec{g}^T\varvec{x}) = \sum _{i = 1}^sg_i^4 \kappa _4(x_i). \end{aligned}$$
(C46)

Combining (C44) and (C46), we get

$$\begin{aligned} \left|\kappa _4(\varvec{w}^T \varvec{y}) \right|= \left|\sum _{i = 1}^sg_i^4\kappa _4(x_i) \right|. \end{aligned}$$
(C47)

When \(\varvec{w}\) runs over all unit vectors, \(g = \varvec{Q}^T \varvec{w}\) also runs over all unit vectors. Therefore, if \(\varvec{w}^{(1)}\) is a maximizer of (17), then \(\varvec{w}^{(1)} = \varvec{Q}g^{(1)}\) where \(g^{(1)}\) is a maximizer of

$$\begin{aligned} \mathop {\max }_{\varvec{g} \in {\mathbb {R}}^s,~\left\| \varvec{g} \right\| _2 = 1} \left|\sum _{i = 1}^sg_i^4\kappa _4(x_i) \right|. \end{aligned}$$
(C48)

Thus, in order to prove (a), it is equivalent to show that \(\varvec{g}^{(1)}\) is maximizer of (C48) if and only if \(\varvec{g}^{(1)} \in \{(\pm 1, 0,\ldots ,0)^T\}\).

For any unit vector \(\varvec{g}\), since \(|{g_i}| \le 1\), we have that

$$\begin{aligned} \sum _{i = 1}^sg_{i}^4 \le \sum _{i = 1}^sg_i^2 = 1. \end{aligned}$$
(C49)

Note that the equality holds if and only if there is an index i such that \(g_i \in \{\pm 1\}\) (thus \(g_j = 0\) for all \(j \ne i\)). Then, using (16) and (C49),

$$\begin{aligned} \begin{aligned} \left| \sum _{i = 1}^sg_i^4\kappa _4(x_i) \right|&\le \sum _{i = 1}^sg_i^4 |{\kappa _4(x_i)}| \\ {}&\le \sum _{i = 1}^sg_i^4 | {\kappa _4(x_1)}| \\ {}&\le |{\kappa _4(x_1)}|. \end{aligned} \end{aligned}$$
(C50)

On the other hand, for \(\varvec{g} = (\pm 1, 0,\ldots , 0)^T\), it can be checked that all equalities in (C50) hold. Thus,

$$\begin{aligned} \begin{aligned} \mathop {\max }_{\mathbf {g} \in {\mathbb {R}}^s,~\left\| \mathbf {g} \right\| _2 = 1} \left| \sum _{i = 1}^sg_i^4\kappa _4(x_i) \right| = |{\kappa _4(x_1)}| \end{aligned} \end{aligned}$$
(C51)

and \(\varvec{g}^{(1)}\) is a maximizer of (C48) if \(\varvec{g}^{(1)} \in \{(\pm 1, 0,\ldots ,0)^T\}\).

For the other direction, if \(\varvec{g}^{(1)}\) is maximizer of (C48), then the second equality in (C50) holds for \(\varvec{g} = \varvec{g}^{(1)}\). That is,

$$\begin{aligned} \begin{aligned} 0 = \sum _{i = 1}^s(g^{(1)}_i)^4 \left( |{\kappa _4(x_i)}| - |{\kappa _4(x_1)}|\right) . \end{aligned} \end{aligned}$$
(C52)

Due to (18), \(|{\kappa _4(x_i)}| - |{\kappa _4(x_1)}| < 0\) for \(i = 2,\ldots ,s\). Thus, (C52) implies \(g^{(1)}_i = 0\) for \(i = 2,\ldots ,s\). Since \(\varvec{g}^{(1)}\) is a unit vector, \(\varvec{g}^{(1)} \in \{(\pm 1, 0, \ldots , 0)^T\}\). This completes the proof.

1.1.2 Proof of Theorem 1(b)

In the proof of (a), the arguments up to (C52) rely only on properties of the free kurtosis \(\kappa _4(\cdot )\) and condition (16). Thus, (C48), (C50), (C51) and (C52) also apply in the setting of (b). Hence, in order to prove (b), it is equivalent to show that \(\varvec{g}^{(1)}\) is a maximizer of (C48) if and only if

  1. (i)

    \(g^{(1)}_i = 0\) for \(i = r + 1,\ldots ,s\),

  2. (ii)

    there is an index i such that \(g_i^{(1)} \in \{\pm 1\}\).

The backward direction can be checked directly using \(|{\kappa _4(x_1)}| = \cdots = |{\kappa _4(x_r)}|\).

We now prove the forward direction. If \(\varvec{g}^{(1)}\) maximizes (C48), then it satisfies (C52). By (20), \(|{\kappa _4(x_i)}| - |{\kappa _4(x_1)}| = 0\) for \(i = 1,\ldots ,r\) and \(|{\kappa _4(x_i)}| - |{\kappa _4(x_1)}| < 0\) for \(i = r + 1,\ldots ,s\). (i) then follows. On the other hand, as \(|{\kappa _4(x_1)}| = \cdots = |{\kappa _4(x_r)}|\), enforcing the third equality in (C50) implies

$$\begin{aligned} \sum _{i = 1}^r (g^{(1)}_i)^4 = 1. \end{aligned}$$
(C53)

By the observation below (C49), this indicates (ii). This completes the proof.

1.2 Proof of Theorem 2

Set \(\varvec{g} = \varvec{Q}^T \varvec{w}\), we use the notation \(\varvec{g} = [g_1, \ldots , g_s]^T\). As \(\varvec{w}^{(i)} \in \{\pm \varvec{Q}_i\}\) for \(i = 1, \ldots , k - 1\),

$$\begin{aligned} \left\| \varvec{w} \right\| _2 = 1, \varvec{w}\perp \varvec{w}^{(1)}, \ldots , \varvec{w}^{(k - 1)} \iff \left\| \varvec{g} \right\| _2 = 1, g_1 = \cdots = g_{k - 1} = 0. \end{aligned}$$
(C54)

Using (C47), if \(\varvec{w}^{(k)}\) is a maximizer of (22), then \(\varvec{w}^{(k)} = \varvec{Q}\varvec{g}^{(k)}\) where \(\varvec{g}^{(k)}\) is a maximizer of

$$\begin{aligned} \mathop {\mathop {\max }_{\varvec{g} \in {\mathbb {R}}^s, ~\left\| \varvec{g} \right\| _2 = 1}}_{g_1 = \cdots = g_{k - 1} = 0} \left|\sum _{i = 1}^s g_i^4\kappa _4(x_i) \right|. \end{aligned}$$
(C55)

Thus, in order to prove (a), it is equivalent to show that \(\varvec{g}^{(k)} = (g_1^{(k)}, \ldots , g_s^{(k)})^T\) is a maximizer of (C55) if and only if \(g^{(k)}_k \in \{\pm 1\}\) (and thus \(g^{(k)}_j = 0\) for \(j \ne k\)).

As we are maximizing over unit vectors \(\varvec{g}\) such that \(g_1 = \cdots = g_{k - 1} = 0\), again using (16) and (C49),

$$\begin{aligned} \begin{aligned} \left| \sum _{i = 1}^{s} g_i^4\kappa _4(x_{i}) \right|&= \left| \sum _{i = k}^{s} g_i^4\kappa _4(x_{i}) \right| \\ {}&\le \sum _{i = k}^sg_i^4 |{\kappa _4(x_i)}| \\&\le \sum _{i = k}^sg_i^4 |{\kappa _4(x_k)}| \\&\le |{\kappa _4(x_k)}|. \end{aligned} \end{aligned}$$
(C56)

For \(\varvec{g}\) with \(g_k \in \{ \pm 1\}\), it can be checked that all equalities in (C56) hold. Thus,

$$\begin{aligned} \begin{aligned} \mathop {\mathop {\max }_{g \in {\mathbb {R}}^s, ~\left\| g \right\| _2 = 1}}_{g_1 = \cdots = g_{k - 1} = 0} \left|{\sum _{i = 1}^sg_i^4\kappa _4(x_i) }\right|= |{\kappa _4(x_k)}|, \end{aligned} \end{aligned}$$
(C57)

and \(\varvec{g}^{(k)}\) is a maximizer if \(g^{(k)}_k \in \{ \pm 1\}\).

For the other direction, if \(\varvec{g}^{(k)}\) is a maximizer of (C55), then all relations in (C56) hold with equality for \(\varvec{g} = \varvec{g}^{(k)}\). In particular, the second inequality in (C56) holding with equality implies

$$\begin{aligned} \begin{aligned} 0 = \sum _{i = k}^s\left( g^{(k)}_i\right) ^4 \left( |{\kappa _4(x_i)}| - |{\kappa _4(x_k)}|\right) . \end{aligned} \end{aligned}$$
(C58)

Due to (18), \(|{\kappa _4(x_i)}| - |{\kappa _4(x_k)}| < 0\) for \(i = k + 1,\ldots ,s\). Thus, (C58) implies that \(g^{(k)}_i = 0\) for \(i = k + 1, \ldots , s\). Since \(\varvec{g}^{(k)}\) is a unit vector, \(g^{(k)}_k \in \{\pm 1\}\). This completes the proof.

1.3 Proof of Theorem 3

We prove Theorem 3 by showing the following:

  1. (a)

    \(\varvec{Q}\) is a maximizer of (24).

  2. (b)

    For any permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\), \(\varvec{Q}\varvec{S}\varvec{P}\) is a maximizer of (24).

  3. (c)

    Any maximizer \(\varvec{W}\) of (24) satisfies \(\varvec{W}= \varvec{Q}\varvec{S}\varvec{P}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\).

1.3.1 Proof of (a)

We prove (a) by showing that

$$\begin{aligned} \begin{aligned} \mathop {\max }_{\mathbf {W}\in \mathbf {O}(s)} \sum _{i = 1}^s\left| \kappa _4\left( (\mathbf {W}^T \mathbf {y})_i\right) \right| = \sum _{i = 1}^s|{\kappa _4(x_i)}| \end{aligned} \end{aligned}$$
(C59)

and \(\varvec{W}= \varvec{Q}\) reaches the maximum. Set \(\varvec{G} = \varvec{Q}^T\varvec{W}\in \varvec{O}(s)\). As \(\varvec{x}\) and \(\varvec{y}\) are related via (12),

$$\begin{aligned} \begin{aligned} \varvec{W}^T \varvec{y}&= \varvec{W}^T \varvec{Q}\varvec{x}\\&= (\varvec{Q}^T\varvec{W})^T \varvec{x} \\&= \varvec{G}^T \varvec{x}. \end{aligned} \end{aligned}$$
(C60)

Adopt the notation \(\varvec{G} = (g_{ij})_{i,j= 1}^s\). Then, for all \(i = 1,\ldots ,s\), \((\varvec{W}^T \varvec{y})_i = (\varvec{G}^T \varvec{x})_i = \sum _{j = 1}^sg_{ji} x_j\). Together with (A15) and (A14), for any \(i = 1, \ldots , s\), we have that

$$\begin{aligned} \begin{aligned} \kappa _4((\varvec{W}^T \varvec{y})_i) =&\kappa _4\left( \sum _{j = 1}^sg_{ji} x_j\right) \\ =&\sum _{j = 1}^s\kappa _4\left( g_{ji}x_j\right) \\ =&\sum _{j = 1}^sg_{ji}^4 \kappa _4(x_j). \end{aligned} \end{aligned}$$
(C61)

Applying the triangle inequality to the above equation, we get

$$\begin{aligned} \left|\kappa _4((\varvec{W}^T \varvec{y})_i)\right|\le \sum _{j = 1}^sg_{ji}^4 \left|\kappa _4\left( x_j\right) \right|. \end{aligned}$$
(C62)

Note that \((g_{j1},\ldots ,g_{js})^T\) is a unit vector, so by (C49), \(\sum _{i = 1}^sg_{ji}^4 \le 1\). Then, summing (C62) over \(i = 1, \ldots , s\), we obtain that

$$\begin{aligned} \begin{aligned} \sum _{i = 1}^s\left| \kappa _4((\mathbf {W}^T \mathbf {y})_i)\right| \le&\sum _{i = 1}^s\sum _{j = 1}^sg_{ji}^4 \left| \kappa _4\left( x_j\right) \right| \\ =&\sum _{j = 1}^s\left( \sum _{i = 1}^sg_{ji}^4\right) |{\kappa _4(x_j)}| \\ \le&\sum _{j = 1}^s|{\kappa _4(x_j)}|. \end{aligned} \end{aligned}$$
(C63)

Moreover, for \(\varvec{W}= \varvec{Q}\), \(\varvec{Q}^T \varvec{y} = \varvec{Q}^T \varvec{Q}\varvec{x} = \varvec{x}\), and thus

$$\begin{aligned} \sum _{i = 1}^s\left|\kappa _4((\varvec{Q}^T \varvec{y})_i)\right|= \sum _{i = 1}^s|{\kappa _4(x_i)}|. \end{aligned}$$
(C64)

Equations (C64) and (C63) together imply (C59). Then, by (C64), \(\varvec{Q}\) is a maximizer of (24).

1.3.2 Proof of (b)

We first introduce some notation. For a permutation matrix \(\varvec{P}= (p_{ji})_{i,j = 1}^s\), there is an associated permutation \(\sigma \) such that \(p_{\sigma (i)i} = 1\) and \(p_{ji} = 0\) for all \(i = 1, \ldots , s\) and \(j \ne \sigma (i)\). For a signature matrix \(\varvec{S}\), we denote its ith diagonal element by \(S_i\).

Now, for any \(\varvec{P}\) and \(\varvec{S}\), in light of (C59), it suffices to show that \(\sum _{i = 1}^s\left| \kappa _4\left( ((\mathbf {Q}\mathbf {P}\mathbf {S})^T \mathbf {y})_i\right) \right| = \sum _{i = 1}^s|{\kappa _4(x_i)}|\). As \(\varvec{x}\) and \(\varvec{y}\) satisfy (12), we have

$$\begin{aligned} \begin{aligned} (\varvec{Q}\varvec{P}\varvec{S})^T \varvec{y} =&\varvec{S}^T \varvec{P}^T \varvec{Q}^T \varvec{y} \\ =&\varvec{S}^T \varvec{P}^T \varvec{x} \\ =&(S_1x_{\sigma (1)}, \ldots ,S_sx_{\sigma (s)})^T. \end{aligned} \end{aligned}$$
(C65)

As \(S_i \in \{\pm 1\}\), by (A14)

$$\begin{aligned} \kappa _4(S_ix_{\sigma (i)}) = S_i^4 \kappa _4(x_{\sigma (i)}) = \kappa _4(x_{\sigma (i)}). \end{aligned}$$
(C66)

Combining (C65) and (C66) together, we obtain that

$$\begin{aligned} \begin{aligned} \sum _{i = 1}^s\left| \kappa _4\left( ((\mathbf {Q}\mathbf {P}\mathbf {S})^T \mathbf {y})_i\right) \right| =&\sum _{i = 1}^s\left| \kappa _4(S_ix_{\sigma (i)}) \right| \\ =&\sum _{i = 1}^s|{\kappa _4(x_{\sigma (i)})}| \\ =&\sum _{i = 1}^s|{\kappa _4(x_{i})}|. \end{aligned} \end{aligned}$$
(C67)

This completes the proof of (b).

1.3.3 Proof of (c)

By (b), any matrix \({\widehat{\varvec{W}}}\) of the form \(\widehat{\varvec{W}} = \varvec{Q}\varvec{P}\varvec{S}\) is a maximizer. For the other direction, we want to show that any maximizer \({\widehat{\varvec{W}}}\) can be written in this form.

Indeed, if \({\widehat{\varvec{W}}}\) is a maximizer, consider \( ({{\widehat{g}}}_{ij})_{i,j = 1}^s= \widehat{\varvec{G}}= \varvec{Q}^T {\widehat{\varvec{W}}}\). The last inequality in (C63) must then hold with equality for \(g_{ij} = {{\widehat{g}}}_{ij}\). That is,

$$\begin{aligned} \begin{aligned} \sum _{j = 1}^s\left( \sum _{i = 1}^s{{\widehat{g}}}_{ji}^4\right) |{\kappa _{4}(x_j)}| = \sum _{j = 1}^s|{\kappa _{4}(x_j)}|. \end{aligned} \end{aligned}$$
(C68)

Since we assume the components of \(x\) have nonzero free kurtosis (see (25)) and \(\sum _{i = 1}^s\widehat{g}_{ji}^4 \le 1\) for \(j = 1,\ldots , s\), (C68) is equivalent to

$$\begin{aligned} \sum _{i = 1}^s{{\widehat{g}}}_{ji}^4 = 1, \qquad \text {for }j = 1,\ldots , s. \end{aligned}$$
(C69)

By the observation below (C49), for each \(j\) there is an index \(i\) such that \({{\widehat{g}}}_{ji} \in \{\pm 1\}\), while \({{\widehat{g}}}_{jk} = 0\) for \(k \ne i\). That is, each row (and hence, since \(\widehat{\varvec{G}}\) is orthogonal, each column) of \(\widehat{\varvec{G}}\) has exactly one nonzero entry. By Proposition 17, \(\widehat{\varvec{G}} \in \varvec{O}_{ps}\) and thus \(\widehat{\varvec{G}} = \varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). Recalling that \(\widehat{\varvec{W}} = \varvec{Q}\widehat{\varvec{G}}\), we arrive at \({\widehat{\varvec{W}}} = \varvec{Q}\varvec{P}\varvec{S}\). This completes the proof.

1.4 Proof of Theorem 4

The proof of Theorem 4 relies on the orthogonal invariance and subadditivity of free entropy, for which readers are referred to Propositions 14 and 15 (and their rectangular analogues in Sect. A.3.3).

As in the proof of Theorem 3, we will show the following:

  1. (a)

    \(\varvec{Q}\) is a maximizer of (27).

  2. (b)

    For any permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\), \(\varvec{Q}\varvec{S}\varvec{P}\) is a maximizer of (27).

  3. (c)

    Any maximizer \(\varvec{W}\) of (27) satisfies \(\varvec{W}= \varvec{Q}\varvec{S}\varvec{P}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\).

1.4.1 Proof of (a)

Set \(\varvec{Z} = \varvec{Q}^T\varvec{W}\). As \(\varvec{x}\) and \(\varvec{y}\) are related via (12), \(\varvec{W}^T \varvec{y} = (\varvec{Q} \varvec{Z})^T \varvec{Q} \varvec{x}= \varvec{Z}^T \varvec{x}\). Then, by (A22),

$$\begin{aligned} \sum _{i = 1}^s\chi \left( (\varvec{W}^T \varvec{y})_i\right) = \sum _{i = 1}^s\chi \left( (\varvec{Z}^T \varvec{x})_i\right) \ge \chi \left( (\varvec{Z}^T x)_1,\ldots ,(\varvec{Z}^T \varvec{x})_s\right) . \end{aligned}$$
(C70)

On the other hand, note that \(\varvec{Z}\) is an orthogonal matrix; then, by (A19),

$$\begin{aligned} \chi \left( (\varvec{Z}^T \varvec{x})_1,\ldots ,(\varvec{Z}^T \varvec{x})_s\right) = \chi \left( x_1,\ldots ,x_s\right) \end{aligned}$$
(C71)

Combining (C70) and (C71) together, we obtain that, for any \(\varvec{W}\in \varvec{O}(s)\),

$$\begin{aligned} \sum _{i = 1}^s\chi \left( (\varvec{W}^T \varvec{y})_i\right) \ge \chi \left( x_1,\ldots ,x_s\right) \end{aligned}$$
(C72)

Now consider \(\varvec{W}= \varvec{Q}\). As \(\varvec{Q}^T \varvec{y} = \varvec{Q}^T \varvec{Q}\varvec{x}= \varvec{x}\), we have that

$$\begin{aligned} \sum _{i = 1}^s\chi \left( (\varvec{Q}^T \varvec{y})_i\right) = \sum _{i = 1}^s\chi \left( x_i\right) . \end{aligned}$$
(C73)

On the other hand, as the \(x_i\) are freely independent, by Proposition 15,

$$\begin{aligned} \sum _{i = 1}^s\chi (x_i) = \chi (x_1,\ldots ,x_s). \end{aligned}$$
(C74)

Then, (C73) becomes

$$\begin{aligned} \sum _{i = 1}^s\chi \left( (\varvec{Q}^T \varvec{y})_i\right) = \chi \left( x_1,\ldots ,x_s\right) . \end{aligned}$$
(C75)

Equations (C75) and (C72) together imply

$$\begin{aligned} \min _{\varvec{W}\in \varvec{O}(s)} \sum _{i = 1}^s\chi \left( (\varvec{W}^T \varvec{y})_i\right) = \chi \left( x_1,\ldots ,x_s\right) \end{aligned}$$
(C76)

and \(\varvec{Q}\) is a maximizer of (27).

1.4.2 Proof of (b)

We adopt the notation introduced in the proof of Theorem 3 (b). For any permutation matrix \(\varvec{P}\) associated with permutation \(\sigma \) and signature matrix \(\varvec{S}= {{\,\mathrm{\mathrm{diag}}\,}}(S_1,\ldots ,S_s)\), we have that (see (C65))

$$\begin{aligned} (\varvec{Q}\varvec{P}\varvec{S})^T\varvec{y} = (S_1x_{\sigma (1)}, \ldots ,S_sx_{\sigma (s)})^T. \end{aligned}$$
(C77)

Thus,

$$\begin{aligned} \sum _{i = 1}^s\chi \left( ((\varvec{Q}\varvec{P}\varvec{S})^T y)_i\right) = \sum _{i = 1}^s\chi (S_ix_{\sigma (i)}). \end{aligned}$$
(C78)

As each \(S_i \in \{\pm 1\}\) can be regarded as a 1-by-1 orthogonal matrix, the one-dimensional version of (A20) yields

$$\begin{aligned} \chi (S_i x_{\sigma (i)}) = \chi (x_{\sigma (i)}), \qquad \text {for }i = 1, \ldots , s. \end{aligned}$$
(C79)

Then, (C78) becomes

$$\begin{aligned} \sum _{i = 1}^s\chi \left( ((\varvec{Q}\varvec{P}\varvec{S})^T y)_i\right) = \sum _{i = 1}^s\chi (x_{i}). \end{aligned}$$
(C80)

In light of (C76), \(\varvec{Q}\varvec{P}\varvec{S}\) is a maximizer of (27).

1.4.3 Proof of (c)

By (b), any matrix \({\widehat{\varvec{W}}}\) of the form \(\widehat{\varvec{W}} = \varvec{Q}\varvec{P}\varvec{S}\) is a maximizer. For the other direction, it is enough to show that any maximizer \(\widehat{\varvec{W}}\) of (27) can be written in the form \(\widehat{\varvec{W}} = \varvec{Q}\varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). Indeed, if \(\widehat{\varvec{W}}\) maximizes (27), then by (C76),

$$\begin{aligned} \sum _{i = 1}^s\chi \left( ({\widehat{\varvec{W}}} ^T y)_i\right) = \chi \left( x_1,\ldots ,x_s\right) \end{aligned}$$
(C81)

Since \({\widehat{\varvec{W}}}^T\varvec{Q}\) is an orthogonal matrix, by (A19) and (12),

$$\begin{aligned} \begin{aligned} \chi \left( x_1,\ldots ,x_s\right)&= \chi \left( ({\widehat{\varvec{W}}}^T \varvec{Q}x)_1,\ldots ,({\widehat{\varvec{W}}}^T \varvec{Q}x)_s\right) \\&= \chi \left( ({\widehat{\varvec{W}}}^T y)_1,\ldots ,({\widehat{\varvec{W}}}^T y)_s\right) \end{aligned} \end{aligned}$$
(C82)

Then, (C81) becomes

$$\begin{aligned} \sum _{i = 1}^s\chi \left( ({\widehat{\varvec{W}}}^T y)_i\right) = \chi \left( ({\widehat{\varvec{W}}}^T y)_1,\ldots ,({\widehat{\varvec{W}}}^T y)_s\right) \end{aligned}$$
(C83)

By Proposition 15, the above equation indicates that \({\widehat{\varvec{W}}}^Ty\) has freely independent components. As we assume that \(x\) has at most one semicircular element, Theorem 5 implies that \({\widehat{\varvec{W}}} = \varvec{Q}\varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). This completes the proof.

1.5 Proof of Theorem 5

Definition 19

We denote the set of all \(s\times s\) matrices which are the product of a permutation matrix and a signature matrix by

$$\begin{aligned} \varvec{O}_{ps}= \varvec{O}_{ps}(s) := \{\varvec{P}\varvec{S}~\vert ~ \varvec{P}\text { is a permutation matrix, }\varvec{S}\text { is a signature matrix}\}.\nonumber \\ \end{aligned}$$
(C84)

Let \(\varvec{O}:= \varvec{O}(s)\) denote the set of orthogonal matrices of size \(s\times s\). Note that any permutation matrix \(\varvec{P}\) and any signature matrix \(\varvec{S}\) belong to \(\varvec{O}\). Furthermore, it can be checked that \(\varvec{O}_{ps}\) is a subgroup of \(\varvec{O}\).

We first prove two propositions about \(\varvec{O}_{{ps}}\). An orthogonal matrix must contain at least one nonzero entry in each column (and each row). On the other hand, a matrix belonging to \(\varvec{O}_{ps}\) has exactly one nonzero entry in each column (and each row). The following proposition states that this property characterizes the matrices contained in \(\varvec{O}_{ps}\).

Proposition 17

Fix a positive integer \(s\ge 1\). A matrix \(\varvec{Q}\in \varvec{O}(s)\) has exactly one nonzero entry in each column if and only if \(\varvec{Q}\in \varvec{O}_{ps}(s)\).

Proof

If \(\varvec{Q}\in \varvec{O}_{{ps}}\), then \(\varvec{Q}= \varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). Thus, it follows that \(\varvec{Q}\) has exactly one nonzero entry in each column.

For the other direction, consider an arbitrary \(\varvec{Q}\in \varvec{O}(s)\) with exactly one nonzero entry in each column. Note that \(\varvec{Q}\) then has exactly \(s\) nonzero entries in total. As \(\varvec{Q}\) is non-singular, it also has exactly one nonzero entry in each row. As a result, there exists a permutation matrix \(\varvec{P}\) such that \(\varvec{P}^T \varvec{Q}\) is a diagonal matrix.

On the other hand, note that \((\varvec{P}^T\varvec{Q})^T(\varvec{P}^T\varvec{Q}) = \varvec{Q}^T\varvec{Q}= I\), \(\varvec{P}^T\varvec{Q}\) is a diagonal orthogonal matrix. Thus, the diagonal entries of \( \varvec{P}^T\varvec{Q}\) are either \(+1\) or \(-1\). Then, there exists a signature matrix \(\varvec{S}\) such that \( \varvec{P}^T \varvec{Q}= \varvec{S}\). That is equivalent to \(\varvec{Q}= \varvec{P}\varvec{S}\in \varvec{O}_{{ps}}\). This completes the proof. \(\square \)
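The proof above is constructive. The following short sketch is our own illustration (not one of the paper's algorithms): it recovers the factorization \(\varvec{Q} = \varvec{P}\varvec{S}\) numerically for a matrix with exactly one nonzero entry per column; the function name `factor_ps` is ours.

```python
import numpy as np

def factor_ps(Q, tol=1e-12):
    """Factor an orthogonal Q with one nonzero entry per column as Q = P @ S."""
    P = (np.abs(Q) > tol).astype(float)   # pattern of nonzeros is a permutation matrix
    S = P.T @ Q                           # P^T Q is diagonal with +/-1 entries
    assert np.allclose(S, np.diag(np.sign(np.diag(S)))), "Q is not in O_ps"
    return P, S

# Example: permute coordinates (1,2,3) -> (3,1,2) and flip the sign of the second one.
P0 = np.eye(3)[:, [2, 0, 1]]
S0 = np.diag([1.0, -1.0, 1.0])
Q = P0 @ S0
P, S = factor_ps(Q)
print(np.allclose(P @ S, Q))   # True
```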

By the above proposition, for any \(\varvec{Q}\in \varvec{O}\backslash \varvec{O}_{{ps}}\), there must be a column with more than one nonzero entry. For later use, we prove a stronger result.

Proposition 18

Given any \(s\ge 2\), consider a matrix \(\varvec{Q}= (q_{ij})_{i,j= 1}^s \in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\). Then, there is a \(2\times 2\) submatrix of \(\varvec{Q}\) with all four entries nonzero. Explicitly, there exist \(i,j,k,\ell \in \{1,\ldots ,s\}\) (\(i \ne j\), \(k \ne \ell \)) such that \(q_{ik}\), \(q_{i\ell }\), \(q_{jk}\), and \(q_{j\ell }\) are all nonzero.

Proof

We first make the following observation: two orthogonal vectors share either no positions or at least two positions of nonzero entries. Indeed, consider any \(u = (u_1,\ldots ,u_s)^T\) and \(v = (v_1,\ldots ,v_s)^T\) such that u and v are orthogonal. Assume that there is exactly one index k such that both \(u_k\) and \(v_k\) are nonzero. Then,

$$\begin{aligned} u^T v = \sum _{i = 1}^su_iv_i = u_kv_k \ne 0. \end{aligned}$$
(C85)

This contradicts the fact that \(u^T v = 0\).

Now, we are ready to prove the proposition. Denote the ith column of \(\varvec{Q}\) by \(\varvec{Q}_i\), for \(i = 1,\ldots ,s\). Note that the \(\{\varvec{Q}_i\}_{i = 1}^s\) form an orthonormal basis. As \(\varvec{Q}\in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\), there must be a column containing at least two nonzero entries. Without loss of generality, assume it is \(\varvec{Q}_1\). If all of \(\varvec{Q}_2,\ldots ,\varvec{Q}_{s}\) share no position of nonzero entries with \(\varvec{Q}_1\), then \(\{\varvec{Q}_i\}_{i= 2}^s\) span a linear space of dimension at most \(s - 2\). This contradicts the fact that \(\{\varvec{Q}_i\}_{i= 2}^s\) span a linear space of dimension \(s- 1\). Thus, there must exist a \(j \in \{2,\ldots ,s\}\) such that \(\varvec{Q}_1\) and \(\varvec{Q}_j\) share at least one position of nonzero entries. By the observation made in the last paragraph, \(\varvec{Q}_1\) and \(\varvec{Q}_j\) then share at least two positions of nonzero entries, which yields the desired \(2\times 2\) submatrix. This completes the proof. \(\square \)

Corollary 19

Fix a positive integer \(s \ge 2\) and a \(\varvec{Q}\in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\). There exist indices \(i,j,k,\ell \in \{1,\ldots ,s\}\) (\(i \ne j\) and \(k \ne \ell \)), such that for any \(m \ge 3\),

$$\begin{aligned} q_{ik}^{m - 1}q_{jk} \ne 0, \quad \text {and} \quad q_{i\ell }^{m - 1}q_{j\ell } \ne 0. \end{aligned}$$
(C86)

In particular, if \(s= 2\), then for any \(m \ge 3\),

$$\begin{aligned} q_{11}^{m - 1}q_{21} \ne 0, \quad \text {and} \quad q_{12}^{m - 1}q_{22} \ne 0. \end{aligned}$$
(C87)

Theorem 5 can be obtained as a corollary of the following lemma.

Lemma 20

Fix \(s\ge 2\), and let \(\varvec{x}= (x_1,x_2, \ldots , x_s)^T\) and \(\varvec{y} = (y_1,y_2, \ldots , y_s)^T\) be two random vectors such that \(\varvec{y} = \varvec{Q}\varvec{x}\), where \(\varvec{Q}\in \varvec{O}(s)\). Assume \((x_i)_{i = 1}^s\) are freely independent. If, in addition, \((y_i)_{i = 1}^s\) are freely independent, then at least one of the following holds:

  1. (a)

    \(\varvec{Q}\in \varvec{O}_{ps}(s)\).

  2. (b)

    At least two components of \(\varvec{x}\) are semicircular (or Poisson in the non-self-adjoint setting).

We first show that Theorem 5 follows from Lemma 20.

Proof of Theorem 5

As \(\varvec{x}\) and \(\varvec{y}\) satisfy (12), \(\varvec{x}= (\varvec{Q}^T \varvec{W}) \varvec{W}^T \varvec{y}\). Now, by assumption, \(\varvec{x}\) and \(\varvec{W}^T \varvec{y}\) have free components. Then, according to Lemma 20, there are two possibilities: (a) \(\varvec{Q}^T \varvec{W}\in \varvec{O}_{ps}\) or (b) \(\varvec{x}\) has at least two semicircular components. As (b) has been excluded, (a) happens. That is, there exist a permutation matrix \(\varvec{P}\) and a signature matrix \(\varvec{S}\) such that \(\varvec{Q}^T \varvec{W}= \varvec{P}\varvec{S}\), i.e., \(\varvec{W}= \varvec{Q}\varvec{P}\varvec{S}\).

Proof of Lemma 20

We first consider the self-adjoint setting. If \(\varvec{Q}\in \varvec{O}_{ps}(s)\), then the components of \(\varvec{y}\) are exactly the components of \(\varvec{x}\), up to reordering and possible sign changes, and the \(y_i\) are therefore freely independent. In the following, we assume that \(\varvec{Q}\in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\) and that \(\varvec{x}\) and \(\varvec{y}\) have free components; the goal is to show that \(\varvec{x}\) has at least two semicircular elements.

We start with the case where \(s = 2\). We want to show that \(x_1\) and \(x_2\) are both semicircular elements. Recalling Definition 12 of a semicircular element, it is enough to show that \(\kappa _m(x_i) = 0\) for all \(m\ge 3\) and \(i = 1,2\).

Fix \(m \ge 3\) and consider the mixed cumulants of \(y_1,y_2\) of the specific form \(\kappa _m(y_1,\ldots ,y_1, y_2, y_p)\) for \(p = 1,2\). As \(y_1,y_2\) are freely independent, these cumulants satisfy the condition of Theorem 12, by noting that \(i(1) = 1 \ne i(m - 1) = 2\). Thus, these mixed cumulants vanish, i.e.,

$$\begin{aligned} \kappa _m(y_1,\ldots ,y_1, y_2, y_p) = 0\quad \mathrm {for}~p = 1, 2. \end{aligned}$$
(C88)

On the other hand, as \((y_i)_{i = 1}^2\) are linear combinations of \((x_i)_{i = 1}^2\), using the multilinearity of \(\kappa _m(\cdot )\) (see (A10)), we can express \(\kappa _m(y_1,\ldots ,y_1, y_2, y_p)\) as a linear combination of the \(\kappa _m(x_i)\) (recall the notation (A13)). Adopt the notation \(\varvec{Q}= (q_{ij})_{i,j= 1}^2\), so that \(y_i = \sum _{j = 1}^2 q_{ij}x_j\). We first derive the expression for \(\kappa _m(y_1,\ldots ,y_1, y_2, y_1)\) (i.e., \(p = 1\)),

$$\begin{aligned} \kappa _m(y_1,\ldots ,y_1, y_2, y_1) = \kappa _m\left( \sum _{j = 1}^2 q_{1j}x_{j}, \ldots , \sum _{j = 1}^2 q_{1j}x_{j}, \sum _{j = 1}^2 q_{2j}x_{j}, \sum _{j = 1}^2 q_{1j}x_{j}\right) .\nonumber \\ \end{aligned}$$
(C89)

Applying (A10) to the right-hand side of (C89) to expand the first argument,

$$\begin{aligned} \begin{aligned} \kappa _m&(y_1,\ldots ,y_1, y_2, y_1) = \\&\sum _{j_1 = 1}^2 q_{1j_1} \kappa _m\left( x_{j_1}, \sum _{j = 1}^2 q_{1j}x_{j},\ldots , \sum _{j = 1}^2 q_{1j}x_{j}, \sum _{j = 1}^2 q_{2j}x_{j}, \sum _{j = 1}^2 q_{1j}x_{j}\right) . \end{aligned} \end{aligned}$$
(C90)

Applying (A10) again to the second argument, we obtain that

$$\begin{aligned} \begin{aligned} \kappa _m(&y_1,\ldots ,y_1, y_2, y_1) = \\&\sum _{j_1 = 1}^2\sum _{j_2 = 1}^2 q_{1j_1} q_{1j_2} \kappa _m\left( x_{j_1}, x_{j_2},\ldots , \sum _{j = 1}^2 q_{1j}x_{j}, \sum _{j = 1}^2 q_{2j}x_{j}, \sum _{j = 1}^2 q_{1j}x_{j}\right) . \end{aligned}\nonumber \\ \end{aligned}$$
(C91)

Repeatedly applying (A10) to the remaining \(m - 2\) arguments, we arrive at

$$\begin{aligned} \kappa _m(y_1,\ldots ,y_1, y_2, y_1) = \sum _{j_1 = 1}^2\ldots \sum _{j_m = 1}^2\left( \prod _{\ell = 1}^{m- 2}q_{1 j_{\ell }} \right) q_{2j_{m- 1}}q_{1j_m} \kappa _m(x_{j_1}, \ldots , x_{j_m}).\nonumber \\ \end{aligned}$$
(C92)

There are in total \(2^m\) terms in the above summation. Note that \(x_1\) and \(x_2\) are freely independent. Then, by Theorem 12, most of these cumulants vanish. For example, \(\kappa _{m}(x_1,x_2,\ldots ,x_2) = 0\) since \(j_1 = 1 \ne j_2 = 2\). Consequently, only the two terms corresponding to the choices of indices \(j_1 = j_2 = \cdots = j_m= 1\) and \(j_1 = j_2 = \cdots = j_m= 2\) survive. Thus, using the notation (A13), (C92) can be written as

$$\begin{aligned} \kappa _m(y_1,\ldots ,y_1, y_2, y_1) = q_{11}^{m-2}q_{21}q_{11}\kappa _m(x_1) + q_{12}^{m-2}q_{22}q_{12}\kappa _m(x_2). \end{aligned}$$
(C93)

Combining (C93) with (C88), we obtain that

$$\begin{aligned} q_{11}^{m-2}q_{21}q_{11}\kappa _m(x_1) + q_{12}^{m-2}q_{22}q_{12}\kappa _m(x_2) = 0. \end{aligned}$$
(C94)

Repeating (C88) to (C94) for \(\kappa _m(y_1,\ldots ,y_1, y_2, y_2)\) (i.e., \(p = 2\)), we find that

$$\begin{aligned} q_{11}^{m-2}q_{21}q_{21}\kappa _m(x_1) + q_{12}^{m-2}q_{22}q_{22}\kappa _m(x_2) = 0. \end{aligned}$$
(C95)

Writing (C94) and (C95) in matrix form, we obtain that

$$\begin{aligned} \begin{aligned}&\begin{pmatrix} q_{11}^{m-2}q_{21}q_{11} &{} q_{12}^{m-2}q_{22}q_{12}\\ q_{11}^{m-2}q_{21}q_{21}&{}q_{12}^{m-2}q_{22}q_{22} \end{pmatrix} \begin{pmatrix} \kappa _m(x_1) \\ \kappa _m(x_2) \end{pmatrix} \\&\quad = \begin{pmatrix}q_{11} &{} q_{12} \\ q_{21} &{} q_{22} \end{pmatrix} \begin{pmatrix} q_{11}^{m-2}q_{21} &{} 0 \\ 0 &{} q_{12}^{m-2}q_{22} \end{pmatrix} \begin{pmatrix} \kappa _m(x_1) \\ \kappa _m(x_2) \end{pmatrix} = \mathbf {0} \end{aligned} \end{aligned}$$
(C96)

This is a linear system for \(\kappa _m(x_1)\) and \(\kappa _m(x_2)\). Note that \(\varvec{Q}= (q_{ij})_{i,j = 1}^2\) is an orthogonal matrix and thus is invertible. Thus, (C96) is equivalent to

$$\begin{aligned} \begin{pmatrix} q_{11}^{m-2}q_{21} &{} 0 \\ 0 &{} q_{12}^{m-2}q_{22} \end{pmatrix} \begin{pmatrix} \kappa _m(x_1) \\ \kappa _m(x_2) \end{pmatrix} = \mathbf {0}. \end{aligned}$$
(C97)

Now, as \( \varvec{Q}\in \varvec{O}(2) \backslash \varvec{O}_{ps}(2)\), by (C87), the above linear system has the unique solution \(\kappa _m(x_i) = 0\), \(i = 1,2\). Note that this holds for all \(m \ge 3\). Then, by Definition 12, we conclude that \(x_i\) for \(i = 1, 2\) are semicircular elements. This concludes the proof for \(s = 2\).

For general \(s \ge 2\), as \(\varvec{Q}\in \varvec{O}\backslash \varvec{O}_{ps}\), by Corollary 19, there exist \(i,j,k,\ell \) (\(i\ne j\) and \(k \ne \ell \)) such that (C86) holds. We will show that \(x_k\) and \(x_\ell \) are semicircular elements. For fixed \(m \ge 3\), we consider the vanishing mixed cumulants

$$\begin{aligned} \kappa _m(y_i,\ldots ,y_i, y_j, y_p) = 0\quad \mathrm {for}~p = 1, \ldots , s. \end{aligned}$$
(C98)

Using the relation \(y_i = \sum _{j = 1}^sq_{ij}x_j\) and the multilinearity of \(\kappa _m\), we can repeat the steps from (C88) to (C94) for each \(\kappa _m(y_i,\ldots ,y_i, y_j, y_p)\) and get

$$\begin{aligned} q_{i1}^{m-2}q_{j1}q_{p1} \kappa _m(x_1) + \cdots + q_{is}^{m-2}q_{js}q_{ps} \kappa _m(x_s) = 0, \quad \text {for }p = 1,\ldots ,s.\nonumber \\ \end{aligned}$$
(C99)

Write the above equations in the matrix form:

$$\begin{aligned} \begin{pmatrix}q_{11} &{} \cdots &{} q_{1s} \\ \vdots &{} \ddots &{} \vdots \\ q_{s1} &{} \cdots &{} q_{ss} \end{pmatrix}\begin{pmatrix} q_{i1}^{m-2}q_{j1} &{} &{} \\ &{} \ddots &{} \\ &{} &{} q_{is}^{m-2}q_{js} \end{pmatrix} \begin{pmatrix} \kappa _m(x_1) \\ \vdots \\ \kappa _m(x_s) \end{pmatrix} = \mathbf {0}. \end{aligned}$$
(C100)

Again, \(\varvec{Q}= (q_{ij})_{i,j = 1}^s\) is invertible, and \(q_{ik}^{m-2}q_{jk} \ne 0\) (since \(q_{ik}^{m-1}q_{jk} \ne 0\) by (C86)); thus \(\kappa _m(x_k) = 0\). For the same reason, \(\kappa _m(x_\ell ) = 0\). As these hold for all \(m \ge 3\), \(x_k\) and \(x_\ell \) are semicircular elements.

For the non-self-adjoint setting, the proof is exactly the same as above, with Theorem 12 replaced by Theorem 16 and Definition 12 replaced by Definition 16. \(\square \)

Proof of Theorem 11

Lemma 21

Given \(\varvec{Y} = [\varvec{Y}_1,\ldots ,\varvec{Y}_s]^T \in {\mathbb {C}}^{Ns\times N}\), where the \(\varvec{Y}_i \in {\mathbb {C}}^{N \times N}\) are Hermitian matrices, and a vector \(\varvec{w} = [w_1,\ldots ,w_s] \in {\mathbb {R}}^s\), for

$$\begin{aligned} \varvec{X} = \widetilde{\varvec{w}}^T \varvec{Y}, \quad \text {with }\widetilde{\varvec{w}} = \varvec{w} \otimes \varvec{I}_N, \end{aligned}$$

we recall the empirical free kurtosis

$$\begin{aligned} \widehat{\kappa }_4(\varvec{X}) = \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^4) - 2\left[ \frac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^2)\right] ^2. \end{aligned}$$

Then, we have that

$$\begin{aligned} \frac{\partial \widehat{\kappa }_4(\varvec{X})}{\partial w_k} = \frac{4}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k\varvec{X}^3) - \frac{8}{N^2} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^2) {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k\varvec{X}). \end{aligned}$$
(D101)

Proof

As \({{\,\mathrm{\mathrm{Tr}}\,}}(\cdot )\) is a linear function of the entries of its input matrix,

$$\begin{aligned} \frac{\partial \widehat{\kappa }_4(\varvec{X})}{\partial w_k} = \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}\left( \frac{\partial \varvec{X}^4}{\partial w_k}\right) - \frac{4}{N^2} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^2) {{\,\mathrm{\mathrm{Tr}}\,}}\left( \frac{\partial \varvec{X}^2}{\partial w_k}\right) . \end{aligned}$$
(D102)

Note that

$$\begin{aligned} \varvec{X} = \widetilde{\varvec{w}}^T \varvec{Y} = w_1 \varvec{Y}_1 + \cdots + w_s\varvec{Y}_s, \end{aligned}$$

thus, for any \(k = 1, \ldots , s,\)

$$\begin{aligned} \frac{\partial \varvec{X}}{\partial w_k} = \varvec{Y}_k. \end{aligned}$$
(D103)

Therefore,

$$\begin{aligned} \frac{\partial \varvec{X}^4}{\partial w_k} =\varvec{Y}_k\varvec{X}^3 + \varvec{X} \varvec{Y}_k\varvec{X}^2 + \varvec{X}^2 \varvec{Y}_k \varvec{X} + \varvec{X}^3 \varvec{Y}_k. \end{aligned}$$
(D104)

Using \({{\,\mathrm{\mathrm{Tr}}\,}}(AB) = {{\,\mathrm{\mathrm{Tr}}\,}}(BA)\), we find that

$$\begin{aligned} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k \varvec{X}^3) = {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X} \varvec{Y}_k\varvec{X}^2) = {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^2 \varvec{Y}_k \varvec{X}) = {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^3 \varvec{Y}_k) \end{aligned}$$

and thus

$$\begin{aligned} {{\,\mathrm{\mathrm{Tr}}\,}}\left( \frac{\partial \varvec{X}^4}{\partial w_k}\right) = 4{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k \varvec{X}^3). \end{aligned}$$
(D105)

Repeating (D104) to (D105) for \({{\,\mathrm{\mathrm{Tr}}\,}}\left( \frac{\partial \varvec{X}^2}{\partial w_k}\right) \), we get that

$$\begin{aligned} {{\,\mathrm{\mathrm{Tr}}\,}}\left( \frac{\partial \varvec{X}^2}{\partial w_k}\right) = 2{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k \varvec{X}). \end{aligned}$$
(D106)

Plugging (D105) and (D106) into (D102), we obtain (D101).

\(\square \)
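As a sanity check on (D101), the following sketch (our own; the function names `free_kurtosis` and `free_kurtosis_grad` are ours) compares the analytic gradient of the empirical free kurtosis with a central finite difference.

```python
import numpy as np

def free_kurtosis(X):
    # empirical free kurtosis: Tr(X^4)/N - 2*(Tr(X^2)/N)^2
    N = X.shape[0]
    X2 = X @ X
    return np.trace(X2 @ X2) / N - 2.0 * (np.trace(X2) / N) ** 2

def free_kurtosis_grad(w, Ys):
    # analytic gradient from (D101): 4/N Tr(Y_k X^3) - 8/N^2 Tr(X^2) Tr(Y_k X)
    N = Ys[0].shape[0]
    X = sum(wk * Yk for wk, Yk in zip(w, Ys))
    X2, X3 = X @ X, X @ X @ X
    return np.array([4.0 / N * np.trace(Yk @ X3)
                     - 8.0 / N**2 * np.trace(X2) * np.trace(Yk @ X) for Yk in Ys])

rng = np.random.default_rng(0)
N, s = 40, 3
# symmetric "components", normalized so their spectra are O(1)
Ys = [(A + A.T) / (2 * np.sqrt(N)) for A in rng.standard_normal((s, N, N))]
w = rng.standard_normal(s)

eps = 1e-6
num_grad = np.array([
    (free_kurtosis(sum((w[j] + eps * (j == k)) * Ys[j] for j in range(s)))
     - free_kurtosis(sum((w[j] - eps * (j == k)) * Ys[j] for j in range(s)))) / (2 * eps)
    for k in range(s)])

print(np.allclose(num_grad, free_kurtosis_grad(w, Ys), atol=1e-6))   # expect True
```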

Lemma 22

Given \(\varvec{Y} = [\varvec{Y}_1,\ldots ,\varvec{Y}_s]^T \in {\mathbb {C}}^{Ns\times N}\), where the \(\varvec{Y}_i \in {\mathbb {C}}^{N \times N}\) are Hermitian matrices, and a vector \(w = [w_1,\ldots ,w_s] \in {\mathbb {R}}^s\), for

$$\begin{aligned} \varvec{X} = \widetilde{\varvec{w}}^T \varvec{Y}, \quad \text {with }\widetilde{\varvec{w}} = w \otimes I_N, \end{aligned}$$

with eigenvalues \(\lambda _i\) and corresponding eigenvectors \(v_i\), we recall the empirical free entropy

$$\begin{aligned} \widehat{\chi }(\varvec{X}) = \frac{1}{N(N - 1)}\sum _{i \ne j} \log |{\lambda _i - \lambda _j}|. \end{aligned}$$

Then, we have that

$$\begin{aligned} \frac{\partial \widehat{\chi }(\varvec{X})}{\partial w_k} = \frac{1}{N(N - 1)} \sum _{i \ne j} \frac{\partial _{w_k} \lambda _i - \partial _{w_k} \lambda _j}{\lambda _i - \lambda _j} \end{aligned}$$
(D107)

with \(\partial _{w_k}\lambda _i = v_i^T \varvec{Y}_k v_i\).

Proof

Equation (D107) is obtained by direct differentiation. The fact that \(\partial _{w_k}\lambda _i = v_i^T \varvec{Y}_k v_i\) follows from (D103) and the perturbation theory of eigenvalues [47]. \(\square \)
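Similarly, a brief sketch (ours, not from the paper; it assumes simple, well-separated eigenvalues, which holds generically for the random matrices drawn here) checks the first-order perturbation formula \(\partial_{w_k}\lambda_i = v_i^T \varvec{Y}_k v_i\) used in Lemma 22 against a finite difference, and evaluates the empirical free entropy \(\widehat{\chi}(\varvec{X})\).

```python
import numpy as np

def eigvals_of_combination(w, Ys):
    # eigenvalues of X = sum_k w_k Y_k, sorted ascending by np.linalg.eigh
    X = sum(wk * Yk for wk, Yk in zip(w, Ys))
    return np.linalg.eigh(X)[0]

rng = np.random.default_rng(1)
N, s, k = 30, 3, 1
Ys = [(A + A.T) / (2 * np.sqrt(N)) for A in rng.standard_normal((s, N, N))]
w = rng.standard_normal(s)

X = sum(wk * Yk for wk, Yk in zip(w, Ys))
lam, V = np.linalg.eigh(X)                       # columns of V are the eigenvectors v_i

# analytic derivatives of the eigenvalues with respect to w_k
d_lam = np.array([V[:, i] @ Ys[k] @ V[:, i] for i in range(N)])

# central finite difference in the k-th coordinate
eps = 1e-6
e_k = np.eye(s)[k]
num = (eigvals_of_combination(w + eps * e_k, Ys)
       - eigvals_of_combination(w - eps * e_k, Ys)) / (2 * eps)
print(np.allclose(num, d_lam, atol=1e-6))        # expect True

# empirical free entropy of X, as defined in Lemma 22
off_diag = np.abs(lam[:, None] - lam[None, :])[~np.eye(N, dtype=bool)]
chi_hat = np.log(off_diag).sum() / (N * (N - 1))
print(chi_hat)
```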

Proof of Theorem 11

We first prove the result for the self-adjoint FCF based on free kurtosis. Set \(\varvec{X} = [\varvec{X_1},\ldots ,\varvec{X_s}] = \widetilde{\varvec{W}}^T \varvec{Y}\). Recalling Definition 3 and (45), for \(\widehat{F}(\cdot ) = -\left|\widehat{\kappa }_4(\cdot ) \right|\), we have that

$$\begin{aligned} \sum {{\widehat{F}}}_{\cdot } \left( \widetilde{\varvec{W}}^T \varvec{Y}\right) = - \sum _{i = 1}^s\left|\widehat{\kappa }_4\left( \varvec{X}_i\right) \right|\end{aligned}$$

As only \(\varvec{X}_\ell \) explicitly depends on \(\varvec{W}_{k\ell }\),

$$\begin{aligned} \partial _{\varvec{W}_{k\ell }} \sum {{\widehat{F}}}_{\cdot } \left( \widetilde{\varvec{W}}^T \varvec{Y}\right) = -\partial _{\varvec{W}_{k\ell }} \left|\widehat{\kappa }_4\left( \varvec{X}_\ell \right) \right|\end{aligned}$$
(D108)

Further notice that \(\varvec{X}_\ell = \widetilde{\varvec{w}}_\ell ^T \varvec{Y}\) with \(\varvec{w}_\ell = [\varvec{W}_{1\ell }, \ldots , \varvec{W}_{s\ell }]^T\), thus

$$\begin{aligned} \partial _{\varvec{W}_{k\ell }} \left|\widehat{\kappa }_4\left( \varvec{X}_\ell \right) \right|= & {} \mathrm {sign}(\widehat{\kappa }_4\left( \varvec{X}_\ell \right) ) \times \partial _{\varvec{W}_{k\ell }} \widehat{\kappa }_4\left( \varvec{X}_\ell \right) \nonumber \\= & {} \mathrm {sign}(\widehat{\kappa }_4\left( \varvec{X}_\ell \right) ) \left( \frac{4}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k\varvec{X}_\ell ^3) - \frac{8}{N^2} {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}_\ell ^2) {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{Y}_k\varvec{X}_\ell )\right) ,\nonumber \\ \end{aligned}$$
(D109)

where we used Lemma 21 for the last equality. The proof is then completed by plugging (D109) into (D108). The result for the self-adjoint FCF based on free entropy can be proved in a similar manner by repeating the process from (D108) to (D109), with \(-|{\hat{\kappa }_4(\cdot )}|\) replaced by \(\widehat{\chi }(\cdot )\) and Lemma 21 replaced by Lemma 22.

We omit the proofs for the rectangular FCF case since these are straightforward modifications of the proofs of Lemmas 21 and 22 and of the self-adjoint FCF case. \(\square \)

Matrix Embeddings

One restriction of ICA is that it only operates on vector-valued components (see Sect. F). In contrast, FCF applies to data whose matrix-valued components can be of arbitrary dimensions. Thus, one can embed components into new dimensions to potentially obtain better performance with FCA. In this section, we list several matrix embedding algorithms.

For \(\varvec{Z} = [\varvec{Z}_1, \ldots , \varvec{Z}_N]^T\), where the \(\varvec{Z}_i\) are rectangular matrices, Algorithm 3 embeds each \(\varvec{Z}_i\) in the upper diagonal part of an \(N'\times N'\) self-adjoint matrix; a sketch of one such embedding appears below. In practice, the target dimension \(N'\) should be picked such that there is no loss of information while also avoiding too many artificial zeros. To embed the \(\varvec{Z}_i\) into rectangular matrices of other dimensions, we introduce Algorithm 5. Putting the above embeddings and the appropriate FCF algorithms together, we get Algorithm 4 and Algorithm 6. One can easily state the analogs of the above algorithms for data containing self-adjoint matrices; for the sake of brevity, we omit them here.
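The following is a minimal sketch of one such embedding (ours; Algorithm 3's exact padding conventions may differ): the rectangular matrix is placed in an upper off-diagonal block of a larger matrix and mirrored so that the result is self-adjoint.

```python
import numpy as np

def selfadjoint_embed(Z, Nprime):
    """Embed a rectangular Z into an Nprime x Nprime symmetric matrix.

    Z is placed in an upper off-diagonal block and mirrored below the
    diagonal. This is one plausible reading of 'embedding in the upper
    diagonal part'; the actual Algorithm 3 may use a different convention.
    """
    n, m = Z.shape
    assert n + m <= Nprime, "target dimension too small"
    M = np.zeros((Nprime, Nprime))
    M[:n, n:n + m] = Z          # upper block
    M[n:n + m, :n] = Z.T        # mirror to make M symmetric
    return M

Z = np.arange(6.0).reshape(2, 3)
M = selfadjoint_embed(Z, 6)
print(np.allclose(M, M.T))      # True: the embedding is self-adjoint
```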

If the \(\varvec{Z}_i\) are vectors, one can use the short-time Fourier transform (STFT) to embed them into matrices. The STFT matrix of a vector is formed by aligning, column by column, the discrete Fourier transforms of the signal over a sliding window. The outcome is a complex rectangular matrix to which we can apply rectangular FCFs; a minimal sketch follows. This is summarized in Algorithm 7.
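A bare-bones STFT sketch (ours; Algorithm 7 presumably also specifies a window function and frame overlap, which we omit here), assuming a rectangular window and non-overlapping frames:

```python
import numpy as np

def stft_matrix(x, frame_len):
    """Column-align the DFTs of non-overlapping frames of x (rectangular window)."""
    n_frames = len(x) // frame_len
    frames = np.reshape(x[:n_frames * frame_len], (n_frames, frame_len))
    return np.fft.fft(frames, axis=1).T    # shape: (frame_len, n_frames), complex

rng = np.random.default_rng(0)
waveform = rng.standard_normal(8000)       # a stand-in for an audio signal
S = stft_matrix(waveform, frame_len=256)
print(S.shape, S.dtype)                    # (256, 31) complex128
```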

(Algorithms 3–7 are presented as figures at this point in the published article.)

Independent Component Factorization

We would like to numerically compare FCA with ICA, and begin by providing a summary of the ICA algorithm. Given data whose components are rectangular matrices, we first vectorize them and then apply ICA. We once again perform a whitening process (see Algorithm 8) and solve an optimization problem.

Here, we present Algorithm 9 whose optimization problem is based on the empirical (scalar) kurtosis \(\widehat{c}_4(\cdot )\) or the empirical (scalar) negentropy \(\widehat{{\mathcal {E}}}(\cdot )\). We call them kurtosis-based ICF and entropy-based ICF, respectively. Given a centered and whitened vector \(x \in {\mathbb {R}}^T\), its empirical kurtosis \({\widehat{c}}_4(x)\) can be expressed as

$$\begin{aligned} {\widehat{c}}_4(x) = \frac{1}{T}\sum _{i = 1}^{T} x_i^4 - 3 \left( \frac{1}{T} \sum _{i = 1}^T x_i^2\right) ^2. \end{aligned}$$
(F110)

The negentropy \({{\mathcal {E}}}(x)\) is defined as

$$\begin{aligned} {\mathcal {E}}(x) = h(g_x) - h(x), \end{aligned}$$
(F111)

where \(h(x)\) denotes the entropy of the random variable \(x\) (see (A6)) and \(g_x\) denotes a Gaussian random variable with the same mean and variance as \(x\). Negentropy is used as a measure of the distance to normality. The empirical negentropy \(\widehat{\mathcal {E}}(x)\) involves the empirical distribution of \(x\), which is computationally difficult to work with. Fortunately, the negentropy can also be expressed as an infinite sum of cumulants. Thus, in practice, \(\widehat{\mathcal {E}}(x)\) can be approximated by a finite truncation of that sum [18, Theorem 14 and (3.2) pp. 295].

In the simulations in this paper, we adopt the following approximation (see Section 5 of [40]):

$$\begin{aligned} \widehat{{\mathcal {E}}}(x) = \frac{1}{12}\left( \frac{1}{T}\sum _{i = 1}^T x_i^3\right) ^2 + \frac{1}{48} {\widehat{c}}_4(x)^2. \end{aligned}$$
(F112)
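For concreteness, here is a small sketch (ours; the function names are ours) of the empirical kurtosis (F110) and the truncated negentropy approximation (F112) for a centered, unit-variance sample. Both criteria are near zero for Gaussian data and clearly nonzero for a heavy-tailed sample.

```python
import numpy as np

def empirical_kurtosis(x):
    # (F110): sample fourth moment minus 3*(sample second moment)^2
    return np.mean(x**4) - 3.0 * np.mean(x**2) ** 2

def negentropy_approx(x):
    # (F112): cumulant-based truncation of the negentropy
    return np.mean(x**3) ** 2 / 12.0 + empirical_kurtosis(x) ** 2 / 48.0

rng = np.random.default_rng(0)
gauss = rng.standard_normal(100000)
laplace = rng.laplace(size=100000) / np.sqrt(2.0)   # unit-variance, heavy-tailed

print(empirical_kurtosis(gauss), empirical_kurtosis(laplace))   # ~0 vs ~3
print(negentropy_approx(gauss), negentropy_approx(laplace))     # ~0 vs clearly > 0
```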
(Algorithms 8 and 9 are presented as figures at this point in the published article.)
