Abstract
We describe a method for unmixing mixtures of freely independent random variables in a manner analogous to the independent component analysis (ICA)-based method for unmixing independent random variables from their additive mixtures. Random matrices play the role of free random variables in this context, so the method we develop, which we call free component analysis (FCA), unmixes matrices from additive mixtures of matrices. Thus, while the mixing model is standard, the novelty and difference in unmixing performance comes from the introduction of new statistical criteria, derived from free probability theory, that quantify freeness analogous to how kurtosis and entropy quantify independence. We describe the theory, the various algorithms, and compare FCA to vanilla ICA, which does not account for spatial or temporal structure. We highlight why the statistical criteria make FCA also vanilla despite its matricial underpinnings and show that FCA performs comparably to, and sometimes better than, (vanilla) ICA in every application, such as image and speech unmixing, where ICA has been known to succeed. Our computational experiments suggest that not-so-random matrices, such as images and short-time Fourier transform matrices of waveforms, are (closer to being) freer “in the wild” than we might have theoretically expected.
Notes
Here \({{\widehat{F}}}(\cdot )\) is either the (self-adjoint or rectangular) free kurtosis, the free entropy or a higher (than fourth)-order (even-valued) free cumulant. See Table 2.
References
Almeida, L.B.: MISEP–Linear and nonlinear ICA based on mutual information. Journal of Machine Learning Research 4(Dec), 1297–1318 (2003)
Anderson, G.W., Farrell, B.: Asymptotically liberating sequences of random unitary matrices. Advances in Mathematics 255, 381–413 (2014)
Arora, S., Ge, R., Moitra, A., Sachdeva, S.: Provable ICA with unknown Gaussian noise, with implications for Gaussian mixtures and autoencoders. In: Advances in Neural Information Processing Systems, pp. 2375–2383 (2012)
Barry, D., Coyle, E., Fitzgerald, D., Lawlor, R.: Single channel source separation using short-time independent component analysis. In: Audio Engineering Society Convention 119 (2005). Audio Engineering Society
Bell, A.J., Sejnowski, T.J.: The “independent components” of natural scenes are edge filters. Vision research 37(23), 3327–3338 (1997)
Benaych-Georges, F.: Rectangular random matrices, entropy, and Fisher’s information. Journal of Operator Theory, 371–419 (2009a)
Benaych-Georges, F.: Rectangular random matrices, related convolution. Probability Theory and Related Fields 144(3-4), 471–515 (2009b)
Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal processing 81(11), 2353–2362 (2001)
Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research 15, 1455–1459 (2014)
Brakel, P., Bengio, Y.: Learning independent features with adversarial nets for non-linear ICA. arXiv preprint arXiv:1710.05050 (2017)
Cardoso, J.-F.: High-order contrasts for independent component analysis. Neural computation 11(1), 157–192 (1999)
Casey, M.A., Westner, A.: Separation of mixed audio sources by independent subspace analysis. In: ICMC, pp. 154–161 (2000)
Cébron, G., Dahlqvist, A., Male, C.: Universal constructions for spaces of traffics. arXiv preprint arXiv:1601.00168 (2016)
Chen, A., Bickel, P.J.: Efficient independent component analysis. The Annals of Statistics 34(6), 2825–2855 (2006)
Chissom, B.S.: Interpretation of the kurtosis statistic. The American Statistician 24(4), 19–22 (1970)
Chistyakov, G., Götze, F.: Characterization problems for linear forms with free summands. arXiv preprint arXiv:1110.1527 (2011)
Comon, P.: Independent component analysis, a new concept? Signal processing 36(3), 287–314 (1994)
Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic press, Cambridge, MA (2010)
Cornish, E.A., Fisher, R.A.: Moments and cumulants in the specification of distributions. Revue de l’Institut international de Statistique, 307–320 (1938)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley & Sons, Hoboken, NJ (2012)
Cruces, S., Castedo, L., Cichocki, A.: Robust blind source separation algorithms using cumulants. Neurocomputing 49(1-4), 87–118 (2002)
Davies, M.E., James, C.J.: Source separation using single channel ICA. Signal Processing 87(8), 1819–1832 (2007)
De Lathauwer, L., Castaing, J., Cardoso, J.-F.: Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing 55(6), 2965–2973 (2007)
Edelman, A., Rao, N.R.: Random matrix theory. Acta Numerica 14, 233–297 (2005)
Eriksson, J., Koivunen, V.: Blind identifiability of class of nonlinear instantaneous ICA models. In: 2002 11th European Signal Processing Conference, pp. 1–4 (2002). IEEE
Eriksson, J., Koivunen, V.: Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters 11(7), 601–604 (2004)
Frieze, A., Jerrum, M., Kannan, R.: Learning linear transformations. In: Proceedings of 37th Conference on Foundations of Computer Science, pp. 359–368 (1996). IEEE
Gao, P., Chang, E.-C., Wyse, L.: Blind separation of fetal ECG from single mixture using SVD and ICA. In: Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint, vol. 3, pp. 1418–1422 (2003). IEEE
Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset (2007)
Haykin, S., Chen, Z.: The cocktail party problem. Neural computation 17(9), 1875–1902 (2005)
Hiai, F., Petz, D.: The Semicircle Law, Free Random Variables and Entropy. Mathematical Surveys and Monographs, vol. 77, p. 376. American Mathematical Society, Providence, RI (2000)
Hoyer, P.O., Hyvärinen, A.: Independent component analysis applied to feature extraction from colour and stereo images. Network: computation in neural systems 11(3), 191–210 (2000)
Hyvarinen, A.J., Morioka, H.: Nonlinear ICA of temporally dependent stationary sources. (2017). Proceedings of Machine Learning Research
Hyvarinen, A.: A family of fixed-point algorithms for independent component analysis. In: 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 3917–3920 (1997a). IEEE
Hyvarinen, A.: One-unit contrast functions for independent component analysis: A statistical analysis. In: Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, pp. 388–397 (1997b). IEEE
Hyvarinen, A.: Fast and robust fixed-point algorithms for Independent Component Analysis. IEEE transactions on Neural Networks 10(3), 626–634 (1999)
Hyvarinen, A., Morioka, H.: Unsupervised feature extraction by time-contrastive learning and nonlinear ICA. In: Advances in Neural Information Processing Systems, pp. 3765–3773 (2016)
Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural networks 13(4-5), 411–430 (2000)
Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis vol. 46. John Wiley & Sons, Hoboken, NJ (2004)
Hyvarinen, A., Sasaki, H., Turner, R.E.: Nonlinear ICA using auxiliary variables and generalized contrastive learning. arXiv preprint arXiv:1805.08651 (2018)
Ilmonen, P., Nordhausen, K., Oja, H., Ollila, E.: A new performance index for ICA: properties, computation and asymptotic analysis. In: International Conference on Latent Variable Analysis and Signal Separation, pp. 229–236 (2010). Springer
Lee, T.-W.: Independent Component Analysis. In: Independent Component Analysis, pp. 27–66. Springer, Boston (1998)
Lee, T.-W., Girolami, M., Bell, A.J., Sejnowski, T.J.: A unifying information-theoretic framework for independent component analysis. Computers & Mathematics with Applications 39(11), 1–21 (2000)
Lehner, F.: Cumulants in noncommutative probability theory I. Noncommutative exchangeability systems. Mathematische Zeitschrift 248(1), 67–100 (2004)
Male, C.: Traffic distributions and independence: permutation invariant random matrices and the three notions of independence. arXiv preprint arXiv:1111.4662 (2011)
Meyer, C.D., Stewart, G.W.: Derivatives and perturbations of eigenvectors. SIAM Journal on Numerical Analysis 25(3), 679–691 (1988)
Mika, D., Budzik, G., Jozwik, J.: Single channel source separation with ICA-based time-frequency decomposition. Sensors 20(7), 2019 (2020)
Mingo, J.A., Speicher, R.: Free Probability and Random Matrices vol. 35. Springer, New York (2017)
Mitsui, Y., Kitamura, D., Takamichi, S., Ono, N., Saruwatari, H.: Blind Source Separation based on independent low-rank matrix analysis with sparse regularization for time-series activity. In: Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference On, pp. 21–25 (2017). IEEE
Mogensen, P.K., Riseth, A.N.: Optim: A mathematical optimization package for Julia. Journal of Open Source Software 3(24), 615 (2018). https://doi.org/10.21105/joss.00615
Nadakuditi, R.R., Wu, H.: lingluanwh/FCA.jl: a blind source separation package based on the random matrix theory and free probability (2019). https://doi.org/10.5281/zenodo.2655944
Nica, A., Speicher, R.: Lectures on the Combinatorics of Free Probability vol. 13. Cambridge University Press, Cambridge (2006)
Oja, E., Yuan, Z.: The FastICA algorithm revisited: convergence analysis. IEEE Transactions on Neural Networks 17(6), 1370–1381 (2006)
Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2(11), 559–572 (1901)
Pourazad, M., Moussavi, Z., Farahmand, F., Ward, R.: Heart sounds separation from lung sounds using independent component analysis. In: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pp. 2736–2739 (2006). IEEE
Smith, P.J.: A recursive formulation of the old problem of obtaining moments from cumulants and vice versa. The American Statistician 49(2), 217–218 (1995)
Speicher, R.: Multiplicative functions on the lattice of non-crossing partitions and free convolution. Mathematische Annalen 298(1), 611–628 (1994)
Voiculescu, D.: Limit laws for random matrices and free products. Inventiones mathematicae 104(1), 201–220 (1991)
Voiculescu, D.: The analogues of entropy and of Fisher’s information measure in free probability theory, I. Communications in Mathematical Physics 155(1), 71–92 (1993)
Voiculescu, D.: The analogues of entropy and of Fisher’s information measure in free probability theory, II. Inventiones Mathematicae 118(1), 411–440 (1994)
Voiculescu, D.: Operations on certain non-commutative operator-valued random variables, in recent advances in operator algebras. Astérisque 232, 243–275 (1995)
Voiculescu, D.: The analogues of entropy and of Fisher’s information measure in free probability theory, IV: maximum entropy and freeness, in free probability theory. Fields Inst. Commun. 12, 293–302 (1997)
Acknowledgements
We thank Peter Bickel for inspiring us to revisit FCA via a serendipitous meeting at the Santa Fe Institute in December 2015. That meeting, and his remarks on ICA and all the ways in which it is natural, provided the spark for us spending the rest of that workshop and the following month thinking about all the ways that FCA was natural for random matrices and images. We implemented our first FCA algorithm soon thereafter and leaned into the theory after getting, and being overjoyed by, the image separation results in Fig. 3d. We thank Arvind Prasadan for his detailed comments and suggestions on earlier versions of this manuscript. We are grateful to Alfred Hero for his suggestion to try the denoising simulation in Fig. 3a which brought into sharp focus for us for the first time that FCA could do (much) better than ICA. (This was a simulation we had been avoiding till then because we feared the opposite!) This work has benefited from Roland Speicher’s many insightful comments and suggestions and from Octavio Arizmendi Echegaray’s remarks that made us better understand the underlying free probabilistic structures that made some of the FCA identifiability-related questions fundamentally different than their ICA counterparts. This research was supported by ONR grant N00014-15-1-2141, DARPA Young Faculty Award D14AP00086 and ARO MURI W911NF-11-1-039. A Julia implementation of the FCA algorithm as well as code to reproduce the simulations and figures in this paper is available at Github [52].
Additional information
Communicated by Rachel Ward.
Appendices
What is Freeness of Random Variables?
The goal of this section is to introduce the freeness of non-commutative random variables. We first discuss independence (freeness) in the context of scalar probability, free probability for self-adjoint (non-commutative) random variables, and free probability for rectangular (non-commutative) random variables, respectively. We focus on the behavior of (free) cumulants and (free) entropy of independent (free) random variables, which are the basis of ICA (FCA). The connection between independent random matrices and free random variables is given at the end.
For a detailed introduction to free probability, readers are referred to [32, 49, 53].
1.1 Prologue: What is Independence of Commuting Random Variables?
Here, we briefly review statistical independence in scalar probability. We state the behavior of cumulants and entropy of independent random variables, which are the basis of ICA. At the end, we discuss the unique role that Gaussian random variables play in ICA.
1.1.1 Mixed Moments Point of View
Let I denote an index set, and \((x_i)_{i\in I}\) denote random variables. They are independent if for any \(n \in {\mathbb {N}}\) and \(m_1, \ldots , m_n \ge 0\),
if \(i(j) \in I\), \(j = 1,\ldots n\), are all distinct. An alternative definition is that for any polynomials \(P_1,\ldots P_n\) of one variable,
if \({{\,\mathrm{{\mathbb {E}}}\,}}[P_j(x_{i(j)})] = 0\) for all \(j = 1, \ldots , n\) and \(i(j) \in I\), \(j = 1,\ldots n\) are all distinct.
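The factorization above is easy to probe numerically. The following sketch (ours, not from the paper; it assumes only NumPy, and the distributions and tolerance are illustrative choices) draws two independent non-Gaussian samples and checks that the expectation of a product of centered polynomials is approximately zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Two independent (non-Gaussian) samples.
x = rng.exponential(1.0, n)
y = rng.uniform(-1.0, 1.0, n)

# Centered polynomials P1(x) = x^2 - E[x^2], P2(y) = y^3 - E[y^3].
p1 = x**2 - np.mean(x**2)
p2 = y**3 - np.mean(y**3)

# Independence => E[P1(x) P2(y)] = 0 when each factor is centered.
mixed = np.mean(p1 * p2)
print(abs(mixed) < 0.01)
```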
1.1.2 Cumulants: Kurtosis and Higher Order—Independent Additivity
The (joint) cumulants of k random variables \(a_1, \ldots , a_k\) are defined by
where \(\pi \) runs through all partitions of \(\{1, \ldots , m\}\) and B runs through all blocks of partition \(\pi \). Equivalently, \(\{c_m\}_{m \ge 1}\) is defined through
The reason that ICA adopts an optimization problem involving cumulants is the following property: if \((x_i)_{i \in I}\) are independent, then for any \(n\in {\mathbb {N}}\)
whenever there exists \(1\le \ell , k\le n\) with \(i(\ell ) \ne i(k)\). That is, any cumulant involving two (or more) independent random variables is zero. Adopt the notation
A quick consequence of (A4) is that for independent \(x_1\) and \(x_2\),
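Cumulant additivity under independence can be checked empirically. Below is a minimal sketch (ours, not from the paper) assuming SciPy, whose `kstat` computes unbiased cumulant estimates up to fourth order; the distributions and tolerance are illustrative choices:

```python
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.exponential(1.0, n)   # cumulants of Exp(1): c_m = (m-1)!, so c_4 = 6
x2 = rng.uniform(0.0, 1.0, n)  # c_4 of Uniform(0,1) is -1/120

# kstat(., 4) is an unbiased estimator of the fourth cumulant.
k4_sum = kstat(x1 + x2, 4)
k4_parts = kstat(x1, 4) + kstat(x2, 4)

# Additivity (A5): c_4(x1 + x2) = c_4(x1) + c_4(x2) for independent inputs.
print(abs(k4_sum - k4_parts) < 0.5)
```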
1.1.3 Entropy: Independent Additivity
For random variables \(x_1,\ldots , x_n\) with joint distribution \(f(x_1,\ldots ,x_n)\), the (joint) entropy is defined by [21]
The joint entropy of a set of variables is less than or equal to the sum of the individual entropies of the variables in the set,
In particular, the equality in (A7) holds if and only if \(x_1,\ldots , x_n\) are independent. Therefore, entropy is regarded as a measure of independence and thus can be used in ICA.
We also want to recall another useful property of entropy. For random vectors x, y satisfying the linear relation \(y = \varvec{A} x\), we have that
In particular, the entropy is invariant under orthogonal linear transformations.
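For Gaussian vectors, the entropy has the closed form \(h = \frac{1}{2}\log \big ((2\pi e)^n \det \Sigma \big )\), which makes the transformation rule easy to verify. A numerical sketch (ours, not the paper's; assumes NumPy, with an arbitrary covariance and mixing matrix):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of N(0, cov): 0.5 * log((2*pi*e)^n * det(cov))."""
    n = cov.shape[0]
    return 0.5 * np.log(((2 * np.pi * np.e) ** n) * np.linalg.det(cov))

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
A = np.array([[1.0, 2.0], [0.5, 3.0]])   # invertible mixing matrix

h_x = gaussian_entropy(Sigma)
h_y = gaussian_entropy(A @ Sigma @ A.T)  # covariance of y = A x

# h(Ax) = h(x) + log|det A|
print(np.isclose(h_y, h_x + np.log(abs(np.linalg.det(A)))))  # True
```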
1.1.4 Why Gaussians cannot be Unmixed: Gaussians have Zero Higher-Order Cumulants
In ICA, the optimization problem finds the independent directions by maximizing the kurtosis (fourth cumulant). However, all cumulants of order greater than 2 vanish for Gaussian random variables, so ICA fails to unmix them. ICA based on entropy also fails to unmix Gaussian random variables, as nontrivial mixtures of independent Gaussian random variables can still be independent Gaussians. On the other hand, it has been shown that this is the only case where ICA does not work [18]. A result of this kind is called an identifiability condition.
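A quick illustration of this failure mode (ours, assuming NumPy and SciPy; the rotation angle and tolerances are arbitrary): rotating two i.i.d. Gaussians by any angle yields components that are again uncorrelated, unit-variance Gaussians with vanishing excess kurtosis, so a kurtosis-based contrast cannot pick out the original axes.

```python
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.standard_normal((2, n))          # independent Gaussian sources

theta = np.pi / 4                        # a nontrivial orthogonal mixing
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y = Q @ x

# Every rotation is a valid "unmixing": the mixed components stay
# uncorrelated Gaussians, and kurtosis gives no signal to optimize.
print(abs(np.mean(y[0] * y[1])) < 0.01)  # uncorrelated
print(abs(kstat(y[0], 4)) < 0.05)        # excess kurtosis ~ 0
```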
1.2 Freeness of Self-Adjoint Random Variables
We first introduce the definition of a probability space for non-commutative random variables. The starting point is a unital algebra of non-commutative variables.
Definition 5
Let \({\mathcal {X}}\) be a vector space over \({\mathbb {C}}\) equipped with a product \(\cdot : {\mathcal {X}}\times {\mathcal {X}}\mapsto {\mathcal {X}}\). Denoting the vector space addition by \(+\), we call \({\mathcal {X}}\) an algebra if for all \(a, b, c\in {\mathcal {X}}\) and \(\alpha \in {\mathbb {C}}\),
-
(a)
\(a(bc) = (ab)c\),
-
(b)
\(a(b + c) = ab + ac\),
-
(c)
\(\alpha (ab) = (\alpha a)b = a(\alpha b)\).
We call \({\mathcal {X}}\) a unital algebra if there is a unital element \(1_{\mathcal {X}}\) such that, for all \(a \in {\mathcal {X}}\)
An algebra \({\mathcal {X}}\) is called a \(*\)-algebra if it is also endowed with an antilinear \(*\)-operation \({\mathcal {X}}\ni a \mapsto a^* \in {\mathcal {X}}\), such that \((\alpha a)^* = {\bar{\alpha }} a^*\), \((a^*)^* = a\) and \((ab)^* = b^*a^*\) for all \(\alpha \in {\mathbb {C}}\), \(a, b \in {\mathcal {X}}\).
Note that \(ab = ba\) does not necessarily hold for general \(a,b \in {\mathcal {X}}\), i.e., they are non-commutative.
Definition 6
A (non-commutative) \(*\)-probability space \(({\mathcal {X}}, \varphi )\) consists of a unital \(*\)-algebra and a linear functional \(\varphi : {\mathcal {X}}\rightarrow {\mathbb {C}}\), which serves as the “expectation.” We also require that \(\varphi \) satisfies
-
(a)
(positive) \(\varphi (aa^*) \ge 0\) for all \(a \in {\mathcal {X}}\).
-
(b)
(tracial) \(\varphi (ab) = \varphi (ba)\) for all \(a, b \in {\mathcal {X}}\).
-
(c)
\(\varphi (1_{\mathcal {X}}) = 1\).
The elements \(a \in {\mathcal {X}}\) are called non-commutative random variables. (We may omit the word non-commutative if there is no ambiguity.) Given a series of random variables \(x_1, \ldots , x_k \in {\mathcal {X}}\), for any choice of \(n \in {\mathbb {N}}\), \(i(1),\ldots ,i(n) \in [1..k]\) and \(\epsilon _1, \ldots , \epsilon _n \in \{1, *\}\), \(\varphi (x_{i(1)}^{\epsilon _1}\ldots x^{\epsilon _n}_{i(n)})\) is a mixed moment of \(\{x_i\}_{i = 1}^k\). The collection of all moments is called the joint distribution of \(x_1,\ldots , x_k\).
The moments of general random variables can be complex-valued; self-adjoint random variables, which are defined below, necessarily have real-valued moments and will be the object of our study.
Definition 7
Let \(({\mathcal {X}}, \varphi )\) be a non-commutative probability space, an element \(a \in {\mathcal {X}}\) is self-adjoint if \(a = a^*\). In particular, the moments of self-adjoint elements are real (see Remark 1.2 in [53]).
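Concretely, the algebra of \(N \times N\) complex matrices with \(\varphi = \frac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}\) is the prototypical \(*\)-probability space. The following NumPy check of the tracial, positivity, and realness properties is our illustration, not from the paper (the matrix size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50

def phi(m):
    """Normalized trace: the 'expectation' on the *-algebra of N x N matrices."""
    return np.trace(m) / N

# A random self-adjoint (Hermitian) element a = a*.
g = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
a = (g + g.conj().T) / 2

b = rng.standard_normal((N, N))          # a second, generic element

print(np.isclose(phi(a @ b), phi(b @ a)))    # tracial: phi(ab) = phi(ba)
print(abs(phi(a @ a).imag) < 1e-10)          # moments of self-adjoint a are real
print(phi(a @ a.conj().T).real >= 0)         # positivity: phi(a a*) >= 0
```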
The counterpart of independence in free probability is free independence: random variables are said to be freely independent, or simply free. We now consider the freeness of self-adjoint random variables from various perspectives, as in Sect. A.1.
1.2.1 Mixed Moments Point of View
The following official definition of freeness should be compared with (A1).
Definition 8
Let \(({\mathcal {X}}, \varphi )\) be a non-commutative probability space.
For each \(i \in I\), let \({\mathcal {X}}_i \subset {\mathcal {X}}\) be a unital subalgebra. The subalgebras \(({\mathcal {X}}_i)_{i \in I}\) are called freely independent (or simply free), if for all \(k \ge 1\)
whenever \(\varphi (x_j) = 0\) for all \(j = 1, \ldots , k,\) and neighboring elements are from different subalgebras, i.e., \(x_j \in {\mathcal {X}}_{i(j)}\), \(i(1) \ne i(2), i(2) \ne i(3),\ldots , i(k-1)\ne i(k)\).
In particular, a series of elements \((x_i)_{i \in I}\) are called free if the subalgebras generated by \(x_i\) and \(x_i^*\) are free.
1.2.2 Free Cumulants: Free Additivity
The analogs of cumulants for non-commutative random variables are called free cumulants, which were proposed by Roland Speicher [53, 58].
The notion of non-crossing partitions underlies free probability and free cumulants.
Definition 9
(Non-crossing Partition, Definition 9.1 of [53]) Consider the set \(S = [1..n]\).
-
(a)
We call \(\pi = \{V_1, \ldots , V_r\}\) a partition of the set S if and only if the \(V_i\) (\(1\le i \le r\)) are pairwise disjoint, non-void subsets of S such that \(\cup _{i =1}^r V_{i} = S\). We call \(V_1, \ldots , V_r\) the blocks of \(\pi \). Given two elements \(a, b \in S\), we write \(a \sim _\pi b\) if a and b belong to the same block of \(\pi \).
-
(b)
A partition \(\pi \) of the set S is called non-crossing if there do not exist \(a_1< b_1< a_2 < b_2 \) in S such that \(a_1 \sim _\pi a_2 \not \sim _\pi b_1 \sim _{\pi } b_2\).
-
(c)
The set of all non-crossing partitions of S is denoted by NC(n).
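As a sanity check on Definition 9, the number of non-crossing partitions |NC(n)| is the nth Catalan number \(\frac{1}{n+1}\left( {\begin{array}{c}2n\\ n\end{array}}\right) \) (see [53]). A brute-force enumeration in Python (our illustration, not from the paper):

```python
from itertools import combinations
from math import comb

def partitions(s):
    """All set partitions of the list s (s[0] anchors the first block)."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for k in range(len(rest) + 1):
        for block_rest in combinations(rest, k):
            remaining = [x for x in rest if x not in block_rest]
            for tail in partitions(remaining):
                yield [[first, *block_rest]] + tail

def is_noncrossing(p):
    """No a1 < b1 < a2 < b2 with a1,a2 in one block and b1,b2 in another."""
    for V in p:
        for W in p:
            if V is W:
                continue
            for a1 in V:
                for a2 in V:
                    for b1 in W:
                        for b2 in W:
                            if a1 < b1 < a2 < b2:
                                return False
    return True

for n in range(1, 7):
    nc = sum(is_noncrossing(p) for p in partitions(list(range(1, n + 1))))
    assert nc == comb(2 * n, n) // (n + 1)   # |NC(n)| is the Catalan number
print("NC(n) counts match Catalan numbers for n = 1..6")
```

For example, of the 15 partitions of \(\{1,2,3,4\}\), only \(\{\{1,3\},\{2,4\}\}\) is crossing, leaving \(|NC(4)| = 14\).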
Definition 10
Given a \(*\)-probability space \(({\mathcal {X}}, \varphi )\), the free cumulants refer to a family of multilinear functionals \(\{\kappa _m: {\mathcal {X}}^m \mapsto {\mathbb {C}}\}_{m \ge 1}\). Here, multilinearity means that \(\kappa _m\) is linear in each variable when the others are held constant, i.e., for any \(\alpha , \beta \in {\mathbb {C}}\) and \(a, b \in {\mathcal {X}}\),
Explicitly, for \(a_1, \ldots , a_n \in {\mathcal {X}}\), their mixed free cumulant is defined through (cf. (A3))
Equivalently (cf. (A2)),
where \(\mu \) is the Möbius function on NC(n).
Example 1
We have that
Recall that in scalar probability, mixed cumulants of independent random variables vanish (see (A4)). The same holds for free cumulants in free probability.
Theorem 12
(Theorem 11.16 of [53]) Let \(({\mathcal {X}},\varphi )\) be a non-commutative probability space with associated free cumulants \((\kappa _\ell )_{\ell \in {\mathbb {N}}}\). Consider random variables \((x_i)_{i\in I}\) and assume that they are freely independent. Then, for all \(n \ge 2\) and \(i(1), \ldots , i(n) \in I\), we have \(\kappa _n(x_{i(1)},\ldots ,x_{i(n)}) = 0\) whenever there exist \(1\le l,k \le n\) with \(i(l)\ne i(k)\).
With the above theorem, one can easily show the free additivity of free cumulants.
Proposition 13
Consider a non-commutative probability space \(({\mathcal {X}}, \varphi )\). For a self-adjoint random variable \(a \in {\mathcal {X}}\), set
-
(a)
For any \(m \ge 1\) and \(\alpha \in {\mathbb {C}}\), we have that
$$\begin{aligned} \kappa _m(\alpha a) = \alpha ^m \kappa _m(a). \end{aligned}$$(A14)This immediately follows from the multilinearity of free cumulants (see (A10)).
-
(b)
(Free additivity, Proposition 12.3 in [53]) For any \(m \ge 1\), if \(a, b \in {\mathcal {X}}\) are freely independent, then
$$\begin{aligned} \kappa _m(a + b) = \kappa _m(a) + \kappa _m(b). \end{aligned}$$(A15)The above equation should be compared with (A5).
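Free additivity can be observed numerically because independently rotated large matrices are asymptotically free (see Sect. A.4). The sketch below (ours, assuming NumPy; matrix sizes, spectra, and tolerances are illustrative) uses the free kurtosis \(\kappa _4(a) = \varphi (a^4) - 2\varphi (a^2)^2\), valid for \(\varphi (a) = 0\), together with a Haar-random orthogonal rotation:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1000

def free_kurtosis(m):
    """kappa_4 of a trace-zero self-adjoint matrix: tr(m^4)/N - 2*(tr(m^2)/N)^2."""
    t2 = np.trace(m @ m) / N
    t4 = np.trace(np.linalg.matrix_power(m, 4)) / N
    return t4 - 2 * t2**2

# Two deterministic trace-zero spectra, made (asymptotically) free by rotation.
a = np.diag(np.where(np.arange(N) < N // 2, 1.0, -1.0))  # symmetric +-1 spectrum
b = np.diag(np.linspace(-1.0, 1.0, N))                   # uniform spectrum

q, r = np.linalg.qr(rng.standard_normal((N, N)))
q = q @ np.diag(np.sign(np.diag(r)))                     # Haar-distributed orthogonal
b_rot = q @ b @ q.T

# Free additivity (A15): kappa_4(a + b_rot) ~ kappa_4(a) + kappa_4(b)
lhs = free_kurtosis(a + b_rot)
rhs = free_kurtosis(a) + free_kurtosis(b)
print(abs(lhs - rhs) < 0.05)
```

Here \(\kappa _4(a) = -1\) exactly, and the rotated sum reproduces the additive value up to \(O(1/N)\) fluctuations.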
1.2.3 Free Entropy: Free Additivity
For non-commutative random variables, the free entropy was introduced by Voiculescu [60, 61, 63]. Here, we provide a brief introduction; readers are referred to Section 6 of [32] for further details.
We first examine the Boltzmann–Gibbs formula of classical entropy. The idea is that the entropy of a “macrostate” is proportional to the logarithm of its probability, which is determined by the count of associated “microstates.” Mathematically, the association is defined through an appropriate distance, and the probability of a “macrostate” is given by the volume of all close “microstates.” This motivates the following formulation of scalar entropy.
Let a be a random variable supported on a finite interval \([-R, R]\). Its entropy is then a limit of log volumes:
where \(\lambda _N\) is the N-dimensional Lebesgue measure, \(m_k\) denotes the kth moment and \(\delta _N(x)\) is the atomic measure \((\delta (x_1) + \delta (x_2) + \cdots + \delta (x_N)) / N\) serving as the “microstates.” Here, the volume is the Lebesgue measure of the set of \(x \in {\mathbb {R}}^N\) whose corresponding atomic measure approximates a up to the rth moment. One then takes a normalized limit, improving the approximation, to get the entropy.
In free probability, the moments are estimated using the functional \(\varphi (\cdot )\). Due to the non-commutative nature of matrices, and the fact that free independence arises asymptotically among large matrices (see Sect. A.4), one can adopt self-adjoint matrices as “microstates.” We then arrive at the following definition of free entropy.
Definition 11
Let \(M_N({\mathbb {C}})^{sa}\) denote the set of all \(N \times N\) self-adjoint matrices and \({{\,\mathrm{\mathrm{tr}}\,}}(\cdot ):= \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}(\cdot )\) the normalized trace. Given a \(*\)-probability space \(({\mathcal {X}}, \varphi )\) and a self-adjoint element \(a \in {\mathcal {X}}\), for \(n,r \in {\mathbb {N}}\), \(\epsilon > 0\) and \(R > 0\), we define the set
Recall that there is a natural linear bijection between \(M_N({\mathbb {C}})^{sa}\) and \({\mathbb {R}}^{N^2}\); let \(\Lambda _N\) denote the measure on \(M_N({\mathbb {C}})^{sa}\) induced by the Lebesgue measure on \({\mathbb {R}}^{N^2}\). The free entropy of a is then defined by:
One can extend the above definition to the multivariate case. For self-adjoint elements \(a_1, \ldots , a_s\in {\mathcal {X}}\), define the set
the joint free entropy is then given by
The free entropy shares similar properties with the scalar entropy.
Proposition 14
Let \(\varvec{x}= (x_1,\ldots ,x_s)^T\) where \(x_i\) are self-adjoint non-commutative random variables. Let \(\varvec{O}(s)\) denote the set of \(s\times s\) orthogonal matrices. Then, for any \(\varvec{Q}= (q_{ij})_{i,j=1}^s\in \varvec{O}(s)\),
That is, the free entropy is invariant under orthogonal transformations (cf. (A8)).
Proof
This proposition is a special case of a general result. For any matrix \(\varvec{A} \in {\mathbb {R}}^{n \times n}\), we actually have that (see Corollary 6.3.2 in [32]),
Now, for \(\varvec{Q}\in \varvec{O}(s)\), \(\varvec{Q}^T\varvec{Q}= \varvec{I}\), thus
That is, \(|{\det \mathbf {Q}}| = 1\) and thus \(\log |{\det \mathbf {Q}}| = 0\). Setting \(\varvec{A} = \varvec{Q}\) in (A20), we obtain (A19). \(\square \)
The following proposition is the analogue of (A7) for free entropy.
Proposition 15
Let \(x_1,\ldots ,x_s\) be self-adjoint non-commutative random variables, then
Further assume that \(\chi (x_i) > -\infty \) for \(i = 1,\ldots ,s\); then the above equality holds if and only if \(x_1,\ldots ,x_s\) are freely independent.
Proof
The proof of the inequality can be found in Proposition 6.1.1 in [32]. The equivalence between equality and free independence is Theorem 6.4.1 in [32]. \(\square \)
1.2.4 Analogue of Gaussian Random Variables in Free Probability: The Free Semicircular Element
The analog of a Gaussian random variable in a \(*\)-probability space is a semicircular element. Recall that a Gaussian random variable is characterized by the vanishing of its cumulants of order higher than 2; semicircular elements can be defined in a similar manner.
Definition 12
Given a \(*\)-probability space \(({\mathcal {X}}, \varphi )\), we call a random variable \(a \in {\mathcal {X}}\) a semicircular element if
and \(\kappa _2(a) > 0\) (such that a is not constant).
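A Wigner (GOE-type) matrix is the canonical matrix model of a semicircular element: its second free cumulant is 1 and its free kurtosis vanishes as \(N \rightarrow \infty \). A NumPy sketch (ours, not from the paper; the size and tolerances are illustrative, and the normalization is chosen so that \(\varphi (w^2) \approx 1\)):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 2000

# Wigner matrix: i.i.d. Gaussian entries, symmetrized, entry variance ~ 1/N.
g = rng.standard_normal((N, N))
w = (g + g.T) / np.sqrt(2 * N)

tr = lambda m: np.trace(m) / N           # normalized trace as phi
t2 = tr(w @ w)
t4 = tr(np.linalg.matrix_power(w, 4))

# Semicircular element: kappa_2 = 1 and higher free cumulants vanish.
print(abs(t2 - 1.0) < 0.05)              # kappa_2 = phi(w^2) ~ 1
print(abs(t4 - 2 * t2**2) < 0.05)        # free kurtosis kappa_4 ~ 0
```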
1.3 Freeness of Non-self-adjoint Random Variables
We briefly introduce the mathematical preliminaries for rectangular probability spaces. We omit some technicalities that are beyond the scope of this paper. For a thorough introduction, readers are referred to [6, 7].
Consider a \(*\)-probability space \(({\mathcal {X}}, \varphi )\) with two nonzero self-adjoint projections \(p_1, p_2\) that are pairwise orthogonal (i.e., \(p_i p_j = 0\) for \(i \ne j\)) and satisfy \(p_1 + p_2 = 1_{\mathcal {X}}\). Then, any element \(a \in {\mathcal {X}}\) can be represented in the following block form
where \(a_{ij} = p_i a p_j\) for \(i,j = 1,2\), and we define \({\mathcal {X}}_{ij} := p_i {\mathcal {X}}p_j\). Note that \({\mathcal {X}}_{ii}\) is a subalgebra, and we equip it with the functional \(\varphi _k = \frac{1}{\rho _k} \varphi \vert _{{\mathcal {X}}_{kk}}\), where \(\rho _k := \varphi (p_k)\). That is,
and similarly for \(\varphi _2(x)\). The functionals \(\varphi _i\), \(i = 1, 2\), are tracial in the sense that \(\varphi _k(p_k) = 1\) and, for all i, j and \(x \in {\mathcal {X}}_{ij}\), \(y \in {\mathcal {X}}_{ji}\),
Definition 13
Such a family \(({\mathcal {X}}, p_1, p_2, \varphi _1, \varphi _2)\) is called a \((\rho _1, \rho _2)\)-rectangular probability space. We call \(a \in {\mathcal {X}}_{12} = p_1 {\mathcal {X}}p_2\) a rectangular random variable.
Remark 7
If a is a rectangular random variable, then in the block decomposition (A24), only \(a_{12}\) is nonzero. Later, in Sect. A.4.2, we will model rectangular matrices by embedding them into the \(a_{12}\) block of rectangular random variables.
For such a rectangular probability space, the linear span of \(p_1, p_2\) is denoted by \({\mathcal {D}}\); it is a subalgebra of finite dimension. Define \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(a) = \sum _{i = 1}^2 \varphi _i(a_{ii})p_i\). It can be checked that \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(1_{\mathcal {X}}) = 1_{\mathcal {X}}\) and that \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(dad') = d{{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(a)d'\) for all \((d, a, d') \in {\mathcal {D}}\times {\mathcal {X}}\times {\mathcal {D}}\). The map \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(\cdot )\) is regarded as the conditional expectation from \({\mathcal {X}}\) onto \({\mathcal {D}}\).
We now consider freeness in a rectangular probability space.
1.3.1 Mixed Moments Point of View
The following definition of freeness should be compared with (A1) and Definition 8.
Definition 14
Consider a rectangular probability space with subalgebra \({\mathcal {D}}\) and the corresponding conditional expectation \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}\). A family \(({\mathcal {X}}_i)_{i\in I}\) of subalgebras containing \({\mathcal {D}}\) is said to be free with amalgamation over \({\mathcal {D}}\) (or simply free when there is no ambiguity) if for all \(k \ge 1\)
whenever \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(x_j) = 0\) for all \(j = 1, \ldots , k\), and neighboring elements are from different subalgebras, i.e., \(x_j \in {\mathcal {X}}_{i(j)}\), \(i(1) \ne i(2), i(2) \ne i(3),\ldots , i(k-1)\ne i(k)\). In particular, a family of rectangular random variables \(\{x_i\}_{i\in I}\) are called free if the subalgebras generated by \({\mathcal {D}}\), \(x_i\), and \(x_i^*\) are free.
1.3.2 Rectangular Free Cumulants: Free Additivity
The free cumulants are also defined for rectangular probability space [6, 7].
Definition 15
(Analogue of cumulants in a rectangular probability space) Given a \((\rho _1,\rho _2)\)-rectangular probability space \(({\mathcal {X}}, p_1, p_2, \varphi _1, \varphi _2)\), for any \(m \ge 1\), we denote the mth tensor product of \({\mathcal {X}}\) over \({\mathcal {D}}\) by \({\mathcal {X}}^{\otimes _{{\mathcal {D}}^m}}\). We recall a family of linear functions \(\{\kappa _{m}: {\mathcal {X}}^{\otimes _{{\mathcal {D}}^m}} \mapsto {\mathbb {C}}\}_{m \ge 1}\) introduced in [7] (denoted \(c^{(1)}\) there; see Section 3.1 of [7]). By linearity, we mean that for \(m \ge 1\), any \(a, b \in {\mathcal {X}}\) and \(\alpha , \beta \in {\mathbb {C}}\),
For convenience, we call \(\{\kappa _{m}\}_{m \ge 1}\) rectangular free kurtosis (or kurtosis when there is no ambiguity). For each \(m \ge 1\) and any rectangular random variable a, we put
We consider only even orders, as odd-order cumulants vanish for all rectangular elements.
Remark 8
In [6, 7], the free cumulants refer to a family of linear functions from \({\mathcal {X}}^{\otimes _{{\mathcal {D}}^n}}\) to \({\mathcal {D}}\). The rectangular cumulants used throughout this paper are their coefficient functions of \(p_1\).
The following vanishing lemma holds for the rectangular cumulants defined above.
Theorem 16
(Vanishing of mixed cumulants, Theorem 2.1 of [6]) A family \((x_i)_{i \in I}\) of elements in \({\mathcal {X}}\) is free with amalgamation over \({\mathcal {D}}\) if and only if for all \(n \ge 2\), and \(i(1), \ldots , i(n) \in I\), we have \(\kappa _n(x_{i(1)}\otimes \cdots \otimes x_{i(n)}) = 0\) whenever there exists \(1\le l, k\le n\) with \(i(l) \ne i(k)\).
Consequently, the analogue of Proposition 13 also holds in the rectangular case with the rectangular free kurtosis defined in (A29). The analogue of (A15) for the rectangular free kurtosis follows from equation (10) in [7]. The analogue of (A14) is a direct result of (A28).
1.3.3 Rectangular Free Entropy: Free Additivity
The free entropy \(\chi \) for a rectangular free probability space is introduced in [6]. The idea is similar to the self-adjoint case: one adopts rectangular matrices as “microstates” and uses the conditional expectation \({{\,\mathrm{{\mathbb {E}}}\,}}_{\mathcal {D}}(\cdot )\) to evaluate moments. Readers are referred to Section 5.1 of [6] for a precise definition.
The analogues of Propositions 14 and 15 also hold for the rectangular free entropy. The orthogonal invariance of the rectangular free entropy is a direct result of Corollary 5.11 of [6]. On the other hand, Proposition 5.3, Theorem 5.7 and Corollary 5.16 of [6] together prove the analogue of Proposition 15 in the rectangular case.
1.3.4 Analogue of Gaussian Random Variables in Rectangular Free Probability: The Free Poisson Element
Definition 16
Given a rectangular probability space \(({\mathcal {X}},\varphi )\), a rectangular random variable \(a \in {\mathcal {X}}_{12}\) is a free Poisson element if
1.4 When are Random Matrices (Asymptotically) Free?
Here, we describe free probability in the context of random matrices and give explicit formulas for the free kurtosis and entropy as functions of the input matrices.
1.4.1 Symmetric Random Matrix
Given \(N > 0\), we consider the algebra consisting of all real \(N \times N\) matrices over the scalar random variables \(L^{2}(\Sigma , P)\):
and for any \(\varvec{X} \in {\mathcal {X}}\), the functional \(\varphi \) on it is
Denote the matrix transpose with complex conjugate by \(*\). Then, \(({\mathcal {X}}, \varphi )\) is a \(*\)-probability space.
We recall the notion of convergence in distribution and the definition of asymptotic free independence [53].
Definition 17
(Asymptotic free independence) Let \(({\mathcal {X}}_N , \varphi _N)~(N \in \mathbb {N})\) and \(({\mathcal {X}}, \varphi )\) be non-commutative probability spaces. Let I be an index set and consider for each \(i \in I\) random variables \(a_i(N) \in {\mathcal {X}}_N\) and \(a_i \in {\mathcal {X}}\). We say that \((a_i(N))_{i \in I}\) converges in distribution toward \((a_i)_{i \in I}\) if each joint moment of \((a_i(N))_{i \in I}\) converges toward the corresponding joint moment of \((a_i)_{i \in I}\), i.e., for all \(n \in {\mathbb {N}}\) and all \(i(1), \ldots , i(n) \in I\)
Furthermore, we say \((a_i(N))_{i \in I}\) are asymptotically free if they converge in distribution to a limit \((a_i)_{i\in I}\) which is free in \(({\mathcal {X}},\varphi )\).
A pair of symmetric (Hermitian) random matrices with isotropically random eigenvectors that are independent of the eigenvalues (and each other) are asymptotically free [53].
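This asymptotic freeness is easy to probe numerically: for two independently rotated, centered symmetric matrices, the alternating mixed moment \(\varphi (ABAB)\) nearly vanishes, as freeness predicts, while \(\varphi (A^2B^2)\) factorizes. A minimal sketch (matrix size, eigenvalue law, and tolerances are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400

def centered_rotated(N, rng):
    # Symmetric matrix with isotropically random eigenvectors (QR of a Gaussian
    # matrix) and eigenvalues centered so that phi(A) = (1/N) Tr A = 0.
    Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
    lam = rng.standard_normal(N)
    lam -= lam.mean()
    return Q @ np.diag(lam) @ Q.T

def phi(M):
    return np.trace(M) / N   # the normalized trace plays the role of expectation

A = centered_rotated(N, rng)
B = centered_rotated(N, rng)

# For free centered a, b: phi(abab) = 0 and phi(a^2 b^2) = phi(a^2) phi(b^2).
print(abs(phi(A @ B @ A @ B)))
print(abs(phi(A @ A @ B @ B) - phi(A @ A) * phi(B @ B)))
```

Had the two matrices shared an eigenbasis, these mixed moments would generally not exhibit this vanishing/factorizing behavior.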
Given the \(*\)-probability space \(({\mathcal {X}}, \varphi (\cdot ))\) defined as above, recall the free kurtosis defined in (14). Then, for a self-adjoint random matrix \(\varvec{X} \in {\mathcal {X}}\) with \(\varphi (\varvec{X}) = 0\), the free kurtosis is explicitly given by
Also, denoting the eigenvalue density function of \(\varvec{X}\) by \(\mu (x)\), the free entropy is defined by [32]
For a large class of random matrices \(\varvec{X}\), the free kurtosis and entropy concentrate around a deterministic value when N is large. For example, if \(\varvec{X}\) is a Wigner matrix or a Wishart matrix, then \(\mathrm {Var}[\kappa _4(\varvec{X})] \rightarrow 0\) and \(\mathrm {Var}[\chi (\varvec{X})] \rightarrow 0\) as \(N \rightarrow \infty \). Thus, a single sample gives us an accurate empirical estimate. Given a realization x of a random matrix \(\varvec{X}\) with \({{\,\mathrm{{\mathbb {E}}}\,}}\left[ {{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X})\right] = 0\), the empirical free kurtosis is
Also, the empirical free entropy is given by
where the \(\lambda _i\) denote the eigenvalues of x.
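These empirical statistics can be sketched as follows. We take the standard expressions for a centered self-adjoint matrix, \(\widehat{\kappa }_4 = \tfrac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(x^4) - 2\big (\tfrac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(x^2)\big )^2\) and \(\widehat{\chi } = \tfrac{1}{N^2}\sum _{i\ne j}\log |\lambda _i - \lambda _j|\) up to an additive constant; these are our readings of the displays above and should be checked against them:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500

def free_kurtosis(x):
    # kappa_4 = (1/N) Tr(x^4) - 2 ((1/N) Tr(x^2))^2 for a centered
    # self-adjoint x (assumed form of the display above).
    n = x.shape[0]
    m2 = np.trace(x @ x) / n
    m4 = np.trace(np.linalg.matrix_power(x, 4)) / n
    return m4 - 2 * m2**2

def free_entropy(x):
    # chi ~ (1/N^2) sum_{i != j} log |lambda_i - lambda_j|, up to an
    # additive constant (dropped here).
    lam = np.linalg.eigvalsh(x)
    diffs = np.abs(lam[:, None] - lam[None, :])
    np.fill_diagonal(diffs, 1.0)     # excludes the i = j terms (log 1 = 0)
    return np.log(diffs).sum() / lam.size**2

G = rng.standard_normal((N, N))
W = (G + G.T) / np.sqrt(2 * N)       # Wigner matrix: semicircular in the limit

print(free_kurtosis(W))              # ~ 0: the semicircle law has vanishing kappa_4
print(free_entropy(W))
```

Since the Wigner matrix is semicircular in the limit, and the semicircle has vanishing fourth free cumulant, \(\widehat{\kappa }_4\) concentrates near zero, illustrating the concentration described above.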
1.4.2 Rectangular Random Matrix
Consider a rectangular random matrix of size \(N \times M\), and assume that \(N \le M\). In [7], the author embedded an \(N \times M\) matrix into the top right block of an \((N + M) \times (N + M)\) “extension matrix.” The algebra of all \((N + M) \times (N + M)\) random matrices together with this block structure is defined as a rectangular probability space \((\mathbb M_{N + M}(L^2(\Sigma , {\mathbb {P}})), \mathrm {diag}(I_N, 0_M), \mathrm {diag}(0_N, I_M), \frac{1}{N} {{\,\mathrm{\mathrm{Tr}}\,}}, \frac{1}{M} {{\,\mathrm{\mathrm{Tr}}\,}})\) [7].
We recall the following definition of asymptotic free independence in a rectangular probability space [6].
Definition 18
(Asymptotic free independence) Let for each \(N \in {\mathbb {N}}\), \(({\mathcal {X}}_N, p_1(N), p_2(N), \varphi _{1,N}, \varphi _{2,N})\) be a \((\rho _{1,N}, \rho _{2,N})\)-rectangular probability space such that
Let I be an index set and consider for each \(i \in I\) random variables \(a_i(N) \in {\mathcal {X}}_N\). We say that \((a_i(N))_{i\in I}\) converges in \({\mathcal {D}}\)-distribution toward \((a_i)_{i\in I}\) for some random variables \(a_i \in {\mathcal {X}}\) in some \((\rho _1, \rho _2)\)-probability space \(({\mathcal {X}},p_1, p_2,\varphi _1, \varphi _2)\) if the \({\mathcal {D}}\)-distributions converge pointwise.
Furthermore, we say \((a_i(N))_{i \in I}\) are asymptotically free \((N \rightarrow \infty )\), if the limits \((a_i)_{i\in I}\) are free in \(({\mathcal {X}},p_1, p_2,\varphi _1, \varphi _2)\).
Independent bi-unitary invariant rectangular random matrices with converging singular law are asymptotically freely independent [6, 7].
Following (15), the free kurtosis for a single \(N \times M\) random matrix \(\varvec{X}\) is given by
Denoting the probability density function of eigenvalues of \(\varvec{X} \varvec{X}^H\) by \(\mu (x)\), setting \(\alpha = \frac{N}{N + M}\) and \(\beta = \frac{M}{N + M}\), the free entropy is given by [6]
Again, empirical statistics over a single sample of large dimension give an accurate estimate of the limiting value. Given a realization x of a rectangular random matrix \(\varvec{X}\), the empirical free kurtosis is given by
The empirical free entropy is given by
where the \(\lambda _i\) denote the eigenvalues of \(xx^H\).
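A sketch of the empirical rectangular kurtosis. We take the formula to be \(\tfrac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}\big ((xx^H)^2\big ) - \big (1 + \tfrac{N}{M}\big )\big (\tfrac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(xx^H)\big )^2\); this is our reading of the display above, chosen to be consistent with the vanishing of higher rectangular cumulants for the Marchenko–Pastur (free Poisson) limit, and should be checked against it:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 300, 600

def rect_free_kurtosis(x):
    # Rectangular free kurtosis of an N x M realization (N <= M), taken to be
    #   (1/N) Tr((x x^H)^2) - (1 + N/M) ((1/N) Tr(x x^H))^2.
    n, m = x.shape
    S = x @ x.conj().T
    m1 = np.trace(S).real / n
    m2 = np.trace(S @ S).real / n
    return m2 - (1 + n / m) * m1**2

X = rng.standard_normal((N, M)) / np.sqrt(M)   # iid Gaussian entries, variance 1/M
print(rect_free_kurtosis(X))                   # ~ 0 in the Marchenko-Pastur limit
```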
Proof of Propositions 6 and 7
We prove Propositions 6 and 7 for the covariance matrix in the rectangular case. The self-adjoint case can be proved with straightforward modifications.
1.1 Proof of Proposition 6
By Remark 1.2 of [53], for any random variable a, \(\varphi (a^*) = \overline{\varphi (a)}\). Thus,
Therefore, \(\varvec{C}_{\varvec{z} \varvec{z}}\) is Hermitian.
We now show that \(\varvec{C}_{\varvec{z} \varvec{z}}\) is positive semi-definite. Indeed, as \(\varphi \) is a linear functional, for any column vector \(\varvec{\alpha }= [\alpha _1, \ldots , \alpha _s]\),
where we used that \(\varphi (\cdot )\) is positive. This completes the proof.
1.2 Proof of Proposition 7
Since \(\varvec{z} = \varvec{A} \varvec{x}\) and \(\varvec{C}_{xx} = \varvec{I}\),
Since we assume that \(\varvec{A}\) is real and non-singular, \(\varvec{C}_{\varvec{z}\varvec{z}}\) is real and positive-definite.
Proofs of the Main Results
1.1 Proof of Theorem 1
The proof of Theorem 1 relies on the free additivity of free cumulants, for which readers are referred to Proposition 13 (and its rectangular analogue in Sect. A.3.2).
1.1.1 Proof of Theorem 1(a)
Set \(\varvec{g} = \varvec{Q}^T \varvec{w}\), then \(\varvec{w}= \varvec{Q}\varvec{g}\). As \(\varvec{x}\) and \(\varvec{y}\) are related via (12), we have that
Adopt the notation \(\varvec{g} = (g_1,\ldots ,g_s)^T\). Since the \(x_i\) are freely independent, using (A15), we have that
By (A14), \(\kappa _4(g_ix_i) = g_i^4 \kappa _4(x_i)\) for \(i = 1, \ldots , s\); thus, the above equation becomes
Combining (C44) and (C46), we get
When \(\varvec{w}\) runs over all unit vectors, \(\varvec{g} = \varvec{Q}^T \varvec{w}\) also runs over all unit vectors. Therefore, if \(\varvec{w}^{(1)}\) is a maximizer of (17), then \(\varvec{w}^{(1)} = \varvec{Q}\varvec{g}^{(1)}\) where \(\varvec{g}^{(1)}\) is a maximizer of
Thus, in order to prove (a), it is equivalent to show that \(\varvec{g}^{(1)}\) is a maximizer of (C48) if and only if \(\varvec{g}^{(1)} \in \{(\pm 1, 0,\ldots ,0)^T\}\).
For any unit vector \(\varvec{g}\), since \(|{g_i}| \le 1\), we have that
Note that equality holds if and only if there is an index i such that \(g_i \in \{\pm 1\}\) (and thus \(g_j = 0\) for all \(j \ne i\)). Then, using (16) and (C49),
On the other hand, for \(\varvec{g} = (\pm 1, 0,\ldots , 0)^T\), it can be checked that all equalities in (C50) hold. Thus,
and \(\varvec{g}^{(1)}\) is a maximizer of (C48) if \(\varvec{g}^{(1)} \in \{(\pm 1, 0,\ldots ,0)^T\}\).
For the other direction, if \(\varvec{g}^{(1)}\) is a maximizer of (C48), then the second equality in (C50) holds for \(\varvec{g} = \varvec{g}^{(1)}\). That is,
Due to (18), \(|{\kappa _4(x_i)}| - |{\kappa _4(x_1)}| < 0\) for \(i = 2,\ldots ,s\). Thus, (C52) implies \(g^{(1)}_i = 0\) for \(i = 2,\ldots ,s\). Since \(\varvec{g}^{(1)}\) is a unit vector, \(\varvec{g}^{(1)} \in \{(\pm 1, 0, \ldots , 0)^T\}\). This completes the proof.
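The elementary inequality driving this proof — for a unit vector \(\varvec{g}\), \(\sum _i g_i^4 |\kappa _4(x_i)| \le \max _i |\kappa _4(x_i)|\), with equality only at signed coordinate vectors — can be checked numerically (the kurtosis values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
kappa = np.array([2.0, 1.5, 1.0, 0.5])   # |kappa_4(x_i)|, strictly ordered as in (18)

def contrast(g):
    # The objective of (C48): sum_i g_i^4 * |kappa_4(x_i)| over unit vectors g.
    return np.sum(g**4 * kappa)

# No random unit vector exceeds the largest component kurtosis ...
for _ in range(1000):
    g = rng.standard_normal(4)
    g /= np.linalg.norm(g)
    assert contrast(g) <= kappa.max() + 1e-12

# ... and the signed first coordinate vectors attain it.
e1 = np.array([1.0, 0.0, 0.0, 0.0])
print(contrast(e1), contrast(-e1), kappa.max())   # all equal 2.0
```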
1.1.2 Proof of Theorem 1(b)
In the proof of (a), the arguments up to (C52) only rely on properties of the free kurtosis \(\kappa (\cdot )\) and condition (16). Thus, (C48), (C50), (C51) and (C52) also apply in the setting of (b). Thus, in order to prove (b), it is equivalent to show that \(\varvec{g}^{(1)}\) is a maximizer of (C48) if and only if
(i) \(g^{(1)}_i = 0\) for \(i = r + 1,\ldots ,s\);

(ii) there is an index i such that \(g_i^{(1)} \in \{\pm 1\}\).
The backward direction can be checked directly using \(|{\kappa _4(x_1)}| = \cdots = |{\kappa _4(x_r)}|\).
We now prove the forward direction. If \(\varvec{g}^{(1)}\) maximizes (C48), then it satisfies (C52). By (20), \(|{\kappa _4(x_i)}| - |{\kappa _4(x_1)}| = 0\) for \(i = 1,\ldots ,r\) and \(|{\kappa _4(x_i)}| - |{\kappa _4(x_1)}| < 0\) for \(i = r + 1,\ldots ,s\). (i) then follows. On the other hand, as \(|{\kappa _4(x_1)}| = \cdots = |{\kappa _4(x_r)}|\), enforcing the third equality in (C50) implies
By the observation below (C49), this indicates (ii). This completes the proof.
1.2 Proof of Theorem 2
Set \(\varvec{g} = \varvec{Q}^T \varvec{w}\) and adopt the notation \(\varvec{g} = [g_1, \ldots , g_s]^T\). As \(\varvec{w}^{(i)} \in \{\pm \varvec{Q}_i\}\) for \(i = 1, \ldots , k - 1\),
Using (C47), if \(\varvec{w}^{(k)}\) is a maximizer of (22), then \(\varvec{w}^{(k)} = \varvec{Q}\varvec{g}^{(k)}\) where \(\varvec{g}^{(k)}\) is a maximizer of
Thus, in order to prove (a), it is equivalent to show that \(\varvec{g}^{(k)} = (g_1^{(k)}, \ldots , g_s^{(k)})^T\) is a maximizer of (C55) if and only if \(g^{(k)}_k \in \{\pm 1\}\) (and thus \(g^{(k)}_j = 0\) for \(j \ne k\)).
As we are maximizing over unit vectors \(\varvec{g}\) such that \(g_1 = \cdots = g_{k - 1} = 0\), again using (16) and (C49),
For \(\varvec{g}\) with \(g_k \in \{ \pm 1\}\), it can be checked that all equalities in (C56) hold. Thus,
and \(\varvec{g}^{(k)}\) is a maximizer if \(g^{(k)}_k \in \{ \pm 1\}\).
For the other direction, if \(\varvec{g}^{(k)}\) is a maximizer of (C55), then all equalities in (C56) hold with \(\varvec{g} = \varvec{g}^{(k)}\). In particular, the third equality in (C56) implies
Due to (18), \(|{\kappa _4(x_i)}| - |{\kappa _4(x_k)}| < 0\) for \(i = k + 1,\ldots ,s\). Thus, (C58) implies that \(g^{(k)}_i = 0\) for \(i = k + 1, \ldots , s\). Since \(\varvec{g}^{(k)}\) is a unit vector, \(g^{(k)}_k \in \{\pm 1\}\). This completes the proof.
1.3 Proof of Theorem 3
We prove Theorem 3 by showing the following:

(a) \(\varvec{Q}\) is a maximizer of (24).

(b) For any permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\), \(\varvec{Q}\varvec{S}\varvec{P}\) is a maximizer of (24).

(c) Any maximizer \(\varvec{W}\) of (24) satisfies \(\varvec{W}= \varvec{Q}\varvec{S}\varvec{P}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\).
1.3.1 Proof of (a)
We prove (a) by showing that
and \(\varvec{W}= \varvec{Q}\) reaches the maximum. Set \(\varvec{G} = \varvec{Q}^T\varvec{W}\in \varvec{O}(s)\). As \(\varvec{x}\) and \(\varvec{y}\) are related via (12),
Adopt the notation \(\varvec{G} = (g_{ij})_{i,j= 1}^s\). Then, for all \(i = 1,\ldots ,s\), \((\varvec{W}^T \varvec{y})_i = (\varvec{G}^T \varvec{x})_i = \sum _{j = 1}^sg_{ji} x_j\). Together with (A15) and (A14), for any \(i = 1, \ldots , s\), we have that
Applying the triangle inequality to the above equation, we get
Note that \((g_{1i},\ldots ,g_{si})^T\) is a unit vector, so by (C49), \(\sum _{j = 1}^sg_{ji}^4 \le 1\). Then, summing (C62) over \(i = 1, \ldots , s\), we obtain that
Actually, for \(\varvec{W}= \varvec{Q}\), \(\varvec{Q}^T \varvec{y} = \varvec{Q}^T \varvec{Q}\varvec{x} = \varvec{x}\), thus
Equations (C64) and (C63) together imply (C59). Then, by (C64), \(\varvec{Q}\) is a maximizer of (24).
1.3.2 Proof of (b)
We first introduce some notation. For a permutation matrix \(\varvec{P}= (p_{ji})_{i,j = 1}^s\), there is an associated permutation \(\sigma \) such that \(p_{\sigma (i)i} = 1\) and \(p_{ji} = 0\) for all \(i = 1, \ldots , s\) and \(j \ne \sigma (i)\). For a signature matrix \(\varvec{S}\), we denote its ith diagonal element by \(S_i\).
Now for any \(\varvec{P}\) and \(\varvec{S}\), in light of (C59), it suffices to show that \(\sum _{i = 1}^s\left| \kappa _4\left( ((\mathbf {Q}\mathbf {P}\mathbf {S})^T \mathbf {y})_i\right) \right| = \sum _{i = 1}^s|{\kappa _4(x_i)}|\). As \(\varvec{x}\) and \(\varvec{y}\) satisfy (12), we have
As \(S_i \in \{\pm 1\}\), by (A14)
Combining (C65) and (C66) together, we obtain that
This completes the proof of (b).
1.3.3 Proof of (c)
By (b), any matrix \({\widehat{\varvec{W}}}\) of the form \(\widehat{\varvec{W}} = \varvec{Q}\varvec{P}\varvec{S}\) is a maximizer. For the other direction, we want to show that any maximizer \({\widehat{\varvec{W}}}\) can be written in this form.
Indeed, if \({\widehat{\varvec{W}}}\) is a maximizer, consider \( ({{\widehat{g}}}_{ij})_{i,j = 1}^s= \widehat{\varvec{G}}= \varvec{Q}^T {\widehat{\varvec{W}}}\). The third equality of (C63) holds with \(g_{ij} = {{\widehat{g}}}_{ij}\). That is,
Since we assume the components of \(x\) have nonzero free kurtosis (see (25)) and \(\sum _{i = 1}^s\widehat{g}_{ji}^4 \le 1\) for \(j = 1,\ldots , s\), (C68) is equivalent to
By the observation below (C49), for each j, there is an i such that \({{\widehat{g}}}_{ji} \in \{\pm 1\}\), while \({{\widehat{g}}}_{jk} = 0\) for \(k \ne i\). That is, each row of \(\widehat{\varvec{G}}\) has exactly one nonzero entry; since \(\widehat{\varvec{G}}\) is orthogonal, the same holds for each column. By Proposition 17, \(\widehat{\varvec{G}} \in \varvec{O}_{ps}\) and thus \(\widehat{\varvec{G}} = \varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). Recalling that \(\widehat{\varvec{W}} = \varvec{Q}\widehat{\varvec{G}}\), we arrive at \({\widehat{\varvec{W}}} = \varvec{Q}\varvec{P}\varvec{S}\). This completes the proof.
1.4 Proof of Theorem 4
The proof of Theorem 4 relies on the orthogonal invariance and subadditivity of free entropy, for which readers are referred to Propositions 14 and 15 (and their rectangular analogues in Sect. A.3.3).
As in the proof of Theorem 3, we will show the following:

(a) \(\varvec{Q}\) is a maximizer of (27).

(b) For any permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\), \(\varvec{Q}\varvec{S}\varvec{P}\) is a maximizer of (27).

(c) Any maximizer \(\varvec{W}\) of (27) satisfies \(\varvec{W}= \varvec{Q}\varvec{S}\varvec{P}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\).
1.4.1 Proof of (a)
Set \(\varvec{Z} = \varvec{Q}^T\varvec{W}\). As \(\varvec{x}\) and \(\varvec{y}\) are related via (12), \(\varvec{W}^T \varvec{y} = (\varvec{Q} \varvec{Z})^T \varvec{Q} \varvec{x}= \varvec{Z}^T \varvec{x}\). Then, by (A22),
On the other hand, note that \(\varvec{Z}\) is an orthogonal matrix, so by (A19),
Combining (C70) and (C71) together, we obtain that, for any \(\varvec{W}\in \varvec{O}(s)\),
Now consider \(\varvec{W}= \varvec{Q}\). As \(\varvec{Q}^T \varvec{y} = \varvec{Q}^T \varvec{Q}\varvec{x}= \varvec{x}\), we have that
On the other hand, as the \(x_i\) are freely independent, by Proposition 15,
Then, (C73) becomes
Equations (C75) and (C72) together indicate
and \(\varvec{Q}\) is a maximizer of (27).
1.4.2 Proof of (b)
Adopt the notation introduced in the proof of Theorem 3 (b). For any permutation matrix \(\varvec{P}\) associated with permutation \(\sigma \) and signature matrix \(\varvec{S}= {{\,\mathrm{\mathrm{diag}}\,}}(S_1,\ldots ,S_s)\), we have that (see (C65))
Thus,
As each \(S_i \in \{\pm 1\}\) can be regarded as a 1-by-1 orthogonal matrix, the one-dimensional version of (A20) yields
Then, (C78) becomes
In light of (C76), \(\varvec{Q}\varvec{P}\varvec{S}\) is a maximizer of (27).
1.4.3 Proof of (c)
By (b), any matrix \({\widehat{\varvec{W}}}\) of the form \(\widehat{\varvec{W}} = \varvec{Q}\varvec{P}\varvec{S}\) is a maximizer. For the other direction, it is enough to show that any maximizer \(\widehat{\varvec{W}}\) of (27) can be written in the form \(\widehat{\varvec{W}} = \varvec{Q}\varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). Indeed, if \(\widehat{\varvec{W}}\) maximizes (27), then by (C76),
Since \({\widehat{\varvec{W}}}^T\varvec{Q}\) is an orthogonal matrix, by (A19) and (12),
Then, (C81) becomes
By Proposition 15, the above equation indicates that \({\widehat{\varvec{W}}}^T\varvec{y}\) has freely independent components. As we assume that \(\varvec{x}\) has at most one semicircular component, Theorem 5 implies that \({\widehat{\varvec{W}}} = \varvec{Q}\varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). This completes the proof.
1.5 Proof of Theorem 5
Definition 19
We denote all matrices of size \(s\times s\) which are product of a permutation matrix and a signature matrix by
Let \(\varvec{O}:= \varvec{O}(s)\) denote the set of orthogonal matrices of size \(s\times s\). Note that any permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\) belong to \(\varvec{O}\). Furthermore, it can be checked that \(\varvec{O}_{ps}\) is a subgroup of \(\varvec{O}\).
We first prove two propositions about \(\varvec{O}_{{ps}}\). An orthogonal matrix must contain at least one nonzero entry in each column (and each row). On the other hand, a matrix belonging to \(\varvec{O}_{ps}\) has exactly one nonzero entry in each column (and each row). The following proposition states that this property characterizes the matrices contained in \(\varvec{O}_{ps}\).
Proposition 17
Fix a positive integer \(s\ge 1\). A matrix \(\varvec{Q}\in \varvec{O}(s)\) has exactly one nonzero entry in each column if and only if \(\varvec{Q}\in \varvec{O}_{ps}(s)\).
Proof
If \(\varvec{Q}\in \varvec{O}_{{ps}}\), then \(\varvec{Q}= \varvec{P}\varvec{S}\) for some permutation matrix \(\varvec{P}\) and signature matrix \(\varvec{S}\). Thus, it follows that \(\varvec{Q}\) has exactly one nonzero entry in each column.
For the other direction, consider an arbitrary \(\varvec{Q}\in \varvec{O}(s)\) with exactly one nonzero entry in each column. Note that \(\varvec{Q}\) then has \(s\) nonzero entries in total. As \(\varvec{Q}\) is non-singular, it also has exactly one nonzero entry in each row. As a result, there exists a permutation matrix \(\varvec{P}\) such that \(\varvec{P}^T \varvec{Q}\) is a diagonal matrix.
On the other hand, since \((\varvec{P}^T\varvec{Q})^T(\varvec{P}^T\varvec{Q}) = \varvec{Q}^T\varvec{Q}= I\), \(\varvec{P}^T\varvec{Q}\) is a diagonal orthogonal matrix. Thus, the diagonal entries of \( \varvec{P}^T\varvec{Q}\) are either \(+1\) or \(-1\). Then, there exists a signature matrix \(\varvec{S}\) such that \( \varvec{P}^T \varvec{Q}= \varvec{S}\). That is equivalent to \(\varvec{Q}= \varvec{P}\varvec{S}\in \varvec{O}_{{ps}}\). This completes the proof. \(\square \)
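Proposition 17 translates directly into a factorization routine: given an orthogonal matrix with one nonzero entry per column, read off \(\varvec{P}\) and \(\varvec{S}\). A sketch (`perm_sign_factor` is our hypothetical helper, not from the paper):

```python
import numpy as np

def perm_sign_factor(Q, tol=1e-12):
    # Factor an orthogonal Q with exactly one nonzero entry per column as
    # Q = P @ S, with P a permutation matrix and S = diag(+/-1) a signature matrix.
    s = Q.shape[0]
    rows = np.abs(Q).argmax(axis=0)          # row index of the nonzero in each column
    P = np.zeros((s, s))
    P[rows, np.arange(s)] = 1.0
    S = np.diag(np.sign(Q[rows, np.arange(s)]))
    assert np.allclose(P @ S, Q, atol=tol)   # exact by construction
    return P, S

Q = np.array([[0., -1., 0.],
              [0.,  0., 1.],
              [1.,  0., 0.]])               # orthogonal, one nonzero per column
P, S = perm_sign_factor(Q)
print(np.allclose(P @ S, Q))                # True
```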
By the above proposition, for any \(\varvec{Q}\in \varvec{O}\backslash \varvec{O}_{{ps}}\), there must be a column with more than one nonzero entry. For later purposes, we prove a stronger result.
Proposition 18
Given any \(s\ge 2\), consider a matrix \(\varvec{Q}= (q_{ij})_{i,j= 1}^s \in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\). Then, there is a \(2\times 2\) submatrix of \(\varvec{Q}\) with all four entries nonzero. Explicitly, there exist \(i,j,k,\ell \in \{1,\ldots ,s\}\) (\(i \ne j\), \(k \ne \ell \)) such that \(q_{ik}\), \(q_{i\ell }\), \(q_{jk}\), and \(q_{j\ell }\) are all nonzero.
Proof
We first make the following observation: two orthogonal vectors share either zero or at least two positions of nonzero entries. Indeed, consider any \(u = (u_1,\ldots ,u_s)^T\) and \(v = (v_1,\ldots ,v_s)^T\) such that u and v are orthogonal. Assume that there is exactly one index k such that both \(u_k\) and \(v_k\) are nonzero; then
This contradicts the fact that \(u^T v = 0\).
Now, we are ready to prove the proposition. Denote the ith column of \(\varvec{Q}\) by \(\varvec{Q}_i\), for \(i = 1,\ldots ,s\). Note that the \(\{\varvec{Q}_i\}_{i = 1}^s\) form an orthonormal basis. As \(\varvec{Q}\in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\), there must be a column containing at least two nonzero entries. Without loss of generality, assume it is \(\varvec{Q}_1\). If all of \(\varvec{Q}_2,\ldots ,\varvec{Q}_{s}\) share no positions of nonzero entries with \(\varvec{Q}_1\), then \(\{\varvec{Q}_i\}_{i= 2}^s\) span a linear space of dimension at most \(s - 2\). This contradicts the fact that \(\{\varvec{Q}_i\}_{i= 2}^s\) span a linear space of dimension \(s- 1\). Thus, there must exist a \(j \in \{2,\ldots ,s\}\) such that \(\varvec{Q}_1\) and \(\varvec{Q}_j\) share at least one position of nonzero entries. By the observation made in the last paragraph, \(\varvec{Q}_1\) and \(\varvec{Q}_j\) then share at least two positions of nonzero entries. This completes the proof. \(\square \)
Corollary 19
Fix a positive integer \(s \ge 2\) and a \(\varvec{Q}\in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\). There exist indices \(i,j,k,\ell \in \{1,\ldots ,s\}\) (\(i \ne j\) and \(k \ne \ell \)), such that for any \(m \ge 3\),
In particular, if \(s= 2\), then for any \(m \ge 3\),
Theorem 5 can be obtained as a corollary of the following lemma.
Lemma 20
Fix \(s\ge 2\), and let \(\varvec{x}= (x_1,x_2, \ldots , x_s)^T\) and \(\varvec{y} = (y_1,y_2, \ldots , y_s)^T\) be two random vectors such that \(\varvec{y} = \varvec{Q}\varvec{x}\), where \(\varvec{Q}\in \varvec{O}(s)\). Assume \((x_i)_{i = 1}^s\) are freely independent. Now if \((y_i)_{i = 1}^s\) are freely independent, then at least one of the following holds:
(a) \(\varvec{Q}\in \varvec{O}_{ps}(s)\).

(b) At least two components of \(\varvec{x}\) are semicircular (or Poisson in the non-self-adjoint setting).
We first show that Theorem 5 follows from Lemma 20.
Proof of Theorem 5
As \(\varvec{x}\) and \(\varvec{y}\) satisfy (12), \(\varvec{x}= (\varvec{Q}^T \varvec{W}) \varvec{W}^T \varvec{y}\). Now, by assumption, \(\varvec{x}\) and \(\varvec{W}^T \varvec{y}\) have free components. Then, according to Lemma 20, there are two possibilities: (a) \(\varvec{Q}^T \varvec{W}\in \varvec{O}_{ps}\) or (b) \(\varvec{x}\) has at least two semicircular components. As (b) has been excluded, (a) holds. That is, there exist a permutation matrix \(\varvec{P}\) and a signature matrix \(\varvec{S}\) such that \(\varvec{Q}^T \varvec{W}= \varvec{P}\varvec{S}\), i.e., \(\varvec{W}= \varvec{Q}\varvec{P}\varvec{S}\).
Proof of Lemma 20
We first consider the self-adjoint setting. If \(\varvec{Q}\in \varvec{O}_{ps}(s)\), then the components of \(\varvec{y}\) are exactly the components of \(\varvec{x}\), up to reordering and possible sign changes. It is then no surprise that the \(y_i\) are freely independent. In the following, we assume that \(\varvec{Q}\in \varvec{O}(s) \backslash \varvec{O}_{ps}(s)\) and that \(\varvec{x}, \varvec{y}\) have free components; the goal is to show that \(\varvec{x}\) has at least two semicircular components.
We start with the case \(s = 2\). Then, we must show that \(x_1\) and \(x_2\) are both semicircular elements. Recalling Definition 12 of the semicircular element, it is enough to show \(\kappa _m(x_i) \equiv 0\) for all \(m\ge 3\) and \(i = 1,2\).
Fix \(m \ge 3\) and consider the mixed cumulants of \(y_1,y_2\) of the specific form \(\kappa _m(y_1,\ldots ,y_1, y_2, y_p)\) for \(p = 1,2\). As \(y_1,y_2\) are freely independent, these cumulants satisfy the condition of Theorem 12, noting that \(i(1) = 1 \ne i(m - 1) = 2\). Thus, these mixed cumulants vanish, i.e.,
On the other hand, as \((y_i)_{i = 1}^2\) are linear combinations of \((x_i)_{i = 1}^2\), using the multilinearity of \(\kappa _m(\cdot )\) (see (A10)), we can express \(\kappa _m(y_1,\ldots ,y_1, y_2, y_p)\) as linear combinations of \(\kappa _m(x_i)\) (recall the notation (A13)). Adopt the notation \(\varvec{Q}= (q_{ij})_{i,j= 1}^2\), so that \(y_i = \sum _{j = 1}^2 q_{ij}x_j\). We first derive the expression for \(\kappa _m(y_1,\ldots ,y_1, y_2, y_1)\) (i.e., \(p = 1\)),
Apply (A10) to the right-hand side of (C89) to expand the first variable,
Again applying (A10) for the second variable, we obtain that
Repeatedly applying (A10) to the remaining \(m-2\) variables, we arrive at
There are in total \(2^m\) terms in the above summation. Note that \(x_1\) and \(x_2\) are freely independent. Then, by Theorem 12, most of these cumulants vanish. For example, \(\kappa _{m}(x_1,x_2,\ldots x_2) = 0\) since \(j_1 = 1 \ne j_2 = 2\). Consequently, only the two terms corresponding to the choices of indices \(j_1 = j_2 = \cdots = j_m= 1\) and \(j_1 = j_2 = \cdots = j_m= 2\) survive. Thus, using the notation (A13), (C92) can be written as
Combining (C93) with (C88), we obtain that
Repeating (C88) to (C94) for \(\kappa _m(y_1,\ldots ,y_1, y_2, y_2)\) (i.e., \(p = 2\)), we find that
Writing (C94) and (C95) in the matrix form, we obtain that
We thus obtain a linear system of equations for \(\kappa _m(x_1)\) and \(\kappa _m(x_2)\). Note that \(\varvec{Q}= (q_{ij})_{i,j = 1}^2\) is an orthogonal matrix and thus is invertible. Thus, (C96) is equivalent to
Now, as \( \varvec{Q}\in \varvec{O}(2) \backslash \varvec{O}_{ps}(2)\), by (C87), the above linear system has the unique solution \(\kappa _m(x_i) = 0\), \(i = 1,2\). Note that this holds for all \(m \ge 3\). Then, by Definition 12, we conclude that \(x_i\) for \(i = 1, 2\) are semicircular elements. This concludes the proof for \(s = 2\).
For general \(s \ge 2\), as \(\varvec{Q}\in \varvec{O}\backslash \varvec{O}_{ps}\), by Corollary 19, there exist \(i,j,k,\ell \) (\(i\ne j\) and \(k \ne \ell \)) such that (C86) holds. We will show that \(x_k,x_\ell \) are semicircular elements. For fixed \(m \ge 3\), we consider the vanishing mixed cumulants
Using the relation \(y_i = \sum _{j = 1}^sq_{ij}x_j\) and the multilinearity of \(\kappa _m\), we can repeat (C88) to (C94) for each \(\kappa _m(y_i,\ldots ,y_i, y_j, y_p)\) and get
Write the above equations in the matrix form:
Again, \(\varvec{Q}= (q_{ij})_{i,j = 1}^s\) is invertible and \(q_{ik}^{m-1}q_{jk} \ne 0\) (see (C86)); thus \(\kappa _m(x_k) = 0\). For the same reason, \(\kappa _m(x_\ell ) = 0\). As these hold for all \(m \ge 3\), \(x_k,x_\ell \) are semicircular elements.
For the non-self-adjoint setting, the proof is exactly the same as above, with Theorem 12 replaced by Theorem 16 and Definition 12 replaced by Definition 16. \(\square \)
Proof of Theorem 11
Lemma 21
Given \(\varvec{Y} = [\varvec{Y}_1,\ldots ,\varvec{Y}_s]^T \in {\mathbb {C}}^{Ns\times N}\) with \(\varvec{Y}_i \in {\mathbb {C}}^{N \times N}\) Hermitian matrices and a vector \(\varvec{w} = [w_1,\ldots ,w_s] \in {\mathbb {R}}^s\), for
we recall the empirical free kurtosis
Then, we have that
Proof
As \({{\,\mathrm{\mathrm{Tr}}\,}}(\cdot )\) is a linear function of the entries of the input matrix,
Note that
thus, for any \(k = 1, \ldots , s,\)
Therefore,
Using \({{\,\mathrm{\mathrm{Tr}}\,}}(AB) = {{\,\mathrm{\mathrm{Tr}}\,}}(BA)\), we find that
and thus
Repeating (D104) to (D105) for \({{\,\mathrm{\mathrm{Tr}}\,}}\left( \frac{\partial \varvec{X}^2}{\partial w_k}\right) \), we get that
Plugging (D105) and (D106) into (D102), we obtain (D101).
\(\square \)
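The gradient formula of Lemma 21 can be sanity-checked against finite differences. Writing \(\varvec{X}(\varvec{w}) = \sum _k w_k \varvec{Y}_k\) and \(m_2 = \frac{1}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^2)\), the chain rule gives \(\partial _{w_k}\widehat{\kappa }_4 = \frac{4}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}^3\varvec{Y}_k) - \frac{8 m_2}{N}{{\,\mathrm{\mathrm{Tr}}\,}}(\varvec{X}\varvec{Y}_k)\); this is our own derivation and should be compared against (D101). A sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
N, s = 40, 3
Ys = []
for _ in range(s):
    G = rng.standard_normal((N, N))
    Ys.append((G + G.T) / (2 * np.sqrt(N)))   # symmetric, normalized components
w = rng.standard_normal(s)

def kurt(w):
    # Empirical free kurtosis of X(w) = sum_k w_k Y_k.
    X = sum(wk * Yk for wk, Yk in zip(w, Ys))
    m2 = np.trace(X @ X) / N
    return np.trace(np.linalg.matrix_power(X, 4)) / N - 2 * m2**2

def grad_kurt(w):
    # d kappa4 / d w_k = (4/N) Tr(X^3 Y_k) - (8 m2 / N) Tr(X Y_k).
    X = sum(wk * Yk for wk, Yk in zip(w, Ys))
    m2 = np.trace(X @ X) / N
    X3 = np.linalg.matrix_power(X, 3)
    return np.array([(4 * np.trace(X3 @ Y) - 8 * m2 * np.trace(X @ Y)) / N
                     for Y in Ys])

# Central finite differences agree with the analytic gradient.
eps = 1e-6
fd = np.array([(kurt(w + eps * np.eye(s)[k]) - kurt(w - eps * np.eye(s)[k])) / (2 * eps)
               for k in range(s)])
print(np.max(np.abs(fd - grad_kurt(w))))   # ~ 0
```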
Lemma 22
Given \(\varvec{Y} = [\varvec{Y}_1,\ldots ,\varvec{Y}_s]^T \in {\mathbb {C}}^{Ns\times N}\) with the \(\varvec{Y}_i \in {\mathbb {C}}^{N \times N}\) Hermitian matrices and a vector \(w = [w_1,\ldots ,w_s] \in {\mathbb {R}}^s\), for
with eigenvalues \(\lambda _i\) and corresponding eigenvectors \(v_i\), we recall the empirical free entropy
Then, we have that
with \(\partial _{w_k}\lambda _i = v_i^T \varvec{Y}_k v_i\).
Proof
Equation (D107) is obtained by directly taking derivative. The fact that \(\partial _{w_k}\lambda _i = v_i^T \varvec{Y}_k v_i\) follows from (D103) and perturbation theory of eigenvalues [47]. \(\square \)
Proof of Theorem 11
We first prove the result for the self-adjoint FCF based on free kurtosis. Set \(\varvec{X} = [\varvec{X_1},\ldots ,\varvec{X_s}] = \widetilde{\varvec{W}}^T \varvec{Y}\). Recalling Definition 3 and (45), for \(\widehat{F}(\cdot ) = -\left|\widehat{\kappa }_4(\cdot ) \right|\), we have that
As only \(X_\ell \) explicitly depends on \(\varvec{W}_{k\ell }\),
Further notice that \(\varvec{X}_\ell = \widetilde{\varvec{w}}_\ell ^T \varvec{Y}\) with \(\varvec{w}_\ell = [\varvec{W}_{1\ell }, \ldots , \varvec{W}_{s\ell }]^T\), thus
where we used Lemma 21 for the last equality. The proof is then completed by plugging (D109) into (D108). The result for the self-adjoint FCF based on free entropy can be proved in a similar manner by repeating the process from (D108) to (D109), replacing \(-|{\hat{\kappa }_4(\cdot )}|\) with \(\chi (\cdot )\) and Lemma 21 with Lemma 22.
We omit the proofs for the rectangular FCF case since these are straightforward modifications of the proofs of Lemmas 21 and 22 and of the self-adjoint FCF case. \(\square \)
Matrix Embeddings
One restriction of ICA is that it only operates on vector-valued components (see Sect. F). In contrast, FCF applies to data whose components are matrices of arbitrary dimensions. Thus, one can embed components into new dimensions to potentially obtain better performance with FCA. In this section, we list several matrix embedding algorithms.
For \(\varvec{Z} = [\varvec{Z}_1, \ldots , \varvec{Z}_N]^T\), where the \(\varvec{Z}_i\) are rectangular matrices, Algorithm 3 embeds each \(\varvec{Z}_i\) in the upper diagonal part of an \(N'\times N'\) self-adjoint matrix. In practice, the target dimension \(N'\) should be picked such that there is no loss of information while also avoiding too many artificial zeros. To embed the \(\varvec{Z}_i\) into rectangular matrices of other dimensions, we introduce Algorithm 5. Putting the above embeddings and the appropriate FCF algorithms together, we get Algorithms 4 and 6. One can easily state the analogues of the above algorithms for data containing self-adjoint matrices; for the sake of brevity, we omit them here.
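A minimal sketch of such a self-adjoint embedding (our illustrative version, not the paper's exact Algorithm 3: we place the rectangular matrix in the off-diagonal block, which also preserves all singular-value information):

```python
import numpy as np

def embed_selfadjoint(Z):
    # Embed an N x M rectangular Z into an (N + M) x (N + M) self-adjoint matrix
    # via the off-diagonal block [[0, Z], [Z^H, 0]]; its nonzero eigenvalues are
    # +/- the singular values of Z, so no information is lost.
    N, M = Z.shape
    H = np.zeros((N + M, N + M), dtype=Z.dtype)
    H[:N, N:] = Z
    H[N:, :N] = Z.conj().T
    return H

Z = np.arange(6.0).reshape(2, 3)
H = embed_selfadjoint(Z)
sv = np.linalg.svd(Z, compute_uv=False)
ev = np.sort(np.linalg.eigvalsh(H))
print(np.allclose(ev[-2:], np.sort(sv)))   # True: top eigenvalues = singular values
```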
If the \(\varvec{Z}_i\) are vectors, one can use the short-time Fourier transform (STFT) to embed them into matrices. The STFT matrix of a vector is formed by aligning the discrete Fourier transforms of a sliding window. The outcome is a complex rectangular matrix to which we can apply the rectangular FCFs. This is summarized in Algorithm 7.
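A minimal NumPy sketch of this embedding (the window length, hop, and Hann window are illustrative choices, not the paper's):

```python
import numpy as np

def stft(x, win=64, hop=32):
    # Short-time Fourier transform: FFT of each windowed frame, one frame per
    # column. The result is a complex rectangular matrix, ready for rectangular FCF.
    window = np.hanning(win)
    frames = [x[t:t + win] * window for t in range(0, len(x) - win + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1024))   # a test tone
S = stft(x)
print(S.shape)   # (win // 2 + 1, number of frames) = (33, 31)
```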
Independent Component Factorization
We would like to numerically compare FCA with ICA, and we begin by providing a summary of the ICA algorithm. Given data whose components are rectangular matrices, we first vectorize them and then apply ICA. We once again perform a whitening process (see Algorithm 8) and solve an optimization problem.
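The whitening step can be sketched as follows (an illustrative stand-in for Algorithm 8, assuming the rows of the data matrix are the mixed signals):

```python
import numpy as np

def whiten(Z):
    # Center each row, then multiply by C^{-1/2} so that the empirical
    # covariance of the output is the identity.
    Z = Z - Z.mean(axis=1, keepdims=True)
    C = Z @ Z.T / Z.shape[1]                 # s x s empirical covariance
    lam, U = np.linalg.eigh(C)
    W = U @ np.diag(lam**-0.5) @ U.T         # the whitening matrix C^{-1/2}
    return W @ Z

rng = np.random.default_rng(5)
Z = rng.standard_normal((3, 10000)) * np.array([[1.0], [2.0], [5.0]])
Zw = whiten(Z)
C = Zw @ Zw.T / Zw.shape[1]
print(np.allclose(C, np.eye(3), atol=1e-8))   # True
```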
Here, we present Algorithm 9, whose optimization problem is based on the empirical (scalar) kurtosis \(\widehat{c}_4(\cdot )\) or the empirical (scalar) negentropy \(\widehat{{\mathcal {E}}}(\cdot )\). We call these kurtosis-based ICF and entropy-based ICF, respectively. Given a centered and whitened vector \(x \in {\mathbb {R}}^T\), its empirical kurtosis \({\widehat{c}}_4(x)\) can be expressed as
The negentropy \({{\mathcal {E}}}(x)\) is defined as
where h(x) denotes the entropy of the random variable x (see (A6)) and \(g_x\) denotes the Gaussian random variable with the same mean and variance as x. It is used as a measure of distance from normality. The empirical negentropy \(\widehat{\mathcal {E}}(x)\) involves the empirical distribution of x, which is computationally difficult. Fortunately, it can also be expressed as an infinite sum of cumulants. Thus, in practice, \(\widehat{\mathcal {E}}(x)\) can be approximated by a finite truncation of that sum [18, Theorem 14 and (3.2) pp. 295].
In the simulations of this paper, we adopt the following approximation (see Section 5 of [40]):
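For a centered, whitened sample (unit variance), the empirical kurtosis reduces to \(\widehat{c}_4(x) = \frac{1}{T}\sum _t x_t^4 - 3\), and the classical cumulant truncation of negentropy is \(\widehat{{\mathcal {E}}}(x) \approx \frac{1}{12}\big (\frac{1}{T}\sum _t x_t^3\big )^2 + \frac{1}{48}\widehat{c}_4(x)^2\); this is our reading of the approximation in [40] and should be checked against the display above. A sketch:

```python
import numpy as np

def emp_kurtosis(x):
    # Empirical (excess) kurtosis of a centered, unit-variance sample.
    return np.mean(x**4) - 3.0

def emp_negentropy(x):
    # Cumulant truncation of negentropy: zero for Gaussian data, positive otherwise.
    return np.mean(x**3)**2 / 12.0 + emp_kurtosis(x)**2 / 48.0

rng = np.random.default_rng(6)
g = rng.standard_normal(200_000)                        # Gaussian: both criteria ~ 0
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), 200_000)   # uniform with unit variance

print(emp_kurtosis(g))      # ~ 0
print(emp_kurtosis(u))      # ~ -1.2 (sub-Gaussian)
print(emp_negentropy(g))    # ~ 0
```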
Nadakuditi, R.R., Wu, H. Free Component Analysis: Theory, Algorithms and Applications. Found Comput Math 23, 973–1042 (2023). https://doi.org/10.1007/s10208-022-09564-w