Abstract
Multivariate density estimation is an important problem frequently encountered in statistical learning and signal processing. One of the most popular techniques is Parzen windowing, also referred to as kernel density estimation. Gaussianization is a procedure that allows one to estimate a multivariate density efficiently from the marginal densities of the individual random variables. In this paper, we present an optimal density estimation scheme that combines the desirable properties of Parzen windowing and Gaussianization, using minimum Kullback–Leibler divergence as the optimality criterion for selecting the kernel size in the Parzen windowing step. The utility of the estimate is illustrated in classifier design, independent component analysis, and Price's theorem.
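To make the recipe in the abstract concrete, the following is a minimal sketch of one plausible reading of the scheme, not the paper's exact algorithm: each marginal is estimated with a Gaussian-kernel Parzen window whose kernel size is chosen by leave-one-out maximum likelihood (a standard proxy for the minimum Kullback–Leibler criterion, since KL(p‖p̂) equals a constant minus the expected log likelihood under p̂); each coordinate is then Gaussianized via z_i = Φ⁻¹(F̂_i(x_i)), the transformed data are modeled as jointly Gaussian, and the joint density follows from the change-of-variables formula. All function names, the candidate bandwidth grid, and the selection rule are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def parzen_marginal(samples, sigma):
    """Gaussian-kernel Parzen estimates of a 1-D pdf and cdf."""
    def pdf(x):
        return norm.pdf((np.asarray(x)[..., None] - samples) / sigma).mean(-1) / sigma
    def cdf(x):
        return norm.cdf((np.asarray(x)[..., None] - samples) / sigma).mean(-1)
    return pdf, cdf

def loo_log_likelihood(samples, sigma):
    """Leave-one-out log likelihood of a Parzen estimate; maximizing it over
    sigma approximates the minimum-KL choice of kernel size."""
    diff = samples[:, None] - samples[None, :]
    k = norm.pdf(diff / sigma) / sigma
    np.fill_diagonal(k, 0.0)                      # exclude each point from its own estimate
    return np.log(k.sum(axis=1) / (len(samples) - 1)).sum()

def fit_gaussianized_density(X, sigma_grid):
    """X: (n, d) samples, d >= 2. Returns a callable joint log-density estimate."""
    n, d = X.shape
    pdfs, cdfs = [], []
    for i in range(d):
        # kernel size per marginal via leave-one-out maximum likelihood
        best = max(sigma_grid, key=lambda s: loo_log_likelihood(X[:, i], s))
        p, c = parzen_marginal(X[:, i], best)
        pdfs.append(p)
        cdfs.append(c)
    eps = 1e-12
    # Gaussianize each coordinate: z_i = Phi^{-1}(F_i(x_i))
    Z = np.column_stack([norm.ppf(np.clip(cdfs[i](X[:, i]), eps, 1 - eps))
                         for i in range(d)])
    C = np.cov(Z, rowvar=False)                   # joint Gaussian model of Z
    Cinv = np.linalg.inv(C)
    _, logdet = np.linalg.slogdet(C)

    def logpdf(x):
        x = np.asarray(x, dtype=float)
        z = norm.ppf(np.clip([cdfs[i](x[i]) for i in range(d)], eps, 1 - eps))
        # change of variables: p(x) = [N(z; 0, C) / prod_i phi(z_i)] * prod_i p_i(x_i)
        log_copula = -0.5 * (z @ Cinv @ z - z @ z + logdet)
        log_marginals = sum(np.log(pdfs[i](x[i])) for i in range(d))
        return log_copula + log_marginals
    return logpdf
```

As a usage example, `logp = fit_gaussianized_density(X, np.logspace(-2, 1, 20))` fits the estimate on samples `X`, after which `logp(x0)` evaluates the joint log density at a point `x0`; the grid of candidate kernel sizes is an arbitrary choice for illustration.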