Abstract
In pattern recognition, the mutual information (MI) between feature vectors and class labels is a suitable criterion for feature selection. Estimating MI in high-dimensional feature spaces, however, is problematic in terms of both computational load and accuracy. We propose an independent component analysis based MI estimation (ICA-MI) methodology for feature selection, which reduces the high-dimensional MI estimation problem to multiple one-dimensional MI estimation problems. The nonlinear ICA transformation is achieved by piecewise local linear approximation on partitions of the feature space, which exploits the additivity property of entropy and the simplicity of linear ICA algorithms. The number of partitions controls the tradeoff between a more accurate approximation of the nonlinear data topology and small-sample statistical variation in the estimates. We test the ICA-MI feature selection framework on synthetic, UCI repository, and EEG activity classification problems. The experiments demonstrate, as expected, that the choice of the number of partitions for local linear ICA is highly problem dependent and must be made through cross validation. When this is done properly, the proposed ICA-MI framework yields feature rankings comparable to those of the optimal probability-of-error based feature ranking and selection strategy, at a much lower computational load.
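To make the core idea concrete, below is a minimal numpy sketch of the per-dimension MI ranking step described in the abstract: after the (assumed, already applied) ICA rotation maps features to approximately independent components, each component's MI with the class label is estimated one dimension at a time and the components are ranked by that score. The function names (`mi_feature_class`, `rank_features`), the histogram-based MI estimator, and the bin count are illustrative assumptions, not the paper's exact estimator; the local linear partitioning is likewise omitted here for brevity.

```python
import numpy as np

def mi_feature_class(feature, labels, bins=10):
    """Histogram estimate of I(Y; C) for one scalar feature Y and class label C.

    This is a simple plug-in estimator used for illustration; the paper's
    estimator may differ.
    """
    n = len(feature)
    edges = np.histogram_bin_edges(feature, bins=bins)
    # Map each sample to a bin index in 0..bins-1 using the interior edges.
    y = np.clip(np.digitize(feature, edges[1:-1]), 0, bins - 1)
    p_y = np.bincount(y, minlength=bins) / n  # marginal P(y)
    mi = 0.0
    for c in np.unique(labels):
        mask = labels == c
        p_c = mask.mean()                               # marginal P(c)
        p_yc = np.bincount(y[mask], minlength=bins) / n  # joint P(y, c)
        nz = p_yc > 0
        mi += np.sum(p_yc[nz] * np.log(p_yc[nz] / (p_y[nz] * p_c)))
    return mi

def rank_features(Z, labels, bins=10):
    """Rank columns of Z (assumed ICA-rotated components) by estimated MI
    with the class labels, highest first."""
    scores = np.array([mi_feature_class(Z[:, j], labels, bins)
                       for j in range(Z.shape[1])])
    order = np.argsort(scores)[::-1]
    return order, scores
```

As a quick sanity check on synthetic data, a component whose mean is shifted by the class label should rank above a pure-noise component, since its one-dimensional MI with the label is substantially larger.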
Cite this article
Lan, T., Erdogmus, D. Maximally Informative Feature and Sensor Selection in Pattern Recognition Using Local and Global Independent Component Analysis. J VLSI Sign Process Syst Sign Im 48, 39–52 (2007). https://doi.org/10.1007/s11265-006-0026-5