
Maximally Informative Feature and Sensor Selection in Pattern Recognition Using Local and Global Independent Component Analysis

Published in: The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology

Abstract

In pattern recognition, a suitable criterion for feature selection is the mutual information (MI) between feature vectors and class labels. Estimating MI in high-dimensional feature spaces is problematic in terms of both computational load and accuracy. We propose an independent component analysis based MI estimation (ICA-MI) methodology for feature selection, which reduces the high-dimensional MI estimation problem to multiple one-dimensional MI estimation problems. The nonlinear ICA transformation is approximated piecewise by linear ICA on partitions of the feature space, which exploits the additivity of entropy and the simplicity of linear ICA algorithms. The number of partitions controls the tradeoff between a more accurate approximation of the nonlinear data topology and small-sample statistical variation in the estimates. We test the ICA-MI feature selection framework on synthetic, UCI repository, and EEG activity classification problems. As expected, the experiments demonstrate that the best number of partitions for local linear ICA is highly problem dependent and must be determined through cross-validation. When this is done, the proposed ICA-MI framework yields feature rankings comparable to those of the optimal probability-of-error based feature ranking and selection strategy, at a much lower computational load.
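Two facts make the reduction described above work: mutual information is invariant under invertible transformations, and entropy is additive over independent variables. If a (locally) linear ICA transform renders the components y = Wx approximately independent within each class, then I(x; C) = I(y; C) ≈ Σ_i I(y_i; C), so one d-dimensional estimate becomes d one-dimensional ones. The Python sketch below is our illustration of that pipeline, not the authors' code: KMeans stands in for the feature-space partitioning, scikit-learn's FastICA for the per-partition linear ICA, and mutual_info_classif for the one-dimensional MI estimator (the paper's own estimators may differ), and the function name ica_mi_estimate and its parameters are hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import FastICA
    from sklearn.feature_selection import mutual_info_classif

    def ica_mi_estimate(X, y, n_partitions=4, seed=0):
        """Illustrative estimate of I(X; C) via locally linear ICA-MI.

        X: (n, d) feature matrix; y: class labels; n_partitions: number of
        local linear ICA patches (n_partitions=1 gives global linear ICA).
        """
        n, d = X.shape
        # 1) Partition the feature space (k-means is one possible choice).
        parts = KMeans(n_clusters=n_partitions, n_init=10,
                       random_state=seed).fit_predict(X)
        total = 0.0
        for k in range(n_partitions):
            Xk, yk = X[parts == k], y[parts == k]
            # Skip partitions too small for a stable local ICA fit.
            if len(yk) < 5 * d or len(np.unique(yk)) < 2:
                continue
            # 2) Linear ICA on this partition: a piecewise-linear
            #    approximation of the nonlinear ICA transform. This
            #    invertible map leaves MI with the class label unchanged.
            #    (whiten="unit-variance" requires scikit-learn >= 1.1.)
            Zk = FastICA(n_components=d, whiten="unit-variance",
                         random_state=seed).fit_transform(Xk)
            # 3) Additivity: with approximately independent components, the
            #    joint MI decomposes into a sum of 1-D MI estimates.
            mi_k = mutual_info_classif(Zk, yk, random_state=seed).sum()
            # 4) Weight by the partition's sample mass.
            total += (len(yk) / n) * mi_k
        return total

Features or sensors could then be ranked by, e.g., greedy forward selection, re-running ica_mi_estimate as each candidate is added, with n_partitions chosen by cross-validation as the abstract prescribes.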



Author information

Correspondence to Tian Lan.


Cite this article

Lan, T., Erdogmus, D. Maximally Informative Feature and Sensor Selection in Pattern Recognition Using Local and Global Independent Component Analysis. J VLSI Sign Process Syst Sign Im 48, 39–52 (2007). https://doi.org/10.1007/s11265-006-0026-5
