
Extraction of independent discriminant features for data with asymmetric distribution

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Standard unsupervised linear feature extraction methods find orthonormal (PCA) or statistically independent (ICA) latent variables that are well suited to data representation. Such representative features, however, may not be optimal for classification, which motivates a search for linear projections that yield a good discriminative model. A semi-supervised linear feature extraction method, dICA, was recently proposed that jointly maximizes the Fisher linear discriminant (FLD) and the negentropy of the extracted features [Dhir and Lee in Discriminant independent component analysis. In: Proceedings of the international conference intelligent data engineering and automated learning, LNCS 5788:219–225 (Full paper is submitted to IEEE Trans. NN) 2009]. Because the extracted dICA features are independent and have unit covariance, maximizing the determinant of their between-class scatter matrix is theoretically equivalent to maximizing the FLD criterion, which also reduces the computational complexity of the algorithm. In this paper, we concentrate on text databases, whose features inherently follow an exponential distribution. The approximation and maximization of negentropy for data with asymmetric distributions are discussed. Experiments on text categorization show improvements in classification performance and data reconstruction.
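To make the objective concrete, the sketch below illustrates one way the two terms described in the abstract could be combined, assuming the data are pre-whitened so the extracted features have unit covariance. This is an assumption-laden illustration, not the authors' implementation: the function names (dica_objective, between_class_scatter, negentropy_skewness), the trade-off weight alpha, the skewness-based negentropy approximation, and the ridge term added before the log-determinant are all choices made here for demonstration.

import numpy as np

def between_class_scatter(Z, labels):
    """Between-class scatter matrix of the extracted features Z (n_samples x k)."""
    labels = np.asarray(labels)
    mu = Z.mean(axis=0)
    Sb = np.zeros((Z.shape[1], Z.shape[1]))
    for c in np.unique(labels):
        Zc = Z[labels == c]
        d = (Zc.mean(axis=0) - mu)[:, None]           # class-mean deviation, k x 1
        Sb += Zc.shape[0] * (d @ d.T)
    return Sb / Z.shape[0]

def negentropy_skewness(Z):
    """Skewness-based negentropy approximation, J(y) ~ (1/12) E[y^3]^2, summed over
    components; a contrast suited to asymmetric (e.g., exponential-like) features
    and zero for Gaussian data. Assumed here for illustration."""
    return np.sum(np.mean(Z ** 3, axis=0) ** 2) / 12.0

def dica_objective(W, X, labels, alpha=0.5):
    """Weighted sum of a discriminant term (log-determinant of the between-class
    scatter, an FLD surrogate when Z has unit covariance) and an independence
    term (summed negentropy of the components)."""
    Z = X @ W.T                                        # extracted features, n x k
    Sb = between_class_scatter(Z, labels)
    # Ridge added because Sb has rank at most (#classes - 1).
    _, logdet = np.linalg.slogdet(Sb + 1e-6 * np.eye(Sb.shape[0]))
    return alpha * logdet + (1.0 - alpha) * negentropy_skewness(Z)

In practice, W would be kept orthonormal after pre-whitening the data so that Z retains unit covariance, and the objective maximized by gradient ascent; these optimization details are likewise assumptions of this sketch rather than the procedure of the paper.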


References

  1. Hinton GE, Sejnowski TJ (1999) Unsupervised learning: foundations of neural computation. MIT Press, Cambridge

  2. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cognitive Neurosci 3(1): 71–86

  3. Hyvärinen A, Karhunen J, Oja E (2001) Independent component analysis. John Wiley and Sons, Inc, New York

  4. Bell AJ, Sejnowski TJ (1995) An information-maximization approach to blind separation and blind deconvolution. Neural Comput 7(6): 1129–1159

  5. Bartlett MS, Movellan JR, Sejnowski TJ (2002) Face recognition by independent component analysis. IEEE Trans Neural Networks 13(6): 1450–1464

  6. Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Networks 5(4): 537–550

  7. Kwak N, Choi CH (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans Pattern Anal Mach Intell 24(12): 1667–1671

  8. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8): 1226–1238

  9. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5: 1205–1224

  10. Huang DS et al (2006) Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22(15): 1855–1862

  11. El Akadi A et al (2010) A two-stage gene selection scheme utilizing MRMR filter and GA wrapper. Knowl Inf Syst

  12. Dhir CS, Lee SY (2008) Hybrid feature selection: combining Fisher criterion and mutual information for efficient feature selection. In: Advances in neural information processing, LNCS 5506:613–620

  13. Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2): 228–233

  14. Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Face recognition using LDA-based algorithms. IEEE Trans Neural Networks 14(1): 195–200

  15. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. John Wiley and Sons, Inc, New York

  16. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press, London

  17. Liu K, Cheng YQ, Yang JY, Liu X (1992) An efficient algorithm for Foley–Sammon optimal set of discriminant vectors by algebraic method. Int J Pattern Recognit Artif Intell 6(5): 817–829

  18. Dhir CS, Lee SY (2009) Discriminant independent component analysis. In: Proceedings of the international conference on intelligent data engineering and automated learning, LNCS 5788:219–225 (Full paper is submitted to IEEE Trans. NN)

  19. Li T, Zhu S, Ogihara M (2006) Using discriminant analysis for multi-class classification: an experimental investigation. Knowl Inf Syst 10(4): 453–472

  20. Akaho S (2002) Conditionally independent component analysis for supervised feature extraction. Neurocomputing 49(1–4): 139–155

  21. Amato U et al (2003) Independent discriminant component analysis. Int J Math 3: 735–753

  22. Shinji U, Shotaro A, Yasuko S (1999) Supervised independent component analysis and its applications to face image. IEIC Tech Rep Inst Electron Inf Commun Eng 99(58): 9–16

  23. Slonim N, Tishby N (2000) Agglomerative information bottleneck. Adv Neural Inf Process Syst 12: 617–623

  24. Chong EKP, Zak SH (2001) An introduction to optimization, 2nd edn. John Wiley and Sons, Inc, New York

  25. Murillo JML, Rodriguez AA (2007) Maximization of mutual information for supervised linear feature extraction. IEEE Trans Neural Networks 18(5): 1433–1441

  26. Chen LF et al (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit 33(10): 1713–1726

  27. Quintero HJR (2006) Statistical feature extraction for machine learning-based text mining. MS thesis, KAIST

  28. Vilar D, Ney H, Juan A, Vidal E (2004) Effect of feature smoothing methods in text classification tasks. In: Proceedings of the 4th international workshop on pattern recognition in information systems, pp 108–117

  29. Wu X et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1): 1–37

  30. Vapnik VN (1995) The nature of statistical learning theory. Springer, New York

  31. Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge

Author information

Corresponding author

Correspondence to Soo-Young Lee.

About this article

Cite this article

Dhir, C.S., Lee, J. & Lee, SY. Extraction of independent discriminant features for data with asymmetric distribution. Knowl Inf Syst 30, 359–375 (2012). https://doi.org/10.1007/s10115-011-0381-9
