Abstract
Standard unsupervised linear feature extraction methods find orthonormal (PCA) or statistically independent (ICA) latent variables that are well suited to data representation. These representative features, however, may not be optimal for classification tasks, which motivates a search for linear projections that yield a good discriminative model. A semi-supervised linear feature extraction method, dICA, was recently proposed that jointly maximizes the Fisher linear discriminant (FLD) and the negentropy of the extracted features [Dhir and Lee, Discriminant independent component analysis, LNCS 5788:219–225, 2009]. Because the extracted dICA features are independent and have unit covariance, maximizing the determinant of their between-class scatter matrix is theoretically equivalent to maximizing the FLD; this also reduces the computational complexity of the algorithm. In this paper, we concentrate on text databases, whose features inherently follow an exponential distribution. The approximation and maximization of negentropy for data with asymmetric distributions are discussed. Experiments on text categorization show improvements in both classification performance and data reconstruction.
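To make the scatter-determinant objective concrete, the following is a minimal NumPy sketch, not the authors' implementation: it scores a projection by the log-determinant of the between-class scatter of the projected features (the FLD surrogate justified above for independent, unit-covariance features) plus a weighted sum of one-unit negentropy approximations. The function names, the weight lam, and the asymmetric contrast G(u) = -exp(-u) (one of the standard non-quadratic contrasts of Hyvärinen et al., chosen here because it is sensitive to skewed, exponential-like data) are all illustrative assumptions.

```python
import numpy as np

def between_class_scatter(Y, labels):
    """Between-class scatter S_b of a feature matrix Y (features x samples).
    Note S_b has rank at most (number of classes - 1), so the determinant
    is informative only when the projected dimension respects that bound."""
    labels = np.asarray(labels)
    mu = Y.mean(axis=1, keepdims=True)            # global mean of each feature
    Sb = np.zeros((Y.shape[0], Y.shape[0]))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        d = Yc.mean(axis=1, keepdims=True) - mu   # class-mean deviation
        Sb += Yc.shape[1] * (d @ d.T)
    return Sb / Y.shape[1]

def negentropy_asym(y, rng=np.random.default_rng(0)):
    """One-unit negentropy approximation J(y) ~ (E[G(y)] - E[G(nu)])^2,
    here with the asymmetric contrast G(u) = -exp(-u); this particular G
    is our assumption for skewed data, not necessarily the paper's choice."""
    nu = rng.standard_normal(100_000)             # Gaussian reference sample
    G = lambda u: -np.exp(-u)
    return (G(y).mean() - G(nu).mean()) ** 2

def dica_score(W, X, labels, lam=1.0):
    """Hypothetical combined objective: log det of the between-class scatter
    of the projected features plus a lam-weighted sum of negentropies."""
    Y = W @ X                                     # linear projection of the data
    fld_term = np.log(np.linalg.det(between_class_scatter(Y, labels)) + 1e-12)
    neg_term = sum(negentropy_asym(y) for y in Y) # one negentropy term per feature
    return fld_term + lam * neg_term
```

In a full method one would maximize such a score over the demixing matrix W, e.g. by gradient ascent with a whitening or orthonormality constraint; the sketch only evaluates the objective for a given W.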
References
Hinton GE, Sejnowski TJ (1999) Unsupervised learning: foundations of neural computation. MIT Press, Cambridge
Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 2(1): 71–86
Hyvärinen A, Karhunen J, Oja E (2001) Independent component analysis. John Wiley and Sons, Inc, New York
Bell AJ, Sejnowski TJ (1995) An information-maximization approach to blind separation and blind deconvolution. Neural Comput 7(6): 1129–1159
Bartlett MS, Movellan JR, Sejnowski TJ (2002) Face recognition by independent component analysis. IEEE Trans Neural Networks 13(6): 1450–1464
Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Networks 5(4): 537–550
Kwak N, Choi CH (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans Pattern Anal Mach Intell 24(12): 1667–1671
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8): 1226–1238
Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5: 1205–1224
Huang DS et al (2006) Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22(15): 1855–1862
El Akadi A et al (2010) A two-stage gene selection scheme utilizing MRMR filter and GA wrapper. Knowl Inf Syst
Dhir CS, Lee SY (2008) Hybrid feature selection: combining Fisher criterion and mutual information for efficient feature selection. In: Advances in neuro-information processing, LNCS 5506:613–620
Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2): 228–233
Lu J, Plataniotis KN, Venetsanopoulos AN (2003) Face recognition using LDA-based algorithms. IEEE Trans Neural Networks 14(1): 195–200
Duda RO, Hart PE, Stork DG (2001) Pattern classification. 2nd edn. John Wiley and Sons, Inc, New York
Fukunaga K (1990) Introduction to statistical pattern recognition. 2nd edn. Academic Press, London
Liu K, Cheng YQ, Yang JY, Liu X (1992) An efficient algorithm for Foley-Sammon optimal set of discriminant vectors by algebraic method. Int J Pattern Recognit Artif Intell 6(5): 817–829
Dhir CS, Lee SY (2009) Discriminant independent component analysis. In: Proceedings of the international conference on intelligent data engineering and automated learning, LNCS 5788:219–225 (full paper submitted to IEEE Trans Neural Networks)
Li T, Zhu S, Ogihara M (2006) Using discriminant analysis for multi-class classification: an experimental investigation. Knowl Inf Syst 10(4): 453–472
Akaho S (2002) Conditionally independent component analysis for supervised feature extraction. Neurocomputing 49(1–4): 139–155
Amato U et al (2003) Independent discriminant component analysis. Int J Math 3: 735–753
Shinji U, Shotaro A, Yasuko S (1999) Supervised independent component analysis and its applications to face image. IEICE Tech Rep Inst Electron Inf Commun Eng 99(58): 9–16
Slonim N, Tishby N (2000) Agglomerative information bottleneck. Adv Neural Inf Process Syst 12: 617–623
Chong EKP, Zak SH (2001) An introduction to optimization. 2nd edn. John Wiley and Sons, Inc, New York
Leiva-Murillo JM, Artés-Rodríguez A (2007) Maximization of mutual information for supervised linear feature extraction. IEEE Trans Neural Networks 18(5): 1433–1441
Chen LF et al (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit 33(10): 1713–1726
Quintero HJR (2006) Statistical feature extraction for machine learning-based text mining. MS Thesis, KAIST
Vilar D, Ney H, Juan A, Vidal E (2004) Effect of feature smoothing methods in text classification tasks. In: Proceedings of the 4th international workshop on pattern recognition in information systems, pp 108–117
Wu X et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1): 1–37
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge
Cite this article
Dhir CS, Lee J, Lee SY (2012) Extraction of independent discriminant features for data with asymmetric distribution. Knowl Inf Syst 30: 359–375. https://doi.org/10.1007/s10115-011-0381-9