Conditionally independent component analysis for supervised feature extraction
Introduction
Independent component analysis (ICA) has been successfully applied to data analysis, signal processing, and the modeling of coding in the brain, especially in the early visual system [6], [13]. So far, however, ICA has been restricted to unsupervised settings. In the present paper, we extend the framework of ICA to supervised learning.
As a direct application of this extension, we consider naive Bayes learning [10]. Despite its simplicity, the naive Bayes learning scheme performs well on many classification tasks, and is often significantly more accurate than more sophisticated methods. For the prediction of real values, however, the assumption of conditional independence degrades the performance. The present paper proposes a method to find conditionally independent components.
The features extracted by conditionally independent component analysis (CICA) can be regarded as independent components of the observations from which the parts explained by a target variable have been removed. This extension is therefore expected to enable feature extraction in supervised data analysis and the modeling of higher-level coding in the brain.
We begin by reviewing naive Bayes learning in Section 2. The main algorithm for finding the conditionally independent components is derived in Section 3. In Section 4, we summarize kernel density estimation, which is used both in CICA and in naive Bayes learning, and we give simple simulation results in Section 5. In Section 6, we present a dimension-reduction method, based on canonical correlation analysis, for dealing with high-dimensional observations. Finally, we conclude the paper with some discussion and suggestions for future work in Section 7.
Section snippets
Naive Bayes learning
Let us consider the problem of predicting a target value y for an input vector with d elements $x_1,\ldots,x_d$. If the joint probability density function $p(x_1,\ldots,x_d,y)$ is known, we can choose y so as to minimize the expected prediction error. However, this density is not usually known, and it must be estimated from a set of training samples. If the dimensionality d is large, the learning is considered to be difficult, which is often called the 'curse of dimensionality' in learning theory. Naive Bayes learning solves this
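To make the factorization concrete, here is a minimal sketch (not the paper's code; the function names, bandwidths, and the grid search over y are our own assumptions) of naive Bayes prediction of a real-valued target, where every factor is a one-dimensional kernel density so that no d-dimensional density ever has to be estimated:

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def naive_bayes_predict(x_new, X, y, y_grid, h_x=0.3, h_y=0.3):
    """Return the y on y_grid maximizing p(y) * prod_i p(x_i | y)."""
    # p(y) on the grid, estimated by a 1-D kernel density over the samples
    k_y = gauss((y_grid[:, None] - y[None, :]) / h_y) / h_y   # (grid, n)
    p_y = k_y.mean(axis=1)                                    # (grid,)
    log_post = np.log(p_y + 1e-300)
    for i in range(X.shape[1]):
        # joint p(x_i, y) via a product kernel, then p(x_i | y) = p(x_i, y) / p(y)
        k_x = gauss((x_new[i] - X[:, i]) / h_x) / h_x         # (n,)
        p_xy = (k_y * k_x[None, :]).mean(axis=1)              # (grid,)
        log_post += np.log(p_xy / (p_y + 1e-300) + 1e-300)
    return y_grid[np.argmax(log_post)]
```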
Problem
We assume that there is an unknown vector of source signals $s = (s_1,\ldots,s_d)^\top$ whose elements are mutually independent conditionally on a given y, $p(s \mid y) = \prod_{i=1}^{d} p(s_i \mid y)$. The validity of the assumption is discussed in Section 3.4. The observation x is considered to be a linear mixture of s. For the sake of simplicity, we assume $\dim x = \dim s = d$. The case $\dim x > \dim s$ is dealt with as a dimension reduction problem in Section 6. If $\dim x = \dim s$, the observation is generated by $x = As$, where A is an unknown nonsingular matrix. Without
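A hypothetical generator matching this model may make it concrete; the particular y-dependent source distributions and the mixing matrix below are our own choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(n=1000):
    """Sources independent given y, observed through an unknown mixing A."""
    y = rng.uniform(-1.0, 1.0, size=n)
    # given y, the two sources are independent (y-dependent means plus
    # independent noise); marginally they are correlated through y
    s1 = y + 0.3 * rng.standard_normal(n)
    s2 = np.sin(np.pi * y) + 0.3 * rng.standard_normal(n)
    s = np.stack([s1, s2], axis=1)
    A = np.array([[1.0, 0.6], [-0.4, 1.0]])  # nonsingular mixing matrix
    x = s @ A.T                              # observation x = A s
    return x, y
```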
Kernel density estimation
Suppose we have n training samples $\{(x^{(t)}, y^{(t)})\}_{t=1}^{n}$. Let $\hat{W}$ denote an estimate of the demixing matrix W, and let $z^{(t)}$ denote the value of $x^{(t)}$ transformed by $\hat{W}$, i.e. $z^{(t)} = \hat{W} x^{(t)}$.
Kernel density estimation is a traditional nonparametric method [14], which is written for the joint distribution of $z_i$ and y in the form
$$\hat{p}(z_i, y) = \frac{1}{n\, h_i h_y} \sum_{t=1}^{n} K\!\left(\frac{z_i - z_i^{(t)}}{h_i}\right) K\!\left(\frac{y - y^{(t)}}{h_y}\right),$$
where K is a kernel function, and $h_i$ and $h_y$ are parameters defining the kernel widths for $z_i$ and y.
A typical choice of K is the Gaussian kernel $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$.
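A direct transcription of this estimator (a sketch; the function names and bandwidth defaults are assumptions):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_joint(z_i, y, z_train, y_train, h_i=0.3, h_y=0.3):
    """Product-kernel estimate of p(z_i, y) from n pairs (z_i^(t), y^(t))."""
    u = gaussian_kernel((z_i - z_train) / h_i)
    v = gaussian_kernel((y - y_train) / h_y)
    return np.mean(u * v) / (h_i * h_y)
```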
Simulations
We show simple simulation results to confirm that the proposed algorithm works. In this experiment, we only investigate the case of $\dim x = \dim s$ in order to concentrate on the performance of CICA.
Dimension reduction
In this section, we study the case where the dimensionality of the observations is larger than that of the sources, i.e. $\dim x > \dim s$. In order to apply the CICA algorithm, we reduce the dimensionality of x by a linear transformation $\tilde{x} = Bx$, where B is a rectangular matrix and $\tilde{x}$ is the transformed vector. There is no problem if the rank of B is equal to $\dim x$; otherwise, how should we choose B? There are two requirements for B: one is that $\tilde{x}$ should preserve as much information on y as possible; the other is that applying
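Since the outline states that B is chosen via canonical correlation analysis, a rough sketch may look as follows. The nonlinear feature map of y is our own assumption (with a scalar y alone, plain CCA would give only a single direction), so this illustrates the idea rather than the paper's construction:

```python
import numpy as np
from scipy.linalg import eigh

def cca_reduce(X, y, d):
    """Pick d directions of x most correlated with features of y (sketch)."""
    # assumed feature map of y; only Phi.shape[1] directions are informative
    Phi = np.stack([y, y**2, np.sin(np.pi * y)], axis=1)
    Xc = X - X.mean(axis=0)
    Pc = Phi - Phi.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + 1e-6 * np.eye(Xc.shape[1])  # regularized covariances
    Cpp = Pc.T @ Pc / n + 1e-6 * np.eye(Pc.shape[1])
    Cxp = Xc.T @ Pc / n
    # canonical directions in x-space: generalized symmetric eigenproblem
    M = Cxp @ np.linalg.solve(Cpp, Cxp.T)
    _, V = eigh(M, Cxx)          # eigenvalues in ascending order
    B = V[:, ::-1][:, :d].T      # top-d directions as rows, shape (d, dim x)
    return Xc @ B.T, B
```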
Concluding remarks
In the present paper, we have proposed an extended framework of ICA for supervised learning and applied it to naive Bayes learning. It has been shown that minimizing the cost function reduces to maximizing the independence of the extracted features together with the sum of the mutual information between each extracted feature and the target variable. The framework provides a novel method of extracting independent factors embedded in the input variables by removing the parts explained by the target.
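To spell out the decomposition referred to here (our reconstruction from the statement above, writing z = Wx and measuring the conditional dependence of z given y by its total correlation):

```latex
\begin{align*}
C(W) &= \sum_{i} H(z_i \mid y) \;-\; H(z \mid y) \\
     &= \sum_{i} H(z_i) \;-\; \log\lvert\det W\rvert
        \;-\; \sum_{i} I(z_i; y) \;-\; H(x) \;+\; I(x; y),
\end{align*}
```

using $H(z_i \mid y) = H(z_i) - I(z_i; y)$, $H(z \mid y) = H(z) - I(z; y)$, $H(z) = H(x) + \log\lvert\det W\rvert$, and the invariance $I(z; y) = I(x; y)$ under the invertible map W. Since H(x) and I(x; y) do not depend on W, minimizing C(W) amounts to the standard ICA criterion (minimizing $\sum_i H(z_i) - \log\lvert\det W\rvert$) together with maximizing $\sum_i I(z_i; y)$, consistent with the statement above.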
The extension to
Acknowledgements
The author wishes to acknowledge two anonymous reviewers for their valuable suggestions.
References (16)
- A.J. Bell et al., The 'independent components' of natural scenes are edge filters, Vision Res. (1997)
- S. Akaho, Y. Kiuchi, S. Umeyama, MICA: multimodal independent component analysis, Proceedings of the IJCNN, 1999, pp. ...
- T.W. Anderson, An Introduction to Multivariate Statistical Analysis (1984)
- S. Amari, Natural gradient works efficiently in learning, Neural Comput. (1998)
- S. Becker, Mutual information maximization: models of cortical self-organization, Network: Comput. Neural Systems (1996)
- S. Becker et al., Learning mixture models of spatial coherence, Neural Comput. (1993)
- J.M. Chambers, T.J. Hastie (Eds.), Statistical Models in S, Wadsworth and Brooks, Pacific Grove, CA, ...
- R.G. Cowell et al., Probabilistic Networks and Expert Systems (1999)
Cited by (17)
- SKICA: A feature extraction algorithm based on supervised ICA with kernel for anomaly detection, 2019, Journal of Intelligent and Fuzzy Systems
- Training data reduction in deep neural networks with partial mutual information based feature selection and correlation matching based active learning, 2017, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
- Independent component analysis of edge information for face recognition, 2014, SpringerBriefs in Applied Sciences and Technology
- Separation of instantaneous mixtures of a particular set of dependent sources using classical ICA methods, 2013, Eurasip Journal on Advances in Signal Processing
- Extraction of independent discriminant features for data with asymmetric distribution, 2012, Knowledge and Information Systems
Shotaro Akaho received the B.Eng., M.Eng. and Ph.D. degree in Mathematical Engineering from the University of Tokyo in 1988, 1990 and 2001 respectively. From 1990 to 2001, he worked as a researcher in Electrotechnical Laboratory (MITI Japan). Since April 2001, he has been working as a group leader of Mathematical Neuroinformatics Group of Neuroscience Research Institute of AIST (The National Institute of Advanced Industrial Science and Technology). His research interests include statistical learning theory and its applications.