A novel multi-view learning developed from single-view patterns
Highlights
►This paper develops a new multi-view learning (MVL) framework for single-source patterns. ► We first reshape the vector representation of patterns into multiple matrix representations. ► We then propose one joint rather than separate learning process for the resulting matrices. ► Experiments show the feasibility and effectiveness of the proposed MVL.
Introduction
It is well known that integrating prior knowledge about the patterns at hand is important in designing classifiers [8]. In practice, patterns can generally be obtained from a single information source or from multiple ones. If each information source is taken as one view, there are accordingly two kinds of patterns, i.e. single-view patterns and multi-view patterns.1 Correspondingly, learning based on single-view and multi-view patterns can be called single-view learning (SVL) and multi-view learning (MVL), respectively. It has been proven that co-training, one typical MVL approach, has a generalization ability superior to SVL [9]. Co-training learns on both labeled and unlabeled pattern sets, where both sets are composed of two naturally split attribute sets. Each attribute set is called one view of the patterns. In implementation, the co-training algorithm requires that the two views be conditionally independent given the class labels. This independence assumption is guaranteed when the patterns are composed of two naturally split attribute sets.
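As an illustration of the co-training idea described above, the following sketch runs a few co-training rounds on synthetic data. The nearest-centroid per-view learners, the two generated views, and all parameter values are illustrative assumptions for this sketch, not the exact algorithm of [9]:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y):
    # A deliberately simple per-view learner: one centroid per class.
    return X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)

def score(model, X):
    # Positive score means "closer to the +1 centroid".
    c_pos, c_neg = model
    return np.linalg.norm(X - c_neg, axis=1) - np.linalg.norm(X - c_pos, axis=1)

# Toy data with two (conditionally independent) views of the same patterns.
n = 200
y = np.tile([1.0, -1.0], n // 2)
view1 = y[:, None] + rng.normal(0.0, 0.7, (n, 2))
view2 = y[:, None] + rng.normal(0.0, 0.7, (n, 2))

labeled = list(range(10))            # only 10 labels are revealed
unlabeled = list(range(10, n))
pseudo = np.zeros(n)
pseudo[labeled] = y[labeled]

for _ in range(5):                   # a few co-training rounds
    for view in (view1, view2):
        model = fit_centroids(view[labeled], pseudo[labeled])
        s = score(model, view[unlabeled])
        # each view pseudo-labels the unlabeled patterns it is most sure of
        for i in np.argsort(-np.abs(s))[:5]:
            idx = unlabeled[i]
            pseudo[idx] = np.sign(s[i])
            labeled.append(idx)
        unlabeled = [j for j in unlabeled if j not in labeled]

final = np.sign(score(fit_centroids(view1[labeled], pseudo[labeled]), view1))
accuracy = np.mean(final == y)
```

Each round, each view confidently pseudo-labels a few unlabeled patterns for the benefit of both views, which is the bootstrapping behavior the text refers to.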
In this paper, we expand the existing MVL to single-view patterns and thus develop a novel MVL framework, whose underlying motivations are:
- •
Patterns can be sorted into single-view and multi-view patterns according to the number M of information sources [9], [10], [11]. However, in most real-world applications only single-view patterns are available, i.e. M equals one. In that case, the existing MVL framework cannot work effectively since there is no natural way to partition the attribute space [8], [10], [11], [12]. This fact motivates us to develop a new MVL framework, which is expected to create multiple different views from single-view patterns and then to learn on the generated views simultaneously.
- •
In the existing MVL framework, multi-view patterns are represented by multiple independent sets of attributes, and the base algorithms have the same architecture in each view so that they can iteratively bootstrap each other. Here, we wish to exploit the multi-view technique due to its generalization superior to that of SVL. However, different from the existing MVL on multi-view patterns, we give a new multi-view viewpoint for a given base classifier on single-view patterns. Concretely, we change the original architecture of the given base classifier and thus obtain a set of sub-classifiers whose architectures differ from each other. Each derived sub-classifier can be taken as one view of the original base classifier, which yields a set of sub-classifiers with multiple views. For all the derived sub-classifiers, we further adopt a joint rather than separate learning process. Therefore, a new learning algorithm is developed for these multi-view sub-classifiers, which minimizes the disagreement between the outputs of the derived classifiers on the same patterns.
In practice, we select the vector-pattern-oriented linear classifier as the base classifier discussed above. Before being classified, any pattern, whatever its original form, must be transformed into a vector representation in the vectorial case [33]. However, it is not always efficient to construct a vector-pattern-oriented classifier, since the vectorization of patterns such as images may lead to high computational cost and a loss of spatial information [21], [23], [26], [34], [40]. To overcome this disadvantage, we proposed a matrix-pattern-oriented Ho–Kashyap classifier named MatMHKS [21], [40] in our previous work. MatMHKS is a matrixized version of the vectorial Ho–Kashyap classifier with regularization learning (MHKS) [20]. The literature [21], [23], [34], [40] has demonstrated the significant advantages of matrixized classifier design in terms of both classification and computational performance.
The discriminant function of the vectorial MHKS is given as

g(x) = w^T x + w0, (1)

where x ∈ R^d is a vector pattern, w ∈ R^d is a weight vector, and w0 is a bias. Correspondingly, the discriminant function of MatMHKS is given as

g(A) = u^T A v + v0, (2)

where A ∈ R^(m×n) is a matrix pattern, u ∈ R^m and v ∈ R^n are the two weight vectors, and v0 is a bias. It can be found that for a given pattern there is only one vector-form representation in formulation (1), but multiple matrix-form representations with different dimensional sizes of m and n in formulation (2). In other words, there are multiple ways of reshaping the vector into a matrix. For instance, a vector x = (x1, x2, x3, x4)^T could be assembled into two different matrices, such as the 2×2 matrix [x1 x2; x3 x4] and the 4×1 matrix [x1; x2; x3; x4]. Consequently, only one MHKS can be created for classifying the given pattern x. In contrast, multiple MatMHKSs can be created for the same task due to the multiple ways of reshaping a vector into a matrix. Therefore, for the same classification problem, the solution set of the single MHKS corresponds to the solution sets of multiple MatMHKSs, whose weight sets differ from each other in dimensional size but share the common discriminant function form g(A) = u^T A v + v0. Here, MHKS is viewed as the base classifier and each MatMHKS is taken as one view of the base MHKS. Our previous work [21] has validated that each MatMHKS provides one hypothesis and exhibits one representation of the original pattern. Thus multiple MatMHKSs can complement each other in classification due to their different representations of the patterns. In order to achieve this complementarity, we syncretize the learning processes of multiple MatMHKSs into one single process. In this case, each MatMHKS is expected to correctly classify a given pattern with the same attributes, while the disagreement between the outputs of all MatMHKSs is minimized. As a result, a single learning process is produced and a multi-view-combined classifier named MultiV-MHKS is proposed.
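A minimal sketch of the two discriminant forms above; the 4-dimensional toy pattern and the all-ones weights are assumptions for demonstration only:

```python
import numpy as np

# A 4-dimensional toy pattern; each admissible reshaping is one "view".
x = np.array([1.0, 2.0, 3.0, 4.0])
A = x.reshape(2, 2)            # one matrix-form representation (2x2)

# Vector discriminant of MHKS: g(x) = w^T x + w0
w, w0 = np.ones(4), 0.5
g_vec = w @ x + w0             # 10.5

# Matrix discriminant of MatMHKS: g(A) = u^T A v + v0,
# with u in R^m, v in R^n fixed by the chosen reshaping (here m = n = 2).
u, v, v0 = np.ones(2), np.ones(2), 0.5
g_mat = u @ A @ v + v0         # 10.5 -- all-ones weights make the views agree
```

Note how the matrix view replaces the d free parameters of w with only m + n parameters of u and v, which is the source of MatMHKS's computational advantage.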
Through a Rademacher complexity analysis, we demonstrate that the proposed multi-view MultiV-MHKS has a tighter generalization risk bound than the single-view MHKS.
- •
The proposed MultiV-MHKS algorithm offers a neat way to solve the view selection problem of MatMHKS [21]. In MatMHKS, it is always a problem to select the best matrix form reshaped from a given vector pattern. This paper suggests a way to bypass it by choosing all the relevant matrix forms and optimizing over them jointly. It is known that, when moving from a vector pattern as the input of MHKS to a matrix as the input of MatMHKS, the classification performance of MatMHKS relies on the particular reshaping or matrixization way [21], [40]. In the process of matrixizing a vector, different reshaping ways induce multiple matrix patterns with different row and column dimensions. Consequently, different reshaping ways result in different classification performances of MatMHKS on the same vector patterns. For the best performance, one then has to choose among the multiple reshaping ways with the cross-validation technique at a high computational cost [21]. Since the proposed MultiV-MHKS simultaneously considers multiple MatMHKSs with multiple matrices, the choice of matrixization ways can be avoided to a great extent.
- •
The proposed MultiV-MHKS algorithm adopts data representation in multiple views, which differs from the other main strategies for creating good classifier ensembles: sampling either pattern sets or attribute (interchangeably, feature) sets [13], [14], [48]. Compared with sampling pattern sets or feature sets, the proposed multi-view classifier design provides a novel alternative approach to producing multiple data sets for base learners, i.e. reshaping a vector pattern into different matrix patterns with the same full feature set. In this case, the proposed multi-view classifier has advantages in terms of the actual number of unique samples, the size of the feature set, and the representations, which brings about the superior performance of the proposed MVL. In addition, different from the strategy of sampling pattern sets or feature sets, the proposed MVL employs joint optimization rather than separate learning in the training process. To the best of our knowledge, the proposed strategy of generating multiple training data sets for the base classifier is novel. The experimental results reported here also show that the proposed MultiV-MHKS algorithm has a classification performance superior to the other ensemble strategies.
We highlight the contributions of this paper as follows:
- •
Significance: This paper introduces the creation of multiple views from a single view for multi-view learning. Although the existing MVL has been shown to be effective in the literature [8], [10], [11], [12], it still relies heavily on naturally separating the feature set into two independent components. In many settings there may be no natural way to partition the feature space, and thus the existing MVL framework may not be applicable. In such a scenario, the approach suggested in this paper can potentially create multiple independent, or at least weakly correlated, views from a single view and then learn from the generated views simultaneously.
- •
Novelty in two aspects: First, the learning approach proposed in this paper differs from the existing multi-view learning approach. Instead of classifiers trained on two views iteratively bootstrapping each other, this paper proposes a joint learning approach that minimizes the disagreement across the classifiers with multiple views. There are some similarities with ensemble learning, in which the predictions of different sub-classifiers over a single view are combined; in contrast, the critical difference of the proposed MVL lies in the joint optimization. Second, compared with the typical ensemble models, i.e. Bagging or Boosting based on pattern sampling [13], [48] and Attribute Bagging based on attribute sampling [14], our strategy neither samples patterns nor samples attributes; instead, it reshapes the original pattern set into matrix pattern sets multiple times. Each reshaping develops a corresponding sub-classifier, and the sub-classifiers are then syncretized together, which leads to a performance gain.
- •
Generalization: The proposed MVL is a wrapper technique and is not restricted to the MHKS classifier; it acts like the state-of-the-art kernelization technique applied to linear algorithms. The proposed multi-view-combined learning falls into the following framework:

min J = Jind + c Jcom, where Jind = Σ_{p=1}^{M} Ip(fp), c ≥ 0, and Jcom penalizes the disagreement between the outputs of the machines fp, p = 1, …, M.

Jind denotes that the M learning machines fp are trained according to their respective criteria Ip. Jcom makes the M machines fp, corresponding to the M views with common labels, achieve as much agreement in their outputs as possible, and thereby seeks the complementarity between the M learning machines fp. When the individual machine adopts the matrix-form discriminant function g of MatMHKS in practice, the learning framework becomes the proposed algorithm MultiV-MHKS.
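The framework above can be sketched numerically. In the sketch below, two linear re-encodings of the same single-view data stand in for two matrixized views, the individual criteria Ip are squared losses, and Jcom is a squared disagreement penalty; these concrete choices, together with the coupling weight c, the learning rate, and the synthetic data, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 60, 4
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.25]))   # linearly generated labels

# Two hypothetical "views": invertible linear re-encodings of the same
# single-view patterns stand in here for two matrixizations.
X1 = X
X2 = X * np.array([1.0, -1.0, 1.0, -1.0])

c, lr = 1.0, 0.05                 # coupling weight and step size (assumed)
w1, w2 = np.zeros(d), np.zeros(d)
for _ in range(1000):
    f1, f2 = X1 @ w1, X2 @ w2
    # gradient of (||f1-y||^2 + ||f2-y||^2 + c*||f1-f2||^2) / (2n)
    w1 -= lr * (X1.T @ ((f1 - y) + c * (f1 - f2))) / n
    w2 -= lr * (X2.T @ ((f2 - y) + c * (f2 - f1))) / n

f1, f2 = X1 @ w1, X2 @ w2
agreement = np.mean(np.sign(f1) == np.sign(f2))     # Jcom drives this up
accuracy = np.mean(np.sign(f1) == y)                # Jind drives this up
```

The coupling term c·Jcom is what distinguishes this joint optimization from training the two machines separately: each view's gradient contains the other view's output.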
The rest of this paper is organized as follows. Section 2 discusses the related work on the multi-view learning. Section 3 gives how to create multiple pattern representations from single-view patterns. Section 4 introduces the multi-view viewpoint into MHKS and MatMHKS, and further gives the description about the structure of the proposed multi-view-combined classifier MultiV-MHKS. The experiments in Section 5 have demonstrated the feasibility and effectiveness of the proposed MVL. Following that, both conclusion and future work are given in Section 6.
Section snippets
Related work
One typical example of the existing MVL is web-page classification [9], where each web page can be represented either by the words on the page itself (view one) or by the words contained in the anchor texts of inbound hyperlinks (view two). In [9], Blum and Mitchell design a co-training algorithm for labeled and unlabeled web pattern sets composed of two naturally split views. On the labeled web set, the two sub-classifiers of the co-training algorithm are incrementally built with their corresponding views, and thus
The way of multiviewization
This section gives the way to generate multiple pattern representations from single-view patterns. Firstly, according to the number of sources for patterns, we sort patterns into single-view and multi-view patterns. In our opinion, each source of a pattern can form a set of attributes for the pattern, and each set of attributes can be taken as one view of the pattern. Then suppose that there are N patterns {z_i}_{i=1}^{N}, where each pattern z_i has M views denoted as {z_i^p}_{p=1}^{M} and each view of all
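One possible multiviewization of a d-dimensional vector pattern enumerates every factorization d = m × n and reshapes the vector accordingly, each reshaping giving one candidate view; the helper name matrix_views below is hypothetical:

```python
import numpy as np

def matrix_views(x):
    """Enumerate every (m, n) reshaping of a d-dimensional vector with
    m * n == d; each reshaped matrix is one candidate view."""
    d = x.shape[0]
    shapes = [(m, d // m) for m in range(1, d + 1) if d % m == 0]
    return {s: x.reshape(s) for s in shapes}

# A 6-dimensional vector admits four matrix views: 1x6, 2x3, 3x2, 6x1.
views = matrix_views(np.arange(6.0))
```

The (1, d) view recovers the original vector representation, so the single-view classifier is always included as one member of the generated view set.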
Joint learning on multiple views
This section describes how to learn on the multiple pattern representations generated from the single-view patterns. We first review the base classifier MHKS [20] and its corresponding matrixized version MatMHKS, respectively. Based on both MHKS and MatMHKS, we further propose a multi-view-combined classifier named MultiV-MHKS.
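For reference, the classic Ho–Kashyap iteration underlying MHKS can be sketched as follows; the toy data and step size are illustrative, and the regularization term that distinguishes MHKS [20] is omitted from this minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data; rows of the second class are negated so that a
# correct solution satisfies Y @ w > 0 for every row.
X = np.vstack([rng.normal(1.0, 0.4, (20, 2)), rng.normal(-1.0, 0.4, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)
Y = y[:, None] * np.hstack([X, np.ones((40, 1))])   # augmented with bias

# Classic Ho-Kashyap iteration: alternate a least-squares solve for w
# with a positivity-preserving update of the margin vector b.
Yp = np.linalg.pinv(Y)
b = np.ones(40)                       # margin vector, kept positive
w = Yp @ b
for _ in range(200):
    e = Y @ w - b                     # error between margins and targets
    b = b + 0.5 * (e + np.abs(e))     # raise b only where e > 0
    w = Yp @ b

train_accuracy = np.mean(Y @ w > 0)
```

The (e + |e|) term zeroes out negative errors, so b only grows; this is the mechanism that lets Ho–Kashyap seek a separating solution when one exists.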
Experiments
This section demonstrates the effectiveness of the proposed MVL. The proposed MVL is composed of two components: (1) pattern representation in multiple views (multiviewization); (2) the joint learning process on the generated views. Thus, in order to demonstrate the proposed approach, we have to demonstrate the effectiveness of the two components, respectively. For the first component, we give a comparison between the proposed multiviewization process and Bagging [13],
Conclusions and future work
In this paper, we have developed a new multi-view classifier, MultiV-MHKS, that is composed of multiviewization and a joint learning process. It takes MHKS as the base classifier and each corresponding MatMHKS generated from MHKS as one view, and combines all the views into one single learning process. Different from the existing multi-view viewpoint, in which patterns are represented by multiple independent attribute sets, the proposed multi-view viewpoint is to reshape the original vector
Acknowledgment
The authors thank the (Key) Natural Science Foundations of China under Grant nos. 61035003 and 60903091, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20090074120003 for partial support. This work is also supported by the Open Projects Program of the National Laboratory of Pattern Recognition and the Fundamental Research Funds for the Central Universities.
Zhe Wang received the B.Sc. and Ph.D. degrees in Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China, in 2003 and 2008, respectively. He is now an Associate Professor in Department of Computer Science and Engineering, East China University of Science and Technology (ECUST), Shanghai, China. His research interests include machine learning, pattern recognition, and image processing.
References (49)
- Ho–Kashyap classifier with generalization control, Pattern Recognition Letters (2003)
- Matrix-pattern-oriented Ho–Kashyap classifier with regularization learning, Pattern Recognition (2007)
- Matrix-pattern-oriented least squares support vector classifier with AdaBoost, Pattern Recognition Letters (2008)
- Monte Carlo cross validation, Chemometrics and Intelligent Laboratory Systems (2001)
- Fast cross-validation of high-breakdown resampling methods for PCA, Computational Statistics & Data Analysis (2007)
- A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
- Feature extraction approaches based on matrix pattern: MatPCA and MatFLDA, Pattern Recognition Letters (2005)
- Multi-view kernel machine on single-view data, Neurocomputing (2009)
- Rademacher penalties and structural risk minimization, IEEE Transactions on Information Theory (2001)
- On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications (1971)
- Rademacher and Gaussian complexities: risk bounds and structural results, Journal of Machine Learning Research
- Object representation, sample size and data complexity
- Combining labeled and unlabeled data with co-training
- Analyzing the effectiveness and applicability of co-training
- Active + semi-supervised learning = robust multi-view learning
- A hierarchical multiple-view approach to three-dimensional object recognition, IEEE Transactions on Neural Networks
- Bagging predictors, Machine Learning
- Attribute Bagging: improving accuracy of classifier ensembles by using random feature subsets, Pattern Recognition
- Ensembles of learning machines
- Towards a theoretical framework for ensemble classification
- Accuracy/diversity and ensemble MLP classifier design, IEEE Transactions on Neural Networks
- The ensemble approach to neural-network learning and generalization, IEEE Transactions on Neural Networks
- An algorithm for linear inequalities and its applications, IEEE Transactions on Electronic Computers
- Statistical Learning Theory
Songcan Chen received the B.Sc. degree in mathematics from Hangzhou University (now merged into Zhejiang University), Hangzhou, China, in 1983, the M.Sc. degree in computer applications from Shanghai Jiaotong University, Shanghai, China, in 1985, and the Ph.D. degree in communication and information systems from the Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China, in 1997. He was an Assistant Lecturer at NUAA, where since 1998, he has been a Full Professor at the Department of Computer Science and Engineering. He has authored or coauthored over 130 scientific journal papers. His research interests include pattern recognition, machine learning, and neural computing.
Daqi Gao received the Ph.D. degree from Zhejiang University, China, in 1996. Currently, he is a Professor at East China University of Science and Technology. He is a member of the International Neural Network Society (INNS). He has published over 50 scientific papers. His research interests are pattern recognition, neural networks, and machine olfaction.