
Pattern Recognition

Volume 44, Issues 10–11, October–November 2011, Pages 2395-2413

A novel multi-view learning developed from single-view patterns

https://doi.org/10.1016/j.patcog.2011.04.002

Abstract

Existing multi-view learning (MVL) deals with patterns described by multiple information sources and has been proven to generalize significantly better than the usual single-view learning (SVL). However, in most real-world cases only single-source patterns are available, to which existing MVL cannot be directly applied. This paper aims to develop a new MVL technique for single-source patterns. To this end, we first reshape the original vector representation of single-source patterns into multiple matrix representations. In doing so, we change the original architecture of a given base classifier into different sub-architectures, and each newly generated sub-classifier can classify the patterns represented in matrix form. Each sub-classifier is taken as one view of the original base classifier, so that a set of sub-classifiers with different views comes into being. Then, one joint rather than separate learning process is developed for the multi-view sub-classifiers. In practice, the original base classifier employs the vector-pattern-oriented Ho–Kashyap classifier with regularization learning (called MHKS) as a paradigm, although the approach is not limited to MHKS. Thus, the proposed joint multi-view learning is named MultiV-MHKS. Finally, the feasibility and effectiveness of the proposed MultiV-MHKS is demonstrated by experimental results on benchmark data sets. More importantly, we demonstrate through a Rademacher complexity analysis that the proposed multi-view approach generally has a tighter generalization risk bound than its single-view counterpart.

Highlights

  • This paper develops a new multi-view learning (MVL) for single-source patterns.
  • We first reshape the vector representation of patterns into multiple matrix ones.
  • Then we propose one joint rather than separate learning process for the resulting matrices.
  • The experiments show the feasibility and effectiveness of the proposed MVL.

Introduction

It is well known that integrating prior knowledge about the patterns being processed is important in designing classifiers [8]. In practice, patterns can generally be obtained from single or multiple information sources. If each information source is taken as one view, there are accordingly two kinds of patterns, i.e. single-view patterns and multi-view patterns.1 Correspondingly, learning based on single-view and multi-view patterns is called single-view learning (SVL) and multi-view learning (MVL), respectively. It has been proven that co-training, one typical MVL approach, has a superior generalization ability to SVL [9]. Co-training learns on both labeled and unlabeled pattern sets, where both labeled and unlabeled patterns are composed of two naturally split attribute sets, each of which is called one view of the patterns. In implementation, the co-training algorithm requires that the two views be conditionally independent given the class labels, an assumption guaranteed by patterns composed of two naturally split attribute sets.

In this paper, we expand the existing MVL to single-view patterns and thus develop a novel MVL framework, whose underlying motivations are:

  • It is known that patterns can be sorted into single-view patterns and multi-view patterns according to the number M of information sources [9], [10], [11]. However, in most real-world applications only single-view patterns are available, i.e. M equals one. In that case, the existing MVL framework cannot work effectively since there is no natural way to partition the attribute space [8], [10], [11], [12]. This fact motivates us to develop a new MVL framework, which is expected to create multiple different views from single-view patterns and then to learn on the generated views simultaneously.

  • In the existing MVL framework, multi-view patterns are represented by multiple independent sets of attributes, and the base algorithms have the same architecture in each view so as to iteratively bootstrap each other. Here, we wish to exploit the multi-view technique because of its superior generalization over SVL. However, different from the existing MVL on multi-view patterns, we offer a new multi-view viewpoint for a given base classifier on single-view patterns. Concretely, we change the original architecture of the given base classifier and thus obtain a set of sub-classifiers with architectures different from each other. Each derived sub-classifier can be taken as one view of the original base classifier, which yields a set of sub-classifiers with multiple views. For all the derived sub-classifiers, we further adopt a joint rather than separate learning process, so that one new learning algorithm is developed for these multi-view sub-classifiers. The algorithm minimizes the disagreement between the outputs of the derived classifiers on the same patterns.

In practice, we select a vector-pattern-oriented linear classifier as the base classifier discussed above. In the vectorial case, any pattern, whatever form it originally takes, must be transformed into a vector representation before being classified [33]. However, it is not always efficient to construct a vector-pattern-oriented classifier, since vectorizing patterns such as images can lead to high computational cost and a loss of spatial information [21], [23], [26], [34], [40]. To overcome this disadvantage, in previous work we proposed a matrix-pattern-oriented Ho–Kashyap classifier named MatMHKS [21], [40]. MatMHKS is a matrixized version of the vectorial Ho–Kashyap classifier with regularization learning (namely MHKS) [20]. The literature [21], [23], [34], [40] has demonstrated the significant advantages of matrixized classifier design in terms of both classification and computational performance.

The discriminant function of the vectorial MHKS is given as

$$g(x) = \tilde{\omega}^T x + \omega_0, \qquad (1)$$

where $x \in \mathbb{R}^d$ is a vector pattern, $\tilde{\omega} \in \mathbb{R}^d$ is a weight vector, and $\omega_0 \in \mathbb{R}$ is a bias. Correspondingly, the discriminant function of MatMHKS is given as

$$g(A) = u^T A \tilde{v} + v_0, \qquad (2)$$

where $A \in \mathbb{R}^{m \times n}$ is a matrix pattern, $u \in \mathbb{R}^m$ and $\tilde{v} \in \mathbb{R}^n$ are the two weight vectors, and $v_0 \in \mathbb{R}$ is a bias. For a given pattern there is only one vector-form representation in formulation (1), but multiple matrix-form representations with different dimensional sizes $m$ and $n$ in formulation (2); in other words, there are multiple ways of reshaping the vector into a matrix. For instance, a vector $x = [1,2,3,4,5,6,7,8]^T$ could be assembled into two different matrices:

$$\begin{bmatrix} 1 & 3 & 5 & 7 \\ 2 & 4 & 6 & 8 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix}^T.$$

Consequently, only one MHKS can be created for classifying the given pattern $x$, whereas multiple MatMHKSs can be created for the same task because of the multiple ways of reshaping a vector into a matrix. Therefore, for the same classification problem, the solution set $\{\tilde{\omega}, \omega_0\}$ of a single MHKS corresponds to the solution sets $\{u_p, \tilde{v}_p, v_{0p}\}_{p=1}^{M}$ of multiple MatMHKSs, where the weight vector sets $\{u_p, \tilde{v}_p\}_{p=1}^{M}$ differ from each other in dimensional size but share the common discriminant function form $g(A) = u^T A \tilde{v} + v_0$. Here, MHKS is viewed as the base classifier and each MatMHKS is taken as one view of the base MHKS. Our previous work [21] has validated that each MatMHKS provides one hypothesis and exhibits one representation of the original pattern. Thus, multiple MatMHKSs can complement each other in classification due to their different representations of the patterns. In order to achieve this complementarity, we fuse the learning processes of the multiple MatMHKSs into one single process, in which each MatMHKS is expected to correctly classify a given pattern with the same attributes while the disagreement between the outputs of all MatMHKSs is minimized. As a result, a single learning process is produced and one multi-view-combined classifier named MultiV-MHKS is proposed. Through a Rademacher complexity analysis, we demonstrate that the proposed multi-view MultiV-MHKS has a tighter generalization risk bound than the single-view MHKS.
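To make the reshaping concrete, the following minimal NumPy sketch builds the two matrix representations of $x = [1,\dots,8]^T$ shown above and evaluates the vector-form and matrix-form discriminant functions of Eqs. (1) and (2). The weight values are arbitrary placeholders for illustration, not learned MHKS/MatMHKS solutions.

```python
import numpy as np

# The vector pattern from the text: x = [1, 2, ..., 8]^T
x = np.arange(1, 9, dtype=float)

# Two of the possible matrix representations of the same pattern.
# Column-wise reshaping gives the 2x4 matrix [[1,3,5,7],[2,4,6,8]];
# row-wise reshaping followed by a transpose gives the 4x2 matrix.
A1 = x.reshape(4, 2).T          # 2x4, matches the first matrix in the text
A2 = x.reshape(2, 4).T          # 4x2, matches the second matrix in the text

# Vector-form discriminant g(x) = w^T x + w0 (Eq. (1)); w, w0 are placeholders.
w, w0 = np.ones(8), 0.5
g_vec = w @ x + w0

# Matrix-form discriminant g(A) = u^T A v + v0 (Eq. (2)); u, v sized to match A.
def g_mat(A, u, v, v0):
    return u @ A @ v + v0

g_view1 = g_mat(A1, np.ones(2), np.ones(4), 0.5)
g_view2 = g_mat(A2, np.ones(4), np.ones(2), 0.5)

print(g_vec, g_view1, g_view2)
```

Each matrix view uses the same eight feature values but a different bilinear weight structure, which is what makes the derived sub-classifiers differ from one another.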

  • The proposed MultiV-MHKS algorithm offers a clean way to solve the view selection problem of MatMHKS [21]. In MatMHKS, selecting the most suitable matrix form reshaped from a given vector pattern is always a problem. This paper suggests a way to bypass it by choosing all the relevant matrix forms and optimizing over them jointly. It is known that, when going from a vector pattern as the input of MHKS to a matrix as the input of MatMHKS, the classification performance of MatMHKS depends on the reshaping or matrixization chosen [21], [40]. In the process of matrixizing a vector, different reshapings induce matrix patterns with different row and column sizes, and consequently lead to different classification performances of MatMHKS on the same vector patterns. To obtain the best performance we would then have to choose among the multiple reshapings by cross-validation, at a high computational cost [21]. Since the proposed MultiV-MHKS simultaneously considers multiple MatMHKSs with multiple matrices, the choice of matrixization can be avoided to a great extent.

  • The proposed MultiV-MHKS algorithm adopts a data representation in multiple views, which differs from the other main strategies for creating good ensembles of classifiers: sampling either pattern sets or attribute (interchangeably, feature) sets [13], [14], [48]. Compared with sampling pattern sets or feature sets, the proposed multi-view classifier design provides an alternative approach to producing multiple data sets for base learners, i.e. reshaping a vector pattern into different matrix ones with the same full feature set. In this case, the proposed multi-view classifier has advantages in terms of the actual number of unique samples, the size of the feature set, and the representations, which leads to the superior performance of the proposed MVL. In addition, unlike the strategy of sampling pattern sets or feature sets, the proposed MVL employs a joint optimization rather than separate learning in the training process. To the best of our knowledge, the proposed strategy of generating multiple training data sets for the base classifier is novel. The experimental results reported here also show that the proposed MultiV-MHKS algorithm has a classification performance superior to the other ensemble strategies.

We highlight the contributions of this paper as follows:

  • Significance: This paper introduces the creation of multiple views from a single view for multi-view learning. Although the existing MVL has been shown to be effective in the literature [8], [10], [11], [12], it still relies heavily on a natural separation of the feature set into two independent components. In many settings there is no natural way to partition the feature space, and thus the existing MVL framework is not applicable. In such a scenario, the approach suggested in this paper can potentially create multiple independent, or at least weakly correlated, views from a single view and then learn from the generated views simultaneously.

  • Novelty in two aspects: First, the learning approach proposed in this paper is different from the existing multi-view learning approach. Instead of classifiers trained on two views iteratively bootstrapping each other, this paper proposes a joint learning approach that minimizes the disagreement across the classifications from multiple views. There are some similarities with ensemble learning, in which the predictions from different sub-classifiers over a single view are combined; in contrast, the critical difference of the proposed MVL lies in the joint optimization. Second, compared with the typical ensemble models, i.e. Bagging or Boosting based on pattern sampling [13], [48] and Attribute Bagging based on attribute sampling [14], our strategy samples neither patterns nor attributes; instead, it reshapes the original pattern set into matrix pattern sets multiple times. Each reshaping develops a corresponding sub-classifier, and the sub-classifiers are then fused together, which leads to a performance gain.

  • Generalization: The proposed MVL is a wrapper technique and is not restricted to the MHKS classifier; it acts much like the kernelization technique that can be applied to any linear algorithm. The proposed multi-view-combined learning falls into the following framework:

    $$\min L = J_{ind} + \gamma J_{com},$$

    where $J_{ind} = \sum_{p=1}^{M} I_p(f_p)$, $J_{com} = \sum_{p=1}^{M} \left( f_p - \sum_{q=1}^{M} r_q f_q \right)$, and $\sum_{q=1}^{M} r_q = 1$. $J_{ind}$ denotes that the $M$ learning machines $f_p$, $p = 1, \ldots, M$, are each trained according to their own criterion $I_p$. $J_{com}$ makes the $M$ machines $f_p$, corresponding to $M$ views with common labels, agree as much as possible on their outputs, and thus tries to achieve the complementarity between the $M$ learning machines $f_p$. When each individual machine $f_p$ adopts the classifier $g$ of Eq. (2) in practice, the learning framework becomes the proposed algorithm MultiV-MHKS.
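As an illustration of this framework, the sketch below computes $L = J_{ind} + \gamma J_{com}$ for the outputs of $M$ sub-classifiers on the same patterns. The individual losses, the squared form used to measure each view's disagreement from the consensus, and all names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def joint_objective(individual_losses, outputs, r, gamma):
    """Schematic version of  min L = J_ind + gamma * J_com.

    individual_losses : list of M scalars, I_p(f_p) for each view
    outputs           : array of shape (M, N), f_p evaluated on N patterns
    r                 : array of M nonnegative weights summing to one
    gamma             : trade-off between individual fit and agreement
    """
    J_ind = sum(individual_losses)
    consensus = r @ outputs                      # sum_q r_q f_q, shape (N,)
    # Disagreement of each view from the consensus output; the squared
    # form here is an illustrative choice, the paper only requires that
    # the disagreement be minimized.
    J_com = sum(np.sum((f_p - consensus) ** 2) for f_p in outputs)
    return J_ind + gamma * J_com

# Toy usage with M = 3 views and N = 5 patterns (values are arbitrary).
M, N = 3, 5
rng = np.random.default_rng(0)
outs = rng.normal(size=(M, N))
print(joint_objective([1.0, 0.8, 1.2], outs, np.full(M, 1.0 / M), gamma=0.5))
```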

The rest of this paper is organized as follows. Section 2 discusses related work on multi-view learning. Section 3 describes how to create multiple pattern representations from single-view patterns. Section 4 introduces the multi-view viewpoint into MHKS and MatMHKS, and further describes the structure of the proposed multi-view-combined classifier MultiV-MHKS. The experiments in Section 5 demonstrate the feasibility and effectiveness of the proposed MVL. Finally, conclusions and future work are given in Section 6.

Section snippets

Related work

One typical example of the existing MVL is web-page classification [9], where each web page can be represented either by the words on the page itself (view one) or by the words contained in the anchor texts of inbound hyperlinks (view two). In [9], Blum and Mitchell design a co-training algorithm for labeled and unlabeled web pattern sets composed of two naturally split views. On the labeled web set, the two sub-classifiers of the co-training algorithm are incrementally built with their corresponding views, and thus

The way of multiviewization

This section gives the way to generate multiple pattern representations from single-view patterns. Firstly, according to the number of information sources, we sort patterns into single-view and multi-view patterns. In our opinion, each source of a pattern forms a set of attributes for the pattern, and each set of attributes can be taken as one view of the pattern. Then suppose that there are patterns $\{z_i\}_{i=1}^{N}$, where each pattern $z_i$ has $M$ views denoted as $\{z_i^p\}_{p=1}^{M}$ and each view of all
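A minimal sketch of one plausible way to implement this multiviewization for vector patterns is given below: it enumerates the factorizations $d = m \times n$ of the pattern dimension and reshapes every pattern into each resulting matrix size. The helper name and the column-wise reshaping are assumptions for illustration; the paper may restrict or order the set of reshapings differently.

```python
import numpy as np

def matrix_views(X):
    """Create matrix views of a set of vector patterns X (shape N x d)
    by enumerating all factorizations d = m * n with 1 < m < d.
    Returns a dict mapping (m, n) to an array of shape (N, m, n).
    """
    N, d = X.shape
    views = {}
    for m in range(2, d):
        if d % m == 0:
            n = d // m
            # Column-wise reshaping of each pattern into an m x n matrix.
            views[(m, n)] = X.reshape(N, n, m).transpose(0, 2, 1)
    return views

# Toy usage: 10 patterns of dimension 8 yield the (2,4) and (4,2) views.
X = np.arange(80, dtype=float).reshape(10, 8)
for shape, V in matrix_views(X).items():
    print(shape, V.shape)
```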

Joint learning on multiple views

This section describes how to learn on the multiple pattern representations generated from single-view patterns. We first review the base classifier MHKS [20] and its corresponding matrixized version MatMHKS, respectively. Based on both MHKS and MatMHKS, we further propose the multi-view-combined classifier named MultiV-MHKS.
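For readers unfamiliar with the base learner, the following is a minimal sketch of a regularized Ho–Kashyap training loop in the spirit of MHKS [20]. The regularization form, learning rate, and stopping rule are illustrative assumptions; the exact MHKS and MatMHKS updates in the paper may differ.

```python
import numpy as np

def hk_regularized(X, y, c=1e-2, rho=0.5, n_iter=100):
    """Regularized Ho-Kashyap style training loop (sketch).

    X has shape (N, d); y contains labels in {-1, +1}.
    Solves Y w ~ b with b >= 1, where Y = y * [X, 1] row-wise.
    """
    N, d = X.shape
    Y = y[:, None] * np.hstack([X, np.ones((N, 1))])   # augmented, label-signed
    b = np.ones(N)                                      # target margins
    for _ in range(n_iter):
        # Ridge-regularized least-squares solution for the weights.
        w = np.linalg.solve(Y.T @ Y + c * np.eye(d + 1), Y.T @ b)
        e = Y @ w - b
        b = b + rho * (e + np.abs(e))                   # only grow b where e > 0
        if np.abs(e).max() < 1e-6:
            break
    return w[:-1], w[-1]                                # weight vector and bias

# Toy usage on a linearly separable two-class problem.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([-1] * 20 + [1] * 20)
w, w0 = hk_regularized(X, y)
print(np.mean(np.sign(X @ w + w0) == y))                # training accuracy
```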

Experiments

This section demonstrates the effectiveness of the proposed MVL. The proposed MVL is composed of two components: (1) pattern representation in multiple views (multiviewization); and (2) the joint learning process on the generated views. Thus, in order to validate the proposed approach, we demonstrate the effectiveness of the two components respectively. For the first component, we give a comparison between the proposed multiviewization process and Bagging [13],

Conclusions and future work

In this paper, we have developed a new multi-view classifier MultiV-MHKS that is composed of multiviewization and a joint learning process. It takes MHKS as the base classifier, takes each corresponding MatMHKS generated from MHKS as one view, and combines all the views into one single learning process. Different from the existing multi-view viewpoint, in which patterns are represented by multiple independent attribute sets, the proposed multi-view viewpoint is to reshape the original vector

Acknowledgment

The authors thank (Key) Natural Science Foundations of China under Grant nos. 61035003, 60903091, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20090074120003 for partial support. This work is also supported by the Open Projects Program of National Laboratory of Pattern Recognition and the Fundamental Research Funds for the Central Universities.


References (49)

  • P. Bartlett et al., Rademacher and Gaussian complexities: risk bounds and structural results, Journal of Machine Learning Research, 2002.
  • R. Duin et al., Object representation, sample size and data complexity.
  • A. Blum et al., Combining labeled and unlabeled data with co-training.
  • K. Nigam et al., Analyzing the effectiveness and applicability of co-training.
  • I. Muslea et al., Active + semi-supervised learning = robust multi-view learning.
  • W.C. Lin et al., A hierarchical multiple-view approach to three-dimensional object recognition, IEEE Transactions on Neural Networks, 1991.
  • L. Breiman, Bagging predictors, Machine Learning, 1996.
  • R. Brylla et al., Attribute Bagging: improving accuracy of classifier ensembles by using random feature subsets, Pattern Recognition, 2003.
  • G. Valentini et al., Ensembles of learning machines.
  • A.K. Seewald, Towards a theoretical framework for ensemble classification.
  • T. Windeatt, Accuracy/diversity and ensemble MLP classifier design, IEEE Transactions on Neural Networks, 2006.
  • B. Igelnik et al., The ensemble approach to neural-network learning and generalization, IEEE Transactions on Neural Networks, 1999.
  • E. Ho et al., An algorithm for linear inequalities and its applications, IEEE Transactions on Electronic Computers, 1965.
  • V. Vapnik, Statistical Learning Theory, 1998.

    Zhe Wang received the B.Sc. and Ph.D. degrees in Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China, in 2003 and 2008, respectively. He is now an Associate Professor in Department of Computer Science and Engineering, East China University of Science and Technology (ECUST), Shanghai, China. His research interests include machine learning, pattern recognition, and image processing.

    Songcan Chen received the B.Sc. degree in mathematics from Hangzhou University (now merged into Zhejiang University), Hangzhou, China, in 1983, the M.Sc. degree in computer applications from Shanghai Jiaotong University, Shanghai, China, in 1985, and the Ph.D. degree in communication and information systems from the Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China, in 1997. He was an Assistant Lecturer at NUAA, where since 1998, he has been a Full Professor at the Department of Computer Science and Engineering. He has authored or coauthored over 130 scientific journal papers. His research interests include pattern recognition, machine learning, and neural computing.

Daqi Gao received the Ph.D. degree from Zhejiang University, China, in 1996. Currently, he is a Professor at East China University of Science and Technology. He is a member of the International Neural Network Society (INNS). He has published over 50 scientific papers. His research interests are pattern recognition, neural networks, and machine olfaction.
