
Neurocomputing

Volume 490, 14 June 2022, Pages 17-29

When multi-view classification meets ensemble learning

https://doi.org/10.1016/j.neucom.2022.02.052

Abstract

With the coming of the big data era, multi-view data represented by multiple features arise in many fields, such as machine learning, data mining and computer vision. Because of the complex structure hidden in such data, exploiting the complementary and correlated information among multiple view features to improve classification performance is a challenging task. A further challenge is how to assign an appropriate weight to each classifier according to its performance. To address these problems, we propose a supervised multi-view classification method based on Least Square Regression (LSR) and ensemble learning. Specifically, the samples of each view are first classified by a Multi-class Support Vector Machine (MSVM); then, to evaluate the classification results of the different views for each sample, an optimal weight is learned for each per-sample classification result; furthermore, since views differ in classification quality, the view weights are assigned adaptively; finally, the decision function values determine the final classification results. Extensive experiments show that the proposed method outperforms most state-of-the-art multi-view classification methods.

Introduction

In many real-world applications, such as crowd detection and face recognition, data can often be collected from different sources or extracted by diverse feature extractors. Such data are called multi-view data, in which each representation is regarded as a view [1], [2], [3]. The performance of multi-view dimensionality reduction [4], [5], classification, and clustering [6] can be improved by exploiting the complementarity and compatibility among views. Therefore, multi-view learning has attracted considerable attention. For the multi-view dimensionality reduction problem, several effective approaches have been proposed. For example, Hou et al. [7] proposed a Multiple View Semi-Supervised Dimensionality Reduction (MVSSDR) method which learns a hidden consensus pattern in a low-dimensional space. For multi-view clustering, Huang et al. [8] proposed a Partially View-aligned Clustering (PVC) approach to solve the Partially View-aligned Problem (PVP). Moreover, Peng et al. [9] developed a CrOss-view MatchIng Clustering (COMIC) method which can automatically determine the number of clusters without introducing extra parameters. In this paper, we mainly focus on the multi-view classification problem.

Depending on whether unlabeled data are exploited, the existing multi-view classification methods can be broadly divided into two categories: semi-supervised and supervised. The semi-supervised multi-view classification approaches mainly comprise two types: co-training and graph-based methods. Co-training [10] is only applicable to two views. Its operating principle is first to train a classifier for each view; then, the most confidently predicted samples of one classifier are added to the training set of the other classifier; next, the two classifiers iteratively update each other's training sets; finally, training terminates when a preset stopping condition is reached. Many variants of co-training have been proposed, such as Bayesian co-training [11] and co-regularization [12]. However, these approaches share the limitation that the initial classification results influence the subsequent training. In addition, disagreement-based methods have been presented. For instance, Zhou et al. [13] first train multiple learners for the same task and then let these base learners exploit unlabeled samples by maintaining a large disagreement among them. Although these disagreement-based methods have been applied successfully in many settings, exploiting unlabeled data in semi-supervised learning can sometimes degrade performance.
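
To make the co-training procedure above concrete, the following is a minimal sketch of the two-view loop written with scikit-learn classifiers; the base learner, the confidence criterion, and the iteration/selection counts are illustrative assumptions rather than the exact choices of [10]:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def co_training(X1, X2, y, n_rounds=10, n_add=5):
        # y: integer class labels for labeled samples, -1 for unlabeled (illustrative convention)
        y_work = y.copy()
        clf1 = LogisticRegression(max_iter=1000)
        clf2 = LogisticRegression(max_iter=1000)
        for _ in range(n_rounds):
            labeled = y_work != -1
            clf1.fit(X1[labeled], y_work[labeled])
            clf2.fit(X2[labeled], y_work[labeled])
            unlabeled = np.where(~labeled)[0]
            # each view's classifier pseudo-labels the samples it is most confident about,
            # enlarging the training set seen by the other classifier in the next round
            for clf, X in ((clf1, X1), (clf2, X2)):
                if unlabeled.size == 0:
                    break
                proba = clf.predict_proba(X[unlabeled])
                order = np.argsort(-proba.max(axis=1))[:n_add]
                picked = unlabeled[order]
                y_work[picked] = clf.classes_[proba[order].argmax(axis=1)]
                unlabeled = np.setdiff1d(unlabeled, picked)
            if unlabeled.size == 0:
                break
        return clf1, clf2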

The other category comprises graph-based multi-view semi-supervised classification methods. For example, Cai et al. [14] proposed an Adaptive Multi-Modal Semi-Supervised classification (AMMSS) algorithm for image classification. Wang et al. [15] presented an optimized multi-graph-based semi-supervised learning method which fuses multiple graphs into one and then propagates labels from the labeled samples to the unlabeled ones along the constructed similarity graph. Although these graph-based approaches have achieved outstanding performance, some disadvantages remain. First, the similarity graphs are usually constructed with a Gaussian kernel, which introduces an extra parameter; second, graph-based strategies are transductive models, which must reconstruct a new similarity matrix when handling out-of-sample data. To address these problems, Tao et al. [16] proposed a Multi-View Semi-Supervised Classification via Adaptive Regression (MVAR) method. Several other works, such as co-EMT [17] and Co-Testing [18], also belong to the semi-supervised family.
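
As a concrete illustration of the graph-based strategy, and of the extra bandwidth parameter introduced by the Gaussian kernel, a minimal single-view sketch of graph construction and label propagation might look as follows; the bandwidth sigma and the propagation rule are generic textbook choices rather than the exact constructions of [14], [15]:

    import numpy as np

    def gaussian_affinity(X, sigma=1.0):
        # W_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)); sigma is the extra parameter
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        return W

    def propagate_labels(W, Y_init, labeled_mask, alpha=0.9, n_iter=100):
        # iterate F <- alpha * S @ F + (1 - alpha) * Y_init, with S the normalized affinity
        D = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-12))
        S = D @ W @ D
        F = Y_init.astype(float).copy()
        for _ in range(n_iter):
            F = alpha * S @ F + (1 - alpha) * Y_init
            F[labeled_mask] = Y_init[labeled_mask]   # clamp the labeled rows
        return F.argmax(axis=1)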

Further, several studies have tackled the multi-view classification problem in a supervised learning setting. Supervised multi-view classification methods attract more and more attention owing to the availability of label information. For example, Multiple Kernel Learning (MKL) [19] was developed to cope with multi-view data sets in supervised settings. For identifying Alzheimer's Disease, Zhu et al. [20] designed two multi-view classification models, named the Single-direction Mapping Multi-view Learning (SMML) method and the Directly Concatenating Multi-view Learning (DCML) approach. Further, to explore the complex correlations between the features and class labels of multi-view data sets, Zhang et al. [21] proposed a Multi-Layer Multi-View Classification (ML-MVC) method which captures the high-order complementarity among different views. Moreover, several multi-view boosting algorithms [22], [23] have been proposed. To account for the different contribution of each view to classification performance, Yang et al. [24] proposed a multi-view classification method based on adaptive-weighting discriminative regression which simultaneously considers the correlative and complementary information in the projected discriminative subspace. From these methods, it can be observed that most multi-view classification approaches are based on feature-level fusion. In this paper, inspired by ensemble learning, we propose a new classification method named Multi-view Classification based on Ensemble Learning with Weight Optimization (MCELWO). Fig. 1 illustrates the framework of the proposed method. The first column denotes the training samples; for ease of illustration, we select seven classes with thirty-five samples. Then, different features are extracted and each feature is regarded as a view. Next, the initial classification results are shown in the third column. Finally, to combine the classification results effectively, the optimal weight matrix for each view is learned, as presented in the fourth column. After the model is trained, the test data sets are used to measure the performance of the learned model. The contributions of this paper can be summarized as follows:

  • Improving classification results by transforming feature-level fusion into decision-level fusion. Multi-view learning is typically a feature-level fusion, whereas ensemble learning is a decision-level fusion. Because decision-level fusion can mine semantic information more deeply, transforming feature-level fusion into decision-level fusion is significant. To effectively utilize the complementarity among view features to improve classification performance, the proposed algorithm combines the initial multi-view classification results. Specifically, each view feature is first classified with a multi-class SVM; then, the multiple classification results are used to train a classifier for each view feature; next, the training samples are classified using the learned voting weights; finally, the test samples are fed into the learned classifiers to carry out the classification task (a schematic sketch of this pipeline is given after the contribution list).

  • Assigning a different voting weight to each classifier. To ensure that a classifier with better classification performance is assigned a larger weight and a classifier with low-quality performance is assigned a smaller weight, each view's classification result for each sample is adaptively assigned an optimal weight, taking the differences among classifiers into consideration. Specifically, the voting weights are learned by making the voting score of each sample as close as possible to its ground-truth label.

  • Assigning a different weight to each view feature. Considering that the view features contribute differently to the final classification results, the proposed model is formulated with the Frobenius norm, and a re-weighted method is used to solve the resulting multi-view classification model. During optimization, the learned weight of each view feature is inversely proportional to its loss value: a small loss indicates that the view feature is important to the classification results, so a larger view weight is assigned, and vice versa.
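
As referenced in the first contribution, the following minimal sketch illustrates the decision-level fusion pipeline. It assumes scikit-learn's LinearSVC as the multi-class base learner and replaces the full weighted model derived later in the paper with a plain ridge-regularized least-squares fit per view, so all names and hyper-parameters here are illustrative:

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_pipeline_sketch(views_train, y_train, lam=1e-2):
        # views_train: list of (n, d_v) arrays; y_train: labels assumed to be 0..C-1 with C > 2
        C = int(y_train.max()) + 1
        Y = np.eye(C)[y_train]                       # one-hot ground-truth matrix
        svms, weights = [], []
        for Xv in views_train:
            svm = LinearSVC().fit(Xv, y_train)
            Fv = svm.decision_function(Xv)           # per-class decision scores (initial results)
            # voting weights: least squares pushing Fv @ Wv.T towards Y (ridge for stability)
            Wv = np.linalg.solve(Fv.T @ Fv + lam * np.eye(C), Fv.T @ Y).T
            svms.append(svm)
            weights.append(Wv)
        return svms, weights

    def predict_pipeline_sketch(views_test, svms, weights):
        # decision-level fusion: weighted vote scores summed over views, then argmax
        score = sum(svm.decision_function(Xv) @ Wv.T
                    for Xv, svm, Wv in zip(views_test, svms, weights))
        return score.argmax(axis=1)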


Related work

In this section, we first introduce the notations used throughout the paper and briefly review Support Vector Machine (SVM) and Least Square Regression (LSR). Then, some relevant regression-based multi-view classification methods are reviewed.
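
For reference, the standard ridge-regularized form of LSR that the later formulation builds on can be stated as follows, with $X$ and $Y$ the data and label matrices defined in the Notations below and $\lambda$ a regularization parameter (a textbook statement, not necessarily the paper's exact variant):

$$\min_{W}\;\|XW-Y\|_F^2+\lambda\|W\|_F^2,\qquad W^{*}=(X^{T}X+\lambda I)^{-1}X^{T}Y,$$

and a test sample $\tilde{x}$ is assigned to the class $\arg\max_{c}\,(W^{*T}\tilde{x})_c$.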

Notations

Let $X=[\tilde{x}_1,\tilde{x}_2,\ldots,\tilde{x}_n]^T\in\mathbb{R}^{n\times d}$, in which $n$ denotes the number of samples and $d$ represents the number of features. $Y=[\tilde{y}_1,\tilde{y}_2,\ldots,\tilde{y}_n]^T\in\mathbb{R}^{n\times C}$, where $\tilde{y}_i=[\tilde{y}_{i1},\tilde{y}_{i2},\ldots,\tilde{y}_{iC}]^T\in\mathbb{R}^{C}$ with $\tilde{y}_{ic}=1$ if the $i$-th sample belongs to the $c$-th class, and $\tilde{y}_{ic}=0$ otherwise.
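
As a small concrete example of this label encoding, the following sketch builds $Y$ from integer class labels (assuming the labels are indexed from 0):

    import numpy as np

    labels = np.array([0, 2, 1, 0])   # classes of n = 4 samples, C = 3 classes
    Y = np.eye(3)[labels]             # row i is the one-hot vector corresponding to sample i
    # Y = [[1, 0, 0],
    #      [0, 0, 1],
    #      [0, 1, 0],
    #      [1, 0, 0]]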

The proposed method

In this section, we first give the formulation of the problem; then, for each view, the training data are classified by a multi-class SVM. Next, to effectively combine the initial classification results, the weight of each classification result from the different classifiers is learned with the proposed framework. Finally, according to the vote scores, the samples are assigned to the corresponding classes.

Optimization algorithm

Observing the above proposed model, it is difficult to solve directly. Therefore, we adopt an alternating minimization strategy. Problem (5) can be transformed as
$$\min_{W,M,\alpha}\;\sum_{v=1}^{V}\alpha_v\left\|F^{v}W_{v}^{T}-Y-B\odot M\right\|_F^2\quad \text{s.t.}\; M\geq 0,$$
where $\alpha_v=\dfrac{1}{2\left\|F^{v}W_{v}^{T}-Y-B\odot M\right\|_F}$. The above problem involves three variables in total; hence, we solve for each variable while fixing the others.
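
A minimal sketch of this re-weighted alternating scheme is given below. It assumes the Hadamard-product form $B\odot M$ used in the reconstruction above (with $B$ a $\pm 1$ direction matrix), a simple ridge update for each $W_v$, a non-negative projection for $M$, and a fixed iteration count; it is an illustrative sketch under these assumptions, not the authors' exact solver:

    import numpy as np

    def reweighted_alternating(F_list, Y, B, lam=1e-3, n_iter=30, eps=1e-8):
        # F_list: per-view score matrices F^v of shape (n, C); Y: one-hot labels; B: +/-1 matrix
        V = len(F_list)
        n, C = Y.shape
        W = [np.zeros((C, C)) for _ in range(V)]
        M = np.zeros((n, C))
        alpha = np.ones(V)
        for _ in range(n_iter):
            T = Y + B * M                                   # current regression target
            for v, Fv in enumerate(F_list):
                # W_v step: ridge least squares towards the target T
                W[v] = np.linalg.solve(Fv.T @ Fv + lam * np.eye(C), Fv.T @ T).T
            # M step: elementwise least squares under M >= 0 (non-negative projection)
            R = sum(alpha[v] * (F_list[v] @ W[v].T - Y) for v in range(V)) / (alpha.sum() + eps)
            M = np.maximum(B * R, 0.0)
            # alpha step: view weights inversely proportional to each view's residual norm
            for v, Fv in enumerate(F_list):
                alpha[v] = 1.0 / (2.0 * np.linalg.norm(Fv @ W[v].T - Y - B * M, 'fro') + eps)
        return W, M, alpha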

Experiments

In this section, we first describe the multi-view data sets; then, several single-view and multi-view classification methods are briefly introduced; next, to assess the proposed method, four classification measures are presented; further, the experimental results are reported and analyzed; finally, to demonstrate that the proposed approach is not only effective but also efficient, the time cost and convergence behavior of all approaches are shown.
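
The four measures are not listed in this preview; assuming standard choices such as accuracy and macro-averaged precision, recall, and F1-score, they could be computed as in the following sketch:

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    def evaluate(y_true, y_pred):
        # macro averaging treats every class equally, a common choice for multi-class benchmarks
        return {
            "accuracy":  accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, average="macro"),
            "recall":    recall_score(y_true, y_pred, average="macro"),
            "f1":        f1_score(y_true, y_pred, average="macro"),
        }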

Conclusions and discussion

In this paper, we develop a new supervised multi-view classification method. In particular, each sample is first classified by a multi-class SVM for each view; then, a regression-based loss function is formulated to learn the optimal weights for the initial classification results; finally, a weighted vote is used to assign the samples to the relevant classes. Moreover, considering that each view contributes differently to the classification results, the proposed method can adaptively assign a weight to each view.

CRediT authorship contribution statement

Shaojun Shi: Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Resources, Data curation, Writing - original draft, Writing - review & editing. Feiping Nie: Formal analysis, Funding acquisition, Project administration, Supervision. Rong Wang: Formal analysis, Project administration, Supervision. Xuelong Li: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the Special Construction Fund for Key Disciplines of Shaanxi Provincial Higher Education, in part by the National Natural Science Foundation of China under Grant 61936014, Grant 61772427 and Grant 61751202, and in part by the Fundamental Research Funds for the Central Universities under Grant G2019KY0501.

Shaojun Shi is now working toward her PhD degree in the School of Computer Science and the Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi’an, 710072, Shaanxi, China. Her research interests include topics in data mining and machine learning.

References (40)

  • Z. Huang, P. Hu, J.T. Zhou, J. Lv, X. Peng, Partially view-aligned clustering, Adv. Neural Inform. Process. Syst....
  • X. Peng et al., COMIC: Multi-view clustering without parameter selection, in: International Conference on Machine Learning, PMLR (2019)
  • A. Blum et al., Combining labeled and unlabeled data with co-training (1998)
  • S. Yu et al., Bayesian co-training, J. Mach. Learn. Res. (2011)
  • V. Sindhwani, P. Niyogi, M. Belkin, A co-regularization approach to semi-supervised learning with multiple views, in:...
  • Z.H. Zhou et al., Semi-supervised learning by disagreement, Knowl. Inf. Syst. (2010)
  • C. Xiao et al., Heterogeneous image features integration via multi-modal semi-supervised learning for image categorization, in: IEEE International Conference on Computer Vision (2014)
  • M. Wang et al., Unified video annotation via multigraph learning, IEEE Trans. Circuits Syst. Video Technol. (2009)
  • H. Tao et al., Scalable multi-view semi-supervised classification via adaptive regression, IEEE Trans. Image Process. (2017)
  • I. Muslea, S. Minton, C.A. Knoblock, Active + semi-supervised learning = robust multi-view learning, in: ICML, Vol. 2,...

Feiping Nie received the Ph.D. degree in Computer Science from Tsinghua University, China in 2009, and currently is full professor in Northwestern Polytechnical University, China. His research interests are machine learning and its applications, such as pattern recognition, data mining, computer vision, image processing and information retrieval. He has published more than 100 papers in the following journals and conferences: TPAMI, IJCV, TIP, TNNLS, TKDE, ICML, NIPS, KDD, IJCAI, AAAI, ICCV, CVPR, ACM MM. His papers have been cited more than 10000 times and the H-index is 57. He is now serving as Associate Editor or PC member for several prestigious journals and conferences in the related fields.

Rong Wang received the B.S. degree in information engineering, the M.S. degree in signal and information processing, and the Ph.D. degree in computer science from Xi'an Research Institute of Hi-Tech, Xi'an, China, in 2004, 2007 and 2013, respectively. During 2007 and 2013, he also studied in the Department of Automation, Tsinghua University, Beijing, China for his Ph.D. degree. He is currently an associate professor at the School of Cybersecurity and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, China. His research interests focus on machine learning and its applications.

Xuelong Li is a full professor with School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, P.R. China. He is a fellow of the IEEE.
