Consistency and diversity neural network multi-view multi-label learning
Introduction
Multi-label (ML) learning addresses the problem of label ambiguity by associating a single instance with a set of labels. For example, a picture can be labeled as "lake", "mountain", and "forest" at the same time, and there may be strong correlations among these labels. The main task of ML learning is to build a learning model that can effectively predict the possible label set of unknown objects. Over the past few decades, scholars have proposed many effective ML learning algorithms in various domains, such as image annotation [1], [2], video annotation [3], and bioinformatics [4], [5].
Traditional ML learning acquires knowledge from a single data representation. In the real world, however, due to the increasing diversity of data collection and feature extraction methods, a single example often has multiple views. Such data are associated with numerous heterogeneous feature representations simultaneously, and each feature representation provides a different view of the data [6], [7], [8], [9]. For example, in image classification, a natural scene image can be represented by visual features or described by accompanying text. The main challenge in this type of task is how to effectively learn the heterogeneity among multiple views while accurately classifying the data. Therefore, to handle the more complex data classification problems arising in real scenes, the multi-view multi-label (MVML) learning framework has emerged.
Due to the widespread existence of MVML datasets, MVML learning has become an active research area in many practical applications [10], [11]. In MVML learning, each instance is represented by multiple heterogeneous feature sets and is simultaneously associated with multiple class labels. Solutions based on either multi-view (MV) learning or ML learning have fundamental problems of their own. The main issues for methods that focus on MV learning are:
- 1.
Different views should share a consistent information representation, so effectively mining the consistency among views and fusing the correlations among high-dimensional heterogeneous data is of paramount importance.
- 2.
The information observed in each view differs, so each view carries its own diverse information. Mining this view-specific, diversified information helps enhance the exchange of information among views and thereby improves the performance of the algorithm.
- 3.
Structural differences in the data across views make each view differently important, so the degree of contribution of each view also differs.
Based on these problems, we divide existing algorithms into two types. The first type is the two-step learning strategy: the first step directly uses an MV learning method to solve the MV learning problem, and the second step uses an existing ML learning method to solve the ML learning problem. For example, Liu et al. [12] proposed lrMMC, a multi-view framework based on matrix decomposition, which first seeks a shared representation of the multiple views and then completes classification on the matrix of the shared feature space. Similarly, [11] maps each view to a shared space to eliminate noise and redundancy while maintaining the sparse and manifold structure of the image data. This two-step strategy often yields sub-optimal results.
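As a rough illustration of the two-step strategy (not the actual lrMMC algorithm; the factorization and classifier here are generic stand-ins), the first step extracts a shared low-dimensional representation from the views, and the second step trains an off-the-shelf multi-label model on that representation:

```python
import numpy as np

def shared_representation(views, k):
    """Step 1: extract a shared k-dimensional representation of all views.
    Truncated SVD of the concatenated views is used here as a simple
    stand-in for the matrix decomposition step."""
    X = np.hstack(views)                       # (n, sum of view dims)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]                    # (n, k) shared factors

def train_ml_classifier(Z, Y, lam=1e-2):
    """Step 2: one ridge-regression scoring function per label
    (binary-relevance style); predict labels by thresholding Z @ W."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ Y)

# toy data: 2 views, 6 samples, 3 labels
rng = np.random.default_rng(0)
views = [rng.normal(size=(6, 4)), rng.normal(size=(6, 5))]
Y = (rng.random((6, 3)) > 0.5).astype(float)
Z = shared_representation(views, k=3)
W = train_ml_classifier(Z, Y)
print(Z.shape, W.shape)
```

Because the shared representation is learned without any label information, errors made in step 1 cannot be corrected in step 2, which is one reason this strategy tends toward sub-optimal results.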
The second type is the joint learning strategy, which establishes a unified MVML learning model. For example, Zhao et al. [13] introduced a predictive reliability measure to select samples whose labels are shared with other views in a co-training manner; Luo et al. [14] jointly extract the consistency and specificity of heterogeneous features for subspace learning; Zhang et al. [15] proposed an MVML method based on matrix factorization, which uses the complementarity among different views to obtain a common semantic representation. Methods of this type are more complex, but model performance improves significantly.
The main problems for methods that focus on ML learning are:
- 1.
How to effectively predict the label set of unknown instances from high-dimensional heterogeneous data;
- 2.
How to effectively fuse the information from each view and mine label correlations to improve the performance of the classifier.
Based on this, scholars have proposed two kinds of solutions. The first concatenates all data views into a single view and then applies an ML learning method. However, this concatenation strategy has problems: it ignores the different physical interpretations of each view's features, and it makes the feature dimension of the data too large, so the model overfits during training. The second builds an ML classification model for each view of heterogeneous data and unifies the results of these models into the final prediction. For example, Ren et al. [16] fuse multiple views into a mixed feature matrix and use low-rank structure and manifold regularization to exploit global label correlation and local smoothness. Nevertheless, this parallel strategy forces all views to output consistent results, ignoring the diversity among different views. Another common problem in MVML learning is that the views differ in their degree of contribution.
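The two ML-focused strategies above can be contrasted in a minimal sketch (ridge regression is used here purely as an illustrative per-label scorer; it is not the method of any cited paper):

```python
import numpy as np

rng = np.random.default_rng(1)
views = [rng.normal(size=(8, 4)), rng.normal(size=(8, 6))]   # two views, 8 samples
Y = (rng.random((8, 2)) > 0.5).astype(float)                 # 2 labels

def ridge(X, Y, lam=1e-1):
    """Closed-form ridge regression: one linear scorer per label."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Strategy A: concatenate all views into one feature matrix. The dimension
# grows with every view, raising the overfitting risk noted in the text.
Xcat = np.hstack(views)
scores_concat = Xcat @ ridge(Xcat, Y)

# Strategy B: train one model per view, then unify (here: average) their
# outputs. Averaging forces a single consensus and discards view diversity.
scores_parallel = np.mean([V @ ridge(V, Y) for V in views], axis=0)

print(scores_concat.shape, scores_parallel.shape)
```

Neither sketch weights the views by their contribution, which is exactly the third gap identified above.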
Based on the above analysis, current MVML learning mainly faces the following challenges: (1) the problems of consistency, diversity, and degree of contribution among views; (2) the problem of label correlation in ML learning. Existing methods that adopt a step-by-step strategy ignore the information exchange between MV and ML learning and therefore often obtain sub-optimal results, while methods that adopt a unified strategy suffer from excessive model complexity or insufficient consideration of these critical issues. To address these problems, we propose a consistency and diversity neural network multi-view multi-label learning method named CDMM. First, we design a random single-hidden-layer feedforward neural network (SLFN) to perform ML learning for each view, ensuring consistency across all views. Then, we use the Hilbert–Schmidt Independence Criterion (HSIC) [17], [18], [19] to induce diversity among views and learn the diverse information each view represents. Finally, in classification, we combine the advantages of the proposed classifier with the idea of label dependence propagation to enrich the original label space; in prediction, we make the final prediction according to each view's different contribution to the MVML learning task.
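The HSIC term mentioned above measures the statistical dependence between two view representations; penalizing it encourages views to carry diverse information. A minimal numpy sketch of the empirical HSIC estimator follows (the Gaussian kernel and median-heuristic bandwidth are illustrative assumptions, not necessarily the choices made in CDMM):

```python
import numpy as np

def hsic(X1, X2):
    """Empirical HSIC between two view representations of the same n samples.
    Values near 0 indicate independence; larger values indicate stronger
    dependence between the views."""
    n = X1.shape[0]

    def gram(X):
        sq = np.sum(X**2, axis=1)
        D = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
        return np.exp(-D / np.median(D[D > 0]))       # Gaussian kernel, median bandwidth

    H = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    K, L = gram(X1), gram(X2)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
B = rng.normal(size=(100, 5))                          # independent of A
print(hsic(A, A) > hsic(A, B))                         # a view depends most on itself
```

In a diversity regularizer, one would subtract (or upper-bound) the pairwise HSIC values between the per-view predictions so that no two views are forced to encode the same information.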
The main contributions of the paper are as follows:
- 1.
An MVML feedforward neural network model is constructed. Unlike traditional neural network models, CDMM requires no iterative training; it is simple and efficient, and its components are closely integrated.
- 2.
CDMM provides a unified framework that jointly studies view consistency and diversity in MVML learning, while integrating label correlations and the views' differing contribution factors into the classification and prediction models. In addition, an ensemble-style method is used for the final prediction.
- 3.
Extensive empirical results on benchmark data sets show that CDMM has clear advantages over several related and competitive methods.
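Contribution 1 highlights that the network needs no iteration. The following is a rough sketch of how such a random-weight SLFN can be trained in closed form, in the style of an extreme learning machine; the actual CDMM architecture, activation function, and regularization may differ, and all names here are illustrative:

```python
import numpy as np

def slfn_train(X, Y, hidden=32, seed=0):
    """Random single-hidden-layer feedforward network: the input weights and
    biases are drawn randomly and kept fixed; only the output weights are
    solved in closed form by least squares, so no iterative training occurs."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W_in + b)          # fixed random hidden-layer activations
    beta = np.linalg.pinv(H) @ Y       # least-squares output weights
    return W_in, b, beta

def slfn_predict(X, W_in, b, beta):
    return np.tanh(X @ W_in + b) @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))                       # 50 samples of one view
Y = (rng.random((50, 3)) > 0.5).astype(float)      # 3 labels
params = slfn_train(X, Y)
print(slfn_predict(X, *params).shape)
```

In an MVML setting, one such network would be trained per view, with the per-view outputs later combined according to each view's contribution.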
The remainder of this paper is organized as follows. In Section 2, the related work of MVML learning is briefly introduced. Section 3 introduces the technical details of CDMM. The results of comparative experiments and specific analyses are illustrated in Section 4. Finally, Section 5 concludes this paper.
Multi-label learning
Unlike traditional single-label learning tasks, the goal of ML learning is to assign multiple class labels to a single instance, which has attracted the attention of a large number of scholars across different machine learning tasks. According to the type of label correlation used, existing ML methods can usually be divided into three categories. First-order methods consider each label to have its own unique attributes and ignore correlations among labels; second-order methods exploit pairwise correlations between labels; high-order methods model correlations among larger subsets of labels or among all labels.
Problem statement and notations
Let $\mathcal{X} = \{X^1, X^2, \ldots, X^v\}$ represent a feature space dataset with $v$ views, where $X^k \in \mathbb{R}^{n \times d_k}$ represents the entire feature space of the $k$-th view with $n$ samples. $Y \in \{0,1\}^{n \times l}$ represents the corresponding label space, where $y_i$ is the label vector of the $i$-th instance and $l$ represents the number of labels. Table 1 summarizes the definitions of some notations used in this paper.
Label matrix reconstruction
Label correlation is a crucial factor in improving the performance of ML learning, so in this section, we embed label correlations into the original label space to reconstruct the label matrix.
Datasets
To verify the effectiveness of the CDMM algorithm, we use a total of 7 benchmark MVML data sets for performance evaluation, which can be downloaded from MULAN and LEAR. The details of the data sets are summarized in Table 2.
Comparing algorithms
To verify the performance of CDMM, we selected three state-of-the-art MVML learning algorithms and one incomplete-view MVML weak-label learning algorithm for comparison.
Conclusion
In this paper, we have studied how to mine view-consistency and view-diversity information in multi-view data to achieve effective multi-view multi-label classification. To this end, a multi-view multi-label learning framework called CDMM is proposed. It uses a unified feedforward neural network model to discover the consistency and diversity among heterogeneous views, and additionally considers label correlation factors and view contribution factors.
CRediT authorship contribution statement
Dawei Zhao: Conceptualization, Methodology, Software, Investigation, Data curation, Writing - original draft, Writing - review & editing. Qingwei Gao: Validation, Supervision, Project administration, Funding acquisition. Yixiang Lu: Visualization, Investigation. Dong Sun: Formal analysis, Validation. Yusheng Cheng: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is supported by the Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education (Anhui University), China (2020A003), and the Natural Science Foundation of Anhui (2008085MF183).
References (48)
- et al., Learning multi-label scene classification, Pattern Recognit. (2004)
- et al., A study of graph-based system for multi-view clustering, Knowl.-Based Syst. (2019)
- et al., Joint multi-view representation and image annotation via optimal predictive subspace learning, Inform. Sci. (2018)
- et al., Multi-view label embedding, Pattern Recognit. (2018)
- et al., A subspace co-training framework for multi-view clustering, Pattern Recognit. Lett. (2014)
- et al., ML-KNN: A lazy learning approach to multi-label learning, Pattern Recognit. (2007)
- et al., Multi-label learning with label-specific feature reduction, Knowl.-Based Syst. (2016)
- et al., Joint multi-label classification and label correlations with missing labels and feature selection, Knowl.-Based Syst. (2019)
- et al., Improving multi-label classification with missing labels by learning label-specific features, Inform. Sci. (2019)
- et al., Weakly-supervised multi-label learning with noisy features and incomplete labels, Neurocomputing (2020)