
Knowledge-Based Systems

Volume 178, 15 August 2019, Pages 1-10

Multi-label learning with kernel extreme learning machine autoencoder

https://doi.org/10.1016/j.knosys.2019.04.002

Abstract

In multi-label learning, many scholars have modeled the relationships between features, between features and labels, or between labels in order to improve classification accuracy, but how to combine these correlations has rarely been studied. This paper therefore proposes a multi-label learning algorithm based on a kernel extreme learning machine autoencoder (KELM-AE). First, the label space is reconstructed using a non-equilibrium label completion method. Then, the completed label information is added to the input nodes of the kernel extreme learning machine autoencoder network, whose output target is the original input features. Finally, a kernel extreme learning machine is used for classification. Our method thus fuses the information between features and features, between labels and features, and between labels and labels. Compared with the traditional autoencoder network, the extreme learning machine autoencoder requires no iterative training, which reduces network training time while improving classification accuracy. Experimental results on open benchmark multi-label data sets show that KELM-AE compares favorably with other multi-label learning algorithms, and statistical hypothesis testing and stability analysis further confirm its effectiveness.

Introduction

In multi-label learning [1], a single instance is associated with multiple labels, and a model is trained on the training set to predict the label sets of unknown instances. Many multi-label learning algorithms have been proposed. For example, the BR (binary relevance) [2] algorithm and the LP (label powerset) algorithm solve the multi-label problem by increasing the number of classifiers or the number of label combinations, which affects classifier efficiency to some extent. BP-MLL (back-propagation for multi-label learning) [3] introduces a ranking loss factor, and the ML-KNN [4] algorithm uses the maximum a posteriori (MAP) principle to predict label sets; both improve classification performance but at the cost of higher computational complexity.
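As a concrete illustration of the binary relevance decomposition [2], the following sketch trains one independent binary classifier per label; the logistic-regression base learner and the data shapes are our own assumptions, not part of the original algorithm.

# Minimal sketch of binary relevance (BR): one independent binary
# classifier per label. The logistic-regression base learner is an
# illustrative choice, not prescribed by [2].
import numpy as np
from sklearn.linear_model import LogisticRegression

def br_fit(X, Y):
    # X: (N, d) feature matrix; Y: (N, k) binary label matrix in {0, 1}.
    return [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def br_predict(models, X):
    # Each column of the prediction comes from its own classifier.
    return np.column_stack([m.predict(X) for m in models])

Each label is predicted in isolation, which is exactly why BR scales linearly in the number of labels but cannot exploit label correlations.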

Labels in the real world are often not independent of each other; there are certain correlations between them. To exploit these correlations, many scholars have proposed correlation-aware algorithms and achieved good results. For example, RankSVM [5] adapts the maximum-margin criterion to multi-label learning: during modeling, an SVM classifier is constructed to minimize the ranking loss between the relevant and irrelevant labels of each sample. However, its time consumption is relatively large because a large number of variables must be computed.

At the same time, as an effective measure of uncertainty, information entropy [6] and related information-theoretic tools have been widely used in the study of label correlation. Based on this theory, Zhang et al. [7] proposed a multi-label classification algorithm based on correlation information entropy: building on the RAkEL (random k-label sets) algorithm, it uses correlation information entropy to measure the correlation between labels and thereby improve classification performance. Lee et al. [8] proposed a new multi-label learning method based on the CC (classifier chains) algorithm, modeling label correlations with a directed acyclic graph and using conditional entropy to maximize the correlation between labels. These entropy-based methods achieved good results in measuring label correlation. However, they essentially compute only the mutual information between pairs of annotated labels and measure label interactions through it. Such basic label confidence matrices consider only the mutual influence between annotated labels, ignoring both the influence of unlabeled labels on the quality of the label set and the influence of known labels on unlabeled ones.
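To make the entropy-based correlation measures concrete, the sketch below estimates the empirical mutual information between two binary label columns; this is the generic information-theoretic quantity, not the exact confidence-matrix construction of [7] or [8].

# Empirical mutual information I(Yi; Yj) between two binary label
# columns, in nats. A small eps guards against division by zero.
import numpy as np

def label_mutual_information(yi, yj, eps=1e-12):
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((yi == a) & (yj == b))   # joint probability
            p_a, p_b = np.mean(yi == a), np.mean(yj == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b + eps))
    return mi

def pairwise_mi(Y):
    # Pairwise MI matrix over a binary label matrix Y of shape (N, k).
    k = Y.shape[1]
    return np.array([[label_mutual_information(Y[:, i], Y[:, j])
                      for j in range(k)] for i in range(k)])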

In addition, methods that reconstruct the feature space using label information have also been widely used. The LIFT [9] method first applies k-means clustering to the positive and negative examples of each label and computes the distances between each sample and the cluster centers to generate label-specific features, thus obtaining a new training set on which binary relevance classification is performed for each label. However, LIFT does not consider label correlation, so some scholars have proposed jointly learning label-specific features and label correlations. Zhang et al. [10] account for the correlation among labels by constructing additional features. Huang et al. [11] proposed learning label-specific and shared features, using pairwise label correlations to distinguish the category labels, and then constructed a multi-label classifier on the low-dimensional data representations composed of these learned features. Zhang et al. [12] proposed multi-label learning with feature-induced labeling information enrichment (MLFE), which changes the structural information in the feature space by enriching the label information of the multi-label samples; based on a tailored multiple-regression method, the enriched label information of the training samples improves classification. In multi-label data sets, the total number of labels is generally large, but the average number of labels per object and the label density are not high. This is consistent with common sense: the annotated labels of an object should not outnumber the unannotated ones, otherwise the multi-label description of the object loses its meaning. It is undeniable that the unannotated labels may contain much valuable information, much as anomalies do in anomaly research. For this purpose, a non-equilibrium label completion method is introduced to describe the relationships between labels.
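As a rough sketch of the LIFT-style label-specific features described above [9], the code below clusters the positive and negative instances of one label separately and uses distances to the cluster centers as that label's new representation; the cluster-count ratio and the Euclidean distance are our own simplifying assumptions.

# LIFT-style label-specific features for a single label (illustrative).
# Assumes the label has both positive and negative examples.
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

def lift_features(X, y, ratio=0.1):
    pos, neg = X[y == 1], X[y == 0]
    m = max(1, int(ratio * min(len(pos), len(neg))))  # clusters per side
    centers = np.vstack([
        KMeans(n_clusters=m, n_init=10).fit(pos).cluster_centers_,
        KMeans(n_clusters=m, n_init=10).fit(neg).cluster_centers_,
    ])
    return cdist(X, centers)  # (N, 2m) label-specific representation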

It is not difficult to see that classification performance improves when features are constructed using feature-label and label-label relationships. In recent years, a large number of unsupervised learning methods have been applied in data mining. A multi-view clustering method based on a graph basis system (GBS) [13] was proposed to tackle the limitations of existing graph-based methods, and a clustering method based on local linear embedding (LLE) and Laplacian eigenmaps (LEE), called L3E-M2VC [14], was proposed for multi-task multi-view problems. The autoencoder neural network [15] is an unsupervised learning paradigm that automatically learns features from unlabeled data and has been widely used in image classification [16]. Autoencoders aim to map the input to a latent space and map it back to the original space with low reconstruction error. However, training an autoencoder neural network involves many iterations. The ELM algorithm, proposed by Huang [17], [18] as a simple and efficient learning algorithm for single-hidden-layer feed-forward neural networks, needs no iterative adjustment of the network weights and biases during training; compared with traditional neural network algorithms, its training is fast. In this regard, L.L.C. Kasun et al. [19] put forward the ELM-AE classification algorithm, a novel neural network method that can reproduce the input signal just as an autoencoder does. Building on this, this paper proposes a kernel extreme learning machine autoencoder for multi-label learning (KELM-AE). We use a two-layer KELM module as the base model: the first KELM serves as an autoencoder block that adds label node information to the input layer and outputs features encoding the feature-label relationships, while the label space is built with the non-equilibrium label completion matrix algorithm; the second KELM serves as the classification module. Experiments and statistical hypothesis tests on multiple published multi-label data sets show that the algorithm is effective, and confirm that combining feature-space reconstruction with label correlation improves performance.
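A minimal sketch of our reading of the KELM-AE pipeline follows: the first KELM takes the features concatenated with the completed label information as input and the original features as the output target, and the second KELM classifies on the fused representation. The RBF kernel, the regularization scheme, and the use of the autoencoder output as the fused representation are assumptions for illustration; the paper's exact formulation appears in Section 5.

# Illustrative two-stage KELM-AE pipeline (our reading, not verbatim).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_fit(Z, T, C=1.0, gamma=1.0):
    # Closed-form KELM output weights: solve (I/C + K) beta = T.
    K = rbf_kernel(Z, Z, gamma)
    return np.linalg.solve(np.eye(len(Z)) / C + K, T)

def kelm_predict(Z_train, beta, Z_new, gamma=1.0):
    return rbf_kernel(Z_new, Z_train, gamma) @ beta

def kelm_ae_train(X, Y_completed, Y, C=1.0, gamma=1.0):
    # Stage 1 (autoencoder): input = [features, completed labels],
    # target = original features.
    Z = np.hstack([X, Y_completed])
    beta_ae = kelm_fit(Z, X, C, gamma)
    X_new = kelm_predict(Z, beta_ae, Z, gamma)   # fused representation
    # Stage 2 (classifier): fused representation -> label matrix.
    # (At test time the label part of Z is unknown; the paper's handling
    # of that is not shown in this snippet.)
    beta_clf = kelm_fit(X_new, Y, C, gamma)
    return beta_ae, beta_clf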

The rest of the paper is organized as follows. Section 2 gives basic notions of multi-label learning and rough entropy. Section 3 introduces the modeling of the non-equilibrium matrix, and Section 4 the modeling of KELM. Our proposed KELM-AE method for multi-label classification is presented in Section 5. Section 6 reports experimental results on open multi-label data sets showing that our algorithm is effective, together with statistical hypothesis tests that further support it. The last section summarizes the discussion and points out future research.

Section snippets

Multi-label learning and traditional entropy

Let $X = [x_1, \ldots, x_N]^T \in \mathbb{R}^{N \times d}$ be the $d$-dimensional input feature space, where $N$ denotes the number of samples and $x_i \in \mathbb{R}^d$ denotes the feature vector of the $i$th sample. Let $Y = [y_1, \ldots, y_N]^T \in \mathbb{R}^{N \times k}$ denote the label matrix, where $k$ denotes the number of labels and $y_i \in \{-1, +1\}^k$ denotes the binary label indicator vector of the $i$th sample. Therefore, the multi-label training set containing $N$ samples is $$D = \{(x_i, y_i) \mid 1 \le i \le N\} \subseteq \mathbb{R}^d \times \{-1, +1\}^k.$$

Definition 1 ([6], [7], [20])

Suppose the set $A = \{a_1, \ldots, a_m\}$
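The snippet is truncated here; given the citations to Shannon [6] and to the correlation-entropy work [7], Definition 1 presumably introduces the standard information entropy of a discrete variable taking values in $A$, i.e. $$H(A) = -\sum_{i=1}^{m} p(a_i)\,\log p(a_i),$$ where $p(a_i)$ denotes the probability of value $a_i$.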

The modeling of the non-equilibrium label completion matrix

The number of unlabeled items of a sample in the real world is much larger than the number of annotated ones; for example, a picture whose known labels include green mountains and clear water is more likely to contain unlabeled forests than unlabeled deserts or sea. In many cases, researchers compute the conditional information between annotated and unlabeled elements in each label set of the sample by applying Eq. (4), to obtain the basic label confidence
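Eq. (4) itself is not visible in this snippet; purely as an illustration of what a basic label confidence matrix can look like, the sketch below estimates conditional probabilities between label pairs from co-occurrence counts. The actual Eq. (4) may differ.

# Illustrative basic label confidence matrix: C[i, j] ~ P(label j | label i),
# estimated from co-occurrence counts in a binary label matrix Y of shape
# (N, k). This stands in for the paper's Eq. (4), which is not shown here.
import numpy as np

def basic_label_confidence(Y, eps=1e-12):
    co = Y.T @ Y                 # (k, k) co-occurrence counts
    counts = Y.sum(axis=0)       # per-label positive counts
    return co / (counts[:, None] + eps)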

The theory of kernel extreme learning machine

The ELM algorithm is an effective learning algorithm for single-hidden-layer feed-forward neural networks. The learning parameters of the hidden layer in the ELM network structure are randomly selected, so it is only necessary to set the number of hidden-layer neurons. The output weights of the hidden layer are then obtained by the least-squares method, and no iteration over the network weights and biases is required during training. Therefore, compared with the traditional neural
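A minimal sketch of the ELM training procedure just described (random hidden parameters, least-squares output weights, no iteration); the sigmoid activation and hidden-layer size are illustrative choices.

# Basic ELM: random input weights and biases are drawn once and never
# trained; only the output weights are solved for, in closed form.
import numpy as np

def elm_fit(X, T, n_hidden=64, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden layer
    beta = np.linalg.pinv(H) @ T                  # least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta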

The description of the experimental data sets

To illustrate the effectiveness of the KELM-AE algorithm, we choose 14 data sets: 3 Mulan data sets (Emotions, Natural Scene, and Yeast) and 11 Yahoo Web Pages data sets. The Mulan data sets are from http://mulan.sourceforge.net/datasets-mlc.html, and the Yahoo Web Pages data sets are from http://www.kecl.ntt.co.jp/as/members/ueda/yahoo.tar. A detailed description is given in Table 1.

The experimental environment and evaluation indicators

The experiment is conducted on a computer equipped with the Windows 7 operating system, Intel

The stability analysis and statistical hypothesis test

In order to further illustrate the effectiveness of the proposed method, the stability analysis and hypothesis testing of the algorithm are carried out based on the experimental results.

Conclusion

In multi-label learning, it is very important to study the correlations between feature information and labels. In this paper we propose a kernel extreme learning machine autoencoder algorithm, which uses the kernel extreme learning machine to autoencode the fuzzy associations between the features in the input space, and uses the non-equilibrium label completion algorithm to add the correlations between the labels in the label space. Therefore, the relevant

Acknowledgments

This research is supported by the Natural Science Foundation of Higher Education of Anhui Province, China (No. KJ2017A177), and the Program for Innovative Research Team of Anqing Normal University, China.

References (27)

  • A. Elisseeff, et al., A kernel method for multi-labelled classification.

  • C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. (1948).

  • Z. Zhang, et al., A multi-label classification algorithm using correlation information entropy, J. Northwest. Polytech. Univ. (2012).