Partial multi-label learning with mutual teaching
Introduction
Multi-label learning (MLL) tackles a learning problem with accurate supervision, where each object is associated with multiple relevant class labels simultaneously [1]. As this learning framework can handle objects with rich semantic information, many recent works have demonstrated the success of MLL in real-world scenarios such as bioinformatics [2], image annotation [3], and document categorization [4].
Most existing MLL studies rely on a common assumption that each training instance has been precisely annotated with all of its relevant labels. However, in many real-world scenarios this assumption hardly holds, since it is difficult and costly to annotate each instance with fully accurate labels. Partial multi-label learning (PML), which handles the inaccurate supervision problem where each training instance is associated with a set of candidate labels, naturally arises in many real-world applications because it significantly reduces annotation cost. For example, in crowdsourcing image annotation (as shown in Fig. 1), the union of the annotations collected from multiple crowdsourcing annotators forms the candidate label set, which could be noisy due to potentially unreliable annotators. Compared with the standard MLL task, the PML task is more challenging, since both the ground-truth labels (in black) and the false positive labels (in red) are concealed in the candidate label set.
To deal with the PML problem, a straightforward strategy is to simply treat the original PML task as a standard MLL task by regarding all candidate labels as ground-truth ones. The PML problem can then be solved by any off-the-shelf MLL algorithm, such as BR (Binary Relevance) [5], ML-kNN [6], or RAkEL [7]. However, this strategy may not generalize well to future multi-label data, since the false positive labels concealed in the candidate label set mislead the training procedure.
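The naive strategy above can be made concrete with a minimal numpy sketch of a Binary Relevance baseline that treats every candidate label as a true label and fits one independent logistic classifier per label via plain gradient descent. The class name and hyperparameters are our own illustrative choices, not the paper's or the BR authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NaiveBRBaseline:
    """Binary Relevance baseline: treat every candidate label as ground truth
    and fit one independent logistic classifier per label (gradient descent)."""

    def __init__(self, n_labels, lr=0.1, epochs=200):
        self.n_labels, self.lr, self.epochs = n_labels, lr, epochs

    def fit(self, X, Y_candidate):
        n, d = X.shape
        self.W = np.zeros((self.n_labels, d))
        for k in range(self.n_labels):     # one binary problem per label
            y = Y_candidate[:, k]          # noisy: false positives included
            for _ in range(self.epochs):
                p = sigmoid(X @ self.W[k])
                self.W[k] -= self.lr * X.T @ (p - y) / n
        return self

    def predict(self, X):
        return (sigmoid(X @ self.W.T) >= 0.5).astype(int)
```

Because each label is fit independently against the noisy candidate matrix, any false positive candidate directly becomes a positive training example, which is exactly the failure mode the paragraph describes.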
To overcome the above problem, an intuitive approach handles noisy labels by disambiguation, i.e., identifying the correct labels from the candidate label set. A recent attempt in [8] tries to recover the ground-truth label information from the provided candidate label set by introducing label confidence. The label confidence and the predictive model are optimized alternately by minimizing a confidence-weighted ranking loss between the candidate and non-candidate labels. Although promising results can be achieved, this approach may be suboptimal, since incorrectly updated label confidence would in turn impair the performance of the predictive model. As a result, the recovered label confidence is error-prone, especially when false positive labels dominate. In addition, some researchers leverage the low-rank assumption to disambiguate false positive labels by conducting sparse matrix decomposition [9], [10]. Another recent work in [11] tries to handle the PML problem by employing a credible label elicitation strategy. It first recovers the label confidence of each candidate label using an iterative label propagation strategy [12]. Then, the credible label elicitation strategy identifies the ground-truth labels according to the recovered label confidence, which can be used to induce a predictive model. However, the selected credible labels may be unreliable owing to the cumulative error induced in the propagation process, which consequently degrades PML performance, especially when the number of false positive labels becomes larger.
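The label-propagation-based confidence recovery mentioned above can be illustrated with a minimal numpy sketch (our own simplification, not the authors' code of [11] or [12]): candidate label confidences are smoothed over a kNN similarity graph and then renormalized over each instance's candidate set. The function name, the choice of cosine similarity, and the hyperparameters alpha, k, and n_iter are all illustrative assumptions.

```python
import numpy as np

def propagate_confidence(X, Y_cand, alpha=0.5, k=3, n_iter=20):
    """Sketch of iterative label propagation for confidence recovery:
    smooth the noisy candidate label matrix over a kNN similarity graph,
    keeping all confidence mass on each instance's candidate labels."""
    n = X.shape[0]
    # cosine similarity graph (non-negative), zero self-similarity
    norm = np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    S = (X / norm) @ (X / norm).T
    np.fill_diagonal(S, 0.0)
    S = np.maximum(S, 0.0)
    for i in range(n):                     # sparsify: keep k nearest neighbours
        drop = np.argsort(S[i])[:-k]
        S[i, drop] = 0.0
    W = S / (S.sum(axis=1, keepdims=True) + 1e-12)   # row-stochastic weights

    F = Y_cand.astype(float)
    for _ in range(n_iter):
        F = alpha * (W @ F) + (1 - alpha) * Y_cand   # propagate over the graph
        F *= Y_cand                                  # zero out non-candidates
        F /= F.sum(axis=1, keepdims=True) + 1e-12    # renormalize per instance
    return F
```

The sketch also hints at the weakness the paragraph points out: errors in the similarity graph accumulate over iterations, so the recovered confidences degrade as false positives grow.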
In this paper, we propose a simple yet effective model called PML-MT (Partial Multi-label Learning with Mutual Teaching), which performs label confidence refinement by optimizing coupled prediction networks under the supervision of iteratively refined label confidence in a mutual teaching manner. Specifically, PML-MT provides reliable label confidence to simultaneously train two prediction networks that teach each other. To avoid training error amplification, a self-ensemble teacher network of each prediction network is introduced to refine the label confidence that supervises the other prediction network in a collaborative training manner. Furthermore, PML-MT utilizes the refined label confidences from the coupled teacher networks to explore label correlations. In addition, we leverage a co-regularization term to reduce the diversity of the two prediction networks by maximizing the agreement between their predictions. Finally, we conduct extensive experiments on real-world and synthesized datasets under the PML setting. The empirical results show that the proposed PML-MT model yields state-of-the-art PML performance.
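For intuition, the mutual teaching scheme can be sketched with linear students in numpy. This is a hedged simplification, not the paper's implementation: PML-MT uses prediction networks and specific losses, whereas here each "network" is a single linear layer, each self-ensemble teacher is an exponential moving average (EMA) of its student, teacher 1's refined confidence supervises student 2 and vice versa, and the co-regularization is a plain squared-difference pull between the two students' predictions. All names and hyperparameters below are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mutual_teaching_step(W1, W2, T1, T2, X, Y_cand, lr=0.1, ema=0.9, lam=0.1):
    """One sketched PML-MT-style update with linear students W1, W2 and
    EMA teachers T1, T2 (illustrative simplification of the real model)."""
    P1, P2 = sigmoid(X @ W1), sigmoid(X @ W2)

    def refine(T):
        # teacher refines confidence: score labels, keep only candidates,
        # renormalize so confidence mass stays on the candidate set
        C = sigmoid(X @ T) * Y_cand
        return C / (C.sum(axis=1, keepdims=True) + 1e-12)

    C1, C2 = refine(T1), refine(T2)
    n = X.shape[0]
    # cross supervision: student 1 fits teacher 2's confidence (and vice
    # versa), plus a co-regularization pull toward the other student
    g1 = X.T @ ((P1 - C2) + lam * (P1 - P2)) / n
    g2 = X.T @ ((P2 - C1) + lam * (P2 - P1)) / n
    W1, W2 = W1 - lr * g1, W2 - lr * g2
    # self-ensemble: each teacher tracks its student by EMA
    T1 = ema * T1 + (1 - ema) * W1
    T2 = ema * T2 + (1 - ema) * W2
    return W1, W2, T1, T2
```

The EMA teachers change more slowly than the students, which is what damps the error amplification that direct self-training would suffer from.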
The rest of this paper is organized as follows. Firstly, related works on partial multi-label learning are briefly reviewed in Section 2. Secondly, technical details of the proposed approach are introduced in Section 3. Thirdly, the results of the comparison experiments are reported in Section 4. Finally, we conclude this paper in Section 6.
Related work
Partial multi-label learning (PML) [8], [11], [13], [14], [15] is a weakly supervised framework to tackle the problem of multi-label learning with partial labels. It is the combination of two prevalent learning frameworks, i.e., multi-label learning [1], [16] and partial label learning [17], [18], [19], [20], [21], [22], [23].
The proposed approach
In this section, we introduce our proposed PML model called PML-MT (Partial Multi-label Learning with Mutual Teaching). We begin by introducing some basic notation. Given a PML training set $\mathcal{D} = \{(\mathbf{x}_i, \mathbf{y}_i) \mid 1 \le i \le n\}$, where $\mathbf{x}_i \in \mathbb{R}^d$ denotes the d-dimensional input feature vector for the $i$-th instance and $\mathbf{y}_i \in \{0, 1\}^q$ is the corresponding assigned label indicator vector for the $i$-th instance, the labels assigned 1 in $\mathbf{y}_i$ form the candidate label set $S_i$, which is usually noisy and may contain the ground truth
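The notation above can be illustrated with a tiny numpy example (toy numbers of our own, not from the paper): each row of the indicator matrix is one $\mathbf{y}_i$, and the candidate set $S_i$ collects the labels flagged 1, inside which ground-truth and false positive labels are mixed.

```python
import numpy as np

# Toy instantiation of the notation: n instances, d features, q labels.
n, d, q = 4, 5, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))        # feature vectors x_i in R^d
Y = np.array([[1, 0, 1],           # candidate label indicator vectors y_i
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
# candidate label set S_i = labels assigned 1 in y_i
candidate_sets = [set(np.flatnonzero(Y[i])) for i in range(n)]
# e.g. the first instance's candidate set is {0, 2}
```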
Datasets
To thoroughly evaluate the performance of the comparison methods, we perform experiments on ten datasets in total, including synthetic as well as real-world PML datasets. These datasets span a broad range of applications: music_emotion for music recognition; image, scene, corel5k and mirflickr for image annotation; and enron, eurlex_dc, eurlex_sm, delicious and tmc2007 for text categorization. Two of the ten are real-world PML datasets: music_emotion and mirflickr [44].
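Synthetic PML datasets are typically built from a clean multi-label dataset by injecting false positive candidates. The sketch below follows a protocol commonly used in PML papers (our assumption of the setup, not necessarily the exact protocol used in this work): for each instance, flip r randomly chosen irrelevant labels to 1 so they become false positive candidates alongside the ground-truth labels.

```python
import numpy as np

def synthesize_pml(Y_true, r=1, seed=0):
    """Sketch of a common synthetic-PML protocol: corrupt a clean label
    matrix by turning r random irrelevant labels per instance into false
    positive candidate labels."""
    rng = np.random.default_rng(seed)
    Y_cand = Y_true.copy()
    for i in range(Y_true.shape[0]):
        irrelevant = np.flatnonzero(Y_true[i] == 0)
        if irrelevant.size:
            flip = rng.choice(irrelevant, size=min(r, irrelevant.size),
                              replace=False)
            Y_cand[i, flip] = 1   # ground truth is now hidden among candidates
    return Y_cand
```

Varying r controls the proportion of false positive labels, which is how PML experiments typically study robustness as the candidate sets get noisier.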
The PL-MT approach
As discussed in Section 2, PLL is closely related to PML but addresses a different problem. Nevertheless, the proposed PML-MT model can also be applied to partial label learning problems by simply dropping the label correlation learning term. This extension leads to the following degraded version of PML-MT, which we denote as PL-MT. By dropping the label correlation learning term, we obtain the overall training objective for the PL-MT model: where is the trade-off
Conclusion
In this paper, we propose a novel mutual teaching model, PML-MT, for partial multi-label learning. Specifically, the proposed PML-MT model produces a reliable label confidence matrix in an iterative learning manner by consulting a couple of teacher networks. With the refined label confidence matrix, two identical prediction networks are induced simultaneously in a mutual teaching manner. Besides, we propose a novel regularization term to exploit the label correlations from the outputs of the couple
CRediT authorship contribution statement
Yan Yan: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft. Shining Li: Resources, Supervision, Project administration, Funding acquisition. Lei Feng: Conceptualization, Methodology, Formal analysis, Investigation, Writing - review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant 61872434, the National Key Scientific Research Project of China under Grant No. MJ-2018-S-33, and National Key R&D Program of China under Grant No. 2018YFB1004803.
References (56)
- et al., Learning multi-label scene classification, Pattern Recognit. (2004)
- et al., ML-kNN: A lazy learning approach to multi-label learning, Pattern Recognit. (2007)
- et al., A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng. (2013)
- et al., Multilabel neural networks with applications to functional genomics and text categorization, IEEE Trans. Knowl. Data Eng. (2006)
- et al., Multi-label image recognition with graph convolutional networks
- et al., Semantic-unit-based dilated convolution for multi-label text classification (2018)
- et al., Random k-labelsets for multilabel classification, IEEE Trans. Knowl. Data Eng. (2010)
- et al., Partial multi-label learning
- et al., Partial multi-label learning by low-rank and sparse decomposition
- et al., Feature-induced partial multi-label learning
- Partial multi-label learning via credible label elicitation
- Learning with local and global consistency
- Adversarial partial multi-label learning
- Partial multi-label learning via probabilistic graph matching mechanism
- Collaboration based multi-label learning
- Learning from partial labels, J. Mach. Learn. Res.
- GM-PLL: Graph matching based partial label learning, IEEE Trans. Knowl. Data Eng.
- HERA: Partial label learning by combining heterogeneous loss with sparse and low-rank regularization, ACM Trans. Interact. Intell. Syst.
- Partial label learning with batch label correction
- Progressive identification of true labels for partial-label learning
- Provably consistent partial-label learning
- Multi-level generative models for partial label learning with non-random label noise
- Multi-label learning with label-specific feature selection
- Multilabel classification via calibrated label ranking, Mach. Learn.
- Multi-labelled classification using maximum entropy method
- Estimating latent relative labeling importances for multi-label learning
- Online multi-label dependency topic models for text classification, Mach. Learn.