Knowledge-Based Systems

Volume 212, 5 January 2021, 106624

Partial multi-label learning with mutual teaching

https://doi.org/10.1016/j.knosys.2020.106624

Abstract

Partial Multi-label Learning (PML) tackles the problem where each training instance is associated with a set of candidate labels that includes both the relevant ground-truth labels and irrelevant false positive labels. Most existing PML methods iteratively update the confidence of each candidate label, but the estimated label confidence may not be reliable due to the cumulative error induced in the confidence updating process, especially when false positive labels dominate. In this paper, we propose a simple yet effective model called PML-MT (Partial Multi-label Learning with Mutual Teaching), in which a couple of prediction networks and their corresponding teacher networks learn collaboratively and teach each other throughout the training process. Specifically, the proposed PML-MT model iteratively refines the label confidence matrix through a couple of self-ensemble teacher networks and trains two prediction networks simultaneously in a mutual teaching manner. Moreover, we propose a novel regularization term to further exploit label correlations from the outputs of the prediction networks under the supervision of the refined label confidence matrix. In addition, a co-regularization term is introduced to maximize the agreement between the outputs of the two prediction networks, so that the predictions of each network become more reliable. Extensive experiments on synthesized and real-world PML datasets demonstrate that the proposed approach outperforms state-of-the-art counterparts.

Introduction

Multi-label learning (MLL) tackles a particular learning problem with accurate supervision, where each object is associated with multiple relevant class labels simultaneously [1]. Since this learning framework can handle objects with rich semantic information, MLL has achieved notable success in many real-world scenarios, such as bioinformatics [2], image annotation [3], and document categorization [4].

Most existing MLL studies rely on a common assumption that each training instance has been precisely annotated with all of its relevant labels. However, in many real-world scenarios this assumption hardly holds, since it is difficult and costly to assign fully accurate labels to each instance. Partial multi-label learning (PML), which handles the inaccurately supervised setting where each training instance is associated with a set of candidate labels, naturally arises in many real-world applications because it significantly reduces the annotation cost. For example, in crowdsourcing image annotation (as shown in Fig. 1), the union of the annotations collected from multiple crowdsourcing annotators forms the candidate label set, which can be noisy due to potentially unreliable annotators. Compared with the standard MLL task, the PML task is more challenging since the ground-truth labels (in black) as well as the false positive labels (in red) are concealed in the candidate label set.

To deal with the PML problem, a straightforward strategy is to simply regard the original PML task as a standard MLL task by treating all the candidate labels as ground-truth ones. The PML problem can then be solved by any off-the-shelf MLL algorithm, such as BR (Binary Relevance) [5], ML-KNN [6], or RAkEL [7]; a sketch of this baseline follows below. However, this strategy may not generalize well to future multi-label data, since the false positive labels concealed in the candidate label set mislead the training procedure.
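
To make this baseline concrete, the following is a minimal sketch (not the paper's code) that naively treats every candidate label as relevant and trains a Binary Relevance model with scikit-learn; the dataset shapes and the candidate-label noise rate are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n, d, L = 1000, 50, 10                                # instances, features, labels (assumed sizes)
X = rng.standard_normal((n, d))                       # feature matrix
Y_candidate = (rng.random((n, L)) < 0.3).astype(int)  # noisy candidate label matrix

# Binary Relevance: one independent binary classifier per label; every
# candidate label is naively treated as relevant, so false positive labels
# directly corrupt the per-label training sets.
br = OneVsRestClassifier(LogisticRegression(max_iter=1000))
br.fit(X, Y_candidate)
scores = br.predict_proba(X)                          # per-label relevance scores, shape (n, L)
```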

To overcome the above problem, an intuitive approach handles the noisy labels by disambiguation, i.e., identifying the correct labels from the candidate label set. A recent attempt [8] tries to recover the ground-truth label information from the provided candidate label set by introducing label confidence. The label confidence and the predictive model are optimized alternately by minimizing a confidence-weighted ranking loss between the candidate and non-candidate labels. Although some promising results are achieved, this approach might be suboptimal since an incorrectly updated label confidence would in turn impair the performance of the predictive model. As a result, the recovered label confidence is error-prone, especially when the false positive labels dominate. In addition to the above attempt, some researchers leverage the low-rank assumption to disambiguate false positive labels by conducting sparse matrix decomposition [9], [10]. Another recent work [11] tries to handle the PML problem by employing a credible label elicitation strategy. It first recovers the label confidence of each candidate label using an iterative label propagation strategy [12] (sketched below). Then, the credible label elicitation strategy identifies the ground-truth labels according to the recovered label confidence, which is used to induce a predictive model. However, the selected credible labels may be unreliable owing to the cumulative error induced in the propagation process, which consequently degrades the PML performance, especially when the number of false positive labels becomes larger.
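
For intuition, here is a hedged sketch of confidence recovery by label propagation in the spirit of [12]: confidences spread over a kNN similarity graph and are then clamped back onto each instance's candidate set. The graph construction, Gaussian kernel, and parameters (k, alpha, n_iters) are assumptions for illustration, not the cited algorithm's exact choices.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def propagate_confidence(X, Y_candidate, k=10, alpha=0.8, n_iters=20):
    # kNN graph with Gaussian similarities as the propagation operator.
    D = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    W = np.where(D > 0, np.exp(-D ** 2), 0.0)
    S = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-normalize

    F = Y_candidate.astype(float)            # initial confidence = candidate mask
    for _ in range(n_iters):
        F = alpha * (S @ F) + (1 - alpha) * Y_candidate       # propagation step
        F *= Y_candidate                     # clamp: non-candidate labels stay zero
        F /= np.maximum(F.sum(axis=1, keepdims=True), 1e-12)  # per-instance normalization
    return F                                 # recovered label confidence matrix
```

Errors made in early iterations are fed back into later ones through the repeated multiplication by S, which is precisely the cumulative-error effect criticized above.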

In this paper, we propose a simple yet effective model called PML-MT (Partial Multi-label Learning with Mutual Teaching), which performs label confidence refinement by optimizing two prediction networks under the supervision of iteratively refined label confidences in a mutual teaching manner. Specifically, the proposed PML-MT model provides more reliable label confidences to simultaneously train the two prediction networks in a mutual teaching manner. To avoid training error amplification, a self-ensemble teacher network for each prediction network is introduced to refine the label confidence that supervises the other prediction network in a collaborative training manner (a schematic training loop is sketched below). Furthermore, PML-MT utilizes the refined label confidences from the two teacher networks to explore label correlations. In addition, we leverage a co-regularization term to reduce the diversity of the two prediction networks by maximizing the agreement between their predictions. Finally, we conduct extensive experiments on real-world and synthesized datasets under the PML setting. The empirical results show that the proposed PML-MT model yields state-of-the-art PML performance.
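
The following PyTorch sketch illustrates the mutual-teaching loop described above. Since the exact loss functions of PML-MT are not given in this excerpt, binary cross-entropy against the teacher-refined confidences and a mean-squared agreement term are used as assumed placeholders; the self-ensemble teachers are implemented as exponential moving averages of the student weights, a common realization of self-ensembling.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, decay=0.99):
    # Self-ensemble teacher: exponential moving average of the student weights.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.data.mul_(decay).add_(s.data, alpha=1 - decay)

def refine_confidence(teacher, x, candidate_mask):
    # Teacher predictions, clamped onto the candidate set and renormalized.
    with torch.no_grad():
        q = torch.sigmoid(teacher(x)) * candidate_mask
        return q / q.sum(dim=1, keepdim=True).clamp_min(1e-12)

d, L = 50, 10                                # assumed feature and label dimensions
def make_net():
    return torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, L))

net1, net2 = make_net(), make_net()          # two prediction networks (different inits)
teacher1, teacher2 = make_net(), make_net()  # corresponding self-ensemble teachers
teacher1.load_state_dict(net1.state_dict()); teacher2.load_state_dict(net2.state_dict())
opt = torch.optim.Adam(list(net1.parameters()) + list(net2.parameters()), lr=1e-3)

def train_step(x, candidate_mask, gamma=0.1):
    # Mutual teaching: the teacher of one branch supervises the *other* network.
    conf1 = refine_confidence(teacher1, x, candidate_mask)
    conf2 = refine_confidence(teacher2, x, candidate_mask)
    p1, p2 = torch.sigmoid(net1(x)), torch.sigmoid(net2(x))
    loss = F.binary_cross_entropy(p1, conf2) + F.binary_cross_entropy(p2, conf1)
    loss = loss + gamma * F.mse_loss(p1, p2)  # co-regularization: agreement term
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update(teacher1, net1); ema_update(teacher2, net2)  # refresh both teachers
    return loss.item()

# Illustrative usage with a random mini-batch:
# x = torch.randn(32, d); mask = (torch.rand(32, L) < 0.3).float()
# train_step(x, mask)
```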

The rest of this paper is organized as follows. Firstly, related works on partial multi-label learning are briefly reviewed in Section 2. Secondly, technical details of the proposed approach are introduced in Section 3. Thirdly, the results of the comparison experiments are reported in Section 4. Finally, we conclude this paper in Section 6.

Section snippets

Related work

Partial multi-label learning (PML) [8], [11], [13], [14], [15] is a weakly supervised framework to tackle the problem of multi-label learning with partial labels. It is the combination of two prevalent learning frameworks, i.e., multi-label learning [1], [16] and partial label learning [17], [18], [19], [20], [21], [22], [23].

The proposed approach

In this section, we introduce our proposed PML model called PML-MT (Partial Multi-label Learning with Mutual Teaching). We begin by introducing some basic notation. Given a PML training set $\mathcal{D} = \{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{n}$, $\mathbf{x}_i \in \mathbb{R}^d$ denotes the $d$-dimensional input feature vector of the $i$th instance and $\mathbf{y}_i \in \{0,1\}^{L}$ is the corresponding label indicator vector. The labels assigned the value 1 in $\mathbf{y}_i$ form the candidate label set $S_i$, which is usually noisy and may contain the ground-truth labels as well as false positive labels.
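
The following tiny sketch maps this notation onto concrete arrays; the sizes, values, and the uniform confidence initialization are illustrative assumptions, not the paper's prescription.

```python
import numpy as np

n, d, L = 5, 4, 3                          # tiny illustrative sizes
X = np.random.randn(n, d)                  # x_i in R^d, stacked row-wise
Y = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])                  # y_i in {0,1}^L, candidate indicators

S = [np.flatnonzero(Y[i]) for i in range(n)]   # candidate label sets S_i
# One natural initialization (an assumption): uniform confidence over
# each instance's candidate set.
C = Y / Y.sum(axis=1, keepdims=True)
```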

Datasets

To thoroughly evaluate the performance of the comparison methods, we perform experiments on ten datasets in total, including synthetic as well as real-world PML datasets. These datasets span a broad range of applications: music_emotion for music recognition; image, scene, corel5k and mirflickr for image annotation; and enron, eurlex_dc, eurlex_sm, delicious and tmc2007 for text categorization. Among these ten datasets, two are real-world PML datasets: music_emotion and mirflickr [44].

The PL-MT approach

As discussed in Section 2, PLL has some connections with PML but addresses a different problem. Nevertheless, the proposed PML-MT model can easily be applied to partial label learning problems by dropping the label correlation learning term. This extension leads to the following degraded version of PML-MT, which we denote as PL-MT. By dropping the label correlation learning term, we obtain the overall training objective of the PL-MT model: $\mathcal{L}(\Theta_1, \Theta_2) = \mathcal{L}_c^1 + \mathcal{L}_c^2 + \lambda \mathcal{L}_a$, where $\lambda$ is the trade-off parameter.
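
A hedged sketch of this composition is given below: two classification losses, one per prediction network, plus a weighted agreement term. Only the overall form of the objective appears in this excerpt, so the concrete shape of each term (soft-target binary cross-entropy and mean-squared agreement) is an assumption consistent with the earlier training-loop sketch.

```python
import torch
import torch.nn.functional as F

def pl_mt_loss(p1, p2, conf1, conf2, lam=0.1):
    """p1, p2: sigmoid outputs of the two prediction networks; conf1, conf2:
    teacher-refined label confidences supervising the opposite branch."""
    L_c1 = F.binary_cross_entropy(p1, conf2)   # classification loss of network 1
    L_c2 = F.binary_cross_entropy(p2, conf1)   # classification loss of network 2
    L_a = F.mse_loss(p1, p2)                   # agreement (co-regularization) term
    return L_c1 + L_c2 + lam * L_a             # L(Theta1, Theta2) = L_c^1 + L_c^2 + lambda * L_a
```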

Conclusion

In this paper, we propose a novel mutual teaching model, PML-MT, for partial multi-label learning. Specifically, the proposed PML-MT model provides a reliable label confidence matrix in an iterative learning manner by consulting a couple of teacher networks. With the refined label confidence matrix, two identical prediction networks are induced simultaneously in a mutual teaching manner. Besides, we propose a novel regularization term to exploit the label correlations from the outputs of the two prediction networks under the supervision of the refined label confidence matrix.

CRediT authorship contribution statement

Yan Yan: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft. Shining Li: Resources, Supervision, Project administration, Funding acquisition. Lei Feng: Conceptualization, Methodology, Formal analysis, Investigation, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61872434, the National Key Scientific Research Project of China under Grant MJ-2018-S-33, and the National Key R&D Program of China under Grant 2018YFB1004803.

References (56)

  • M.R. Boutell et al., Learning multi-label scene classification, Pattern Recognit. (2004)
  • M.-L. Zhang et al., ML-KNN: A lazy learning approach to multi-label learning, Pattern Recognit. (2007)
  • M.-L. Zhang et al., A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng. (2013)
  • M.-L. Zhang et al., Multilabel neural networks with applications to functional genomics and text categorization, IEEE Trans. Knowl. Data Eng. (2006)
  • Z.-M. Chen et al., Multi-label image recognition with graph convolutional networks
  • J. Lin et al., Semantic-unit-based dilated convolution for multi-label text classification (2018)
  • G. Tsoumakas et al., Random k-labelsets for multilabel classification, IEEE Trans. Knowl. Data Eng. (2010)
  • M.-K. Xie et al., Partial multi-label learning
  • L. Sun et al., Partial multi-label learning by low-rank and sparse decomposition
  • G. Yu et al., Feature-induced partial multi-label learning
  • J.-P. Fang et al., Partial multi-label learning via credible label elicitation
  • D. Zhou et al., Learning with local and global consistency
  • Y. Yan et al., Adversarial partial multi-label learning (2019)
  • G. Lyu et al., Partial multi-label learning via probabilistic graph matching mechanism
  • G. Lyu, S. Feng, Y. Li, Noisy label tolerance: A new perspective of partial multi-label learning, Inf. Sci. 543,...
  • L. Feng et al., Collaboration based multi-label learning
  • T. Cour et al., Learning from partial labels, J. Mach. Learn. Res. (2011)
  • G. Lyu et al., GM-PLL: Graph matching based partial label learning, IEEE Trans. Knowl. Data Eng. (2019)
  • G. Lyu et al., HERA: Partial label learning by combining heterogeneous loss with sparse and low-rank regularization, ACM Trans. Interact. Intell. Syst. (2020)
  • Y. Yan et al., Partial label learning with batch label correction
  • J.-Q. Lv et al., Progressive identification of true labels for partial-label learning
  • L. Feng et al., Provably consistent partial-label learning
  • Y. Yan et al., Multi-level generative models for partial label learning with non-random label noise (2020)
  • Y. Yan et al., Multi-label learning with label-specific feature selection
  • J. Fürnkranz et al., Multilabel classification via calibrated label ranking, Mach. Learn. (2008)
  • S. Zhu et al., Multi-labelled classification using maximum entropy method
  • S. He et al., Estimating latent relative labeling importances for multi-label learning
  • S. Burkhardt et al., Online multi-label dependency topic models for text classification, Mach. Learn. (2018)