Elsevier

Neurocomputing

Volume 119, 7 November 2013, Pages 273-280
Multi-class learning from class proportions

https://doi.org/10.1016/j.neucom.2013.03.031

Abstract

In this work, we aim to solve the following multi-class inference problem: for given groups of unlabeled samples, a reliable multi-class classifier is expected to deterministically predict the label of each sample under the condition that only the class proportion information of each group is provided. Many modern applications can be abstracted to such a problem, e.g., large-scale image annotation, spam filtering, and improper content detection, where the class proportions of samples can be obtained cheaply while sample-wise labeling is prohibitive or quite hard. However, despite its practical importance, this problem has not been thoroughly investigated in previous works. The main challenge essentially lies in the severe under-determination of the problem itself. In this paper, we propose to utilize the natural sparsity of labels to alleviate this issue, and formulate the classifier learning as a sparsity pursuit problem over a standard simplex. Moreover, since the popular ℓ1-relaxation method is inapplicable in this case, we propose an optimization method that directly tackles the hard sparsity constraint, i.e., the ℓ0-constraint, based on the Augmented Lagrange Multiplier (ALM) method, which provides a global convergence guarantee. It is noteworthy that our overall solution can not only directly predict the labels of the training and new samples, but can also gracefully exploit the test samples to further boost the classification performance in a semi-supervised manner. The experimental results on two benchmark datasets validate the effectiveness of the proposed method.

Introduction

Many modern intelligent applications involve a multi-class inference problem. For example, in digital image management, images are automatically classified into different categories according to the semantic objects they contain, so that they can be browsed and retrieved more conveniently; in spam filtering, new emails are classified as spam or non-spam based on certain features and rules. To solve such multi-class inference problems, traditional methods adopt supervised learning to learn a classifier from a training set of samples, e.g., a multi-class SVM classifier [1], [2], [3], random forest [4], or discriminative subspace learning [5], [6]. In practice, these supervised learning methods require a large number of well-annotated training samples to achieve satisfactory classification performance. However, specific sample-wise labeling is often prohibitive or quite labor-intensive, especially as both the number of samples and the number of classes grow rapidly nowadays.

Compared with accurately labeling individual samples, a rough class proportion can usually be obtained much more cheaply and efficiently. Thus, how to learn a multi-class classifier from only the class proportion information is very meaningful in practice, beyond its own research interest. Fig. 1 provides an illustrative example of this classification problem, where images need to be classified into different categories according to their contained objects. Here, given a large number of images and many classes, the class label of each image cannot be obtained due to the heavy burden of manual labeling. However, the class proportions can be efficiently estimated by randomly sampling some images and counting their categories. More specifically, for the first image group in Fig. 1, we only know that 50%, 17% and 33% of the images belong to "butterfly", "camera" and "cougar", respectively. Our target is then to learn a multi-class classifier that predicts the class label of each image, including both the observed images and new images, using only the knowledge of class proportions.
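As a toy illustration of this sampling-and-counting estimate (our own sketch, not part of the paper), the following snippet estimates a group's class proportions from a random sample; in a real annotation setting the per-image labels would be unknown and a human would only inspect the sampled items:

```python
import random
from collections import Counter

def estimate_proportions(labels, sample_size, seed=0):
    """Estimate a group's class proportions by randomly sampling
    `sample_size` items and counting their categories."""
    rng = random.Random(seed)
    sampled = rng.sample(labels, sample_size)
    counts = Counter(sampled)
    return {c: n / sample_size for c, n in counts.items()}

# Toy group mimicking the first group in Fig. 1: out of 12 images,
# 6 "butterfly" (50%), 2 "camera" (~17%), and 4 "cougar" (~33%).
group = ["butterfly"] * 6 + ["camera"] * 2 + ["cougar"] * 4
props = estimate_proportions(group, sample_size=12)
```

With a sample size smaller than the group, the estimate is only approximate, but the process can be repeated cheaply, which is the efficiency advantage discussed above.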

Besides the advantage of efficiency, such multi-class learning from class proportions is more robust to inaccurate annotations than traditional supervised learning. In practice, manually labeling images is evidently labor-intensive, and inaccurate labels are consequently inevitable (e.g., a "butterfly" image is labeled as a "moth" image). These inaccurate labels would certainly have a severe negative effect on the supervised learning process. In contrast, learning from class proportions, as a form of weakly supervised learning, naturally injects a certain robustness against inaccurate supervision information. In addition, the class proportions can be obtained more accurately because the sampling-and-counting process can be repeated efficiently for validation.

Despite its practical value, learning from class proportions is quite challenging due to its severe under-determination. Given only the class proportion information, there are almost infinitely many sample-label configurations that yield the same class proportion statistics. Here, we propose to alleviate this issue from the following two aspects. First, we divide the samples into a number of groups and obtain the class proportion of each individual group. If the number of groups is sufficient (but not so large as to increase the labeling burden), the under-determination is significantly relieved. Second, we impose a sparsity constraint on the predicted sample labels to regularize the problem. The sparsity prior is motivated by the observation that although the number of classes over the whole dataset may be large, each individual sample usually contains only one or a few classes.

In this work, we present a multi-class learning method that directly uses the knowledge of class proportions. In particular, we employ linear classifiers, which have been proven to achieve excellent performance in various classification tasks. That is, for all samples, there is a linear relationship between the feature vector and the label confidence vector. The class proportion of a given group can then be obtained by averaging the label confidence vectors of its samples, which our model enforces to be consistent with the provided groundtruth. On the other hand, a sparsity prior is imposed on the predicted label confidences of each individual sample during learning. Such sparsity is popularly enforced by minimizing the ℓ1-norm of a vector, as in most recent works [7]. However, the class proportion vectors in this work are naturally ℓ1-normalized, i.e., their ℓ1-norm always equals a constant, so minimizing the ℓ1-norm is no longer applicable. Alternatively, we directly adopt the explicit sparsity measure represented by the ℓ0-norm. To tackle the non-convex and combinatorial optimization problem introduced by such ℓ0-norm constraints, we propose an effective algorithm based on the combination of the Augmented Lagrange Multiplier (ALM) method [8] and the non-negative matrix multiplicative update [9], which provides a global convergence guarantee. The experimental evaluation is performed on two benchmark image classification datasets, i.e., Caltech-101 [10] and the more challenging Caltech-256 [11]. The results show that the proposed method achieves 40% and 20% classification accuracy on the test samples, respectively, even though only the class proportions are given, improving on two baseline methods by 7% and 3%. Moreover, the performance can be further boosted by integration with semi-supervised learning.
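The group-proportion consistency described above can be sketched numerically. The following is our illustrative notation, not necessarily the paper's exact formulation: W is a d×c weight matrix of a linear classifier, X an n×d feature matrix, and a group's predicted proportion is the mean of its samples' label confidence vectors, compared against the given proportions P:

```python
import numpy as np

def proportion_consistency_loss(W, X, groups, P):
    """Squared error between predicted and given group proportions.

    `groups` is a list of index lists (one per group); P is a g x c
    matrix of groundtruth class proportions. A sample's label
    confidence vector is X[i] @ W; a group's predicted proportion is
    the mean of its samples' confidence vectors.
    """
    loss = 0.0
    for g, idx in enumerate(groups):
        pred = X[idx] @ W                       # confidence vectors
        loss += np.sum((pred.mean(axis=0) - P[g]) ** 2)
    return loss

# Toy check: 4 samples, 2 classes, identity classifier, so each
# sample's confidence vector equals its feature vector.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
W = np.eye(2)
groups = [[0, 1], [2, 3]]
P = np.array([[0.5, 0.5], [1.0, 0.0]])
loss = proportion_consistency_loss(W, X, groups, P)
```

In the toy check the predicted group means match P exactly, so the loss is zero; the paper's model enforces exactly this kind of consistency while additionally imposing the ℓ0 sparsity prior on the per-sample confidences.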

It is noteworthy that the proposed solution has two distinctive characteristics: (1) it can directly handle multi-class classification problems, because the multi-class classifier is explicitly modeled as our target result, while other related works have to perform a certain post-processing procedure to implement the final multi-class classification; and (2) the regularization term in our proposed model can naturally utilize unlabeled data to further boost the classification performance, while previous methods cannot work seamlessly under such a semi-supervised learning framework.

The remainder of this paper is organized as follows. First, we discuss some related works in Section 2, and then the proposed solution is elaborated on in Section 3. After this, Section 4 presents several comparison experiments to show the effectiveness of the proposed method. Finally, we conclude the paper in Section 5.

Section snippets

Related works

The related research on multi-class learning from class proportions is quite rare, even though this problem widely exists in many real-world applications. To the best of our knowledge, there exist only two related works that solve similar problems. Quadrianto et al. [12] proposed to estimate the joint distribution of the sample features and the corresponding labels under the family of exponential distributions. Specifically, it is assumed that the distribution of labels conditioned on the features obeys an

Multi-class learning from class proportions

In this section, we introduce the proposed method in detail. We first formalize multi-class learning from class proportions as an optimization problem. Then we show how to handle the involved hard sparsity constraint and provide the multiplicative update procedure for the nonnegative matrix. Finally, we describe the overall algorithm of our proposed method, which is mainly represented by an iterative optimization procedure.
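As a rough illustration of what a hard-sparsity step over the simplex can look like (our simplified sketch; the paper's actual ALM-based updates differ), the Euclidean projection onto the set of k-sparse points of the probability simplex can be computed by keeping the k largest entries and simplex-projecting them:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1} via the standard sort-based rule."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_sparse_simplex(v, k):
    """Projection onto {x on the simplex : ||x||_0 <= k}: keep the k
    largest entries of v and simplex-project them."""
    idx = np.argsort(v)[::-1][:k]
    out = np.zeros_like(v, dtype=float)
    out[idx] = project_simplex(v[idx])
    return out

# Example: a dense confidence vector forced to be 2-sparse.
x = project_sparse_simplex(np.array([0.5, 0.3, 0.1, 0.1]), k=2)
```

The result remains on the simplex (nonnegative, summing to one) while satisfying the ℓ0 constraint, which is exactly the feasible set the sparsity pursuit over a standard simplex operates on.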

Experiments

In this section, we experimentally evaluate the performance of the proposed method on multi-class image classification tasks, and compare it with two existing methods: Mean Map (MM) [12] and Inverse Calibration (INV.CAL) [13]. Two widely used benchmark datasets, Caltech-101 [10] and Caltech-256 [11], are employed for the performance evaluation due to their representativeness in image classification. Furthermore, besides the performance comparison among different methods, we

Conclusions

In this work, we investigated the problem of learning a multi-class classifier from the provided class proportion information, which is quite different from traditional supervised learning settings. Specifically, we proposed an optimization approach that enforces the predicted class proportions to approximate the groundtruths while preserving the sparsity of the predicted label vector of each individual sample. Then, by introducing a set of auxiliary variables, constructing the classifier boils

Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 61203256 and no. 61233003) and the Fundamental Research Funds for the Central Universities WK2100100018 and WK2100100021.

Zilei Wang received the B.S. and Ph.D. degrees in control theory and control engineering from the University of Science and Technology of China (USTC), Hefei, China, in 2002 and 2007, respectively. He is currently an Associate Professor in the Department of Automation, USTC, and works at the Advanced Sensing and Control Laboratory of USTC. His research interests include computer vision, multimedia, and network management and control.

References (30)

  • Z. Lin, M. Chen, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices...
  • X. Liu, S. Yan, J. Yan, H. Jin, Unified solution to nonnegative data factorization problems, in: Proceedings of the...
  • G. Griffin, A. Holub, P. Perona, Caltech-256 Object Category Dataset, Technical Report 7694, California Institute of...
  • N. Quadrianto et al., Estimating labels from label proportions, J. Mach. Learn. Res. (2009)
  • S. Rueping, SVM classifier estimation from group probabilities, in: Proceedings of the International Conference on...

Jiashi Feng received the B.S. degree from the University of Science and Technology of China (USTC), Hefei, China, in 2007. He is currently pursuing the Ph.D. degree in the Department of Electrical and Computer Engineering, National University of Singapore (NUS), Singapore. His research interests include computer vision and machine learning.
