
Neurocomputing

Volume 74, Issue 17, October 2011, Pages 3114-3124

An extended one-versus-rest support vector machine for multi-label classification

https://doi.org/10.1016/j.neucom.2011.04.024

Abstract

The hybrid strategy, which generalizes a specific single-label algorithm while one or two data decomposition tricks are applied implicitly or explicitly, has become an effective and efficient tool for designing and implementing various multi-label classification algorithms. In this paper, we extend the traditional binary support vector machine by introducing an approximate ranking loss as its empirical loss term, building a novel support vector machine for multi-label classification. The result is a quadratic programming problem in which the variables have different upper bounds that characterize the label correlation of each individual instance. Furthermore, this optimization problem can be solved by combining the one-versus-rest data decomposition trick with a modified binary support vector machine, which dramatically reduces the computational cost. An experimental study on ten multi-label data sets illustrates that our method is a powerful candidate for multi-label classification, compared with four state-of-the-art multi-label classification approaches.

Research highlights

► We propose a novel one-versus-rest multi-label support vector machine.
► Label correlation is characterized explicitly via upper bounds of variables.
► Our method works well on ten data sets according to five indicative measures.
► Our method is a powerful candidate for multi-label classification.

Introduction

Multi-label classification is a particular learning task in which a single instance can belong to several classes at the same time, so the classes are not mutually exclusive. Recently, it has received increasing attention because of many real-world applications, e.g., text categorization [16], [26], [27], [40], scene classification [2], [15], [41], bioinformatics [1], [21], and music and speech categorization [17], [29]. Nowadays, there are three strategies for designing and implementing discriminative multi-label classification methods: data decomposition, algorithm extension, and hybrid strategies. Furthermore, label correlation, i.e., label co-occurrence information, has been exploited at three levels: individual instance, partial instances, and different labels.

The data decomposition strategy divides a multi-label data set into one or more single-label (single, binary, or multi-class) subsets, constructs a sub-classifier for each subset using an existing classification technique, and then assembles all sub-classifiers into an entire multi-label classifier. There are four widely used decomposition tricks: one-versus-rest (OVR), one-versus-one (OVO), one-by-one (OBO), and label powerset (LP) [3], [30], [32], [39]. A data decomposition multi-label method is convenient and fast to implement, since many existing classification techniques and their software implementations can be utilized. This strategy reflects the label correlation of individual instances implicitly, by exploiting multi-label instances repeatedly in OVR, OVO, and OBO methods, and the label correlation of partial instances directly, by considering possible label combinations in LP methods.
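As a generic illustration of the OVR trick described above (a minimal sketch, not the paper's implementation), the following trains one binary classifier per label and assembles the per-label predictions; `make_classifier` is a hypothetical factory, and any base learner with `fit`/`predict` methods could be plugged in.

```python
import numpy as np

def train_ovr(X, Y, make_classifier):
    """Train one binary classifier per label (one-versus-rest).

    X: (n, d) feature matrix; Y: (n, q) binary label matrix in {0, 1}.
    make_classifier: factory returning an object with fit/predict.
    """
    q = Y.shape[1]
    classifiers = []
    for k in range(q):
        clf = make_classifier()
        clf.fit(X, Y[:, k])          # label k versus the rest
        classifiers.append(clf)
    return classifiers

def predict_ovr(classifiers, X):
    # Assemble per-label binary predictions into a multi-label output.
    return np.column_stack([clf.predict(X) for clf in classifiers])
```

Note how each multi-label instance is reused in every one of the q binary subproblems, which is exactly how OVR reflects label correlation implicitly.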

The algorithm extension strategy generalizes a specific multi-class classification algorithm so that it considers all training instances and all classes of a multi-label training data set at once. This strategy can induce complicated optimization problems, e.g., the large-scale quadratic programming problem in the multi-label support vector machine (Rank-SVM) [10] and the unconstrained optimization problem in multi-label neural networks (BP-MLL) [40]. However, these two methods explicitly characterize the label correlation of individual instances using an approximate expression of the ranking loss, and further reflect the correlation of different labels using a threshold function obtained from linear regression.
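The ranking loss that Rank-SVM and BP-MLL approximate can be stated directly: averaged over instances, the fraction of (relevant, irrelevant) label pairs that are ordered incorrectly by the real-valued label scores. A minimal sketch of the exact (non-convex) quantity, under the convention that ties count as errors:

```python
import numpy as np

def ranking_loss(scores, Y):
    """Exact ranking loss: averaged over instances, the fraction of
    (relevant, irrelevant) label pairs ordered incorrectly.

    scores: (n, q) real-valued label outputs; Y: (n, q) binary in {0, 1}.
    """
    n, total = len(Y), 0.0
    for f, y in zip(scores, Y):
        rel = np.flatnonzero(y == 1)   # relevant labels L
        irr = np.flatnonzero(y == 0)   # irrelevant labels L-bar
        if len(rel) == 0 or len(irr) == 0:
            continue  # undefined when L or its complement is empty
        # count mis-ordered pairs; ties count as errors (a convention)
        bad = sum(f[r] <= f[i] for r in rel for i in irr)
        total += bad / (len(rel) * len(irr))
    return total / n
```

Because this pairwise count is non-convex and non-smooth, the methods above replace it with a convex surrogate (e.g., a hinge-style upper bound) during training.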

The hybrid strategy aims to integrate the merits of the above two strategies: an existing single-label method is modified or extended while the multi-label data set is divided into a series of subsets implicitly or explicitly. This strategy has been used to design and implement several efficient and effective multi-label classifiers, e.g., two kNN-based multi-label approaches (ML-kNN and IBLR-ML) [4], [41], which extend kNN by introducing a posterior probability estimate for each label independently after the OVR decomposition trick is applied implicitly. Furthermore, IBLR-ML captures the correlation of different labels by linking the posterior probability of each label with distance-weighted sums of the labels of the k nearest neighbors across all classes. However, finding a proper way to characterize the label correlation of individual instances, partial instances, and even different labels when extending a specific method remains a challenging issue for such a hybrid strategy.

The binary support vector machine [36] has been one of the most powerful machine learning algorithms of the past 15 years. For multi-label classification, the one-versus-rest support vector machine has been successfully used in many real-world applications [2], [16]; it reflects the label correlation of individual instances indirectly, by reusing multi-label instances. In this paper, our focus is on incorporating the label correlation of individual instances into the one-versus-rest multi-label support vector machine explicitly. We define a new empirical loss term by approximating the ranking loss from above, and then generalize the traditional binary support vector machine into a novel support vector machine for multi-label classification. In our quadratic programming problem, the upper bounds of the variables are associated with the numbers of relevant and irrelevant labels of the training instances, which characterizes the label correlation of each individual instance directly. In particular, our optimization problem can be solved by combining the OVR decomposition trick with a modified binary support vector machine, which greatly reduces the computational complexity. Experimental results demonstrate that our method is a competitive candidate for multi-label classification, compared with four existing techniques.
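Purely as an illustration of how per-instance upper bounds might depend on the relevant/irrelevant label counts, the sketch below scales each instance's box constraint by its number of (relevant, irrelevant) label pairs; this particular normalization is a hypothetical choice for illustration, and the exact form used in the paper's quadratic program may differ.

```python
import numpy as np

def per_instance_upper_bounds(Y, C=1.0):
    """Illustrative only: derive a per-instance dual upper bound from the
    relevant/irrelevant label counts, so instances with more label
    co-occurrence receive different box constraints.

    Y: (n, q) binary label matrix in {0, 1}; C: global regularization constant.
    """
    Y = np.asarray(Y)
    n_rel = (Y == 1).sum(axis=1)   # |L_i|
    n_irr = (Y == 0).sum(axis=1)   # |L-bar_i|
    # hypothetical choice: C divided by the number of (relevant, irrelevant)
    # label pairs; guard against degenerate instances with no such pairs
    return C / np.maximum(n_rel * n_irr, 1)
```

The point of such a construction is that the bound varies across training instances according to their label sets, which is how the quadratic program can encode per-instance label correlation.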

The rest of this paper is organized as follows. The multi-label classification setting is introduced in Section 2, and previous work is reviewed in Section 3. Our novel method is then described in Section 4. Section 5 is devoted to experiments on ten benchmark data sets. The paper ends with conclusions in Section 6.

Section snippets

Multi-label classification setting

Let X ⊆ R^d be a d-dimensional input space and Q = {1, 2, …, q} a finite set of class labels, where q is the number of class labels. Further, assume that each instance x ∈ X can be associated with a subset of labels L ∈ 2^Q, which is referred to as the relevant set of labels for x. The complement of L, i.e., L̄ = Q∖L, is called the irrelevant set of labels. Given a training data set of size l drawn independently and identically from an unknown probability distribution on X × 2^Q, i.e., {(x_1, L_1), …

Previous work

In the past several years, as multi-label classification has received much attention in machine learning, pattern recognition, and statistics, a variety of methods have been proposed. In this paper, according to the three strategies mentioned in the introduction, we categorize existing discriminative multi-label methods into three groups: data decomposition, algorithm extension, and hybrid methods. Note that in Refs. [3], [30], [32], [33], our first group is referred to as problem

Extended one-versus-rest multi-label support vector machine

In this section, we briefly review the traditional one-versus-rest multi-label support vector machine (OVR-SVM) [2], [16], [30], and then propose its extended version (OVR-ESVM for short). For convenience, for a training instance x_i, we define a binary label vector y_i = [y_i1, y_i2, …, y_iq]^T, where y_ik = +1 if k ∈ L_i, and y_ik = −1 otherwise.
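The binary label vector defined above is a direct encoding of the relevant set; a minimal sketch (labels indexed 1 through q, as in the setting of Section 2):

```python
import numpy as np

def label_vector(L, q):
    """Binary label vector y_i for relevant set L over q labels:
    y_ik = +1 if label k is in L_i, otherwise -1 (labels indexed 1..q)."""
    y = -np.ones(q, dtype=int)
    for k in L:
        y[k - 1] = +1
    return y
```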

Experiments

In this section, we compare our OVR-ESVM experimentally with four existing multi-label classification approaches. Before presenting the experimental results, we briefly describe the four existing methods, the evaluation measures for multi-label classification, the ten benchmark data sets, and the parameter settings for the five multi-label methods.
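As one example of the kind of evaluation measure used in such comparisons (the full set of measures used in the paper is described in the full text), the hamming loss counts the fraction of instance-label pairs that are misclassified:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Hamming loss: fraction of instance-label pairs that disagree.

    Y_true, Y_pred: (n, q) binary label matrices in {0, 1}.
    """
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return float((Y_true != Y_pred).mean())
```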

Conclusions

For multi-label classification, almost all researchers aim at both low computational cost and good performance. In practice, however, these two goals usually conflict; they are the primary concerns of the data decomposition and algorithm extension strategies, respectively. The hybrid strategy considers the trade-off between the two goals, resulting in some effective and efficient multi-label techniques. In this paper, we have applied the hybrid strategy to design and implement a novel support

Acknowledgments

This work was supported by the Natural Science Foundation of China under Grant 60875001 and partially by the Jiangsu Province Scholarship for Overseas Studying (Sep. 2008–Sep. 2009).

Jianhua Xu received his Ph.D. in Pattern Recognition and Intelligent Systems in 2002 (Department of Automation, Tsinghua University, Beijing, China), M.S. in Geophysics in 1987 (Department of Earth and Space Sciences, University of Science and Technology of China, Hefei, China), and B.E. in Seismology in 1985 (Department of Applied Geophysics, Chengdu College of Geology, Chengdu, China). Since 2005, he has been a professor in Computer Science, School of Computer Science and Technology, Nanjing Normal University, Nanjing, China. Between Sep. 2008 and Sep. 2009, he was a visiting scholar at the Department of Statistics, Harvard University, Cambridge, MA, USA. His research interests are focused on pattern recognition, machine learning, and their applications to bioinformatics.

References (44)

  • M.R. Boutell et al.

    Learning multi-label scene classification

    Pattern Recognition

    (2004)
  • E. Hüllermeier et al.

    Label ranking by learning pairwise preferences

    Artificial Intelligence

    (2008)
  • M.L. Zhang et al.

    ML-kNN: a lazy learning approach to multi-label learning

    Pattern Recognition

    (2007)
  • M.L. Zhang et al.

    Feature selection for multi-label naïve Bayes classification

    Information Sciences

    (2009)
  • Z. Barutcuoglu et al.

    Hierarchical multi-label prediction of gene function

    Bioinformatics

    (2006)
  • A.C.P.L.F. de Carvalho et al.

    A tutorial on multi-label classification techniques

  • W. Cheng et al.

    Combining instance-based learning and logistic regression for multi-label classification

    Machine Learning

    (2009)
  • A. Clare, R.D. King, Knowledge discovery in multi-label phenotype data, in: Proceedings of the 5th European Conference...
  • F.D. Comite, R. Gilleron, M. Tommasi, Learning multi-label alternative decision tree from texts and data, in:...
  • K. Dembczyński, W. Cheng, E. Hüllermeier, Bayes optimal multilabel classification via probabilistic classifier chains,...
  • J. Demšar

    Statistical comparisons of classifiers over multiple data sets

    Journal of Machine Learning Research

    (2006)
  • R.O. Duda et al.

    Pattern Classification

    (2001)
  • A. Elisseeff, J. Weston, A kernel method for multi-labelled classification, in: Proceedings of the 14th Conference on...
  • R.E. Fan et al.

    Working set selection using second order information for training support vector machines

    Journal of Machine Learning Research

    (2005)
  • J. Fürnkranz et al.

    Multi-label classification via calibrated label ranking

    Machine Learning

    (2008)
  • R. Grodzicki, J. Mandziuk, L. Wang, Improved multi-label classification with neural networks, in: Proceedings of the...
  • A. Jiang, C. Wang, Y. Zhu, Calibrated rank-svm for multi-label image categorization, in: Proceedings of 2008 IEEE...
  • T. Joachims, Text categorization with support vector machines: learning with many relevant features, in: Proceedings of...
  • B. Lauser, A. Hotho, Automatic multi-label subject indexing in a multilingual environment, in: Proceeding of the 7th...
  • J.Y. Li, J.H. Xu, A fast multi-label classification algorithm based on double label support vector machine, in:...
  • C.J. Lin, LibSVM software and its implementation details, and multi-label data sets,...
  • E.L. Mencía, J. Fürnkranz, Pairwise learning of multilabel classifications with perceptrons, in: Proceedings of 2008...

