An extended one-versus-rest support vector machine for multi-label classification
Research highlights
► We propose a novel one-versus-rest multi-label support vector machine.
► Label correlation is characterized explicitly via upper bounds of variables.
► Our method works well on ten data sets according to five indicative measures.
► Our method is a powerful candidate for multi-label classification.
Introduction
Multi-label classification is a particular learning task in which a single instance can belong to several classes at the same time, so the classes are not mutually exclusive. It has recently attracted increasing attention because of many real-world applications, e.g., text categorization [16], [26], [27], [40], scene classification [2], [15], [41], bioinformatics [1], [21], and music and speech categorization [17], [29]. Nowadays, there are three strategies for designing and implementing discriminative multi-label classification methods: data decomposition, algorithm extension, and hybrid strategies. Further, label correlation, i.e., label co-occurrence information, has been exploited at three levels: an individual instance, partial instances, and different labels.
The data decomposition strategy divides a multi-label data set into one or more single-label (single, binary, or multi-class) subsets, constructs a sub-classifier for each subset using an existing classification technique, and then assembles all sub-classifiers into an entire multi-label classifier. There are four widely used decomposition tricks: one-versus-rest (OVR), one-versus-one (OVO), one-by-one (OBO), and label powerset (LP) [3], [30], [32], [39]. A data decomposition multi-label method is convenient and fast to implement, since many existing classification techniques and their software can be utilized. This strategy implicitly reflects the label correlation of an individual instance by exploiting multi-label instances repeatedly in OVR, OVO, and OBO methods, and directly reflects the label correlation of partial instances by considering possible label combinations in LP methods.
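The two extreme decomposition tricks can be sketched in a few lines of Python. This is a toy illustration of the general idea only, not code from the paper; the instances and label sets are invented:

```python
# Toy sketch of two decomposition tricks for a multi-label data set.
# OVR: one binary subset per class; LP: one multi-class problem whose
# classes are the observed label combinations.

def ovr_subsets(instances, label_sets, q):
    """One binary subset per class k: +1 if k is relevant, -1 otherwise."""
    subsets = {}
    for k in range(1, q + 1):
        subsets[k] = [(x, 1 if k in labels else -1)
                      for x, labels in zip(instances, label_sets)]
    return subsets

def lp_subset(instances, label_sets):
    """A single multi-class problem over observed label combinations."""
    return [(x, frozenset(labels)) for x, labels in zip(instances, label_sets)]

X = ["x1", "x2", "x3"]
Y = [{1, 2}, {2}, {1, 3}]              # relevant label sets, q = 3
binary_problems = ovr_subsets(X, Y, q=3)
powerset_problem = lp_subset(X, Y)
print(binary_problems[1])              # x1 and x3 positive for label 1, x2 negative
print(powerset_problem)
```

Note how OVR reuses each multi-label instance once per class, which is the implicit reflection of individual-instance label correlation mentioned above.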
The algorithm extension strategy generalizes a specific multi-class classification algorithm so as to consider all training instances and all classes of a multi-label training data set at once. This strategy can induce complicated optimization problems, e.g., the large-scale quadratic programming in the multi-label support vector machine (Rank-SVM) [10] and the unconstrained optimization in multi-label neural networks (BP-MLL) [40]. Nevertheless, these two methods explicitly characterize the label correlation of an individual instance using an approximate expression of ranking loss, and further reflect the correlation of different labels using a threshold function obtained from linear regression.
The hybrid strategy aims to integrate the merits of the above two strategies. It modifies or extends an existing single-label method while dividing a multi-label data set into a series of subsets implicitly or explicitly. This strategy has been used to design and implement several efficient and effective multi-label classifiers, e.g., two kNN-based multi-label approaches (ML-kNN and IBLR-ML) [4], [41], which extend kNN by introducing a posterior probability estimate for each label independently after the OVR decomposition trick is applied implicitly. Furthermore, IBLR-ML captures the correlation of different labels by linking the posterior probability of each label with distance-weighted sums of the k neighbor instances' labels from all classes. However, finding a proper way to characterize the label correlation of an individual instance, of partial instances, and even of different labels when extending a specific method remains a challenging issue for this hybrid strategy.
The binary support vector machine [36] has been one of the most powerful machine learning algorithms of the past 15 years. For multi-label classification, the one-versus-rest support vector machine has been successfully used in many real-world applications [2], [16]; it indirectly reflects the label correlation of an individual instance through reusing multi-label instances. In this paper, our focus is on incorporating the label correlation of an individual instance into the one-versus-rest multi-label support vector machine explicitly. We define a new empirical loss term by approximating ranking loss from above, and then generalize the traditional binary support vector machine into a novel support vector machine for multi-label classification. In our quadratic programming problem, the upper bounds of the variables are associated with the number of relevant or irrelevant labels of the training instances, which characterizes the label correlation of an individual instance directly. In particular, our optimization problem can be solved by combining the OVR decomposition trick with a modified binary support vector machine, which reduces the computational complexity greatly. Experimental results demonstrate that our method is a competitive candidate for multi-label classification, compared with four existing techniques.
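As a rough illustration of the idea (not the paper's actual quadratic program), the sketch below trains an OVR set of linear SVMs by subgradient descent on a weighted hinge loss, where each instance's penalty grows with its number of relevant labels. The weighting rule, the optimizer, and all data are hypothetical stand-ins for the paper's label-correlation-aware upper bounds on the dual variables:

```python
# Hypothetical sketch: OVR linear SVMs with per-instance penalties C_i that
# grow with the instance's number of relevant labels -- a rough analogue of
# tying the bounds of variables to label counts, as the paper proposes for
# its dual problem. Trained by hinge-loss subgradient descent, not QP.
import numpy as np

def train_binary_svm(X, y, C, lr=0.05, epochs=500, lam=0.01):
    """Subgradient descent on lam/2*||w||^2 + mean_i C_i*hinge(y_i, f(x_i))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                     # margin violators
        gw = lam * w - (C[mask, None] * y[mask, None] * X[mask]).sum(axis=0) / n
        gb = -(C[mask] * y[mask]).sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0]).astype(int)  # toy labels, q = 2
C = 1.0 + Y.sum(axis=1)      # hypothetical: penalty tied to #relevant labels

models = [train_binary_svm(X, np.where(Y[:, k] == 1, 1.0, -1.0), C)
          for k in range(Y.shape[1])]
pred = np.column_stack([(X @ w + b > 0).astype(int) for w, b in models])
print((pred == Y).mean())    # training accuracy on this separable toy set
```

The point of the sketch is only the structure: one binary problem per label as in OVR, with instance-dependent penalties standing in for the instance-dependent variable bounds of the proposed OVR-ESVM.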
The rest of this paper is organized as follows. Multi-label classification setting is introduced in Section 2 and previous work is reviewed in Section 3. Then our novel method is described in Section 4. Section 5 is devoted to experiments with ten benchmark data sets. This paper ends with some conclusions in Section 6.
Multi-label classification setting
Let X ⊆ R^d be a d-dimensional input space and Q = {1, 2, …, q} a finite set of class labels, where q is the number of class labels. Further, assume that each instance x ∈ X can be associated with a subset of labels L ∈ 2^Q, which is referred to as the relevant set of labels for x. At the same time, the complement of L, i.e., L̄ = Q∖L, is called the irrelevant set of labels. Given a training data set of size l drawn identically and independently from an unknown probability distribution on X × 2^Q, i.e., {(x_1, L_1), (x_2, L_2), …, (x_l, L_l)},
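In this notation, a relevant label set and its complement look as follows for a toy instance with q = 4 (values invented for illustration):

```python
# Toy illustration of the multi-label setting: q = 4 labels, the relevant
# set L of one instance, and its complement (the irrelevant set).
q = 4
Q = set(range(1, q + 1))   # {1, 2, 3, 4}
L = {1, 3}                  # relevant labels of some instance x
L_bar = Q - L               # irrelevant labels
print(L_bar)                # {2, 4}
```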
Previous work
In the past several years, multi-label classification has received a lot of attention in machine learning, pattern recognition, and statistics, and a variety of methods have been proposed. In this paper, according to the three strategies mentioned in the introduction, we categorize existing discriminative multi-label methods into three groups: data decomposition, algorithm extension, and hybrid methods. Note that in Refs. [3], [30], [32], [33], our first group is referred to as problem transformation
Extended one-versus-rest multi-label support vector machine
In this section, we briefly review the traditional one-versus-rest multi-label support vector machine (OVR-SVM) [2], [16], [30], and then propose its extended version (OVR-ESVM for short). For convenience, for a training instance xi, we define a binary label vector yi=[yi1,yi2,…,yiq]T, where yik=+1 if k∈Li, and yik=−1 otherwise.
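The label-vector convention above can be illustrated directly (toy values, not from the paper):

```python
# Binary label vector y_i for an instance with relevant set L_i and q labels:
# y_ik = +1 if label k is relevant, -1 otherwise.
def label_vector(L_i, q):
    return [1 if k in L_i else -1 for k in range(1, q + 1)]

print(label_vector({1, 3}, q=4))   # [1, -1, 1, -1]
```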
Experiments
In this section, we compare our OVR-ESVM with four existing multi-label classification approaches experimentally. Before presenting our experimental results, we briefly describe four existing methods, four evaluation measures for multi-label classification, ten benchmark data sets and parameter settings for five multi-label methods.
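For reference, two measures that commonly appear in multi-label evaluation can be computed as follows. The paper's own set of measures is defined in its Section 5; these are the standard textbook definitions, and the toy matrices are invented:

```python
# Standard definitions of two common multi-label evaluation measures
# (not necessarily the exact set used in the paper's experiments).
def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are misclassified."""
    errors = sum(t != p for yt, yp in zip(Y_true, Y_pred)
                 for t, p in zip(yt, yp))
    return errors / (len(Y_true) * len(Y_true[0]))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs ranked wrongly."""
    total = 0.0
    for yt, s in zip(Y_true, scores):
        rel = [k for k, v in enumerate(yt) if v == 1]
        irr = [k for k, v in enumerate(yt) if v == 0]
        if rel and irr:
            bad = sum(s[r] <= s[j] for r in rel for j in irr)
            total += bad / (len(rel) * len(irr))
    return total / len(Y_true)

Y = [[1, 0, 1], [0, 1, 0]]                 # true label matrix, 2 instances
P = [[1, 0, 0], [0, 1, 0]]                 # predicted labels
S = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3]]     # real-valued label scores
print(hamming_loss(Y, P))                   # one wrong pair out of six
print(ranking_loss(Y, S))                   # every relevant label outranks every irrelevant one here
```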
Conclusions
For multi-label classification, almost all researchers aim at both low computational cost and good performance. In fact, however, these two targets usually conflict with each other; the data decomposition and algorithm extension strategies mainly pay attention to the first and the second, respectively. The hybrid strategy considers the trade-off between the two targets, resulting in some effective and efficient multi-label techniques. In this paper, we have applied the hybrid strategy to design and implement a novel support vector machine
Acknowledgments
This work is supported by the Natural Science Foundation of China grant 60875001 and partially by the Jiangsu Province Scholarship for Overseas Studying (Sep. 2008–Sep. 2009).
References (44)
- et al., Learning multi-label scene classification, Pattern Recognition (2004)
- et al., Label ranking by learning pairwise preferences, Artificial Intelligence (2008)
- et al., ML-kNN: a lazy learning approach to multi-label learning, Pattern Recognition (2007)
- et al., Feature selection for multi-label naïve Bayes classification, Information Science (2009)
- et al., Hierarchical multi-label prediction of gene function, Bioinformatics (2006)
- et al., A tutorial on multi-label classification techniques
- et al., Combining instance-based learning and logistic regression for multi-label classification, Machine Learning (2009)
- A. Clare, R.D. King, Knowledge discovery in multi-label phenotype data, in: Proceedings of the 5th European Conference...
- F.D. Comite, R. Gilleron, M. Tommasi, Learning multi-label alternative decision tree from texts and data, in:...
- K. Dembczynski, W. Cheng, E. Hullermeier, Bayes optimal multilabel classification via probabilistic classifier chains,...
- Statistical comparison of classifiers over multiple data sets, Journal of Machine Learning Research
- Pattern Classification
- Working set selection using second order information for training support vector machines, Journal of Machine Learning Research
- Multi-label classification via calibrated label ranking, Machine Learning
Jianhua Xu received his Ph.D. in Pattern Recognition and Intelligent Systems in 2002 (Department of Automation, Tsinghua University, Beijing, China), M.S. in Geophysics in 1987 (Department of Earth and Space Sciences, University of Science and Technology of China, Hefei, China), and B.E. in Seismology in 1985 (Department of Applied Geophysics, Chengdu College of Geology, Chengdu, China). Since 2005, he is a professor in Computer Science, School of Computer Science and Technology, Nanjing Normal University, Nanjing, China. Between Sep. 2008 and Sep. 2009, he was a visiting scholar at Department of Statistics, Harvard University, Cambridge MA, USA. His research interests are focused on pattern recognition, machine learning, and their applications to bioinformatics.