Pattern Recognition

Volume 48, Issue 10, October 2015, Pages 3249-3257

Multi-task proximal support vector machine

https://doi.org/10.1016/j.patcog.2015.01.014

Highlights

  • Propose a highly efficient multi-task proximal support vector machine (MTPSVM).

  • Develop a method to optimize the learning procedure of MTPSVM.

  • Propose an unbalanced MTPSVM to deal with the unbalanced sample problem.

  • Propose proximal support vector regression (SVR) and multi-task proximal SVR.

  • Extensive experiments demonstrate the effectiveness and efficiency of our MTPSVM.

Abstract

With the explosive growth in the use of imagery, visual recognition plays an important role in many applications and attracts increasing research attention. Given several related tasks, single-task learning learns each task separately and ignores the relationships among them. In contrast, multi-task learning exploits the relationships among these tasks to learn all of them jointly, and thereby explores more information. In this paper, we propose a novel multi-task learning model based on the proximal support vector machine. The proximal support vector machine uses the large-margin idea as the standard support vector machine does, but with looser constraints and much lower computational cost. Our multi-task proximal support vector machine inherits the merits of the proximal support vector machine and achieves better performance than other popular multi-task learning models. Experiments are conducted on several multi-task learning datasets, including two classification datasets and one regression dataset. All results demonstrate the effectiveness and efficiency of our proposed multi-task proximal support vector machine.

Introduction

Given the explosive growth of the use of imagery in the era of big data, visual recognition has become an important problem. Various image classification and recognition methods have been proposed and have achieved much success [1], [2], [3], [4], [5], [6], [7], [8], [9]. Feature learning methods have also been proposed to improve the performance of image classification and recognition [10], [11], [12], [13]. A visual recognition task can often be viewed as a combination of multiple correlated subtasks [14]. Consider multi-label image classification, for example: one particular image may contain multiple objects corresponding to different labels, and there are obviously correlations among these labels. Traditional single-task learning methods, such as SVMs and Bayesian models, learn to classify these labels separately and ignore the correlations among them. It would be desirable to explore the information shared across different subtasks and apply it to learn all the subtasks jointly. Inspired by this idea, various methods have been proposed to learn multiple tasks jointly rather than separately. This is often called multi-task learning (MTL) [15], learning to learn [16] or inductive bias learning [17]. All these methods learn multiple tasks together and improve on the performance of single-task learning models.

The most important and difficult problem in multi-task learning is to discover the information shared among tasks while maintaining the independence of each task. Consider the classification of vehicles (see Fig. 1): we have various types of vehicles, such as sports cars, family cars and buses, corresponding to different classification tasks. These vehicles have shared features as well as unique characteristics. For example, all of them have four wheels and two headlights; however, sports cars usually have a low, racing-style body, family cars are often medium-sized, and buses have a larger body. Single-task learning uses only the information of each independent task, whereas multi-task learning uses all the information across tasks. If a multi-task learning method can find the shared features of these vehicles and distinguish the differences among them, each learning task gains additional information from the other tasks. Otherwise, the information from other tasks merely adds noise to the current learning task.

Existing multi-task learning methods mainly have two ways to discover relationships among different tasks. One is to assume that different tasks share common parameters [18], [14], [19], [20], [21], [22], [23], such as a Bayesian model sharing a common prior [14] or a large-margin model sharing a mean hyperplane [19]. The other is to find a latent feature representation shared among the tasks [24], [25], [26], for example, a sparse representation shared across tasks [25]. Existing multi-task learning methods mainly have two defects. First, some models rest on a complicated theoretical foundation, which makes them difficult to implement. For example, a nonparametric Bayesian model usually has many assumptions and many parameters to select. Second, their efficiency is low, especially when the dataset has a large number of data points and high-dimensional features. Our goal is to find an easily implemented multi-task learning method with high efficiency and comparable performance. In this paper, we propose a multi-task learning method based on the proximal support vector machine (PSVM) [27] and apply it to two classification datasets and one regression dataset. PSVM was proposed by Fung and Mangasarian and differs from the standard SVM [28]. PSVM also utilizes the large-margin idea, assigning data points to the closest of two disjoint hyperplanes, which are pushed as far apart as possible. However, PSVM has looser constraints than the standard SVM, with comparable performance and much lower computational cost. Inspired by the idea of PSVM and the advantages of multi-task learning, we derive a multi-task proximal support vector machine (MTPSVM). Learning MTPSVM requires all data examples of all tasks simultaneously, which slows computation considerably on large-scale datasets. In this paper, we develop a method to optimize the procedure of learning MTPSVM that greatly improves efficiency. Following the idea of PSVM for unbalanced data, we also extend MTPSVM to handle unbalanced data. Finally, we propose proximal support vector regression for regression problems, which is not discussed in PSVM [27], and extend it to multi-task problems.
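To make the PSVM idea concrete: the linear PSVM reduces to a single (n+1)-dimensional linear system. The NumPy sketch below is our illustration of that closed form, not the authors' code; psvm_train and its parameter nu (the trade-off ν) are hypothetical names.

    import numpy as np

    def psvm_train(A, d, nu=1.0):
        # A: (m, n) data matrix, one example per row; d: (m,) labels in {-1, +1}.
        # PSVM solves (I/nu + E^T E) [w; gamma] = E^T D e with E = [A, -e] and
        # D = diag(d); note E^T D e = E^T d, since D e = d.
        m, n = A.shape
        E = np.hstack([A, -np.ones((m, 1))])
        H = np.eye(n + 1) / nu + E.T @ E          # (n+1) x (n+1) system matrix
        z = np.linalg.solve(H, E.T @ d)
        return z[:-1], z[-1]                      # separating plane: x.w - gamma = 0

    def psvm_predict(X, w, gamma):
        return np.sign(X @ w - gamma)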

MTPSVM has two primary merits compared with other multi-task learning methods. First, MTPSVM is easily implemented by solving just a quadratic optimization problem with equality constraints. Second, MTPSVM has much lower computational cost and can be applied to large-scale datasets. We will demonstrate that the computational time of MTPSVM depends primarily on the feature dimension of the data rather than on the number of data points.
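As a rough, hypothetical illustration of this point (synthetic data, reusing the psvm_train sketch above): with m = 100,000 examples in n = 50 dimensions, only the one-pass products E^T E and E^T d scale with m, while the solve itself touches only a 51×51 matrix.

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100_000, 50))            # m = 100,000, n = 50
    d = np.sign(A @ rng.standard_normal(50) + 0.1)    # synthetic +/-1 labels
    w, gamma = psvm_train(A, d, nu=1.0)               # solves a 51 x 51 system
    print((psvm_predict(A, w, gamma) == d).mean())    # training accuracy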

We organize the remainder of this paper as follows. Section 2 reviews previous work on multi-task learning. In Section 3, we first briefly introduce the proximal support vector machine and then give a specific derivation of the proposed multi-task proximal support vector machine. The derivation of multi-task proximal support vector regression is presented in Section 4. In Section 5, experiments on several datasets are presented. Section 6 presents our conclusions.

Section snippets

Related work

Multi-task learning has been proven more effective than single-task learning by many works, through both theoretical analysis and extensive experiments. For example, Baxter proposed a novel model of inductive bias learning to learn multiple tasks together and derived explicit bounds demonstrating that multi-task learning gives better generalization than single-task learning [17]. Another work, by Ben-David and Schuller, developed a useful notion of task relatedness and better generalization…

Multi-task proximal support vector machine

In this section, we first give an overview of the proximal support vector machine and then present the detailed theoretical derivation of our proposed MTPSVM. Additionally, computational optimization details are given in Section 3.4.
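Before the formal derivation, a minimal sketch may help fix ideas. Assuming the common mean-hyperplane coupling of regularized multi-task learning [19], where each task hyperplane is w_t = w_0 + v_t with a shared part w_0 and a task-specific correction v_t, a proximal-style objective again yields a single block linear system. The code below is our reconstruction under that assumption, not necessarily the exact MTPSVM formulation of this section; mtpsvm_train, nu and mu are hypothetical names for the assumed trade-off parameters.

    import numpy as np

    def mtpsvm_train(tasks, nu=1.0, mu=1.0):
        # tasks: list of (A_t, d_t), A_t of shape (m_t, n), d_t in {-1, +1}.
        # Minimizes (nu/2) sum_t ||e - D_t E_t (z0 + v_t)||^2
        #           + (1/2)||z0||^2 + (mu/2) sum_t ||v_t||^2
        # over z0 = [w0; gamma0] and the task corrections v_t.
        n = tasks[0][0].shape[1]
        p, T = n + 1, len(tasks)
        G, r = [], []
        for A, d in tasks:
            E = np.hstack([A, -np.ones((A.shape[0], 1))])   # E_t = [A_t, -e]
            G.append(E.T @ E)
            r.append(E.T @ d)
        H = np.zeros((p * (T + 1), p * (T + 1)))    # unknowns: [z0, v_1..v_T]
        b = np.zeros(p * (T + 1))
        H[:p, :p] = np.eye(p) + nu * sum(G)
        b[:p] = nu * sum(r)
        for t in range(T):
            s = p * (t + 1)
            H[:p, s:s + p] = nu * G[t]
            H[s:s + p, :p] = nu * G[t]
            H[s:s + p, s:s + p] = mu * np.eye(p) + nu * G[t]
            b[s:s + p] = nu * r[t]
        z = np.linalg.solve(H, b)
        z0, v = z[:p], z[p:].reshape(T, p)
        return [z0 + v[t] for t in range(T)]        # per-task [w_t; gamma_t]

Prediction for task t is then sign(x·w_t − gamma_t); a large mu pulls every task toward the shared hyperplane, while a small mu lets the tasks behave almost independently.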

Multi-task proximal support vector regression

Having derived the multi-task proximal support vector machine, it is easy to convert it to multi-task proximal support vector regression. The problem of proximal support vector regression is not discussed in [27]. Therefore, we first present the primal problem of proximal support vector regression (PSVR) and then extend it to multi-task proximal support vector regression (MTPSVR). Suppose Ȳ is an m×1 vector (y_1, y_2, …, y_m), y_i ∈ ℝ, i = 1, 2, …
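By analogy with the classification case, one plausible closed form for PSVR (our assumption: the ±1 targets of PSVM are simply replaced by the real-valued targets Ȳ, giving a ridge-style least-squares problem in [w; gamma]) is sketched below; psvr_train is a hypothetical name.

    import numpy as np

    def psvr_train(A, Y, nu=1.0):
        # A: (m, n) inputs; Y: (m,) real-valued targets.
        # Regression analogue of the PSVM system:
        #   (I/nu + E^T E) [w; gamma] = E^T Y, with E = [A, -e].
        m, n = A.shape
        E = np.hstack([A, -np.ones((m, 1))])
        z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, E.T @ Y)
        return z[:-1], z[-1]                  # predict with X @ w - gamma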

Experiments

We show empirical results of our proposed multi-task models on three real-world datasets, including two classification datasets and one regression dataset. The regression dataset is the school dataset, which has been used to evaluate the performance of multi-task learning in many works [31], [25], [19], [30]. We test the performance of MTPSVR on this dataset. The two classification datasets are the landmine dataset [24], [14] and a multi-task image classification dataset using…

Conclusion and future work

In this paper, we propose a novel multi-task learning method based on PSVM. We give a detailed derivation of our MTPSVM and extend it to unbalanced data (B_MTPSVM). To address efficiency, the learning procedure of MTPSVM is optimized, which leads to high efficiency. Experiments are conducted on three datasets: the landmine dataset, the school dataset and one multi-task image classification dataset. We compare both the performance and the running time of MTPSVM, PSVM and three…

References (44)

  • Y.-H. Shao et al., An efficient weighted Lagrangian twin support vector machine for imbalanced data classification, Pattern Recognit. (2014)
  • A. Tayal et al., Primal explicit max margin feature selection for nonlinear support vector machines, Pattern Recognit. (2014)
  • G.M. Allenby et al., Marketing models of consumer heterogeneity, J. Economet. (1998)
  • W. Hu et al., Image classification using multiscale information fusion based on saliency driven nonlinear diffusion filtering, IEEE Trans. Image Process. (2014)
  • L. Zhang et al., Learning object-to-class kernels for scene classification, IEEE Trans. Image Process. (2014)
  • F. Zhu, Z. Jiang, L. Shao, Submodular object recognition, in: IEEE Conference on Computer Vision and Pattern…
  • Q. Qiu, V.M. Patel, P. Turaga, R. Chellappa, Domain adaptive dictionary learning, in: Computer Vision—ECCV, 2012, pp.…
  • P. Turaga et al., Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • X. Shen et al., Spatially-constrained similarity measure for large-scale object retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • A.J. Joshi, F. Porikli, N. Papanikolopoulos, Multi-class active learning for image classification, in: IEEE Conference…
  • Z. Jiang, G. Zhang, L.S. Davis, Submodular dictionary learning for sparse coding, in: IEEE Conference on Computer…
  • F. Zhu et al., Weakly-supervised cross-domain dictionary learning for visual recognition, Int. J. Comput. Vis. (2014)
  • L. Shao et al., Feature learning for image classification via multiobjective genetic programming, IEEE Trans. Neural Netw. Learn. Syst. (2014)
  • L. Shao, D. Wu, X. Li, Learning deep and wide: a spectral method for learning deep networks, IEEE Trans. Neural Netw.…
  • Y. Xue, X. Liao, L. Carin, B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, J.…
  • R. Caruana, Multitask learning, Mach. Learn. (1997)
  • S. Thrun, Learning to learn: introduction, in: Learning To…
  • J. Baxter, A model of inductive bias learning, J. Artif. Intell. Res. 12 (2000)…
  • P. Rai, H. Daume, Infinite predictor subspace models for multitask learning, in: International Conference on Artificial…
  • T. Evgeniou, M. Pontil, Regularized multi-task learning, in: Proceedings of the 10th ACM SIGKDD International…
  • Y. Zhang, D.-Y. Yeung, A convex formulation for learning task relationships in multi-task learning,…
  • S. Parameswaran et al., Large margin multi-task metric learning, Adv. Neural Inf. Process. Syst. (2010)

Ya Li received his B.S. degree in 2013 from the Department of Electronic Engineering and Information Science at the University of Science and Technology of China (USTC). He is now pursuing his Ph.D. degree at USTC, and his research interest is machine learning.

Xinmei Tian is an associate professor in the Department of Electronic Engineering and Information Science, University of Science and Technology of China. She received her Ph.D. degree from the University of Science and Technology of China in 2010. Her current research interests include multimedia information retrieval and machine learning. She received the Excellent Doctoral Dissertation of Chinese Academy of Sciences award in 2012 and the Nomination of National Excellent Doctoral Dissertation award in 2013.

Mingli Song is a professor of Computer Science with the College of Computer Science and Technology, Zhejiang University. He received his Ph.D. degree in Computer Science and Technology from the College of Computer Science, Zhejiang University, and his B.Eng. degree from Northwestern Polytechnical University. He was awarded a Microsoft Research Fellowship in 2004. His research interests include pattern classification, weakly supervised clustering, color and texture analysis, object recognition, and reconstruction. He has authored and co-authored more than 70 scientific articles at top venues including IEEE T-PAMI, IEEE T-IP, T-MM, T-SMCB, Information Sciences, Pattern Recognition, CVPR, ECCV and ACM MM. He has served on more than 10 major international conferences including ICDM, ACM Multimedia, ICIP, ICASSP, ICME, PCM, PSIVT and CAIP, and on more than 10 prestigious international journals including T-IP, T-VCG, T-KDE, T-MM, T-CSVT, T-NNLS and T-SMCB. He is a Senior Member of IEEE and a Professional Member of ACM.

Dacheng Tao is a professor of Computer Science with the Centre for Quantum Computation & Intelligent Systems, and the Faculty of Engineering and Information Technology, University of Technology, Sydney. He mainly applies statistics and mathematics to data analytics, and his research interests spread across computer vision, data science, image processing, machine learning, neural networks and video surveillance. His research results have been expounded in one monograph and 100+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, T-CYB, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM and ACM SIGKDD, with several best paper awards, such as the Best Theory/Algorithm Paper Runner Up Award at IEEE ICDM'07, the Best Student Paper Award at IEEE ICDM'13, and the 2014 ICDM 10-Year Highest-Impact Paper Award.

This work is supported by the NSFC under Contract Nos. 61201413 and 61390514, the Fundamental Research Funds for the Central Universities Nos. WK2100060011 and WK2100100021, and the Specialized Research Fund for the Doctoral Program of Higher Education No. WJ2100060003, and by Australian Research Council Projects DP-140102164, FT-130101457, and LP-140100569.

1 CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, China
