Pattern Recognition

Volume 48, Issue 10, October 2015, Pages 3249-3257

Multi-task proximal support vector machine

https://doi.org/10.1016/j.patcog.2015.01.014

Highlights

  • Propose a highly efficient multi-task proximal support vector machine (MTPSVM).

  • Develop a method to optimize the learning procedure of MTPSVM.

  • Propose an unbalanced MTPSVM to deal with the unbalanced sample problem.

  • Propose proximal support vector regression (SVR) and multi-task proximal SVR.

  • Extensive experiments demonstrate the effectiveness and efficiency of our MTPSVM.

Abstract

With the explosive growth in the use of imagery, visual recognition plays an important role in many applications and attracts increasing research attention. Given several related tasks, single-task learning learns each task separately and ignores the relationships among them. In contrast, multi-task learning exploits the relationships among these tasks to learn all of them jointly, and thereby explores more information. In this paper, we propose a novel multi-task learning model based on the proximal support vector machine. The proximal support vector machine uses the large-margin idea as the standard support vector machine does, but with looser constraints and much lower computational cost. Our multi-task proximal support vector machine inherits the merits of the proximal support vector machine and achieves better performance than other popular multi-task learning models. Experiments are conducted on several multi-task learning datasets, including two classification datasets and one regression dataset. All results demonstrate the effectiveness and efficiency of our proposed multi-task proximal support vector machine.

Introduction

Given the explosive growth of the use of imagery in the era of big data, visual recognition has become an important problem. Various image classification and recognition methods have been proposed and have achieved much success [1], [2], [3], [4], [5], [6], [7], [8], [9]. Feature learning methods have also been proposed to improve the performance of image classification and recognition [10], [11], [12], [13]. A visual recognition task can often be viewed as a combination of multiple correlated subtasks [14]. Consider multi-label image classification, for example: one particular image may contain multiple objects corresponding to different labels, and there are obviously correlations among these labels. Traditional single-task learning methods, such as SVMs and Bayesian models, learn to classify these labels separately and ignore the correlations among them. It would be desirable to explore the information shared across different subtasks and apply it to learn all the subtasks jointly. Inspired by this idea, various methods have been proposed to learn multiple tasks jointly rather than separately. This is often called multi-task learning (MTL) [15], learning to learn [16] or inductive bias learning [17]. All these methods learn multiple tasks together and improve on the performance of single-task learning models.

The most important and difficult problem in multi-task learning is to discover the information shared among tasks while maintaining the independence of each task. Consider the classification of vehicles (see Fig. 1): we have various types of vehicles, such as sports cars, family cars and buses, corresponding to different classification tasks. These vehicles have shared features as well as unique characteristics. For example, all of them have four wheels and two headlights; however, sports cars usually have a low, racing-style body, family cars are often medium-sized, and buses have a larger body. Single-task learning uses only the information of each independent task, whereas multi-task learning uses all the information across tasks. If a multi-task learning method can find the shared features of these vehicles and distinguish the differences among them, each learning task gains additional information from the other tasks. Otherwise, the information from other tasks merely adds noise to the current learning task.

Existing multi-task learning methods mainly have two ways to discover relationships among different tasks. One is to assume that different tasks share common parameters [18], [14], [19], [20], [21], [22], [23], such as a Bayesian model sharing a common prior [14] or a large-margin model sharing a mean hyperplane [19]. The other is to find a latent feature representation shared among the tasks [24], [25], [26], for example, a sparse representation shared across tasks [25]. Existing multi-task learning methods mainly have two defects. First, some models rest on a complicated theoretical foundation, which makes them difficult to implement. For example, a nonparametric Bayesian model usually has many assumptions and many parameters to select. Second, their efficiency is low, especially when the dataset has a large number of data points and high-dimensional features. Our goal is to find an easily implemented multi-task learning method with high efficiency and comparable performance. In this paper, we propose a multi-task learning method based on the proximal support vector machine (PSVM) [27] and apply it to two classification datasets and one regression dataset. PSVM was proposed by Fung and Mangasarian and differs from the standard SVM [28]. PSVM also utilizes the large-margin idea, assigning data points to the closest of two disjoint hyperplanes, which are pushed as far apart as possible. However, PSVM has looser constraints than the standard SVM, with comparable performance and much lower computational cost. Inspired by the idea of PSVM and the advantages of multi-task learning, we derive a multi-task proximal support vector machine (MTPSVM). Learning MTPSVM requires all data examples of all tasks simultaneously, which slows computation considerably on large-scale datasets. In this paper, we develop a method to optimize the procedure of learning MTPSVM that greatly improves efficiency. Following the idea of PSVM for unbalanced data, we also extend MTPSVM to handle unbalanced data. Finally, we propose proximal support vector regression for regression problems, which is not discussed in PSVM [27], and extend it to multi-task problems.
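To make the PSVM idea concrete: the linear PSVM reduces to a single (n+1)-dimensional linear system. The NumPy sketch below is our illustration of that closed form, not the authors' code; psvm_train and its parameter nu (the trade-off ν) are hypothetical names.

    import numpy as np

    def psvm_train(A, d, nu=1.0):
        # A: (m, n) data matrix, one example per row; d: (m,) labels in {-1, +1}.
        # PSVM solves (I/nu + E^T E) [w; gamma] = E^T D e with E = [A, -e] and
        # D = diag(d); note E^T D e = E^T d, since D e = d.
        m, n = A.shape
        E = np.hstack([A, -np.ones((m, 1))])
        H = np.eye(n + 1) / nu + E.T @ E          # (n+1) x (n+1) system matrix
        z = np.linalg.solve(H, E.T @ d)
        return z[:-1], z[-1]                      # separating plane: x.w - gamma = 0

    def psvm_predict(X, w, gamma):
        return np.sign(X @ w - gamma)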

MTPSVM has two primary merits compared with other multi-task learning methods. First, MTPSVM is easily implemented by solving just a quadratic optimization problem with equality constraints. Second, MTPSVM has much lower computational cost and can be applied to large-scale datasets. We will demonstrate that the computational time of MTPSVM depends primarily on the feature dimension of the data rather than on the number of data points.
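As a rough, hypothetical illustration of this point (synthetic data, reusing the psvm_train sketch above): with m = 100,000 examples in n = 50 dimensions, only the one-pass products E^T E and E^T d scale with m, while the solve itself touches only a 51×51 matrix.

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100_000, 50))            # m = 100,000, n = 50
    d = np.sign(A @ rng.standard_normal(50) + 0.1)    # synthetic +/-1 labels
    w, gamma = psvm_train(A, d, nu=1.0)               # solves a 51 x 51 system
    print((psvm_predict(A, w, gamma) == d).mean())    # training accuracy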

We organize the remainder of this paper as follows. Section 2 reviews previous work on multi-task learning. In Section 3, we first briefly introduce the proximal support vector machine and then give a specific derivation of the proposed multi-task proximal support vector machine. The derivation of multi-task proximal support vector regression is presented in Section 4. In Section 5, experiments on several datasets are presented. Section 6 presents our conclusions.

Section snippets

Related work

Multi-task learning has been proven more effective than single-task learning by many works, through both theoretical analysis and extensive experiments. For example, Baxter proposed a novel model of inductive bias learning to learn multiple tasks together and derived explicit bounds demonstrating that multi-task learning gives better generalization than single-task learning [17]. Another work, by Ben-David and Schuller, developed a useful notion of task relatedness and better generalization…

Multi-task proximal support vector machine

In this section, we first give an overview of the proximal support vector machine and then present the detailed theoretical derivation of our proposed MTPSVM. Additionally, computational optimization details are given in Section 3.4.
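Before the formal derivation, a minimal sketch may help fix ideas. Assuming the common mean-hyperplane coupling of regularized multi-task learning [19], where each task hyperplane is w_t = w_0 + v_t with a shared part w_0 and a task-specific correction v_t, a proximal-style objective again yields a single block linear system. The code below is our reconstruction under that assumption, not necessarily the exact MTPSVM formulation of this section; mtpsvm_train, nu and mu are hypothetical names for the assumed trade-off parameters.

    import numpy as np

    def mtpsvm_train(tasks, nu=1.0, mu=1.0):
        # tasks: list of (A_t, d_t), A_t of shape (m_t, n), d_t in {-1, +1}.
        # Minimizes (nu/2) sum_t ||e - D_t E_t (z0 + v_t)||^2
        #           + (1/2)||z0||^2 + (mu/2) sum_t ||v_t||^2
        # over z0 = [w0; gamma0] and the task corrections v_t.
        n = tasks[0][0].shape[1]
        p, T = n + 1, len(tasks)
        G, r = [], []
        for A, d in tasks:
            E = np.hstack([A, -np.ones((A.shape[0], 1))])   # E_t = [A_t, -e]
            G.append(E.T @ E)
            r.append(E.T @ d)
        H = np.zeros((p * (T + 1), p * (T + 1)))    # unknowns: [z0, v_1..v_T]
        b = np.zeros(p * (T + 1))
        H[:p, :p] = np.eye(p) + nu * sum(G)
        b[:p] = nu * sum(r)
        for t in range(T):
            s = p * (t + 1)
            H[:p, s:s + p] = nu * G[t]
            H[s:s + p, :p] = nu * G[t]
            H[s:s + p, s:s + p] = mu * np.eye(p) + nu * G[t]
            b[s:s + p] = nu * r[t]
        z = np.linalg.solve(H, b)
        z0, v = z[:p], z[p:].reshape(T, p)
        return [z0 + v[t] for t in range(T)]        # per-task [w_t; gamma_t]

Prediction for task t is then sign(x·w_t − gamma_t); a large mu pulls every task toward the shared hyperplane, while a small mu lets the tasks behave almost independently.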

Multi-task proximal support vector regression

Having derived the multi-task proximal support vector machine, it is easy to convert it to multi-task proximal support vector regression. The problem of proximal support vector regression is not discussed in [27]. Therefore, we first present the primal problem of proximal support vector regression (PSVR) and then extend it to multi-task proximal support vector regression (MTPSVR). Suppose Ȳ is an m×1 vector (y_1, y_2, …, y_m), y_i ∈ ℝ, i = 1, 2, …
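By analogy with the classification case, one plausible closed form for PSVR (our assumption: the ±1 targets of PSVM are simply replaced by the real-valued targets Ȳ, giving a ridge-style least-squares problem in [w; gamma]) is sketched below; psvr_train is a hypothetical name.

    import numpy as np

    def psvr_train(A, Y, nu=1.0):
        # A: (m, n) inputs; Y: (m,) real-valued targets.
        # Regression analogue of the PSVM system:
        #   (I/nu + E^T E) [w; gamma] = E^T Y, with E = [A, -e].
        m, n = A.shape
        E = np.hstack([A, -np.ones((m, 1))])
        z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, E.T @ Y)
        return z[:-1], z[-1]                  # predict with X @ w - gamma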

Experiments

We show empirical results of our proposed multi-task models on three real-world datasets, including two classification datasets and one regression dataset. The regression dataset is the school dataset, which has been used to evaluate the performance of multi-task learning in many works [31], [25], [19], [30]. We test the performance of MTPSVR on this dataset. The two classification datasets are the landmine dataset [24], [14] and a multi-task image classification dataset using…

Conclusion and future work

In this paper, we propose a novel multi-task learning method based on PSVM. We give a detailed derivation of our MTPSVM and extend it to unbalanced data (B_MTPSVM). To address efficiency, the learning procedure of MTPSVM is optimized, which leads to high efficiency. Experiments are conducted on three datasets: the landmine dataset, the school dataset and one multi-task image classification dataset. We compare both the performance and the running time of MTPSVM, PSVM and three…

References (44)

  • Y.-H. Shao et al., An efficient weighted Lagrangian twin support vector machine for imbalanced data classification, Pattern Recognit. (2014)
  • A. Tayal et al., Primal explicit max margin feature selection for nonlinear support vector machines, Pattern Recognit. (2014)
  • G.M. Allenby et al., Marketing models of consumer heterogeneity, J. Economet. (1998)
  • W. Hu et al., Image classification using multiscale information fusion based on saliency driven nonlinear diffusion filtering, IEEE Trans. Image Process. (2014)
  • L. Zhang et al., Learning object-to-class kernels for scene classification, IEEE Trans. Image Process. (2014)
  • F. Zhu, Z. Jiang, L. Shao, Submodular object recognition, in: IEEE Conference on Computer Vision and Pattern…
  • Q. Qiu, V.M. Patel, P. Turaga, R. Chellappa, Domain adaptive dictionary learning, in: Computer Vision—ECCV, 2012, pp.…
  • P. Turaga et al., Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • X. Shen et al., Spatially-constrained similarity measure for large-scale object retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • A.J. Joshi, F. Porikli, N. Papanikolopoulos, Multi-class active learning for image classification, in: IEEE Conference…
  • Z. Jiang, G. Zhang, L.S. Davis, Submodular dictionary learning for sparse coding, in: IEEE Conference on Computer…
  • F. Zhu et al., Weakly-supervised cross-domain dictionary learning for visual recognition, Int. J. Comput. Vis. (2014)
  • L. Shao et al., Feature learning for image classification via multiobjective genetic programming, IEEE Trans. Neural Netw. Learn. Syst. (2014)
  • L. Shao, D. Wu, X. Li, Learning deep and wide: a spectral method for learning deep networks, IEEE Trans. Neural Netw.…
  • Y. Xue, X. Liao, L. Carin, B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, J.…
  • R. Caruana, Multitask learning, Mach. Learn. (1997)
  • S. Thrun, Learning to learn: introduction, in: Learning To…
  • J. Baxter, A model of inductive bias learning, J. Artif. Intell. Res. 12 (2000)…
  • P. Rai, H. Daume, Infinite predictor subspace models for multitask learning, in: International Conference on Artificial…
  • T. Evgeniou, M. Pontil, Regularized multi-task learning, in: Proceedings of the 10th ACM SIGKDD International…
  • Y. Zhang, D.-Y. Yeung, A convex formulation for learning task relationships in multi-task learning,…
  • S. Parameswaran et al., Large margin multi-task metric learning, Adv. Neural Inf. Process. Syst. (2010)

Ya Li received his B.S. degree in 2013 from the Department of Electronic Engineering and Information Science at the University of Science and Technology of China (USTC). He is now pursuing his Ph.D. degree at USTC, and his research interest is machine learning.

Xinmei Tian is an associate professor in the Department of Electronic Engineering and Information Science, University of Science and Technology of China. She received her Ph.D. degree from the University of Science and Technology of China in 2010. Her current research interests include multimedia information retrieval and machine learning. She received the Excellent Doctoral Dissertation of Chinese Academy of Sciences award in 2012 and the Nomination of National Excellent Doctoral Dissertation award in 2013.

Mingli Song is a professor of Computer Science with the College of Computer Science and Technology, Zhejiang University. He received his Ph.D. degree in Computer Science and Technology from the College of Computer Science, Zhejiang University, and his B.Eng. degree from Northwestern Polytechnical University. He was awarded a Microsoft Research Fellowship in 2004. His research interests include pattern classification, weakly supervised clustering, color and texture analysis, object recognition, and reconstruction. He has authored and co-authored more than 70 scientific articles at top venues including IEEE T-PAMI, IEEE T-IP, T-MM, T-SMCB, Information Sciences, Pattern Recognition, CVPR, ECCV and ACM MM. He has served on more than 10 major international conferences including ICDM, ACM Multimedia, ICIP, ICASSP, ICME, PCM, PSIVT and CAIP, and on more than 10 prestigious international journals including T-IP, T-VCG, T-KDE, T-MM, T-CSVT, T-NNLS and T-SMCB. He is a Senior Member of IEEE and a Professional Member of ACM.

Dacheng Tao is a professor of Computer Science with the Centre for Quantum Computation & Intelligent Systems, and the Faculty of Engineering and Information Technology, University of Technology, Sydney. He mainly applies statistics and mathematics to data analytics, and his research interests spread across computer vision, data science, image processing, machine learning, neural networks and video surveillance. His research results have been expounded in one monograph and 100+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, T-CYB, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM and ACM SIGKDD, with several best paper awards, such as the Best Theory/Algorithm Paper Runner Up Award at IEEE ICDM'07, the Best Student Paper Award at IEEE ICDM'13, and the 2014 ICDM 10-Year Highest-Impact Paper Award.

This work is supported by the NSFC under Contract Nos. 61201413 and 61390514, the Fundamental Research Funds for the Central Universities Nos. WK2100060011 and WK2100100021, and the Specialized Research Fund for the Doctoral Program of Higher Education No. WJ2100060003, and by Australian Research Council Projects DP-140102164, FT-130101457, and LP-140100569.

1 CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, China
