Parallel multitask cross validation for Support Vector Machine using GPU

https://doi.org/10.1016/j.jpdc.2012.02.011

Abstract

The Support Vector Machine (SVM) is an efficient machine learning tool with high accuracy. However, to achieve the best accuracy, n-fold cross validation is commonly used to identify the best hyperparameters for the SVM. This becomes a weak point of the SVM because of the extremely long training time over the many hyperparameter combinations of different kernel functions. In this paper, a novel parallel SVM training implementation is proposed to accelerate the cross validation procedure by running multiple training tasks simultaneously on a Graphics Processing Unit (GPU). All of these tasks, with different hyperparameters, share the same cache memory, which stores the kernel matrix of the support vectors. This greatly reduces redundant computations of kernel values across the different training tasks. Since the computation of kernel values is the most time-consuming operation in SVM training, the total time cost of the cross validation procedure decreases significantly. The experimental tests indicate that the time cost of the multitask cross validation training is very close to that of the slowest task trained alone. Comparison tests show that the proposed method is 10 to 100 times faster than the state-of-the-art LIBSVM tool.

Highlights

► We build an SVM tool with an efficient cross validation implementation using the GPU.
► Cross validation shows a 10 to 100 times speedup compared to LIBSVM.
► The classification accuracy is as good as that of LIBSVM.
► The speed of both the training and predicting phases of the SVM is improved.

Introduction

Recent developments in Graphics Processing Units (GPUs) have shown superb computational performance on floating point operations compared to current multi-core CPUs. Certain high performance GPUs are now designed to solve general purpose computing problems instead of graphics rendering, which was previously considered the sole purpose of a GPU. Much research has shown promising performance gains from using GPUs on a wide range of topics. The speed of the Support Vector Machine (SVM) has been improved by using a GPU in both the training and testing phases, as shown in [1]. Although not all algorithms can be parallelized, some applications can benefit from the massive parallel processing capability of the GPU through very simple tweaks, such as using the CUDA Basic Linear Algebra Subroutines (CUBLAS) library [18]. Such algorithms are often called data parallel algorithms [13]. Task level parallelism is not the primary advantage of the GPU over the multi-core CPU; however, the possibilities of using both task and data level parallelism should be explored and analyzed in order to achieve the maximum performance of the GPU. This is our starting point for developing the parallel SVM algorithm and its cross validation procedure using the GPU.
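As one concrete illustration of such a data parallel tweak (a sketch of ours, not code from the paper), the dense dot products that dominate kernel evaluation can be offloaded to CUBLAS with a single SGEMM call; the buffer names and sizes below are assumptions made for the example.

```cpp
// Sketch: compute the Gram matrix G = X * X^T on the GPU with cuBLAS.
// X is an n x d matrix stored row-major; d_X and d_G are assumed to be
// device buffers already allocated and filled by the caller.
#include <cublas_v2.h>

void gram_matrix(cublasHandle_t handle, const float* d_X, float* d_G, int n, int d) {
    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS is column-major, so the row-major n x d buffer is seen as a
    // d x n matrix A = X^T; computing A^T * A yields the n x n matrix X * X^T.
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                n, n, d,
                &alpha, d_X, d, d_X, d,
                &beta, d_G, n);
}
```

The resulting Gram matrix is symmetric, so the column-major layout returned by cuBLAS needs no transposition before the kernel function is applied to it.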

The Support Vector Machine [23] is a learning algorithm which has become popular due to its high accuracy in solving both regression and classification tasks. Nevertheless, the training phase of an SVM is computationally expensive because the core of the training stage is solving a QP problem [20]. Countless efforts and studies have addressed how to reduce the training time of the SVM. After Vapnik invented the SVM, he described a method known as “chunking” to break the large QP problem down into a series of smaller QP problems. This method significantly reduced the size of the matrix, but it still could not solve large problems due to the computer memory limitations at that time. Osuna et al. presented a decomposition approach using iterative methods in [19]. Joachims introduced practical techniques such as shrinking and kernel caching in [16], which are common implementations in many modern SVM packages today. He also published his own SVM software, SVMlight [16], which uses these techniques. Platt invented SMO [20] to solve the standard QP problem by iteratively solving a QP problem with only two unknowns using analytical methods. This method requires only a small amount of computer memory and therefore addresses the memory limitation imposed by large training data sets. Later on, Keerthi et al. developed an improved SMO in [17], which resolved the slow convergence issue in Platt’s method. More recently, Fan et al. introduced a series of working set selection methods [9], which further improved the speed of convergence; this approach has been implemented in the state-of-the-art LIBSVM software [5]. Huang et al. developed the ISDA algorithm [14] to solve the SVM QP problem without the bias term. These major contributions summarize how to implement a fast classical SVM in sequential programming.
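To make the SMO idea concrete, the sketch below shows the textbook analytic update of the two selected Lagrange multipliers; it is our own hedged illustration of Platt’s method rather than code from the paper or from LIBSVM, and the variable names are ours.

```cpp
// One SMO step: analytically optimize the pair (alpha_i, alpha_j) while
// keeping the equality constraint sum_k(y_k * alpha_k) fixed.
// E_i and E_j are the prediction errors f(x_i) - y_i and f(x_j) - y_j.
#include <algorithm>

bool smo_step(double& ai, double& aj, int yi, int yj,
              double Ei, double Ej,
              double Kii, double Kjj, double Kij, double C) {
    // Feasible interval [L, H] for the new alpha_j imposed by the box
    // constraint 0 <= alpha <= C and by the equality constraint.
    double L, H;
    if (yi != yj) { L = std::max(0.0, aj - ai);     H = std::min(C, C + aj - ai); }
    else          { L = std::max(0.0, ai + aj - C); H = std::min(C, ai + aj);     }
    if (L >= H) return false;

    double eta = Kii + Kjj - 2.0 * Kij;          // curvature along the constraint line
    if (eta <= 0.0) return false;                // degenerate case omitted in this sketch

    double aj_new = aj + yj * (Ei - Ej) / eta;   // unconstrained minimizer
    aj_new = std::min(H, std::max(L, aj_new));   // clip to [L, H]

    ai += yi * yj * (aj - aj_new);               // preserve the linear constraint
    aj = aj_new;
    return true;
}
```

Only the three kernel values K(x_i, x_i), K(x_j, x_j) and K(x_i, x_j) are needed per step, which is why kernel caching pays off so well in decomposition methods of this kind.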

Some earlier works using parallel techniques for the SVM can be found in [6], [8], [24], [15]. Cao et al. presented a very practical Parallel SMO [3] implemented with the Message Passing Interface on a cluster system. The performance gain of training the SVM on a cluster demonstrates the benefit of parallel processing, and this method is also the foundation of the proposed GPUSVM. Graf et al. introduced the Cascade SVM [11], which decomposes the training data set into multiple chunks and trains them separately. The support vectors from the individual classifiers are then combined and fed back to the system. They proved that the globally optimal solution can be reached using this method. This parallelism is at the task level, in contrast to the data level parallelism in Cao et al.’s work [3]. The Cascade SVM offers a way to handle the training of ultra-large data sets. Catanzaro et al. proposed a method to train a binary SVM classifier on a GPU in [4], reporting a significant speed improvement over the LIBSVM software. The latest GPU version of the SVM is from [12], which makes it possible to train multi-class classification problems on a GPU.

Cross validation [22], [2] is a commonly used procedure for finding the best hyperparameters. In particular, n-fold cross validation is generally used for SVM training. Most current SVM tools offer a cross validation function that sequentially trains through a group of different hyperparameter combinations. These tasks usually run one by one and do not communicate with each other. Multi-threading is often used to run multiple tasks at the same time, but the tasks remain independent of each other, and the performance gain of multi-threading depends heavily on the hardware specification. Our aim is to let the multiple training tasks be aware of each other and share the kernel matrix cached in memory. The novelty of this method is that every task synchronizes at each iteration of the training phase; if some of these tasks share the same support vectors, there is no need to duplicate the kernel computations, because the kernel results can simply be fetched from memory. The most time-consuming part, the nonlinear kernel computations, is thereby greatly reduced.
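A minimal host-side sketch of such a shared cache is given below; the class and method names are ours, and it assumes for simplicity that the concurrent tasks use the same RBF width, so a cached kernel row can be reused directly by every task that requests it in the same iteration.

```cpp
// Hypothetical shared kernel-row cache for several cross validation tasks
// trained simultaneously on the same data with different hyperparameters.
#include <cmath>
#include <unordered_map>
#include <vector>

class SharedKernelCache {
public:
    SharedKernelCache(const std::vector<std::vector<float>>& X, float gamma)
        : X_(X), gamma_(gamma) {}

    // Return the RBF kernel row of sample i, computing it only on a cache miss.
    const std::vector<float>& row(int i) {
        auto it = cache_.find(i);
        if (it != cache_.end()) return it->second;        // hit: shared by all tasks
        std::vector<float> r(X_.size());
        for (std::size_t j = 0; j < X_.size(); ++j) {
            float d2 = 0.0f;
            for (std::size_t k = 0; k < X_[i].size(); ++k) {
                const float diff = X_[i][k] - X_[j][k];
                d2 += diff * diff;
            }
            r[j] = std::exp(-gamma_ * d2);
        }
        return cache_.emplace(i, std::move(r)).first->second;
    }

private:
    const std::vector<std::vector<float>>& X_;
    float gamma_;
    std::unordered_map<int, std::vector<float>> cache_;
};
```

When a requested row is already cached, a task pays only the lookup cost, which is consistent with the observation in the abstract that multitask training runs in roughly the time of the slowest task trained alone.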

This paper is organized as follows. Section 2 briefly reviews the basic principles of the SVM. Section 3 introduces the GPU hardware and its software platform, CUDA, for general purpose computing. Section 4 explains the parallel SMO algorithm implemented on the GPU and the implementation of the cross validation procedure. Performance results of the proposed algorithm are presented and analyzed in Section 5. Section 6 summarizes the conclusions and points out future research directions.

Section snippets

Support Vector Machine

This section reviews the basic principles of designing an L1 soft margin SVM for solving binary classification problems.
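For reference, the L1 soft margin formulation reviewed in this section is the standard one; the restatement below uses generic notation and is not copied verbatim from the paper.

```latex
% Primal L1 soft margin SVM and its dual (standard textbook form).
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad \text{s.t.}\ \ y_{i}\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_{i})+b\bigr)\ \ge\ 1-\xi_{i},\ \ \xi_{i}\ge 0,\\
&\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n}\alpha_{i} - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}K(\mathbf{x}_{i},\mathbf{x}_{j})
\quad \text{s.t.}\ \ 0\le\alpha_{i}\le C,\ \ \sum_{i=1}^{n}\alpha_{i}y_{i}=0.
\end{aligned}
```

The dual variables alpha_i are exactly the quantities updated two at a time by SMO, and the box constant C is one of the hyperparameters searched by cross validation.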

General purpose computing using graphics processing unit

This section briefly introduces the GPU hardware and the CUDA development platform.
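To give a flavour of the programming model discussed in this section, an illustrative CUDA kernel (not the paper’s actual implementation; the names and launch configuration are assumptions) that evaluates one RBF kernel row against all training samples could look as follows.

```cpp
// Each thread computes one kernel value K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
// for a fixed row index i; X is an n x d row-major matrix in device memory.
__global__ void rbf_row(const float* X, int n, int d, int i, float gamma, float* out) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    float d2 = 0.0f;
    for (int k = 0; k < d; ++k) {
        float diff = X[i * d + k] - X[j * d + k];
        d2 += diff * diff;
    }
    out[j] = expf(-gamma * d2);
}

// Host-side launch, one thread per training sample with 256 threads per block:
// rbf_row<<<(n + 255) / 256, 256>>>(d_X, n, d, i, gamma, d_row);
```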

CUDA implementation of multitask cross validation for SVM

This section introduces how to integrate the multitask cross validation in the binary SVM algorithm using CUDA.

Experimental results

This section presents the performance results obtained by the proposed algorithm. The routines are developed with CUDA in C/C++. The following measurements are carried out on our latest workstation, equipped with two Intel Xeon X5680 3.3 GHz six-core CPUs, 96 GB of ECC DDR3 1333 MHz main memory, six Tesla C2050 cards with 3 GB of GDDR5 memory each, and two Tesla C2070 cards with 6 GB of GDDR5 memory each. The storage device is a 128 GB SSD with Fedora Linux 14 x64 installed. The CUDA driver and runtime

Conclusions

In summary, the proposed GPUSVM shows an excellent speed improvement compared to the state-of-the-art LIBSVM tool, while matching LIBSVM in terms of classification accuracy. The speed improvement is further increased by multitask training for different hyperparameters in the cross validation procedure. The standard cross validation procedure suffers from slow training because of the redundant computations of kernel values across multiple tasks in the traditional


References (24)

  • G. Zanghirati et al., A parallel solver for large quadratic programs in training support vector machines, Parallel Computing (2003)
  • A. Athanasopoulos, A. Dimou, V. Mezaris, I. Kompatsiaris, GPU acceleration for support vector machines, in: Image...
  • Y. Bengio et al., No unbiased estimator of the variance of k-fold cross-validation, Journal of Machine Learning Research (2004)
  • L.J. Cao et al., Parallel sequential minimal optimization for the training of support vector machines, IEEE Transactions on Neural Networks (2006)
  • B. Catanzaro et al., Fast support vector machine training and classification on graphics processors
  • C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at...
  • R. Collobert et al., A parallel mixture of SVMs for very large scale problems, Neural Computation (2002)
  • C. Cortes et al., Support-vector networks, Machine Learning (1995)
  • J.-X. Dong et al., A fast parallel optimization for training support vector machine
  • R.-E. Fan et al., Working set selection using second order information for training support vector machines, Journal of Machine Learning Research (2005)
  • A. Frank, A. Asuncion, UCI machine learning repository, 2010. URL...
  • H.P. Graf et al., Parallel support vector machines: the cascade SVM

    Qi Li received his B.S. Degree in Electronic Engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2007 and M.S. Degree in Computer Science from Virginia Commonwealth University, Richmond, United States, in 2008. He is now a Ph.D. candidate in Computer Science at Virginia Commonwealth University. His research interests include data mining and parallel computing using GPU.

    Raied Salman received the B.S. Degree (with high distinction) in Electrical Engineering and M.S. Degree in Computer Control from the University of Technology, Baghdad, Iraq in 1976 and 1978, respectively. He also received the Ph.D. in Electrical Engineering from Brunel University, England, UK in 1989. He is currently a Ph.D. candidate in the Department of Computer Science at Virginia Commonwealth University, Richmond, VA. His research interests include machine learning and data mining for large data sets.

    Erik Test is a Ph.D. student at Virginia Commonwealth University (VCU), Richmond, United States studying Computer Science and will complete his Master’s Degree in May, 2011. He received his Bachelor’s Degree in Computer Science in 2007 from VCU. He also gained previous work experience at Acision, BAE Systems, and SENTEL. His research interests are high performance computing (HPC), GPU computing, and machine learning as well as machine learning in an HPC framework.

    Robert Strack received his M.S. Eng. Degree in Computer Science from AGH University of Science and Technology, Cracow, Poland in 2007. He is now working towards his Ph.D. Degree in Computer Science at Virginia Commonwealth University, Richmond, US. His research is oriented towards machine learning and data mining algorithms and his field of interest includes Support Vector Machine classification and parallel computing.

    Vojislav Kecman is with VCU, Dept. of CS, Richmond, VA, USA, working in the fields of machine learning by both Support Vector Machines (SVMs) and neural networks, as well as by local approaches such as Adaptive Local Hyperplane (ALH) and Local SVMs, in different regression (function approximation) and pattern recognition (classification, decision making) tasks. He was a Fulbright Professor at MIT, Cambridge, MA, a Konrad Zuse Professor at FH Heilbronn, DFG Scientist at TU Darmstadt, a Research Fellow at Drexel University, Philadelphia, PA, and at Stuttgart University. Dr. Kecman authored several books on ML (see www.supportvector.ws and www.learning-from-data.com).
