Neurocomputing

Volume 191, 26 May 2016, Pages 175-186

Efficient parallel implementation of kernel methods

https://doi.org/10.1016/j.neucom.2015.11.097

Abstract

The availability of multi-core processors has motivated increasing research interest in the parallelization of machine learning algorithms. Kernel methods such as Support Vector Machines (SVMs) and Gaussian Processes (GPs), despite their efficacy in solving classification and regression problems, have a very high computational cost and usually produce very large models. In this paper we present parallel algorithmic implementations of the Semiparametric SVM (Parallel Semiparametric SVM, PS-SVM) and of Gaussian Processes (Parallel full GP, P-GP, and Parallel Semiparametric GP, PS-GP). We have implemented the proposed methods using OpenMP and benchmarked them against other state-of-the-art methods, showing their good performance and their advantages in both computation time and final model size.

Introduction

Kernel methods are very popular in machine learning because they produce highly competitive results in many practical tasks. They transform the input space into a high-dimensional feature space where inner products are computed using a kernel function. The most relevant techniques are Support Vector Machines (SVMs) for classification problems and Gaussian Processes (GPs) for regression.
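As a brief reminder (standard background, not specific to this paper), the kernel function implicitly evaluates inner products in the high-dimensional feature space φ(·) without constructing the mapping explicitly:

```latex
k(\mathbf{x}, \mathbf{x}') \;=\; \langle \phi(\mathbf{x}), \phi(\mathbf{x}') \rangle,
\qquad \text{e.g. the RBF kernel } \;
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\tfrac{\lVert \mathbf{x}-\mathbf{x}'\rVert^2}{2\sigma^2}\right).
```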

Support Vector Machines [1] are one of the most successful kernel techniques; they aim to obtain a maximum-margin separating hyperplane. They are very popular because they automatically adjust the machine size and produce highly competitive results in many real-world problems. However, the resulting classifier is often very large, which entails a high computational cost. Many research lines have emerged to address this problem of complexity and scalability. Some works [2], [3], [4], [5] compute a full SVM and afterwards reduce the machine size by solving a preimage problem [6]. In [7], [8], an iteratively growing architecture is proposed to avoid computing a full SVM. In [8], Sparse Greedy Matrix Approximation (SGMA) is proposed to iteratively select candidates with which to grow a semiparametric model. Peng et al. [9] introduce a criterion for the identification of support vectors that leads to a reduced support vector set. Other works [10] focus on improving the classification complexity using decision trees.

Gaussian Processes [11] are also non-parametric methods, considered the state of the art for solving regression problems, and rely on probabilistic Bayesian models. Unfortunately, their direct application is limited by the high training time and the O(n³) computational cost of non-sparse solutions, where n is the size of the training set. There are also iterative greedy schemes that obtain a reduced GP. Among these, [12], [13] are based on minimizing Kullback–Leibler divergences, [14] uses a MAP criterion to select in every iteration the candidate with which to grow the model, and [15] selects in every iteration the element that maximizes the evidence in order to avoid overfitting.

Since run time is the main limitation of kernel methods, parallelization is one of the most important techniques to accelerate them. The semiconductor industry currently increases processor performance by including more cores in a single chip. With the emergence of multi-core processors and of programming interfaces such as OpenMP [16] for developing parallel software, many research lines on the parallelization of kernel methods have been opened.

Early works on the parallelization of SVMs propose to split the training set, train a different SVM on every data chunk and combine the results using a neural network [17], or to train a new SVM using the Support Vectors obtained from the chunks [18]. In [19], a parallel version based on a cascade of SVMs is used. More recently, new methods have appeared, such as PSVM [20], Parallel SMO [21], [22] and a Graphics Processing Unit (GPU) Tailored Approach SVM [23]. After the emergence of Big Data technologies, a MapReduce-based SVM has been used in [24], [25] to solve problems in a distributed environment.

PSVM solves the Quadratic Programming problem using a parallel implementation of the Interior Point Method (IPM) [26] and an Incomplete Cholesky Factorization. Parallel SMO uses a parallel version of SMO [27], which divides the quadratic problem into a series of smaller subproblems that can be solved analytically. An implementation for GPUs that uses clustering techniques to handle sparse data sets is presented in [23]. For GPs, [28] uses domain decomposition to solve two-dimensional problems in parallel.

Since the run time of the training procedure and the complexity of the resulting model are the main weaknesses of kernel methods, our proposal consists of the development of new schemes that address both issues. To that end we benefit from two different techniques:

Semiparametric models: They address the issue of model complexity, as presented in previous works [29], because the final machines are written as a function of a set of representatives instead of the Support Vectors (as in SVMs) or all the training data (as in GPs); see the sketch after this list. These models have been shown to achieve performance similar to that of the full machines, but with a lower computational cost and complexity.

Parallel computing: It addresses the issues of scalability and of the excessive run time of the training procedure by simultaneously using multiple computing resources to solve the problem.
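As an informal illustration of the first point (the notation here is ours, not the paper's), a semiparametric kernel machine replaces the full expansion over all training points or Support Vectors by an expansion over a small set of M representatives c_1, ..., c_M with M ≪ n:

```latex
% Full (non-parametric) kernel machine: one weight per training point / Support Vector
f(\mathbf{x}) \;=\; \sum_{i=1}^{n} \alpha_i\, k(\mathbf{x}_i, \mathbf{x}) + b ,
\qquad
% Semiparametric approximation: weights attached to M \ll n representatives
\hat{f}(\mathbf{x}) \;=\; \sum_{m=1}^{M} \beta_m\, k(\mathbf{c}_m, \mathbf{x}) + b .
```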

By using these techniques we have developed three different models.

  • PS-SVM: A parallel and semiparametric version of the SVM.

  • P-GP: A parallel version of the GPs.

  • PS-GP: A parallel and semiparametric version of the GPs.

This paper is organized as follows. In Section 2 we describe our algorithms. Experimental results are provided in Section 3. Finally, we present the conclusions in Section 4.

Section snippets

Algorithms

When developing parallel code, the two most important issues to avoid if possible are:

  • Non-parallelizable sections of code: They set the upper bound on the achievable speedup according to Amdahl's law [30]. In our implementation, the run time of the non-parallel code is negligible compared to the overall run time.

  • Communication between threads: To avoid possible bottlenecks we have selected OpenMP as the parallel framework, because when a subtask finishes its job another subtask can access the results …
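The snippet above is truncated; as a minimal illustration of the kind of loop-level OpenMP parallelism involved (our own sketch, not the paper's code; the RBF kernel choice and the row-major data layout are assumptions), consider a kernel matrix–vector product:

```c
/* Illustrative sketch only: a kernel matrix-vector product parallelized
 * with OpenMP.  The loop over rows is embarrassingly parallel, so the
 * serial fraction is negligible and threads exchange no messages: each
 * one writes a disjoint part of the output vector. */
#include <math.h>
#include <omp.h>

/* RBF kernel between two d-dimensional points (gamma assumed given). */
static double rbf(const double *xi, const double *xj, int d, double gamma)
{
    double s = 0.0;
    for (int k = 0; k < d; k++) {
        double diff = xi[k] - xj[k];
        s += diff * diff;
    }
    return exp(-gamma * s);
}

/* y = K a, where K[i][j] = k(x_i, x_j) is evaluated on the fly
 * from the n x d row-major data matrix X. */
void kernel_matvec(const double *X, int n, int d, double gamma,
                   const double *a, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double acc = 0.0;
        for (int j = 0; j < n; j++)
            acc += rbf(&X[i * d], &X[j * d], d, gamma) * a[j];
        y[i] = acc;   /* disjoint writes: no synchronization needed */
    }
}
```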

Experiments

All the algorithms have been implemented in C using OpenMP [16]. We conducted experiments to evaluate their efficiency and acceleration performance. The experiments were executed on an HP DL160 G6 server with 48 GBytes of RAM and two Intel Xeon X5675 processors (each with 6 cores and hyper-threading technology).

To evaluate the quality of the parallelization we use the speedup metric:

Speedup = Mean serial run time / Mean parallel run time
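A minimal sketch of how such speedup figures can be measured with OpenMP's wall-clock timer follows (the workload function is a hypothetical placeholder, not the paper's training routine):

```c
/* Sketch of the speedup measurement; compile with -fopenmp. */
#include <omp.h>
#include <stdio.h>

/* Hypothetical stand-in for a training routine. */
static double workload(void)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 1; i < 50000000L; i++)
        s += 1.0 / (double)i;
    return s;
}

static double timed_run(int num_threads)
{
    omp_set_num_threads(num_threads);
    double t0 = omp_get_wtime();        /* OpenMP wall-clock timer */
    (void)workload();
    return omp_get_wtime() - t0;        /* in practice, average several runs */
}

int main(void)
{
    double t_serial   = timed_run(1);
    double t_parallel = timed_run(omp_get_max_threads());
    printf("Speedup = %.2f\n", t_serial / t_parallel);
    return 0;
}
```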

Conclusions

We have proposed several parallel algorithms for kernel methods: one method aimed at solving classification problems, called Parallel Semiparametric SVM (PS-SVM), and two methods intended for solving regression problems, a parallel version of a full GP (P-GP) and the parallel implementation of the SGEV algorithm for sparse GP training (PS-GP). The technique underlying these parallel implementations is based on the division of matrices into quadtrees for the parallelization of matrix inversion in …
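The snippet is truncated here; as general background (not the paper's exact derivation), recursive block-wise matrix inversion schemes of this kind typically rely on the standard 2×2 block inversion identity, whose four sub-blocks can be computed largely in parallel:

```latex
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}
=
\begin{pmatrix}
A^{-1} + A^{-1} B\, S^{-1} C A^{-1} & -A^{-1} B\, S^{-1} \\
-S^{-1} C A^{-1} & S^{-1}
\end{pmatrix},
\qquad S = D - C A^{-1} B .
```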


References (42)

  • B. Schölkopf, P. Knirsch, A. Smola, C. Burges, Fast approximation of support vector kernel expansions, and an...
  • A. Smola, B. Schölkopf, Sparse greedy matrix approximation for machine learning, In: Proceedings of the 17th...
  • J.-Y. Kwok et al., The pre-image problem in kernel methods, IEEE Trans. Neural Netw. (2004)
  • C. Rasmussen, Gaussian processes in machine learning, Advanced Lectures on Machine Learning, 2004, pp....
  • L. Csató et al., Sparse on-line Gaussian processes, Neural Comput. (2002)
  • N. Lawrence, M. Seeger, R. Herbrich, et al., Fast sparse Gaussian process methods: the informative vector machine, In:...
  • A. Smola, P. Bartlett, Sparse greedy Gaussian process regression, Advances in Neural Information Processing Systems,...
  • J. Quinonero-Candela et al., Analysis of some methods for reduced rank Gaussian process regression, Switch. Learn. Feedback Syst. (2005)
  • L. Dagum et al., OpenMP: an industry-standard API for shared-memory programming, Comput. Sci. Eng., IEEE (1998)
  • R. Collobert et al., A parallel mixture of SVMs for very large scale problems, Neural Comput. (2002)
  • J. Dong, A. Krzyżak, C. Suen, A fast parallel optimization for training support vector machine, In: Proceedings of the...

    Roberto Díaz Morales received his Telecommunications Engineering degree from the University Carlos III of Madrid (Spain) in 2006. Until 2008 he worked at Sun Microsystems in the web services area. He received the M.Sc. (Hons.) degree in multimedia and communications from the University Carlos III de Madrid in 2011 and finished his Ph.D. in 2016. His research interests are focused on machine learning.

    Ángel Navia-Vázquez received his degree in Telecommunications Engineering in 1992 (Universidad de Vigo, Spain), and finished his Ph.D., also in Telecommunications Engineering, in 1997 (Universidad Politécnica de Madrid, Spain). He is now an Associate Professor at the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Spain. His research interests are focused on new architectures and algorithms for nonlinear processing, as well as their application to multimedia processing, communications, data mining and content management. He has (co)authored 26 international refereed journal papers in these areas, several book chapters and more than 40 conference communications, and has participated in more than 20 research projects. He has been an IEEE (Senior) Member since 1999.

    This work has been partly supported by Spanish MEC project TIN2011-24533.
