Parallel pattern classification utilizing GPU-based kernelized Slackmin algorithm
Introduction
Nowadays, there is growing interest in developing smart devices and advanced services that incorporate some form of intelligence. Intelligent capabilities in human–machine interaction are embodied by incorporating machine learning techniques and algorithms that enable a device to interact with its surrounding environment.
A crucial part of a modern intelligent system is the classification module (classifier), which is responsible for categorizing (classifying) unknown data into specific predefined categories (classes). A supervised classifier is initially trained on a set of data with known class labels (training data) and is then used to classify data with unknown class information (testing data). Although an increase in the size of the training set does not always improve classification performance, it is commonly accepted that the more data used for training, the greater the opportunity a classifier has to learn.
With the rapid growth of data exchanged and transmitted over the internet, in conjunction with the widespread use of digital devices in everyday life, there is a need to process very large amounts of data. In the fields of machine learning and computational intelligence, the classification of massive data has motivated both novel classification schemes and new high-performance computing architectures. In this direction, Support Vector Machines (SVMs) [6], Deep Belief Networks (DBNs) [13] and Convolutional Neural Networks (CNNs) [12], among others, have been proposed as classification methods capable of handling large datasets, although their convergence rate can be quite low. For this reason, many parallel approaches [29], [32], e.g., GPU [1], [19], [25], MPI/OpenMP [31], and MapReduce [35], have been proposed in recent years to accelerate the training stage of these methods.
Recently, the Slackmin algorithm and several variants were proposed [11], [15], [33], [34] in an attempt to develop a simple yet efficient classification model for medium-scale data. The algorithm has shown competitive classification performance with low complexity and a fast convergence rate. To improve the medium-scale data processing of the Slackmin algorithm, the authors proposed [24] a parallel implementation of its linear case, called cuLSlackmin, using the CUDA (Compute Unified Device Architecture) architecture of a GPU (Graphics Processing Unit).
Although the linear Slackmin algorithm and its parallel version cuLSlackmin have shown satisfactory performance, the kernelized form of Slackmin offers useful properties and enhanced classification capabilities. This work proposes an efficient parallel implementation of the kernelized Slackmin algorithm as a way to overcome the classification deficiencies of the cuLSlackmin algorithm and the low convergence rate of the serial kernelized Slackmin algorithm. The proposed algorithm, called cuKSlackmin, inherits the data representation capabilities of the underlying kernel functions and exploits the high-performance computing capabilities of a GPU. The resulting parallel algorithm takes advantage of the GPU resources to train on large amounts of data, so the only limitation stems from the computing capabilities of the GPU card used.
It is worth noting that the proposed algorithm is limited by the amount of shared memory available on the GPU hardware used. More precisely, the number of support vectors used by the algorithm determines the amount of shared memory it requires, and thus affects not only its accuracy but also its training speed.
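To make this dependency concrete, the sketch below (hypothetical code, not taken from the paper; the names `stageAlphas` and `nSV` are ours) stages one coefficient per support vector in dynamically sized shared memory. The per-block request of `nSV * sizeof(float)` bytes must stay within the device limit reported by `cudaDeviceProp::sharedMemPerBlock`, which is what bounds the usable number of support vectors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each block stages one coefficient per support vector
// in dynamically sized shared memory, so the per-block request is
// nSV * sizeof(float) bytes and must fit the card's shared-memory limit.
__global__ void stageAlphas(const float* alpha, float* out, int nSV) {
    extern __shared__ float sAlpha[];              // sized at launch time
    for (int k = threadIdx.x; k < nSV; k += blockDim.x)
        sAlpha[k] = alpha[k];                      // cooperative copy to shared memory
    __syncthreads();
    // ... per-thread work reading sAlpha[] would go here ...
    if (threadIdx.x == 0) out[blockIdx.x] = sAlpha[nSV - 1];
}

int main() {
    int nSV = 4096;                                // 16 KB of shared memory requested
    size_t smem = nSV * sizeof(float);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("requested %zu B, device limit %zu B per block\n",
           smem, prop.sharedMemPerBlock);

    float *dAlpha, *dOut;
    cudaMalloc(&dAlpha, smem);
    cudaMalloc(&dOut, sizeof(float));
    cudaMemset(dAlpha, 0, smem);
    // The third launch parameter is the dynamic shared-memory size in bytes;
    // the launch fails if it exceeds the per-block limit of the device.
    stageAlphas<<<1, 256, smem>>>(dAlpha, dOut, nSV);
    cudaDeviceSynchronize();
    cudaFree(dAlpha); cudaFree(dOut);
    return 0;
}
```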
The rest of the paper is organized as follows: Section 2 describes in detail both the linear and the kernelized versions of the Slackmin algorithm. Section 3 highlights the main principles of GPU programming, with emphasis on the GPU implementation of machine learning algorithms. Section 4 presents the proposed GPU-based kernelized Slackmin algorithm, while Section 5 provides a thorough experimental study of its performance. Finally, Section 6 summarizes the main conclusions drawn from the conducted experiments and outlines future work on extending the proposed algorithm to more demanding classification problems.
Section snippets
Slackmin Algorithm
A short description of the Slackmin algorithm along with some basic definitions is provided in this section; more theoretical details can be found in [11], [15].
A binary classification problem is defined as the task of data categorization into two predefined classes. The data is represented by a set of feature vectors and the corresponding class labels as follows:
$$\mathcal{D}=\left\{\left(\mathbf{x}_{i}, y_{i}\right)\right\}_{i=1}^{N}, \quad \mathbf{x}_{i} \in \mathbb{R}^{d}, \quad y_{i} \in\{-1,+1\} \tag{1}$$

In Eq. (1), $\mathbf{x}_i$ is the $i$th feature vector of dimension $d$ and $y_i$ the corresponding class label.
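For orientation, SVM-like kernel classifiers of this family typically produce decisions of the following form; this is a generic sketch rather than the exact Slackmin formulation (which is given in [11], [15]), with $\alpha_i$ denoting learned coefficients, $b$ a bias term, and $k(\cdot,\cdot)$ a kernel function such as the RBF kernel with width parameter $\gamma$.

```latex
% Generic kernelized decision function of SVM-like classifiers
% (illustrative; the exact Slackmin formulation is given in [11], [15]):
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, k(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
k(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x} \rVert^2 \right)
```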
GPU programming
The main contribution of this manuscript is the computational enhancement of the kernelized Slackmin algorithm by utilizing the parallel programming framework provided by the CUDA architecture of a GPU. Therefore, it is constructive to briefly discuss some fundamental concepts of GPU programming and its impact on machine learning.
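As a concrete illustration of these concepts (an illustrative example, not code from the paper), the minimal CUDA program below shows the standard pattern: data is copied to device memory, a grid of thread blocks is launched, and each thread derives its global index from the block/thread hierarchy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b; the global index is
// derived from the CUDA block/thread hierarchy.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard against out-of-range threads
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);    // expected: 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```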
GPU-based kernelized Slackmin Algorithm
The promising results of cuLSlackmin [24] have motivated the authors to proceed with the parallelization of the kernelized Slackmin algorithm (Section 2.2) in this section, in order to deal with highly nonlinear classification problems as well as large amounts of data.
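A minimal sketch of the kind of computation such a parallelization targets, evaluating the kernel (Gram) matrix of the training data on the GPU, is given below. This is an illustrative example assuming an RBF kernel and row-major storage, not the authors' implementation; the names `rbfGram`, `X` and `gamma` are ours.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sketch (not the authors' implementation): each thread computes
// one entry K[i][j] = exp(-gamma * ||x_i - x_j||^2) of the n x n Gram matrix
// for a row-major n x d data matrix X. A production version would tile X
// through shared memory so each block reuses the feature vectors it loads.
__global__ void rbfGram(const float* X, float* K, int n, int d, float gamma) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    if (i >= n || j >= n) return;
    float dist2 = 0.0f;
    for (int k = 0; k < d; ++k) {
        float diff = X[i * d + k] - X[j * d + k];
        dist2 += diff * diff;
    }
    K[i * n + j] = expf(-gamma * dist2);
}

int main() {
    const int n = 512, d = 16;
    float *dX, *dK;
    cudaMalloc(&dX, n * d * sizeof(float));
    cudaMalloc(&dK, n * n * sizeof(float));
    cudaMemset(dX, 0, n * d * sizeof(float));       // dummy data for the sketch

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    rbfGram<<<grid, block>>>(dX, dK, n, d, 0.5f);
    cudaDeviceSynchronize();

    float k00;
    cudaMemcpy(&k00, dK, sizeof(float), cudaMemcpyDeviceToHost);
    printf("K[0][0] = %f\n", k00);                  // expected: 1.0 (zero distance)
    cudaFree(dX); cudaFree(dK);
    return 0;
}
```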
Experimental study
In order to study the convergence rate of the proposed cuKSlackmin algorithm as well as its classification performance, a set of experiments has been conducted. In these experiments, a low-cost, mid-range Nvidia GT440 GPU card with the specifications listed in Table 1 is used. The card is mounted in a desktop computer serving as the host, equipped with an Intel i5-750 2.67 GHz 64-bit CPU (4 cores, 4 threads, L2 cache: 4×256 kB, L3 cache: 8 MB shared), 4 GB RAM and Windows 8.1 OS, whereas the C/C++
Conclusion and future work
A parallel implementation of the kernelized Slackmin algorithm was presented in the previous sections. The proposed cuKSlackmin algorithm utilizes the parallel computing capabilities of a GPU card following the CUDA programming framework. The efficiency of the introduced cuKSlackmin algorithm was examined using four benchmark pattern classification datasets and its performance was compared with that of some established serial and parallel SVM-like algorithms. The experimental results
Acknowledgments
This research has been co-financed by the European Union (European Social Fund ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: THALES (MIS 380292). Investing in knowledge society through the European Social Fund.
References (37)
- et al., Efficient binary classification through energy minimisation of slack variables, Neurocomputing (2015)
- et al., Self adaptable multithreaded object detection on embedded multicore systems, J. Parallel Distrib. Comput. (2015)
- et al., Parallel multitask cross validation for support vector machine using GPU, J. Parallel Distrib. Comput. (2013)
- Parallel approaches to machine learning–a comprehensive survey, J. Parallel Distrib. Comput. (2013)
- A. Athanasopoulos, A. Dimou, V. Mezaris, I. Kompatsiaris, GPU acceleration for support vector machines, in: 12th...
- et al., Fast kernel classifiers with online and active learning, J. Mach. Learn. Res. (2005)
- B. Catanzaro, N. Sundaram, K. Keutzer, Fast support vector machine training and classification on graphics processors,...
- et al., LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. (2011)
- Training a support vector machine in the primal, Neural Comput. (2007)
- et al., Support-vector networks, Mach. Learn. (1995)
- OpenMP: an industry standard API for shared-memory programming, IEEE Comput. Sci. Eng.
- MapReduce: simplified data processing on large clusters, Commun. ACM
- Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biol. Cybernet.
- A fast learning algorithm for deep belief nets, Neural Comput.
- A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Netw.
G.A. Papakostas received the diploma in Electrical and Computer Engineering in 1999 and the M.Sc. and Ph.D. degrees in Electrical and Computer Engineering in 2002 and 2007, respectively, from the Democritus University of Thrace (DUTH), Greece. From 2007 to 2010 he served as an Adjunct Lecturer at the Department of Production Engineering and Management of DUTH. Dr. Papakostas currently serves as a full Professor at the Department of Computer and Informatics Engineering. He has (co)authored more than 80 publications in indexed journals, international conferences and book chapters. His research interests include pattern recognition, computer/machine vision, computational intelligence, machine learning, feature extraction, evolutionary optimization, parallel and distributed computing, signal and image processing. Dr. Papakostas has served as a reviewer for numerous journals and conferences and he is a member of the IAENG, MIR Labs, EUCogIII and the Technical Chamber of Greece (TEE).
K.I. Diamantaras received the Diploma from the National Technical University of Athens, Greece and the Ph.D. degree in Electrical Engineering from Princeton University in 1992. Subsequently, he joined Siemens Corporate Research, Princeton, NJ, as a Post-Doctoral Researcher and then the Department of Electrical and Computer Engineering, Aristotelian University of Thessaloniki, Greece. Since 1998, he has been with the Department of Information Technology, Technological Education Institute (T.E.I.) of Thessaloniki, where he currently holds the position of Professor. His research interests include machine learning, signal processing, and image processing. He is a Senior Member of the IEEE. He is the co-author of the book "Principal Component Neural Networks: Theory and Applications", Wiley, New York, 1996. He currently serves as an editor for the IEEE Transactions on Signal Processing, the IEEE Signal Processing Letters and the Journal of Signal Processing Systems (Springer). In the past, he was also an Associate Editor for the IEEE Transactions on Neural Networks. In 1997, he received the IEEE Best Paper Award in the area of Neural Networks for Signal Processing for the paper "Adaptive Principal Component Extraction (APEX) and Applications". He has been a member of the technical committee for various machine learning, signal processing and neural networks conferences.
T. Papadimitriou was born in Thessaloniki, Greece, in 1972. He received the Diploma degree in mathematics from the Aristotle University of Thessaloniki, Greece, and the D.E.A. A.R.A.V.I.S (Automatique, Robotique, Algorithmique, Vision, Image, Signale) degree from the University of Nice-Sophia Antipolis, France, both in 1996, and the Ph.D. degree from the Aristotle University of Thessaloniki in 2000. In 2001, he joined the Department of Economics, Democritus University of Thrace, Komotini, Greece, where he served as a lecturer (2002–2008) and assistant professor (2008–2013). Currently, he holds the position of Associate Professor in the same department. Dr. Papadimitriou has co-authored more than 80 journal papers, conference papers and book chapters combined. He has served as a reviewer for various publications and as a member of scientific committees for conferences and workshops. His current research interests include complex networks, machine learning, and data analysis.