Abstract
Data management becomes complex when hundreds of petabytes of data are gathered, stored and processed on a day-to-day basis. Efficient processing of this exponentially growing data is essential in this context. This paper discusses processing large amounts of data with the Support Vector Machine (SVM) algorithm, using techniques ranging from a single-node linear implementation to parallel processing on distributed processing frameworks such as Hadoop. The Map-Reduce component of Hadoop performs the parallelization and feeds information to SVMs, a machine learning algorithm applicable to classification and regression analysis. The paper also presents a detailed anatomy of the SVM algorithm and sets out a roadmap for implementing it in both linear and Map-Reduce fashion. The main objective is to explain in detail the steps involved in developing an SVM algorithm from scratch using standard linear and Map-Reduce techniques, and to conduct a performance analysis across a linear implementation of SVM, an SVM implementation on single-node Hadoop, an SVM implementation on a Hadoop cluster, and a proven tool such as R, gauging them with respect to the accuracy achieved, their processing pace against varying data sizes, and their capability to handle huge data volumes without breaking.
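As a rough illustration of how SVM training can be cast in Map-Reduce form, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. It is not the authors' implementation: the choice of subgradient descent, the function names (`map_partition`, `reduce_gradients`, `train_svm`, `predict`) and the hyperparameters are all assumptions made for illustration, and the "cluster" is simulated in a single process by a list of data partitions.

```python
from functools import reduce


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def map_partition(partition, w, b):
    """Map step (hypothetical): hinge-loss subgradient of one data partition.

    Each record is (x, y) with y in {+1, -1}. Only points that violate
    the margin (y * (w.x + b) < 1) contribute to the subgradient.
    """
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in partition:
        if y * (dot(w, x) + b) < 1:
            for j, xj in enumerate(x):
                grad_w[j] -= y * xj
            grad_b -= y
    return grad_w, grad_b


def reduce_gradients(left, right):
    """Reduce step (hypothetical): element-wise sum of two partial subgradients."""
    summed_w = [p + q for p, q in zip(left[0], right[0])]
    return summed_w, left[1] + right[1]


def train_svm(partitions, dim, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent; lam, lr and epochs are illustrative values."""
    w, b = [0.0] * dim, 0.0
    n = sum(len(p) for p in partitions)
    for _ in range(epochs):
        # On a real cluster each partition would be handled by its own mapper.
        partials = [map_partition(p, w, b) for p in partitions]
        grad_w, grad_b = reduce(reduce_gradients, partials)
        # Regularization term lam * w plus the averaged hinge subgradient.
        w = [wi - lr * (lam * wi + gi / n) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Classify x by the sign of the decision function w.x + b."""
    return 1 if dot(w, x) + b >= 0 else -1
```

Each epoch corresponds to one map phase followed by one reduce phase. In this formulation only the partial subgradients travel between the phases, so the amount of data exchanged per epoch depends on the feature dimension rather than on the number of records.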
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sukanya, M.V., Sathyadevan, S., Sreeveni, U.B.U. (2015). Benchmarking Support Vector Machines Implementation Using Multiple Techniques. In: El-Alfy, ES., Thampi, S., Takagi, H., Piramuthu, S., Hanne, T. (eds) Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-319-11218-3_22
DOI: https://doi.org/10.1007/978-3-319-11218-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11217-6
Online ISBN: 978-3-319-11218-3
eBook Packages: Engineering, Engineering (R0)