Abstract
Machine Learning (ML) on massive-scale datasets, commonly called Big Data, has become a challenge for traditional computing and storage technologies; hence, massive-scale ML is an emerging research domain. The Least Squares Twin Support Vector Machine (LSTSVM) is a faster variant of the Support Vector Machine (SVM). However, it suffers from scalability issues and exhibits computational and/or storage bottlenecks on massive datasets. The proposed work designs a scalable solution to LSTSVM, called Distributed LSTSVM (DLSTSVM), built on distributed parallel computing over a cluster of machines. After horizontally partitioning a massive dataset, DLSTSVM trains on it in a distributed parallel fashion and finds two non-parallel hyperplanes as decision boundaries for the two classes. The MapReduce paradigm is used to execute the parallel computation on the partitioned data in a way that averts memory constraints. The proposed technique achieves computational and storage scalability without loss of prediction accuracy.
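The training scheme summarized above can be sketched as a small single-machine simulation of the MapReduce pattern: a map step computes per-partition Gram-matrix contributions, a reduce step sums them, and the two LSTSVM hyperplanes are then obtained by solving two small (d+1)×(d+1) linear systems, following the standard LSTSVM closed-form solution. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names (`partial_grams`, `fit_dlstsvm`, `predict`) and the tiny ridge term added for numerical stability are assumptions introduced here.

```python
import numpy as np

def partial_grams(part_X, part_y):
    """Map step (hypothetical helper): Gram contributions of one horizontal partition."""
    A = part_X[part_y == 1]          # samples of class +1 in this partition
    B = part_X[part_y == -1]         # samples of class -1 in this partition
    E = np.hstack([A, np.ones((len(A), 1))])   # augmented matrix [A  e]
    F = np.hstack([B, np.ones((len(B), 1))])   # augmented matrix [B  e]
    return E.T @ E, F.T @ F, E.T @ np.ones(len(A)), F.T @ np.ones(len(B))

def fit_dlstsvm(partitions, c1=1.0, c2=1.0):
    """Reduce step: sum partial Grams, then solve two small linear systems."""
    EtE = FtF = Ete = Fte = 0
    for X, y in partitions:
        ee, ff, et, ft = partial_grams(X, y)
        EtE = EtE + ee
        FtF = FtF + ff
        Ete = Ete + et
        Fte = Fte + ft
    reg = 1e-8 * np.eye(EtE.shape[0])          # small ridge term for stability
    # LSTSVM closed forms: z1 = -[(1/c1)E'E + F'F]^{-1} F'e,
    #                      z2 =  [(1/c2)F'F + E'E]^{-1} E'e
    z1 = -np.linalg.solve(EtE / c1 + FtF + reg, Fte)
    z2 = np.linalg.solve(FtF / c2 + EtE + reg, Ete)
    return z1, z2                               # each is [w; b]

def predict(X, z1, z2):
    """Assign each sample to the class of its nearer hyperplane."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    d1 = np.abs(Xa @ z1) / np.linalg.norm(z1[:-1])
    d2 = np.abs(Xa @ z2) / np.linalg.norm(z2[:-1])
    return np.where(d1 <= d2, 1, -1)
```

Because E^T E, F^T F, E^T e, and F^T e are all sums over rows, summing per-partition contributions reproduces the centralized solution exactly; this is what makes horizontal partitioning lossless in prediction accuracy while each worker only ever holds its own slice of the data.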
© 2019 Springer Nature Switzerland AG
Cite this paper
Prasad, B.R., Agarwal, S. (2019). Scalable Least Square Twin Support Vector Machine Learning. In: Ordonez, C., Song, I.Y., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds.) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol. 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27519-8
Online ISBN: 978-3-030-27520-4