Abstract
The paper presents our approach to implementing SVM in a parallel environment. We describe how the learning and prediction phases of classification were parallelised. We also propose a method for limiting the number of computations needed during classifier construction. Our method, named one-vs-near, is an extension of the typical one-vs-all approach, which is used to make binary classifiers work with multiclass problems. We perform experiments on the scalability and quality of the implementation. The results show that the proposed solution allows SVM to be scaled up while still giving results of reasonable quality. The proposed one-vs-near method significantly improves the effectiveness of classifier construction.
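The abstract does not specify how one-vs-near selects the "near" classes, so the following is only a sketch of the baseline it extends: a one-vs-all decomposition in which each class gets its own binary linear SVM. Because these binary problems are independent, they can be trained in parallel. It rests on several assumptions and is not the authors' implementation. The binary solver (dual coordinate descent for a linear L1-loss SVM, similar to the solver in LIBLINEAR), the hypothetical function names `train_binary_svm`, `train_one_vs_all` and `predict`, and the use of a thread pool in place of separate processes or MPI ranks are all illustrative choices.

```python
"""Minimal one-vs-all linear SVM sketch (illustrative, not the authors' code)."""
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     c: float = 1.0, epochs: int = 100) -> Vector:
    """Hypothetical helper: linear L1-loss SVM fitted by dual coordinate descent.

    A constant 1.0 feature is appended so the bias is learned as an ordinary weight.
    Labels in ys must be +1 or -1.
    """
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    alpha = [0.0] * len(data)          # dual variables, kept in [0, c]
    q = [dot(x, x) for x in data]      # diagonal entries Q_ii of the dual Hessian
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(data, ys)):
            g = y * dot(w, x) - 1.0    # partial derivative of the dual w.r.t. alpha_i
            new_alpha = min(max(alpha[i] - g / q[i], 0.0), c)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                # keep w = sum_j alpha_j * y_j * x_j up to date
                for j in range(len(w)):
                    w[j] += delta * y * x[j]
    return w


def train_one_vs_all(xs: Sequence[Sequence[float]], labels: Sequence[int],
                     workers: int = 4) -> Dict[int, Vector]:
    """Hypothetical helper: one binary SVM per class (class k vs. all others)."""
    classes = sorted(set(labels))

    def fit(k: int) -> Vector:
        ys = [1 if lab == k else -1 for lab in labels]
        return train_binary_svm(xs, ys)

    # The per-class problems share no state, so each can go to its own worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        models = list(pool.map(fit, classes))
    return dict(zip(classes, models))


def predict(models: Dict[int, Vector], x: Sequence[float]) -> int:
    """Pick the class whose binary classifier gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda k: dot(models[k], xb))


if __name__ == "__main__":
    xs = [[4, 0], [5, 1], [5, -1], [0, 4], [1, 5], [-1, 5],
          [-4, -4], [-5, -4], [-4, -5]]
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    models = train_one_vs_all(xs, labels)
    print([predict(models, x) for x in xs])
```

One-vs-all builds K binary classifiers for K classes, and each one is trained on the full data set. That full-data cost is what a scheme for limiting computations during classifier construction would aim to reduce. Pure-Python threads do not give a real speed-up because of the GIL, so a process pool or MPI would be needed to gain from the parallel structure in practice.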
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Balicki, J., Szymański, J., Kępa, M., Draszawka, K., Korłub, W. (2015). Improving Effectiveness of SVM Classifier for Large Scale Data. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2015. Lecture Notes in Computer Science, vol 9119. Springer, Cham. https://doi.org/10.1007/978-3-319-19324-3_60
DOI: https://doi.org/10.1007/978-3-319-19324-3_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19323-6
Online ISBN: 978-3-319-19324-3
eBook Packages: Computer Science, Computer Science (R0)