Improving Effectiveness of SVM Classifier for Large Scale Data

Balicki, Jerzy; Szymański, Julian; Kępa, Marcin; Draszawka, Karol; Korłub, Waldemar

doi:10.1007/978-3-319-19324-3_60

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9119)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

The paper presents our approach to implementing SVM in a parallel environment. We describe how the learning and prediction phases of classification were parallelised. We also propose a method for limiting the number of computations needed during classifier construction. Our method, named one-vs-near, extends the typical one-vs-all approach that lets binary classifiers handle multiclass problems. We run experiments on the scalability and quality of the implementation. The results show that the proposed solution allows SVM to be scaled up while still giving results of reasonable quality. The proposed one-vs-near method significantly improves the effectiveness of classifier construction.
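
The abstract does not say how one-vs-near decides which classes count as "near", so the following is only a minimal sketch under a stated assumption: each class is compared against the k classes whose centroids are closest to its own. The helper names (`centroid`, `nearest_classes`, `binary_training_sets`), the parameter `k`, and the toy data are all hypothetical and are not taken from the paper. The sketch builds only the per-class binary training sets; any binary SVM trainer could then be run on each of them.

```python
import math


def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def nearest_classes(centroids, label, k):
    """Labels of the k classes whose centroids are closest to `label`'s.

    ASSUMPTION: centroid distance is our stand-in for "near";
    the abstract does not define the notion.
    """
    others = [c for c in centroids if c != label]
    others.sort(key=lambda c: math.dist(centroids[label], centroids[c]))
    return others[:k]


def binary_training_sets(data, k=None):
    """Build one binary task per class from `data` (label -> vectors).

    k=None gives classic one-vs-all: negatives come from every other class.
    Otherwise negatives come only from the k nearest classes (assumed
    reading of one-vs-near). Returns label -> (X, y) with y in {+1, -1}.
    """
    centroids = {label: centroid(pts) for label, pts in data.items()}
    tasks = {}
    for label, positives in data.items():
        if k is None:
            negative_labels = [c for c in data if c != label]
        else:
            negative_labels = nearest_classes(centroids, label, k)
        negatives = [x for c in negative_labels for x in data[c]]
        X = positives + negatives
        y = [1] * len(positives) + [-1] * len(negatives)
        tasks[label] = (X, y)
    return tasks


# Toy data (hypothetical): two pairs of neighbouring classes.
data = {
    "a": [[0.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0], [1.0, 1.0]],
    "c": [[10.0, 0.0], [10.0, 1.0]],
    "d": [[11.0, 0.0], [11.0, 1.0]],
}

one_vs_all = binary_training_sets(data)
one_vs_near = binary_training_sets(data, k=1)

# Total number of training examples fed to the binary trainers.
total_all = sum(len(X) for X, _ in one_vs_all.values())
total_near = sum(len(X) for X, _ in one_vs_near.values())
print(total_all, total_near)
```

In this toy setting every one-vs-all task sees all eight examples, while each restricted task sees only four, which illustrates how limiting the negatives can reduce the work done during classifier construction. Whether such a restriction preserves classification quality is an empirical question that the sketch does not address.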



Author information

Authors and Affiliations

Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland
Jerzy Balicki, Julian Szymański, Marcin Kępa, Karol Draszawka & Waldemar Korłub

Corresponding author

Correspondence to Jerzy Balicki.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafal Scherer
AGH University of Science and Technology, Krakow, Poland
Ryszard Tadeusiewicz
University of California, Berkeley, California, USA
Lotfi A. Zadeh
University of Louisville, Louisville, Kentucky, USA
Jacek M. Zurada

About this paper

Cite this paper

Balicki, J., Szymański, J., Kępa, M., Draszawka, K., Korłub, W. (2015). Improving Effectiveness of SVM Classifier for Large Scale Data. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2015. Lecture Notes in Computer Science, vol 9119. Springer, Cham. https://doi.org/10.1007/978-3-319-19324-3_60

DOI: https://doi.org/10.1007/978-3-319-19324-3_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19323-6
Online ISBN: 978-3-319-19324-3
eBook Packages: Computer Science, Computer Science (R0)
