Improving Effectiveness of SVM Classifier for Large Scale Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9119)

Abstract

The paper presents our approach to implementing SVM in a parallel environment. We describe how the learning and prediction phases of classification were parallelised. We also propose a method for limiting the number of computations needed during classifier construction. Our method, named one-vs-near, extends the typical one-vs-all approach, which lets binary classifiers handle multiclass problems. We run experiments on the scalability and quality of the implementation. The results show that the proposed solution makes it possible to scale up SVM while giving results of reasonable quality. The proposed one-vs-near method significantly improves the effectiveness of classifier construction.
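The abstract does not define one-vs-near, so the following Python sketch only illustrates the general idea it names: one-vs-all builds one binary problem per class using every other class as negatives, and a restricted variant could shrink that problem by keeping only "near" classes as negatives. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names (`centroids`, `near_classes`, `binary_problem`), the use of class centroids with Euclidean distance to decide nearness, the parameter `k`, and the toy data.

```python
import math
from collections import defaultdict


def centroids(samples):
    """Mean feature vector per class; samples is a list of (vector, label)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def near_classes(cents, label, k):
    """The k classes whose centroids lie closest to the centroid of `label`.

    Assumption: nearness is Euclidean distance between class centroids.
    """
    others = [c for c in cents if c != label]
    others.sort(key=lambda c: math.dist(cents[label], cents[c]))
    return others[:k]


def binary_problem(samples, label, negatives=None):
    """Build a +1/-1 training set for `label`.

    negatives=None  -> one-vs-all: every other class supplies negatives.
    negatives=[...] -> restricted variant: only the listed classes do.
    """
    problem = []
    for vec, lab in samples:
        if lab == label:
            problem.append((vec, +1))
        elif negatives is None or lab in negatives:
            problem.append((vec, -1))
    return problem


if __name__ == "__main__":
    # Hypothetical toy data: four classes, two points each.
    data = [
        ([0.0, 0.0], "a"), ([0.0, 1.0], "a"),
        ([1.0, 0.0], "b"), ([1.0, 1.0], "b"),
        ([9.0, 9.0], "c"), ([9.0, 8.0], "c"),
        ([20.0, 0.0], "d"), ([21.0, 0.0], "d"),
    ]
    cents = centroids(data)
    near = near_classes(cents, "a", k=1)
    print("near classes for 'a':", near)
    print("one-vs-all examples:", len(binary_problem(data, "a")))
    print("restricted examples:", len(binary_problem(data, "a", near)))
```

Each binary problem produced this way would then be handed to an ordinary binary SVM trainer. The smaller the negative set, the less work each per-class training run has to do, which is the kind of saving the abstract attributes to one-vs-near.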

Author information

Corresponding author

Correspondence to Jerzy Balicki.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Balicki, J., Szymański, J., Kępa, M., Draszawka, K., Korłub, W. (2015). Improving Effectiveness of SVM Classifier for Large Scale Data. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2015. Lecture Notes in Computer Science, vol 9119. Springer, Cham. https://doi.org/10.1007/978-3-319-19324-3_60

  • DOI: https://doi.org/10.1007/978-3-319-19324-3_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19323-6

  • Online ISBN: 978-3-319-19324-3

  • eBook Packages: Computer Science, Computer Science (R0)
