Reduced Support Vector Machine Based on k-Mode Clustering for Classification Large Categorical Dataset

  • Conference paper
Software Engineering and Computer Systems (ICSECS 2011)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 180))

Abstract

The smooth support vector machine (SSVM) is a promising algorithm for classification problems. However, it works well only on small to moderate datasets: using SSVM with a nonlinear kernel on a large dataset runs into computational difficulties. Building on SSVM, the reduced support vector machine (RSVM) was proposed to overcome these difficulties by using a randomly selected subset of the data to obtain a nonlinear separating surface. In this paper, we propose an alternative algorithm, k-mode RSVM (KMO-RSVM), which combines RSVM with the k-mode clustering technique to handle classification problems on large categorical datasets. In our experiments, we tested the effectiveness of KMO-RSVM on four publicly available datasets. KMO-RSVM runs significantly faster than SSVM while still achieving high accuracy. Compared with RSVM, KMO-RSVM is faster, yields a smaller reduced set, and attains comparable testing accuracy.
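The abstract does not spell out the clustering step, so the following is only a minimal sketch of generic k-modes clustering, not the authors' implementation. It assumes the simple matching (Hamming) dissimilarity and per-attribute modes as cluster centres, and it assumes that the resulting modes could stand in for the randomly selected subset that RSVM uses as its reduced set. The function names `hamming`, `column_modes`, and `k_modes` are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute (column) of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, max_iter=20, seed=0):
    """Cluster categorical rows (tuples) into k groups.

    Returns (modes, labels). Assumes k does not exceed the number of
    distinct rows, since the initial modes are drawn from distinct rows.
    """
    rng = random.Random(seed)
    modes = rng.sample(sorted(set(rows)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(row, modes[j])) for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return modes, labels
```

Under this assumption, one would run `k_modes` separately on the training rows of each class and pass the returned modes to the reduced kernel in place of a random sample; the choice of `k` per class then determines the size of the reduced set.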

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Purnami, S.W., Zain, J.M., Embong, A. (2011). Reduced Support Vector Machine Based on k-Mode Clustering for Classification Large Categorical Dataset. In: Zain, J.M., Wan Mohd, W.M.b., El-Qawasmeh, E. (eds) Software Engineering and Computer Systems. ICSECS 2011. Communications in Computer and Information Science, vol 180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22191-0_61

  • DOI: https://doi.org/10.1007/978-3-642-22191-0_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22190-3

  • Online ISBN: 978-3-642-22191-0

  • eBook Packages: Computer Science, Computer Science (R0)
