
Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)


Abstract

Text categorization (TC) remains a promising application area for linear support vector machines (SVMs). Among the many linear SVM formulations, we put forward the linear proximal SVM (PSVM), combined with the recently proposed distributional clustering (DC) of words, to realize its potential in TC. DC has been presented as an efficient alternative to the feature selection conventionally used in TC. It has been shown that DC combined with a linear SVM drastically reduces the dimensionality of text documents without compromising classification performance. In this paper we use linear PSVM and its extension, fuzzy PSVM (FPSVM), together with DC for TC. We present experimental results comparing PSVM and FPSVM with the linear solvers SVMlight and SVMlin on the popular WebKB text corpus. Through numerous experiments on subsets of WebKB, we show the merits of PSVM and FPSVM over other linear SVMs.
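For readers unfamiliar with proximal SVMs, the following is a minimal sketch of a linear PSVM trained through its standard closed-form solution, a regularized linear system in the weight vector and bias. It is not the authors' code. The function names `train_psvm` and `predict`, the parameter `nu`, and the toy data are illustrative assumptions. The optional per-sample weights `s` are a simple weighted least-squares stand-in for fuzzy memberships and may differ from the FPSVM formulation the paper uses. In a TC setting, each row of `A` would be a document vector, for example counts over word clusters produced by DC, which is also an assumption here.

```python
import numpy as np

def train_psvm(A, d, nu=1.0, s=None):
    """Fit a linear proximal SVM.

    A  : (m, n) array, one row per document (hypothetical feature vectors)
    d  : (m,) array of labels in {-1, +1}
    nu : trade-off between fitting error and regularization
    s  : optional (m,) per-sample weights (assumed stand-in for fuzzy
         memberships; s=None gives the plain PSVM)
    Returns (w, gamma) for the decision rule sign(x @ w - gamma).
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    # Augment the data with a -1 column so the bias gamma is solved jointly.
    E = np.hstack([A, -np.ones((m, 1))])
    s2 = np.ones(m) if s is None else np.asarray(s, dtype=float) ** 2
    # Solve (I / nu + E^T S^2 E) z = E^T S^2 d, where z = [w; gamma].
    lhs = np.eye(n + 1) / nu + E.T @ (s2[:, None] * E)
    rhs = E.T @ (s2 * d)
    z = np.linalg.solve(lhs, rhs)
    return z[:-1], z[-1]

def predict(A, w, gamma):
    """Assign +1 or -1 by the side of the hyperplane x @ w = gamma."""
    return np.where(np.asarray(A, dtype=float) @ w - gamma >= 0, 1, -1)

# Toy, linearly separable data (illustrative only).
A_toy = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
d_toy = np.array([1, 1, -1, -1])
w_toy, gamma_toy = train_psvm(A_toy, d_toy, nu=1.0)
```

Because training reduces to one linear solve of size n + 1, the cost depends mainly on the number of features, which is one reason a dimensionality reduction step such as DC pairs naturally with this kind of classifier.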




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, M.A., Gopal, M. (2009). Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_8


  • DOI: https://doi.org/10.1007/978-3-642-01307-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
