Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words

Kumar, Mani Arun; Gopal, Madan

doi:10.1007/978-3-642-01307-2_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Text categorization (TC) remains a promising application area for linear support vector machines (SVMs). Among the numerous linear SVM formulations, we bring forward the linear proximal SVM (PSVM), together with the recently proposed distributional clustering (DC) of words, to realize its potential in the TC realm. DC has been presented as an efficient alternative to the feature selection conventionally used in TC. It has been shown that DC, together with a linear SVM, drastically reduces the dimensionality of text documents without any compromise in classification performance. In this paper we use linear PSVM and its extension, fuzzy PSVM (FPSVM), together with DC for TC. We present experimental results comparing PSVM/FPSVM with linear SVM^light and SVM_lin on the popular WebKB text corpus. Through numerous experiments on subsets of WebKB, we reveal the merits of PSVM and FPSVM over other linear SVMs.
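For orientation, the following is a minimal sketch of how a linear PSVM can be trained, not the authors' implementation. It assumes the standard closed-form solution of Fung and Mangasarian's formulation, in which the weight vector w and offset gamma solve the linear system (I/nu + E'E) [w; gamma] = E'd with E = [A, -e]. The optional per-sample weights `s` stand in for fuzzy memberships as a simple weighted least-squares variant; the exact FPSVM formulation used in the paper may differ. The function names, the parameter `nu`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def train_psvm(A, d, nu=1.0, s=None):
    """Fit a linear proximal SVM and return (w, gamma).

    A  : (m, n) feature matrix, one document per row.
    d  : (m,) class labels in {-1, +1}.
    nu : trade-off between fitting error and regularization.
    s  : optional (m,) per-sample weights (assumed fuzzy memberships);
         None means all ones, i.e. plain PSVM.
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    s = np.ones(m) if s is None else np.asarray(s, dtype=float)

    # E = [A, -e] so that E @ [w; gamma] = A @ w - gamma.
    E = np.hstack([A, -np.ones((m, 1))])

    # Solve (I/nu + E' S E) z = E' S d, where S = diag(s).
    lhs = np.eye(n + 1) / nu + E.T @ (s[:, None] * E)
    rhs = E.T @ (s * d)
    z = np.linalg.solve(lhs, rhs)
    return z[:n], z[n]

def predict(A, w, gamma):
    """Assign +1 where x'w - gamma >= 0, otherwise -1."""
    scores = np.asarray(A, dtype=float) @ w - gamma
    return np.where(scores >= 0, 1, -1)

# Toy data (hypothetical): two points per class in two dimensions.
A = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
d = np.array([1, 1, -1, -1])

w, gamma = train_psvm(A, d, nu=10.0)
labels = predict(A, w, gamma)
```

Training reduces to solving a single (n + 1)-dimensional linear system, so the cost depends mainly on the number of features n. That is one reason reducing dimensionality with word clusters pairs naturally with this kind of classifier.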

Author information

Authors and Affiliations

Control Group, Department of Electrical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India
Mani Arun Kumar & Madan Gopal

Editor information

Editors and Affiliations

Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, 12000, Bangkadi, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 10330, Bangkok, Thailand
Boonserm Kijsirikul
Faculty of Science & Engineering, York University, 355 Lumbers Building, 4700 Keele Street, M3J 1P3, Toronto, Ontario, Canada
Nick Cercone
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, 923-1292, Ishikawa, Japan
Tu-Bao Ho

About this paper

Cite this paper

Kumar, M.A., Gopal, M. (2009). Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science(), vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_8

DOI: https://doi.org/10.1007/978-3-642-01307-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)
