Abstract
Text categorization (TC) remains a promising application area for linear support vector machines (SVMs). Among the many linear SVM formulations, we put forward the linear proximal SVM (PSVM), combined with the recently proposed distributional clustering (DC) of words, to realize its potential in TC. DC has been presented as an efficient alternative to the feature selection conventionally used in TC. It has been shown that DC together with a linear SVM drastically reduces the dimensionality of text documents without compromising classification performance. In this paper we use linear PSVM and its extension, fuzzy PSVM (FPSVM), together with DC for TC. We present experimental results comparing PSVM and FPSVM with the linear SVMs SVMlight and SVMlin on the popular WebKB text corpus. Through numerous experiments on subsets of WebKB, we demonstrate the merits of PSVM and FPSVM over the other linear SVMs.
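As a rough illustration of the kind of classifier the abstract refers to, the sketch below fits a linear proximal SVM by solving one regularized linear system, following the standard formulation of Fung and Mangasarian. It is not the authors' code. The function names `train_psvm` and `predict`, the parameter `nu`, and the optional per-sample weights `s` (a simple stand-in for fuzzy memberships, which may differ from the FPSVM used in the paper) are assumptions made for this example, and the toy data is invented.

```python
import numpy as np


def train_psvm(A, d, nu=1.0, s=None):
    """Fit a linear proximal SVM and return (w, gamma).

    A  : (m, n) feature matrix, one document per row.
    d  : (m,) labels in {-1, +1}.
    nu : trade-off between training error and regularization.
    s  : optional (m,) nonnegative weights on the squared errors
         (hypothetical stand-in for fuzzy memberships; None means all ones).
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    s = np.ones(m) if s is None else np.asarray(s, dtype=float)

    # E = [A, -e], so that E @ [w; gamma] = A @ w - gamma.
    E = np.hstack([A, -np.ones((m, 1))])

    # Minimizing nu/2 * sum(s_i * xi_i^2) + 1/2 * (w'w + gamma^2)
    # subject to d_i * (a_i'w - gamma) + xi_i = 1 gives the linear system
    # (I/nu + E' S E) z = E' S d, with S = diag(s) and z = [w; gamma].
    lhs = np.eye(n + 1) / nu + E.T @ (s[:, None] * E)
    rhs = E.T @ (s * d)
    z = np.linalg.solve(lhs, rhs)
    return z[:n], z[n]


def predict(A, w, gamma):
    """Assign +1 or -1 according to the sign of a'w - gamma."""
    scores = np.asarray(A, dtype=float) @ w - gamma
    return np.where(scores >= 0, 1, -1)


# Invented toy data: two points per class, symmetric about the origin.
A_toy = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
d_toy = [1, 1, -1, -1]
w_toy, gamma_toy = train_psvm(A_toy, d_toy, nu=1.0)
labels_toy = predict(A_toy, w_toy, gamma_toy)
```

Because training reduces to one (n + 1)-dimensional linear solve, the cost depends on the number of features n, which is where a dimensionality reduction step such as DC would help in practice.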
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, M.A., Gopal, M. (2009). Text Categorization Using Fuzzy Proximal SVM and Distributional Clustering of Words. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science(), vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_8
DOI: https://doi.org/10.1007/978-3-642-01307-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science (R0)