Abstract
In this paper we study the benefit of using an overlapping clustering approach, rather than traditional hard clustering, to reduce the dimensionality of the description space for document classification.
The Distributional Divisive Overlapping Clustering (DDOC) method is briefly presented and compared to Agglomerative Distributional Clustering (ADC) [2] and Information-Theoretical Divisive Clustering (ITDC) [3] on two corpora, Reuters-21578 and 20Newsgroup.
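To make the contrast between hard and overlapping word clustering concrete, the following minimal sketch shows how a document's word counts could be projected onto a reduced space of word clusters. It is not the DDOC algorithm: the function `cluster_features`, the toy cluster assignments, and the choice to add a word's full count to every cluster it belongs to are all assumptions made for illustration.

```python
from collections import Counter

def cluster_features(doc_words, word_to_clusters, n_clusters):
    """Project a bag of words onto cluster counts (hypothetical helper).

    word_to_clusters maps each word to the list of cluster indices it
    belongs to: one index for hard clustering, possibly several for
    overlapping clustering. Words with no assignment are ignored.
    """
    word_counts = Counter(doc_words)
    features = [0] * n_clusters
    for word, count in word_counts.items():
        # Assumption: a word contributes its full count to each of its clusters.
        for cluster in word_to_clusters.get(word, ()):
            features[cluster] += count
    return features

# Toy assignments (made up): cluster 0 ~ finance, cluster 1 ~ nature.
hard = {"bank": [0], "loan": [0], "river": [1], "water": [1]}
overlapping = {"bank": [0, 1], "loan": [0], "river": [1], "water": [1]}

doc = ["bank", "river", "water", "bank"]
print(cluster_features(doc, hard, 2))         # [2, 2]
print(cluster_features(doc, overlapping, 2))  # [2, 4]
```

In this toy case the ambiguous word "bank" reinforces both clusters under the overlapping assignment, whereas the hard assignment forces it into a single one. Either way the document is described by two cluster features instead of one feature per word.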
References
1. Cleuziou, G., Martin, L., Clavier, V., Vrain, C.: PoBOC: an Overlapping Clustering Algorithm, Application to Rule-Based Classification and Textual Data. In: Proceedings of the 16th European Conference on Artificial Intelligence ECAI, Valencia, Spain, August 22-27 (2004) (to appear)
2. Baker, L.D., McCallum, A.K.: Distributional clustering of words for text classification. In: Proceedings of the 21st ACM International Conference on Research and Development in Information Retrieval, Melbourne, AU, pp. 96–103 (1998)
3. Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information theoretic feature clustering algorithm for text classification. Journal of Machine Learning Research 3, 1265–1287 (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cleuziou, G., Martin, L., Clavier, V., Vrain, C. (2004). DDOC: Overlapping Clustering of Words for Document Classification. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_17
DOI: https://doi.org/10.1007/978-3-540-30213-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive