Abstract
This paper presents a comparative analysis of K-means and particle swarm optimization (PSO) based clustering on text datasets. Dimensionality reduction techniques, namely stop-word removal, Brill's tagger algorithm, and mean TF-IDF, are applied to reduce the number of dimensions before clustering. The results indicate that PSO-based approaches find better solutions than K-means because PSO evaluates many candidate sets of cluster centroids simultaneously at any given time, whereas K-means does not.
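To illustrate the contrast drawn in the abstract, the following is a minimal sketch of PSO-based clustering in which every particle holds a complete candidate set of centroids, so many solutions are scored in each iteration. It is not the authors' implementation: the function names, the fitness measure (mean distance from each document to its nearest centroid), and the parameter values `w`, `c1`, and `c2` are all assumptions chosen for illustration.

```python
import random

def quantization_error(centroids, docs):
    """Assumed fitness: mean Euclidean distance from each document
    vector to its nearest centroid (lower is better)."""
    total = 0.0
    for d in docs:
        total += min(
            sum((a - b) ** 2 for a, b in zip(d, c)) ** 0.5 for c in centroids
        )
    return total / len(docs)

def pso_cluster(docs, k, n_particles=10, iters=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical helper: returns the best centroid set found and the
    best fitness recorded after each iteration."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each particle is one full candidate solution: k centroids,
    # initialised from randomly chosen document vectors.
    pos = [[list(rng.choice(docs)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [quantization_error(p, docs) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + personal pull + global pull.
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - pos[i][j][t])
                                    + c2 * r2 * (gbest[j][t] - pos[i][j][t]))
                    pos[i][j][t] += vel[i][j][t]
            fit = quantization_error(pos[i], docs)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
        history.append(gbest_fit)
    return gbest, history

if __name__ == "__main__":
    # Toy 2-D "document vectors" forming two obvious groups.
    toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    best, hist = pso_cluster(toy_docs, k=2, iters=20)
    print(best, hist[-1])
```

K-means, by comparison, refines a single set of `k` centroids from one starting point, which is why its result depends more heavily on initialisation. In practice the document vectors would be the reduced TF-IDF features rather than the toy points used here.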
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Killani, R., Rao, K.S., Satapathy, S.C., Pradhan, G., Chandran, K.R. (2010). Effective Document Clustering with Particle Swarm Optimization. In: Panigrahi, B.K., Das, S., Suganthan, P.N., Dash, S.S. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2010. Lecture Notes in Computer Science, vol 6466. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17563-3_73
DOI: https://doi.org/10.1007/978-3-642-17563-3_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17562-6
Online ISBN: 978-3-642-17563-3
eBook Packages: Computer Science, Computer Science (R0)