Abstract
Document clustering plays an important role in several applications. K-Medoids and CLARA are among the most notable clustering algorithms, and they and their relatives have been widely employed in clustering problems. In this paper we present a modification that improves the original K-Medoids and CLARA by changing the way they assign objects to clusters. Experimental results on various document datasets using three distance measures show that the approach substantially enhances the clustering outcomes, as demonstrated by three quality metrics: Entropy, Purity and F-Measure.
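To make two of the quality metrics named in the abstract concrete, the following is a minimal sketch of Purity and Entropy as they are commonly defined for external cluster validation. It assumes the textbook definitions (majority-label fraction for Purity, size-weighted label entropy per cluster for Entropy); the paper's exact formulas, including any normalisation, may differ. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def contingency(clusters, labels):
    """Count, for each cluster id, how many documents carry each true label."""
    table = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        table[cluster_id][label] += 1
    return table


def purity(clusters, labels):
    """Fraction of documents that belong to the majority label of their cluster.

    Assumption: the standard definition; higher is better, 1.0 is perfect.
    """
    table = contingency(clusters, labels)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the label entropy inside each cluster (base 2).

    Assumption: the standard unnormalised definition; lower is better,
    0.0 means every cluster contains a single label.
    """
    table = contingency(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((m / size) * math.log2(m / size) for m in counts.values())
        total += (size / n) * h
    return total


# Toy example (hypothetical data): six documents, two clusters, two true labels.
example_clusters = [0, 0, 0, 1, 1, 1]
example_labels = ["a", "a", "b", "b", "b", "b"]
print(purity(example_clusters, example_labels))
print(entropy(example_clusters, example_labels))
```

In the toy example, cluster 0 mixes two labels while cluster 1 is pure, so Purity falls below 1 and Entropy rises above 0; a clustering that separates the labels perfectly would score 1 and 0 respectively.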
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nguyen, P.T., Eckert, K., Ragone, A., Di Noia, T. (2017). Modification to K-Medoids and CLARA for Effective Document Clustering. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_47
DOI: https://doi.org/10.1007/978-3-319-60438-1_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60437-4
Online ISBN: 978-3-319-60438-1
eBook Packages: Computer Science (R0)