Abstract
Clustering approaches for categorical data that use rough set theory to select the clustering attribute have attracted much attention in recent years. However, they have limitations in how they select that attribute. In this paper, we analyze the limitations of three rough set based approaches, total roughness (TR), min-min roughness (MMR) and maximum dependency attribute (MDA), and propose a mean mutual information (MMI) based approach for selecting the clustering attribute. We prove that the proposed approach overcomes the limitations of the rough set based approaches. In addition, we define the concept of mean inter-class similarity to measure the accuracy of clustering attribute selection. Experimental results show that our method selects the clustering attribute with higher accuracy than the TR, MMR and MDA methods.
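The abstract does not give the MMI formula. The following is a minimal sketch that assumes MMI(a_i) is the average empirical mutual information between attribute a_i and every other attribute, and that the attribute with the largest MMI is chosen as the clustering attribute. The function names (`mutual_information`, `mean_mutual_information`, `select_clustering_attribute`) are hypothetical, and the toy table is illustrative only, not data from the paper.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, between two categorical columns."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def mean_mutual_information(columns, i):
    """Assumed MMI of attribute i: mean MI between column i and every other column."""
    m = len(columns)
    total = sum(mutual_information(columns[i], columns[j]) for j in range(m) if j != i)
    return total / (m - 1)


def select_clustering_attribute(table):
    """Return (index of the attribute with maximum MMI, list of all MMI scores).

    `table` is a list of rows (tuples of categorical values) with at least two columns.
    Ties are broken in favour of the lowest attribute index.
    """
    columns = list(zip(*table))
    scores = [mean_mutual_information(columns, i) for i in range(len(columns))]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Toy table: attribute 0 determines both attributes 1 and 2, which are independent.
table = [("1", "p", "u"), ("2", "p", "v"), ("3", "q", "u"), ("4", "q", "v")]
best, scores = select_clustering_attribute(table)
print(best, scores)
```

In the toy table, attribute 0 shares log 2 nats of information with each of the other two attributes, while attributes 1 and 2 share none with each other, so attribute 0 has the highest mean and is selected. The logarithm base does not affect which attribute is chosen.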
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qin, H., Ma, X., Mohamad Zain, J., Sulaiman, N., Herawan, T. (2011). A Mean Mutual Information Based Approach for Selecting Clustering Attribute. In: Zain, J.M., Wan Mohd, W.M.b., El-Qawasmeh, E. (eds) Software Engineering and Computer Systems. ICSECS 2011. Communications in Computer and Information Science, vol 180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22191-0_1
DOI: https://doi.org/10.1007/978-3-642-22191-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22190-3
Online ISBN: 978-3-642-22191-0
eBook Packages: Computer Science, Computer Science (R0)