Abstract
Clustering categorical data poses a difficult challenge because there is no inherent distance measure between data values. One approach is to introduce a series of clustering attributes in the categorical data. Following this approach, the Maximum Total Attribute Relative (MTAR) technique, which is based on the attribute relative of soft-set theory, has been proposed and shown to have a better execution time than other equivalent techniques that use the same approach. In this paper, a cluster validity analysis of the technique is explained and discussed. In this analysis, the validity of the clusters produced by the MTAR technique is evaluated with the entropy measure on two standard datasets, Soybean (Small) and Zoo, from the University of California at Irvine (UCI) repository. The results show that the clusters produced by the MTAR technique have better entropy, improving cluster validity by up to 33%.
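The abstract does not give the exact entropy formula used, so the following is only a minimal sketch of one common definition: the Shannon entropy of the class labels within each cluster, averaged with weights proportional to cluster size. The function names, the base-2 logarithm, and the toy data are assumptions made for illustration and are not taken from the paper. Under this definition, a lower score indicates purer clusters.

```python
from collections import Counter
from math import log2


def cluster_entropy(class_labels):
    """Shannon entropy (in bits) of the class distribution inside one cluster."""
    n = len(class_labels)
    counts = Counter(class_labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def overall_entropy(cluster_ids, class_labels):
    """Size-weighted mean of per-cluster entropies; 0 means every cluster is pure."""
    clusters = {}
    for cid, label in zip(cluster_ids, class_labels):
        clusters.setdefault(cid, []).append(label)
    total = len(class_labels)
    return sum(len(members) / total * cluster_entropy(members)
               for members in clusters.values())


# Hypothetical toy data (not from the paper): two clusters of four objects each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
class_labels = ["a", "a", "a", "b", "b", "b", "c", "c"]
score = overall_entropy(cluster_ids, class_labels)
print(round(score, 4))
```

Normalized variants of this measure also exist, for example dividing each cluster's entropy by the logarithm of the number of classes, so scores computed this way are not necessarily comparable to the figures reported in the paper.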
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mamat, R., Noor, A.S.M., Herawan, T., Deris, M.M. (2017). Cluster Validation Analysis on Attribute Relative of Soft-Set Theory. In: Herawan, T., Ghazali, R., Nawi, N.M., Deris, M.M. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2016. Advances in Intelligent Systems and Computing, vol 549. Springer, Cham. https://doi.org/10.1007/978-3-319-51281-5_1
DOI: https://doi.org/10.1007/978-3-319-51281-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51279-2
Online ISBN: 978-3-319-51281-5
eBook Packages: Engineering, Engineering (R0)