Abstract
Cluster analysis has become an indispensable tool for obtaining and analyzing meaningful groups in both numerical and categorical data. Algorithms such as fuzzy k-Modes, New fuzzy k-Modes, k-AMH, and the extended k-AMH algorithms Nk-AMH I, II, and III are commonly used to improve the clustering of categorical data. However, the performance of these algorithms is measured and evaluated by the average accuracy score over 100-run experiments, which requires labeled data. Thus, the performance of the algorithms on unlabeled data cannot be measured explicitly. This paper extends the k-AMH model with complementary optimization procedures, known as Ck-AMH I, II, III, and IV, to obtain final and optimal clustering results. In the experiments conducted, the complementary procedures produced optimal clustering results on five categorical datasets: Soybean, Zoo, Hepatitis, Voting, and Breast. The optimal accuracy scores obtained were marginally lower than, and in some cases identical to, the maximum accuracy scores obtained from the 100-run experiments. Consequently, with the complementary procedures, these clustering algorithms can be further developed as workbench clustering tools for both unlabeled categorical and unlabeled numerical data.
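To make the dependence on labels concrete, the following minimal Python sketch scores several clustering runs against known class labels and reports the average and maximum accuracy. The abstract does not define the accuracy measure, so the majority-class matching used here is an assumption, and the function names and toy data are hypothetical rather than taken from the paper.

```python
from collections import Counter
from statistics import mean


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of objects that belong to the majority class of their cluster.

    Assumption: accuracy is computed by majority-class matching; the paper's
    exact definition is not given in the abstract.
    """
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # For each cluster, count the objects in its most frequent class.
    matched = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return matched / len(true_labels)


def summarize_runs(true_labels, runs):
    """Average and maximum accuracy over repeated runs (e.g. 100 of them)."""
    scores = [clustering_accuracy(true_labels, run) for run in runs]
    return {"mean": mean(scores), "max": max(scores)}


# Hypothetical toy data: six labeled objects and three clustering runs.
labels = ["a", "a", "a", "b", "b", "b"]
runs = [
    [0, 0, 0, 1, 1, 1],  # every object grouped with its own class
    [0, 0, 1, 1, 1, 1],  # one object placed in the wrong cluster
    [0, 1, 0, 1, 0, 1],  # clusters mix both classes
]
summary = summarize_runs(labels, runs)
print(summary)
```

Every step of this evaluation needs `true_labels`, which is why a score of this kind cannot be computed for unlabeled data.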
Acknowledgements
This research was supported by the Fundamental Research Grant Scheme, Ministry of Higher Education, Malaysia (Reference No.: 600-RMI/FRGS 5/3 (37/2104)). We would like to thank IRMI and UiTM for their support for this research. We also extend our gratitude to those who have contributed toward the completion of this paper, including our RA Nur Amiera Abdul Rahman.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Seman, A., Sapawi, A.M. (2020). A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_30
DOI: https://doi.org/10.1007/978-3-030-33585-4_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33584-7
Online ISBN: 978-3-030-33585-4
eBook Packages: Intelligent Technologies and Robotics (R0)