A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data

Seman, Ali; Sapawi, Azizian Mohd

doi:10.1007/978-3-030-33585-4_30

Ali Seman & Azizian Mohd Sapawi

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1072)

Included in the following conference series:

International Conference on Intelligent Computing & Optimization

Abstract

Clustering analysis has become an indispensable tool for obtaining and analyzing meaningful groups in both numerical and categorical data. Algorithms such as fuzzy k-Modes, New fuzzy k-Modes, k-AMH, and the extended k-AMH algorithms Nk-AMH I, II, and III have been employed to improve the clustering of categorical data. However, the performance of these algorithms is measured and evaluated using average accuracy scores taken from 100-run experiments, and computing these scores requires labeled data. Thus, the performance of the algorithms on unlabeled data cannot be measured explicitly. This paper extends the k-AMH model with complementary optimization procedures, named Ck-AMH I, II, III, and IV, to obtain final and optimal clustering results. In experiments on five categorical datasets (Soybean, Zoo, Hepatitis, Voting, and Breast), the complementary procedures produced optimal clustering results. The optimal accuracy scores obtained were marginally lower than, and in some cases identical to, the maximum accuracy scores obtained from the 100-run experiments. Consequently, with the complementary procedures, these clustering algorithms can be further developed into workbench clustering tools for both unlabeled categorical and unlabeled numerical data.
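To illustrate why the 100-run evaluation depends on labels, the sketch below scores each run against known class labels and then reports the average and the maximum over the runs. It assumes the majority-class accuracy measure common in the k-Modes literature, r = (a_1 + ... + a_k) / n, where a_l is the number of objects in cluster l that belong to that cluster's dominant class; the abstract does not state the exact formula the authors use. The function names `clustering_accuracy` and `summarize_runs` are hypothetical and are not the Ck-AMH procedures themselves.

```python
from collections import Counter
from statistics import mean


def clustering_accuracy(labels_pred, labels_true):
    """Assumed measure: fraction of objects that fall in the dominant
    true class of the cluster they were assigned to (needs labels)."""
    if len(labels_pred) != len(labels_true):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Count true classes inside each predicted cluster.
    by_cluster = {}
    for cluster, true_class in zip(labels_pred, labels_true):
        by_cluster.setdefault(cluster, Counter())[true_class] += 1
    # a_l = size of the dominant true class within cluster l.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / n


def summarize_runs(runs, labels_true):
    """Hypothetical helper: average and maximum accuracy over repeated runs,
    mirroring the average/maximum scores reported for 100-run experiments."""
    scores = [clustering_accuracy(pred, labels_true) for pred in runs]
    return mean(scores), max(scores)


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    runs = [[0, 0, 1, 1], [0, 1, 1, 1]]  # two toy runs with different results
    avg, best = summarize_runs(runs, truth)
    print(avg, best)  # 0.875 1.0
```

Both statistics need `labels_true`, so neither can be computed on unlabeled data; this is the gap the complementary procedures are meant to address by producing a single final clustering result.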

Acknowledgements

This research was supported by the Fundamental Research Grant Scheme, Ministry of Higher Education, Malaysia (Reference No.: 600-RMI/FRGS 5/3 (37/2104)). We would like to thank IRMI and UiTM for their support of this research. We also extend our gratitude to those who contributed toward the completion of this paper, including our research assistant, Nur Amiera Abdul Rahman.

Author information

Authors and Affiliations

Center for Computer Science Studies, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), 40450, Shah Alam, Selangor, Malaysia
Ali Seman & Azizian Mohd Sapawi

Corresponding author

Correspondence to Ali Seman.

Editor information

Editors and Affiliations

Department of Fundamental and Applied Sciences, Universiti Teknologi Petronas, Tronoh, Perak, Malaysia
Pandian Vasant
Computer Science, FEI, VSB-TU Ostrava, Ostrava, Czech Republic
Ivan Zelinka
Faculty of Engineering Management, Poznan University of Technology, Poznan, Poland
Gerhard-Wilhelm Weber

About this paper

Cite this paper

Seman, A., Sapawi, A.M. (2020). A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_30

DOI: https://doi.org/10.1007/978-3-030-33585-4_30
Published: 27 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33584-7
Online ISBN: 978-3-030-33585-4
eBook Packages: Intelligent Technologies and Robotics (R0)
