A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data

  • Conference paper
Intelligent Computing and Optimization (ICO 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1072)

Abstract

Clustering analysis has become an indispensable tool for obtaining and analyzing meaningful groups in both numerical and categorical clustering problems. Algorithms such as fuzzy k-Modes, New fuzzy k-Modes, k-AMH, and the extended k-AMH algorithms Nk-AMH I, II, and III are commonly used to improve the clustering of categorical data. However, the performance of these algorithms is measured and evaluated by the average accuracy score over 100-run experiments, which requires labeled data. Thus, their performance on unlabeled data cannot be measured explicitly. This paper extends the k-AMH model with complementary optimization procedures, known as Ck-AMH I, II, III, and IV, to obtain final and optimal clustering results. In the experiments conducted, the complementary procedures produced optimal clustering results when tested on five categorical datasets: Soybean, Zoo, Hepatitis, Voting, and Breast. The optimal accuracy scores obtained were marginally lower than, and in some cases identical to, the maximum accuracy scores obtained from the 100-run experiments. Consequently, with the complementary procedures, these clustering algorithms can be further developed as workbench clustering tools for clustering both unlabeled categorical and unlabeled numerical data.
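To make the labeled-versus-unlabeled distinction concrete, the sketch below contrasts a label-based score with a label-free way of choosing one final run out of many. It is not the Ck-AMH procedure, which the abstract does not specify. It assumes a plain hard k-modes with simple matching dissimilarity, a purity-style accuracy, and selection of the run with the lowest total cost. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(data, k, seed, max_iter=20):
    """Plain hard k-modes (an assumption, not k-AMH). Returns (labels, cost)."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # random objects serve as the initial modes
    labels = None
    for _ in range(max_iter):
        # Assign every object to its nearest mode.
        new_labels = [min(range(k), key=lambda j: mismatch(row, modes[j]))
                      for row in data]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update each mode to the most frequent value per attribute.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    cost = sum(mismatch(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, cost

def purity_accuracy(labels, truth):
    """Label-based score: share of objects in their cluster's majority class."""
    total = 0
    for j in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == j]
        total += Counter(classes).most_common(1)[0][1]
    return total / len(labels)

def best_run(data, k, n_runs=100):
    """Pick one final run without labels: the lowest total cost wins."""
    runs = [k_modes(data, k, seed) for seed in range(n_runs)]
    return min(runs, key=lambda run: run[1])

# Hypothetical toy data: two groups of three categorical objects.
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
truth = [0, 0, 0, 1, 1, 1]

labels, cost = best_run(data, k=2, n_runs=20)
print("final cost:", cost)
print("accuracy (needs labels):", purity_accuracy(labels, truth))
```

Only `purity_accuracy` needs the `truth` labels. `best_run` selects its result from the clustering cost alone, which is the property a final analysis of unlabeled data depends on.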

Acknowledgements

This research was supported by the Fundamental Research Grant Scheme, Ministry of Higher Education, Malaysia (Reference No.: 600-RMI/FRGS 5/3 (37/2104)). We would like to thank IRMI and UiTM for their support for this research. We also extend our gratitude to those who have contributed toward the completion of this paper, including our RA Nur Amiera Abdul Rahman.

Author information

Corresponding author

Correspondence to Ali Seman.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Seman, A., Sapawi, A.M. (2020). A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_30
