ABSTRACT
In this paper, we propose a Two-Step Clustering Model for determining the number of clusters for the K-Means algorithm. The Hybrid Model addresses a weakness of K-Means, particularly for general users who must choose the number of clusters before performing cluster analysis. In our experiments, we used 10 datasets from the UCI Machine Learning Repository. For feature selection we compared three configurations: the first used Best First search with Correlation-Based Feature Subset Selection; the second used Ranker with Principal Component Analysis; and the third used Best First search with the Wrapper Subset Evaluator and a Naïve Bayes classifier. The baseline model used no search method and no feature selection. We also compared three algorithms for estimating the value of k: Expectation Maximization (EM), Cascade K-Means, and Canopy clustering. We then evaluated the performance of the Hybrid Model, comparing the clustering criterion by the Sum of Squared Errors. The best Hybrid Model uses Ranker for search, Principal Component Analysis as the evaluator, Expectation Maximization to find the number of clusters, and Simple K-Means for the cluster analysis. Our experimental results show that this configuration achieves the highest performance, yielding the lowest Sum of Squared Errors among the configurations evaluated.
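To make the pipeline concrete, the following is a minimal sketch using scikit-learn as a stand-in for the Weka-style components described above (Ranker + PCA feature selection, EM to estimate the number of clusters, Simple K-Means for the final cluster analysis, Sum of Squared Errors as the criterion). The dataset (Iris), the 95% variance threshold for PCA, and the use of BIC to select k are illustrative assumptions, not the paper's exact procedure; Weka's EM selects k via cross-validated log-likelihood.

    # Hypothetical sketch of the Hybrid Model pipeline (not the paper's exact setup)
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture
    from sklearn.cluster import KMeans

    X = load_iris().data  # any UCI-style numeric dataset

    # Step 1: feature selection/transformation with PCA (the "Ranker + PCA" step);
    # 0.95 keeps enough components to explain 95% of the variance (assumed threshold).
    X_reduced = PCA(n_components=0.95).fit_transform(X)

    # Step 2: estimate the number of clusters k with EM (Gaussian mixtures).
    # BIC is used here as a simple substitute for Weka's cross-validated likelihood.
    candidate_ks = range(1, 11)
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X_reduced).bic(X_reduced)
            for k in candidate_ks]
    best_k = candidate_ks[int(np.argmin(bics))]

    # Step 3: run K-Means with the estimated k; inertia_ is the Sum of Squared Errors.
    kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_reduced)
    print("estimated k =", best_k, "| Sum of Squared Errors =", kmeans.inertia_)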