DOI: 10.1145/3033288.3033347

Two Step Clustering Model for K-Means Algorithm

Published: 17 December 2016

Abstract

In this paper, we propose a Two Step Clustering Model for determining the number of clusters for the K-Means algorithm. This Hybrid Model addresses a key weakness of K-Means: general users must choose the number of clusters before they can run a cluster analysis. In our experiments, we used 10 datasets from the UCI Machine Learning Repository and evaluated three feature selection pipelines: the first used Best First search with Correlation-Based Feature Subset Selection, the second used Ranker with Principal Component Analysis, and the third used Best First search with the Wrapper Subset Evaluator and a Naïve Bayes classifier; the baseline model used neither a search method nor feature selection. We then compared the performance of three algorithms for finding the value of k: Expectation Maximization (EM), Cascade K-Means, and Canopy. Finally, we evaluated the Hybrid Model and compared the clustering criterion using the Sum of Squared Errors (SSE). The best configuration of our Hybrid Model uses Ranker as the search method, Principal Component Analysis as the evaluator, Expectation Maximization to find the number of clusters, and simple K-Means for the cluster analysis. Our experimental results show that this configuration achieves the highest performance among the alternatives we tested, yielding the lowest Sum of Squared Errors.
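
As a concrete illustration, the following is a minimal sketch of the two-step pipeline in Python with scikit-learn. It is an approximation, not the authors' exact setup: the abstract's terminology (Ranker, Wrapper Subset Evaluator, Cascade K-Means, Canopy, "simple K-mean") suggests the experiments were run in Weka, and the abstract does not say how EM selects the number of clusters, so this sketch assumes PCA for the feature-reduction step and picks k by the lowest BIC over a range of Gaussian-mixture fits. The Sum of Squared Errors is the total squared distance from each point to its assigned centroid, which scikit-learn exposes as the K-Means inertia_ attribute.

import numpy as np
from sklearn.datasets import load_iris          # stand-in for one of the UCI datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture     # EM for Gaussian mixtures
from sklearn.cluster import KMeans

# Step 0: load and standardize a dataset (Iris is in the UCI repository).
X = StandardScaler().fit_transform(load_iris().data)

# Step 1a: feature selection/reduction with PCA, keeping 95% of the variance.
X_red = PCA(n_components=0.95).fit_transform(X)

# Step 1b: estimate the number of clusters k with EM, keeping the candidate
# k whose Gaussian mixture has the lowest BIC (an assumed selection rule).
candidate_ks = range(2, 11)
bics = [GaussianMixture(n_components=k, random_state=0).fit(X_red).bic(X_red)
        for k in candidate_ks]
best_k = candidate_ks[int(np.argmin(bics))]

# Step 2: run simple K-Means with the k found by EM and report the SSE,
# i.e. the sum of squared distances to the assigned centroids (inertia_).
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_red)
print(f"estimated k = {best_k}, SSE = {kmeans.inertia_:.2f}")

Note that SSE decreases monotonically as k grows, so a comparison like the one in the abstract is only meaningful at the k each competing method actually selected, not across arbitrary values of k.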

Cited By

  • (2022) Visual Tool for Stimulating Employee Intelligent Attitude. Education, Research and Business Technologies, pp. 383-395. DOI: 10.1007/978-981-16-8866-9_32. Online publication date: 16-Apr-2022.
  • (2021) US House Price Prediction Using Two-Stage k-Means Clustering. The Journal of Korean Institute of Information Technology, 19(5), pp. 7-17. DOI: 10.14801/jkiit.2021.19.5.7. Online publication date: 31-May-2021.
  • (2020) Identification of Gene of Melanoma Skin Cancer Using Clustering Algorithms. International Journal of Data Science, 1(1), pp. 51-56. DOI: 10.18517/ijods.1.1.51-56.2020. Online publication date: 11-May-2020.
  • (2019) ACTL: Adaptive Codebook Transfer Learning for Cross-Domain Recommendation. IEEE Access, 7, pp. 19539-19549. DOI: 10.1109/ACCESS.2019.2896881. Online publication date: 2019.


Published In

ICNCC '16: Proceedings of the Fifth International Conference on Network, Communication and Computing
December 2016
343 pages
ISBN: 9781450347938
DOI: 10.1145/3033288

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Feature Selection
  2. Hybrid Model
  3. K-Means Algorithm
  4. Unsupervised learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICNCC '16

