DOI: 10.1145/3033288.3033347

research-article

Two Step Clustering Model for K-Means Algorithm

Published: 17 December 2016

ABSTRACT

In this paper, we propose a Two-Step Clustering Model for finding the number of clusters for the K-Means algorithm. This hybrid model addresses a well-known weakness of K-Means: general users must specify the number of clusters before they can perform a cluster analysis. In our experiments, we used 10 datasets from the UCI Machine Learning Repository. For feature selection we evaluated three configurations: the first used Best First search with Correlation-Based Feature Subset Selection; the second used Ranker with Principal Component Analysis; the third used Best First search with the Wrapper Subset Evaluator, using a Naïve Bayes classifier for classification. The baseline model used no search method and no feature selection. We also compared the performance of three algorithms for finding the value of k: Expectation Maximization (EM), Cascade K-Means, and Canopy. We then evaluated the hybrid model, comparing clustering quality by the Sum of Squared Errors criterion. The best hybrid model uses Ranker as the search method with Principal Component Analysis as the evaluator, EM for finding the number of clusters, and simple K-Means for the cluster analysis. Our experimental results show that this configuration achieves the highest performance, yielding the lowest Sum of Squared Errors among the algorithms evaluated.
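The two-step pipeline described in the abstract can be sketched in a few lines. This is a hypothetical reconstruction, not the authors' implementation (the paper's experiments appear to use Weka components): it reduces features with PCA, picks the number of clusters k by fitting EM-based Gaussian mixtures and taking the lowest BIC, then runs simple K-Means with that k and reports the Sum of Squared Errors. The candidate range for k and the use of the Iris dataset are illustrative assumptions.

```python
# Hypothetical sketch of the two-step hybrid pipeline, assuming scikit-learn:
# 1) feature reduction with PCA, 2) EM (Gaussian mixture, selected by BIC)
# to estimate the number of clusters k, 3) simple K-Means with that k.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

X = load_iris().data  # illustrative stand-in for the UCI datasets

# Step 1: feature selection / reduction via Principal Component Analysis.
X_red = PCA(n_components=2, random_state=0).fit_transform(X)

# Step 2: Expectation Maximization over a candidate range of k;
# keep the k with the lowest Bayesian Information Criterion.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X_red).bic(X_red)
        for k in range(2, 8)}
k = min(bics, key=bics.get)

# Step 3: cluster with simple K-Means; inertia_ is the Sum of Squared Errors.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_red)
print(k, round(km.inertia_, 2))
```

Running EM first sidesteps asking the user for k; K-Means then only needs the estimated value, which is the division of labor the paper's hybrid model proposes.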

References

  1. Jain, A. K., Murty, M. N. and Flynn, P. J. 1999. Data clustering: a review. ACM Comput. Surv. 31, 3 (Sep. 1999), 264--323.
  2. MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Oakland, CA, USA, 1967), 281--297.
  3. Zhang, C. and Xia, S. 2009. K-Means clustering algorithm with improved initial center. In Proceedings of the Second International Workshop on Knowledge Discovery and Data Mining (WKDD 2009) (January 23-25, 2009), 790--792.
  4. Krishna, K. and Murty, M. N. 1999. Genetic K-Means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 29, 3 (Jun. 1999), 433--439.
  5. Redmond, S. J. and Heneghan, C. 2007. A method for initialising the K-Means clustering algorithm using kd-trees. Pattern Recognition Letters 28, 8 (Jun. 2007), 965--973.
  6. Ristic, D. M., Pavlovic, M. and Reljin, I. 2008. Image segmentation method based on self-organizing maps and K-Means algorithm. In Proceedings of the 9th Symposium on Neural Network Applications in Electrical Engineering (September 25-27, 2008), 27--30.
  7. Gourgaris, P. and Makris, C. 2015. A density based k-Means initialization scheme. In Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS) (Rhodes Island, Greece, 2015). ACM, 1--9.
  8. Bala, C., Basu, T. and Dasgupta, A. 2015. Automatic detection of k with suitable seed values for classic k-means algorithm using DE. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI) (August 10-13, 2015), 759--765.
  9. Laerhoven, K. 2001. Combining the self-organizing map and K-Means clustering for on-line classification of sensor data. In Artificial Neural Networks --- ICANN 2001: International Conference (Vienna, Austria, August 21-25, 2001). Springer Berlin Heidelberg, 464--469.
  10. Nasser, S., Alkhaldi, R. and Vert, G. 2006. A modified fuzzy K-Means clustering using expectation maximization. In Proceedings of the 2006 IEEE International Conference on Fuzzy Systems (2006), 231--235.
  11. Calinski, T. and Harabasz, J. 1974. A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods 3, 1 (Jan. 1974), 1--27.
  12. Kohavi, R. and John, G. H. 1997. Wrappers for feature subset selection. Artificial Intelligence 97, 1-2 (Dec. 1997), 273--324.
  13. Ding, C. and He, X. 2004. K-means clustering via principal component analysis. In Proceedings of the Twenty-First International Conference on Machine Learning (Banff, Alberta, Canada, 2004). ACM, 29.
  14. Ganda, R. and Chahar, V. 2013. A comparative study on feature selection using data mining tools. International Journal of Advanced Research in Computer Science and Software Engineering 3, 9 (Sep. 2013), 26--32.
  15. Lichman, M. 2013. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
  16. Sree, S. V., Ng, E. and Acharya, U. R. 2010. Data mining approach to evaluating the use of skin surface electropotentials for breast cancer detection. Technology in Cancer Research & Treatment 9, 1 (Feb. 2010), 95--105.
  17. Hall, M. A. and Smith, L. A. 1999. Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper. In Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference (1999). AAAI Press, 235--239.
  18. Ilin, A. and Raiko, T. 2010. Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11 (Jul. 2010), 1957--2000.
  19. Liu, H. and Yu, L. 2005. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering 17, 4 (Apr. 2005), 491--502.
  20. Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1 (1977), 1--38.
  21. McCallum, A., Nigam, K. and Ungar, L. H. 2000. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Boston, Massachusetts, USA, 2000). ACM, 169--178.
  22. Zhang, J., Dong, J. and Xiao, Y. 2015. A new method on finding optimal centers for improving K-means algorithm. In Proceedings of the 27th Chinese Control and Decision Conference (CCDC) (May 23-25, 2015), 1827--1832.
  23. Na, S., Xumin, L. and Yong, G. 2010. Research on k-means clustering algorithm: an improved k-means clustering algorithm. In Proceedings of the Third International Symposium on Intelligent Information Technology and Security Informatics (IITSI 2010) (April 2-4, 2010), 63--67.

Published in
    ICNCC '16: Proceedings of the Fifth International Conference on Network, Communication and Computing
    December 2016
    343 pages
    ISBN: 9781450347938
    DOI: 10.1145/3033288

    Copyright © 2016 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States



