ABSTRACT
In this paper, we propose a Two-Step Clustering Model for determining the number of clusters for the K-Means algorithm. The Hybrid Model addresses a weakness of K-Means, particularly for general users who must choose the number of clusters before performing cluster analysis. In our experiments, we used 10 datasets from the UCI Machine Learning Repository. For feature selection we compared three configurations: the first used Best First search with Correlation-Based Feature Subset Selection; the second used Ranker with Principal Component Analysis; and the third used Best First search with the Wrapper Subset Evaluator and a Naïve Bayes classifier. The baseline model used no search method and no feature selection. We also compared three algorithms for estimating the value of k: Expectation Maximization (EM), Cascade K-Means, and Canopy clustering. We then evaluated the performance of the Hybrid Model, comparing the clustering criterion by the Sum of Squared Errors. The best Hybrid Model uses Ranker for search, Principal Component Analysis as the evaluator, Expectation Maximization to find the number of clusters, and Simple K-Means for the cluster analysis. Our experimental results show that this configuration achieves the highest performance, yielding the lowest Sum of Squared Errors among the configurations evaluated.
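To make the pipeline concrete, the following is a minimal sketch using scikit-learn as a stand-in for the Weka-style components described above (Ranker + PCA feature selection, EM to estimate the number of clusters, Simple K-Means for the final cluster analysis, Sum of Squared Errors as the criterion). The dataset (Iris), the 95% variance threshold for PCA, and the use of BIC to select k are illustrative assumptions, not the paper's exact procedure; Weka's EM selects k via cross-validated log-likelihood.

    # Hypothetical sketch of the Hybrid Model pipeline (not the paper's exact setup)
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture
    from sklearn.cluster import KMeans

    X = load_iris().data  # any UCI-style numeric dataset

    # Step 1: feature selection/transformation with PCA (the "Ranker + PCA" step);
    # 0.95 keeps enough components to explain 95% of the variance (assumed threshold).
    X_reduced = PCA(n_components=0.95).fit_transform(X)

    # Step 2: estimate the number of clusters k with EM (Gaussian mixtures).
    # BIC is used here as a simple substitute for Weka's cross-validated likelihood.
    candidate_ks = range(1, 11)
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X_reduced).bic(X_reduced)
            for k in candidate_ks]
    best_k = candidate_ks[int(np.argmin(bics))]

    # Step 3: run K-Means with the estimated k; inertia_ is the Sum of Squared Errors.
    kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_reduced)
    print("estimated k =", best_k, "| Sum of Squared Errors =", kmeans.inertia_)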