ABSTRACT
The two key steps of the K-means algorithm, choosing the number of clusters and choosing the initial cluster centers, strongly affect its classification accuracy and efficiency, and both need further optimization. For the number of clusters, a K-means optimization method based on adaptive parallel hierarchical clustering is proposed. During the merging process of hierarchical clustering, an improved clustering-effect evaluation function selects the optimal number of clusters adaptively, and parallel computation replaces serial computation to increase speed. To select cluster centers more accurately, an optimized data-density model is proposed that makes full use of potentially related information between samples, which improves the classification accuracy of the algorithm. More importantly, it overcomes the strong subjectivity of hyperparameter selection. The improved algorithm was evaluated through ablation experiments and compared with other traditional algorithms on the Iris and Seeds datasets. The results show that the optimized algorithm accelerates computation and improves classification accuracy.
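The three-stage pipeline above (pick k while merging a hierarchy, seed centers from a density model, then run K-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not give the improved evaluation function or the optimized density model, so the sketch substitutes the Calinski-Harabasz index as the cluster-validity score and a parameter-free density proxy (inverse mean distance to all other points). Centroid linkage for the merges, the helper names (`agglomerate`, `choose_k`, `density_init`, `kmeans`) and the `k_max` bound are likewise hypothetical choices. Candidate partitions are scored concurrently to mirror the "parallel instead of serial" idea.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def calinski_harabasz(data: list[Point], clusters: list[list[Point]]) -> float:
    """Stand-in validity index: between-cluster vs. within-cluster dispersion."""
    k, n = len(clusters), len(data)
    if k < 2 or k >= n:
        return 0.0
    g = centroid(data)
    between = sum(len(c) * math.dist(centroid(c), g) ** 2 for c in clusters)
    within = 0.0
    for c in clusters:
        mu = centroid(c)
        within += sum(math.dist(p, mu) ** 2 for p in c)
    if within == 0.0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def agglomerate(data: list[Point], k_max: int) -> dict[int, list[list[Point]]]:
    """Bottom-up centroid-linkage merging; snapshot each partition with <= k_max clusters."""
    clusters = [[p] for p in data]
    partitions = {}
    while len(clusters) > 1:
        if len(clusters) <= k_max:
            partitions[len(clusters)] = [list(c) for c in clusters]
        cents = [centroid(c) for c in clusters]
        _, i, j = min(
            (math.dist(cents[i], cents[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return partitions


def choose_k(data: list[Point], k_max: int = 8) -> int:
    """Pick the k whose hierarchical partition maximizes the validity index."""
    partitions = agglomerate(data, k_max)
    ks = sorted(k for k in partitions if k >= 2)
    # Score candidate partitions concurrently rather than one after another.
    # Threads only show the structure: in CPython the GIL limits speedup for
    # pure-Python arithmetic, so ProcessPoolExecutor is the practical choice.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda k: calinski_harabasz(data, partitions[k]), ks))
    return max(zip(scores, ks))[1]


def density_init(data: list[Point], k: int) -> list[Point]:
    """Hypothetical parameter-free density seeding (not the paper's exact model)."""
    n = len(data)
    # Density proxy: inverse of the mean distance from each point to all others.
    density = [(n - 1) / (sum(math.dist(p, q) for q in data) + 1e-12) for p in data]
    centers = [data[max(range(n), key=density.__getitem__)]]
    while len(centers) < k:
        # Favor points that are dense AND far from every center chosen so far.
        idx = max(
            range(n),
            key=lambda i: density[i] * min(math.dist(data[i], c) for c in centers),
        )
        centers.append(data[idx])
    return centers


def kmeans(data: list[Point], centers: list[Point], max_iter: int = 100):
    """Standard Lloyd iterations starting from the given initial centers."""
    def nearest(p: Point) -> int:
        # Reads the current binding of `centers` from the enclosing scope.
        return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in data:
            groups[nearest(p)].append(p)
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:  # assignments stable, so centroids repeat exactly
            break
        centers = new_centers
    return centers, [nearest(p) for p in data]


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
             (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
             (0.0, 5.0), (0.3, 5.2), (0.1, 4.8)]
    k = choose_k(blobs)
    centers, labels = kmeans(blobs, density_init(blobs, k))
    print(k, labels)
```

On three well-separated groups of points like `blobs`, the stand-in index peaks at the three-cluster level of the merge tree, and the density seeding places one initial center in each group, so K-means converges after a single reassignment pass.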
K-means Optimization Method Based On Adaptive Parallel Hierarchical Clustering