Abstract
Sequence clustering is an active research topic in data mining. However, clustering quality is strongly affected by both the selection of initial centers and the mean sequences. In this study, a sequence clustering algorithm based on weighted vector identification (SCAWVI) is developed from the composite similarity of sequence elements and the weight of a sequence in its corresponding cluster. Based on the weighted sequence elements, all sequences in the sequence database are preprocessed into M-dimensional weighted vector identifications. The initial clustering centers are then optimized using a Huffman-based initial clustering center optimization algorithm. In addition, the weighted vector identifications and the weight of each sequence in its corresponding cluster are used to update the clustering centers. Theoretical analysis and experimental results show that the SCAWVI algorithm achieves higher clustering accuracy and higher execution efficiency.
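The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' SCAWVI implementation: it assumes the sequences have already been mapped to M-dimensional vectors, it assumes one fixed weight per sequence (the paper's weight depends on the cluster), and it assumes that "Huffman-based" initialization means repeatedly merging the two closest nodes into their size-weighted mean until k nodes remain. The function names `huffman_style_centers`, `weighted_center`, and `cluster` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def huffman_style_centers(vectors, k):
    """Assumed initialization: merge the two closest nodes until k remain."""
    # Each node is (vector, number of original points merged into it).
    nodes = [(list(v), 1) for v in vectors]
    while len(nodes) > k:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = dist(nodes[i][0], nodes[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (vi, ni), (vj, nj) = nodes[i], nodes[j]
        # The merged node is the size-weighted mean of its two children.
        merged = [(a * ni + b * nj) / (ni + nj) for a, b in zip(vi, vj)]
        nodes.pop(j)  # j > i, so remove j first to keep index i valid
        nodes.pop(i)
        nodes.append((merged, ni + nj))
    return [v for v, _ in nodes]


def weighted_center(vectors, weights):
    """Weighted mean of the member vectors (assumed center update rule)."""
    total = sum(weights)
    m = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(m)]


def cluster(vectors, weights, k, iters=20):
    """K-means-style loop using the two helpers above."""
    centers = huffman_style_centers(vectors, k)
    labels = []
    for _ in range(iters):
        # Assign every vector to its nearest center.
        labels = [min(range(k), key=lambda c: dist(v, centers[c]))
                  for v in vectors]
        new_centers = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                new_centers.append(weighted_center(
                    [vectors[i] for i in members],
                    [weights[i] for i in members]))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers
```

In this sketch a sequence with a larger weight pulls its cluster center toward itself, which is the intuition behind using sequence weights rather than a plain mean when updating centers.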
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61170190), the Nature Science Foundation of Hebei Province (No. F2015402114) and the Science and Technology Research and Development Program of Handan (No. 1321103077-3). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation of this study.
Cite this article
Wu, D., Ren, J. Sequence clustering algorithm based on weighted vector identification. Int. J. Mach. Learn. & Cyber. 8, 731–738 (2017). https://doi.org/10.1007/s13042-015-0381-2