Sequence clustering algorithm based on weighted vector identification

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Sequence clustering has become an important research topic in data mining. However, clustering quality is often strongly affected by both the selection of initial centers and the mean sequences. In this study, a sequence clustering algorithm based on weighted vector identification (SCAWVI) is developed, built on the composite similarity of sequence elements and the weight of a sequence in its corresponding cluster. Using weighted sequence elements, all sequences in the sequence database are preprocessed into M-dimensional weighted vector identifications. The initial clustering centers are then optimized with a Huffman-based initial clustering center optimization algorithm. In addition, the weighted vector identifications and the weight of each sequence in its corresponding cluster are used to update the clustering centers. Theoretical analysis and experimental results show that the SCAWVI algorithm achieves higher clustering accuracy and higher execution efficiency.
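
The abstract names three stages (vectorizing sequences, Huffman-based initialization of centers, and weighted center updates) but does not give their formulas. The following is only a rough sketch of how such a pipeline might look, not the authors' SCAWVI implementation. It assumes that an element's weight is its relative frequency in the sequence, that the Huffman-style step repeatedly merges the two closest groups until k remain, and that sequence weights are supplied by the caller. All function names are hypothetical.

```python
import math
from collections import Counter


def to_weighted_vector(seq, alphabet):
    """Map a sequence to an M-dimensional vector, M = len(alphabet).

    Assumption: the weight of an element is its relative frequency
    in the sequence. The paper's actual weighting is not given here.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in alphabet]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def huffman_style_centers(vectors, k):
    """Pick k initial centers by bottom-up merging (assumed scheme).

    Like building a Huffman tree, the two closest groups are merged
    into their size-weighted mean until only k groups remain.
    """
    groups = [(list(v), 1) for v in vectors]  # (mean vector, group size)
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = euclidean(groups[i][0], groups[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (mi, ni), (mj, nj) = groups[i], groups[j]
        mean = [(a * ni + b * nj) / (ni + nj) for a, b in zip(mi, mj)]
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append((mean, ni + nj))
    return [mean for mean, _ in groups]


def weighted_center(vectors, weights):
    """Update a cluster center as the weighted mean of its members.

    Assumption: `weights` holds each sequence's weight in the cluster;
    how the paper derives these weights is not specified here.
    """
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[d] for v, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]


# Toy usage: four short sequences over the alphabet {a, b, c}.
seqs = ["aaab", "aaba", "bbbc", "cbbb"]
vecs = [to_weighted_vector(s, "abc") for s in seqs]
centers = huffman_style_centers(vecs, 2)
print(centers)
```

The pairwise search makes this initialization cubic in the number of sequences in the worst case, so it is meant only to illustrate the idea on small inputs.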

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61170190), the Natural Science Foundation of Hebei Province (No. F2015402114), and the Science and Technology Research and Development Program of Handan (No. 1321103077-3). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation of this study.

Author information

Corresponding author

Correspondence to Di Wu.

About this article

Cite this article

Wu, D., Ren, J. Sequence clustering algorithm based on weighted vector identification. Int. J. Mach. Learn. & Cyber. 8, 731–738 (2017). https://doi.org/10.1007/s13042-015-0381-2

  • DOI: https://doi.org/10.1007/s13042-015-0381-2
