Sequence clustering algorithm based on weighted vector identification

Wu, Di; Ren, Jiadong

doi:10.1007/s13042-015-0381-2

Original Article
Published: 03 June 2015

Volume 8, pages 731–738, (2017)

International Journal of Machine Learning and Cybernetics

Di Wu¹ & Jiadong Ren²

238 Accesses
2 Citations

Abstract

Sequence clustering has become an active research topic in data mining. However, clustering quality is often strongly affected by both the selection of the initial centers and the mean sequences. In this study, a sequence clustering algorithm based on weighted vector identification (SCAWVI) is developed from the composite similarity of sequence elements and the weight of a sequence in its corresponding cluster. First, based on the weighted sequence elements, all sequences in the sequence database are preprocessed into M-dimensional weighted vector identifications. Then, the initial clustering centers are optimized using a Huffman-based initial clustering center optimization algorithm. In addition, the weighted vector identification and the weight of each sequence in its corresponding cluster are used to update the clustering centers. Theoretical analysis and experimental results show that the SCAWVI algorithm achieves higher clustering accuracy and higher execution efficiency.
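The abstract names two steps without giving their formulas: a Huffman-based choice of initial centers and a weighted update of the centers. The Python sketch below is only one plausible reading of those two ideas, not the authors' SCAWVI procedure. It assumes the sequences have already been encoded as numeric M-dimensional vectors, that "Huffman-based" means repeatedly merging the two closest groups as in Huffman tree construction, and that the center update is a weighted mean. The function names `huffman_style_centers` and `weighted_center` are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def huffman_style_centers(vectors, k):
    """Assumed initialization: merge the two closest groups until k remain.

    Each group is stored as (mean vector, member count), and the means of
    the k surviving groups are returned as initial centers.
    """
    groups = [(list(v), 1) for v in vectors]
    while len(groups) > k:
        # Find the pair of groups whose means are closest.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda p: sq_dist(groups[p[0]][0], groups[p[1]][0]),
        )
        (a, na), (b, nb) = groups[i], groups[j]
        # Merge them into one group with the size-weighted mean.
        merged = [(x * na + y * nb) / (na + nb) for x, y in zip(a, b)]
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append((merged, na + nb))
    return [g[0] for g in groups]


def weighted_center(vectors, weights):
    """Assumed update rule: weighted mean of the vectors in one cluster."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[d] for v, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]


# Illustrative data only; these values do not come from the article.
points = [[0, 0], [0, 2], [10, 10], [10, 12]]
initial_centers = huffman_style_centers(points, k=2)
updated = weighted_center([[0, 0], [4, 8]], weights=[1, 3])
```

In this reading, a sequence with a larger weight in its cluster pulls the center further toward its own vector, which is the general effect the abstract attributes to the weighted update.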

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61170190), the Natural Science Foundation of Hebei Province (No. F2015402114) and the Science and Technology Research and Development Program of Handan (No. 1321103077-3). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation of this study.

Author information

Authors and Affiliations

Hebei University of Engineering, Handan, Hebei, China
Di Wu
Yanshan University, Qinhuangdao, Hebei, China
Jiadong Ren

Corresponding author

Correspondence to Di Wu.

About this article

Cite this article

Wu, D., Ren, J. Sequence clustering algorithm based on weighted vector identification. Int. J. Mach. Learn. & Cyber. 8, 731–738 (2017). https://doi.org/10.1007/s13042-015-0381-2

Received: 25 May 2014
Accepted: 23 May 2015
Published: 03 June 2015
Issue Date: June 2017
DOI: https://doi.org/10.1007/s13042-015-0381-2
