Abstract
Clustering is one of the most important techniques of data analysis and is widely used in data mining, image recognition, information extraction, pattern recognition and other fields. In the era of big data, with the rapid development of web applications, much of the data to be processed is both massive and continuously growing. Against this background, clustering incremental data has become a challenging problem for clustering algorithms. In this paper, we propose a limited incremental clustering algorithm with respect to cluster stability. Assuming that the number of data categories is limited, we reuse the existing clustering structure and absorb incremental data as long as cluster stability is preserved. Cluster reconstruction is triggered when stability no longer holds or when the buffer pool for undetermined data is full. At the end of the paper, we implement the limited incremental clustering algorithm with K-means, and we use the average density of clusters together with the global stability to choose a proper value of K.
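The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the stability measure or the rule for marking data as undetermined, so the sketch assumes that stability means no centroid has drifted more than `max_drift` from its position at the last reconstruction, and that a point is undetermined when it lies farther than `radius` from every centroid. The class name `LimitedIncrementalKMeans` and the parameters `radius`, `max_drift` and `buffer_size` are hypothetical, and the choice of K from average density and global stability is not covered.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[idx].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


class LimitedIncrementalKMeans:
    """Sketch only: stability and 'undetermined' rules are assumptions."""

    def __init__(self, points, k, radius, max_drift, buffer_size):
        self.k, self.radius = k, radius
        self.max_drift, self.buffer_size = max_drift, buffer_size
        self.points = list(points)
        self.buffer = []      # buffer pool for undetermined points
        self.rebuilds = 0     # number of cluster reconstructions so far
        self._fit()

    def _fit(self):
        self.centroids = kmeans(self.points, self.k)
        self.anchors = list(self.centroids)  # centroids at last reconstruction
        self.counts = [0] * self.k
        for p in self.points:
            self.counts[self._nearest(p)] += 1

    def _nearest(self, p):
        return min(range(self.k), key=lambda i: math.dist(p, self.centroids[i]))

    def add(self, p):
        i = self._nearest(p)
        if math.dist(p, self.centroids[i]) > self.radius:
            self.buffer.append(p)  # assumed rule: too far from every cluster
        else:
            # Absorb the point and update the centroid as a running mean.
            self.points.append(p)
            self.counts[i] += 1
            n = self.counts[i]
            self.centroids[i] = tuple(
                c + (x - c) / n for c, x in zip(self.centroids[i], p)
            )
        drift = max(math.dist(a, c) for a, c in zip(self.anchors, self.centroids))
        if drift > self.max_drift or len(self.buffer) >= self.buffer_size:
            # Stability lost or buffer full: rebuild from all data seen so far.
            self.points.extend(self.buffer)
            self.buffer = []
            self._fit()
            self.rebuilds += 1


base = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
model = LimitedIncrementalKMeans(base, k=2, radius=2.0, max_drift=1.0, buffer_size=3)
model.add((0.5, 0.6))                    # close to a cluster: absorbed in place
for p in [(5, 5), (5, 6), (6, 5)]:       # far from both clusters: buffered
    model.add(p)
print(model.rebuilds, len(model.points), len(model.buffer))
```

In this sketch a full buffer and a stability violation lead to the same action, a rerun of K-means over all stored points with K unchanged; a fuller version would presumably also revisit K at that point.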
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (No. 61303097) and the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20123108120026).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhu, W., Yao, W., Dai, S., Lu, Z. (2016). A Limited Incremental Clustering Algorithm with Respect to Cluster Stability. In: Xie, J., Chen, Z., Douglas, C., Zhang, W., Chen, Y. (eds) High Performance Computing and Applications. HPCA 2015. Lecture Notes in Computer Science, vol 9576. Springer, Cham. https://doi.org/10.1007/978-3-319-32557-6_18
DOI: https://doi.org/10.1007/978-3-319-32557-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32556-9
Online ISBN: 978-3-319-32557-6
eBook Packages: Computer Science (R0)