FGCH: a fast and grid based clustering algorithm for hybrid data stream

Chen, Jinyin; Lin, Xiang; Xuan, Qi; Xiang, Yun

doi:10.1007/s10489-018-1324-x

Published: 30 October 2018

Applied Intelligence, Volume 49, pages 1228–1244 (2019)

Jinyin Chen¹, Xiang Lin¹, Qi Xuan¹ & Yun Xiang¹ (ORCID: orcid.org/0000-0003-1163-698X)

Abstract

Streaming large volumes of data has a wide range of real-world applications, e.g., video flows, internet calls, and online games. Fast, real-time data stream processing is therefore important. Traditional data clustering algorithms are efficient and effective at mining information from large data sets, but most of them are not suitable for online data stream clustering. In this work, we propose a novel fast and grid based clustering algorithm for hybrid data stream (FGCH). Specifically, we make the following main contributions: 1) we develop a non-uniform attenuation model to enhance the resistance to noise; 2) we propose a similarity calculation method for hybrid data, which can calculate the similarity more efficiently and accurately; and 3) we present a novel clustering center fast determination algorithm (CCFD), which can automatically determine the number, center, and radius of clusters. Our technique is compared with several state-of-the-art clustering algorithms. The experimental results show that our technique achieves better clustering accuracy on average, while its running time is shorter than that of the closest algorithm.
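To make the notion of similarity on hybrid (mixed numeric and categorical) data concrete, the following is a minimal sketch of a Gower-style dissimilarity. It is a generic stand-in and not the similarity measure proposed for FGCH, which the abstract does not specify. The function name `hybrid_dissimilarity` and the `numeric_ranges` parameter are assumptions introduced only for this illustration.

```python
from typing import Optional, Sequence, Union

Value = Union[float, int, str]


def hybrid_dissimilarity(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Sequence[Optional[float]],
) -> float:
    """Gower-style dissimilarity in [0, 1] for mixed-type records.

    Illustrative assumption, not the FGCH formula.
    numeric_ranges[i] is the value range of attribute i if it is numeric,
    or None if attribute i is categorical.
    """
    if not a or len(a) != len(b) or len(a) != len(numeric_ranges):
        raise ValueError("records and ranges must be non-empty and equally long")

    total = 0.0
    for x, y, value_range in zip(a, b, numeric_ranges):
        if value_range is None:
            # Categorical attribute: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            # Numeric attribute: range-normalised absolute difference, capped at 1.
            total += min(abs(float(x) - float(y)) / value_range, 1.0)
        # A numeric attribute with zero range contributes nothing.
    return total / len(a)


if __name__ == "__main__":
    # One numeric attribute with range 4.0 and one categorical attribute.
    print(hybrid_dissimilarity([1.0, "tcp"], [3.0, "udp"], [4.0, None]))
```

Each attribute contributes equally here; a stream clustering method would typically evaluate such a measure between an incoming record and a grid cell or cluster representative rather than between all pairs of records.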

Author information

Authors and Affiliations

The College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
Jinyin Chen, Xiang Lin, Qi Xuan & Yun Xiang

Corresponding author

Correspondence to Yun Xiang.

About this article

Cite this article

Chen, J., Lin, X., Xuan, Q. et al. FGCH: a fast and grid based clustering algorithm for hybrid data stream. Appl Intell 49, 1228–1244 (2019). https://doi.org/10.1007/s10489-018-1324-x

Issue Date: 15 April 2019