Abstract
Data stream clustering aims to produce clusters from a data stream in real time. However, many existing algorithms focus on solving a single problem, leaving anomalous noise in data streams by the wayside. This paper describes MicroGRID, an approach for clustering data from single data streams that handles noisy streams by accurately identifying noise-affected data points and separating them from outlier points. In particular, MicroGRID combines micro-cluster and grid-based perspectives, an approach that has not previously been attempted for clustering data streams. The experimental results show that MicroGRID significantly outperforms the baseline methods: it is up to 87% faster and produces clustering outputs that are up to 80% more accurate.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Tari, Z., Thompson, A., Almusalam, N., Bertok, P., Mahmood, A. (2018). MicroGRID: An Accurate and Efficient Real-Time Stream Data Clustering with Noise. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_38
DOI: https://doi.org/10.1007/978-3-319-93037-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science (R0)