ABSTRACT
The number of connected devices is steadily increasing, and these devices continuously generate data streams. These streams are often high dimensional and subject to concept drift. Real-time processing of data streams is attracting growing interest despite its many challenges. Clustering is unsupervised: it does not need labeled instances and can be applied with little prior information about the data. These properties make clustering one of the most suitable methods for real-time data stream processing. Data embedding, in turn, can simplify clustering and makes it possible to visualize high-dimensional data. Several data stream clustering algorithms exist in the literature; however, no data stream embedding method exists. UMAP is a data embedding algorithm that is suitable for application to data streams, but it cannot adapt to concept drift. In this study, we have developed a new method that applies UMAP to data streams, adapts to concept drift, and clusters the embedded data instances using any distance-based clustering algorithm.
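The abstract does not describe how the method works, so the following is only an illustrative sketch of one generic pattern for embedding a stream under concept drift, not the authors' algorithm. It keeps a sliding window of recent instances, embeds arrivals with a model fitted on that window, and refits when the window mean has moved too far from the mean at fit time. All names here (`CenteringEmbedder`, `RefitOnDriftEmbedder`, `window_size`, `drift_threshold`) are hypothetical, the mean-shift check is an assumed stand-in for a real drift detector, and the embedder is a trivial placeholder rather than UMAP. Any object exposing `fit` and `transform`, as the `umap.UMAP` class in umap-learn does, could be supplied instead.

```python
from collections import deque
from math import dist


class CenteringEmbedder:
    """Placeholder embedder (NOT UMAP): centers on the fit mean, keeps 2 dims."""

    def fit(self, rows):
        n = len(rows)
        self.mean_ = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[x - m for x, m in zip(row, self.mean_)][:2] for row in rows]


class RefitOnDriftEmbedder:
    """Embed a stream with a model fitted on a sliding window.

    Assumption: drift is flagged when the window mean moves more than
    `drift_threshold` away from the mean recorded at the last fit.
    """

    def __init__(self, make_embedder, window_size=50, drift_threshold=3.0):
        self.make_embedder = make_embedder
        self.window = deque(maxlen=window_size)
        self.drift_threshold = drift_threshold
        self.model = None
        self.fit_mean = None
        self.refits = 0

    def _mean(self):
        n = len(self.window)
        return [sum(col) / n for col in zip(*self.window)]

    def _refit(self):
        self.model = self.make_embedder().fit(list(self.window))
        self.fit_mean = self._mean()
        self.refits += 1

    def process(self, row):
        """Return the 2-D embedding of `row`, or None while warming up."""
        self.window.append(row)
        if len(self.window) < self.window.maxlen:
            return None
        drifted = (
            self.model is not None
            and dist(self._mean(), self.fit_mean) > self.drift_threshold
        )
        if self.model is None or drifted:
            self._refit()
        return self.model.transform([row])[0]


# Synthetic stream with an abrupt shift from (0, 0, 0) to (10, 10, 10).
stream = [[0.0, 0.0, 0.0]] * 50 + [[10.0, 10.0, 10.0]] * 100
emb = RefitOnDriftEmbedder(CenteringEmbedder)
outputs = [emb.process(row) for row in stream]
print("refits:", emb.refits)
```

The non-`None` entries of `outputs` are low-dimensional points that could be passed to any distance-based clustering routine. Note that refitting changes the embedding space, so points embedded before and after a refit are not directly comparable in this sketch.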