DOI: 10.1145/3372454.3372481
research-article

Online Embedding and Clustering of Data Streams

Published: 21 January 2020

ABSTRACT

The number of connected devices is steadily increasing, and these devices continuously generate data streams. These streams are often high dimensional and subject to concept drift. Real-time processing of data streams is attracting growing interest despite its many challenges. Clustering is unsupervised: it does not need labeled instances and can be applied with little prior information about the data. These properties make clustering one of the most suitable methods for real-time data stream processing. Moreover, data embedding may simplify clustering and makes visualization of high-dimensional data possible. Several data stream clustering algorithms exist in the literature; however, no data stream embedding method exists. UMAP is a data embedding algorithm that is suitable for data streams, but it cannot adapt to concept drift. In this study, we develop a new method that applies UMAP to data streams, adapts to concept drift, and clusters the embedded data instances using any distance-based clustering algorithm.
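The abstract does not say how the method is built, so the following is only a rough sketch of one way such a pipeline could be arranged, not the authors' algorithm. It assumes a sliding window of recent instances, a periodic refit of the embedder so the embedding can follow drift, and a simple radius-based assignment as the distance-based clustering step. All names here (MeanCenteredProjection, assign, embed_and_cluster, window, refit_every, radius) are hypothetical. The embedder is a stand-in random projection, not UMAP; it only mimics the fit/transform interface that the umap-learn package's umap.UMAP class also provides, so a UMAP instance could be swapped in where that package is available.

```python
import random
from collections import deque
from math import dist


class MeanCenteredProjection:
    """Stand-in embedder (NOT UMAP): random linear projection of centered data.

    It only mimics a fit/transform interface so the loop below can run
    without third-party packages.
    """

    def __init__(self, n_components=2, seed=0):
        self.n_components = n_components
        self.seed = seed

    def fit(self, rows):
        rng = random.Random(self.seed)
        dim = len(rows[0])
        # Per-feature mean of the current window.
        self.mean_ = [sum(col) / len(rows) for col in zip(*rows)]
        self.axes_ = [[rng.gauss(0, 1) for _ in range(dim)]
                      for _ in range(self.n_components)]
        return self

    def transform(self, rows):
        return [
            [sum(a * (x - m) for a, x, m in zip(axis, row, self.mean_))
             for axis in self.axes_]
            for row in rows
        ]


def assign(point, centroids, radius):
    """Return the nearest centroid's index if within `radius`, else open a new cluster."""
    if centroids:
        j = min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))
        if dist(point, centroids[j]) <= radius:
            return j
    centroids.append(list(point))
    return len(centroids) - 1


def embed_and_cluster(stream, make_embedder, window=40, refit_every=20, radius=3.0):
    """Return (instance_index, cluster_id) pairs for instances seen after the first fit."""
    buffer = deque(maxlen=window)
    embedder, centroids, since_fit = None, [], 0
    labels = []
    for i, row in enumerate(stream):
        buffer.append(row)
        since_fit += 1
        if len(buffer) == window and (embedder is None or since_fit >= refit_every):
            # Assumed drift handling: refit on the most recent window only.
            embedder = make_embedder().fit(list(buffer))
            since_fit = 0
            # A refit changes the embedded space, so rebuild the clusters
            # from the re-embedded window instead of reusing old centroids.
            centroids = []
            for p in embedder.transform(list(buffer)):
                assign(p, centroids, radius)
        if embedder is not None:
            point = embedder.transform([row])[0]
            labels.append((i, assign(point, centroids, radius)))
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stream whose mean shifts halfway through (a crude form of drift).
    stream = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    stream += [[rng.gauss(5, 1) for _ in range(5)] for _ in range(100)]
    result = embed_and_cluster(stream, MeanCenteredProjection)
    print(len(result), "instances labeled")
```

Cluster ids in this sketch are only meaningful between two refits, because each refit defines a new embedded space and the centroids are rebuilt from scratch. Carrying cluster identities across refits would need an extra matching step that is not shown here.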


Published in

ICBDR '19: Proceedings of the 3rd International Conference on Big Data Research
November 2019
192 pages
ISBN: 9781450372015
DOI: 10.1145/3372454

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
