DOI: 10.1145/3401895.3401898
research-article

Benchmarking performance of different noise detection techniques on data stream clustering

Published: 29 January 2021

ABSTRACT

Numerous internet-based applications produce data streams. A data stream is a sequence of data items that may shift over time. Data from the Internet of Things (IoT), social media, traffic lights, financial institutions, phone records, sensors, banking and healthcare systems are examples of data streams. Obtaining knowledge from data streams presents challenges, and noise reduction is one of them. Detecting and reducing noise is essential to improve the performance of any machine intelligence technique. In this paper, we evaluate the performance of four noise detection algorithms for data stream clustering that are implemented in MOA: Micro-cluster-based Continuous Outlier Detection (MCOD), AbstractC, SimpleCOD and AnyOut. We use each of these techniques to assess the quality of the clustering produced by the clustering algorithm known as ClusCTA-MEWMA (Clustering based on Centroid Tracking and Exponentially Weighted Moving Average Chart Detection Method). We refer to this algorithm as CM. We set up and monitor experiments using datasets created with a random data generator. The results show that CM achieves better model quality with the Micro-cluster-based Continuous Outlier Detection (MCOD) algorithm.


Published in

EATIS '20: Proceedings of the 10th Euro-American Conference on Telematics and Information Systems
November 2020
388 pages
ISBN: 9781450377119
DOI: 10.1145/3401895

Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 17 of 64 submissions (27%)