ABSTRACT
Numerous internet-based applications produce data streams. A data stream is a sequence of data that becomes available continuously and whose characteristics may shift over time. Data from the Internet of Things (IoT), social media, traffic lights, financial institutions, phone records, sensors, and banking and healthcare systems are examples of data streams. Extracting knowledge from data streams presents several challenges, and noise reduction is one of them. Detecting and reducing noise is essential to improving the performance of any machine intelligence technique. In this paper, we evaluate the performance of four noise detection algorithms for data stream clustering that are implemented in MOA: Micro-cluster-based Continuous Outlier Detection (MCOD), AbstractC, SimpleCOD and AnyOut. We use each of these techniques to assess the quality of the clustering produced by the ClusCTA-MEWMA algorithm (Clustering based on Centroid Tracking and Exponentially Weighted Moving Average Chart Detection Method), which we refer to as CM. We set up and monitor experiments on datasets created with a random data generator. The results show that CM achieves better model quality when combined with the Micro-cluster-based Continuous Outlier Detection (MCOD) algorithm.
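The distance-based outlier definition that continuous detectors such as MCOD and SimpleCOD evaluate over a sliding window can be illustrated with a brute-force sketch. It assumes the common definition: a point is flagged when it has fewer than k neighbors within radius R among the last W points of the stream. The function `distance_outliers` and its parameters are hypothetical names chosen for illustration; this is not MOA's implementation, which avoids rescanning the whole window on every arrival (MCOD, for instance, groups inliers into micro-clusters to cut down range queries).

```python
import math
from collections import deque


def distance_outliers(stream, window_size, radius, k):
    """Hypothetical brute-force sliding-window distance-based outlier detector.

    After each arrival, returns the points in the current window that have
    fewer than k neighbors within `radius` (the point itself is not counted).
    """
    window = deque(maxlen=window_size)  # oldest point is evicted automatically
    reports = []
    for point in stream:
        window.append(point)
        outliers = []
        for i, p in enumerate(window):
            # Count other points in the window lying within `radius` of p.
            neighbors = sum(
                1
                for j, q in enumerate(window)
                if j != i and math.dist(p, q) <= radius
            )
            if neighbors < k:
                outliers.append(p)
        reports.append(outliers)
    return reports


# Usage: a tight 2-D group plus one distant point, window of 4, R = 0.5, k = 2.
stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.1, 0.1)]
reports = distance_outliers(stream, window_size=4, radius=0.5, k=2)
print(reports[-1])  # the distant point (5.0, 5.0) is the only outlier
```

Points that arrive before the window holds k + 1 points are necessarily reported, since they cannot yet have k neighbors; a practical detector has to handle this warm-up period explicitly.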
Index Terms
- Benchmarking performance of different noise detection techniques on data stream clustering