ABSTRACT
Data stream mining of IoT data can help operators immediately isolate the causes of equipment alarms. The challenge, however, is keeping the purity of the classifiers high (i.e., keeping data of the same class in the correct cluster) in the presence of concept drift caused by differences between alarm models and entities. We propose continuously revising the classification model in accordance with changes in the data distribution and trend. Evaluations showed no purity deterioration for oscillation-condition data with a drift rate of 1%. This result demonstrates that our approach can help operators improve their decision making.
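To illustrate the purity measure mentioned in the abstract, here is a minimal sketch. It assumes the standard definition of clustering purity (the share of points that belong to the majority class of their cluster); the paper's exact definition may differ. The function name `cluster_purity` and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, class_labels):
    """Return the fraction of points whose class is the majority class of their cluster.

    Assumption: this is the textbook purity measure, not necessarily
    the exact variant evaluated in the paper.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty data set")

    # Count how often each class label occurs inside each cluster.
    counts_per_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts_per_cluster.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its majority class.
    majority_total = sum(max(c.values()) for c in counts_per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: cluster 0 holds one point of the "wrong" class.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
purity = cluster_purity(clusters, labels)
```

Under this definition a purity of 1.0 means every cluster contains a single class, so "no purity deterioration" would correspond to this value staying unchanged as the stream drifts.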
Index Terms
- Concept Drift Detection on Data Stream for Revising DBSCAN Cluster