Abstract
Anomaly detection, a form of intrusion detection, identifies anomalous behavior in order to protect network security. Data mining techniques have been integrated to improve its performance, and several algorithms have been adapted to the anomaly detection field. We argue that most data mining algorithms are designed and analyzed on static data sets and ignore the influence of dynamic data streams. A data stream is a potentially unbounded, ordered sequence of data objects that arrive over time. The data objects cannot all be stored, so they must be processed in a single scan. The distribution of a data stream may change over time, a phenomenon called concept drift. These properties make stream analysis different from analysis based on static data sets, and the analysis model must be updated promptly when concept drift occurs. In this paper, we summarize the characteristics of data streams, compare data streams with static data sets, discuss the problems of data stream mining, and propose corresponding strategies.
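The two constraints named in the abstract, single-scan processing and prompt model updates under concept drift, can be illustrated with a minimal sketch. It is not the method proposed in the paper: it assumes a one-dimensional numeric stream, uses a running z-score (Welford's online mean and variance) as a stand-in anomaly model, and treats a run of consecutive flagged objects as drift. The class name `StreamingDetector` and the parameters `threshold`, `warmup`, and `drift_run` are hypothetical.

```python
import math


class StreamingDetector:
    """One-pass z-score detector with a crude drift reset (illustrative only)."""

    def __init__(self, threshold=3.0, warmup=10, drift_run=5):
        self.threshold = threshold  # z-score above which an object is flagged
        self.warmup = warmup        # objects observed before flagging starts
        self.drift_run = drift_run  # consecutive flags interpreted as drift
        self._reset()

    def _reset(self):
        # Discard the current model; only O(1) state is ever kept.
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean
        self.run = 0    # current count of consecutive flagged objects

    def _learn(self, x):
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def update(self, x):
        """Process one object exactly once; return True if it is flagged."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(x - self.mean) / std > self.threshold
        if flagged:
            self.run += 1
            if self.run >= self.drift_run:
                # Assumption: a sustained run of flags means the distribution
                # has moved, so the model is rebuilt from the newest object.
                self._reset()
                self._learn(x)
            return True
        # Normal object: learn from it and end any run of flags.
        self.run = 0
        self._learn(x)
        return False


# Synthetic stream: stable phase, one spike, one normal value, then a shift.
base = [0.0, 1.0, -1.0, 0.5, -0.5]
stream = base * 4 + [50.0, 0.0] + [v + 100.0 for v in base] * 4

detector = StreamingDetector()
flags = [detector.update(x) for x in stream]
```

In this sketch the isolated spike is flagged without changing the model, while the sustained shift is flagged only until the reset, after which the detector relearns the new level. Flagged objects are deliberately not learned from, which keeps isolated anomalies from contaminating the statistics but makes the run length `drift_run` the only path to adaptation; a windowed or ensemble model would be a more robust choice in practice.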




Acknowledgements
This work was funded by the National Natural Science Foundation of China (61772282, 61772454, 61373134, 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). It was also funded by the open research fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education. Professor Seungwook Min is the corresponding author.
Ethics declarations
Conflict of interest
Ruxia Sun declares that she has no conflict of interest. Sun Zhang declares that he has no conflict of interest. Chunyong Yin declares that he has no conflict of interest. Jin Wang declares that he has no conflict of interest. Seungwook Min declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Sun, R., Zhang, S., Yin, C. et al. Strategies for data stream mining method applied in anomaly detection. Cluster Comput 22, 399–408 (2019). https://doi.org/10.1007/s10586-018-2835-2
DOI: https://doi.org/10.1007/s10586-018-2835-2