Streaming data anomaly detection method based on hyper-grid structure and online ensemble learning

  • Focus
Soft Computing

Abstract

This paper proposes a novel online anomaly detection method for streaming data. The method has two components. First, an improved \(L_{1}\) detection neighbor region optimizes the original hyper-grid-based anomaly detection method by reducing the number of neighbor regions that must be examined. Second, online ensemble learning adapts to the evolving distribution of streaming data and overcomes the difficulty of obtaining the optimal hyper-grid structure. The method is validated on one real-world dataset and two simulated datasets, and the experimental results are close to the optimal results.
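The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the \(L_{1}\) neighbor region of a cell means the cell plus its 2d face-adjacent cells (instead of all 3^d surrounding cells), that a point is flagged when too few earlier points fall in that region, and that the ensemble is a majority vote over grids with different cell sizes. All names (`GridDetector`, `GridEnsemble`, `side`, `min_pts`) are hypothetical, and the sketch does not forget old points, so it does not handle distribution drift.

```python
from collections import Counter
from itertools import product


def cell_of(point, side):
    """Map a point to the integer index of its hyper-grid cell."""
    return tuple(int(x // side) for x in point)


def l1_offsets(dim):
    """Cell offsets with L1 norm <= 1: the cell itself plus 2*dim face neighbors."""
    offsets = [tuple([0] * dim)]
    for axis in range(dim):
        for step in (-1, 1):
            off = [0] * dim
            off[axis] = step
            offsets.append(tuple(off))
    return offsets


def full_offsets(dim):
    """Cell offsets with L-infinity norm <= 1: all 3**dim surrounding cells."""
    return list(product((-1, 0, 1), repeat=dim))


class GridDetector:
    """Hypothetical single-grid detector (an assumption, not the paper's code).

    A point is reported as anomalous when fewer than `min_pts` previously
    seen points lie in the L1 neighbor region of its cell.
    """

    def __init__(self, side, min_pts, dim):
        self.side = side
        self.min_pts = min_pts
        self.offsets = l1_offsets(dim)
        self.counts = Counter()  # cell index -> number of points seen so far

    def score_then_update(self, point):
        cell = cell_of(point, self.side)
        # Count earlier points in the cell and its face-adjacent cells.
        support = sum(
            self.counts[tuple(c + o for c, o in zip(cell, off))]
            for off in self.offsets
        )
        self.counts[cell] += 1  # online update with the new point
        return support < self.min_pts


class GridEnsemble:
    """Majority vote over grids with different cell sizes (an assumption)."""

    def __init__(self, sides, min_pts, dim):
        self.members = [GridDetector(s, min_pts, dim) for s in sides]

    def score_then_update(self, point):
        votes = [m.score_then_update(point) for m in self.members]
        return sum(votes) * 2 > len(votes)


if __name__ == "__main__":
    ens = GridEnsemble(sides=[0.5, 1.0, 2.0], min_pts=3, dim=2)
    for i in range(20):
        ens.score_then_update((0.1 + 0.01 * i, 0.2))  # dense cluster
    print(ens.score_then_update((0.15, 0.2)))  # point inside the cluster
    print(ens.score_then_update((9.0, 9.0)))   # isolated point
```

In d dimensions the \(L_{1}\) region in this sketch contains 2d + 1 cells, against 3^d for the full neighborhood, which is one way to read the abstract's claim about reducing the number of neighbor regions. Voting across several cell sizes avoids committing to a single grid resolution.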

Acknowledgments

This study was funded by the National High Technology Research and Development Program of China (Grant No. 2011AA040103-7), the National Key Scientific Instrument and Equipment Development Project (Grant No. 2012YQ15008703), the Open Project of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial (Grant No. ZC323014100), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY13F020015), the National Science Foundation of China (Grant No. 61104089), the Science and Technology Commission of Shanghai Municipality (Grant No. 11JC1404000), and the Shanghai Rising-Star Program (Grant No. 13QA1401600).

Author information

Corresponding author

Correspondence to Zhiguo Ding.

Ethics declarations

Conflict of interest

The four authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Y. Jin.

About this article

Cite this article

Ding, Z., Fei, M., Du, D. et al. Streaming data anomaly detection method based on hyper-grid structure and online ensemble learning. Soft Comput 21, 5905–5917 (2017). https://doi.org/10.1007/s00500-016-2258-z

  • DOI: https://doi.org/10.1007/s00500-016-2258-z
