ELOF: fast and memory-efficient anomaly detection algorithm in data streams

  • Methodologies and Application
Soft Computing

Abstract

Anomaly detection in multivariate data is an important research field. Many studies have sought to improve the local outlier factor (LOF). However, existing LOF-based models have two major problems: (1) they need a large amount of memory to store data; (2) they give unsatisfactory detection results on high-dimensional data. To address these problems, we propose a new anomaly detection algorithm for data streams, the extract local outlier factor (ELOF). To reduce data storage, we first design a memory window mechanism that limits the amount of data stored; we then design a new sub-data extraction model that extracts sub-data from the original data. Together, these two designs effectively reduce the amount of data stored. Moreover, the model framework is based on the density discriminant method, so it can be applied to a wide range of real scenarios without prior information or assumptions about the data distribution. Comprehensive experimental results show that the ELOF model is considerably more accurate than many common models. Furthermore, on the same data set, the running time of the ELOF algorithm is less than 1% of that of the original LOF algorithm. These results indicate that ELOF consumes less memory in real-time anomaly detection on data streams and works better on high-dimensional data streams.
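To make the memory window idea concrete, the following is a minimal sketch of the standard LOF score (Breunig et al. 2000) computed over a fixed-size first-in, first-out window. It is not the ELOF algorithm itself: the abstract does not specify the sub-data extraction model, so that step is omitted, and the names `lof_scores`, `WindowedLOF`, `window_size` and `k` are assumptions made for illustration. As a simplification, each point uses exactly its k nearest neighbours rather than the full k-distance neighbourhood with ties.

```python
from collections import deque
from math import dist, inf


def lof_scores(points, k):
    """Return the LOF score of every point, using exactly k neighbours each."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    # For each point, the k nearest other points as (distance, index) pairs.
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(ranked[:k])
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for nb in neighbours:
        total = sum(max(k_dist[j], d) for d, j in nb)
        lrd.append(inf if total == 0 else len(nb) / total)

    scores = []
    for i, nb in enumerate(neighbours):
        if lrd[i] == inf:
            scores.append(1.0)  # point sits on exact duplicates: treat as inlier
        else:
            scores.append(sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i]))
    return scores


class WindowedLOF:
    """Scores each arriving point against a bounded window of recent points."""

    def __init__(self, window_size, k):
        # deque(maxlen=...) silently evicts the oldest point once full,
        # which caps memory use regardless of stream length.
        self.window = deque(maxlen=window_size)
        self.k = k

    def score(self, point):
        """Add `point` to the window; return its LOF, or None during warm-up."""
        self.window.append(tuple(point))
        if len(self.window) <= self.k:
            return None
        return lof_scores(list(self.window), self.k)[-1]


detector = WindowedLOF(window_size=10, k=3)
# A small grid of closely spaced points, followed by one distant point.
stream = [(i % 5 * 0.1, i // 5 * 0.1) for i in range(20)] + [(10.0, 10.0)]
results = [detector.score(p) for p in stream]
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while values well above 1 indicate a likely anomaly. This sketch recomputes all neighbourhoods on every arrival, so it bounds memory but makes no attempt at the running-time savings reported for ELOF.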

Author information

Corresponding author

Correspondence to Liang Chen.

Ethics declarations

Conflict of interest

Author Yun Yang declares that she has no conflict of interest. Author Liang Chen declares that he has no conflict of interest. Author ChongJun Fan declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, Y., Chen, L. & Fan, C. ELOF: fast and memory-efficient anomaly detection algorithm in data streams. Soft Comput 25, 4283–4294 (2021). https://doi.org/10.1007/s00500-020-05442-1

  • DOI: https://doi.org/10.1007/s00500-020-05442-1
