Human-machine interactive streaming anomaly detection by online self-adaptive forest

  • Research Article
  • Frontiers of Computer Science

Abstract

Anomaly detectors distinguish normal from abnormal data, usually by computing an anomaly score for each instance and ranking the instances by score. A static unsupervised streaming anomaly detector, however, cannot readily adjust how it calculates anomaly scores. In real scenarios, anomaly detection often needs to be regulated by human feedback, which helps adjust the detector. In this paper, we propose a human-machine interactive streaming anomaly detection method, named ISPForest, which can be adaptively updated online under the guidance of human feedback. In particular, the feedback is used to adjust both the anomaly score calculation and the structure of the detector, with the aim of producing more accurate anomaly scores in the future. Our main contribution is an improved tree-based streaming anomaly detection model that can be updated online in terms of both anomaly score calculation and model structure. Our approach is instantiated for the powerful class of tree-based streaming anomaly detectors, and we conduct experiments on a range of benchmark datasets. The results demonstrate that incorporating feedback can improve the performance of anomaly detectors with little human effort.
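The general idea of letting feedback adjust the score calculation of a tree ensemble can be illustrated with a toy sketch. This is not ISPForest: the abstract does not specify the update rule, so the mass-counting trees, the multiplicative weight update, and every name below (`MassTree`, `FeedbackForest`, `lr`) are assumptions made for illustration. The sketch also covers only score re-weighting, not the structural updates the abstract mentions.

```python
import math
import random


class MassTree:
    """Fixed-depth random partition of [0, 1]^dim; leaves count the points seen (hypothetical)."""

    def __init__(self, dim, depth, rng):
        self.dim = dim
        # One randomly chosen split dimension per level; each split halves the current range.
        self.split_dims = [rng.randrange(dim) for _ in range(depth)]
        self.leaf_mass = {}
        self.total = 0

    def _leaf(self, x):
        lo, hi = [0.0] * self.dim, [1.0] * self.dim
        path = []
        for d in self.split_dims:
            mid = (lo[d] + hi[d]) / 2
            if x[d] < mid:
                hi[d] = mid
                path.append(0)
            else:
                lo[d] = mid
                path.append(1)
        return tuple(path)

    def update(self, x):
        leaf = self._leaf(x)
        self.leaf_mass[leaf] = self.leaf_mass.get(leaf, 0) + 1
        self.total += 1

    def score(self, x):
        # Share of observed mass outside x's leaf; 1.0 means x falls in an empty region.
        if self.total == 0:
            return 1.0
        return 1.0 - self.leaf_mass.get(self._leaf(x), 0) / self.total


class FeedbackForest:
    """Weighted ensemble of MassTree scores; feedback re-weights the trees (assumed rule)."""

    def __init__(self, dim, n_trees=25, depth=4, lr=1.0, seed=0):
        rng = random.Random(seed)
        self.trees = [MassTree(dim, depth, rng) for _ in range(n_trees)]
        self.weights = [1.0] * n_trees
        self.lr = lr

    def update(self, x):
        for tree in self.trees:
            tree.update(x)

    def score(self, x):
        weighted = sum(w * t.score(x) for w, t in zip(self.weights, self.trees))
        return weighted / sum(self.weights)

    def feedback(self, x, is_anomaly):
        # Trees whose score disagrees with the human label lose weight multiplicatively.
        target = 1.0 if is_anomaly else 0.0
        for i, tree in enumerate(self.trees):
            error = abs(target - tree.score(x))
            self.weights[i] *= math.exp(-self.lr * error)
        # Rescale so the largest weight is 1.0, which avoids underflow over long streams.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Usage: stream "normal" points, then label one query as anomalous.
rng = random.Random(1)
forest = FeedbackForest(dim=2)
for _ in range(200):
    forest.update((rng.uniform(0.4, 0.6), rng.uniform(0.4, 0.6)))

query = (0.7, 0.5)
before = forest.score(query)
forest.feedback(query, is_anomaly=True)
after = forest.score(query)
print(before, after)  # the score after feedback is never lower than before
```

Because the update shifts weight toward trees that already scored the labelled point in the direction of the label, the ensemble score for that point can only move toward the label or stay the same. A fuller method in the spirit of the abstract would also change the trees themselves.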

Acknowledgements

This work was supported in part by the National Science Fund for Distinguished Young Scholars (Grant No. 61725205) and the National Natural Science Foundation of China (Grant Nos. 61960206008, 61772428, 61972319, and 61902320).

Author information

Corresponding author

Correspondence to Zhiwen Yu.

Additional information

Qingyang Li received the bachelor’s degree from Northwestern Polytechnical University, China in 2016. She is currently a PhD student with the School of Computer Science, Northwestern Polytechnical University, China. Her research interests include ubiquitous computing, machine learning, and human-computer interaction.

Zhiwen Yu received the PhD degree in computer science from Northwestern Polytechnical University, China in 2005. He is currently a Professor and the Dean of the School of Computer Science, Northwestern Polytechnical University, China. He was an Alexander von Humboldt Fellow with Mannheim University, Germany and a Research Fellow with Kyoto University, Japan. His research interests include ubiquitous computing, HCI, and mobile sensing and computing.

Huang Xu received the PhD degree in computer science from Northwestern Polytechnical University, China in 2019. His primary research interests include data mining and ubiquitous computing. He has published in refereed conference proceedings, including ACM SIGKDD, IJCAI, and IEEE ICDM.

Bin Guo received the PhD degree in computer science from Keio University, Japan in 2009. He was a Postdoctoral Researcher with the Institut TELECOM SudParis, France. He is currently a Professor with Northwestern Polytechnical University, China. His research interests include ubiquitous computing, mobile crowd sensing and computing, and HCI.

About this article

Cite this article

Li, Q., Yu, Z., Xu, H. et al. Human-machine interactive streaming anomaly detection by online self-adaptive forest. Front. Comput. Sci. 17, 172317 (2023). https://doi.org/10.1007/s11704-022-1270-y

  • DOI: https://doi.org/10.1007/s11704-022-1270-y
