EXOS: Explaining Outliers in Data Streams

Panjei, Egawati; Gruenwald, Le

doi:10.1007/978-3-031-39831-5_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14148)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Real-time outlier detection is important in many data stream applications. To help analysts better understand detected outliers, each outlier should be presented with an explanation. One type of explanation for an outlier is its set of outlying attributes, the subset of features responsible for the outlier’s abnormality. Techniques exist that generate outlying attributes in data streams; however, none simultaneously considers the cross-correlation among data streams, the unbounded volume of data, and concept drift. To fill this gap, we propose EXOS, a framework that generates outlying attributes in multi-dimensional data streams. For each outlier, it incrementally finds a local context to determine the decision boundary that separates the outlier from the normal data while handling both the unbounded volume of data and concept drift. It considers the potential data correlation within a data stream and across data streams to estimate the local context. Experiments using three real and two synthetic datasets show that, on average, EXOS achieves up to 49% higher F1 score and 29.6 times lower explanation time than existing algorithms.
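
To make the notion of outlying attributes concrete, the following is a minimal sketch that ranks the attributes of one outlier by how far each deviates from a small local context of normal points. It is a simplified stand-in based on per-attribute z-scores, not the EXOS algorithm, which the abstract describes as finding a decision boundary incrementally. The function name `outlying_attributes`, the parameters `top_k` and `eps`, and the sample values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def outlying_attributes(outlier, context, top_k=2, eps=1e-9):
    """Return indices of the attributes where the outlier deviates most.

    outlier: sequence of floats, one value per attribute.
    context: list of normal points, each the same length as outlier
             (assumed here to be given; EXOS estimates it incrementally).
    top_k:   number of attribute indices to return (illustrative parameter).
    eps:     small constant that avoids division by zero for constant columns.
    """
    scores = []
    for j, value in enumerate(outlier):
        # Collect attribute j across all context points.
        column = [point[j] for point in context]
        mu = mean(column)
        sigma = pstdev(column)
        # Absolute z-score of the outlier's value on this attribute.
        z = abs(value - mu) / (sigma + eps)
        scores.append((z, j))
    # Largest deviation first; ties are broken by attribute index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_k]]


# Hypothetical sensor readings with three attributes per point.
context = [
    [20.0, 40.0, 2.5],
    [21.0, 42.0, 2.6],
    [19.0, 41.0, 2.4],
    [20.5, 39.0, 2.5],
]
outlier = [20.2, 41.0, 9.0]

print(outlying_attributes(outlier, context, top_k=1))  # prints [2]
```

In this toy data the third attribute (index 2) is the only one far from the context, so it is reported as the outlying attribute. A per-attribute score like this ignores correlations between attributes, which is one of the aspects the abstract says EXOS takes into account.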

Author information

Authors and Affiliations

School of Computer Science, The University of Oklahoma, Norman, OK, USA
Egawati Panjei & Le Gruenwald

Authors

Egawati Panjei
Le Gruenwald

Corresponding author

Correspondence to Egawati Panjei.

Editor information

Editors and Affiliations

Poznań University of Technology, Poznan, Poland
Robert Wrembel
Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Johann Gamper
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

Cite this paper

Panjei, E., Gruenwald, L. (2023). EXOS: Explaining Outliers in Data Streams. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_3

DOI: https://doi.org/10.1007/978-3-031-39831-5_3
Published: 10 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39830-8
Online ISBN: 978-3-031-39831-5
eBook Packages: Computer Science, Computer Science (R0)
