ABSTRACT
Automatic anomaly detection is a hard but practically useful problem. As telemetry data volumes keep growing, experts will increasingly rely on automation to bring anomalies to their attention. In this paper, anomaly transition points (called change points elsewhere) are determined using a novel application of a somewhat obscure statistical score called the “z-score of mean difference”. Using this score yields a practical linear-time algorithm, called LRZ Convolution, that has sound statistical underpinnings and does not require the data to be normally distributed. Each anomaly transition point is accompanied by a set of explanatory predicates that can serve as a good starting point for determining the anomaly’s root causes. Careful experimental evaluation in two independent domains shows promising results. A preliminary comparison with a well-known machine learning algorithm, Support Vector Machines (SVM), yields a highly favorable outcome.
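The abstract does not spell out how LRZ Convolution works, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that a left window and a right window of equal width `w` slide across a univariate series, and that each boundary between them is scored with the standard two-sample z-score of mean difference, z = (x̄_R − x̄_L) / sqrt(s_L²/n_L + s_R²/n_R). Prefix sums of x and x² give each window's mean and variance in constant time, which is one way to reach the linear running time the abstract mentions. The function names, the window width `w`, and the choice of picking the boundary with the largest |z| are all hypothetical, and the explanatory predicates are not modeled.

```python
import math
from typing import List, Optional, Sequence


def z_from_stats(m1: float, v1: float, n1: int,
                 m2: float, v2: float, n2: int) -> float:
    """Two-sample z-score of the difference of means (right minus left)."""
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0.0:
        # Both windows are constant: any mean shift is infinitely significant.
        return 0.0 if m1 == m2 else math.copysign(math.inf, m2 - m1)
    return (m2 - m1) / se


def z_score_of_mean_difference(left: Sequence[float], right: Sequence[float]) -> float:
    """Direct (non-incremental) computation for two samples of size >= 2."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    v1 = sum((x - m1) ** 2 for x in left) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in right) / (n2 - 1)
    return z_from_stats(m1, v1, n1, m2, v2, n2)


def sliding_lr_z_scores(series: Sequence[float], w: int) -> List[Optional[float]]:
    """Hypothetical left/right sliding z-score pass over a series.

    Boundary t compares the left window [t-w, t) with the right window
    [t, t+w). Boundaries without a full window on both sides get None.
    """
    if w < 2:
        raise ValueError("window width must be at least 2")
    n = len(series)
    s = [0.0] * (n + 1)   # prefix sums of x
    s2 = [0.0] * (n + 1)  # prefix sums of x^2
    for i, x in enumerate(series):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def stats(a: int, b: int):
        # Mean and sample variance of series[a:b] in O(1) from prefix sums.
        k = b - a
        mean = (s[b] - s[a]) / k
        var = (s2[b] - s2[a] - k * mean * mean) / (k - 1)
        return mean, max(var, 0.0)  # clamp tiny negative rounding error

    scores: List[Optional[float]] = [None] * n
    for t in range(w, n - w + 1):
        m1, v1 = stats(t - w, t)
        m2, v2 = stats(t, t + w)
        scores[t] = z_from_stats(m1, v1, w, m2, v2, w)
    return scores


if __name__ == "__main__":
    # Synthetic series: alternating 0/1, then a level shift of +5 at index 20.
    series = [(i % 2) + (5 if i >= 20 else 0) for i in range(40)]
    scores = sliding_lr_z_scores(series, w=5)
    best = max((t for t, z in enumerate(scores) if z is not None),
               key=lambda t: abs(scores[t]))
    print(best)  # 20
```

In practice a threshold on |z| (rather than a single maximum) would be needed to report several transition points; what threshold the paper uses is not stated in the abstract.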