DOI: 10.1145/3400903.3400904
research-article

LRZ Convolution: An Algorithm for Automatic Anomaly Detection in Time-series Data

Published: 30 July 2020

ABSTRACT

Automatic anomaly detection is a hard but practically useful problem. With telemetry data sizes growing constantly, experts will rely increasingly on automation to bring anomalies to their attention. In this paper, anomaly transition points (called change points elsewhere) are determined using a novel application of a somewhat obscure statistical score called the “z-score of mean difference”. Use of this score yields a practical linear-time algorithm, called LRZ Convolution, that has sound statistical underpinnings and does not require the data to be normally distributed. Each anomaly transition point is accompanied by a set of explanatory predicates that can form a good starting point for determining an anomaly’s root causes. Careful experimental evaluation and performance in two independent domains show promising results. A preliminary comparison with Support Vector Machines (SVM), a well-known machine learning algorithm, yields a highly favorable outcome.
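
As a rough illustration of the kind of score the abstract refers to, the sketch below computes a two-sample z-score of the difference between the means of a left window and a right window at every position of a series, using prefix sums so that the whole pass stays linear in the series length. The abstract does not spell out the algorithm, so the function name `sliding_mean_difference_z`, the `window` parameter, the equal-width left and right windows, and the standard-error formula are all assumptions made for illustration; this is not the authors' LRZ Convolution implementation.

```python
import math


def sliding_mean_difference_z(series, window):
    """Return (index, z) pairs comparing the `window` values before each
    index with the `window` values starting at it.

    Hypothetical helper for illustration only: z is the difference of the
    two window means divided by its estimated standard error.
    """
    n = len(series)
    if window < 2 or n < 2 * window:
        raise ValueError("need window >= 2 and at least 2 * window values")

    # Prefix sums of values and squared values give O(1) window statistics.
    prefix, prefix_sq = [0.0], [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def window_stats(lo, hi):
        total = prefix[hi] - prefix[lo]
        total_sq = prefix_sq[hi] - prefix_sq[lo]
        mean = total / window
        # Unbiased sample variance, clamped to guard against rounding error.
        var = max(0.0, (total_sq - total * total / window) / (window - 1))
        return mean, var

    scores = []
    for i in range(window, n - window + 1):
        mean_l, var_l = window_stats(i - window, i)   # series[i-window:i]
        mean_r, var_r = window_stats(i, i + window)   # series[i:i+window]
        std_err = math.sqrt((var_l + var_r) / window)
        diff = mean_r - mean_l
        if std_err == 0.0:
            # Both windows are constant: no evidence, or an unbounded jump.
            z = 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
        else:
            z = diff / std_err
        scores.append((i, z))
    return scores
```

On a series that alternates between 1 and 2 for twenty points and then between 11 and 12, the largest absolute score from this sketch (with a window of 6) falls at index 20, the position where the level shifts, which is the sort of point a change-point detector would flag.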


  • Published in

    SSDBM '20: Proceedings of the 32nd International Conference on Scientific and Statistical Database Management
    July 2020
    241 pages
    ISBN: 9781450388146
    DOI: 10.1145/3400903

    Copyright © 2020 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 56 of 146 submissions, 38%
