
Online anomaly search in time series: significant online discords

  • Regular Paper
  • Knowledge and Information Systems

Abstract

This work aims at a definition of anomaly that is useful for the online analysis of time series, that is, a concept that remains sustainable for long-lived, frequently updated data streams. As a solution, we adapt the discord concept, which has been used successfully for anomaly detection in time series. An online approach requires processing the data stream frequently in order to raise anomaly alerts in a timely manner. The original definition has to be modified for this purpose, because discord search is not exactly decomposable. A statistical approach, which rates the significance of the discords found in each analysis, yields a solution in which the number of false positives is minimized. We call the new online anomalies significant online discords (sods). As a novel feature, sod search determines the number of anomalies in the time series under investigation. The search for sods has been implemented, and its properties have been validated with synthetic and real data. We conclude that sods are a useful new tool for anomaly detection in fast streaming time series and in Big Data contexts.
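The abstract does not spell out the discord definition, so the following is only a minimal sketch. It assumes the usual meaning of a discord: the subsequence whose Euclidean distance to its nearest non-overlapping neighbour is largest. The function names, the brute-force search, and the fixed-size chunking are illustrative assumptions, not the authors' implementation, and the statistical significance rating of the paper is not reproduced. The sketch only shows why such a rating is needed: a search run on each chunk always returns a top discord, even when the chunk contains nothing unusual.

```python
import math


def nearest_neighbor_distances(series, m):
    """Distance from each length-m subsequence to its nearest non-overlapping neighbour."""
    n = len(series) - m + 1
    if n < m + 1:
        raise ValueError("series must contain at least 2*m points")
    nnd = [math.inf] * n
    for i in range(n):
        # Start at i + m so that compared subsequences never overlap.
        for j in range(i + m, n):
            d = math.sqrt(sum((series[i + k] - series[j + k]) ** 2 for k in range(m)))
            if d < nnd[i]:
                nnd[i] = d
            if d < nnd[j]:
                nnd[j] = d
    return nnd


def top_discord(series, m):
    """Return (start index, nearest-neighbour distance) of the top discord."""
    nnd = nearest_neighbor_distances(series, m)
    idx = max(range(len(nnd)), key=nnd.__getitem__)
    return idx, nnd[idx]


def chunk_discords(series, m, chunk_len):
    """Search each consecutive chunk independently (hypothetical online setting)."""
    results = []
    for start in range(0, len(series) - chunk_len + 1, chunk_len):
        idx, dist = top_discord(series[start:start + chunk_len], m)
        results.append((start + idx, dist))
    return results


# Synthetic example: a sine wave of period 8 with a bump injected at t = 70..73.
series = [math.sin(2 * math.pi * t / 8) for t in range(128)]
for t in range(70, 74):
    series[t] += 2.0

print(top_discord(series, 8))
print(chunk_discords(series, 8, 64))
```

In this example the first chunk is perfectly periodic, so its top discord has a nearest-neighbour distance close to zero, while the discord of the second chunk overlaps the injected bump. A plain per-chunk search reports both; deciding that only the second one deserves an alert is the role that the abstract assigns to the significance rating.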



Acknowledgements

PA would like to thank Audrey Adams for editing suggestions.

Author information

Corresponding author

Correspondence to Paolo Avogadro.

About this article

Cite this article

Avogadro, P., Palonca, L. & Dominoni, M.A. Online anomaly search in time series: significant online discords. Knowl Inf Syst 62, 3083–3106 (2020). https://doi.org/10.1007/s10115-020-01453-4

  • DOI: https://doi.org/10.1007/s10115-020-01453-4
