Abstract
The problem of anomaly detection in time series has received considerable attention over the past two decades. However, existing techniques either cannot locate the anomalies within an anomalous time series or require users to specify the length of potential anomalies. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series as well as the exact locations of the anomalies within them. Multivariate time series pose further challenges. First, anomalies may occur in only a subset of dimensions (variables). Second, the locations and lengths of anomalous subsequences may differ across dimensions. Third, some anomalies may look normal in each individual dimension but appear anomalous when dimensions are considered in combination. To address these problems, we introduce a multivariate anomaly detection algorithm that detects anomalies and identifies the dimensions and locations of the anomalous subsequences. We evaluate our approaches on several real-world datasets, including two CPU manufacturing datasets from Intel, and demonstrate that our approach successfully detects the correct anomalies without requiring any prior knowledge about the data.
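
The third challenge can be made concrete with a small example. The sketch is an illustration only, not the detection algorithm proposed in this article: it builds two strongly correlated variables and scores a point whose values are unremarkable in each dimension taken alone. Per-dimension z-scores stay well inside a three-sigma bound, while the Mahalanobis distance, a standard multivariate measure swapped in here purely for illustration, flags the point. The reference data, the function names `zscores` and `mahalanobis_2d`, and the threshold of 3 are hypothetical choices.

```python
from statistics import fmean, pstdev

# Illustration only, not the algorithm proposed in the article.
# Hypothetical reference data: y tracks x closely (y = x +/- 0.1).
reference = [(float(i), i + (0.1 if i % 2 == 0 else -0.1)) for i in range(10)]
THRESHOLD = 3.0  # hypothetical cut-off, in the spirit of the three-sigma rule


def zscores(point, data):
    """Per-dimension z-scores of `point` against the columns of `data`."""
    columns = list(zip(*data))
    return [(p - fmean(col)) / pstdev(col) for p, col in zip(point, columns)]


def mahalanobis_2d(point, data):
    """Mahalanobis distance of a 2-D `point` from the mean of `data`,
    using the population covariance matrix of `data`."""
    xs, ys = zip(*data)
    mx, my = fmean(xs), fmean(ys)
    n = len(data)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    det = sxx * syy - sxy * sxy  # must be > 0 (non-degenerate data)
    dx, dy = point[0] - mx, point[1] - my
    # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return q ** 0.5


if __name__ == "__main__":
    test_point = (2.0, 7.0)  # each value lies inside its own dimension's range
    per_dim = zscores(test_point, reference)
    print("per-dimension anomaly:", any(abs(z) > THRESHOLD for z in per_dim))
    print("joint anomaly:", mahalanobis_2d(test_point, reference) > THRESHOLD)
```

Run as a script, this prints `False` for the per-dimension check and `True` for the joint check: the test point has |z| below 1 in both dimensions, yet it breaks the y ≈ x relationship that every reference point follows, so its Mahalanobis distance is far above 3.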

Notes
We use the whole time series to demonstrate the idea, but our approach also works when the points of a time series arrive in a streaming fashion.
Reflow soldering: https://en.wikipedia.org/wiki/Reflow_soldering.
Additional information
Communicated by Eamonn Keogh.
About this article
Cite this article
Wang, X., Lin, J., Patel, N. et al. Exact variable-length anomaly detection algorithm for univariate and multivariate time series. Data Min Knowl Disc 32, 1806–1844 (2018). https://doi.org/10.1007/s10618-018-0569-7