Abstract
We investigate algorithms for efficiently detecting anomalies in real-valued one-dimensional time series. Past work has shown that one of the most effective anomaly detectors is a simple brute-force algorithm whose anomaly score is the Euclidean distance between each subsequence of a testing time series and its nearest neighbor among the subsequences of a training time series. We investigate a very efficient implementation of this method and show that it is still too slow for most real-world applications. Next, we present a new method based on summarizing the training time series with a small set of exemplars. The exemplars we use are feature vectors that capture both the high-frequency and low-frequency information in sets of similar subsequences of the time series. We show that this exemplar-based method is much faster than both the efficient brute-force method and a prediction-based method, and that it also handles a wider range of anomalies. We evaluate our algorithm on a large variety of publicly available time series and encourage others to do the same. Our exemplar-based algorithm can process in minutes time series that would take other methods days to process.
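The brute-force nearest-neighbor score described in the abstract can be illustrated with a minimal sketch. This is not the efficient implementation studied in the paper. The function names, the window length `w`, and the toy series are assumptions chosen for illustration. Every testing window is compared against every training window, which suggests why the approach becomes slow on long series.

```python
import math

def subsequences(series, w):
    """Return all contiguous windows of length w (w is an assumed parameter)."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def euclidean(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_scores(train, test, w):
    """Score each testing window by the distance to its nearest training window.

    Cost grows with (number of test windows) * (number of training windows) * w.
    """
    train_windows = subsequences(train, w)
    return [min(euclidean(q, t) for t in train_windows)
            for q in subsequences(test, w)]

# Toy data (hypothetical): a periodic training series and a testing series with a spike.
train = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
test = [0.0, 1.0, 0.0, 5.0, 0.0, 1.0]
scores = brute_force_scores(train, test, w=3)
```

In this toy case the first testing window also occurs in the training series and scores zero, while every window that overlaps the spike gets a larger score. A threshold on the score would then flag those windows as anomalous.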
Notes
We did test the z-normalized BFED algorithm and, as expected, found it to be less accurate for anomaly detection. Over all the testing time series used in Sect. 6, the z-normalized BFED algorithm has a detection rate of 31/45 with no false positives, which is worse than both the unnormalized BFED algorithm and our exemplar approach.
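One plausible reason z-normalization can reduce accuracy is that it discards each window's offset and amplitude. The following sketch illustrates that general property only. It is not the paper's experiment, and the helper names and toy windows are assumptions.

```python
import math

def z_normalize(window):
    """Rescale a window to zero mean and unit (population) standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        # A constant window has no shape; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in window]

def euclidean(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical windows with the same shape but very different amplitude.
normal = [0.0, 1.0, 0.0]
spike = [0.0, 10.0, 0.0]

raw_dist = euclidean(normal, spike)
norm_dist = euclidean(z_normalize(normal), z_normalize(spike))
```

Here the unnormalized distance is large, while the z-normalized distance is essentially zero, so an anomaly that differs from normal behavior only in amplitude would be invisible to the normalized score.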
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Responsible editor: Eamonn Keogh.
About this article
Cite this article
Jones, M., Nikovski, D., Imamura, M. et al. Exemplar learning for extremely efficient anomaly detection in real-valued time series. Data Min Knowl Disc 30, 1427–1454 (2016). https://doi.org/10.1007/s10618-015-0449-3
DOI: https://doi.org/10.1007/s10618-015-0449-3