Abstract
In the last decade, outlier detection for temporal data has received much attention from the data mining and machine learning communities. Whereas other works have addressed this problem with two-way approaches (similarity and clustering), this paper proposes an embedded technique that handles both simultaneously. We reformulate outlier detection as a weighted clustering problem based on entropy and dynamic time warping for time series. Outliers are then detected by optimizing a newly proposed cost function adapted to this kind of data. Finally, we report experimental results that validate our proposal and compare it with other detection methods.
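For readers unfamiliar with dynamic time warping, the sketch below shows the textbook DTW distance between two univariate series. It illustrates only the similarity measure named in the abstract, not the entropy-weighted cost function proposed in the paper. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Illustrative sketch only: uses |x - y| as the local cost and no
    warping-window constraint (both are assumptions, not from the paper).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] matched again with a later b
                cost[i][j - 1],      # b[j-1] matched again with a later a
                cost[i - 1][j - 1],  # both indices advance
            )
    return cost[n][m]
```

Because the warping path may repeat points, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the two series have different lengths, which is why DTW is often preferred to a pointwise Euclidean distance for time series.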








Acknowledgements
We thank anonymous reviewers for their very useful comments and suggestions.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Benkabou, SE., Benabdeslem, K. & Canitia, B. Unsupervised outlier detection for time series by entropy and dynamic time warping. Knowl Inf Syst 54, 463–486 (2018). https://doi.org/10.1007/s10115-017-1067-8
DOI: https://doi.org/10.1007/s10115-017-1067-8