Abstract
Anomaly detection has received much attention due to its various applications. Generally, the first step in discovering anomalies is a data representation method that reduces dimensionality while preserving key information. Anomaly detection based on real-valued representation methods is attractive because such representations are convenient for numeric operations. A typical real-valued representation method is Piecewise Aggregate Approximation (PAA), which is simple and intuitive: it captures the mean values of the segments of a sequence. However, if segments have the same or similar average values but different oscillation amplitudes, PAA cannot effectively describe a sequence composed of such segments. To address this issue, we propose a representation method called Piecewise Aggregate Approximation in the Amplitude Domain (AD-PAA). To discover anomalies, a sequence is first partitioned into subsequences by a sliding window. Then, in the AD-PAA method, each subsequence is divided into equal-size subsections according to the amplitude domain. Computing the mean values of these subsections effectively captures the amplitude oscillation of the subsequence. Applying AD-PAA to every subsequence yields the AD-PAA representation of the sequence. Anomalies are determined by anomaly scores based on the similarities among the representation results. Experimental results on various datasets confirm that the proposed method is more accurate than the PAA-based method and the other comparison methods. The proposed algorithm is also superior in its ability to differentiate anomalies.
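The pipeline described above (sliding window, per-window representation, similarity-based anomaly score) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify how the amplitude domain is partitioned, so the hypothetical `ad_paa` function assumes that a subsequence's values are ordered by amplitude and split into equal-size groups. The anomaly score (distance to the nearest non-overlapping window, a discord-style score) is likewise a stand-in, not the paper's scoring rule. All function names are hypothetical.

```python
import math


def sliding_windows(seq, n):
    """Partition a sequence into all subsequences of length n (step size 1)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def paa(sub, w):
    """Classic PAA: split along the time axis into w equal segments and keep
    each segment's mean. Assumes len(sub) is divisible by w."""
    size = len(sub) // w
    return [sum(sub[k * size:(k + 1) * size]) / size for k in range(w)]


def ad_paa(sub, w):
    """HYPOTHETICAL reading of AD-PAA (an assumption, not the paper's exact
    definition): order the values by amplitude, split the ordered values into
    w equal-size subsections, and keep each subsection's mean.
    Assumes len(sub) is divisible by w."""
    ordered = sorted(sub)
    size = len(ordered) // w
    return [sum(ordered[k * size:(k + 1) * size]) / size for k in range(w)]


def euclidean(a, b):
    """Euclidean distance between two equal-length representations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anomaly_scores(seq, n, w, represent=ad_paa):
    """Illustrative stand-in score: distance from each window's representation
    to its nearest non-overlapping window (windows starting fewer than n
    positions apart overlap and are skipped)."""
    reps = [represent(s, w) for s in sliding_windows(seq, n)]
    scores = []
    for i, r in enumerate(reps):
        dists = [euclidean(r, reps[j]) for j in range(len(reps)) if abs(i - j) >= n]
        scores.append(min(dists) if dists else 0.0)
    return scores
```

With `low = [1, -1, 1, -1]` and `high = [3, -3, 3, -3]`, `paa(low, 2)` and `paa(high, 2)` both return `[0.0, 0.0]`, reproducing the failure case described in the abstract, whereas the amplitude-ordered sketch returns `[-1.0, 1.0]` and `[-3.0, 3.0]`. On the sequence `[1, -1] * 10 + [3, -3, 3, -3] + [1, -1] * 10` with `n = 4` and `w = 2`, the window aligned with the high-amplitude burst (start index 20) receives the highest score under `ad_paa`, but a score of 0 under `paa`.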
Acknowledgements
The authors extend their appreciation to the International Scientific Partnership Program (ISPP) at King Saud University for funding this research work through ISPP#0799.
About this article
Cite this article
Ren, H., Liao, X., Li, Z. et al. Anomaly detection using piecewise aggregate approximation in the amplitude domain. Appl Intell 48, 1097–1110 (2018). https://doi.org/10.1007/s10489-017-1017-x
DOI: https://doi.org/10.1007/s10489-017-1017-x