Abstract
Discords are the most unusual subsequences of a time series. Discovering them sequentially is time-consuming. As datasets keep growing, they have to be kept on hard disk, which degrades the utilization of computing resources. Furthermore, the results discovered from separate segments of a time series cannot be combined, which makes discord discovery hard to parallelize. In this paper, we propose Parallel Discord Discovery (PDD), which divides the discord discovery problem in a combinable manner and solves its sub-problems in parallel. PDD accelerates discord discovery with multiple computing nodes while guaranteeing the correctness of the results. PDD stores large time series in distributed memory and takes advantage of in-memory computing to improve the utilization of computing resources. Experiments show that, given 10 computing nodes, PDD is seven times faster than the sequential method HOTSAX and can handle larger datasets than HOTSAX. PDD achieves over 90% utilization of computing resources, nearly twice that of the disk-aware method.
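To make the notion of a discord concrete, the following is a minimal sketch of the brute-force sequential search that PDD and HOTSAX aim to speed up. It is not the PDD algorithm. It assumes the definition commonly used in the discord literature: the discord is the subsequence whose nearest non-overlapping neighbour is farthest away. The function names, the raw (non-normalized) Euclidean distance, and the toy series are illustrative assumptions, not taken from the paper.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, n):
    """Return (start, distance) of the length-n discord of `series`.

    Assumed definition: the discord is the subsequence with the largest
    distance to its nearest neighbour, where neighbours that overlap the
    candidate (|p - q| < n) are excluded as trivial matches.
    """
    m = len(series) - n + 1          # number of candidate subsequences
    best_start, best_dist = -1, -1.0
    for p in range(m):
        nearest = math.inf
        for q in range(m):
            if abs(p - q) >= n:      # skip overlapping (trivial) matches
                d = euclidean(series[p:p + n], series[q:q + n])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = p, nearest
    return best_start, best_dist


# Toy input (hypothetical): a periodic signal with one injected spike.
series = [0, 1, 0, -1] * 8
series[14] = 5
start, dist = brute_force_discord(series, 4)
print(start, dist)
```

The nested loops compare every subsequence with every other one, which is why the sequential search is slow on large series. The sketch also suggests why per-segment results do not combine directly: the nearest neighbour of a subsequence may lie in a different segment, so a subsequence that looks unusual within one segment need not be unusual in the whole series.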
Acknowledgments
This paper is sponsored by the National Natural Science Foundation of China (No. 61373032), the National Research Foundation Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) program, and the National High Technology Research and Development Program of China (863 Program, 2015AA050204).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, T. et al. (2016). Parallel Discord Discovery. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_19
DOI: https://doi.org/10.1007/978-3-319-31750-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer Science, Computer Science (R0)