Parallel Discord Discovery

Huang, Tian; Zhu, Yongxin; Mao, Yishu; Li, Xinyang; Liu, Mengyun; Wu, Yafei; Ha, Yajun; Dobbie, Gillian

doi:10.1007/978-3-319-31750-2_19


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Discords are the most unusual subsequences of a time series. Discovering them sequentially is time consuming. As datasets keep growing, they have to be kept on hard disk, which degrades the utilization of computing resources. Furthermore, discords discovered in separate segments of a time series cannot simply be combined into the discords of the whole series, which makes discord discovery hard to parallelize. In this paper, we propose Parallel Discord Discovery (PDD), which divides the discord discovery problem in a combinable manner and solves its sub-problems in parallel. PDD accelerates discord discovery with multiple computing nodes and guarantees the correctness of the results. PDD stores large time series in distributed memory and takes advantage of in-memory computing to improve the utilization of computing resources. Experiments show that, given 10 computing nodes, PDD is seven times faster than the sequential method HOTSAX. PDD can also handle larger datasets than HOTSAX. PDD achieves over 90% utilization of computing resources, nearly twice that of the disk-aware method.
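The idea of dividing discord discovery "in a combinable manner" can be illustrated with a small sketch. It uses the HOTSAX definition of a discord: the length-n subsequence whose distance to its nearest non-self match (a subsequence starting at least n positions away) is largest. The brute-force search below splits the comparison positions into chunks, computes partial nearest-neighbour distances per chunk, and merges them with an element-wise minimum. This is not PDD's actual algorithm, which the abstract does not describe. The function names, the plain (non-normalized) Euclidean distance, and the thread pool standing in for computing nodes are all illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Plain Euclidean distance between equal-length sequences (assumption: no z-normalization)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partial_nn_distances(ts, n, q_start, q_stop):
    """Hypothetical helper: for every candidate position p, return the distance to
    its nearest non-self match, searching only comparison positions q in
    [q_start, q_stop). A non-self match must satisfy |p - q| >= n."""
    m = len(ts) - n + 1
    subs = [ts[i:i + n] for i in range(m)]
    result = []
    for p in range(m):
        best = math.inf  # stays inf if this chunk holds no non-self match for p
        for q in range(q_start, q_stop):
            if abs(p - q) >= n:
                best = min(best, euclidean(subs[p], subs[q]))
        result.append(best)
    return result


def combine(partials):
    """Merge chunk results: the global nearest-neighbour distance of p is the
    minimum over all chunks, so partial results combine exactly."""
    return [min(ds) for ds in zip(*partials)]


def find_discord(ts, n, num_parts=1):
    """Return (position, nn_distance) of the top discord of length n.
    The comparison range is split into num_parts independent chunks."""
    m = len(ts) - n + 1
    bounds = [(k * m // num_parts, (k + 1) * m // num_parts) for k in range(num_parts)]
    # Threads stand in for computing nodes here; each chunk is processed independently.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partials = list(pool.map(lambda b: partial_nn_distances(ts, n, b[0], b[1]), bounds))
    nn = combine(partials)
    best_p = max(range(m), key=lambda p: nn[p])
    return best_p, nn[best_p]


if __name__ == "__main__":
    # A periodic signal with a short injected anomaly at positions 50..54.
    series = [math.sin(2 * math.pi * i / 10) for i in range(100)]
    series[50:55] = [3.0] * 5
    print(find_discord(series, 10, num_parts=1))
    print(find_discord(series, 10, num_parts=4))
```

Because the minimum is associative and commutative, the chunk results can be merged in any order, and `find_discord` returns the same answer for any `num_parts`. Taking the discord of each time-series segment separately has no such guarantee, since a segment's most unusual subsequence may have a close match in another segment. Python threads give no CPU speedup for this pure-Python loop because of the GIL; a real deployment would spread chunks across processes or cluster nodes.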


References

Ameen, J., Basha, R.: Hierarchical data mining for unusual sub-sequence identifications in time series processes. In: Second International Conference on Innovative Computing, Information and Control, 2007. ICICIC 2007, p. 177. IEEE (2007)
Basha, R., Ameen, J.: Unusual sub-sequence identifications in time series with periodicity. Int. J. Innovative Comput. Inf. Control 3(2), 471–480 (2007)
Bu, Y., Leung, O.T.W., Fu, A.W.C., Keogh, E.J., Pei, J., Meshkin, S.: WAT: finding top-k discords in time series database. In: SDM, pp. 449–454. SIAM (2007)
Buu, H.T.Q., Anh, D.T.: Time series discord discovery based on iSAX symbolic representation. In: 2011 Third International Conference on Knowledge and Systems Engineering (KSE), pp. 11–18. IEEE (2011)
Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: indexing and mining one billion time series. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 58–67, December 2010
Chiu, B., Keogh, E., Lonardi, S.: Probabilistic discovery of time series motifs. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–498. ACM (2003)
Fu, A.W., Leung, O.T.-W., Keogh, E.J., Lin, J.: Finding time series discords based on Haar transform. In: Li, X., Zaïane, O.R., Li, Z. (eds.) ADMA 2006. LNCS (LNAI), vol. 4093, pp. 31–41. Springer, Heidelberg (2006)
Fu, T.C.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1), 164–181 (2011)
Huang, T., Zhu, Y., Wu, Y., Bressan, S., Dobbie, G.: Anomaly detection and identification scheme for VM live migration in cloud infrastructure. Future Gener. Comput. Syst. 56, 736–745 (2016)
Jones, M., Nikovski, D., Imamura, M., Hirata, T.: Anomaly detection in real-valued multidimensional time series. In: International Conference on Bigdata/Socialcom/Cybersecurity. Stanford University, ASE (2014). ASE@360 Open Scientific Digital Library. http://www.ase360.org/bitstream/handle/123456789/56/submission34.pdf?sequence=1&isAllowed=y
Keogh, E., Lin, J., Fu, A.: HOT SAX: efficiently finding the most unusual time series subsequence. In: Fifth IEEE International Conference on Data Mining, p. 8. IEEE (2005)
Li, G., Bräysy, O., Jiang, L., Wu, Z., Wang, Y.: Finding time series discord based on bit representation clustering. Knowl.-Based Syst. 54, 243–254 (2013)
Lin, J., Keogh, E., Fu, A., Van Herle, H.: Approximations to magic: finding unusual medical time series. In: 18th IEEE Symposium on Computer-Based Medical Systems, 2005. Proceedings, pp. 329–334. IEEE (2005)
Luo, W., Gallagher, M.: Faster and parameter-free discord search in quasi-periodic time series. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011, Part II. LNCS, vol. 6635, pp. 135–148. Springer, Heidelberg (2011)
Luo, W., Gallagher, M., Wiles, J.: Parameter-free search of time-series discord. J. Comput. Sci. Technol. 28(2), 300–310 (2013)
Miller, C., Nagy, Z., Schlueter, A.: Automated daily pattern filtering of measured building performance data. Autom. Constr. 49, 1–17 (2015)
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)
Apache Spark: Lightning-fast cluster computing (2014)
Wei, L., Keogh, E.J., Xi, X.: SAXually explicit images: finding unusual shapes. In: ICDM, vol. 6, pp. 711–720 (2006)
Yankov, D., Keogh, E., Rebbapragada, U.: Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl. Inf. Syst. 17(2), 241–262 (2008)


Acknowledgments

This paper is sponsored by the National Natural Science Foundation of China (No. 61373032), the National Research Foundation Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) program, and the National High Technology and Research Development Program of China (863 Program, 2015AA050204).

Author information

Authors and Affiliations

School of Microelectronics, Shanghai Jiao Tong University, Shanghai, China
Tian Huang, Yongxin Zhu, Yishu Mao, Xinyang Li, Mengyun Liu & Yafei Wu
Institute for Infocomm Research, A*STAR, Singapore, Singapore
Yajun Ha
Department of Computer Science, University of Auckland, Auckland, New Zealand
Gillian Dobbie


Corresponding author

Correspondence to Yongxin Zhu.

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang


About this paper

Cite this paper

Huang, T. et al. (2016). Parallel Discord Discovery. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_19


DOI: https://doi.org/10.1007/978-3-319-31750-2_19
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer Science (R0)
