Parallel Discord Discovery

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Abstract

Discords are the most unusual subsequences of a time series. Discovering them sequentially is time-consuming. As datasets keep growing, they have to be stored on hard disk, which lowers the utilization of computing resources. Furthermore, discords found in separate segments of a time series cannot simply be combined into the discords of the whole series, which makes discord discovery hard to parallelize. In this paper, we propose Parallel Discord Discovery (PDD), which divides the discord discovery problem into combinable sub-problems and solves them in parallel. PDD accelerates discord discovery across multiple computing nodes while guaranteeing that the results are correct. It stores large time series in distributed memory and takes advantage of in-memory computing to improve the utilization of computing resources. Experiments show that, given 10 computing nodes, PDD is seven times faster than the sequential method HOTSAX and can handle larger datasets than HOTSAX. PDD achieves over 90% utilization of computing resources, nearly twice that of the disk-aware method.
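To make the notions of a discord and of a combinable decomposition concrete, here is a minimal Python sketch under stated assumptions. It uses the common definition of a discord as the length-m subsequence whose nearest non-self match (a match starting at least m positions away) is farthest, measured by Euclidean distance. `discord_sequential` is a plain brute-force baseline, not HOTSAX, which adds heuristic search ordering and early abandoning. `discord_partitioned` shows one assumed way to obtain combinable sub-problems: each block of candidate neighbour positions yields partial nearest-neighbour distances for every subsequence, and those partials merge exactly through an element-wise minimum. This is not necessarily how PDD partitions its work; all function names are hypothetical, and the thread pool only demonstrates that blocks are independent (it gives no real speedup under CPython's GIL).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dist(ts, i, j, m):
    """Euclidean distance between the length-m subsequences starting at i and j."""
    return math.sqrt(sum((ts[i + k] - ts[j + k]) ** 2 for k in range(m)))


def partial_nn(ts, m, block):
    """For every subsequence, the distance to its nearest non-self match,
    searching only candidate start positions in `block` (a range)."""
    n = len(ts) - m + 1
    out = [math.inf] * n
    for i in range(n):
        for j in block:
            if abs(i - j) >= m:  # exclude trivial matches that overlap subsequence i
                d = dist(ts, i, j, m)
                if d < out[i]:
                    out[i] = d
    return out


def discord_sequential(ts, m):
    """Brute-force top-1 discord: (start index, nearest-neighbour distance)."""
    nn = partial_nn(ts, m, range(len(ts) - m + 1))
    best = max(range(len(nn)), key=nn.__getitem__)
    return best, nn[best]


def discord_partitioned(ts, m, parts=4):
    """Same answer as discord_sequential, computed from independent blocks.

    Each block yields partial nearest-neighbour distances; an element-wise
    min merges them exactly, so the sub-problems are combinable.
    """
    n = len(ts) - m + 1
    size = math.ceil(n / parts)
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda b: partial_nn(ts, m, b), blocks))
    nn = [min(vals) for vals in zip(*partials)]  # combine step
    best = max(range(n), key=nn.__getitem__)
    return best, nn[best]


if __name__ == "__main__":
    # Sine wave with period 20 and one injected spike at index 120.
    ts = [math.sin(2 * math.pi * t / 20) for t in range(200)]
    ts[120] += 2.0
    print(discord_sequential(ts, 20))
    print(discord_partitioned(ts, 20, parts=5))
```

On the sine wave above, both functions return the same (index, distance) pair, and the index belongs to a length-20 window that covers the spike. Because the minimum is associative and commutative, the blocks can be evaluated in any order or on any node and still merge to the exact sequential result, which is the property that per-segment discord results lack.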

References

  1. Ameen, J., Basha, R.: Hierarchical data mining for unusual sub-sequence identifications in time series processes. In: Second International Conference on Innovative Computing, Information and Control, 2007. ICICIC 2007, p. 177. IEEE (2007)

  2. Basha, R., Ameen, J.: Unusual sub-sequence identifications in time series with periodicity. Int. J. Innovative Comput. Inf. Control 3(2), 471–480 (2007)

  3. Bu, Y., Leung, O.T.W., Fu, A.W.C., Keogh, E.J., Pei, J., Meshkin, S.: WAT: finding top-k discords in time series database. In: SDM, pp. 449–454. SIAM (2007)

  4. Buu, H.T.Q., Anh, D.T.: Time series discord discovery based on iSAX symbolic representation. In: 2011 Third International Conference on Knowledge and Systems Engineering (KSE), pp. 11–18. IEEE (2011)

  5. Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: indexing and mining one billion time series. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 58–67, December 2010

  6. Chiu, B., Keogh, E., Lonardi, S.: Probabilistic discovery of time series motifs. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–498. ACM (2003)

  7. Fu, A.W., Leung, O.T.-W., Keogh, E.J., Lin, J.: Finding time series discords based on Haar transform. In: Li, X., Zaïane, O.R., Li, Z. (eds.) ADMA 2006. LNCS (LNAI), vol. 4093, pp. 31–41. Springer, Heidelberg (2006)

  8. Fu, T.C.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1), 164–181 (2011)

  9. Huang, T., Zhu, Y., Wu, Y., Bressan, S., Dobbie, G.: Anomaly detection and identification scheme for VM live migration in cloud infrastructure. Future Gener. Comput. Syst. 56, 736–745 (2016)

  10. Jones, M., Nikovski, D., Imamura, M., Hirata, T.: Anomaly detection in real-valued multidimensional time series. In: International Conference on Bigdata/Socialcom/Cybersecurity. Stanford University, ASE (2014). ASE@360 Open Scientific Digital Library. http://www.ase360.org/bitstream/handle/123456789/56/submission34.pdf?sequence=1&isAllowed=y

  11. Keogh, E., Lin, J., Fu, A.: HOT SAX: efficiently finding the most unusual time series subsequence. In: Fifth IEEE International Conference on Data Mining, p. 8. IEEE (2005)

  12. Li, G., Bräysy, O., Jiang, L., Wu, Z., Wang, Y.: Finding time series discord based on bit representation clustering. Knowl.-Based Syst. 54, 243–254 (2013)

  13. Lin, J., Keogh, E., Fu, A., Van Herle, H.: Approximations to magic: finding unusual medical time series. In: 18th IEEE Symposium on Computer-Based Medical Systems, 2005. Proceedings, pp. 329–334. IEEE (2005)

  14. Luo, W., Gallagher, M.: Faster and parameter-free discord search in quasi-periodic time series. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011, Part II. LNCS, vol. 6635, pp. 135–148. Springer, Heidelberg (2011)

  15. Luo, W., Gallagher, M., Wiles, J.: Parameter-free search of time-series discord. J. Comput. Sci. Technol. 28(2), 300–310 (2013)

  16. Miller, C., Nagy, Z., Schlueter, A.: Automated daily pattern filtering of measured building performance data. Autom. Constr. 49, 1–17 (2015)

  17. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)

  18. Apache Spark: Lightning-fast cluster computing (2014)

  19. Wei, L., Keogh, E.J., Xi, X.: Saxually explicit images: finding unusual shapes. In: ICDM, vol. 6, pp. 711–720 (2006)

  20. Yankov, D., Keogh, E., Rebbapragada, U.: Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl. Inf. Syst. 17(2), 241–262 (2008)

Acknowledgments

This work was sponsored by the National Natural Science Foundation of China (No. 61373032), the National Research Foundation Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) program, and the National High Technology Research and Development Program of China (863 Program, 2015AA050204).

Author information

Corresponding author

Correspondence to Yongxin Zhu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Huang, T. et al. (2016). Parallel Discord Discovery. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_19

  • DOI: https://doi.org/10.1007/978-3-319-31750-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer Science, Computer Science (R0)
