Skip to main content

Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining

  • Conference paper
  • First Online:
High Performance Computing (ISC High Performance 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13948))

Included in the following conference series:

  • 1037 Accesses

Abstract

The mining of time series data plays an important role in modern information retrieval and monitoring infrastructures. In particular, the identification of similarities within and across large time series is of great importance in analytics and knowledge discovery. For this task, the matrix profile similarity indexing approach, which encodes the correlations among snapshots of a time series, is well-established. However, it is computationally expensive, especially for long time series, as existing exact approaches mostly rely on exhaustive, exact query (search) operations and are inefficient. Similarly, existing approximate approaches are limited with respect to parallelism, scalability, or their extent of practicality. We, therefore, focus on an approximate parallel tree-based nearest-neighbors approach and address the weak scaling challenges raised when applied to large time series in HPC settings.

We build on the existing concept of parallel iterative tree-based nearest neighbor solvers and introduce a novel approach for the approximate calculation of the matrix profile. To improve the performance and overcome weak scalability challenges, we exploit a mix of creating a forest of parallel trees on exclusive ensembles of resources combined with pipelining of iterations. We provide an implementation targeting large-scale CPU-based HPC systems and illustrate the performance of this new approach with experimental data. Finally, we demonstrate the mining of time series at billion-records-scale datasets on the SuperMUC-NG system.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    The implementation can be provided upon request.

  2. 2.

    Leibniz Supercomputing Centre.

  3. 3.

    Other metrics are also used in the literature, and some might be subjective to particular practical use cases. An investigation of such alternative metrics is beyond the scope of the present work.

References

  1. Arya, S., et al.: An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  2. Barnes, J., Hut, P.: A hierarchical O(N log N) force-calculation algorithm. Nature 324(6096), 446–449 (1986)

    Article  Google Scholar 

  3. Cools, S., et al.: Improving strong scaling of the conjugate gradient method for solving large linear systems using global reduction pipelining. ArXiv abs/1905.06850 (2019)

    Google Scholar 

  4. Curtin, R.R.: Faster dual-tree traversal for nearest neighbor search. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds.) SISAP 2015. LNCS, vol. 9371, pp. 77–89. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25087-8_7

    Chapter  Google Scholar 

  5. Dau, H.A., Keogh, E.: Matrix profile V: a generic technique to incorporate domain knowledge into motif discovery. In: 23rd ACM SIGKDD, pp. 125–134 (2017)

    Google Scholar 

  6. Eamonn Keogh: Electrocardiography Dataset. https://www.cs.ucr.edu/~eamonn/ECG_one_day.zip. Accessed 15 Aug 2022

  7. Gharghabi, S., et al.: Domain agnostic online semantic segmentation for multi-dimensional time series. In: Data Mining and Knowledge Discovery (2018)

    Google Scholar 

  8. Heldens, S., et al.: Rocket: efficient and scalable all-pairs computations on heterogeneous platforms. In: Proceedings of SC 2020. IEEE Press (2020)

    Google Scholar 

  9. Jirkovský, V., et al.: Big data analysis for sensor time-series in automation. In: IEEE Emerging Technology and Factory Automation (ETFA), pp. 1–8 (2014)

    Google Scholar 

  10. Jones, P.W., et al.: Randomized approximate nearest neighbors algorithm. Proc. Natl. Acad. Sci. 108(38), 15679–15686 (2011)

    Article  Google Scholar 

  11. Ju, Y., et al.: Exploiting reduced precision for GPU-based Time series mining. In: IEEE IPDPS, pp. 124–134 (2022)

    Google Scholar 

  12. Karlstetter, R., et al.: Turning dynamic sensor measurements from gas turbines into insights: a big data approach. In: Turbo Expo, vol. 6 (2019)

    Google Scholar 

  13. Karlstetter, R., et al.: Living on the edge: efficient handling of large scale sensor data. In: 2021 IEEE/ACM CCGrid 2021, pp. 1–10 (2021)

    Google Scholar 

  14. Linardi, M., et al.: Matrix profile X: VALMOD - scalable discovery of variable-length motifs in data series. In: ACM SIGMOD, p. 1053–1066 (2018)

    Google Scholar 

  15. Lu, Y., et al.: Matrix profile XXIV: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In: ACM SIGKDD (2022)

    Google Scholar 

  16. Mercer, R., et al.: Matrix profile XXIII: contrast profile: a novel time series primitive that allows real world classification. In: IEEE ICDM (2021)

    Google Scholar 

  17. Muja, M., Lowe, D.G.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)

    Article  Google Scholar 

  18. Netti, A.: HPC-ODA dataset collection (2020). https://doi.org/10.5281/zenodo.3701440

  19. Patwary, M.M.A., et al.: PANDA: extreme scale parallel k-nearest neighbor on distributed architectures. CoRR abs/1607.08220 (2016)

    Google Scholar 

  20. Pfeilschifter, G.: time series analysis with matrix profile on HPC systems. Master thesis, Technische Universität München (2019)

    Google Scholar 

  21. Raksha, S., et al.: Weather forecasting framework for time series data using intelligent learning models. In: 5th ICEECCOT 2021, pp. 783–787 (2021)

    Google Scholar 

  22. Rakthanmanon, T., et al.: Searching and mining trillions of time series subsequences under dynamic time warping. In: ACM SIGKDD, pp. 262–270 (2012)

    Google Scholar 

  23. Ram, P., Sinha, K.: Revisiting KD-tree for nearest neighbor search. In: KDD 2019, pp. 1378–1388. Association for Computing Machinery, New York (2019)

    Google Scholar 

  24. Raoofy, A., Karlstetter, R., Yang, D., Trinitis, C., Schulz, M.: Time series mining at petascale performance. In: Sadayappan, P., Chamberlain, B.L., Juckeland, G., Ltaief, H. (eds.) ISC High Performance 2020. LNCS, vol. 12151, pp. 104–123. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50743-5_6

    Chapter  Google Scholar 

  25. Rokhlin, V.: Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60(2), 187–207 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  26. Schall-Zimmerman, Z., et al.: Matrix profile XVIII: time series mining in the face of fast moving streams using a learned approximate matrix profile. In: IEEE ICDM, pp. 936–945 (2019)

    Google Scholar 

  27. Schmidl, S., et al.: Anomaly detection in time series: a comprehensive evaluation. Proc. VLDB Endow. 15(9), 1779–1797 (2022)

    Article  Google Scholar 

  28. Shakibay Senobari, et al.: Using the similarity matrix profile to investigate foreshock behavior of the 2004 parkfield earthquake. In: AGU Fall Meeting Abstracts, vol. 2018, pp. S51B–03 (2018)

    Google Scholar 

  29. Steinbusch, B., et al.: A massively parallel barnes-hut tree code with dual tree traversal. In: PARCO (2015)

    Google Scholar 

  30. Thill, M., et al.: MarkusThill/MGAB: The Mackey-glass anomaly benchmark (2020). https://doi.org/10.5281/zenodo.3760086

  31. Van Der Maaten, L.: Accelerating T-SNE using tree-based algorithms. J. Mach. Learn. Res. 15(1), 3221–3245 (2014)

    MathSciNet  MATH  Google Scholar 

  32. Xiao, B., Biros, G.: Parallel algorithms for nearest neighbor search problems in high dimensions. SIAM J. Sci. Comput. 38(5), S667–S699 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  33. Yeh, C.M., et al.: Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: IEEE ICDM, pp. 1317–1322 (2016)

    Google Scholar 

  34. Yeh, C.M., et al.: Matrix profile III: the matrix profile allows visualization of salient subsequences in massive time series. In: IEEE ICDM, pp. 579–588 (2016)

    Google Scholar 

  35. Yu, C.D., et al.: Performance optimization for the K-nearest neighbors kernel on X86 architectures. In: ACM SC (2015)

    Google Scholar 

  36. Zheng, X., et al.: PSML: a multi-scale time-series dataset for machine learning in decarbonized energy grids (dataset) (2021). https://doi.org/10.5281/zenodo.5130612

  37. Zhu, Y., et al.: Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: IEEE ICDM, pp. 837–846 (2018)

    Google Scholar 

  38. Zhu, Y., et al.: Matrix profile VII: time series chains: a new primitive for time series data mining. In: 2017 IEEE ICDM 2017, pp. 695–704 (2017)

    Google Scholar 

  39. Zhu, Y., et al.: Matrix profile II: exploiting a novel algorithm and GPUs to break the one hundred million barrier for time series motifs and joins. Knowl. Inf. Syst. 54(1) (2018)

    Google Scholar 

  40. Zhu, Y., et al.: The swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code. In: KDD 2020, vol. 34, pp. 949–979 (2020)

    Google Scholar 

  41. Zimmerman, Z., et al.: Matrix profile XIV: scaling time series motif discovery with GPUs to break a quintillion pairwise comparisons a day and beyond. In: ACM SoCC, pp. 74–86 (2019)

    Google Scholar 

Download references

Acknowledgements

This work is partially funded by Bayerische Forschungsstiftung under the research grants Optimierung von Gasturbinen mit Hilfe von Big Data (AZ-1214-16), and Von der Edge zur Cloud und zurück: Skalierbare und Adaptive Sensordatenverarbeitung (AZ-1468-20). The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this work by providing computing time on GCS Supercomputer SuperMUC-NG at at Leibniz Supercomputing Centre (www.lrz.de).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Amir Raoofy .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Raoofy, A., Karlstetter, R., Schreiber, M., Trinitis, C., Schulz, M. (2023). Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-32041-5_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32040-8

  • Online ISBN: 978-3-031-32041-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics