
Normalization in Motif Discovery

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2022)

Abstract

Motif discovery can be used as a subroutine in many time series data mining tasks, such as classification, clustering, and anomaly detection. A motif consists of two or more highly similar subsequences of a time series. The vast majority of motif discovery methods implicitly assume that subsequences need to be normalized before their similarity is determined. While normalization is widely adopted, it may affect the discovery of motifs. We examine the effect of normalization on motif discovery using 96 real-world time series. To determine whether the discovered motifs are meaningful, all time series are assigned labels that indicate the states of the system generating the time series. Our experiments show that in over half of the considered cases, normalization affects motif discovery negatively by returning motifs that are not meaningful. We therefore conclude that the assumption underlying normalization does not always hold for real-world time series and should not be adopted uncritically.
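To make the concern concrete, the following minimal sketch assumes z-normalization (subtract the mean, divide by the standard deviation), the variant most common in the motif discovery literature; the abstract itself does not name the normalization used. Two subsequences with the same shape but very different amplitude and offset, which could stem from different system states, become indistinguishable after z-normalization. The helpers `znorm` and `euclid` are hypothetical names chosen for illustration.

```python
import math


def znorm(x, eps=1e-8):
    """Z-normalize a sequence; near-constant input maps to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)  # population std
    if sd < eps:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# One full sine period sampled at 50 points.
t = [2 * math.pi * k / 50 for k in range(50)]
low = [math.sin(v) for v in t]            # low-amplitude signal
high = [10 * math.sin(v) + 5 for v in t]  # same shape, scaled and shifted

raw_distance = euclid(low, high)
norm_distance = euclid(znorm(low), znorm(high))
print(f"raw: {raw_distance:.2f}, z-normalized: {norm_distance:.2e}")
```

The raw distance is large, while the z-normalized distance is zero up to floating-point error, so a normalized motif search would treat these two subsequences as a near-perfect match even if they belong to different states.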

Notes

  1. These classes are often referred to as pair motifs and set motifs, respectively [6, 10].

  2. In practical implementations of motif discovery, an exclusion zone of length m/2 (where m is the subsequence length) is commonly set before and after the location of the subsequence of interest [20]. This prevents so-called trivial matches, i.e., matches between a subsequence and heavily overlapping neighbours of itself.

  3. The time series were downloaded from the authors' website, where the names of some time series differ from the original naming convention.

  4. Due to the nature of the ECG and SLC time series, we concatenated the separate sequences into a single time series. Because the SLC time series is very large, we took a subset including two different states occurring at least twice.

  5. The selected time series, code, and results are available on GitHub.

  6. Several subsequences can have the minimum distance to the subsequence of interest. In that case, only one of them is chosen at random to be compared directly.

  7. Due to the demanding running time of the calculations, the maximum length of the time series is set to \(n = 20{,}000\). We only consider time series without missing values for which the (sub)set consists of two or more states.
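The pair-motif definition and the exclusion zone of note 2 can be illustrated with a brute-force search for the closest pair of non-trivially matching subsequences. This is a minimal sketch, not the authors' implementation (their code is on GitHub): it assumes z-normalized Euclidean distance when `normalize=True`, takes the exclusion zone as floor(m/2) positions, and breaks ties by first occurrence rather than at random as in note 6. The function names are hypothetical. The search costs O(n^2 m) time, which is one reason long series quickly become expensive.

```python
import math
import random


def znorm(x, eps=1e-8):
    """Z-normalize a sequence; near-constant input maps to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd < eps:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def top_pair_motif(ts, m, normalize=True):
    """Brute-force search for the closest pair of length-m subsequences.

    Returns (distance, i, j) with i < j. Pairs with |i - j| <= m/2 are
    skipped (the exclusion zone), so trivial matches are never reported.
    """
    n_sub = len(ts) - m + 1
    subs = [list(ts[i:i + m]) for i in range(n_sub)]
    if normalize:
        subs = [znorm(s) for s in subs]
    excl = m // 2  # floor(m/2); j must satisfy j - i > m/2
    best = (math.inf, -1, -1)
    for i in range(n_sub):
        for j in range(i + excl + 1, n_sub):
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(subs[i], subs[j])))
            if d < best[0]:  # strict '<' keeps the first pair found on ties
                best = (d, i, j)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    ts = [rng.gauss(0.0, 1.0) for _ in range(200)]
    # Plant a scaled and shifted copy of ts[20:40] at position 150.
    ts[150:170] = [3.0 * v + 2.0 for v in ts[20:40]]
    print("z-normalized:", top_pair_motif(ts, 20, normalize=True))
    print("raw:         ", top_pair_motif(ts, 20, normalize=False))
```

In this synthetic setup the normalized search recovers the planted pair (20, 150), whereas the raw search generally reports some other pair, because the planted copy differs from the original in amplitude and offset. Whether such a pair is a meaningful motif depends on whether amplitude and offset carry information about the system state.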

References

  1. Dau, H.A., et al.: The UCR time series archive. IEEE/CAA J. Autom. Sin. 6(6), 1293–1305 (2019). https://doi.org/10.1109/jas.2019.1911747

  2. Dau, H.A., Keogh, E.: Matrix profile V: a generic technique to incorporate domain knowledge into motif discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 125–134 (2017)

  3. Esling, P., Agon, C.: Time-series data mining. ACM Comput. Surv. 45(1), 12 (2012)

  4. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33(4), 917–963 (2019). https://doi.org/10.1007/s10618-019-00619-1

  5. Gao, Y., Lin, J.: HIME: discovering variable-length motifs in large-scale time series. Knowl. Inf. Syst. 61(1), 513–542 (2018). https://doi.org/10.1007/s10115-018-1279-6

  6. Gao, Y., Lin, J., Rangwala, H.: Iterative grammar-based framework for discovering variable-length time series motifs. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 111–116. IEEE (2017)

  7. Keogh, E., Kasetty, S.: On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min. Knowl. Disc. 7(4), 349–371 (2003)

  8. van Leeuwen, F., Bosma, B., van den Born, A., Postma, E.: RTL: a robust time series labeling algorithm. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds.) IDA 2021. LNCS, vol. 12695, pp. 414–425. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-74251-5_33

  9. Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2–11 (2003)

  10. Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.: Matrix profile X: VALMOD-scalable discovery of variable-length motifs in data series. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1053–1066 (2018)

  11. Madrid, F., Singh, S., Chesnais, Q., Mauck, K., Keogh, E.: Matrix profile XVI: efficient and effective labeling of massive time series archives. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 463–472 (2019). https://doi.org/10.1109/DSAA.2019.00061

  12. Mohammad, Y., Nishida, T.: Exact discovery of length-range motifs. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014. LNCS (LNAI), vol. 8398, pp. 23–32. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05458-2_3

  13. Mueen, A., Chavoshi, N.: Enumeration of time series motifs of all lengths. Knowl. Inf. Syst. 45(1), 105–132 (2015)

  14. Mueen, A., Keogh, E.: Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1089–1098 (2010)

  15. Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 473–484. SIAM (2009)

  16. Patel, P., Keogh, E., Lin, J., Lonardi, S.: Mining motifs in massive time series databases. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp. 370–377. IEEE (2002)

  17. Senin, P., et al.: GrammarViz 2.0: a tool for grammar-based pattern discovery in time series. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 468–472. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_37

  18. Shifaz, A., Pelletier, C., Petitjean, F., Webb, G.I.: TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min. Knowl. Disc. 34(3), 742–775 (2020)

  19. Wang, X., et al.: RPM: representative pattern mining for efficient time series classification. In: EDBT, pp. 185–196 (2016)

  20. Yeh, C.C.M., et al.: Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1317–1322. IEEE (2016)

  21. Yin, M.S., Tangsripairoj, S., Pupacdi, B.: Variable length motif-based time series classification. In: Boonkrong, S., Unger, H., Meesad, P. (eds.) Recent Advances in Information and Communication Technology. AISC, vol. 265, pp. 73–82. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06538-0_8

  22. Yingchareonthawornchai, S., Sivaraks, H., Rakthanmanon, T., Ratanamahatana, C.A.: Efficient proper length time series motif discovery. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1265–1270. IEEE (2013)

  23. Zhu, Y., Yeh, C.C.M., Zimmerman, Z., Kamgar, K., Keogh, E.: Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 837–846. IEEE (2018)

Author information

Corresponding author

Correspondence to Frederique van Leeuwen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

van Leeuwen, F., Bosma, B., van den Born, A., Postma, E. (2023). Normalization in Motif Discovery. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_24

  • DOI: https://doi.org/10.1007/978-3-031-25891-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25890-9

  • Online ISBN: 978-3-031-25891-6

  • eBook Packages: Computer Science, Computer Science (R0)
