Normalization in Motif Discovery

van Leeuwen, Frederique; Bosma, Bas; van den Born, Arjan; Postma, Eric

doi:10.1007/978-3-031-25891-6_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13811)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

Motif discovery can be used as a subroutine in many time series data mining tasks such as classification, clustering and anomaly detection. A motif represents two or more highly similar subsequences of a time series. The vast majority of the motif discovery methods implicitly assume that subsequences need to be normalized before determining their similarity. While normalization is widely adopted, it may affect the discovery of motifs. We examine the effect of normalization on motif discovery using 96 real-world time series. To determine if the discovered motifs are meaningful, all time series are assigned labels that indicate the states of the system generating the time series. Our experiments show that in over half of the considered cases, normalization affects motif discovery negatively by returning motifs that are not meaningful. We therefore conclude that the assumption underlying normalization does not always hold for real-world time series and thus should not be uncritically adopted.
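The effect described in the abstract can be illustrated with a minimal sketch. It assumes that "normalization" means z-normalization of each subsequence (zero mean, unit standard deviation) and that similarity is measured with the Euclidean distance; the abstract states neither explicitly. The function names and the two toy subsequences are hypothetical and only serve the illustration.

```python
import math
from statistics import mean, pstdev


def z_normalize(subsequence):
    """Rescale a subsequence to zero mean and unit standard deviation.

    Assumption: this is the normalization meant in the abstract.
    """
    mu = mean(subsequence)
    sigma = pstdev(subsequence)
    if sigma == 0:
        # A constant subsequence has no shape; map it to all zeros.
        return [0.0 for _ in subsequence]
    return [(x - mu) / sigma for x in subsequence]


def euclidean(a, b):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical subsequences with the same shape but very different
# level and amplitude, e.g. recorded while the system is in different states.
low_state = [0.0, 1.0, 0.0, -1.0]
high_state = [100.0, 110.0, 100.0, 90.0]

raw_distance = euclidean(low_state, high_state)
normalized_distance = euclidean(z_normalize(low_state), z_normalize(high_state))

print(f"raw distance:        {raw_distance:.3f}")
print(f"normalized distance: {normalized_distance:.3f}")
```

Without normalization the two subsequences are far apart; after z-normalization they are practically identical and would be reported as a motif, even though they may stem from different states of the system.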

Notes

1. These classes are often referred to as pair motifs and set motifs, respectively [6, 10].
2. In practical implementations of motif discovery, an exclusion zone of length \(m/2\) before and after the location of the subsequence of interest is commonly set [20]. This ensures that so-called trivial matches are avoided.
3. The time series are downloaded from the website of the authors, where the names of some time series differ from the original naming convention.
4. Due to the nature of the ECG and SLC time series, we concatenated the separate sequences into a single time series. Due to the large size of the SLC time series, we took a subset including two different states occurring at least twice.
5. The selected time series, code and results are available on GitHub.
6. Several subsequences may have the same minimum distance to the subsequence of interest. In this case, only one of them is chosen at random for the direct comparison.
7. Due to the demanding running time of the calculations, the maximum length of the time series is set to \(n=20,000\). We only consider time series without missing values for which the (sub)set consists of two or more states.
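Notes 2 and 6 can be sketched together as a brute-force nearest-neighbor search. This is an illustrative sketch, not the authors' implementation: the function name `nearest_neighbor`, the use of the raw (non-normalized) Euclidean distance, the integer rounding of m/2 and the toy series are all assumptions made for the example.

```python
import math
import random


def nearest_neighbor(series, query_start, m, rng=random):
    """Return (start, distance) of the nearest non-trivial match.

    Candidates whose start lies within m // 2 positions of the query
    start are skipped (exclusion zone, Note 2). If several candidates
    share the minimum distance, one is picked at random (Note 6).
    """
    zone = m // 2
    query = series[query_start:query_start + m]
    best_distance = math.inf
    best_starts = []
    for start in range(len(series) - m + 1):
        if abs(start - query_start) <= zone:
            continue  # trivial match: overlaps the query too much
        candidate = series[start:start + m]
        distance = math.sqrt(sum((q - c) ** 2 for q, c in zip(query, candidate)))
        if distance < best_distance:
            best_distance, best_starts = distance, [start]
        elif distance == best_distance:
            best_starts.append(start)
    if not best_starts:
        return None, math.inf  # series too short for any valid candidate
    return rng.choice(best_starts), best_distance


# Hypothetical toy series: the pattern 0, 1, 0, -1 occurs at positions 0, 8 and 16.
series = [0, 1, 0, -1, 5, 5, 5, 5, 0, 1, 0, -1, 5, 5, 5, 5, 0, 1, 0, -1]
match_start, match_distance = nearest_neighbor(series, 0, 4, random.Random(0))
print(match_start, match_distance)
```

Without the exclusion zone, the query itself (or a subsequence shifted by one position) would always be the best match. With it, the search returns one of the two genuine repetitions, chosen at random because both have the same distance.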

References

Dau, H.A., et al.: The UCR time series archive. IEEE/CAA J. Autom. Sin. 6(6), 1293–1305 (2019). https://doi.org/10.1109/jas.2019.1911747
Dau, H.A., Keogh, E.: Matrix profile V: a generic technique to incorporate domain knowledge into motif discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 125–134 (2017)
Esling, P., Agon, C.: Time-series data mining. ACM Comput. Surv. 45(1), 12 (2012)
Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33(4), 917–963 (2019). https://doi.org/10.1007/s10618-019-00619-1
Gao, Y., Lin, J.: HIME: discovering variable-length motifs in large-scale time series. Knowl. Inf. Syst. 61(1), 513–542 (2018). https://doi.org/10.1007/s10115-018-1279-6
Gao, Y., Lin, J., Rangwala, H.: Iterative grammar-based framework for discovering variable-length time series motifs. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 111–116. IEEE (2017)
Keogh, E., Kasetty, S.: On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min. Knowl. Disc. 7(4), 349–371 (2003)
van Leeuwen, F., Bosma, B., van den Born, A., Postma, E.: RTL: a robust time series labeling algorithm. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds.) IDA 2021. LNCS, vol. 12695, pp. 414–425. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-74251-5_33
Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2–11 (2003)
Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.: Matrix profile X: VALMOD-scalable discovery of variable-length motifs in data series. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1053–1066 (2018)
Madrid, F., Singh, S., Chesnais, Q., Mauck, K., Keogh, E.: Matrix profile XVI: efficient and effective labeling of massive time series archives. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 463–472 (2019). https://doi.org/10.1109/DSAA.2019.00061
Mohammad, Y., Nishida, T.: Exact discovery of length-range motifs. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014. LNCS (LNAI), vol. 8398, pp. 23–32. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05458-2_3
Mueen, A., Chavoshi, N.: Enumeration of time series motifs of all lengths. Knowl. Inf. Syst. 45(1), 105–132 (2015)
Mueen, A., Keogh, E.: Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1089–1098 (2010)
Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 473–484. SIAM (2009)
Patel, P., Keogh, E., Lin, J., Lonardi, S.: Mining motifs in massive time series databases. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp. 370–377. IEEE (2002)
Senin, P., et al.: GrammarViz 2.0: a tool for grammar-based pattern discovery in time series. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 468–472. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_37
Shifaz, A., Pelletier, C., Petitjean, F., Webb, G.I.: TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min. Knowl. Disc. 34(3), 742–775 (2020)
Wang, X., et al.: RPM: representative pattern mining for efficient time series classification. In: EDBT, pp. 185–196 (2016)
Yeh, C.C.M., et al.: Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1317–1322. IEEE (2016)
Yin, M.S., Tangsripairoj, S., Pupacdi, B.: Variable length motif-based time series classification. In: Boonkrong, S., Unger, H., Meesad, P. (eds.) Recent Advances in Information and Communication Technology. AISC, vol. 265, pp. 73–82. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06538-0_8
Yingchareonthawornchai, S., Sivaraks, H., Rakthanmanon, T., Ratanamahatana, C.A.: Efficient proper length time series motif discovery. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1265–1270. IEEE (2013)
Zhu, Y., Yeh, C.C.M., Zimmerman, Z., Kamgar, K., Keogh, E.: Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 837–846. IEEE (2018)

Author information

Authors and Affiliations

Jheronimus Academy of Data Science, ’s-Hertogenbosch, The Netherlands
Frederique van Leeuwen, Arjan van den Born & Eric Postma
Tilburg University, Tilburg, The Netherlands
Frederique van Leeuwen, Arjan van den Born & Eric Postma
VU Amsterdam, Amsterdam, The Netherlands
Bas Bosma

Corresponding author

Correspondence to Frederique van Leeuwen.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
University of Reading, Reading, UK
Varun Ojha
University of Oxford, Oxford, UK
Emanuele La Malfa
University of Cambridge, Cambridge, UK
Gabriele La Malfa
University of Florida, Gainesville, FL, USA
Panos Pardalos
Free University of Bozen-Bolzano, Bolzano, Italy
Giuseppe Di Fatta
University of Catania, Catania, Italy
Giovanni Giuffrida
Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton

Cite this paper

van Leeuwen, F., Bosma, B., van den Born, A., Postma, E. (2023). Normalization in Motif Discovery. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_24

DOI: https://doi.org/10.1007/978-3-031-25891-6_24
Published: 10 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6
eBook Packages: Computer Science, Computer Science (R0)
