Abstract
Discovering frequently occurring patterns (or motifs) in time series has many real-life applications in financial data, streaming media data, meteorological data, and sensor data. Providing efficient motif discovery algorithms is challenging when the time series is big. Existing algorithms that try to improve performance fall into two categories: (i) those that reduce the computation cost but keep the original time series dimensions; and (ii) those that apply feature representation models to reduce the dimensions. Both have limitations when scaling to big time series. The performance of algorithms in the first category relies heavily on the dimensionality of the original time series, so they perform poorly when the time series is big. Algorithms in the second category cannot guarantee that the original similarity properties are preserved, which means that originally similar patterns may be identified as dissimilar. To address these limitations, we provide an efficient motif discovery algorithm, called FastM, which reduces dimensions while maintaining the similarity properties. FastM extends the stacked AutoEncoder, a deep neural network, by introducing new central loss functions based on labels assigned by clustering algorithms. Comprehensive experimental results on three real-life datasets demonstrate both the high efficiency and the accuracy of FastM.
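As a rough illustration of the idea in the last sentences of the abstract, the sketch below combines an autoencoder reconstruction error with a center-style penalty that pulls the low-dimensional codes of subsequences sharing a cluster label toward their common centroid. It is not the FastM implementation: the abstract does not give the loss definition, so the function names, the use of mean squared error, the centroid-based penalty, and the weight `lam` are all assumptions made for illustration.

```python
from collections import defaultdict


def center_loss(codes, labels):
    """Mean squared distance from each code to the centroid of its cluster.

    `codes` are low-dimensional encodings; `labels` are cluster ids assumed
    to come from some clustering step (hypothetical here).
    """
    groups = defaultdict(list)
    for code, label in zip(codes, labels):
        groups[label].append(code)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        center = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - center[d]) ** 2 for m in members for d in range(dim))
    return total / len(codes)


def reconstruction_loss(inputs, outputs):
    """Mean squared error between subsequences and their reconstructions."""
    n = sum(len(row) for row in inputs)
    return sum(
        (a - b) ** 2 for x, y in zip(inputs, outputs) for a, b in zip(x, y)
    ) / n


def combined_loss(inputs, outputs, codes, labels, lam=0.1):
    """Reconstruction error plus a weighted center-style penalty.

    `lam` is an assumed trade-off weight, not a value from the paper.
    """
    return reconstruction_loss(inputs, outputs) + lam * center_loss(codes, labels)
```

In a sketch like this, a larger `lam` favors tight clusters in the reduced space (keeping similar subsequences close together) at the cost of reconstruction quality; how FastM actually balances the two is not stated in the abstract.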
Acknowledgment
This work was supported by a project of the Natural Science Foundation of China (No. 61402329) and by the Natural Science Foundation of Tianjin (No. 19JCYBJC15400, No. 18JCYBJC15300).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Rong, C., Chen, Z., Lin, C., Wang, J. (2020). Motif Discovery Using Similarity-Constraints Deep Neural Networks. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_39
DOI: https://doi.org/10.1007/978-3-030-59410-7_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics (R0)