Discovering sub-patterns from time series using a normalized cross-match algorithm

The Journal of Supercomputing

Abstract

Time series data stream mining has attracted considerable research interest in recent years, and pattern discovery is one of its challenging problems. Because the data are updated continuously and sampling rates may differ, approaches based on dynamic time warping (DTW) are used to solve the pattern discovery problem in time series data streams. However, the naive form of the DTW-based approach is computationally expensive. Toyoda therefore proposed the CrossMatch (CM) approach to discover patterns between two time series data streams (sequences), which requires only O(n) time per data update, where n is the length of one sequence. CM, however, does not support normalization, which is required for some kinds of sequences (e.g., stock prices and ECG data). We therefore propose a normalized-CrossMatch approach that extends CM to enforce normalization while maintaining the same performance.

References

  1. Sakurai Y, Faloutsos C, Yamamuro M (2007) Stream monitoring under the time warping distance. In: IEEE 23rd international conference on data engineering (ICDE), pp 1046–1055

  2. Gong X, Si Y-W, Fong S, Mohammed S (2014) Nspring: normalization-supported spring for subsequence matching on time series streams. In: IEEE 15th international symposium on computational intelligence and informatics (CINTI), pp 373–378

  3. Toyoda M, Sakurai Y, Ichikawa T (2008) Identifying similar subsequences in data streams. In: Database and expert systems applications, pp 210–224

  4. Toyoda M, Sakurai Y (2010) Discovery of cross-similarity in data streams. In: IEEE 26th international conference on data engineering (ICDE), pp 101–104

  5. Toyoda M, Sakurai Y, Ishikawa Y (2013) Pattern discovery in data streams under the time warping distance. VLDB J 22(3):295–318

  6. Angiulli F, Fassetti F (2007) Detecting distance-based outliers in streams of data. In: Proceedings of the 16th conference on information and knowledge management (CIKM), pp 811–820

  7. Bu Y, Chen L, Fu AW-C, Liu D (2009) Efficient anomaly monitoring over moving object trajectory streams. In: Proceedings of the 15th international conference on knowledge discovery and data mining (SIGKDD), pp 159–168

  8. Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371

  9. Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th international conference on knowledge discovery and data mining (SIGKDD), pp 262–270

  10. Aach J, Church GM (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics 17(6):495–508

  11. Yi B-K, Jagadish H, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: Proceedings of the 14th international conference on data engineering (ICDE), pp 201–208

  12. Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23(1):67–72

  13. Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49

  14. Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386

  15. Keogh E, Wei L, Xi X, Vlachos M, Lee S-H, Protopapas P (2009) Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures. Int J Very Large Data Bases 18(3):611–630

  16. Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: Proceedings of the 9th international conference on knowledge discovery and data mining (SIGKDD), pp 493–498

  17. Mueen A (2013) Enumeration of time series motifs of all lengths. In: IEEE 13th international conference on data mining (ICDM), pp 547–556

  18. Mueen A, Keogh EJ, Zhu Q, Cash S, Westover MB (2009) Exact discovery of time series motifs. In: SDM, pp 473–484

  19. Ringeval F, Sonderegger A, Sauer J, Lalanne D (2013) Introducing the recola multimodal corpus of remote collaborative and affective interactions. In: 10th IEEE international conference and workshops on automatic face and gesture recognition (FG), pp 1–8

  20. Agrawal R, Faloutsos C, Swami AN (1993) Efficient similarity search in sequence databases. In: Proceedings of the 4th international conference on foundations of data organization and algorithms (FODO), pp 69–84

  21. Wan Y, Gong X, Si Y-W (2016) Effect of segmentation on financial time series pattern matching. Appl Soft Comput 38:346–359


Acknowledgments

The authors are thankful for the financial support from the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF)”, Grant No. MYRG2015-00128-FST, offered by the University of Macau, FST, and RDAO.

Author information

Corresponding author

Correspondence to Simon Fong.

Appendix

The standard deviation \(\sigma _{i,j}\) is originally defined as:

$$\begin{aligned} \sigma _{i,j}=\sqrt{\frac{1}{j-i+1}\sum _{r=i}^{j} (x_{r}-\mu _{i,j})^{2}} \end{aligned}$$

Squaring both sides and expanding the squared term gives:

$$\begin{aligned} \sigma ^{2}_{i,j}&=\frac{1}{j-i+1}\sum _{r=i}^{j}(x_{r}-\mu _{i,j})^{2}\\&=\frac{1}{j-i+1}\sum _{r=i}^{j}(x^{2}_{r}-2x_{r}\mu _{i,j}+\mu ^{2}_{i,j})\\&=\frac{1}{j-i+1}\left( \sum _{r=i}^{j}x^{2}_{r}-\sum _{r=i}^{j}2x_{r}\mu _{i,j}+\sum _{r=i}^{j}\mu ^{2}_{i,j}\right) \\&=\frac{1}{j-i+1}\left( \sum _{r=i}^{j}x^{2}_{r}-2\mu _{i,j}\sum _{r=i}^{j}x_{r}+\left( j-i+1\right) \mu ^{2}_{i,j}\right) \\&=\frac{1}{j-i+1}\sum _{r=i}^{j}x^{2}_{r}-2\mu _{i,j}\frac{1}{j-i+1}\sum _{r=i}^{j}x_{r}+\mu ^{2}_{i,j} \end{aligned}$$

From Eq. (5), we know \(\frac{1}{j-i+1}\sum _{r=i}^{j}x_{r}=\mu _{i,j}\). Then we have:

$$\begin{aligned} \sigma ^{2}_{i,j}&=\frac{1}{j-i+1}\sum _{r=i}^{j}x^{2}_{r}-2\mu ^{2}_{i,j}+\mu ^{2}_{i,j}\\&=\frac{1}{j-i+1}\sum _{r=i}^{j}x^{2}_{r}-\mu ^{2}_{i,j} \end{aligned}$$

Finally, we get:

$$\begin{aligned} \sigma _{i,j}=\sqrt{\frac{1}{j-i+1}\sum _{r=i}^{j}x^{2}_{r}-\mu ^{2}_{i,j}} \end{aligned}$$
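The practical value of this form is that \(\sigma _{i,j}\) depends only on the sum and the sum of squares over the window, both of which can be maintained incrementally. The following is a minimal Python sketch of that idea, not the authors' implementation: the function names (`prefix_sums`, `window_mean_std`, `direct_std`) are hypothetical, indices are 0-based rather than the paper's \(i,j\), and prefix sums are assumed as the bookkeeping structure.

```python
import math
from itertools import accumulate


def prefix_sums(x):
    """Return prefix sums of x and of x squared, each with a leading 0."""
    s1 = [0.0] + list(accumulate(x))
    s2 = [0.0] + list(accumulate(v * v for v in x))
    return s1, s2


def window_mean_std(s1, s2, i, j):
    """Mean and population std of x[i..j] (0-based, inclusive) in O(1).

    Uses sigma^2 = (1/n) * sum(x_r^2) - mu^2, the final form derived above.
    """
    n = j - i + 1
    mean = (s1[j + 1] - s1[i]) / n
    var = (s2[j + 1] - s2[i]) / n - mean * mean
    # Clamp at zero: rounding can push the difference slightly negative.
    return mean, math.sqrt(max(var, 0.0))


def direct_std(x, i, j):
    """Population std of x[i..j] computed from the original definition."""
    window = x[i:j + 1]
    mu = sum(window) / len(window)
    return math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))


if __name__ == "__main__":
    x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    s1, s2 = prefix_sums(x)
    print(window_mean_std(s1, s2, 0, 7))
    print(direct_std(x, 0, 7))
```

One caveat with this form: subtracting two nearly equal quantities can lose precision when the mean is large relative to the spread of the data, so the clamp at zero guards only against a negative argument to the square root, not against the loss of accuracy itself.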

Cite this article

Gong, X., Fong, S., Wong, R.K. et al. Discovering sub-patterns from time series using a normalized cross-match algorithm. J Supercomput 72, 3850–3867 (2016). https://doi.org/10.1007/s11227-016-1632-z

  • DOI: https://doi.org/10.1007/s11227-016-1632-z
