RLC: ranking lag correlations with flexible sliding windows in data streams

  • Industrial and Commercial Application
  • Pattern Analysis and Applications

Abstract

The lag correlation between two time series is their correlation when one series is shifted in time relative to the other. Existing work focuses on two computation models: landmark (where the lag correlation is computed over the entire stream) and sliding window (where the lag correlation is computed over the current window). However, these models may suffer from problems of result freshness (e.g., stale items in the landmark model) and parameter tuning (e.g., setting a proper window length in the sliding window model). In this work, we analyze lag correlation computed over flexible sliding windows. To that end, we propose a new query called RLC (ranking lag correlations with flexible sliding windows in data streams). The key challenge in answering the RLC query is that the number of windows to be analyzed grows quadratically with the length of the stream, resulting in a quadratic computation cost. To speed up the computation, we employ running sum and geometric probing techniques in query processing. We also present an approximate solution that further reduces the computation cost with an acceptable error rate in practice. Extensive experiments verify the efficiency of the proposed methods. We also demonstrate some lag correlations discovered from real datasets to show the practicality of this work.
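
To make the running sum idea concrete, the following is a minimal sketch rather than the algorithm from the paper. It assumes that lag correlation means the Pearson correlation between x[t] and y[t + lag], and that a constant window has correlation 0. The names `LagCorrelation`, `corr`, and `best_window` are hypothetical, and geometric probing is not shown. Prefix sums let each window be evaluated in constant time, but the brute-force scan still visits quadratically many windows, which is the cost the abstract refers to.

```python
import math
from itertools import accumulate


def prefix(values):
    """Running sums: prefix(v)[i] equals the sum of the first i values."""
    return [0.0] + list(accumulate(values))


class LagCorrelation:
    """Pearson correlation of x[s:s+n] against y[s+lag:s+lag+n] in O(1)."""

    def __init__(self, x, y, lag):
        # Assumption: x[t] is paired with y[t + lag]; m is the number of pairs.
        m = min(len(x), len(y) - lag)
        xs, ys = x[:m], y[lag:lag + m]
        self.m = m
        self.sx = prefix(xs)
        self.sy = prefix(ys)
        self.sxx = prefix(a * a for a in xs)
        self.syy = prefix(b * b for b in ys)
        self.sxy = prefix(a * b for a, b in zip(xs, ys))

    def corr(self, s, n):
        """Correlation over the window of n aligned pairs starting at s."""
        def window(p):
            return p[s + n] - p[s]

        sx, sy = window(self.sx), window(self.sy)
        cov = window(self.sxy) - sx * sy / n
        vx = window(self.sxx) - sx * sx / n
        vy = window(self.syy) - sy * sy / n
        if vx <= 0 or vy <= 0:
            return 0.0  # assumption: treat constant windows as uncorrelated
        return cov / math.sqrt(vx * vy)


def best_window(lc, min_len=2):
    """Scan every (start, length) pair: quadratically many windows, O(1) each."""
    best = (float("-inf"), 0, 0)
    for s in range(lc.m):
        for n in range(min_len, lc.m - s + 1):
            best = max(best, (lc.corr(s, n), n, s))
    return best  # (correlation, length, start)
```

For example, `LagCorrelation(x, y, lag=1).corr(0, 6)` evaluates the first six aligned pairs without rescanning the data. Subtracting large running sums can lose floating-point precision on long streams, so a production version would need a more careful formulation.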

Notes

  1. Regarding the extreme cases, one case is that there is no good match throughout the entire sequence. In our query, there is no minimum correlation constraint; we still return top-k subsequences with the highest lag correlations (even though all correlations are very low). The other case is that all object sequences are entirely identical to the query sequences. In our problem definition, we return the longer subsequence when multiple correlations are identical. Hence, in this case, we will return the entire sequence.

  2. Monte Carlo simulated stock price generator. http://25yearsofprogramming.com/blog/20070412c-montecarlostockprices.html.

  3. http://www.pmel.noaa.gov/tao/.

  4. http://finance.yahoo.com.
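
The ranking behaviour described in note 1 (no minimum correlation threshold, with ties broken in favour of the longer subsequence) can be sketched as follows. This is an illustration only: the `(correlation, start, length)` tuple layout and the function name `top_k` are assumptions, and ties are detected by exact equality of the correlation values.

```python
import heapq


def top_k(candidates, k):
    """Return the k best windows, ranked by correlation, then by length.

    candidates: iterable of (correlation, start, length) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # No threshold is applied, so low correlations can still be returned.
    return heapq.nlargest(k, candidates, key=lambda c: (c[0], c[2]))
```

With this ordering, if every window of two identical sequences has the same correlation, the longest window (the entire sequence) ranks first.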

Acknowledgements

We thank Yasushi Sakurai and Yuhong Li for providing us with the data sets. This work was supported by the National Program on Key Basic Research Project (973 Program) (2012CB725305), the National Science and Technology Supporting Plan (2015BAH45F01), the public key plan of Zhejiang Province (2014C23005), the cultural relic protection science and technology project of Zhejiang Province, University of Macau RC (MYRG2014-00106-FST), and NSFC of China (61502548).

Corresponding author

Correspondence to Huaizhong Lin.

Cite this article

Wu, S., Lin, H., Wang, W. et al. RLC: ranking lag correlations with flexible sliding windows in data streams. Pattern Anal Applic 20, 601–611 (2017). https://doi.org/10.1007/s10044-016-0577-4

  • DOI: https://doi.org/10.1007/s10044-016-0577-4
