Abstract
Lag correlation between two time series is the correlation obtained when one series is shifted in time relative to the other. Existing work focuses on two computation models: landmark (where the lag correlation is computed over the entire stream) and sliding window (where the lag correlation is computed over the current window). However, these models may suffer from problems such as result freshness (e.g., stale items in the landmark model) and parameter tuning (e.g., setting a proper window length in the sliding window model). In this work, we analyze lag correlations computed over flexible sliding windows. To this end, we propose a new query called RLC (ranking lag correlations with flexible sliding windows in data streams). The key challenge in answering the RLC query is that the number of windows to be analyzed grows quadratically with the length of the stream, resulting in a quadratic computation cost. To speed up the computation, we employ the running sum and geometric probing techniques. We also present an approximate solution that further reduces the computation cost with an acceptable error rate in practice. Extensive experiments verify the efficiency of our proposed methods. We also present some lag correlations discovered from real datasets to show the practicality of this work.
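To make the running sum idea in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the lag correlation is the Pearson correlation between x[t] and y[t + lag], that the lag is non-negative, and that a window with zero variance scores 0. The names `running_sum`, `LagCorrelator`, `corr`, and `best_window` are hypothetical. After linear-time preprocessing, the correlation of any window is obtained in constant time, although enumerating all windows remains quadratic; the geometric probing and approximation techniques are not shown.

```python
import math
from itertools import accumulate


def running_sum(values):
    """Prefix sums with a leading 0, so sum(values[s:e]) == p[e] - p[s]."""
    return [0.0] + list(accumulate(values))


class LagCorrelator:
    """Pearson correlation of x[t] and y[t + lag] over any window [s, e).

    Hypothetical helper for illustration; assumes lag >= 0.
    """

    def __init__(self, x, y, lag):
        if lag < 0:
            raise ValueError("this sketch assumes lag >= 0")
        # Align the two series so that a[t] pairs with b[t] = y[t + lag].
        self.n = max(0, min(len(x), len(y) - lag))
        a = x[:self.n]
        b = y[lag:lag + self.n]
        self.sa = running_sum(a)
        self.sb = running_sum(b)
        self.saa = running_sum([u * u for u in a])
        self.sbb = running_sum([v * v for v in b])
        self.sab = running_sum([u * v for u, v in zip(a, b)])

    def corr(self, s, e):
        """Correlation over the aligned window [s, e) in O(1) time."""
        if not 0 <= s < e <= self.n:
            raise ValueError("window out of range")
        n = e - s
        sa = self.sa[e] - self.sa[s]
        sb = self.sb[e] - self.sb[s]
        cov = n * (self.sab[e] - self.sab[s]) - sa * sb
        va = n * (self.saa[e] - self.saa[s]) - sa * sa
        vb = n * (self.sbb[e] - self.sbb[s]) - sb * sb
        if va <= 0 or vb <= 0:
            return 0.0  # assumption: a constant window is scored as 0
        return cov / math.sqrt(va * vb)

    def best_window(self, min_len=2):
        """Scan all O(n^2) windows; ties go to the longer window."""
        best = None
        for s in range(self.n):
            for e in range(s + min_len, self.n + 1):
                key = (self.corr(s, e), e - s)
                if best is None or key > best[0]:
                    best = (key, s, e)
        if best is None:
            return None
        return best[0][0], best[1], best[2]
```

For instance, `LagCorrelator(x, y, 2).corr(0, 6)` scores the first six aligned points of `x` against `y` shifted by two steps, and `best_window()` returns a `(correlation, start, end)` triple for the highest-scoring window at that lag.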
Notes
There are two extreme cases. In the first, there is no good match anywhere in the sequence. Since our query imposes no minimum correlation constraint, we still return the top-k subsequences with the highest lag correlations, even if all correlations are very low. In the second, all object sequences are entirely identical to the query sequences. By our problem definition, we return the longer subsequence when multiple correlations are identical; hence, in this case, we return the entire sequence.
Monte Carlo simulated stock price generator. http://25yearsofprogramming.com/blog/20070412c-montecarlostockprices.html.
Acknowledgements
We thank Yasushi Sakurai and Yuhong Li for providing us with the data sets. This work was supported by the National Program on Key Basic Research Project (973 Program) (2012CB725305), the National Science and Technology Supporting plan (2015BAH45F01), the public key plan of Zhejiang Province (2014C23005), the cultural relic protection science and technology project of Zhejiang Province, University of Macau RC (MYRG2014-00106-FST), and NSFC of China (61502548).
Cite this article
Wu, S., Lin, H., Wang, W. et al. RLC: ranking lag correlations with flexible sliding windows in data streams. Pattern Anal Applic 20, 601–611 (2017). https://doi.org/10.1007/s10044-016-0577-4
DOI: https://doi.org/10.1007/s10044-016-0577-4