RLC: ranking lag correlations with flexible sliding windows in data streams

Wu, Shanshan; Lin, Huaizhong; Wang, Wenxiang; Lu, Dongming; U, Leong Hou; Gao, Yunjun

doi:10.1007/s10044-016-0577-4


Industrial and Commercial Application
Published: 01 September 2016

Volume 20, pages 601–611 (2017)
Pattern Analysis and Applications

Abstract

Lag correlation between two time series is their correlation when one series is shifted in time relative to the other. Existing work focuses on two computation models: the landmark model, where the lag correlation is computed over the entire stream, and the sliding window model, where it is computed over the current window. However, these models may suffer from problems of result freshness (e.g., perished items in the landmark model) and parameter tuning (e.g., choosing a proper window length in the sliding window model). In this work, we analyze lag correlations computed over flexible sliding windows. To this end, we propose a new query called RLC (ranking lag correlations with flexible sliding windows in data streams). The key challenge in answering the RLC query is that the number of windows to be analyzed grows quadratically with the length of the stream, resulting in a quadratic computation cost. To speed up the computation, we employ the running sum and geometric probing techniques to facilitate query processing. We also present an approximate solution that further reduces the computation cost with an error rate that is acceptable in practice. Extensive experiments verify the efficiency of the proposed methods. We also demonstrate some lag correlations discovered in real datasets to show the practicality of this work.
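The running-sum idea can be illustrated with a small sketch. It assumes that lag correlation means the Pearson correlation between `x[s:e]` and `y[s+lag:e+lag]` for a non-negative `lag`. Under that assumption, five running (prefix) sums (of x, y, x², y², and the lag-aligned product xy) let the correlation of any window be read off in constant time after linear preprocessing. The class name `LagCorrelator`, the half-open window convention, and reporting 0 for constant windows are hypothetical choices for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def prefix_sums(values: Sequence[float]) -> List[float]:
    """Running sums: out[i] = values[0] + ... + values[i-1], with out[0] = 0."""
    out = [0.0]
    for v in values:
        out.append(out[-1] + v)
    return out


class LagCorrelator:
    """Hypothetical helper: for one fixed lag, answers the Pearson correlation
    between x[s:e] and y[s+lag:e+lag] in O(1) after O(n) preprocessing."""

    def __init__(self, x: Sequence[float], y: Sequence[float], lag: int):
        if lag < 0:
            raise ValueError("lag must be non-negative (y is shifted forward)")
        self.lag = lag
        # Number of aligned (x[t], y[t + lag]) pairs available.
        self.n = max(min(len(x), len(y) - lag), 0)
        xs = list(x[: self.n])
        ys = list(y[lag : lag + self.n])
        self.sx = prefix_sums(xs)
        self.sy = prefix_sums(ys)
        self.sxx = prefix_sums([a * a for a in xs])
        self.syy = prefix_sums([b * b for b in ys])
        self.sxy = prefix_sums([a * b for a, b in zip(xs, ys)])

    def corr(self, s: int, e: int) -> float:
        """Correlation over aligned positions s..e-1 (half-open window)."""
        if not 0 <= s < e <= self.n:
            raise IndexError("window out of range")
        m = e - s
        # Each window sum is a difference of two running sums.
        sx = self.sx[e] - self.sx[s]
        sy = self.sy[e] - self.sy[s]
        sxx = self.sxx[e] - self.sxx[s]
        syy = self.syy[e] - self.syy[s]
        sxy = self.sxy[e] - self.sxy[s]
        cov = sxy - sx * sy / m
        vx = sxx - sx * sx / m
        vy = syy - sy * sy / m
        if vx <= 1e-12 or vy <= 1e-12:
            return 0.0  # constant window: correlation undefined, report 0
        return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # y is x delayed by two steps, so the lag-2 correlation is 1.
    c = LagCorrelator([1, 2, 3, 4, 5, 6], [0, 0, 1, 2, 3, 4, 5, 6], lag=2)
    print(c.corr(0, 6))
```

For one lag, preprocessing is linear in the series length and every window query costs constant time; searching over lags would repeat the preprocessing once per candidate lag.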

Similar content being viewed by others

Fast correlation coefficient estimation algorithm for HBase-based massive time series data (Article, 18 June 2019)

Query Refinement for Correlation-Based Time Series Exploration

Efficient Computation of All-Window Length Correlations

Notes

Two extreme cases deserve mention. In the first, there is no good match anywhere in the sequence. Because our query imposes no minimum-correlation constraint, we still return the top-k subsequences with the highest lag correlations, even if all of those correlations are very low. In the second, all object sequences are identical to the query sequences. Our problem definition returns the longer subsequence when multiple correlations are tied, so in this case we return the entire sequence.
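The tie-breaking rule in this note can be made concrete with a brute-force baseline. This is not the RLC algorithm from the paper; it simply enumerates every window and every lag (the quadratic search space described in the abstract) using running sums, ranks candidates by correlation, and breaks ties in favour of the longer window. The function name `top_k_lag_correlations`, the parameters `max_lag` and `min_len`, the non-negative-lag convention, and rounding correlations to 9 decimals so that floating-point near-ties count as ties are all assumptions for illustration.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _prefix(values: Sequence[float]) -> List[float]:
    """Running sums with a leading 0."""
    out = [0.0]
    for v in values:
        out.append(out[-1] + v)
    return out


def _window_corr(P: List[List[float]], s: int, e: int) -> float:
    """Pearson correlation of window [s, e) from the five running sums in P."""
    sx, sy, sxx, syy, sxy = (p[e] - p[s] for p in P)
    m = e - s
    vx = sxx - sx * sx / m
    vy = syy - sy * sy / m
    if vx <= 1e-12 or vy <= 1e-12:
        return 0.0  # constant window: treat as uncorrelated
    return (sxy - sx * sy / m) / math.sqrt(vx * vy)


def top_k_lag_correlations(
    x: Sequence[float],
    y: Sequence[float],
    k: int,
    max_lag: int,
    min_len: int = 2,
) -> List[Tuple[float, int, int, int]]:
    """Hypothetical brute-force baseline. Returns up to k tuples
    (corr, start, end, lag), highest correlation first; among tied
    correlations the longer window wins. No minimum-correlation threshold."""
    candidates = []
    for lag in range(max_lag + 1):
        n = min(len(x), len(y) - lag)
        if n < min_len:
            continue
        xs, ys = list(x[:n]), list(y[lag : lag + n])
        P = [
            _prefix(xs),
            _prefix(ys),
            _prefix([a * a for a in xs]),
            _prefix([b * b for b in ys]),
            _prefix([a * b for a, b in zip(xs, ys)]),
        ]
        # Quadratically many windows per lag: every (s, e) with e - s >= min_len.
        for s in range(n):
            for e in range(s + min_len, n + 1):
                candidates.append((_window_corr(P, s, e), s, e, lag))
    # Rank by (rounded) correlation first, then by window length.
    return heapq.nlargest(k, candidates, key=lambda c: (round(c[0], 9), c[2] - c[1]))


if __name__ == "__main__":
    seq = [1, 3, 2, 5, 4, 6]
    # Identical sequences: every window correlates perfectly, so the
    # tie-break selects the entire sequence.
    print(top_k_lag_correlations(seq, list(seq), k=1, max_lag=0))
```

With `min_len = 2` every two-point window has correlation ±1, so a realistic minimum window length should be larger. The nested loops also make the quadratic growth in the number of windows explicit, which is the cost that the running-sum, geometric-probing, and approximate techniques mentioned in the abstract aim to reduce.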
Monte Carlo simulated stock price generator. http://25yearsofprogramming.com/blog/20070412c-montecarlostockprices.html.
http://www.pmel.noaa.gov/tao/.
http://finance.yahoo.com.

Acknowledgements

We thank Yasushi Sakurai and Yuhong Li for providing us with the datasets. This work was supported by National Program on Key Basic Research Project (973 Program) (2012CB725305), National Science and Technology Supporting plan (2015BAH45F01), the public key plan of Zhejiang Province (2014C23005), the cultural relic protection science and technology project of Zhejiang Province, University of Macau RC (MYRG2014-00106-FST), and NSFC of China (61502548).

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Shanshan Wu, Huaizhong Lin, Wenxiang Wang, Dongming Lu & Yunjun Gao
Faculty of Science and Technology, University of Macau, Macau, China
Leong Hou U

Corresponding author

Correspondence to Huaizhong Lin.

About this article

Cite this article

Wu, S., Lin, H., Wang, W. et al. RLC: ranking lag correlations with flexible sliding windows in data streams. Pattern Anal Applic 20, 601–611 (2017). https://doi.org/10.1007/s10044-016-0577-4

Received: 20 March 2016
Accepted: 18 August 2016
Published: 01 September 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10044-016-0577-4