Abstract
In this paper, we focus on the problem of exploring sequential data to discover time sub-intervals that satisfy certain pairwise correlation constraints. Differently than most existing works, we use the deviation from targeted pairwise correlation constraints as an objective to minimize in our problem. Moreover, we include users preferences as an objective in the form of maximizing similarity to users’ initial sub-intervals. The combination of these two objectives are prevalent in applications where users explore time series data to locate time sub-intervals in which targeted patterns exist. Discovering these sub-intervals among time series data is extremely useful in various application areas such as network and environment monitoring.
Towards finding the optimal sub-interval (i.e., optimal query) satisfying these objectives, we propose applying query refinement techniques to enable efficient processing of candidate queries. Specifically, we propose QFind, an efficient algorithm which refines a user’s initial query to discover the optimal query by applying novel pruning techniques. QFind applies two-level pruning techniques to safely skip processing unqualified candidate queries, and early abandon the computations of correlation for some pairs based on a monotonic property. We experimentally validate the efficiency of our proposed algorithm against state-of-the-art algorithm under different settings using real and synthetic data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Blyth, C.R.: On simpson’s paradox and the sure-thing principle. J. Am. Stat. Assoc. 67(338), 364–366 (1972)
Chaudhuri, S.: Generalization and a framework for query modification. In: Proceedings of the Sixth International Conference on Data Engineering, Los Angeles, California, USA, 5–9 February 1990, pp. 138–145 (1990)
Gavrilov, M., Anguelov, D., Indyk, P., Motwani, R.: Mining the stock market (extended abstract): which measure is best? In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20–23 August 2000, pp. 487–496 (2000)
Guo, T., Sathe, S., Aberer, K.: Fast distributed correlation discovery over streaming time-series data. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, VIC, Australia, 19–23 October 2015, pp. 1161–1170 (2015)
Li, Y., Huo U, L., Yiu, M.L., Gong, Z.: Efficient discovery of longest-lasting correlation in sequence databases. VLDB J. 25(6), 767–790 (2016)
Lin, J., Keogh, E.J., Lonardi, S., Lankford, J.P., Nystrom, D.M.: Visually mining and monitoring massive time series. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, 22–25 August 2004, pp. 460–469 (2004)
Liu, J., Terzis, A.: Sensing data centres for energy efficiency. Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 370(1958), 136–157 (2012)
Matsubara, Y., Sakurai, Y., Ueda, N., Yoshikawa, M.: Fast and exact monitoring of co-evolving data streams. In: 2014 IEEE International Conference on Data Mining, ICDM 2014, Shenzhen, China, 14–17 December 2014, pp. 390–399 (2014)
Mueen, A., Nath, S., Liu, J.: Fast approximate correlation for massive time-series data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, 6–10 June 2010, pp. 171–182 (2010)
Palpanas, T.: Data series management: the road to big sequence analytics. SIGMOD Rec. 44(2), 47–52 (2015)
Pelkonen, T., Franklin, S., Cavallaro, P., Huang, Q., Meza, J., Teller, J., Veeraraghavan, K.: Gorilla: a fast, scalable, in-memory time series database. PVLDB 8(12), 1816–1827 (2015)
Rakthanmanon, T., Campana, B.J.L., Mueen, A., Batista, G.E.A.P.A., Westover, M.B., Zhu, Q., Zakaria, J., Keogh, E.J.: Searching and mining trillions of time series subsequences under dynamic time warping. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012, Beijing, China, 12–16 August 2012, pp. 262–270 (2012)
Reeves, G., Liu, J., Nath, S., Zhao, F.: Managing massive time series streams with multiscale compressed trickles. PVLDB 2(1), 97–108 (2009)
Reiss, C., Wilkes, J., Hellerstein, J.L.: Google cluster-usage traces: format + schema. Technical report, Google Inc., Mountain View, CA, USA, November 2011
Sakurai, Y., Papadimitriou, S., Faloutsos, C.: BRAID: stream mining through group lag correlations. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, 14–16 June 2005, pp. 599–610 (2005)
Tabachnick, B.G., Fidell, L.S.: Using Multivariate Statistics, 5th edn. Allyn & Bacon Inc., Needham Heights (2006)
Utomo, C., Li, X., Wang, S.: Classification based on compressive multivariate time series. In: Cheema, M.A., Zhang, W., Chang, L. (eds.) ADC 2016. LNCS, vol. 9877, pp. 204–214. Springer, Cham (2016). doi:10.1007/978-3-319-46922-5_16
Vartak, M., Raghavan, V., Rundensteiner, E.A.: Qrelx: generating meaningful queries that provide cardinality assurance. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, 6–10 June 2010, pp. 1215–1218 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Albarrak, A.M., Sharaf, M.A. (2017). Query Refinement for Correlation-Based Time Series Exploration. In: Huang, Z., Xiao, X., Cao, X. (eds) Databases Theory and Applications. ADC 2017. Lecture Notes in Computer Science(), vol 10538. Springer, Cham. https://doi.org/10.1007/978-3-319-68155-9_4
Download citation
DOI: https://doi.org/10.1007/978-3-319-68155-9_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68154-2
Online ISBN: 978-3-319-68155-9
eBook Packages: Computer ScienceComputer Science (R0)