Abstract
Numerous applications continuously produce big amounts of data series, and in several time critical scenarios analysts need to be able to query these data as soon as they become available. This, however, is not currently possible with the state-of-the-art indexing methods and for very large data series collections. In this paper, we present the first adaptive indexing mechanism, specifically tailored to solve the problem of indexing and querying very large data series collections. We present a detailed design and evaluation of our method using approximate and exact query algorithms with both synthetic and real data sets. Adaptive indexing significantly outperforms previous solutions, gracefully handling large data series collections, reducing the data to query delay: By the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), our method has already answered \(3*10^5\) queries.






















Similar content being viewed by others
Notes
This paper is an extended version of [22]. It describes an exact search algorithm and a new full index construction method, both outperforming the state of the art. It also includes more detailed discussions and additional experiments.
References
Huijse, P., Estévez, P.A., Protopapas, P., Principe, J.C., Zegers, P.: Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Comput. Intell. Mag. 9(3), 27–39 (2014)
Kashino, K., Smith, G., Murase, H.: Time-series active search for quick retrieval of audio and video. In: ICASSP (1999)
Raza, U., Camerra, A., Murphy, A.L., Palpanas, T., Picco, G.P.: Practical data prediction for real-world wireless sensor networks. IEEE Trans. Knowl. Data Eng. 27(8), 2231–2244 (2015)
Shasha, D.: Tuning time series queries in finance: case studies and recommendations. IEEE Data Eng. Bull. 22(2), 40–46 (1999)
Ye, L., Keogh, E.J.: Time series shapelets: a new primitive for data mining. In: KDD (2009)
Bu, Y., Wing L.T., Chee F.A.W., Keogh, E., Pei, J., Meshkin, S.: Wat: finding top-k discords in time series database. In: SDM (2007)
Dallachiesa, M., Nushi, B., Mirylenka, K., Palpanas, T.: Uncertain time-series similarity: return to the basics. PVLDB 5(11), 1662–1673 (2012)
Dallachiesa, M., Palpanas, T., Ilyas, I.F.: Top-k nearest neighbor search in uncertain data series. PVLDB 8(1), 13–24 (2014)
Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., Keogh, E.: Searching and mining trillions of time series subsequences under dynamic time warping. In: KDD (2012)
Rodrigues, P., Gama, J., Pedroso, J.: Hierarchical clustering of time-series data streams. IEEE Trans. Knowl. Data Eng. 20(5), 615–627 (2008)
Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S.: A data-adaptive and dynamic segmentation index for whole matching on time series. PVLDB 6(10), 793–804 (2013)
Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: indexing and mining one billion time series. In: ICDM (2010)
QualiMaster a configurable real-time data processing infrastructure mastering autonomous quality adaptation—deliverable D1.1: initial use cases and requirements. Technical report, QualiMaster Project (2014)
Rogers, S.: Big data is scaling bi and analytics Information Management. http://www.information-management.com/issues/21_5/big-data-is-scaling-bi-and-analytics-10021093-1.html (2011). Accessed 28 Aug 2016
Adhd-200. http://fcon_1000.projects.nitrc.org/indi/adhd200/ (2011)
Sloan digital sky survey. https://www.sdss3.org/dr10/data_access/volume.php (2015)
Idreos, S., Alagiannis, I., Johnson, R., Ailamaki, A.: Here are my data files. Here are my queries. Where are my results? In: CIDR (2011)
Idreos, S., Liarou, E.: dbtouch: analytics at your fingertips. In: CIDR (2013)
Guttman, A.: R-trees a dynamic structure for spatial searching. In: SIGMOD (1984)
Berchtold, S., Keim, D.A., Kriegel, H.P.: The X-tree: an index structure for high-dimensional data. In: VLDB (1996)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Zoumpatianos, K., Idreos, S., Palpanas, T.: Indexing for interactive exploration of big data series. In: SIGMOD (2014)
Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient similarity search in sequence databases. In: FODO Conference (1993)
Keogh, E.J., Pazzani, M.J.: An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In: KDD (1998)
Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: Time series epenthesis: clustering time series streams requires ignoring some data. In: ICDE (2011)
Warren, T.W.: Clustering of time series data—a survey. Pattern Recognit. 38(11), 1857–1874 (2005)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)
Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.J.: Experimental comparison of representation methods and distance measures for time series data. DMKD 26(2), 275–309 (2013)
Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: SIGMOD (2005)
Vlachos, M., Gunopulos, D., Kollios, G.: Discovering similar multidimensional trajectories. In: ICDE (2002)
Palpanas, T., Vlachos, M., Keogh, E.J., Gunopulos, D.: Streaming time series summarization using user-defined amnesic functions. TKDE 20(7), 992–1006 (2008)
Palpanas, T., Vlachos, M., Keogh, E.J., Gunopulos, D., Truppel, W.: Online amnesic approximation of streaming time series. In: ICDE, pp. 339–349 (2004)
Chan, K.P., Fu, A.C.: Efficient time series matching by wavelets. In: ICDE (1999)
Keogh, E., Chakrabarti, K., Pazzani, M.: Dimensionality reduction for fast similarity search in large time series databases. KAIS 3(3), 263–286 (2000)
Yi, B., Faloutsos, C.: Fast time sequence indexing for arbitrary lp norms. In: VLDB (2000)
Lin, J., Keogh, E., Lonardi, S.: A symbolic representation of time series, with implications for streaming algorithms. In: DMKD, pp. 2–11 (2003)
Assent, I., Krieger, R., Afschari, F., Seidl, T.: The TS-tree: efficient time series search and retrieval. In: EDBT (2008)
Shieh, J., Keogh, E.: iSAX: indexing and mining terabyte sized time series. In: KDD (2008)
Shieh, J., Keogh, E.: iSAX: disk-aware mining and indexing of massive time series datasets. DMKD 19(1), 24–57 (2009)
Graefe, G., Halim, F., Idreos, S., Kuno, H.A., Manegold, S.: Concurrency control for adaptive indexing. PVLDB 5(7), 656–667 (2012)
Graefe, G., Halim, F., Idreos, S., Kuno, H.A., Manegold, S., Seeger, B.: Transactional support for adaptive indexing. VLDB J. 23(2), 303–328 (2014)
Halim, F., Idreos, S., Karras, P., Yap, R.H.C.: Stochastic database cracking: towards robust adaptive indexing in main-memory column-stores. PVLDB 5(6), 502–513 (2012)
Idreos, S., Kersten, M.L., Manegold, S.: Updating a cracked database. In: SIGMOD, pp. 413–424 (2007)
Idreos, S., Kersten, M.L., Manegold, S.: Database cracking. In: CIDR (2007)
Idreos, S., Kersten, M.L., Manegold, S.: Self-organizing tuple reconstruction in column-stores. In: SIGMOD (2009)
Idreos, S., Manegold, S., Kuno, H.A., Graefe, G.: Merging what’s cracked, cracking what’s merged: adaptive indexing in main-memory column-stores. PVLDB 4(9), 585–597 (2011)
Schuhknecht, F.M., Jindal, A., Dittrich, J.: The uncracked pieces in database cracking. PVLDB 7(2), 97–108 (2013)
Richter, S., Quiane-Ruiz, J.-A., Schuh, S., Dittrich, J.: Towards zero-overhead static and adaptive indexing in hadoop. VLDBJ 23(3), 469–494 (2013)
Zhou, J., Ross, K.A.: Buffering accesses to memory-resident index structures. In: VLDB (2003)
Zhou, J., Ross, K.A., Buffering database operations for enhanced instruction cache performance. In: SIGMOD (2004)
Stonebraker, M.: The case for partial indexes. SIGMOD Rec. 18(4), 4–11 (1989)
Achakeev, D., Seeger, B.: Efficient bulk updates on multiversion b-trees. PVLDB 6(14), 1834–1845 (2013)
Ghanem, T.M., Shah, R., Mokbel, M.F., Aref, W.G., Vitter, J.S.: Bulk operations for space-partitioning trees. In: ICDE (2004)
Zoumpatianos, K., Lou, Y., Palpanas, T., Gehrke, J.: Query workloads for data series indexes. In: KDD (2015)
Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: SIGMOD (1994)
Rafiei, D., Mendelzon, A.: Similarity-based queries for time series data. In: SIGMOD, pp. 13–25 (1997)
Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. TPAMI 33(1), 117–128 (2011)
Camerra, A., Shieh, J., Palpanas, T., Rakthanmanon, T., Keogh, E.: Beyond one billion time series: indexing and mining very large time series collections with iSAX2+. KAIS 39(1), 123–151 (2014)
Incorporated Research Institutions for Seismology—Seismic Data Access. http://ds.iris.edu/data/access/ (2016)
Soldi, S., Beckmann, V., Baumgartner, W., Ponti, G., Shrader, C.R., Lubiński, P., Krimm, H., Mattana, F., Tueller, J.: Long-term variability of agn at hard X-rays. Astron. Astrophys. 563, A57 (2014)
Kashyap, S., Karras, P.: Scalable kNN search on vertically stored time series. In: KDD (2011)
Palpanas, T.: Data series management: the road to big sequence analytics. SIGMOD Rec. 44(2), 47–52 (2015)
Zoumpatianos, K., Idreos, S., Palpanas, T.: RINSE: interactive data series exploration with ADS+. PVLDB 8(12), 1912–1923 (2015)
du Mouza, C., Litwin, W., Rigaux, P.: SD-Rtree: a scalable distributed rtree. In: ICDE (2007)
Wang, J., Wu, S., Gao, H., Li, J., Ooi, B.C,: Indexing multi-dimensional data in a cloud system. In: SIGMOD (2010)
Xie, Y., Palsetia, D., Trajcevski, G., Agrawal, A., Choudhary, A.N.: SILVERBACK: scalable association mining for temporal data in columnar probabilistic databases. In: ICDE (2014)
Acknowledgments
We would like to thank Prof. Volker Beckmann for providing us the Astro data set [60].
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zoumpatianos, K., Idreos, S. & Palpanas, T. ADS: the adaptive data series index. The VLDB Journal 25, 843–866 (2016). https://doi.org/10.1007/s00778-016-0442-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-016-0442-5