Abstract
Sensor-based applications have recently emerged and now collect large volumes of long time series. Traditional whole-matching similarity search can only query full-length time series. For long time series, however, similarity search over arbitrary time windows is more useful. In this paper, we address the problem of window-based KNN search over time series data stored in HBase. Based on the PAA approximation, we propose a composite index structure comprising a Horizontal Segment Tree (HS-Tree) and a Vertical Inverted Table (VI-Table). The VI-Table prunes time series using high-level data summaries, while the HS-Tree uses low-level data summaries to reduce access to the raw time series data. Both the VI-Table and the HS-Tree can be built in parallel and incrementally. Our experimental results show the effectiveness and robustness of the proposed approach.
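To make the role of PAA summaries in window-based KNN search concrete, the following is a minimal sketch, not the paper's HS-Tree or VI-Table. It assumes Euclidean distance, windows whose length is divisible by the number of segments, and in-memory data rather than HBase; the function names `paa`, `paa_lower_bound`, and `window_knn` are hypothetical. It relies on the standard property that the PAA-based distance never exceeds the true Euclidean distance, so a candidate whose lower bound is already no better than the current k-th best can be skipped without reading its raw values.

```python
import heapq
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(paa_a, paa_b, width):
    """sqrt(width) * ||paa_a - paa_b|| lower-bounds the true Euclidean distance."""
    return math.sqrt(width) * euclidean(paa_a, paa_b)


def window_knn(query, candidates, start, k, segments):
    """Brute-force KNN over the window [start, start + len(query)) with PAA pruning.

    `candidates` maps a series id to its full list of values (hypothetical layout).
    A real index would precompute the summaries; here they are computed on the fly.
    """
    length = len(query)
    width = length // segments
    q_paa = paa(query, segments)
    best = []  # max-heap on distance, stored as (-distance, series_id)
    for sid, series in candidates.items():
        window = series[start:start + length]
        lb = paa_lower_bound(q_paa, paa(window, segments), width)
        if len(best) == k and lb >= -best[0][0]:
            continue  # pruned: the summary alone rules this candidate out
        d = euclidean(query, window)
        if len(best) < k:
            heapq.heappush(best, (-d, sid))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, sid))
    return sorted((-neg_d, sid) for neg_d, sid in best)


if __name__ == "__main__":
    data = {
        "a": [0, 0, 1, 1, 2, 2, 3, 3],
        "b": [5, 5, 5, 5, 5, 5, 5, 5],
        "c": [0, 0, 3, 3, 4, 4, 9, 9],
    }
    print(window_knn([1, 1, 2, 2], data, start=2, k=2, segments=2))
```

In this sketch the pruning test touches only the summaries, which is the general idea behind using coarse summaries first and raw data last; how the paper organizes those summaries across levels is not reproduced here.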
The work was supported by the Ministry of Science and Technology of China, National Key Research and Development Program under No. 2016YFB1000700, National Key Basic Research Program of China under No. 2015CB358800 and NSFC (61672163, U1509213).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Wang, X., Fang, Z., Wang, P., Zhu, R., Wang, W. (2017). A Distributed Multi-level Composite Index for KNN Processing on Long Time Series. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_14
DOI: https://doi.org/10.1007/978-3-319-55753-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)