Abstract
There is much interest in processing data streams for applications in fields such as financial analysis, network monitoring, mobile services, and sensor network management. The key characteristic of stream data, namely that it arrives continuously, demands a new approach. This paper focuses on the problem of exactly detecting similar pairs of subsequences of arbitrary length in a streaming fashion. We propose DAPSS (DAta stream Processing for Store and Search), an efficient and effective method for detecting such pairs, which keeps (1) the feature data of each sequence in memory and (2) the compressed data of the original sequences on disk. Experiments on synthetic and real data sets show that DAPSS is significantly (up to 35 times) faster than the naive method while guaranteeing the correctness of query results.
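The abstract does not spell out how the in-memory feature data and the compressed on-disk data work together. The sketch below illustrates the generic filter-and-refine pattern that such a split typically supports, and it is offered only as an illustration, not as the authors' algorithm. A lower-bounding summary held in memory prunes candidate pairs, and only the survivors are decompressed and compared exactly, so no qualifying pair is missed. The following are all assumptions: a fixed window length `w` (the paper handles arbitrary lengths), Piecewise Aggregate Approximation (PAA) as the feature, Euclidean distance with threshold `eps`, `zlib` as the compressor, and a Python dict standing in for disk. The class `StreamPairDetector` and its members are hypothetical names.

```python
import math
import struct
import zlib
from collections import deque


def paa(window, m):
    """Piecewise Aggregate Approximation: the mean of each of m equal-length segments."""
    k = len(window) // m
    return [sum(window[j * k:(j + 1) * k]) / k for j in range(m)]


def paa_lower_bound(fa, fb, k):
    """sqrt(k * sum((a_j - b_j)^2)) never exceeds the Euclidean distance of the raw windows."""
    return math.sqrt(k * sum((a - b) ** 2 for a, b in zip(fa, fb)))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamPairDetector:
    """Hypothetical filter-and-refine detector (assumed design, not DAPSS itself)."""

    def __init__(self, w, m, eps):
        if w % m != 0:
            raise ValueError("window length w must be a multiple of m")
        self.w, self.m, self.k, self.eps = w, m, w // m, eps
        self.features = {}    # (stream_id, start) -> PAA vector, kept in memory
        self.archive = {}     # (stream_id, start) -> zlib-compressed window (stand-in for disk)
        self.buffers = {}     # stream_id -> sliding buffer of the latest w values
        self.counts = {}      # stream_id -> number of values received so far
        self.refinements = 0  # exact distance computations that survived the filter

    def _load(self, key):
        # Decompress an archived window back into w doubles.
        return list(struct.unpack(f"{self.w}d", zlib.decompress(self.archive[key])))

    def append(self, stream_id, value):
        """Add one value; return keys of earlier windows on other streams within eps."""
        buf = self.buffers.setdefault(stream_id, deque(maxlen=self.w))
        buf.append(value)
        self.counts[stream_id] = self.counts.get(stream_id, 0) + 1
        if len(buf) < self.w:
            return []
        window = list(buf)
        feat = paa(window, self.m)
        matches = []
        for other, ofeat in self.features.items():
            if other[0] == stream_id:
                continue  # only cross-stream pairs, to skip trivial overlapping matches
            if paa_lower_bound(feat, ofeat, self.k) > self.eps:
                continue  # safe prune: the true distance is at least the lower bound
            self.refinements += 1
            if euclidean(window, self._load(other)) <= self.eps:
                matches.append(other)
        # Store the new window: features in memory, compressed raw values in the archive.
        key = (stream_id, self.counts[stream_id] - self.w)
        self.features[key] = feat
        self.archive[key] = zlib.compress(struct.pack(f"{self.w}d", *window))
        return matches
```

PAA has a standard property: for segments of length `k`, `k` times the squared difference of segment means never exceeds the summed squared differences inside that segment. Pruning on this bound therefore cannot cause false dismissals. The `refinements` counter records how many pairs still needed an exact check. Any speed-up over comparing every pair would come from keeping that number small.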
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fujiwara, Y., Sakurai, Y., Yamamuro, M. (2006). DAPSS: Exact Subsequence Matching for Data Streams. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_8
DOI: https://doi.org/10.1007/11733836_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science, Computer Science (R0)