DAPSS: Exact Subsequence Matching for Data Streams

Fujiwara, Yasuhiro; Sakurai, Yasushi; Yamamuro, Masashi

doi:10.1007/11733836_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

There is much interest in processing data streams for applications in fields such as financial analysis, network monitoring, mobile services, and sensor network management. The key characteristic of stream data, namely that it arrives continuously, demands a new approach. This paper focuses on the problem of exactly detecting similar pairs of subsequences of arbitrary length in a streaming fashion. We propose DAPSS (DAta stream Processing for Store and Search), an efficient and effective method for detecting similar pairs, which keeps (1) the feature data of each sequence in memory and (2) the compressed data of the original sequences on disk. Experiments on synthetic and real data sets show that DAPSS is significantly faster than the naive method (up to 35 times) while guaranteeing the correctness of query results.
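The abstract does not say which features or compression scheme DAPSS uses, so the following is only a minimal sketch of the general store-and-search idea, not the authors' algorithm. It assumes fixed-length windows whose length is a multiple of a hypothetical segment size `SEG`, uses piecewise aggregate means as the in-memory feature, simulates the on-disk copy with zlib-compressed bytes, and assumes Euclidean distance with a threshold `eps`. All function names are illustrative. Because the feature distance is a lower bound on the true distance, pruning on it cannot discard a true match, and every surviving candidate is verified against the decompressed original.

```python
import math
import struct
import zlib

SEG = 4  # hypothetical segment length for the in-memory feature (an assumption)


def paa(window):
    """Piecewise aggregate feature: the mean of each SEG-long segment."""
    return [sum(window[i:i + SEG]) / SEG for i in range(0, len(window), SEG)]


def lower_bound(fa, fb):
    """sqrt(SEG) * ||fa - fb||, which never exceeds the true Euclidean distance."""
    return math.sqrt(SEG * sum((a - b) ** 2 for a, b in zip(fa, fb)))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pack(window):
    """Losslessly compress a window; stands in for the on-disk copy."""
    return zlib.compress(struct.pack(f"{len(window)}d", *window))


def unpack(blob, n):
    return list(struct.unpack(f"{n}d", zlib.decompress(blob)))


def similar_pairs(windows, eps):
    """Return index pairs (i, j), i < j, whose Euclidean distance is <= eps."""
    n = len(windows[0])
    features = [paa(w) for w in windows]  # kept "in memory"
    disk = [pack(w) for w in windows]     # kept "on disk" (simulated)
    result = []
    for i in range(len(windows)):
        for j in range(i + 1, len(windows)):
            if lower_bound(features[i], features[j]) > eps:
                continue  # safe to prune: the true distance is at least the bound
            # Verify the candidate against the decompressed originals.
            if euclidean(unpack(disk[i], n), unpack(disk[j], n)) <= eps:
                result.append((i, j))
    return result
```

For example, `similar_pairs([[0.0] * 8, [0.1] * 8, [5.0] * 8], 1.0)` returns `[(0, 1)]`: the two pairs involving the third window are pruned from the features alone, and only the first pair requires decompressing the stored data.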

Author information

Authors and Affiliations

NTT Cyber Space Laboratories, NTT Corporation, 1-1 Hikarinooka, Yokosuka-Shi, Kanagawa, 239-0847, Japan
Yasuhiro Fujiwara, Yasushi Sakurai & Masashi Yamamuro

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, Singapore
Mong Li Lee
School of Computing, National University of Singapore, Singapore
Kian-Lee Tan
School of Engineering and Technology, Asian Institute of Technology, P.O. Box 4, 12120, Klong Luang, Pathum Thani, Thailand
Vilas Wuwongse

About this paper

Cite this paper

Fujiwara, Y., Sakurai, Y., Yamamuro, M. (2006). DAPSS: Exact Subsequence Matching for Data Streams. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_8

DOI: https://doi.org/10.1007/11733836_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science, Computer Science (R0)
