Abstract
Filtering queries are widely used in data stream applications. As more and more filtering queries are registered in a high-speed data stream management system, processing efficiency becomes crucial. This paper presents an efficient query index structure based on a decision tree. The index structure makes full use of predicate indices on single attributes, as well as the conjunctive relationship between the predicates of a single query, and various predicate indices can be integrated into it easily. The choice of dividing attributes during construction is crucial to the performance of the index tree. Two dividing-attribute selection algorithms are described: one based on information gain (IG) and the other based on estimated time cost (ETC). The latter uses sample tuples as a training data set and is able to build more efficient trees, as our experiments demonstrate.
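To make the idea of a decision-tree query index concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that every query is a conjunction of equality predicates, that each query is treated as its own class when computing an ID3-style information gain, and that queries with no predicate on the dividing attribute are kept in a separate wildcard branch. The names info_gain, build and match are hypothetical, and ETC-based selection is not shown.

```python
import math
from collections import defaultdict

# Assumption: a query is a conjunction of equality predicates,
# written as {attribute: required_value}.

def info_gain(queries, attr):
    """ID3-style gain with every query treated as its own class.

    With n distinct classes the entropy is log2(n); after dividing on
    attr, each group of size c has entropy log2(c).
    """
    n = len(queries)
    sizes = defaultdict(int)
    for preds in queries.values():
        sizes[preds.get(attr)] += 1  # None groups queries without a predicate on attr
    remainder = sum(c / n * math.log2(c) for c in sizes.values())
    return math.log2(n) - remainder

def build(queries, attrs):
    """Build the index tree. attrs lists attributes not yet used on this path."""
    if len(queries) <= 1 or not attrs:
        return {"leaf": dict(queries)}
    best = max(attrs, key=lambda a: info_gain(queries, a))
    rest = [a for a in attrs if a != best]
    by_value, wildcard = defaultdict(dict), {}
    for qid, preds in queries.items():
        if best in preds:
            by_value[preds[best]][qid] = preds
        else:
            wildcard[qid] = preds  # no predicate on the dividing attribute
    return {
        "attr": best,
        "children": {v: build(qs, rest) for v, qs in by_value.items()},
        "wildcard": build(wildcard, rest) if wildcard else None,
    }

def match(node, tup):
    """Return the ids of all queries satisfied by the tuple."""
    if node is None:
        return set()
    if "leaf" in node:
        # Re-check every predicate, since a leaf may be reached before
        # all attributes of its queries have been tested.
        return {qid for qid, preds in node["leaf"].items()
                if all(tup.get(a) == v for a, v in preds.items())}
    hits = match(node["wildcard"], tup)
    child = node["children"].get(tup.get(node["attr"]))
    return hits | match(child, tup)

queries = {
    "q1": {"proto": "tcp", "port": 80},
    "q2": {"proto": "tcp", "port": 22},
    "q3": {"proto": "udp"},
    "q4": {"port": 80},
}
tree = build(queries, ["proto", "port"])
print(sorted(match(tree, {"proto": "tcp", "port": 80})))  # ['q1', 'q4']
```

A tuple follows only the branch that matches its value on the dividing attribute, plus the wildcard branch, so queries whose predicate on that attribute cannot hold are never examined. In this sketch a different selection criterion would only replace the scoring function passed to max.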
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Bai, S., Tan, J., Guo, L. (2006). Efficient Filtering Query Indexing in Data Stream. In: Feng, L., Wang, G., Zeng, C., Huang, R. (eds) Web Information Systems – WISE 2006 Workshops. WISE 2006. Lecture Notes in Computer Science, vol 4256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11906070_1
DOI: https://doi.org/10.1007/11906070_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47663-4
Online ISBN: 978-3-540-47664-1
eBook Packages: Computer Science, Computer Science (R0)