Finding hot query patterns over an XQuery stream

Yang, Liang Huai; Lee, Mong Li; Hsu, Wynne

doi:10.1007/s00778-004-0134-4

Published: December 2004

The VLDB Journal, Volume 13, pages 318–332 (2004)

Liang Huai Yang¹,², Mong Li Lee¹ & Wynne Hsu¹

Abstract.

Caching query results is one efficient approach to improving the performance of XML management systems. This entails the discovery of frequent XML queries issued by users. In this paper, we model user queries as a stream of XML query pattern trees and mine the frequent query patterns over the query stream. To facilitate the one-pass mining process, we devise a novel data structure called DTS to summarize the pattern trees seen so far. By grouping the incoming pattern trees into batches, we can dynamically mark the active portion of the current batch in DTS and limit the enumeration of candidate trees to only the currently active pattern trees. We also design another summary data structure called ECTree that provides for the incremental computation of the frequent tree patterns over the query stream. Based on the above two constructs, we present two mining algorithms called XQSMinerI and XQSMinerII. XQSMinerI is fast, but it tends to overestimate, while XQSMinerII adopts a filter-and-refine approach to minimize the amount of overestimation. Experimental results show that the proposed methods are both efficient and scalable and require only small memory footprints.
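The batch-wise, one-pass counting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' DTS or ECTree, and it is neither XQSMinerI nor XQSMinerII: it swaps in the generic lossy counting scheme for approximate frequency counts over a stream, and it counts whole pattern trees only, without enumerating candidate subtrees. The tree representation `(label, [children])`, the `canonical` helper, the class name `LossyPatternCounter`, and the choice to ignore sibling order are all assumptions made for illustration.

```python
import math


def canonical(tree):
    """Encode a pattern tree (label, [children]) as a string.

    Children are sorted so that sibling order does not matter
    (an assumption of this sketch, not something stated in the abstract).
    """
    label, children = tree
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"


class LossyPatternCounter:
    """Approximate one-pass counter using generic lossy counting.

    Hypothetical illustration only; it is not the DTS/ECTree design.
    """

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # batch (bucket) size
        self.n = 0                           # pattern trees seen so far
        self.entries = {}                    # key -> [count, max undercount]

    def add(self, tree):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        key = canonical(tree)
        if key in self.entries:
            self.entries[key][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 earlier batches.
            self.entries[key] = [1, bucket - 1]
        if self.n % self.width == 0:
            # End of a batch: drop entries that cannot be frequent any more.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Return patterns whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


if __name__ == "__main__":
    hot = ("book", [("title", []), ("author", [])])
    hot_reordered = ("book", [("author", []), ("title", [])])
    rare = [("journal", []), ("person", [("name", [])]), ("price", [])]
    stream = [hot, hot_reordered, hot, rare[0], hot,
              hot_reordered, rare[1], hot, hot, rare[2]]

    counter = LossyPatternCounter(epsilon=0.2)
    for query_tree in stream:
        counter.add(query_tree)
    print(counter.frequent(support=0.5))
```

With lossy counting, a stored count can fall below the true count by at most `epsilon * n`, so memory stays bounded at the cost of a controlled error. The abstract describes a different trade-off, in which XQSMinerI tends to overestimate and XQSMinerII refines the result.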

Author information

Authors and Affiliations

School of Computing, National University of Singapore, Singapore
Liang Huai Yang, Mong Li Lee & Wynne Hsu
School of Electronics Engineering and Computer Science, Peking University, P.R. China
Liang Huai Yang

Corresponding author

Correspondence to Liang Huai Yang.

Additional information

Received: 17 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004

Edited by: J. Gehrke and J. Hellerstein.

About this article

Cite this article

Yang, L.H., Lee, M.L. & Hsu, W. Finding hot query patterns over an XQuery stream. The VLDB Journal 13, 318–332 (2004). https://doi.org/10.1007/s00778-004-0134-4
