
Finding hot query patterns over an XQuery stream

The VLDB Journal

Abstract.

Caching query results is an efficient way to improve the performance of XML management systems, but it requires discovering which XML queries users issue frequently. In this paper, we model user queries as a stream of XML query pattern trees and mine the frequent query patterns over this query stream. To support one-pass mining, we devise a novel data structure called DTS that summarizes the pattern trees seen so far. By grouping the incoming pattern trees into batches, we can dynamically mark the active portion of the current batch in DTS and restrict the enumeration of candidate trees to the currently active pattern trees. We also design a second summary data structure, ECTree, that supports incremental computation of the frequent tree patterns over the query stream. Based on these two constructs, we present two mining algorithms, XQSMinerI and XQSMinerII. XQSMinerI is fast but tends to overestimate, whereas XQSMinerII adopts a filter-and-refine approach to minimize the amount of overestimation. Experimental results show that both methods are efficient and scalable and require only small memory footprints.
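The core task described above, counting frequent tree patterns in one pass over a stream of query pattern trees processed in batches, can be illustrated with a minimal Python sketch. It is not the paper's DTS or ECTree structures, nor XQSMinerI or XQSMinerII. As a stand-in, it enumerates the rooted subtrees of each query and prunes low counts at batch boundaries in the style of Manku and Motwani's lossy counting. The tree encoding (label, children), the names rooted_subtrees and LossyPatternCounter, and the simplifications that every pattern shares the query root and that // and * edges are ignored are all assumptions made for illustration.

```python
from itertools import product
from math import ceil
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of a query pattern tree: (element_label, [child_trees]).
# Labels are assumed to be plain names without "(", ")" or ",".
Tree = Tuple[str, list]


def rooted_subtrees(node: Tree) -> Set[str]:
    """Canonical strings of all subtrees that keep `node` as their root.

    Each child is either dropped or replaced by one of its own rooted
    subtrees; sibling strings are sorted so child order does not matter.
    """
    label, children = node
    options = [[None] + sorted(rooted_subtrees(c)) for c in children]
    patterns = set()
    for choice in product(*options):
        kept = sorted(s for s in choice if s is not None)
        patterns.add(label + ("(" + ",".join(kept) + ")" if kept else ""))
    return patterns


class LossyPatternCounter:
    """One-pass approximate support counts in the style of lossy counting."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon
        self.width = ceil(1 / epsilon)  # queries per batch
        self.n = 0  # queries seen so far
        self.table: Dict[str, List[int]] = {}  # pattern -> [count, max_undercount]

    def add(self, query: Tree) -> None:
        self.n += 1
        batch = (self.n - 1) // self.width + 1  # current batch id, starting at 1
        for p in rooted_subtrees(query):  # a set: each pattern counts once per query
            entry = self.table.get(p)
            if entry is not None:
                entry[0] += 1
            else:
                # A pattern first seen now may have been pruned earlier
                # after at most batch - 1 occurrences.
                self.table[p] = [1, batch - 1]
        if self.n % self.width == 0:  # end of batch: drop low-count patterns
            self.table = {p: e for p, e in self.table.items() if e[0] + e[1] > batch}

    def frequent(self, support: float) -> Dict[str, int]:
        """Patterns whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {p: e[0] for p, e in self.table.items() if e[0] >= threshold}


# Usage: a toy stream of three query shapes repeated four times.
q1 = ("book", [("title", []), ("author", [])])
q2 = ("book", [("title", [])])
q3 = ("book", [("author", []), ("title", [])])
counter = LossyPatternCounter(epsilon=0.1)
for q in [q1, q2, q3] * 4:
    counter.add(q)
print(sorted(counter.frequent(0.9)))  # ['book', 'book(title)']
```

This stand-in also errs on the side of overestimation: frequent(s) returns every pattern whose true support is at least sN, but it may also return patterns whose support is as low as (s - ε)N. Enumerating every rooted subtree is exponential in the size of a query tree, so the sketch is only practical for small queries.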



Author information

Correspondence to Liang Huai Yang.

Additional information

Received: 17 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004

Edited by: J. Gehrke and J. Hellerstein.


About this article

Cite this article

Yang, L.H., Lee, M.L. & Hsu, W. Finding hot query patterns over an XQuery stream. The VLDB Journal 13, 318–332 (2004). https://doi.org/10.1007/s00778-004-0134-4


  • DOI: https://doi.org/10.1007/s00778-004-0134-4
