Abstract
One effective way to improve the performance of XML management systems is to cache frequently retrieved results. This requires discovering the frequent query patterns that users issue. In this paper, we model user queries as a stream of XML query pattern trees and mine for frequent query patterns in a batch-wise manner. We design a novel data structure called D-GQPT to merge the pattern trees of the batches seen so far and to dynamically mark the active portion of the current batch. With the D-GQPT, we are able to limit the enumeration of candidate trees to only the currently active pattern trees. We also design a summary data structure called ECTree to incrementally compute the frequent tree patterns over the query stream. Based on these two constructs, we present AppXQSMiner, an algorithm for mining frequent query patterns over the XML query stream. Experimental results show that the proposed approach is both efficient and scalable.
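The abstract does not give the internals of D-GQPT or ECTree, so the sketch below does not reproduce them. It only illustrates the general idea of approximate, batch-wise frequency counting of query patterns over a stream, using the well-known Lossy Counting algorithm of Manku and Motwani (VLDB 2002) as a stand-in. Assumptions: the `PatternNode` class and `canonical` function are hypothetical, sibling order is ignored when two pattern trees are compared, and only whole query patterns are counted. Mining frequent *subtrees*, as the paper does, additionally requires candidate enumeration, which this sketch omits.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """Hypothetical node of a query pattern tree (e.g. from //book[title][author])."""
    label: str
    children: list[PatternNode] = field(default_factory=list)


def canonical(node: PatternNode) -> str:
    """Encode an unordered tree as a string; sorting children makes sibling order irrelevant."""
    parts = sorted(canonical(c) for c in node.children)
    return node.label + "(" + ",".join(parts) + ")"


class LossyCounter:
    """Lossy Counting (Manku & Motwani) over canonical pattern keys; a stand-in, not ECTree."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)       # bucket width w = ceil(1/epsilon)
        self.n = 0                                 # number of queries seen so far
        self.entries: dict[str, list[int]] = {}    # key -> [count f, max undercount delta]

    def add(self, key: str) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)    # id of the current bucket
        if key in self.entries:
            self.entries[key][0] += 1
        else:
            self.entries[key] = [1, bucket - 1]
        if self.n % self.width == 0:               # bucket boundary: drop entries with f + delta <= bucket
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > bucket}

    def add_batch(self, batch: list[PatternNode]) -> None:
        """Feed one batch of query pattern trees from the stream."""
        for tree in batch:
            self.add(canonical(tree))

    def frequent(self, support: float) -> dict[str, int]:
        """Report patterns whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: f for k, (f, _) in self.entries.items() if f >= threshold}


# Usage: two batches of 10 queries each; q1 and q2 differ only in sibling order.
q1 = PatternNode("book", [PatternNode("title"), PatternNode("author")])
q2 = PatternNode("book", [PatternNode("author"), PatternNode("title")])
q3 = PatternNode("article", [PatternNode("year")])
q4 = PatternNode("journal")

counter = LossyCounter(epsilon=0.1)
counter.add_batch([q1] * 6 + [q3] * 4)
counter.add_batch([q2] * 9 + [q4])
print(counter.frequent(support=0.5))  # {'book(author(),title())': 15}
```

Counting a canonical string rather than the tree object lets structurally identical queries share one counter; Lossy Counting then bounds memory by pruning rare patterns at each bucket boundary, at the cost of undercounting any pattern by at most epsilon times the stream length.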
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, L.H., Lee, M.L., Hsu, W. (2004). Approximate Counting of Frequent Query Patterns over XQuery Stream. In: Lee, Y., Li, J., Whang, KY., Lee, D. (eds) Database Systems for Advanced Applications. DASFAA 2004. Lecture Notes in Computer Science, vol 2973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24571-1_6
DOI: https://doi.org/10.1007/978-3-540-24571-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21047-4
Online ISBN: 978-3-540-24571-1
eBook Packages: Springer Book Archive