Skip to main content
Log in

Incremental sequence-based frequent query pattern mining from XML queries

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB, pp 487–499

  • Aggarwal C, Ta N, Wang J, Feng J, Zaki MJ (2007) Xproj: a framework for projected structural clustering of xml documents. In: KDD

  • Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: SDM

  • Aumann Y, Feldman R, Liphstat O, Mannila H (1999) Borders: an efficient algorithm for association generation in dynamic databases. J Intell Inf Syst 12(1): 61–73

    Article  Google Scholar 

  • Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD

  • Balmin A, Ozcan F, Beyer K, Cochrane R, Pirahesh H (2004) A framework for using materialized xpath views in xml query processing. In: VLDB, pp 60–71

  • Bettini C, Wang XS, Jajodia S (1998) Mining temporal relationships with multiple granularities in time sequences. IEEE Data Eng Bull 21(1): 32–38

    MathSciNet  Google Scholar 

  • Chen L, Rundensteiner EA, Wang S (2002) Xcache: a semantic caching system for xml queries. In: SIGMOD

  • Chen Y, Yang L, Wang YG (2004) Incremental mining of frequent xml query patterns. In: ICDM, pp 343–346

  • Chung C-W, Min J-K, Shim K (2002) Apex: an adaptive path index for xml data. In: SIGMOD Conference, pp 121–132

  • Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: KDD, pp 30–36

  • Feng J, Qian Q, Wang J, Zhou L (2006) Exploit sequencing to accelerate hot xml query pattern mining. In: ACM SAC

  • Feng J, Ta N, Li G (2007) Exploit sequencing views in semantic cache to accelerate xpath query evaluation. In: WWW

  • Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: ICDE, pp 106–115

  • Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu M (2000) Freespan: frequent pattern-projected sequential pattern mining. In: KDD, pp 355–359

  • Hristidis V, Petropoulos M (2002) Semantic caching of xml databases. In: WebDB

  • Kaushik R, Shenoy P, Bohannon P, Gudes E (2002) Exploiting local similarity for indexing paths in graph-structured data. In: ICDE, pp 129–140

  • Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM, pp 313–320

  • Kwon J, Rao P, Moon B, Lee S (2005) Fist: scalable xml document filtering by sequencing twig patterns. In: VLDB, pp 217–228

  • Li G, Feng J, Ta N, Zhang Y, Zhou L (2006a) Scend: an efficient semantic cache to exploit xpath query/view answerability. In: WISE, pp 460–473

  • Li G, Feng J, Wang J, Zhang Y, Zhou L (2006b) Incremental mining of frequent query patterns from xml queries for caching. In: ICDM, pp 350–361

  • Luo Q, Krishnamurthy S, Mohan C, Pirahesh H, Woo H, Lindsay BG, Naughton JF (2002) Middle-tier database caching for e-business. In: SIGMOD, pp 600–611

  • Mandhani B, Suciu D (2005) Query caching and view selection for xml databases. In: VLDB

  • Masseglia F, Cathala F, Poncelet P (1998) The psp approach for mining sequential patterns. In: PKDD

  • Milo T, Suciu D (1999) Index structures for path expressions. In: ICDT, pp 277–295

  • Ozden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: ICDE, pp 412–421

  • Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: Mining sequential patterns by prefix-projected growth. In: ICDE, pp 215–224

  • Prüfer H (1918) Neuer beweis eines satzes uber permutationen. Archiv fur Mathematik und Physik 27: 142–144

    Google Scholar 

  • Qun C, Lim A, Ong KW (2003) D(k)-index: an adaptive structural summary for graph-structured data. In: SIGMOD, pp 134–144

  • Rao PR, Moon B (2004) Prix: indexing and querying xml using prufer sequences. In: ICDE, pp 288–299

  • Re C, Brinkley J, Hinshaw K, Suciu D (2004) Distributed xquery. In: Information integration on the web (IIWeb)

  • Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp 3–17

  • Termier A, Rousset M-C, Sebag M (2002) Treefinder: a first step towards xml data mining. In: ICDM, pp 450–457

  • Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: ICDE, pp 79–90

  • Wang K, Liu H (2000) Discovering structural association of semistructured data. IEEE TKDE 12(2): 353–371

    Google Scholar 

  • Wang H, Park S, Fan W, Yu PS (2003) Vist: a dynamic index method for querying xml data by tree structures. In: SIGMOD, pp 110–121

  • Xu W (2005) The framework of an xml semantic caching system. In: WebDB

  • Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large databases. In: SDM

  • Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: SIGMOD, pp 406–417

  • Yang LH, Lee M-L, Hsu W (2003a) Efficient mining of xml query patterns for caching. In: VLDB, pp 69–80

  • Yang LH, Lee M-L, Hsu W, Acharya S (2003b) Mining frequent query patterns from xml queries. In: DASFAA, pp 355–362

  • Yang LH, Lee ML, Hsu W, Guo X (2004) 2pxminer: an efficient two pass mining of frequent xml query patterns. In: KDD

  • Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: SIGKDD, pp 71–80

  • Zaki MJ (2005) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE TKDE 17(8): 1021–1035

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Guoliang Li.

Additional information

Responsible editor: M. J. Zaki.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, G., Feng, J., Wang, J. et al. Incremental sequence-based frequent query pattern mining from XML queries. Data Min Knowl Disc 18, 472–516 (2009). https://doi.org/10.1007/s10618-009-0126-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-009-0126-5

Keywords

Navigation