Abstract
XML structural joins, which evaluate the containment (ancestor-descendant) relationships between XML elements, are important operations in XML query processing. Estimating structural join sizes accurately and quickly is crucial to XML query plan selection and query optimization. XML structural joins are essentially complex θ-joins, which renders well-known estimation techniques for relational equijoins, such as the discrete cosine transform, the wavelet transform, and sketches, inapplicable. In this paper, we model structural joins from a relational point of view and convert the complex θ-joins to equijoins so that those well-known estimation techniques become applicable to structural join size estimation. Theoretical analyses and extensive experiments have been performed on these estimation methods. They show that the discrete cosine transform requires the least memory and yields the best estimates among the three techniques. Compared with the state-of-the-art method IM-DA-Est, the discrete cosine transform is much faster, requires less memory, and yields comparable estimates.
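As a rough illustration of the θ-join to equijoin conversion mentioned above, the sketch below assumes a region (start, end) encoding of XML elements with proper nesting; the abstract does not state which encoding the paper uses, and the function names and toy data are hypothetical. Under that assumption, an ancestor contains a descendant when the descendant's start position falls strictly inside the ancestor's interval. Expanding each ancestor into the positions it covers turns the containment test into an equality test on position, so the join size becomes a sum of products of two frequency distributions, which is the form that equijoin estimators approximate.

```python
from collections import Counter

def theta_join_size(ancestors, descendants):
    """Count ancestor-descendant pairs with the containment (theta) predicate."""
    return sum(
        1
        for a_start, a_end in ancestors
        for d_start, d_end in descendants
        if a_start < d_start and d_end < a_end
    )

def equijoin_size(ancestors, descendants):
    """Count the same pairs as an equijoin on position.

    Assumes properly nested intervals, so testing the descendant's start
    position alone is enough to decide containment.
    """
    # cover[p] = number of ancestor intervals that strictly contain position p
    cover = Counter()
    for a_start, a_end in ancestors:
        for p in range(a_start + 1, a_end):
            cover[p] += 1
    # starts[p] = number of descendant elements that begin at position p
    starts = Counter(d_start for d_start, _ in descendants)
    # Equijoin size: sum over positions of the product of the two frequencies
    return sum(cover[p] * count for p, count in starts.items())

# Hypothetical toy data: (start, end) positions of properly nested elements
ancestors = [(1, 10), (2, 5), (11, 14)]
descendants = [(3, 4), (6, 7), (12, 13)]
```

On this toy input both functions return the same count. An estimator would replace the exact `cover` and `starts` distributions with compact summaries (for example, a truncated set of transform coefficients) and approximate the same sum of products.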
Cite this article
Luo, C., Jiang, Z., Hou, WC. et al. A relational model for XML structural joins and their size estimations. Knowl Inf Syst 16, 97–127 (2008). https://doi.org/10.1007/s10115-007-0089-z
DOI: https://doi.org/10.1007/s10115-007-0089-z