Abstract
A key step in the optimization of declarative queries over XML data is estimating the selectivity of path expressions, i.e., the number of elements reached by a specific navigation pattern through the XML data graph. Recent studies have introduced XSketch structural graph synopses as an effective, space-efficient tool for the compile-time estimation of complex path-expression selectivities over graph-structured, schema-less XML data. Briefly, XSketches exploit localized graph stability and well-founded statistical assumptions to accurately approximate the path and branching distribution in the underlying XML data graph. Empirical results have demonstrated the effectiveness of XSketch summaries over real-life and synthetic data sets, and for a variety of path-expression workloads.
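To make the estimated quantity concrete, the sketch below computes the *exact* selectivity of a simple child-axis path by walking a small XML tree. Several things here are illustrative assumptions and do not come from the paper: the toy document, the helper name `path_selectivity`, and the restriction to child-only steps without predicates. A synopsis such as an XSketch is meant to approximate this count at compile time without traversing the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document (not from the paper).
DOC = """<bib>
  <book><title/><author/><author/></book>
  <book><title/><author/></book>
  <paper><title/><author/></paper>
</bib>"""


def path_selectivity(root, labels):
    """Return the exact number of elements reached by the simple
    child-axis path /labels[0]/labels[1]/... starting at the root."""
    if root.tag != labels[0]:
        return 0
    frontier = [root]
    for label in labels[1:]:
        # Follow one child step: keep every child whose tag matches the label.
        frontier = [child for elem in frontier for child in elem if child.tag == label]
    return len(frontier)


root = ET.fromstring(DOC)
print(path_selectivity(root, ["bib", "book", "author"]))   # 3
print(path_selectivity(root, ["bib", "paper", "author"]))  # 1
```

An optimizer usually cannot afford this full traversal during query compilation. That cost is why compact summaries that approximate these counts are useful.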
In this paper, we introduce fractional XSketches (fXSketches), a simple, intuitive, and highly effective generalization of the basic XSketch summarization mechanism. In a nutshell, our fXSketch synopsis extends the conventional notion of binary stability (employed in XSketches) to fractional stability, essentially recording more detailed path/branching distribution information on individual synopsis edges. As we demonstrate, this natural extension yields several key benefits over conventional XSketches: (a) a simplified estimation framework, (b) reduced run-time complexity for the synopsis-construction algorithm, and (c) the removal of critical uniformity assumptions during estimation, which results in more accurate estimates. Results from an extensive experimental study show that our fXSketch synopses yield significantly better selectivity estimates than conventional XSketches, especially for complex path expressions with branching predicates.
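The sketch below illustrates the difference between binary and fractional edge information. It is a minimal sketch under stated assumptions. It groups elements by tag into a label-split synopsis. For each synopsis edge u -> v it then computes two quantities:

* `fwd`: the fraction of u-elements with at least one child in v.
* `bwd`: the fraction of v-elements whose parent lies in u.

Binary forward/backward stability then corresponds to `fwd == 1.0` and `bwd == 1.0`. Treating these two fractions as the "fractional stability" of an edge is our assumption for illustration; the paper's precise definition and estimation formulas may differ. The function name `edge_stability` and the toy document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical toy document (not from the paper).
DOC = """<bib>
  <book><title/><author/><author/></book>
  <book><title/><author/></book>
  <book><title/></book>
  <paper><title/><author/></paper>
</bib>"""


def edge_stability(root):
    """Label-split synopsis: one node per tag. For every edge (u, v), return
    (fwd, bwd), where fwd = fraction of u-elements with a v-child and
    bwd = fraction of v-elements whose parent is a u-element (assumed definition)."""
    extent = defaultdict(int)         # tag -> number of elements with that tag
    parents_with = defaultdict(int)   # (u, v) -> u-elements having >= 1 v-child
    children_from = defaultdict(int)  # (u, v) -> v-elements whose parent is in u
    for parent in root.iter():        # visits every element exactly once
        extent[parent.tag] += 1
        child_tags = [child.tag for child in parent]
        for tag in set(child_tags):
            parents_with[(parent.tag, tag)] += 1
        for tag in child_tags:
            children_from[(parent.tag, tag)] += 1
    return {
        (u, v): (parents_with[(u, v)] / extent[u], children_from[(u, v)] / extent[v])
        for (u, v) in parents_with
    }


if __name__ == "__main__":
    stats = edge_stability(ET.fromstring(DOC))
    for (u, v), (fwd, bwd) in sorted(stats.items()):
        # A binary synopsis would keep only the two boolean flags below.
        print(f"{u} -> {v}: fwd={fwd:.2f} (F-stable={fwd == 1.0}), "
              f"bwd={bwd:.2f} (B-stable={bwd == 1.0})")
```

Consider the `book -> author` edge, where `fwd` is about 0.67. A binary synopsis records only that this edge is not forward-stable. The fractional value additionally says that two of the three `book` elements have an `author` child, which is the kind of detail that can stand in for a uniformity assumption during estimation.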
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Drukh, N., Polyzotis, N., Garofalakis, M., Matias, Y. (2004). Fractional XSketch Synopses for XML Databases. In: Bellahsène, Z., Milo, T., Rys, M., Suciu, D., Unland, R. (eds) Database and XML Technologies. XSym 2004. Lecture Notes in Computer Science, vol 3186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30081-6_14
DOI: https://doi.org/10.1007/978-3-540-30081-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22969-8
Online ISBN: 978-3-540-30081-6
eBook Packages: Springer Book Archive