XHQE: A hybrid system for scalable selectivity estimation of XML queries

El-Alfy, E.-S. M.; Mohammed, S.; Barradah, A. F.

doi:10.1007/s10796-015-9561-6

Published: 09 June 2015

Information Systems Frontiers, Volume 18, pages 1233–1249 (2016)

E.-S. M. El-Alfy¹,
S. Mohammed¹ &
A. F. Barradah²

Abstract

With the increasing popularity of XML applications in enterprise and big data systems, efficient query optimizers are becoming essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose XHQE, a novel selectivity estimator that is a hybrid of a structural synopsis and structural statistics. The structural synopsis improves estimation accuracy, while the structural statistics make the estimator scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset with a fingerprint function and merging subtrees with similar fingerprints (i.e., having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics, and compared XHQE with existing algorithms such as Sampling, TreeSketch, and a histogram-based algorithm. The experimental results show that XHQE is significantly better than the other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, XHQE achieves estimation accuracy comparable to TreeSketch when the allocated memory size is substantially reduced, yet its estimation data generation time is much lower (e.g., on the XMark dataset, TreeSketch took more than 50 times longer than the proposed approach). Compared to the histogram-based algorithm, our approach supports regular twig queries and achieves higher accuracy when both run under similar memory constraints.
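The synopsis construction and estimation steps described in the abstract can be illustrated with a small Python sketch. It is not XHQE itself: the fingerprint function (a SHA-1 hash of an element's tag plus the multiset of its children's fingerprints), the exact-match merge rule, and the names `build_synopsis` and `estimate_path` are hypothetical choices made for illustration, and the structural statistics that XHQE uses to fit a memory budget are omitted entirely. Because only structurally identical subtrees are merged here, per-node child counts stay exact, so simple child-axis path counts come out exact.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter


def build_synopsis(root):
    """Label elements with a structural fingerprint and merge equal fingerprints.

    Hypothetical fingerprint (not necessarily XHQE's): SHA-1 of the tag plus the
    multiset of child fingerprints, so subtrees with the same shape share a label.
    Returns (root_fp, synopsis); synopsis maps fingerprint -> {
        'tag': element tag, 'extent': number of merged source elements,
        'children': Counter of child fingerprint -> count under ONE such element}.
    """
    synopsis = {}

    def visit(elem):
        # Post-order: children are fingerprinted before their parent.
        child_counts = Counter(visit(child) for child in elem)
        key = elem.tag + "|" + ",".join(
            f"{fp}x{n}" for fp, n in sorted(child_counts.items()))
        fp = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
        # Merge: all elements with this fingerprint share one synopsis entry.
        entry = synopsis.setdefault(
            fp, {"tag": elem.tag, "extent": 0, "children": child_counts})
        entry["extent"] += 1
        return fp

    return visit(root), synopsis


def estimate_path(root_fp, synopsis, path):
    """Estimate matches of an absolute child-axis path, e.g. ['site', 'people'].

    Only structurally identical subtrees were merged, so (barring hash
    collisions) the result is exact for simple child paths without predicates.
    """
    if not path or synopsis[root_fp]["tag"] != path[0]:
        return 0
    frontier = Counter({root_fp: 1})  # fingerprint -> matching source elements
    for tag in path[1:]:
        nxt = Counter()
        for fp, mult in frontier.items():
            for child_fp, n in synopsis[fp]["children"].items():
                if synopsis[child_fp]["tag"] == tag:
                    nxt[child_fp] += mult * n
        frontier = nxt
    return sum(frontier.values())


if __name__ == "__main__":
    doc = ET.fromstring(
        "<site><people>"
        "<person><name/></person><person><name/></person>"
        "<person><name/><email/></person>"
        "</people></site>")
    root_fp, syn = build_synopsis(doc)
    print(len(syn))                                                   # 6
    print(estimate_path(root_fp, syn, ["site", "people", "person"]))  # 3
    print(estimate_path(root_fp, syn, ["site", "people", "person", "email"]))  # 1
```

On the nine-element sample document the synopsis has six entries: the three `name` leaves collapse into one entry, and two of the three `person` subtrees share a fingerprint. A memory-bounded estimator would merge more aggressively (e.g., merging merely similar subtrees) and trade that exactness for space.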

Similar content being viewed by others

Improved selectivity estimator for XML queries based on structural synopsis (Article, 06 December 2014)

Smooth Scan: robust access path selection without cardinality estimation (Article, 29 May 2018)

SMat-J: A Sparse Matrix-Based Join for SPARQL Query Processing

References

Aboulnaga, A, & Naughton, JF (2003). Building XML statistics for the hidden web. In Proceedings of the twelfth ACM International Conference on Information and Knowledge Management, pp 358–365.
Aboulnaga, A, Alameldeen, AR, & Naughton, JF (2001). Estimating the selectivity of XML path expressions for Internet scale applications. In Proceedings of the 27th International Conference on Very Large Data Bases, San Francisco, CA, USA, VLDB’01.
Agrawal, R, Ailamaki, A, Bernstein, PA, Brewer, EA, Carey, MJ, Chaudhuri, S, Doan, A, Florescu, D, Franklin, MJ, Garcia-Molina, H, et al. (2009). The Claremont report on database research. Communications of the ACM, 52(6), 56–65.
Alrammal, M, & Hains, G (2014). A research survey on large XML data: Streaming, selectivity estimation and parallelism. Inter-cooperative Collective Intelligence: Techniques and Applications Studies in Computational Intelligence, 495, 167–202.
Alrammal, M, Hains, G, & Zergaoui, M (2011). Path tree: Document synopsis for XPath query selectivity estimation. In Proceedings of the 5th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS-2011), pp 321–328.
Bosak, J (2014). Shakespeare plays. http://www.ibiblio.org/xml/examples/shakespeare/ (verified April 2014).
Bray, T, Paoli, J, Sperberg-McQueen, CM, & Maler, E (2000). Extensible Markup Language (XML) 1.0, Second Edition. Available: http://www.w3.org/TR/REC-xml.
Bruno, N, Koudas, N, & Srivastava, D (2002). Holistic twig joins: optimal XML pattern matching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 310–321.
Chu, Y, & Yu, J (2012). The research of database query optimization based on XML. Advanced Materials Research, 546-547, 519–525.
Drukh, N, Polyzotis, N, Garofalakis, M, & Matias, Y (2004). Fractional XSketch synopses for XML databases. In Bellahsène, Z, Milo, T, Rys, M, Suciu, D, & Unland, R (Eds.) Database and XML Technologies, Lecture Notes in Computer Science, (Vol. 3186 pp. 189–203): Springer Berlin Heidelberg.
Fisher, D, & Maneth, S (2007). Structural selectivity estimation for XML documents. In Proceedings of the IEEE 23rd International Conference on Data Engineering, ICDE 2007, pp. 626–635.
Gou, G, & Chirkova, R (2007). Efficiently querying large XML data repositories: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(10), 1381–1403.
Grün, C (2010). Storing and querying large XML instances. PhD thesis.
Hachicha, M, & Darmont, J (2013). A survey of XML tree patterns. IEEE Transactions on Knowledge and Data Engineering, 25(1), 29–46.
Haw, SC, & Lee, CS (2011). Data storage practices and query processing in XML databases: A survey. Knowledge-Based Systems, 24(8), 1317–1340.
He, W, Lv, T, Meis, M, & Yan, P (2013). Visual evaluation of XPath queries. In IEEE Fifth International Conference on Computational and Information Sciences (ICCIS), pp 434–437.
Izadi, SK, Haghjoo, MS, & Härder, T (2012). S3: Processing tree-pattern XML queries with all logical operators. Data & Knowledge Engineering, 72, 31–62.
Karp, RM, & Rabin, MO (1987). Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2), 249–260.
Lee, ML, Li, H, Hsu, W, & Ooi, BC (2004). A statistical approach for XML query size estimation. In Proceedings of the 2004 International Conference on Current Trends in Database Technology, EDBT’04.
Li, H, Lee, ML, & Hsu, W (2005a). A histogram-based selectivity estimator for skewed XML data. In Database and Expert Systems Applications, Springer, pp 270–279.
Li, H, Lee, ML, & Hsu, W (2005b). A histogram-based selectivity estimator for skewed XML data. In Andersen, K, Debenham, J, & Wagner, R (Eds.) Database and Expert Systems Applications, Lecture Notes in Computer Science, vol 3588, Springer Berlin Heidelberg (pp. 270–279).
Lim, L, Wang, M, Padmanabhan, S, Vitter, JS, & Parr, R (2002). XPathLearner: An on-line self-tuning Markov histogram for XML path selectivity estimation. In Proceedings of the 28th International Conference on Very Large Data Bases, pp 442–453.
Liu, X, Chen, L, Wan, C, Liu, D, & Xiong, N (2013). Exploiting structures in keyword queries for effective XML search. Information Sciences, 240, 56–71.
Lu, J, Ling, T, Bao, Z, & Wang, C (2011). Extended XML tree pattern matching: Theories and algorithms. IEEE Transactions on Knowledge and Data Engineering, 23(3), 402–416.
Luo, C, Jiang, Z, Hou, WC, Yu, F, & Zhu, Q (2009). A sampling approach for XML query selectivity estimation. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp 335–344.
Madria, S, Chen, Y, Passi, K, & Bhowmick, S (2007). Efficient processing of XPath queries using indexes. Information Systems, 32(1), 131–159.
Mohammed, SA, El-Alfy, ESM, & Barradah, AF (2014). Improved selectivity estimator for XML queries based on structural synopsis. World Wide Web. doi:10.1007/s11280-014-0311-3.
Polyzotis, N, & Garofalakis, M (2006). XCluster synopses for structured XML content. In Proceedings of the International Conference on Data Engineering.
Phan, BV, Pardede, E, & Rahayu, W (2013). On the improvement of active XML (AXML) representation and query evaluation. Information Systems Frontiers, 15(2), 203–222.
Polyzotis, N, & Garofalakis, M (2002). Statistical synopses for graph-structured XML databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’02, pp 358–369.
Polyzotis, N, & Garofalakis, M (2006). XSketch synopses for XML data graphs. ACM Transactions on Database Systems, 31(3), 1014–1063.
Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004a). Approximate XML query answers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’04.
Polyzotis, N, Garofalakis, M, & Ioannidis, Y (2004b). Selectivity estimation for XML twigs. In Proceedings of the IEEE 20th International Conference on Data Engineering.
Sakr, S (2007). Cardinality-aware and purely relational implementation of an XQuery processor. PhD thesis, University of Konstanz.
Sakr, S (2008). Algebra-based XQuery cardinality estimation. International Journal of Web Information Systems, 4(1), 7–46.
Sakr, S (2010). Towards a comprehensive assessment for selectivity estimation approaches of XML queries. International Journal of Web Engineering and Technology, 6, 58–82.
Sartiani, C (2003). A framework for estimating XML query cardinality. In WebDB, pp 43–48.
Schmidt, A, Waas, F, Kersten, M, Carey, MJ, Manolescu, I, & Busse, R (2002). XMark: A benchmark for XML data management. In Proceedings of the 28th International Conference on Very Large Databases, VLDB’02, pp 974–985.
Teubner, J, Grust, T, Maneth, S, & Sakr, S (2008). Dependable cardinality forecasts for XQuery. Proceedings of the VLDB Endowment, 1(1), 463–477.
Tian, P, Luo, D, Li, Y, & Gu, J (2014). XML multi-core query optimization based on task preemption and data partition. In Semantic Technology (pp. 294–305): Springer.
DBLP (2014). Digital bibliography & library project. http://dblp.uni-trier.de/xml/ (verified April 2014).
UniProt (2014). http://www.uniprot.org/ (verified April 2014).
Wang, C, Parthasarathy, S, & Jin, R (2006). A decomposition-based probabilistic framework for estimating the selectivity of XML twig queries. In Advances in Database Technology, EDBT (pp. 533–551): Springer.
Wang, W, Jiang, H, Lu, H, & Yu, JX (2004a). Bloom histogram: path selectivity estimation for XML data with updates. In Proceedings of the 30th International Conference on Very Large Databases, VLDB’04.
Wang, W, Jiang, H, Lu, H, & Yu, JX (2004b). Bloom histogram: Path selectivity estimation for XML data with updates. In Proceedings of the Thirtieth International Conference on Very Large Databases.
Wang, Y, Wang, H, Meng, X, & Wang, S (2004c). Estimating the selectivity of XML path expression with predicates by histograms. In Li, Q, Wang, G, & Feng, L (Eds.) Advances in Web-Age Information Management, Lecture Notes in Computer Science, (Vol. 3129 pp. 409–418): Springer Berlin Heidelberg.
Wu, X, & Liu, G (2008). XML twig pattern matching using version tree. Data & Knowledge Engineering, 64(3), 580–599.
Wu, X, Theodoratos, D, Wang, WH, & Sellis, T (2013). Optimizing XML queries: Bitmapped materialized views vs. indexes. Information Systems, 38(6), 863–884.
Wu, Y, Patel, JM, & Jagadish, H (2002). Estimating answer sizes for XML queries. In Jensen, C, Šaltenis, S, Jeffery, K, Pokorny, J, Bertino, E, Böhm, K, & Jarke, M (Eds.) Advances in Database Technology, Lecture Notes in Computer Science, (Vol. 2287 pp. 590–608): Springer Berlin Heidelberg.
Yang, LH, Lee, ML, Hsu, W, Huang, D, & Wong, L (2008). Efficient mining of frequent XML query patterns with repeating-siblings. Information and Software Technology, 50(5), 375–389.
Zhang, C, Naughton, J, DeWitt, D, Luo, Q, & Lohman, G (2001). On supporting containment queries in relational database management systems. SIGMOD Rec, 30(2), 425–436. doi:10.1145/376284.375722.
Zhang, N, Özsu, MT, Aboulnaga, A, & Ilyas, IF (2006). XSEED: Accurate and fast cardinality estimation for XPath queries. In Proceedings of the IEEE 22nd International Conference on Data Engineering, Washington, DC, USA, ICDE’06.

Acknowledgments

The first author would like to acknowledge the support provided by King Abdulaziz City for Science and Technology (KACST) through the Science & Technology Unit at King Fahd University of Petroleum & Minerals (KFUPM) for funding this work under Project no. 11-INF1658-04 as part of the National Science, Technology, and Innovation Plan.

Author information

Authors and Affiliations

College of Computer Sciences and Engineering, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia
E.-S. M. El-Alfy & S. Mohammed
Exploration Network Operations Department, Saudi ARAMCO, Dhahran, 31311, Saudi Arabia
A. F. Barradah

Corresponding author

Correspondence to E.-S. M. El-Alfy.

About this article

Cite this article

El-Alfy, ES.M., Mohammed, S. & Barradah, A.F. XHQE: A hybrid system for scalable selectivity estimation of XML queries. Inf Syst Front 18, 1233–1249 (2016). https://doi.org/10.1007/s10796-015-9561-6

Published: 09 June 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10796-015-9561-6