Improved selectivity estimator for XML queries based on structural synopsis

Mohammed, Salahadin; El-Alfy, El-Sayed M.; Barradah, Ahmad F.

doi:10.1007/s11280-014-0311-3

Published: 06 December 2014

World Wide Web, volume 18, pages 1123–1144 (2015)

Salahadin Mohammed¹,
El-Sayed M. El-Alfy¹ &
Ahmad F. Barradah²

Abstract

With the increasing popularity of XML database applications, efficient XML query optimizers are becoming essential. The performance of an XML query optimizer depends heavily on the selectivity estimators it uses to find the best possible query execution plan. In this work, we propose and evaluate SynopTech, a novel selectivity estimator based on a structural synopsis. The main idea of SynopTech is to generate a summary tree by labeling the nodes of the source XML data tree with a fingerprint function and merging subtrees that have similar structures. SynopTech then uses the generated summary tree to estimate the selectivity of a given query. We evaluated the proposed approach on four benchmark datasets with different structural characteristics, using different types of queries. Compared with the Sampling algorithm, one of the state-of-the-art algorithms for selectivity estimation, SynopTech achieved lower estimation error rates while using a very small memory budget. For example, for linear and existential queries, SynopTech produced perfect estimates, whereas the Sampling algorithm had an error rate of up to 70%. For regular twig queries, SynopTech had a maximum error rate of 4.12%, whereas the Sampling algorithm had an error rate of more than 55%.
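The summary-tree idea described in the abstract can be illustrated with a small toy sketch. It is not the SynopTech algorithm itself, whose details are not given on this page. It assumes that the fingerprint is simply a canonical tuple of a tag and its sorted child fingerprints, that only sibling subtrees with identical (rather than merely similar) structure are merged, and that queries are plain child-axis paths. The names `SummaryNode`, `fingerprint`, `summarize`, and `selectivity` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    tag: str
    count: int = 1  # number of identical sibling subtrees this node stands for
    children: list = field(default_factory=list)


def fingerprint(elem):
    """Assumed fingerprint: the tag plus the sorted fingerprints of the children."""
    return (elem.tag, tuple(sorted(fingerprint(c) for c in elem)))


def summarize(elem):
    """Build a summary subtree, merging sibling subtrees with equal fingerprints."""
    node = SummaryNode(elem.tag)
    merged = {}  # fingerprint -> SummaryNode already created for it
    for child in elem:
        fp = fingerprint(child)
        if fp in merged:
            merged[fp].count += 1  # same structure seen again: only bump the count
        else:
            merged[fp] = summarize(child)
            node.children.append(merged[fp])
    return node


def selectivity(node, steps):
    """Count the elements matched by a child-axis path such as ['lib', 'book', 'title']."""
    if not steps or node.tag != steps[0]:
        return 0
    if len(steps) == 1:
        return node.count
    return node.count * sum(selectivity(c, steps[1:]) for c in node.children)


# Three books; the first two share the same structure and collapse into one node.
doc = ET.fromstring(
    "<lib>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "<book><title/><author/><author/></book>"
    "</lib>"
)
summary = summarize(doc)
titles = selectivity(summary, ["lib", "book", "title"])
authors = selectivity(summary, ["lib", "book", "author"])
```

Because this sketch merges only structurally identical subtrees, the counts it returns for linear paths are exact. Merging subtrees that are only similar would shrink the summary further, at the cost of turning the counts into estimates.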

Author information

Authors and Affiliations

College of Computer Sciences and Engineering, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia
Salahadin Mohammed & El-Sayed M. El-Alfy
Exploration Network Operations Department, Saudi ARAMCO, Dhahran, 31311, Saudi Arabia
Ahmad F. Barradah

Corresponding author

Correspondence to Salahadin Mohammed.

Additional information

The second author (El-Sayed M. El-Alfy) is on leave from the College of Engineering, Tanta University, Egypt.

About this article

Cite this article

Mohammed, S., El-Alfy, ES.M. & Barradah, A.F. Improved selectivity estimator for XML queries based on structural synopsis. World Wide Web 18, 1123–1144 (2015). https://doi.org/10.1007/s11280-014-0311-3

Received: 18 May 2013
Revised: 13 August 2014
Accepted: 11 November 2014
Published: 06 December 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s11280-014-0311-3
