
Improved selectivity estimator for XML queries based on structural synopsis

World Wide Web

Abstract

With the increasing popularity of XML database applications, efficient XML query optimizers have become essential. The performance of an XML query optimizer depends heavily on the selectivity estimators it uses to find the best possible query execution plan. In this work, we propose and evaluate SynopTech, a novel selectivity estimator based on a structural synopsis. The main idea of SynopTech is to generate a summary tree by labeling the nodes of the source XML data tree with a fingerprint function and merging subtrees with similar structures. SynopTech then uses the generated summary tree to estimate the selectivity of given queries. We evaluated the proposed approach on four benchmark datasets with different structural characteristics, using different types of queries. Compared with the Sampling algorithm, one of the state-of-the-art algorithms for selectivity estimation, SynopTech achieved lower estimation error rates while using a very low memory budget. For example, for linear and existential queries, SynopTech produced exact estimates, whereas the Sampling algorithm had error rates of up to 70%. For regular twig queries, SynopTech had a maximum error rate of 4.12%, whereas the Sampling algorithm exceeded 55%.
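To make the summarization idea concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the fingerprint is a hypothetical BLAKE2b hash of a node's tag and its children's sorted fingerprints, only structurally identical sibling subtrees are merged (the paper's notion of "similar" may be looser), and only absolute, child-axis linear paths are estimated. The names `SummaryNode`, `fingerprint`, `build_summary`, and `estimate_linear` are illustrative only.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    tag: str
    count: int = 0  # number of data nodes merged into this summary node
    children: dict = field(default_factory=dict)  # fingerprint -> SummaryNode


def fingerprint(elem, memo):
    """Hypothetical structural fingerprint: hash of the tag plus the sorted
    fingerprints of the children (child order and text content are ignored)."""
    child_fps = sorted(fingerprint(child, memo) for child in elem)
    h = hashlib.blake2b(repr((elem.tag, child_fps)).encode(), digest_size=8)
    fp = h.hexdigest()
    memo[id(elem)] = fp  # cache by object identity while the tree is alive
    return fp


def insert(parent, elem, memo):
    """Merge elem into the child of `parent` that has the same fingerprint."""
    key = memo[id(elem)]
    node = parent.children.get(key)
    if node is None:
        node = parent.children[key] = SummaryNode(elem.tag)
    node.count += 1
    for child in elem:
        insert(node, child, memo)


def build_summary(root):
    memo = {}
    fingerprint(root, memo)  # label every data node bottom-up
    summary = SummaryNode("#document")  # virtual root above the document element
    insert(summary, root, memo)
    return summary


def estimate_linear(summary, path):
    """Estimate the selectivity of a child-axis-only path such as /a/b/c."""
    frontier = [summary]
    for step in path.strip("/").split("/"):
        frontier = [c for n in frontier for c in n.children.values() if c.tag == step]
    return sum(n.count for n in frontier)


doc = ET.fromstring(
    "<lib><book><title/><author/></book>"
    "<book><author/><title/></book>"
    "<book><title/></book></lib>"
)
s = build_summary(doc)
print(estimate_linear(s, "/lib/book"))         # 3
print(estimate_linear(s, "/lib/book/author"))  # 2
```

In this example the first two `book` subtrees share a fingerprint and collapse into one summary node with a count of 2. Because only identical subtrees are merged, the counts along each summary path are exact for child-axis linear queries; twig predicates and descendant axes would need bookkeeping that this sketch does not keep.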



Author information

Corresponding author

Correspondence to Salahadin Mohammed.

Additional information

The second author (El-Sayed M. El-Alfy) is on leave from the College of Engineering, Tanta University, Egypt.

About this article

Cite this article

Mohammed, S., El-Alfy, ES.M. & Barradah, A.F. Improved selectivity estimator for XML queries based on structural synopsis. World Wide Web 18, 1123–1144 (2015). https://doi.org/10.1007/s11280-014-0311-3

  • DOI: https://doi.org/10.1007/s11280-014-0311-3
