Abstract
With the increasing popularity of XML database applications, efficient XML query optimizers are becoming essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose and evaluate a novel selectivity estimator, called SynopTech, that is based on a structural synopsis. The main idea of SynopTech is to generate a summary tree by labeling the nodes of the source XML data tree using a fingerprint function and merging subtrees with similar structures. SynopTech then uses the generated summary tree to estimate the selectivity of given queries. We evaluated the proposed approach on four benchmark datasets with different structural characteristics, using different types of queries. Compared with the Sampling algorithm, one of the state-of-the-art algorithms for selectivity estimation, SynopTech achieved lower selectivity estimation error rates while using a very low memory budget. For example, for linear and existential queries, SynopTech produced perfect estimates, whereas the Sampling algorithm had an error rate of up to 70 %. For regular twig queries, SynopTech had a maximum error rate of 4.12 %, whereas the error rate of the Sampling algorithm exceeded 55 %.
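The abstract gives only the outline of the method, so the following Python sketch is an illustration of the general idea and not the SynopTech algorithm itself. It assumes a simple polynomial (Karp-Rabin style) fingerprint, merges only sibling subtrees whose fingerprints are identical (a stricter rule than the "similar structures" mentioned above), and estimates only linear child-axis paths. The names `tag_hash`, `SummaryNode`, `summarize`, and `estimate`, as well as the constants `MOD` and `BASE`, are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Assumed parameters of the illustrative fingerprint (not taken from the paper).
MOD = (1 << 61) - 1
BASE = 1_000_003


def tag_hash(tag):
    """Deterministic polynomial hash of a tag name."""
    h = 0
    for ch in tag:
        h = (h * BASE + ord(ch)) % MOD
    return h


@dataclass
class SummaryNode:
    tag: str
    fingerprint: int
    count: int = 1  # how many identical sibling subtrees this node stands for
    children: list = field(default_factory=list)


def summarize(elem):
    """Build a summary subtree, merging sibling subtrees with equal fingerprints."""
    merged = {}
    for child in elem:
        s = summarize(child)
        if s.fingerprint in merged:
            merged[s.fingerprint].count += 1
        else:
            merged[s.fingerprint] = s
    # Sort so that the fingerprint does not depend on sibling order.
    kids = sorted(merged.values(), key=lambda n: n.fingerprint)
    fp = tag_hash(elem.tag)
    for k in kids:
        fp = (fp * BASE + k.fingerprint) % MOD
        fp = (fp * BASE + k.count) % MOD
    return SummaryNode(elem.tag, fp, 1, kids)


def estimate(root, path):
    """Estimate how many nodes a child-axis path such as ['lib', 'book'] selects."""
    if not path or root.tag != path[0]:
        return 0

    def walk(node, rest, mult):
        mult *= node.count  # each summary node represents `count` source nodes
        if not rest:
            return mult
        return sum(walk(c, rest[1:], mult)
                   for c in node.children if c.tag == rest[0])

    return walk(root, path[1:], 1)


doc = ("<lib>"
       "<book><title/><author/><author/></book>"
       "<book><title/><author/><author/></book>"
       "<book><title/></book>"
       "</lib>")
summary = summarize(ET.fromstring(doc))
print(estimate(summary, ["lib", "book", "author"]))  # 4
```

In this toy document the two structurally identical `book` subtrees collapse into one summary node with a count of 2, so the summary holds fewer nodes than the source tree while the path counts stay exact. Merging subtrees that are only similar, as the abstract describes, would trade some of that exactness for a smaller summary.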
Additional information
The second author (El-Sayed M. El-Alfy) is on leave from the College of Engineering, Tanta University, Egypt.
About this article
Cite this article
Mohammed, S., El-Alfy, ES.M. & Barradah, A.F. Improved selectivity estimator for XML queries based on structural synopsis. World Wide Web 18, 1123–1144 (2015). https://doi.org/10.1007/s11280-014-0311-3
DOI: https://doi.org/10.1007/s11280-014-0311-3