
Query optimization in XML structured-document databases

  • Regular Paper
The VLDB Journal

Abstract

As the volume of information published as XML-compliant documents continues to grow rapidly, efficient and effective query processing and optimization for XML have become more important than ever. This article reports our recent advances in XML structured-document query optimization. We present a novel approach, and the techniques developed for it, that optimizes XPath queries by applying heuristic-based algebraic transformations to them after they are represented as PAT algebraic expressions. The article first presents a comprehensive set of general equivalences concerning XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique in that it performs exclusively deterministic transformations on queries, which makes optimization fast. This deterministic nature directly yields high optimization efficiency and keeps the implementation simple. Because the approach operates at the logical level, it is independent of any particular storage model. Optimizers built on it can therefore be readily adapted to a broad range of XML data and information servers to achieve fast query optimization. An experimental study confirms the validity and effectiveness of the proposed approach.
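To make the idea of deterministic, rule-based algebraic rewriting more concrete, here is a minimal Python sketch. It is not the authors' optimizer and does not reproduce their rule set. The node names (`Region`, `Within`, `SetUnion`, `Select`) and the three rules are hypothetical illustrations of PAT-style operators and of equivalences that hold under set semantics. Each rule either fires and returns an equivalent expression or declines. Rules are tried in a fixed order, and the tree is rewritten bottom-up until it stops changing, so no cost-based search over alternative plans takes place.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Union


# Hypothetical PAT-style operators (illustrative names, not taken from the paper).
@dataclass(frozen=True)
class Region:
    """A named set of document regions, e.g. all <section> elements."""
    name: str


@dataclass(frozen=True)
class Within:
    """Regions of `left` that lie inside some region of `right`."""
    left: Expr
    right: Expr


@dataclass(frozen=True)
class SetUnion:
    """Set union of two region sets."""
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Select:
    """Regions of `arg` whose content matches the string `pred`."""
    pred: str
    arg: Expr


Expr = Union[Region, Within, SetUnion, Select]
Rule = Callable[[Expr], Optional[Expr]]


def collapse_repeated_within(e: Expr) -> Optional[Expr]:
    # (A within B) within B  ==>  A within B: the outer containment test is redundant.
    if isinstance(e, Within) and isinstance(e.left, Within) and e.left.right == e.right:
        return e.left
    return None


def union_idempotent(e: Expr) -> Optional[Expr]:
    # A ∪ A  ==>  A
    if isinstance(e, SetUnion) and e.left == e.right:
        return e.left
    return None


def push_select_into_union(e: Expr) -> Optional[Expr]:
    # σp(A ∪ B)  ==>  σp(A) ∪ σp(B): filter each operand before combining them.
    if isinstance(e, Select) and isinstance(e.arg, SetUnion):
        u = e.arg
        return SetUnion(Select(e.pred, u.left), Select(e.pred, u.right))
    return None


# Fixed rule order: the first rule that fires wins, so every run is deterministic.
RULES: list[Rule] = [collapse_repeated_within, union_idempotent, push_select_into_union]


def rewrite_once(e: Expr) -> Expr:
    """Rewrite children bottom-up, then try the rules on the rebuilt node."""
    if isinstance(e, Within):
        e = Within(rewrite_once(e.left), rewrite_once(e.right))
    elif isinstance(e, SetUnion):
        e = SetUnion(rewrite_once(e.left), rewrite_once(e.right))
    elif isinstance(e, Select):
        e = Select(e.pred, rewrite_once(e.arg))
    for rule in RULES:
        result = rule(e)
        if result is not None:
            return result
    return e


def optimize(e: Expr) -> Expr:
    """Apply rewrite passes until the expression stops changing (a fixpoint)."""
    while True:
        new = rewrite_once(e)
        if new == e:
            return e
        e = new


q = Within(Within(Region("para"), Region("section")), Region("section"))
print(optimize(q))  # Within(left=Region(name='para'), right=Region(name='section'))
```

Here a query plan that tests containment in `section` twice collapses to a single test. Applying rules in a fixed priority order keeps optimization time short and predictable, although, unlike a cost-based search, it may not find the cheapest possible plan.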



Author information

Corresponding author

Correspondence to Dunren Che.


About this article

Cite this article

Che, D., Aberer, K. & Özsu, M.T. Query optimization in XML structured-document databases. The VLDB Journal 15, 263–289 (2006). https://doi.org/10.1007/s00778-005-0172-6


