ABSTRACT
XML (Extensible Markup Language) has been embraced as a new approach to data modeling. More and more information is now formatted as semi-structured data, e.g., articles in a digital library or documents on the web. Implementing an efficient system for storing and querying XML documents requires new techniques.
Many XML indexing techniques have been proposed in recent years. Across several classes of indexing methods, we distinguish two kinds of joins for processing twig queries. The first join merges two sets retrieved from an inverted list. The second join uses the result of the first query to build the second query. Although authors have proposed improvements to their joins, the advantages of applying one join operation rather than the other have not yet been discussed. In this article, we propose a join selection based on the cost of a join. Choosing the more appropriate join operation significantly improves the efficiency of twig query processing.
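The two kinds of joins and a cost-based choice between them can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: nodes carry (start, end) region labels, both input lists are sorted by start, the names merge_join, probe_join and choose_join are hypothetical, and the cost formulas are rough estimates that ignore output size and I/O.

```python
import math
from bisect import bisect_left, bisect_right
from typing import Callable, List, Tuple

# Assumed labeling: a node is a (start, end) region; a contains d
# exactly when a.start < d.start and d.end < a.end.
Node = Tuple[int, int]
Pair = Tuple[Node, Node]


def merge_join(anc: List[Node], desc: List[Node]) -> List[Pair]:
    """First kind of join: merge two sorted lists in a single pass."""
    out: List[Pair] = []
    stack: List[Node] = []  # currently open, mutually nested ancestors
    i = 0
    for d in desc:
        # Open every ancestor that starts before this descendant.
        while i < len(anc) and anc[i][0] < d[0]:
            while stack and stack[-1][1] < anc[i][0]:
                stack.pop()  # closed before the new ancestor starts
            stack.append(anc[i])
            i += 1
        # Close ancestors that end before this descendant starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        out.extend((a, d) for a in stack)
    return out


def probe_join(anc: List[Node], desc: List[Node]) -> List[Pair]:
    """Second kind of join: each first-query result builds a range query."""
    starts = [d[0] for d in desc]
    out: List[Pair] = []
    for a in anc:
        lo = bisect_right(starts, a[0])  # first descendant after a.start
        hi = bisect_left(starts, a[1])   # first descendant at or after a.end
        out.extend((a, d) for d in desc[lo:hi])
    return out


def choose_join(anc: List[Node], desc: List[Node]) -> Callable:
    """Pick the join with the lower estimated cost (hypothetical model)."""
    merge_cost = len(anc) + len(desc)                 # one pass over both lists
    probe_cost = len(anc) * math.log2(len(desc) + 2)  # one lookup per ancestor
    return probe_join if probe_cost < merge_cost else merge_join
```

Under this model, a very selective first query favors the probing join, while two lists of similar size favor the merge. A call such as choose_join(anc, desc)(anc, desc) runs whichever join the estimate prefers.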
Index Terms
- A cost-based join selection for XML twig content-based queries