ABSTRACT
XML (Extensible Markup Language) is a well-established format that is often used to model semi-structured data. XPath and XQuery are de facto standards among XML query languages, and searching for occurrences of a twig pattern query (TPQ) in an XML document is one of their core tasks.
There are many different approaches to the TPQ matching problem. The aim of this article is to compare the state-of-the-art techniques and to give an overview that helps clarify the relationships between the methodologies used in this area. We distinguish three main areas of TPQ processing: (1) index data structures and XML document partitioning, (2) join algorithms, and (3) cost-based optimizations. We cover the most important techniques in each area and explain their relationships and possible combinations.