DOI: 10.1145/2351476.2351478
invited-talk

XML query processing: efficiency and optimality

Published: 08 August 2012

ABSTRACT

XML (Extensible Markup Language) is a well-established format that is often used for modeling semi-structured data. XPath and XQuery are de facto standards among XML query languages, and searching for occurrences of a twig pattern query (TPQ) in an XML document is one of their core tasks.

Many different approaches address the TPQ matching problem. The aim of this article is to compare the state-of-the-art techniques and give an overview that helps clarify the relationships between the methodologies used in this area. We distinguish three main areas of TPQ processing: (1) index data structures and XML document partitioning, (2) join algorithms, and (3) cost-based optimizations. We cover the most important techniques in each area and explain their relationships and possible combinations.
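As a rough illustration of areas (1) and (2), the sketch below labels every XML node with a region encoding `(start, end, level)` and then evaluates a single ancestor-descendant or parent-child edge of a twig pattern with a stack-based structural join. It is a minimal sketch of a well-known family of techniques, not the specific algorithms compared in the article. The function names `label` and `structural_join`, the labeling details, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label(xml_text):
    """Assign (start, end, level) region labels to every element.

    Returns a dict mapping each tag to its list of labels in document
    order (ascending start), i.e. one sorted "stream" per tag.
    """
    root = ET.fromstring(xml_text)
    streams = defaultdict(list)
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        entry = [counter, None, level]      # start is known on entry
        streams[node.tag].append(entry)
        for child in node:
            visit(child, level + 1)
        counter += 1
        entry[1] = counter                  # end is known on exit

    visit(root, 1)
    return {tag: [tuple(e) for e in entries] for tag, entries in streams.items()}


def structural_join(ancestors, descendants, parent_child=False):
    """Join two streams sorted by start; return (ancestor, descendant) pairs.

    The stack always holds a chain of nested ancestor candidates that are
    still "open" at the current position in the document.
    """
    result = []
    stack = []
    a = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while a < len(ancestors) and ancestors[a][0] < d[0]:
            # Drop candidates that closed before the new one opens.
            while stack and stack[-1][1] < ancestors[a][0]:
                stack.pop()
            stack.append(ancestors[a])
            a += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack contains d.
        for anc in stack:
            if not parent_child or anc[2] + 1 == d[2]:
                result.append((anc, d))
    return result


# Hypothetical sample document, used only for this illustration.
doc = "<a><b><c/></b><c/><a><c/></a></a>"
streams = label(doc)
print(structural_join(streams["a"], streams["c"]))                     # a//c
print(structural_join(streams["a"], streams["c"], parent_child=True))  # a/c
```

Under these assumptions, a twig pattern with several edges could be answered by running one such binary join per edge and merging the partial results. Holistic twig joins instead process all streams of the pattern together, which avoids large intermediate results.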


        • Published in

          IDEAS '12: Proceedings of the 16th International Database Engineering & Applications Symposium
          August 2012
          261 pages
          ISBN:9781450312349
          DOI:10.1145/2351476

          Copyright © 2012 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 74 of 210 submissions, 35%
