DOI: 10.1145/1923947.1923957
research-article

XPath query processing improvements

Published: 01 November 2010

ABSTRACT

Much research has been done on adapting relational technology for XML and XPath query processing; other efforts have focused on native XML databases, and some on hybrid approaches. This paper presents a hybrid design: we extend the use of path summary indexes by combining them with partitioned indexes on schema-less XML documents to accelerate XPath query processing. Efficient XPath query processing matters because XPath is the language XQuery uses for node selection.

To index an XML document, each node is assigned a path identifier, with one identifier for each distinct root-to-node path. A separate XML path summary index, itself encoded as an XML document, summarizes the document structure by eliminating the path redundancies inherent in many XML document instances. Structure summaries are widely used. Two additional supporting indexes are used: first, the XML structure is placed in a structure index partitioned by path identifier; second, the XML element and attribute values are placed in a separate value index partitioned by the same path identifier. Together, these integrate structure summaries, complete structure, and values into a unified index. To support this integration, we use specialized implementation and query methods.
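The indexing scheme described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the function name `build_indexes`, the preorder numbering, the `(pre, parent_pre)` structure entries, and the treatment of attributes as `@name` children are all hypothetical choices, and in-memory dictionaries of lists stand in for the B+-tree partitions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Build a path summary plus structure and value indexes partitioned by path id.

    Hypothetical in-memory layout (not the paper's encoding):
      summary:   root-to-node label path (tuple) -> path id
      structure: path id -> [(pre, parent_pre), ...] in document order
      values:    path id -> [(pre, text), ...] for nodes that carry a value
    Attributes are modelled as leaf children labelled '@name'; tail text and
    mixed content are ignored for brevity.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    structure = defaultdict(list)
    values = defaultdict(list)
    next_pre = [0]  # preorder counter shared with the nested functions

    def path_id(path):
        # One id per distinct root-to-node label path: every node that shares
        # a path shares its id, which is what keeps the summary small.
        if path not in summary:
            summary[path] = len(summary)
        return summary[path]

    def add_node(path, parent_pre, value):
        pre = next_pre[0]
        next_pre[0] += 1
        pid = path_id(path)
        structure[pid].append((pre, parent_pre))  # structure partition
        if value:
            values[pid].append((pre, value))      # value partition, same pid
        return pre

    def visit(elem, path, parent_pre):
        pre = add_node(path, parent_pre, (elem.text or "").strip())
        for name, val in elem.attrib.items():
            add_node(path + ("@" + name,), pre, val)
        for child in elem:
            visit(child, path + (child.tag,), pre)

    visit(root, (root.tag,), -1)  # -1 marks "no parent" for the root
    return summary, dict(structure), dict(values)


doc = """<site><people>
  <person id="p1"><name>Ann</name></person>
  <person id="p2"><name>Bob</name></person>
</people></site>"""
summary, structure, values = build_indexes(doc)
print(len(summary))  # 5 distinct paths for 8 nodes
print(structure[summary[("site", "people", "person")]])  # [(2, 1), (5, 1)]
```

Both `person` elements land in one structure partition because they share a root-to-node path, so the summary stays at five entries while the document has eight nodes.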

XPath queries are first evaluated, partially or fully, against the summary index to derive candidate path identifiers, which are placed in a specialized hash map tree cursor. We introduce the partitioned branching path join, a twig join that, guided by the hash map tree cursor, enables efficient index nested loop joins between B+-tree partitions of the same structure relation. We conclude with performance results from our lightweight prototype system, which show that our combination of methods matches or outperforms existing high-end database engines in determining the node sequences for several XPath queries.
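The query flow can be illustrated with a toy sketch, assuming hand-built indexes for a two-person document. The names `candidates`, `probe`, and `person_ids_named` are hypothetical; a plain dictionary stands in for the hash map tree cursor, and sorted lists searched with `bisect` stand in for B+-tree partitions keyed on the parent node. For brevity each partition entry also carries the node's value, merging the structure and value indexes that the paper keeps separate. The sketch shows only the general idea of resolving path identifiers from the summary first and then running index nested loop joins between partitions; it is not the paper's partitioned branching path join.

```python
from bisect import bisect_left

# Hypothetical hand-built indexes for:
# <site><people><person id="p1"><name>Ann</name></person>
#               <person id="p2"><name>Bob</name></person></people></site>
SUMMARY = {
    ("site",): 0,
    ("site", "people"): 1,
    ("site", "people", "person"): 2,
    ("site", "people", "person", "@id"): 3,
    ("site", "people", "person", "name"): 4,
}
# Each partition holds (parent_pre, pre, value) sorted by (parent_pre, pre),
# so it can be probed on parent_pre like a B+-tree keyed on the parent.
PARTITIONS = {
    0: [(-1, 0, None)],
    1: [(0, 1, None)],
    2: [(1, 2, None), (1, 5, None)],
    3: [(2, 3, "p1"), (5, 6, "p2")],
    4: [(2, 4, "Ann"), (5, 7, "Bob")],
}


def candidates(tail_label, child_labels):
    """Evaluate //tail_label[...] against the summary only.

    Returns {context pid: {child label: child pid}} for every summary path
    ending in tail_label whose summary node has all requested children.
    A plain dict stands in for the paper's hash map tree cursor.
    """
    result = {}
    for path, pid in SUMMARY.items():
        if path[-1] != tail_label:
            continue
        kids = {c: SUMMARY.get(path + (c,)) for c in child_labels}
        if all(k is not None for k in kids.values()):
            result[pid] = kids
    return result


def probe(pid, parent_pre):
    """Index lookup: all (pre, value) in partition pid whose parent is parent_pre."""
    part = PARTITIONS[pid]
    i = bisect_left(part, (parent_pre, -1))  # pre >= 0, so -1 is a lower bound
    out = []
    while i < len(part) and part[i][0] == parent_pre:
        out.append((part[i][1], part[i][2]))
        i += 1
    return out


def person_ids_named(name):
    """//person[name = $name]/@id via index nested loop joins between partitions."""
    ids = []
    for ctx_pid, kids in candidates("person", ["name", "@id"]).items():
        for _, ctx_pre, _ in PARTITIONS[ctx_pid]:  # outer: context nodes
            names = probe(kids["name"], ctx_pre)   # inner probe: name children
            if any(value == name for _, value in names):
                # inner probe: @id children of the matching context node
                ids.extend(v for _, v in probe(kids["@id"], ctx_pre))
    return ids


print(person_ids_named("Bob"))  # ['p2']
```

Because the summary is consulted first, only partitions named by candidate path identifiers are scanned or probed; a path lacking a `name` or `@id` child would never reach the join.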

Published in

CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
November 2010
482 pages

Publisher

IBM Corp.
United States

Acceptance Rates

Overall acceptance rate: 24 of 90 submissions, 27%