ABSTRACT
Much research has adapted relational technology to XML and XPath query processing, several research efforts have focused on native XML databases, and others have explored hybrid approaches. This paper presents a hybrid design: we extend the use of path summary indexes by combining them with partitioned indexes on schema-less XML documents to accelerate XPath query processing. Efficient XPath query processing is important because XPath is the language XQuery uses for node selection.
To index an XML document, each node is assigned a path identifier that denotes its root-to-node path, so every distinct root-to-node path receives its own identifier. A separate XML path summary index, itself encoded as an XML document, summarizes the document structure by eliminating the path redundancy inherent in many XML document instances. Structure summaries are widely used. Two supporting indexes are added: first, the XML structure is placed into a structure index partitioned by path identifier; second, the XML element and attribute values are placed into a separate value index partitioned by the same path identifier. We thereby integrate structure summaries, the complete structure, and values into a unified index. To support this integration, we use novel implementation and query methods.
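The indexing scheme above can be illustrated with a small sketch. It is not the authors' implementation: the (start, end, level) region numbering of nodes, the use of sorted Python lists as stand-ins for B+-tree partitions, and the helper names `build_indexes` and `lookup_value` are all assumptions made for illustration, and only element nodes and their leading text are indexed (attributes are omitted to keep it short). Each distinct root-to-node tag path gets one path ID, the summary is built as an XML tree with one node per distinct path, and structure and value entries are grouped by path ID.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Hypothetical sketch: assign path IDs, build an XML path summary, and
    partition structure and value entries by path ID (elements only)."""
    root = ET.fromstring(xml_text)
    path_ids = {}                    # root-to-node tag path -> path ID
    summary_nodes = {}               # tag path -> element of the summary tree
    structure = defaultdict(list)    # path ID -> [(start, end, level)]
    values = defaultdict(list)       # path ID -> [(text value, start)]
    counter = 0

    def visit(node, parent_path, level):
        nonlocal counter
        path = parent_path + (node.tag,)
        if path not in path_ids:     # first time this path is seen: new summary node
            pid = len(path_ids) + 1
            path_ids[path] = pid
            attrs = {"pid": str(pid)}
            summary_nodes[path] = (ET.SubElement(summary_nodes[parent_path], node.tag, attrs)
                                   if parent_path else ET.Element(node.tag, attrs))
        pid = path_ids[path]
        start = counter              # region numbering: start before children
        counter += 1
        for child in node:
            visit(child, path, level + 1)
        end = counter                # ... and end after all descendants
        counter += 1
        structure[pid].append((start, end, level))
        text = (node.text or "").strip()
        if text:
            values[pid].append((text, start))

    visit(root, (), 0)
    for part in structure.values():
        part.sort()                  # key order by start, like a B+-tree partition
    for part in values.values():
        part.sort()                  # key order by value
    summary = summary_nodes[(root.tag,)]
    return path_ids, summary, dict(structure), dict(values)


def lookup_value(values, pid, value):
    """Exact-match probe of one value partition; returns node start numbers."""
    part = values.get(pid, [])
    i = bisect.bisect_left(part, (value,))
    hits = []
    while i < len(part) and part[i][0] == value:
        hits.append(part[i][1])
        i += 1
    return hits


if __name__ == "__main__":
    doc = ("<site><people><person><name>Ann</name></person>"
           "<person><name>Bob</name></person></people></site>")
    path_ids, summary, structure, values = build_indexes(doc)
    print(ET.tostring(summary).decode())              # one summary node per distinct path
    print(structure[4], lookup_value(values, 4, "Bob"))  # [(3, 4, 3), (7, 8, 3)] [7]
```

In the example, the two `person` elements share one path and therefore collapse into a single summary node with path ID 3, which is the path redundancy the summary removes; their structure entries still live side by side in partition 3.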
XPath queries are first evaluated, partially or fully, against the summary index to derive candidate path identifiers, which are placed into a specialized hash map tree cursor. We introduce the partitioned branching path join, a twig join that, guided by the hash map tree cursor, performs efficient index nested loop joins between the B+-tree partitions of the same structure relation. We conclude with performance results for several queries on our lightweight prototype system; these results demonstrate that our combination of methods matches or outperforms existing high-end database engines in determining the node sequences for these XPath queries.
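The two query phases described above can be sketched for a single branching pattern of the form `//outer[inner = value]`. This is only an illustration under stated assumptions, not the paper's algorithm: the index contents are hand-written literals for a tiny document, nodes use a hypothetical (start, end, level) region numbering, sorted lists stand in for B+-tree partitions, and the hash map tree cursor is reduced to a plain dict mapping an outer path ID to its candidate inner path IDs (the abstract does not describe its actual layout). Phase one evaluates the path on the summary alone; phase two scans each outer partition and probes the inner partitions with binary search, which is an index nested loop join.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical index contents for the document
#   <site><people><person><name>Ann</name></person>
#                 <person><name>Bob</name></person></people></site>
# Path summary: one node per distinct root-to-node path, tagged with its path ID.
SUMMARY = ET.fromstring('<site pid="1"><people pid="2"><person pid="3">'
                        '<name pid="4"/></person></people></site>')
# Structure partitions: path ID -> [(start, end, level)] sorted by start.
STRUCTURE = {1: [(0, 11, 0)], 2: [(1, 10, 1)],
             3: [(2, 5, 2), (6, 9, 2)], 4: [(3, 4, 3), (7, 8, 3)]}
# Value partitions: path ID -> [(value, start)] sorted by value.
VALUES = {4: [("Ann", 3), ("Bob", 7)]}


def candidate_tree(summary, outer_tag, inner_tag):
    """Evaluate //outer_tag/inner_tag on the summary only.
    Returns {outer path ID: [inner path IDs]} (simplified cursor stand-in)."""
    outers = ([summary] if summary.tag == outer_tag else []) + summary.findall(".//" + outer_tag)
    tree = {}
    for outer in outers:
        inner_pids = [int(c.get("pid")) for c in outer.findall(inner_tag)]
        if inner_pids:
            tree[int(outer.get("pid"))] = inner_pids
    return tree


def value_starts(values, pid, value):
    """Exact-match probe of one value partition; returns matching start numbers."""
    part = values.get(pid, [])
    i = bisect.bisect_left(part, (value,))
    starts = set()
    while i < len(part) and part[i][0] == value:
        starts.add(part[i][1])
        i += 1
    return starts


def has_child_match(keys, hits, start, end):
    """Index probe: inner nodes whose start lies in (start, end) are inside the
    outer node; because the inner path extends the outer path by one step,
    the summary already guarantees they are its children."""
    i = bisect.bisect_right(keys, start)
    while i < len(keys) and keys[i] < end:
        if keys[i] in hits:
            return True
        i += 1
    return False


def branching_join(summary, structure, values, outer_tag, inner_tag, value):
    """Return start numbers of nodes matching //outer_tag[inner_tag = value]."""
    result = []
    for outer_pid, inner_pids in candidate_tree(summary, outer_tag, inner_tag).items():
        # Keys of each inner partition (a real B+-tree would supply these directly).
        probes = [([s for s, _, _ in structure.get(pid, [])], value_starts(values, pid, value))
                  for pid in inner_pids]
        for start, end, _level in structure.get(outer_pid, []):   # scan outer partition
            if any(has_child_match(keys, hits, start, end) for keys, hits in probes):
                result.append(start)
    return sorted(result)
```

With these literals, `branching_join(SUMMARY, STRUCTURE, VALUES, "person", "name", "Bob")` returns `[6]`, the start number of the second `person`. Note that only partitions 3 and 4 are ever touched: the summary prunes every other path before any structure or value partition is read.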
Index Terms
- XPath query processing improvements