
Evaluation Techniques for Generalized Path Pattern Queries on XML Data

World Wide Web

Abstract

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. However, current applications of XML require querying data whose structure is complex or not fully known to the user, or integrating XML data sources with different structures. These applications have recently motivated the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first exploits a structural summary of the data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path pattern queries so that it can handle path patterns with repeated labels. The second extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins these matches to compute the answer. Finally, the third exploits multiple pointers in stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other two. Our approaches are the first to evaluate this class of queries efficiently in the indexed streaming model.
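The idea behind the first algorithm, that a partial path query is equivalent to a set of path patterns, can be illustrated with a small Python sketch. It assumes descendant-only query edges, a query dag given as (ancestor, descendant) pairs over hypothetical node ids, a structural summary given simply as a list of root-to-leaf label paths, and that every query node maps to a distinct node of a single data path. Each topological ordering of the dag yields one candidate path pattern, and only those that embed into some summary path are kept. The names `linear_extensions`, `embeds` and `relevant_path_patterns` are illustrative; this brute-force enumeration (exponential in the number of mutually unordered query nodes) is not the paper's generation procedure.

```python
from typing import Dict, List, Set, Tuple

LabelPath = Tuple[str, ...]


def linear_extensions(labels: Dict[str, str],
                      edges: List[Tuple[str, str]]) -> Set[LabelPath]:
    """Label sequences of all topological orderings of the query dag.

    labels maps a query-node id to its element label (labels may repeat);
    edges are (ancestor, descendant) pairs. A cyclic input yields no result.
    """
    preds: Dict[str, Set[str]] = {n: set() for n in labels}
    for anc, desc in edges:
        preds[desc].add(anc)
    results: Set[LabelPath] = set()

    def extend(placed: List[str]) -> None:
        if len(placed) == len(labels):
            results.add(tuple(labels[n] for n in placed))
            return
        done = set(placed)
        for n in labels:
            # n may come next once all of its dag predecessors are placed
            if n not in done and preds[n] <= done:
                placed.append(n)
                extend(placed)
                placed.pop()

    extend([])
    return results


def embeds(pattern: LabelPath, path: LabelPath) -> bool:
    """True if pattern is a (not necessarily contiguous) subsequence of path."""
    it = iter(path)
    # each membership test consumes the iterator up to the matched label
    return all(label in it for label in pattern)


def relevant_path_patterns(labels: Dict[str, str],
                           edges: List[Tuple[str, str]],
                           summary_paths: List[LabelPath]) -> List[LabelPath]:
    """Keep only the path patterns that some summary path can satisfy."""
    return sorted(p for p in linear_extensions(labels, edges)
                  if any(embeds(p, s) for s in summary_paths))
```

For example, a query in which `a` is an ancestor of both `b` and `c`, with `b` and `c` unordered, has the two orderings `//a//b//c` and `//a//c//b`; against a summary whose only label paths are `root/a/b/c` and `root/a/d`, just the first survives.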
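Stack-based evaluation of a single path pattern can be sketched in the style of the PathStack algorithm of Bruno, Koudas and Srivastava. This Python sketch assumes descendant edges only, data nodes carrying a (start, end) region encoding, and one document-order list of nodes in place of the indexed per-label streams of the streaming model. Repeated labels are handled by one simple rule, an assumption that is not necessarily the extension the paper describes: a data node whose label occurs at several query positions is pushed onto the corresponding stacks from the deepest position to the shallowest, so it never acts as its own ancestor. The names `encode`, `expand` and `path_matches` are hypothetical, and matches are returned as tuples of start positions, one per query node.

```python
from typing import List, Tuple

# A data node under region encoding: (start, end, label).
Node = Tuple[int, int, str]
# An example tree, written as (label, [children]).
Tree = Tuple[str, list]


def encode(tree: Tree) -> List[Node]:
    """Number nodes so that u is an ancestor of v iff
    u.start < v.start and v.end < u.end; return them in document order."""
    out: List[Node] = []
    counter = 0

    def visit(t: Tree) -> None:
        nonlocal counter
        label, children = t
        start = counter
        counter += 1
        for child in children:
            visit(child)
        out.append((start, counter, label))
        counter += 1

    visit(tree)
    return sorted(out)


def expand(stacks, i: int, idx: int) -> List[Tuple[int, ...]]:
    """Enumerate matches ending at entry idx of stack i by following pointers."""
    node, pointer = stacks[i][idx]
    if i == 0:
        return [(node[0],)]
    return [prefix + (node[0],)
            for j in range(pointer + 1)  # entries 0..pointer are ancestors
            for prefix in expand(stacks, i - 1, j)]


def path_matches(pattern: List[str], nodes: List[Node]) -> List[Tuple[int, ...]]:
    """Find all matches of the path pattern //p1//p2//...//pk."""
    k = len(pattern)
    # stacks[i] holds (node, index of the top of stacks[i-1] at push time)
    stacks: List[List[Tuple[Node, int]]] = [[] for _ in range(k)]
    results: List[Tuple[int, ...]] = []
    for n in nodes:
        # Entries that end before n starts are not ancestors of n: pop them.
        for s in stacks:
            while s and s[-1][0][1] < n[0]:
                s.pop()
        # Deepest query position first, so n never points to itself.
        for i in range(k - 1, -1, -1):
            if pattern[i] != n[2] or (i > 0 and not stacks[i - 1]):
                continue
            pointer = len(stacks[i - 1]) - 1 if i > 0 else -1
            stacks[i].append((n, pointer))
            if i == k - 1:
                results.extend(expand(stacks, i, len(stacks[i]) - 1))
                stacks[i].pop()  # leaf entries are not needed after output
    return results
```

On the tiny document `a(b(a(b)))`, encoded as starts 0 to 3 in document order, the pattern `//a//a` yields the single match `(0, 2)`; pushing shallowest-first instead would also pair the inner `a` with itself.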



Author information


Corresponding author

Correspondence to Xiaoying Wu.


About this article

Cite this article

Wu, X., Theodoratos, D., Souldatos, S. et al. Evaluation Techniques for Generalized Path Pattern Queries on XML Data. World Wide Web 13, 441–474 (2010). https://doi.org/10.1007/s11280-010-0092-2


  • DOI: https://doi.org/10.1007/s11280-010-0092-2
