Efficient evaluation of generalized tree-pattern queries on XML streams

  • Regular Paper
  • The VLDB Journal

Abstract

Streaming evaluation is a popular way of evaluating queries on XML documents. Besides its many advantages, it is also the only option for a number of important XML applications. Unfortunately, existing algorithms focus almost exclusively on tree-pattern queries (TPQs). Requirements for flexible querying of XML data have recently motivated the introduction of query languages that are more general and flexible than TPQs, and existing algorithms do not support these languages. In this paper, we consider a partial tree-pattern query (PTPQ) language which generalizes and strictly contains TPQs. PTPQs can express a fragment of XPath which comprises reverse axes and the node identity equality (is) operator, in addition to forward axes, wildcards and predicates. They constitute an important subclass of XPath that is very useful in practice. Previous streaming algorithms for TPQs cannot be applied to PTPQs. PTPQs can be represented as dags enhanced with constraints. We explore this representation to design an original polynomial time streaming algorithm for PTPQs. Our algorithm aggressively filters incoming data that is irrelevant to the query and avoids processing redundant query matches (i.e., matches of the query dag that do not contribute to new solutions). It is the first algorithm to support the streaming evaluation of such a broad fragment of XPath. We analyze the algorithm and conduct an extensive experimental evaluation of its performance and scalability. Compared to the only known streaming algorithm that supports TPQs extended with reverse axes, our algorithm performs better by orders of magnitude while consuming a much smaller amount of memory.

Current streaming applications have stringent requirements on query response time and memory consumption because of the large (possibly unbounded) size of the data they handle. In order to keep memory usage and CPU consumption low for the streaming evaluation of PTPQs, we design another streaming algorithm for PTPQs called Eager PSX. Its key feature is an eager evaluation strategy that quickly determines when node matches should be returned to the user as solutions and proactively detects redundant matches. We theoretically analyze Eager PSX and experimentally test its time and space performance and scalability. We compare it with PSX, the algorithm described above. Our results show that Eager PSX not only achieves better space performance without compromising time performance, but also greatly improves query response time for both simple and complex queries, in many cases by orders of magnitude.
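
To make the idea of streaming evaluation with eager output concrete, the following is a minimal sketch and not the PSX or Eager PSX algorithm from the paper. It assumes a much simpler query class than PTPQs: a linear path with descendant axes only, such as //a//b, evaluated over SAX events. The class `DescendantPathHandler`, the function `evaluate`, and the convention of identifying a node by the ordinal of its start tag are all hypothetical names and choices made for illustration. The handler keeps one integer per open element, so memory is bounded by document depth, and it reports a solution as soon as the matching start tag arrives rather than at the end of the stream.

```python
import xml.sax


class DescendantPathHandler(xml.sax.ContentHandler):
    """Evaluate a descendant-only path such as //a//b on a SAX stream.

    Illustrative only: this handles linear paths with descendant axes,
    not the PTPQ language (no reverse axes, predicates, or 'is').
    """

    def __init__(self, steps):
        super().__init__()
        self.steps = steps      # ["a", "b"] stands for //a//b
        # stack[-1] is the number of leading non-output steps already
        # matched (greedily) by the currently open ancestors.
        self.stack = [0]
        self.position = 0       # ordinal of the current start tag
        self.solutions = []     # positions of matching nodes, in stream order

    def startElement(self, name, attrs):
        self.position += 1
        matched = self.stack[-1]
        last = len(self.steps) - 1
        # Eager output: the node is a solution the moment its start tag
        # is seen, because all remaining conditions concern ancestors.
        if matched == last and name == self.steps[last]:
            self.solutions.append(self.position)
        # Extend the matched prefix for this element's descendants.
        if matched < last and name == self.steps[matched]:
            matched += 1
        self.stack.append(matched)

    def endElement(self, name):
        # Discard the state of the element that just closed.
        self.stack.pop()


def evaluate(xml_text, steps):
    """Return start-tag positions of nodes matching the path in `steps`."""
    handler = DescendantPathHandler(steps)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.solutions
```

For the document `<r><a><c><b/></c><b><b/></b></a><b/></r>` and the path //a//b, `evaluate` returns `[4, 5, 6]`: the three `b` elements inside `a` are reported, and the final `b`, which has no `a` ancestor, is filtered out. Supporting reverse axes or predicates would require buffering candidate nodes until their conditions are resolved, which is where the eager strategy described in the abstract matters.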

Author information

Corresponding author

Correspondence to Xiaoying Wu.

About this article

Cite this article

Wu, X., Theodoratos, D. & Zuzarte, C. Efficient evaluation of generalized tree-pattern queries on XML streams. The VLDB Journal 19, 661–686 (2010). https://doi.org/10.1007/s00778-010-0184-8

  • DOI: https://doi.org/10.1007/s00778-010-0184-8
