Principles of Holism for sequential twig pattern matching

  • Regular Paper
The VLDB Journal

Abstract

Modern applications face the challenge of dealing with structured and semi-structured data. They have to deal with complex objects, most of which have some kind of internal structure, often a hierarchy. Although XML documents are the best-known example, chemical compounds, CAD drawings, websites and many other kinds of data raise similar problems. In such environments, ordered and unordered tree pattern matching are the fundamental search operations. One of the main thrusts of research on tree pattern matching is the class of holistic approaches. Their ultimate goal is to evaluate a query twig as a whole by relying on sequential access patterns and non-trivial auxiliary storage structures, typically kept in main memory. Based on the pre/post-order ranks of individual tree nodes, we establish a strong theoretical basis for correct and efficient holistic pattern matching algorithms. In particular, we define and prove necessary and sufficient conditions for minimizing the amount of data retained in memory, thus introducing a correct and complete framework in which different holistic solutions can be compared. We also show how these rules can be applied to build algorithms for ordered and unordered tree pattern matching. Thanks to these theoretical results, each holistic algorithm gains in efficiency, as it is implemented directly on the adopted numbering scheme, avoids expensive matching refinements and keeps memory requirements stable. An experimental analysis and comparison with previous approaches, on synthetic as well as real-life data sets, confirm the superiority of our approach.
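
The pre/post-order numbering scheme mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only shows the standard property that structural relationships between two nodes can be decided from their ranks alone. The tree encoding, the function names (`number_tree`, `is_ancestor`, `precedes`) and the sample document are assumptions made for illustration, and node labels are assumed to be unique.

```python
from typing import Dict, List, Tuple

# Hypothetical tree encoding: (label, list of children). Labels are assumed unique.
Tree = Tuple[str, List["Tree"]]
Rank = Tuple[int, int]  # (pre-order rank, post-order rank), both starting at 1


def number_tree(root: Tree) -> Dict[str, Rank]:
    """Assign pre/post-order ranks to every node in one depth-first traversal."""
    ranks: Dict[str, Rank] = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: Tree) -> None:
        label, children = node
        counters["pre"] += 1          # rank on entering the node
        pre = counters["pre"]
        for child in children:
            visit(child)
        counters["post"] += 1         # rank on leaving the node
        ranks[label] = (pre, counters["post"])

    visit(root)
    return ranks


def is_ancestor(a: Rank, d: Rank) -> bool:
    """a is an ancestor of d iff a is entered before d and left after d."""
    return a[0] < d[0] and a[1] > d[1]


def precedes(a: Rank, b: Rank) -> bool:
    """a lies entirely to the left of b (no ancestor/descendant relationship)."""
    return a[0] < b[0] and a[1] < b[1]


# Illustrative document: book(title, author(name), chapter)
doc: Tree = ("book", [("title", []), ("author", [("name", [])]), ("chapter", [])])
ranks = number_tree(doc)

assert is_ancestor(ranks["book"], ranks["name"])
assert is_ancestor(ranks["author"], ranks["name"])
assert not is_ancestor(ranks["title"], ranks["name"])
assert precedes(ranks["title"], ranks["name"])
```

Because each test compares only two integers per node, nodes can be read sequentially in rank order and checked without navigating the tree, which is the kind of access pattern holistic algorithms rely on.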

Author information

Corresponding author

Correspondence to Riccardo Martoglia.

About this article

Cite this article

Mandreoli, F., Martoglia, R. & Zezula, P. Principles of Holism for sequential twig pattern matching. The VLDB Journal 18, 1369–1392 (2009). https://doi.org/10.1007/s00778-009-0143-4

  • DOI: https://doi.org/10.1007/s00778-009-0143-4
