XMin: Minimizing Tree Pattern Queries with Minimality Guarantee

World Wide Web

Abstract

Due to the wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In particular, considerable effort has been devoted to minimizing XPath queries, since the efficiency of query processing depends greatly on the size of the query. Research in this area falls into two categories: constraint-independent minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints; the latter does so in their presence. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally known to be unable to minimize the query without processing the query itself. Most recently, however, a constraint-independent method has been proposed that minimizes linear path queries in a top-down fashion by using the DataGuide, a representative structural summary of XML data. Nevertheless, this method can fail to find a minimal query, because it minimizes a query merely by erasing labels from the original query, whereas a minimal query could include labels that are not present in the original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide, without processing the query itself. For the linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of the XML data to that node. We then propose iterative subsequence generation, which iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up fashion and tests each one for query equivalence. Using iterative subsequence generation, we can always find a minimal query, and we formally prove this guarantee. We also propose an extended algorithm that guarantees minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational DBMS. Experimental results on real and synthetic data sets show the practicality of our method.
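The abstract describes iterative subsequence generation only at a high level. The Python sketch below is our own illustration of the idea for linear path queries, not the authors' algorithm. It assumes that queries use only child (`/`) and descendant (`//`) steps over element labels (no wildcards or predicates), that the DataGuide is given as the list of its root-to-node schema label sequences, that two queries are equivalent when they match the same set of DataGuide nodes, and that "bottom-up" means trying candidates with the fewest labels first. The names `matches`, `answer`, `minimize` and `to_xpath` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product


def matches(query, seq):
    """True if a linear path query matches the node whose schema label sequence is seq.

    query: tuple of (axis, label) steps with axis in {"/", "//"};
    seq: tuple of labels from the root to the node.
    """
    n, m = len(query), len(seq)

    @lru_cache(maxsize=None)
    def embed(i, prev):
        # prev is the position in seq matched by step i-1 (-1 stands for the document root).
        if i == n:
            return prev == m - 1  # the last step must land on the node itself
        axis, label = query[i]
        positions = [prev + 1] if axis == "/" else range(prev + 1, m)
        return any(p < m and seq[p] == label and embed(i + 1, p) for p in positions)

    return embed(0, -1)


def answer(query, dataguide):
    """DataGuide nodes (schema label sequences) that the query matches."""
    return frozenset(s for s in dataguide if matches(query, s))


def minimize(query, dataguide):
    """Return a query with the fewest steps that matches the same DataGuide nodes."""
    target = answer(query, dataguide)
    if not target:
        return query  # empty answer: nothing to compare candidates against
    shortest = min(target, key=len)
    last = len(shortest) - 1
    for k in range(1, len(shortest) + 1):  # bottom-up: fewest labels first
        for idx in combinations(range(len(shortest)), k):
            # Any candidate that matches `shortest` maps its last step to position
            # `last`, so label subsequences ending elsewhere can be skipped.
            if idx[-1] != last:
                continue
            labels = [shortest[i] for i in idx]
            for axes in product(("/", "//"), repeat=k):
                candidate = tuple(zip(axes, labels))
                # Assumed equivalence test: same set of matched DataGuide nodes.
                if answer(candidate, dataguide) == target:
                    return candidate
    return query


def to_xpath(query):
    return "".join(axis + label for axis, label in query)


dataguide = [
    ("db",),
    ("db", "r"),
    ("db", "r", "c"),
    ("db", "r", "a"),
    ("db", "r", "a", "b"),
    ("db", "r", "a", "b", "c"),
    ("db", "s"),
    ("db", "s", "a"),
    ("db", "s", "a", "c"),
]
q = (("//", "r"), ("//", "a"), ("//", "c"))
print(to_xpath(minimize(q, dataguide)))  # //b/c
```

On this toy DataGuide the sketch returns `//b/c` for `//r//a//c`. The minimal query uses the label `b`, which does not occur in the original query, so it could not be obtained by erasing labels alone. Enumerating every axis assignment for each subsequence keeps the sketch simple but grows exponentially with the subsequence length.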

Author information

Correspondence to Kyu-Young Whang.

About this article

Cite this article

Lee, KH., Whang, KY. & Han, WS. XMin: Minimizing Tree Pattern Queries with Minimality Guarantee. World Wide Web 13, 343–371 (2010). https://doi.org/10.1007/s11280-010-0089-x

  • DOI: https://doi.org/10.1007/s11280-010-0089-x
