Abstract
Due to the wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In particular, considerable effort has been devoted to minimizing XPath queries, since the efficiency of query processing greatly depends on the size of the query. Research work in this area can be classified into two categories: constraint-independent minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints, while the latter minimizes them in the presence of such constraints. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally known to be unable to minimize the query without processing the query itself. Most recently, however, a constraint-independent method has been proposed that minimizes linear path queries in a top-down fashion by using the DataGuide, a representative structural summary of XML data. Nevertheless, this method can fail to find a minimal query, since it minimizes a query merely by erasing labels from the original query, whereas a minimal query could include labels that are not present in the original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide without processing the query itself. For the linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of the XML data to the node. We then propose iterative subsequence generation, which iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up fashion and tests query equivalence. Using iterative subsequence generation, we can always find a minimal query, and we formally prove this guarantee.
We also propose an extended algorithm that guarantees minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational DBMS. The experimental results using real and synthetic data sets show the practicality of our method.
Cite this article
Lee, KH., Whang, KY. & Han, WS. XMin: Minimizing Tree Pattern Queries with Minimality Guarantee. World Wide Web 13, 343–371 (2010). https://doi.org/10.1007/s11280-010-0089-x
DOI: https://doi.org/10.1007/s11280-010-0089-x