Abstract
The widespread adoption of XML holds the promise that document structure can be exploited to specify precise database queries. However, users may have only a limited knowledge of the XML structure, and may be unable to produce a correct XQuery expression, especially in the context of a heterogeneous information collection. The default is to fall back on keyword-based search, and we are all too familiar with how difficult it is to obtain precise answers by these means. We seek to address these problems by introducing the notion of Meaningful Query Focus (MQF) for finding related nodes within an XML document. MQF enables users to take full advantage of the precision and efficiency of XQuery without requiring (perfect) knowledge of the document structure. Such a Schema-Free XQuery is potentially of value not just to casual users with partial knowledge of schema, but also to experts working in data integration or data evolution. In such a context, a schema-free query, once written, can be applied universally to multiple data sources that supply similar content under different schemas, and applied “forever” as these schemas evolve. Our experimental evaluation found that it is possible to express a wide variety of queries in a schema-free manner and efficiently retrieve correct results over a broad diversity of schemas. Furthermore, the evaluation of a schema-free query is not expensive: using a novel stack-based algorithm we developed for computing MQF, the overhead is from 1 to 4 times the execution time of an equivalent schema-aware query. The evaluation cost of schema-free queries can be further reduced by as much as 68% using a selectivity-based algorithm we developed to enable the integration of the MQF operation into the query pipeline.
Cite this article
Li, Y., Yu, C. & Jagadish, H.V. Enabling Schema-Free XQuery with meaningful query focus. The VLDB Journal 17, 355–377 (2008). https://doi.org/10.1007/s00778-006-0003-4
DOI: https://doi.org/10.1007/s00778-006-0003-4