ABSTRACT
Querying XML data is a well-explored topic, with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying XML documents is full-text search on textual content. In this paper, we study fundamental challenges that arise when we try to integrate these two querying paradigms. While keyword search is based on approximate matching, XPath has exact-match semantics. We address this mismatch by treating queries on structure as a "template" and looking for answers that best match both this template and the full-text search conditions. To achieve this, we provide an elegant definition of relaxation on structure and define primitive operators that span the space of relaxations. Query answering is then based on ranking potential answers on both structural and full-text search conditions. We set out certain desirable principles for ranking schemes and propose natural ranking schemes that adhere to these principles. We develop efficient algorithms for answering top-K queries and discuss results from a comprehensive set of experiments that demonstrate the utility and scalability of the proposed framework and algorithms.
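As a rough illustration of ranking potential answers on both structural and full-text conditions and returning only the top K, the following Python sketch scores each candidate answer with a weighted sum of a structural score and a keyword score, then keeps the K best. The `Candidate` type, the score values, the weight `alpha`, and the weighted-sum rule are all hypothetical choices made for illustration. They are not the ranking schemes or top-K algorithms proposed in this paper.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str             # hypothetical identifier of a potential answer node
    structural_score: float  # hypothetical: closeness to the structural template (1.0 = exact match)
    keyword_score: float     # hypothetical: full-text relevance of the node's content


def combined_score(c: Candidate, alpha: float = 0.5) -> float:
    # Assumed weighted-sum combination of the two scores; the paper's
    # actual ranking schemes may combine them differently.
    return alpha * c.structural_score + (1 - alpha) * c.keyword_score


def top_k(candidates, k, alpha=0.5):
    # heapq.nlargest keeps at most k items in a heap while scanning,
    # so the cost is O(n log k) rather than sorting all n candidates.
    return heapq.nlargest(k, candidates, key=lambda c: combined_score(c, alpha))


if __name__ == "__main__":
    cands = [
        Candidate("a", 1.0, 0.2),  # exact structural match, weak text match
        Candidate("b", 0.5, 0.9),  # heavily relaxed structure, strong text match
        Candidate("c", 0.8, 0.8),  # mildly relaxed structure, good text match
    ]
    print([c.node_id for c in top_k(cands, 2)])  # → ['c', 'b']
```

With `alpha=0.5` the combined scores are 0.6, 0.7, and 0.8, so "c" and "b" are returned. Raising `alpha` toward 1.0 favors answers that stay close to the structural template, which is one simple way to trade structural fidelity against text relevance.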
- FleXPath: flexible structure and full-text querying for XML