Article

FleXPath: flexible structure and full-text querying for XML

Authors:
Sihem Amer-Yahia, AT&T Labs-Research, Florham Park, NJ
Laks V. S. Lakshmanan, University of British Columbia, Vancouver, CA
Shashank Pandit, IIT Bombay, Mumbai, India

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 2004, pages 83–94
https://doi.org/10.1145/1007568.1007581

Published: 13 June 2004

ABSTRACT

Querying XML data is a well-explored topic with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying XML documents is full-text search on textual content. In this paper, we study fundamental challenges that arise when we try to integrate these two querying paradigms. While keyword search is based on approximate matching, XPath has exact match semantics. We address this mismatch by considering queries on structure as a "template", and looking for answers that best match this template and the full-text search. To achieve this, we provide an elegant definition of relaxation on structure and define primitive operators to span the space of relaxations. Query answering is now based on ranking potential answers on structural and full-text search conditions. We set out certain desirable principles for ranking schemes and propose natural ranking schemes that adhere to these principles. We develop efficient algorithms for answering top-K queries and discuss results from a comprehensive set of experiments that demonstrate the utility and scalability of the proposed framework and algorithms.
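The abstract names three ingredients (structural relaxation via primitive operators, ranking on structural plus full-text conditions, and top-K answering) only at a high level. The sketch below illustrates how they could fit together, under loudly stated assumptions. The two operators used here (turning a child edge `/` into a descendant edge `//`, and deleting a leaf that carries no keyword), the score `alpha / (1 + penalty) + (1 - alpha) * ft`, and every name (`Node`, `one_step`, `relaxations`, `matches`, `top_k`, `alpha`) are hypothetical illustrations, not the paper's operator set, ranking schemes, or algorithms. Here the structural penalty of an answer is the number of relaxation steps in the least-relaxed pattern that still matches it.

```python
import heapq
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical tree-pattern node: a tag, the edge to its parent, children,
    and an optional keyword that stands in for a full-text predicate."""
    tag: str
    axis: str = "/"                # "/" = child edge, "//" = descendant edge
    children: tuple = ()
    keyword: Optional[str] = None


def one_step(n):
    """Yield every pattern reachable from n by one (assumed) relaxation step."""
    for i, c in enumerate(n.children):
        before, after = n.children[:i], n.children[i + 1:]
        if c.axis == "/":  # assumed operator: child edge -> descendant edge
            yield replace(n, children=before + (replace(c, axis="//"),) + after)
        if not c.children and c.keyword is None:  # assumed operator: drop keyword-free leaf
            yield replace(n, children=before + after)
        for c2 in one_step(c):  # relax somewhere inside the subtree rooted at c
            yield replace(n, children=before + (c2,) + after)


def relaxations(q):
    """Breadth-first closure: map each relaxed pattern to its minimal step count."""
    dist, frontier = {q: 0}, [q]
    while frontier:
        nxt = []
        for p in frontier:
            for r in one_step(p):
                if r not in dist:
                    dist[r] = dist[p] + 1
                    nxt.append(r)
        frontier = nxt
    return dist


def matches(p, e):
    """True if pattern p embeds at element e (root of p bound to e)."""
    if e.tag != p.tag:
        return False
    for c in p.children:
        # Child edge: direct children only; descendant edge: all proper descendants.
        cands = list(e) if c.axis == "/" else list(e.iter())[1:]
        if not any(matches(c, x) for x in cands):
            return False
    return True


def keywords(p):
    """Collect the keyword predicates of a pattern, in pre-order."""
    return ([p.keyword] if p.keyword else []) + [w for c in p.children for w in keywords(c)]


def top_k(query, root, k, alpha=0.5):
    """Rank elements tagged query.tag by an assumed blend of structural and
    full-text scores; alpha is a hypothetical weight."""
    ranked = sorted(relaxations(query).items(), key=lambda kv: kv[1])
    kws = [w.lower() for w in keywords(query)]
    scored = []
    for pos, e in enumerate(root.iter(query.tag)):
        # Penalty = steps in the least-relaxed pattern that still matches e.
        penalty = next((d for p, d in ranked if matches(p, e)), None)
        if penalty is None:
            continue
        # Simplified full-text score: fraction of query keywords in e's text.
        text = " ".join(e.itertext()).lower()
        ft = sum(w in text for w in kws) / len(kws) if kws else 0.0
        scored.append((alpha / (1 + penalty) + (1 - alpha) * ft, -pos, e))
    return [(round(s, 3), e) for s, _, e in heapq.nlargest(k, scored)]


doc = ET.fromstring(
    "<lib>"
    "<article><author>A</author><section><title>XML search</title></section></article>"
    "<article><body><section><title>XML ranking</title></section></body></article>"
    "<article><section><title>Graphs</title></section></article>"
    "</lib>"
)
# Template: article[author][section/title contains "xml"]
query = Node("article", children=(
    Node("section", children=(Node("title", keyword="xml"),)),
    Node("author"),
))
for score, e in top_k(query, doc, k=2):
    print(score, e.find(".//title").text)
```

With these assumptions the first article matches the template exactly (score 1.0), while the second needs two relaxations (`section` reached via `//`, `author` dropped) but still satisfies the keyword, so it ranks above the third, which needs only one relaxation but lacks "xml". The sketch enumerates every relaxation and tests each against every candidate, so its cost grows quickly with query size; it shows the semantics only and does not reproduce the efficient top-K algorithms the paper develops.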

Published in
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN: 1581138598
DOI: 10.1145/1007568
Conference Chairs: Arnd Christian König (Microsoft Research), Stefan Dessloch (University of Kaiserslautern, Germany)
General Chair: Patrick Valduriez (INRIA, France)
Program Chair: Gerhard Weikum (University of the Saarland)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
Overall acceptance rate: 785 of 4,003 submissions, 20%
Article Metrics
- 155 Total Citations
- 1,693 Total Downloads
- Downloads (Last 12 months): 11
- Downloads (Last 6 weeks): 2