research-article

LCA-based selection for XML document collections

Authors:
Georgia Koloniari, University of Ioannina, Ioannina, Greece
Evaggelia Pitoura, University of Ioannina, Ioannina, Greece

WWW '10: Proceedings of the 19th international conference on World wide web, April 2010, Pages 511–520, https://doi.org/10.1145/1772690.1772743

Published: 26 April 2010


ABSTRACT

In this paper, we address the problem of database selection for XML document collections, that is, given a set of collections and a user query, how to rank the collections based on their goodness to the query. Goodness is determined by the relevance of the documents in the collection to the query. We consider keyword queries and support Lowest Common Ancestor (LCA) semantics for defining query results, where the relevance of each document to a query is determined by properties of the LCA of those nodes in the XML document that contain the query keywords. To avoid evaluating queries against each document in a collection, we propose maintaining, in a preprocessing phase, information about the LCAs of all pairs of keywords in a document and using it to approximate the properties of the LCA-based results of a query. To improve storage and processing efficiency, we use appropriate summaries of the LCA information based on Bloom filters. We address both a boolean and a weighted version of the database selection problem. Our experimental results show that our approach incurs low errors in the estimation of the goodness of a collection and provides rankings that are very close to the actual ones.
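The abstract outlines a two-phase idea: precompute pairwise LCA information per document, summarize it with Bloom filters, then estimate each collection's goodness from the summaries alone. The Python sketch below illustrates one possible reading of that idea for the boolean case only; it is not the authors' scheme, and every name in it (`dewey_index`, `summarize`, `maybe_relevant`, `goodness`, `rank`, `min_depth`) is hypothetical. Assumptions: nodes carry Dewey IDs, so the LCA of two nodes is the longest common prefix of their labels; the only LCA property recorded per keyword pair is the depth of its deepest LCA; a document counts as relevant if every pair of query keywords has an LCA at depth at least `min_depth`; and a collection is represented as a list of per-document Bloom filters.

```python
import hashlib
import itertools
import xml.etree.ElementTree as ET


def dewey_index(xml_text):
    """Map each lower-cased word to the Dewey IDs of the nodes containing it.

    Assumption for this sketch: a node contains a word if the word is its
    tag name or appears in the node's own text.
    """
    index = {}

    def visit(elem, dewey):
        words = {elem.tag.lower(), *(elem.text or "").lower().split()}
        for w in words:
            index.setdefault(w, []).append(dewey)
        for i, child in enumerate(elem):
            visit(child, dewey + (i,))

    visit(ET.fromstring(xml_text), (0,))
    return index


def lca(a, b):
    """The LCA of two Dewey IDs is their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def deepest_pair_lca(index, k1, k2):
    """Depth (root = 1) of the deepest LCA over all occurrences of k1 and k2."""
    return max(len(lca(a, b)) for a in index[k1] for b in index[k2])


class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, m=8192, num_hashes=3):
        self.m, self.num_hashes = m, num_hashes
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def summarize(xml_text):
    """Preprocessing: store each keyword and its pairwise LCA depths in a filter.

    For a pair whose deepest LCA is at depth d, insert "k1|k2|level" for every
    level <= d, so "is there an LCA at depth >= t?" is one membership probe.
    """
    index = dewey_index(xml_text)
    bf = BloomFilter()
    for w in index:
        bf.add(w)
    for k1, k2 in itertools.combinations(sorted(index), 2):
        for level in range(1, deepest_pair_lca(index, k1, k2) + 1):
            bf.add(f"{k1}|{k2}|{level}")
    return bf


def maybe_relevant(bf, keywords, min_depth):
    """Boolean test: every keyword occurs and every pair reaches min_depth."""
    kws = sorted({w.lower() for w in keywords})
    return all(w in bf for w in kws) and all(
        f"{a}|{b}|{min_depth}" in bf for a, b in itertools.combinations(kws, 2)
    )


def goodness(collection, keywords, min_depth):
    """Estimated number of relevant documents in a list of summaries."""
    return sum(maybe_relevant(bf, keywords, min_depth) for bf in collection)


def rank(collections, keywords, min_depth):
    """Collection names ordered by estimated goodness, best first."""
    return sorted(collections,
                  key=lambda name: -goodness(collections[name], keywords, min_depth))


if __name__ == "__main__":
    a = [summarize("<book><title>xml search</title><author>smith</author></book>"),
         summarize("<paper><title>xml keyword search</title></paper>")]
    b = [summarize("<book><title>xml</title><author>search</author></book>")]
    print(rank({"A": a, "B": b}, ["xml", "search"], min_depth=2))  # prints ['A', 'B']
```

Because a Bloom filter has no false negatives, the estimated goodness never undercounts documents that pass the pairwise test; false positives can only inflate it. The pairwise test is itself an approximation: every pair having a deep LCA does not guarantee that one deep node contains all query keywords. The weighted version of the problem is not sketched here.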

Index Terms

1. Information systems
  1. Information retrieval

Recommendations

Multiway SLCA-based keyword search in XML data
WWW '07: Proceedings of the 16th international conference on World Wide Web

Keyword search for smallest lowest common ancestors (SLCAs) in XML data has recently been proposed as a meaningful way to identify interesting data nodes in XML data whose subtrees contain an input set of keywords. In this paper, we generalize this ...
Keyword Proximity Search in XML Trees

Recent works have shown the benefits of keyword proximity search in querying XML documents in addition to text documents. For example, given query keywords over Shakespeare's plays in XML, the user might be interested in knowing how the keywords ...
Efficient LCA based keyword search in xml data
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Keyword search in XML documents based on the notion of lowest common ancestors (LCAs) and modifications of it has recently gained research interest [2, 3, 4]. In this paper we propose an efficient algorithm called Indexed Stack to find answers to ...
Published in
WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690
General Chairs: Michael Rappa (North Carolina State University, USA), Paul Jones (University of North Carolina at Chapel Hill, USA)
Program Chairs: Juliana Freire (University of Utah, USA), Soumen Chakrabarti (Indian Institute of Technology, India)
Copyright © 2010 International World Wide Web Conference Committee (IW3C2)
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
database selection
lowest common ancestor
xml
Conference Acceptance Rates
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- 5 total citations
- 294 total downloads
- Downloads (last 12 months): 1
- Downloads (last 6 weeks): 1