Efficient keyword search for smallest LCAs in XML databases

Authors:
Yu Xu, University of California, San Diego
Yannis Papakonstantinou, University of California, San Diego

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 2005, pages 527–538. https://doi.org/10.1145/1066157.1066217

Published: 14 June 2005

ABSTRACT

Keyword search is a proven, user-friendly way to query HTML documents on the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no tree that also contains all keywords. Our core contribution, the Indexed Lookup Eager algorithm, exploits key properties of smallest trees to outperform prior algorithms by orders of magnitude when the query contains keywords with significantly different frequencies. The Scan Eager variant is tuned for the case where the keywords have similar frequencies. We analytically and experimentally evaluate these two variants of the Eager algorithm, along with the Stack algorithm [13]. We also present the XKSearch system, which uses the Indexed Lookup Eager, Scan Eager, and Stack algorithms; a demo on DBLP data is available at http://www.db.ucsd.edu/projects/xksearch. Finally, we extend the Indexed Lookup Eager algorithm to answer Lowest Common Ancestor (LCA) queries.
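The notion of "smallest" trees can be made concrete with a short sketch. The Python below is a minimal illustration of the indexed-lookup idea, not the authors' implementation. It assumes that every node is identified by a Dewey label (a tuple of child positions, so tuple order equals document order), that each keyword's match list is sorted in document order, and that the helper names lca, is_ancestor_or_self, slca_two and slca are hypothetical. For each node v of the shorter list, a binary search finds v's nearest matches on the left and right in the other list; the deeper of the two LCAs is v's only possible smallest answer, and candidates that contain another candidate are discarded on the fly. More than two keywords are handled by folding, using slca(S1, ..., Sk) = slca(slca(S1, ..., Sk-1), Sk), which follows from the definition of smallest trees.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

# Sketch of the indexed-lookup idea for smallest LCAs (SLCAs); not the authors' code.
# Assumption: a node is identified by its Dewey label, e.g. (0, 2, 1) is the second
# child of the third child of the root. Lexicographic tuple order is document order.
Dewey = Tuple[int, ...]


def lca(a: Dewey, b: Dewey) -> Dewey:
    """Lowest common ancestor = longest common prefix of the two Dewey labels."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def is_ancestor_or_self(a: Dewey, b: Dewey) -> bool:
    """True if node a is node b or one of b's ancestors."""
    return b[: len(a)] == a


def slca_two(s1: Sequence[Dewey], s2: Sequence[Dewey]) -> List[Dewey]:
    """SLCAs of two match lists, both sorted in document order.

    Each v in s1 costs one binary search in s2, so s1 should be the shorter list.
    """
    result: List[Dewey] = []
    for v in s1:
        j = bisect_left(s2, v)  # s2[j - 1] < v <= s2[j]
        cand: Optional[Dewey] = None
        for k in (j - 1, j):  # left and right neighbours of v in s2
            if 0 <= k < len(s2):
                x = lca(v, s2[k])
                if cand is None or len(x) > len(cand):
                    cand = x  # keep the deeper LCA
        if cand is None:  # s2 is empty, so no tree contains both keywords
            return []
        if result and is_ancestor_or_self(cand, result[-1]):
            continue  # cand contains an answer already found: not smallest
        if result and is_ancestor_or_self(result[-1], cand):
            result[-1] = cand  # previous answer contains cand: replace it
        else:
            result.append(cand)
    return result


def slca(*lists: Sequence[Dewey]) -> List[Dewey]:
    """Fold slca_two over all keyword lists, starting from the shortest one."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = list(ordered[0])
    for s in ordered[1:]:
        result = slca_two(result, s)
    return result


xml = [(0, 0, 0), (0, 1, 0)]
xu = [(0, 0, 1), (0, 2, 1)]
print(slca(xml, xu))  # [(0, 0)]
```

In the example, the root (0,) also contains both keywords but is discarded because it contains the smaller answer (0, 0). Because each node of the shortest list costs only a couple of binary searches in the longer lists, the work grows with the size of the rarest keyword's list rather than the most frequent one's, which is consistent with the abstract's point about keywords with very different frequencies; when the lists have similar sizes, a sequential merge-style scan (the idea behind Scan Eager) is likely cheaper than repeated binary searches.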

References

[1] S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In ICDE, 2002.
[2] V. Aguilera et al. Querying XML documents in XYleme. In SIGIR Workshop on XML and Information Retrieval, 2000.
[3] S. Amer-Yahia, S. Cho, and D. Srivastava. Tree pattern relaxation. In EDBT, 2002.
[4] BerkeleyDB. http://www.sleepycat.com/.
[5] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, 2002.
[6] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.
[7] Z. Chen, H. Jagadish, F. Korn, and N. Koudas. Counting twig matches in a tree. In ICDE, 2001.
[8] S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. In VLDB, 2003.
[9] D. Florescu, D. Kossmann, and I. Manolescu. Integrating keyword search into XML query processing. In WWW9, 2000.
[10] N. Fuhr and K. Großjohann. XIRQL: A query language for information retrieval in XML documents. In SIGIR, 2001.
[11] H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.
[12] R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In VLDB, 1998.
[13] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. In SIGMOD, 2003.
[14] V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava. Keyword proximity search in XML trees. Available at http://www.db.ucsd.edu/publications/treeproximity.pdf.
[15] V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB, 2002.
[16] V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, 2003.
[17] Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In VLDB, 2001.
[18] Y. Li, C. Yu, and H. V. Jagadish. Schema-free XQuery. In VLDB, 2004.
[19] J. Naughton et al. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 24(2):27–33, 2001.
[20] B. Schieber and U. Vishkin. On finding lowest common ancestors: Simplification and parallelization. SIAM J. Computing, 17(6):1253–1262, 1988.
[21] A. Schmidt, M. L. Kersten, and M. Windhouwer. Querying XML documents made easy: Nearest concept queries. In ICDE, 2001.
[22] D. Srivastava et al. Structural joins: A primitive for efficient XML query pattern matching. In ICDE, 2002.
[23] I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and querying ordered XML using a relational database system. In SIGMOD, 2002.
[24] A. Theobald and G. Weikum. Adding relevance to XML. In WebDB, 2000.
[25] A. Theobald and G. Weikum. The index-based XXL search engine for querying XML data with relevance ranking. In EDBT, 2002.
[26] Z. Wen. New algorithms for the LCA problem and the binary tree reconstruction problem. Information Processing Letters, 51(1):11–16, 1994.
[27] XYZFind. http://www.searchtools.com/tools/xyzfind.html.

Published in
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN: 1595930604
DOI: 10.1145/1066157
Conference Chair: Fatma Ozcan, IBM Almaden Research Center
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates
Overall acceptance rate: 785 of 4,003 submissions (20%)

Article Metrics
- 352 total citations
- 1,406 total downloads
- Downloads (last 12 months): 8
- Downloads (last 6 weeks): 1