ABSTRACT
Keyword search is a proven, user-friendly way to query HTML documents on the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no other tree that also contains all keywords. Our core contribution, the Indexed Lookup Eager algorithm, exploits key properties of smallest trees to outperform prior algorithms by orders of magnitude when the query contains keywords with significantly different frequencies. The Scan Eager variant is tuned for the case where the keywords have similar frequencies. We analytically and experimentally evaluate these two variants of the Eager algorithm, along with the Stack algorithm [13]. We also present the XKSearch system, which uses the Indexed Lookup Eager, Scan Eager, and Stack algorithms; a demo on DBLP data is available at http://www.db.ucsd.edu/projects/xksearch. Finally, we extend the Indexed Lookup Eager algorithm to answer Lowest Common Ancestor (LCA) queries.
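To make the notion of "smallest trees containing all keywords" concrete, the following is a minimal brute-force sketch. It is not the Indexed Lookup Eager, Scan Eager, or Stack algorithm. It assumes, purely for illustration, that each node is identified by a Dewey-style tuple (the root is `(0,)`, its first child is `(0, 0)`, and so on) and that each keyword maps to the list of nodes that directly contain it. The function name `smallest_lcas` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumption: a node is identified by its path from the root, e.g. (0, 2, 1).
Dewey = Tuple[int, ...]


def ancestors_or_self(node: Dewey) -> Set[Dewey]:
    """Return the node together with all of its ancestors (its non-empty prefixes)."""
    return {node[:i] for i in range(1, len(node) + 1)}


def smallest_lcas(keyword_lists: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Naively compute the roots of the smallest trees containing all keywords."""
    covering: Optional[Set[Dewey]] = None
    for nodes in keyword_lists.values():
        # Every node whose subtree contains this keyword at least once.
        reach: Set[Dewey] = set()
        for node in nodes:
            reach |= ancestors_or_self(node)
        # Keep only the nodes whose subtree contains every keyword seen so far.
        covering = reach if covering is None else covering & reach
    if not covering:
        return []
    # A node is not "smallest" if one of its proper descendants also covers
    # all keywords, so discard every proper ancestor of a covering node.
    non_minimal: Set[Dewey] = set()
    for node in covering:
        for i in range(1, len(node)):
            non_minimal.add(node[:i])
    return sorted(covering - non_minimal)


# Hypothetical data: "john" and "xml" occur together only under node (0, 0).
example = {
    "john": [(0, 0, 1), (0, 2, 0)],
    "xml": [(0, 0, 0), (0, 1, 0)],
}
print(smallest_lcas(example))
```

On this sample the root `(0,)` also contains both keywords, but it is discarded because its descendant `(0, 0)` already does. The sketch materializes every ancestor of every keyword node, so it serves only as a reference definition and makes no attempt at the efficiency the abstract describes.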
- S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In ICDE, 2002.
- V. Aguilera et al. Querying XML documents in XYleme. In SIGIR Workshop on XML and Information Retrieval, 2000.
- S. Amer-Yahia, S. Cho, and D. Srivastava. Tree pattern relaxation. In EDBT, 2002.
- BerkeleyDB. http://www.sleepycat.com/.
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, 2002.
- S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1--7):107--117, 1998.
- Z. Chen, H. Jagadish, F. Korn, and N. Koudas. Counting twig matches in a tree. In ICDE, 2001.
- S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. In VLDB, 2003.
- D. Florescu, D. Kossmann, and I. Manolescu. Integrating keyword search into XML query processing. In WWW9, 2000.
- N. Fuhr and K. Großjohann. XIRQL: A Query Language for Information Retrieval in XML documents. In SIGIR, 2001.
- H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.
- R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity Search in Databases. In VLDB, 1998.
- L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. In SIGMOD, 2003.
- V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava. Keyword Proximity Search in XML Trees. Available at http://www.db.ucsd.edu/publications/treeproximity.pdf.
- V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. In VLDB, 2002.
- V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, 2003.
- Q. Li and B. Moon. Indexing and Querying XML data for regular path expressions. In VLDB, 2001.
- Y. Li, C. Yu, and H. V. Jagadish. Schema-free XQuery. In VLDB, 2004.
- J. Naughton et al. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 24(2):27--33, 2001.
- B. Schieber and U. Vishkin. On finding lowest common ancestors: Simplification and parallelization. SIAM J. Computing, 17(6):1253--1262, 1988.
- A. Schmidt, M. L. Kersten, and M. Windhouwer. Querying XML documents made easy: Nearest concept queries. In ICDE, 2001.
- D. Srivastava et al. Structural joins: A primitive for efficient XML query pattern matching. In ICDE, 2002.
- I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and querying ordered XML using a relational database system. In SIGMOD, 2002.
- A. Theobald and G. Weikum. Adding relevance to XML. In WebDB, 2000.
- A. Theobald and G. Weikum. The index-based XXL search engine for querying XML data with relevance ranking. In EDBT, 2002.
- Z. Wen. New algorithms for the LCA problem and the binary tree reconstruction problem. Information Processing Letters, 51(1):11--16, 1994.
- XYZFind. http://www.searchtools.com/tools/xyzfind.html.
- Efficient keyword search for smallest LCAs in XML databases