DOI: 10.1145/1066157.1066217

Efficient keyword search for smallest LCAs in XML databases

Published: 14 June 2005

ABSTRACT

Keyword search is a proven, user-friendly way to query HTML documents in the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no tree that also contains all keywords. Our core contribution, the Indexed Lookup Eager algorithm, exploits key properties of smallest trees in order to outperform prior algorithms by orders of magnitude when the query contains keywords with significantly different frequencies. The Scan Eager variant is tuned for the case where the keywords have similar frequencies. We analytically and experimentally evaluate two variants of the Eager algorithm, along with the Stack algorithm [13]. We also present the XKSearch system, which utilizes the Indexed Lookup Eager, Scan Eager and Stack algorithms and a demo of which on DBLP data is available at http://www.db.ucsd.edu/projects/xksearch. Finally, we extend the Indexed Lookup Eager algorithm to answer Lowest Common Ancestor (LCA) queries.
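
To make the "smallest tree" definition concrete, here is a minimal brute-force sketch. It is not the Indexed Lookup Eager, Scan Eager, or Stack algorithm; it only illustrates which nodes count as answers. It assumes each node is identified by a Dewey-style tuple (the path of child positions from the root), so that the lowest common ancestor of several nodes is their longest common prefix. The function names and the sample data are hypothetical.

```python
from itertools import product


def lca(nodes):
    """Lowest common ancestor = longest common prefix of the Dewey-style IDs."""
    prefix = nodes[0]
    for node in nodes[1:]:
        i = 0
        while i < min(len(prefix), len(node)) and prefix[i] == node[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def is_proper_ancestor(a, d):
    """True if node a lies strictly above node d in the tree."""
    return len(a) < len(d) and d[:len(a)] == a


def smallest_lcas(keyword_lists):
    """Return roots of the smallest trees containing all keywords.

    keyword_lists holds one list per keyword, each listing the nodes
    (as tuples) that directly contain that keyword. This is an
    assumed input layout for the illustration.
    """
    if not keyword_lists or any(not nodes for nodes in keyword_lists):
        return []
    # Brute force: the LCA of every combination of one node per keyword.
    candidates = {lca(combo) for combo in product(*keyword_lists)}
    # Keep only candidates with no descendant that is also a candidate.
    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, other) for other in candidates)
    )


# Hypothetical data: two keywords with two occurrences each.
john = [(0, 0, 1), (0, 1, 0)]
ben = [(0, 0, 2), (0, 2, 0)]
print(smallest_lcas([john, ben]))  # [(0, 0)]
```

In the sample, the root `(0,)` also contains both keywords, but it is discarded because its descendant `(0, 0)` already contains them. The sketch enumerates every combination of keyword occurrences, so its cost grows with the product of the list sizes; it is useful only for checking the definition on small inputs.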

References

  1. S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In ICDE, 2002.
  2. V. Aguilera et al. Querying XML documents in XYleme. In SIGIR Workshop on XML and Information Retrieval, 2000.
  3. S. Amer-Yahia, S. Cho, and D. Srivastava. Tree pattern relaxation. In EDBT, 2002.
  4. BerkeleyDB. http://www.sleepycat.com/.
  5. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, 2002.
  6. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.
  7. Z. Chen, H. Jagadish, F. Korn, and N. Koudas. Counting twig matches in a tree. In ICDE, 2001.
  8. S. Cohen, J. Namou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. In VLDB, 2003.
  9. D. Florescu, D. Kossmann, and I. Manolescu. Integrating keyword search into XML query processing. In WWW9, 2000.
  10. N. Fuhr and K. Großjohann. XIRQL: A Query Language for Information Retrieval in XML documents. In SIGIR, 2001.
  11. H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.
  12. R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity Search in Databases. In VLDB, 1998.
  13. L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. In SIGMOD, 2003.
  14. V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava. Keyword Proximity Search in XML Trees. Available at http://www.db.ucsd.edu/publications/treeproximity.pdf.
  15. V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. In VLDB, 2002.
  16. V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, 2003.
  17. Q. Li and B. Moon. Indexing and Querying XML data for regular path expressions. In VLDB, 2001.
  18. Y. Li, C. Yu, and H. V. Jagadish. Schema-free XQuery. In VLDB, 2004.
  19. J. Naughton et al. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 24(2):27–33, 2001.
  20. B. Schieber and U. Vishkin. On finding lowest common ancestors: Simplification and parallelization. SIAM J. Computing, 17(6):1253–1262, 1988.
  21. A. Schmidt, M. L. Kersten, and M. Windhouwer. Querying XML documents made easy: Nearest concept queries. In ICDE, 2001.
  22. D. Srivastava et al. Structural joins: A primitive for efficient XML query pattern matching. In ICDE, 2002.
  23. I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and querying ordered XML using a relational database system. In SIGMOD, 2002.
  24. A. Theobald and G. Weikum. Adding relevance to XML. In WebDB, 2000.
  25. A. Theobald and G. Weikum. The index-based XXL search engine for querying XML data with relevance ranking. In EDBT, 2002.
  26. Z. Wen. New algorithms for the LCA problem and the binary tree reconstruction problem. Information Processing Letters, 51(1):11–16, 1994.
  27. XYZFind. http://www.searchtools.com/tools/xyzfind.html.

    • Published in

      SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
      June 2005
      990 pages
      ISBN:1595930604
      DOI:10.1145/1066157
      • Conference Chair: Fatma Ozcan

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
