DOI: 10.1145/1772690.1772743
research-article

LCA-based selection for XML document collections

Published: 26 April 2010

ABSTRACT

In this paper, we address the problem of database selection for XML document collections: given a set of collections and a user query, how to rank the collections by their goodness with respect to the query. Goodness is determined by the relevance of the documents in the collection to the query. We consider keyword queries and support Lowest Common Ancestor (LCA) semantics for defining query results, where the relevance of each document to a query is determined by properties of the LCA of those nodes in the XML document that contain the query keywords. To avoid evaluating queries against each document in a collection, we propose computing, in a preprocessing phase, information about the LCAs of all pairs of keywords in a document and using it to approximate the properties of the LCA-based results of a query. To improve storage and processing efficiency, we summarize the LCA information using Bloom filters. We address both a Boolean and a weighted version of the database selection problem. Our experimental results show that our approach incurs low errors in estimating the goodness of a collection and provides rankings that are very close to the actual ones.
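
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea for the Boolean version, under several assumptions that are not taken from the paper: each document is given as a list of (keyword, path) occurrences, where a path is the tuple of child indexes from the root to the node containing the keyword; a keyword pair is recorded only if the LCA of two of its occurrences lies below the root (`min_depth=1`); and the goodness of a collection is the number of documents whose summary contains every pair of query keywords. All names (`BloomFilter`, `lca_depth`, `summarize`, `estimated_goodness`) and the hashing scheme are hypothetical.

```python
import hashlib
from itertools import combinations


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def lca_depth(path_a, path_b):
    """Depth of the LCA of two nodes: the length of their common path prefix."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def summarize(document, min_depth=1):
    """Preprocessing: record keyword pairs whose LCA is at depth >= min_depth."""
    summary = BloomFilter()
    for (kw_a, path_a), (kw_b, path_b) in combinations(document, 2):
        if kw_a != kw_b and lca_depth(path_a, path_b) >= min_depth:
            summary.add("|".join(sorted((kw_a, kw_b))))
    return summary


def estimated_goodness(summaries, query):
    """Approximate Boolean goodness: documents whose summary holds every query pair."""
    pairs = ["|".join(sorted(p)) for p in combinations(set(query), 2)]
    return sum(all(p in s for p in pairs) for s in summaries)


# Two toy collections (made-up data); the root of each document has the empty path ().
corpora = {
    "bib": [
        [("xml", (0, 0)), ("keyword", (0, 1)), ("search", (1, 0))],
        [("xml", (0,)), ("search", (1,))],
    ],
    "news": [
        [("xml", (0, 0)), ("search", (0, 1))],
        [("xml", (2, 0)), ("search", (2, 1, 0))],
    ],
}
summaries = {name: [summarize(doc) for doc in docs] for name, docs in corpora.items()}
query = ["xml", "search"]
ranking = sorted(summaries, key=lambda n: estimated_goodness(summaries[n], query), reverse=True)
print(ranking)
```

Queries are answered from the per-document summaries alone, without touching the documents again. The estimate is approximate in two ways: the Bloom filter can report a pair that was never inserted, and checking pairs independently does not guarantee that a single LCA covers all query keywords at once. Single-keyword queries produce no pairs and are not handled meaningfully by this sketch.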

    • Published in

      WWW '10: Proceedings of the 19th international conference on World wide web
      April 2010
      1407 pages
      ISBN:9781605587998
      DOI:10.1145/1772690

      Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
