DOI: 10.1145/1529282.1529627
research-article

Retrieving valid matches for XML keyword search

Published: 08 March 2009

Abstract

Adapting keyword search to XML data, generalized as XML Keyword Search (XKS), has recently attracted attention. Its fundamental task is to retrieve meaningful and concise results for a given keyword query. The latest work, [1], returns the fragments rooted at the SLCA (Smallest Lowest Common Ancestor) nodes. To guarantee that these fragments contain only meaningful nodes, [1] proposed a contributor-based filtering mechanism in its MaxMatch algorithm. However, this filtering mechanism is not sufficient: it suffers from the false positive problem (discarding interesting nodes) and the redundancy problem (keeping uninteresting nodes).
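The SLCA semantics mentioned above can be illustrated with a minimal Python sketch. It is neither the algorithm of [1] nor one of the efficient SLCA algorithms such as [4] and [6]; it is a naive post-order traversal of the whole tree. It assumes (hypothetically) that a node matches a keyword when the keyword equals its lowercased tag name or a word of its own text. A node is reported as an SLCA when its subtree contains every query keyword and no child's subtree does. The names `node_terms` and `slca` are illustrative only.

```python
import re
import xml.etree.ElementTree as ET


def node_terms(elem):
    """Terms a single node matches: its tag name plus the words of its own
    text. Assumption: case-insensitive whole-word matching."""
    words = re.findall(r"\w+", (elem.text or "").lower())
    return {elem.tag.lower(), *words}


def slca(root, keywords):
    """Return the SLCA nodes of `root` for `keywords`, in document order.

    A node is an SLCA if its subtree contains every keyword and no child's
    subtree does (if a deeper descendant contained them all, so would the
    child on the path to it)."""
    keywords = {k.lower() for k in keywords}
    result = []

    def visit(elem):
        # Keywords found in this node's subtree, accumulated post-order.
        found = node_terms(elem) & keywords
        child_is_full = False
        for child in elem:
            child_found = visit(child)
            found |= child_found
            if child_found == keywords:
                child_is_full = True
        if found == keywords and not child_is_full:
            # SLCAs never nest, so post-order appends keep document order.
            result.append(elem)
        return found

    visit(root)
    return result


if __name__ == "__main__":
    doc = ET.fromstring(
        "<bib>"
        "<book><title>XML search</title><author>Liu</author></book>"
        "<book><title>Databases</title><author>Chen</author></book>"
        "</bib>"
    )
    print([e.tag for e in slca(doc, ["xml", "liu"])])  # ['book']
```

For the query {xml, liu}, only the first `book` element contains both keywords without a child that does, so it is the single SLCA; the fragment returned to the user would be rooted there.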
In this paper, we propose a new filtering mechanism that overcomes these two problems. Its fundamental concept is the valid contributor. A child v is a valid contributor to its parent u if (1) v's label is unique among all of u's children, or (2) v's content is not covered by that of any sibling with the same label as v. Our new filtering mechanism requires every node in each retrieved fragment to be a valid contributor to its parent. This mechanism not only satisfies the axiomatic properties proposed by [1] but also makes the filtered fragments more meaningful and concise. We implement our proposal in ValidMatch and compare ValidMatch with MaxMatch on real and synthetic XML data. The results verify our claims and show the effectiveness of our valid-contributor-based filtering mechanism.
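The valid-contributor test and the filtering rule described above can be sketched in Python. The abstract does not define a node's "content" or what it means for content to be "covered", so this sketch makes two hypothetical assumptions: content is the set of lowercased labels and text words in a node's subtree, and a same-label sibling covers v when its content is a proper superset of v's, or is identical to v's and appears earlier in document order (so exactly one of several duplicate siblings survives). Filtering runs top-down on a copy of the fragment, judging each child against its full set of siblings before any are removed. This illustrates the stated rule; it is not the ValidMatch implementation, and the function names are illustrative.

```python
import copy
import re
import xml.etree.ElementTree as ET


def content(elem):
    """Hypothetical 'content' of a node: every lowercased label and text
    word appearing in its subtree."""
    terms = set()
    for node in elem.iter():
        terms.add(node.tag.lower())
        terms.update(re.findall(r"\w+", (node.text or "").lower()))
    return terms


def is_valid_contributor(parent, child):
    """Return True if `child` is a valid contributor to `parent`.

    (1) child's label is unique among parent's children, or
    (2) no same-label sibling covers child's content. Assumed 'covers':
        the sibling's content is a proper superset of child's, or equal
        to it with the sibling earlier in document order."""
    kids = list(parent)
    pos = next(i for i, k in enumerate(kids) if k is child)
    mine = content(child)
    for i, sib in enumerate(kids):
        if sib is child or sib.tag != child.tag:
            continue  # only siblings with the same label are compared
        theirs = content(sib)
        if mine < theirs or (mine == theirs and i < pos):
            return False  # child's content is covered by this sibling
    # Reached when the label is unique (1) or no sibling covers child (2).
    return True


def filter_fragment(fragment):
    """Return a filtered copy of `fragment` in which every remaining node
    is a valid contributor to its parent (checked top-down)."""
    root = copy.deepcopy(fragment)

    def prune(parent):
        # Judge all children against the unpruned sibling set first.
        invalid = [c for c in parent if not is_valid_contributor(parent, c)]
        for c in invalid:
            parent.remove(c)
        for c in parent:
            prune(c)

    prune(root)
    return root


if __name__ == "__main__":
    frag = ET.fromstring(
        "<team><name>Bulls</name>"
        "<player><name>Jordan</name><position>guard</position></player>"
        "<player><name>Jordan</name></player>"
        "<player><name>Pippen</name><position>forward</position></player>"
        "</team>"
    )
    print(ET.tostring(filter_fragment(frag), encoding="unicode"))
```

In this made-up fragment, the second `player` is dropped because its content is a proper subset of the first `player`'s, while the third `player` survives because it carries terms (`pippen`, `forward`) that no same-label sibling has. The unique-label children (`name`, `position`) are kept by condition (1).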

References

[1] Z. Liu and Y. Chen, "Reasoning and identifying relevant matches for XML keyword search," in VLDB, 2008.
[2] L. Guo, F. Shao, C. Botev, et al., "XRank: Ranked keyword search over XML documents," in SIGMOD, 2003.
[3] Y. Li, C. Yu, and H. V. Jagadish, "Schema-free XQuery," in VLDB, 2004, pp. 72-83.
[4] Y. Xu and Y. Papakonstantinou, "Efficient keyword search for smallest LCAs in XML databases," in SIGMOD, 2005, pp. 527-538.
[5] V. Hristidis, N. Koudas, et al., "Keyword proximity search in XML trees," TKDE, vol. 18, no. 4, pp. 525-539, 2006.
[6] C. Sun, C. Y. Chan, et al., "Multiway SLCA-based keyword search in XML data," in WWW, 2007, pp. 1043-1052.
[7] Z. Liu and Y. Chen, "Identifying meaningful return information for XML keyword search," in SIGMOD, 2007, pp. 329-340.
[8] G. Li, J. Feng, et al., "Effective keyword search for valuable LCAs over XML documents," in CIKM, 2007, pp. 31-40.
[9] Y. Xu and Y. Papakonstantinou, "Efficient LCA based keyword search in XML data," in EDBT, 2008, pp. 535-546.
[10] I. Tatarinov and S. D. Viglas, "Storing and querying ordered XML using a relational database system," in SIGMOD, 2002, pp. 204-215.
[11] http://www.cs.washington.edu/research/xmldatasets/
[12] http://monetdb.cwi.nl/xml/
[13] www.syger.com/jsc/docs/stopwords/english.htm
[14] S. Cohen, J. Mamou, et al., "XSearch: A semantic search engine for XML," in VLDB, 2003, pp. 33-44.


    Published In

    SAC '09: Proceedings of the 2009 ACM symposium on Applied Computing
    March 2009
    2347 pages
    ISBN:9781605581668
    DOI:10.1145/1529282


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. MaxMatch
    2. SLCA
    3. XML keyword search
    4. validator


    Conference

    SAC '09: The 2009 ACM Symposium on Applied Computing
    March 8-12, 2009
    Honolulu, Hawaii

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%


    Bibliometrics

    • Total Citations: 0
    • Total Downloads: 169
    • Downloads (Last 12 months): 0
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 17 Jan 2025
