research-article

MESSIAH: missing element-conscious SLCA nodes search in XML data

Published: 22 June 2013

ABSTRACT

Keyword search for smallest lowest common ancestors (SLCAs) in XML data is widely accepted as a meaningful way to identify matching nodes whose subtrees contain an input set of keywords. Although SLCA and its variants (e.g., MLCA) perform well at identifying matching nodes, they perform poorly on irregular schemas that have missing elements, that is, (sub)elements that are optional or that appear in some instances of an element type but not all. For example, a "population" subelement in a "city" element might be optional, appearing when the population is known and absent when it is unknown. In this paper, we generalize the SLCA search paradigm to support queries involving missing elements. Specifically, we propose a novel property called optionality resilience that specifies the desired behavior of an XML keyword search (XKS) approach for queries involving missing elements. We present two variants of a novel algorithm called MESSIAH (Missing Element-conSciouS hIgh-quality SLCA searcH), both of which are optionality resilient on irregular documents. MESSIAH logically transforms an XML document into a minimal full document in which all missing elements are represented as empty elements, i.e., the irregular schema is made "regular", and then employs efficient strategies to identify partial and complete full SLCA nodes (SLCA nodes in the full document) from it. In particular, it generates the same SLCA nodes as any state-of-the-art approach when the query does not involve missing elements, but avoids irrelevant results when missing elements are involved. Our experimental study demonstrates the ability of MESSIAH to produce superior-quality search results.
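
The missing-element problem described in the abstract can be illustrated with a small sketch. The code below is not the MESSIAH algorithm and makes no claim about its data structures or strategies; it is a naive SLCA search over a toy document, followed by a crude padding step that stands in for the idea of a "full document". The function names (`terms`, `slca`, `pad_missing`), the toy document, the rule that a keyword matches a tag name or a word of element text, and the assumption that elements with the same tag should share the same child tags are all hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET

def terms(node):
    """Keywords matched directly at a node: its tag name and the words of its text."""
    return {node.tag.lower()} | set((node.text or "").lower().split())

def slca(root, query):
    """Naive SLCA: nodes whose subtree covers every keyword while no child's subtree does."""
    query = {q.lower() for q in query}
    results = []

    def visit(node):
        covered = terms(node) & query
        child_is_full = False
        for child in node:
            child_covered, child_full = visit(child)
            covered |= child_covered
            child_is_full = child_is_full or child_full
        full = covered == query
        if full and not child_is_full:
            results.append(node)
        return covered, full

    visit(root)
    return results

def pad_missing(root):
    """Add empty children so same-tag elements share one set of child tags.

    Illustrative assumption only: one pass, no recursion into added elements.
    """
    child_tags = {}
    for node in root.iter():
        tags = child_tags.setdefault(node.tag, [])
        for child in node:
            if child.tag not in tags:
                tags.append(child.tag)
    for node in list(root.iter()):
        present = {child.tag for child in node}
        for tag in child_tags[node.tag]:
            if tag not in present:
                ET.SubElement(node, tag)  # empty element marks the missing value

# Toy irregular document: the second city has no population element.
doc = ET.fromstring(
    "<country>"
    "<city><name>Springfield</name><population>30000</population></city>"
    "<city><name>Shelbyville</name></city>"
    "</country>"
)
query = ["Shelbyville", "population"]

before = slca(doc, query)
print([n.tag for n in before])  # ['country']

pad_missing(doc)
after = slca(doc, query)
print([n.tag for n in after])   # ['city']
```

On the original document the only node covering both keywords is the root, because "population" is matched inside a different city; this is the kind of irrelevant result the abstract refers to. Once the missing element is represented as an empty element, the smallest covering node is the Shelbyville city itself, with an empty population.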

      • Published in

        SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
        June 2013
        1322 pages
        ISBN: 9781450320375
        DOI: 10.1145/2463676

        Copyright © 2013 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SIGMOD '13 paper acceptance rate: 76 of 372 submissions, 20%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
