ABSTRACT
Keyword search for smallest lowest common ancestors (SLCAs) in XML data has been widely accepted as a meaningful way to identify matching nodes whose subtrees contain an input set of keywords. Although SLCA and its variants (e.g., MLCA) perform admirably in identifying matching nodes, they perform surprisingly poorly for searches on irregular schemas that have missing elements, that is, (sub)elements that are optional and appear in some instances of an element type but not all (e.g., a "population" subelement in a "city" element might be optional, appearing when the population is known and absent when it is unknown). In this paper, we generalize the SLCA search paradigm to support queries involving missing elements. Specifically, we propose a novel property called optionality resilience that specifies the desired behavior of an XML keyword search (XKS) approach for queries involving missing elements. We present two variants of a novel algorithm called MESSIAH (Missing Element-conSciouS hIgh-quality SLCA searcH), which are optionality resilient on irregular documents. MESSIAH logically transforms an XML document into a minimal full document in which all missing elements are represented as empty elements, i.e., the irregular schema is made "regular", and then employs efficient strategies to identify partial and complete full SLCA nodes (SLCA nodes in the full document) from it. In particular, it generates the same SLCA nodes as any state-of-the-art approach when the query does not involve missing elements, but avoids irrelevant results when missing elements are involved. Our experimental study demonstrates the ability of MESSIAH to produce superior-quality search results.
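The missing-element problem described above can be illustrated with a small sketch. The code below is a naive SLCA computation written for illustration only; it is not the MESSIAH algorithm, and the sample document, the `node_terms` and `slca` helpers, and the matching rule (a keyword matches a tag name or a whitespace-separated text token) are all assumptions, not taken from the paper. It shows that when the queried "population" subelement is absent from the matching "city", plain SLCA climbs to a higher ancestor and returns a result that is likely irrelevant.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the second city has no <population> element.
DOC = """
<country>
  <name>Utopia</name>
  <city><name>Alpha</name><population>500</population></city>
  <city><name>Beta</name></city>
</country>
"""

def node_terms(node):
    """Terms a node matches directly: its tag name and its text tokens."""
    terms = {node.tag.lower()}
    if node.text:
        terms.update(node.text.lower().split())
    return terms

def slca(root, keywords):
    """Return the elements whose subtree contains every keyword and that
    have no descendant with the same property (naive, one traversal)."""
    keywords = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Returns (keywords matched in this subtree,
        #          whether this subtree already contains a common ancestor).
        matched = node_terms(node) & keywords
        has_ca_below = False
        for child in node:
            child_matched, child_has_ca = visit(child)
            matched |= child_matched
            has_ca_below = has_ca_below or child_has_ca
        if matched == keywords and not has_ca_below:
            results.append(node)
            return matched, True
        return matched, has_ca_below

    visit(root)
    return results

root = ET.fromstring(DOC)
# Both keywords occur inside the same city: SLCA is that city.
print([e.tag for e in slca(root, ["alpha", "population"])])  # ['city']
# Beta has no population element: SLCA falls back to the whole country.
print([e.tag for e in slca(root, ["beta", "population"])])   # ['country']
```

In the second query the "population" match comes from city Alpha, so the returned "country" node combines unrelated cities; this is the kind of result an optionality-resilient approach is meant to avoid.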
Index Terms
- MESSIAH: missing element-conscious SLCA nodes search in XML data