An Effective and Efficient Approach for Keyword-Based XML Retrieval

Li, Xiaoguang; Gong, Jian; Wang, Daling; Yu, Ge

doi:10.1007/11563952_6


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3739)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

IR-style keyword-based search on XML documents has become the most common tool for XML querying, as users need not know the structural information of the target XML document before constructing a query. For a keyword-based search engine for XML documents, the key issue is how to efficiently return sets of meaningfully related nodes in response to a user's query. A common solution in current approaches is to store the relationship of each pair of nodes in an XML document in an index. Obviously, this leads to serious storage overhead. In this paper, we propose an enhanced inverted index structure (PN-Inverted Index) that stores path information in addition to node IDs, and we adopt the concept of LCA and extend it to PLCA. Efficient algorithms based on these concepts are designed to check the relationship of an arbitrary number of nodes. Compared with existing approaches, our approach need not create an additional relationship index but simply utilizes the existing inverted index, which is common in IR-style keyword search engines. Experimental results show that, while still returning meaningful answers, our search engine offers great performance benefits. Although the size of the inverted index increases, the total size of the search engine's indices is smaller than that of existing approaches.
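
The general idea of keeping path information in inverted-index postings can be illustrated with a minimal sketch. It is not the paper's PN-Inverted Index or its PLCA algorithm, whose definitions are not given in this abstract. It assumes each posting holds a node ID plus a Dewey-style path (the child positions from the root), so that the ordinary lowest common ancestor (LCA) of any number of nodes is the longest common prefix of their paths. The names `build_index`, `lca_path`, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(xml_text):
    """Map each keyword to postings of (node_id, path).

    Assumption: path is a Dewey-style tuple of child positions from the
    root; node_id is a preorder counter. Only element text is indexed
    (tail text is ignored to keep the sketch short).
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    counter = 0

    def visit(elem, path):
        nonlocal counter
        node_id = counter
        counter += 1
        for word in (elem.text or "").lower().split():
            index[word].append((node_id, path))
        for pos, child in enumerate(elem):
            visit(child, path + (pos,))

    visit(root, (0,))
    return index


def lca_path(paths):
    """Return the longest common prefix of the paths, i.e. the path of
    the lowest common ancestor of the corresponding nodes."""
    prefix = []
    for steps in zip(*paths):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    return tuple(prefix)


# Hypothetical sample document with two sibling <book> elements.
DOC = """<bib>
  <book><title>XML Retrieval</title><author>Li</author></book>
  <book><title>Databases</title><author>Yu</author></book>
</bib>"""

index = build_index(DOC)
li_path = index["li"][0][1]
xml_path = index["xml"][0][1]
yu_path = index["yu"][0][1]

# Keywords inside the same <book> meet at that book's path.
same_book = lca_path([li_path, xml_path])
# Keywords in different books only meet at the document root.
different_books = lca_path([li_path, yu_path])
```

In this sketch the relationship check needs only the postings themselves, so no separate pairwise relationship index is consulted. The cost is a longer posting per keyword occurrence.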

Supported by the National Natural Science Foundation of China (60173051) and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of the Ministry of Education of China.



Author information

Authors and Affiliations

School of Information Science and Engineering, Northeastern University, Shenyang, 110004, P.R.China
Xiaoguang Li, Jian Gong, Daling Wang & Ge Yu


Editor information

Editors and Affiliations

University of Edinburgh & Bell Laboratories
Wenfei Fan
College of Computer Science, Zhejiang University, 310027, Hangzhou, Zhejiang, China
Zhaohui Wu
Dept. of E. I. E, Huazhong University of Science and Technology, Wuhan, China
Jun Yang


About this paper

Cite this paper

Li, X., Gong, J., Wang, D., Yu, G. (2005). An Effective and Efficient Approach for Keyword-Based XML Retrieval. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_6


DOI: https://doi.org/10.1007/11563952_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)
