Abstract
IR-style keyword search over XML documents has become the most common tool for XML querying, because users need not know the structure of the target XML document before constructing a query. For a keyword-based XML search engine, the key issue is how to efficiently return sets of meaningfully related nodes in response to a user's query. A common solution in current approaches is to store the relationship between each pair of nodes in an XML document in an index, which obviously leads to serious storage overhead. In this paper, we propose an enhanced inverted index structure (PN-Inverted Index) that stores path information in addition to node IDs, and we import the concept of LCA and extend it to PLCA. Based on these concepts, we design efficient algorithms that check the relationship among an arbitrary number of nodes. Compared with existing approaches, our approach does not need to create an additional relationship index; it simply uses the inverted index that is already common in IR-style keyword search engines. Experimental results show that, while still returning meaningful answers, our search engine offers great performance benefits. Although the size of the inverted index increases, the total size of the search engine's indices is smaller than that of existing approaches.
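The idea of keeping path information inside the inverted index can be illustrated with a minimal sketch. It is not the paper's PN-Inverted Index or its PLCA algorithm, whose definitions the abstract does not give. The sketch assumes that each posting stores a node ID together with a Dewey-style path (the child positions from the root), and it computes only the plain lowest common ancestor (LCA) of several nodes as the longest common prefix of their paths. The names `PathInvertedIndex`, `add`, `lookup`, and `lowest_common_ancestor` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: a path is a Dewey-style tuple of child positions from the root.
Path = Tuple[int, ...]
Posting = Tuple[int, Path]  # (node ID, path of that node)


class PathInvertedIndex:
    """Hypothetical inverted index whose postings carry a path per node."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Posting]] = defaultdict(list)

    def add(self, keyword: str, node_id: int, path: Path) -> None:
        # Store the node ID and its path under the lower-cased keyword.
        self.postings[keyword.lower()].append((node_id, path))

    def lookup(self, keyword: str) -> List[Posting]:
        # Return all postings for the keyword, or an empty list.
        return list(self.postings.get(keyword.lower(), []))


def lowest_common_ancestor(paths: List[Path]) -> Path:
    """Return the longest common prefix of the given paths.

    With Dewey-style paths, this prefix is the path of the plain LCA,
    so any number of nodes can be related without a separate index.
    """
    if not paths:
        raise ValueError("at least one path is required")
    prefix = paths[0]
    for path in paths[1:]:
        common = 0
        for a, b in zip(prefix, path):
            if a != b:
                break
            common += 1
        prefix = prefix[:common]
    return prefix


index = PathInvertedIndex()
index.add("XML", 4, (0, 1, 0))
index.add("search", 7, (0, 1, 2))
index.add("index", 9, (0, 2))

# Collect the path of the first posting for each query keyword.
query_paths = [index.lookup(k)[0][1] for k in ("xml", "search", "index")]
lca_two = lowest_common_ancestor(query_paths[:2])
lca_all = lowest_common_ancestor(query_paths)
```

In this sketch the relationship among the matched nodes is derived from the postings alone, which is the storage argument the abstract makes against a separate pairwise relationship index.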
Supported by the National Natural Science Foundation of China (60173051) and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of the Ministry of Education of China.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, X., Gong, J., Wang, D., Yu, G. (2005). An Effective and Efficient Approach for Keyword-Based XML Retrieval. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_6
DOI: https://doi.org/10.1007/11563952_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)