Abstract
Twig query pattern matching is a core operation in XML query processing, and indexing XML documents for twig query processing is fundamental to effective information retrieval. In practice, many XML documents on the web are heterogeneous and follow their own formats, so documents describing relevant information can have different structures. As a result, documents that interest the user but whose structures are similar to, rather than exactly matching, the query are often missed. In this paper, we propose RRSi, a novel structural index for structure-based query lookup over heterogeneous XML sources that supports proximate query answers. The index avoids unnecessary processing of structurally irrelevant candidates, even those that might show good content relevance. An optimized version of the index, oRRSi, further reduces both space requirements and computational complexity. To our knowledge, these are the first structural indexes to support proximity twig queries on XML documents. Our preliminary experiments show that RRSi- and oRRSi-based query processing significantly outperforms previously proposed techniques on XML repositories with structural heterogeneity.
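The following minimal Python sketch makes the difference between exact and proximate twig answers concrete. It is not the RRSi or oRRSi index. It is a naive, index-free matcher that assumes one simple relaxation: every parent-child ("/") edge of the query is generalized to an ancestor-descendant ("//") edge. The names `TwigNode`, `matches`, and `find_matches` are hypothetical and introduced only for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TwigNode:
    """A twig-query node: an element tag plus child query nodes.

    `axis` is the edge from the parent query node: "/" (parent-child)
    or "//" (ancestor-descendant).
    """
    tag: str
    axis: str = "/"
    children: list[TwigNode] = field(default_factory=list)


def _candidates(elem: ET.Element, axis: str) -> list[ET.Element]:
    # "/" considers direct children only; "//" considers all proper descendants.
    if axis == "/":
        return list(elem)
    return [d for d in elem.iter() if d is not elem]


def matches(elem: ET.Element, q: TwigNode, relax: bool = False) -> bool:
    """Return True if twig `q` embeds into the subtree rooted at `elem`,
    with the root of `q` mapped to `elem` itself.

    relax=True generalizes every "/" edge to "//" (an assumed, simple
    relaxation used only for illustration, not the paper's method).
    """
    if elem.tag != q.tag:
        return False
    for child_q in q.children:
        axis = "//" if relax else child_q.axis
        if not any(matches(c, child_q, relax) for c in _candidates(elem, axis)):
            return False
    return True


def find_matches(root: ET.Element, q: TwigNode, relax: bool = False) -> list[ET.Element]:
    # Try every element as an anchor for the twig's root (no index is used).
    return [e for e in root.iter() if matches(e, q, relax)]


# Query book[title][author]: a book with a title child and an author child.
query = TwigNode("book", children=[TwigNode("title"), TwigNode("author")])

# Two sources holding the same information with different structures.
doc_a = ET.fromstring("<lib><book><title>T</title><author>A</author></book></lib>")
doc_b = ET.fromstring(
    "<lib><book><title>T</title><info><author>A</author></info></book></lib>"
)

print(len(find_matches(doc_a, query)))              # 1
print(len(find_matches(doc_b, query)))              # 0: exact matching misses doc_b
print(len(find_matches(doc_b, query, relax=True)))  # 1: relaxed matching finds it
```

This naive scan evaluates every element of every document, including structurally irrelevant ones. Avoiding that work is precisely what the abstract attributes to the RRSi index.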
Cite this article
Ng, P.K.L., Ng, V.T.Y. RRSi: indexing XML data for proximity twig queries. Knowl Inf Syst 17, 193–216 (2008). https://doi.org/10.1007/s10115-008-0122-x