RRSi: indexing XML data for proximity twig queries

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Twig query pattern matching is a core operation in XML query processing, and indexing XML documents for twig query processing is fundamental to effective information retrieval. In practice, many XML documents on the web are heterogeneous and follow their own formats, so documents describing related information can have different structures. As a result, documents of interest to the user whose structures are similar to, but do not exactly match, a user query are often missed. In this paper, we propose RRSi, a novel structural index designed for structure-based query lookup over heterogeneous sources of XML documents, with support for proximate query answers. The index avoids unnecessary processing of structurally irrelevant candidates that might nevertheless show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that query processing based on RRSi and oRRSi significantly outperforms previously proposed techniques on XML repositories with structural heterogeneity.

Author information

Corresponding author

Correspondence to Patrick K. L. Ng.

About this article

Cite this article

Ng, P.K.L., Ng, V.T.Y. RRSi: indexing XML data for proximity twig queries. Knowl Inf Syst 17, 193–216 (2008). https://doi.org/10.1007/s10115-008-0122-x

  • DOI: https://doi.org/10.1007/s10115-008-0122-x
