Clustered Absolute Path Index for XML Document: On Efficient Processing of Twig Queries

  • Conference paper
Advanced Web and Network Technologies, and Applications (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3842)

Abstract

Finding all occurrences of a twig pattern in an XML document is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process twig queries based on region encoding. Because a region-encoding index assigns two or more numbers to each element of the source document, the index size grows linearly with the document, and algorithms based on region encoding perform worse as the source document grows large. In this paper, we address the problem by putting forward a novel index structure called the Clustered Absolute Path Index (CAPI for short). This index greatly reduces the index size and grows slowly as the source document grows. Based on CAPI, we design novel join algorithms: Path-Match, which processes queries without branches, and Branch-Filter and RelatedPath-Join, which process queries with branches. Experimental results show that the proposed CAPI-based algorithms significantly outperform twig join and scale well.
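
As an illustration of the index-size argument, the following is a minimal sketch of region encoding, assuming the common (start, end, level) labelling assigned by a depth-first walk. The abstract does not specify CAPI's internal layout, so the grouping by absolute path shown here is a hypothetical simplification and not the authors' structure. The names `region_encode`, `is_ancestor` and `group_by_absolute_path` are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def region_encode(root):
    """Label every element with (start, end, level) using a depth-first walk.

    Assumed scheme: one counter is incremented on entering and on leaving
    an element, so an ancestor's interval strictly contains its descendants'.
    """
    labels = []
    counter = 0

    def visit(elem, level, parent_path):
        nonlocal counter
        counter += 1
        abs_path = parent_path + "/" + elem.tag
        entry = {"tag": elem.tag, "path": abs_path,
                 "start": counter, "level": level}
        labels.append(entry)
        for child in elem:
            visit(child, level + 1, abs_path)
        counter += 1
        entry["end"] = counter

    visit(root, 0, "")
    return labels


def is_ancestor(a, d):
    """Structural test used by region-encoding joins: a's interval contains d's."""
    return a["start"] < d["start"] and d["end"] < a["end"]


def group_by_absolute_path(labels):
    """Hypothetical simplification: one index entry per distinct root-to-node path."""
    groups = defaultdict(list)
    for e in labels:
        groups[e["path"]].append((e["start"], e["end"]))
    return dict(groups)


doc = ET.fromstring(
    "<lib><book><title/><author/></book><book><title/></book></lib>"
)
labels = region_encode(doc)
groups = group_by_absolute_path(labels)

print(len(labels), "labelled elements")
print(len(groups), "distinct absolute paths")
```

In this toy document every element carries its own label, while elements that share a root-to-node path fall into one group. The number of labels grows with the number of elements, whereas the number of distinct paths grows only when a new path shape appears.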

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Li, J., Wang, H. (2006). Clustered Absolute Path Index for XML Document: On Efficient Processing of Twig Queries. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds) Advanced Web and Network Technologies, and Applications. APWeb 2006. Lecture Notes in Computer Science, vol 3842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610496_1

  • DOI: https://doi.org/10.1007/11610496_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31158-4

  • Online ISBN: 978-3-540-32435-5

  • eBook Packages: Computer Science, Computer Science (R0)
