
An Efficient Path Index for Querying Semi-structured Data

(Extended Abstract)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2642)

Abstract

The richness of semi-structured data allows data of varied and inconsistent structures to be stored in a single database. Such data can be represented as a graph, and queries can be constructed using path expressions, which describe traversals through the graph.
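To make the graph model and path expressions concrete, here is a minimal Python sketch of straightforward, traversal-based evaluation. It is not the index proposed in this paper. The `Graph` class, the `/` (child) and `//` (descendant) step syntax, and the `*` single-label wildcard are illustrative assumptions, as is the small sample graph.

```python
from collections import defaultdict


class Graph:
    """Hypothetical model: semi-structured data as a directed graph with labelled edges."""

    def __init__(self):
        self.edges = defaultdict(list)  # node id -> [(label, child id), ...]

    def add_edge(self, parent, label, child):
        self.edges[parent].append((label, child))


def parse(path):
    """Split '/a//b/c' into [('child', 'a'), ('desc', 'b'), ('child', 'c')]."""
    steps, i = [], 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "desc", i + 2
        elif path[i] == "/":
            axis, i = "child", i + 1
        else:
            raise ValueError(f"expected '/' at position {i} in {path!r}")
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        if j == i:
            raise ValueError(f"empty label at position {i} in {path!r}")
        steps.append((axis, path[i:j]))
        i = j
    return steps


def reach(graph, start):
    """Return start plus every node reachable from it; the visited set makes cycles safe."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, child in graph.edges.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def evaluate(graph, root, path):
    """Naive step-by-step traversal from root; work grows with the number of steps."""
    current = {root}
    for axis, label in parse(path):
        if axis == "desc":
            # '//' allows zero or more edges before the labelled edge.
            frontier = set().union(*(reach(graph, n) for n in current))
        else:
            frontier = current
        current = {
            child
            for node in frontier
            for edge_label, child in graph.edges.get(node, ())
            if label == "*" or label == edge_label  # '*' matches any single label
        }
    return current


# Hypothetical sample data.
g = Graph()
g.add_edge(0, "db", 1)
g.add_edge(1, "book", 2)
g.add_edge(2, "title", 3)
g.add_edge(2, "chapter", 4)
g.add_edge(4, "title", 5)
g.add_edge(4, "ref", 2)  # a link back to the book: the data is a graph, not a tree
g.add_edge(1, "article", 6)
g.add_edge(6, "title", 7)

print(sorted(evaluate(g, 0, "/db/book//title")))  # [3, 5]
```

Each step rescans a frontier, and each `//` step explores every node reachable from that frontier. Cost therefore grows with both the number of steps and the size of the reachable subgraph. The paper reports that its index keeps performance independent of the number of terms; this sketch does not model that index.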

Instead of providing optimal performance for only a limited range of path expressions, we propose a mechanism that is shown to deliver consistently high performance for path expressions of any complexity, including those with descendant operators (path wildcards). We further detail how our index can be used for more complex processing, such as evaluating path expressions that contain links and evaluating entire (sub)queries containing path-based predicates. Performance is shown to be independent of the number of terms in the path expression(s), even when these expressions contain wildcards. Experiments show that, for certain query types, our index is up to two orders of magnitude faster than conventional methods; it is also compact and scales well.




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barg, M., Wong, R.K., Lam, F. (2003). An Efficient Path Index for Querying Semi-structured Data. In: Zhou, X., Orlowska, M.E., Zhang, Y. (eds) Web Technologies and Applications. APWeb 2003. Lecture Notes in Computer Science, vol 2642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36901-5_9


  • DOI: https://doi.org/10.1007/3-540-36901-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-02354-8

  • Online ISBN: 978-3-540-36901-1

  • eBook Packages: Springer Book Archive
