HID: An Efficient Path Index for Complex XML Collections with Arbitrary Links

  • Conference paper
Databases in Networked Information Systems (DNIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3433)

Abstract

The increasing popularity of XML has generated a lot of interest in query processing over graph-structured data. To support efficient evaluation of path expressions, structural indexes have been proposed. However, most variants of structural indexes ignore inter- or intra-document references and assume a tree-like structure of XML documents. Extending these indexes to work with large XML graphs and to support intra- or inter-document links requires a lot of computing power for index creation and a lot of space to store the indexes. Moreover, the efficient evaluation of ancestor-descendant queries over arbitrary graphs with long paths is a severe problem. In this paper, we propose a scalable connection index that is based on the concept of 2-hop covers as introduced by Cohen et al. The proposed algorithm for index creation scales down the original graph size substantially, yielding a directed acyclic graph with a smaller number of nodes and edges. This reduces the number of computing steps required for building the index, and thus computing time and space are reduced as well. The index also permits efficient evaluation of ancestor-descendant relationships. Moreover, in comparison to most other work, the proposed index is optimized for descendant-or-self queries on arbitrary graphs with link relationships.
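The two ideas named in the abstract, shrinking the graph to a directed acyclic graph and answering reachability from 2-hop labels, can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that the graph is scaled down by collapsing strongly connected components, which the abstract does not state, and it replaces the cover construction of Cohen et al. with a simple greedy pruned search. All function names are hypothetical.

```python
from collections import defaultdict, deque


def condense(nodes, edges):
    """Collapse strongly connected components (Kosaraju); return (comp_of, dag_edges)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    # Pass 1: iterative DFS that records nodes in order of completion.
    seen, order = set(), []
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:  # all successors handled: node is finished
                order.append(node)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing completion order.
    comp_of = {}
    for root in reversed(order):
        if root in comp_of:
            continue
        comp_of[root] = root
        todo = [root]
        while todo:
            node = todo.pop()
            for p in pred[node]:
                if p not in comp_of:
                    comp_of[p] = root
                    todo.append(p)
    dag_edges = {(comp_of[u], comp_of[v]) for u, v in edges if comp_of[u] != comp_of[v]}
    return comp_of, dag_edges


def _pruned_bfs(center, adj, uncovered, labels):
    """Add `center` to the label of every reached node not already covered."""
    queue, visited = deque([center]), {center}
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt in visited:
                continue
            visited.add(nxt)
            if uncovered(nxt):  # otherwise prune: do not label or expand
                labels[nxt].add(center)
                queue.append(nxt)


def build_two_hop_labels(dag_nodes, dag_edges):
    """Greedy 2-hop labeling: u reaches v iff l_out[u] and l_in[v] share a center."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in dag_edges:
        succ[u].append(v)
        pred[v].append(u)
    l_in = {n: {n} for n in dag_nodes}   # centers that reach n (n itself included)
    l_out = {n: {n} for n in dag_nodes}  # centers reachable from n (n itself included)
    # Heuristic (an assumption): try well-connected nodes as centers first.
    for c in sorted(dag_nodes, key=lambda n: -(len(succ[n]) + len(pred[n]))):
        _pruned_bfs(c, succ, lambda w: l_out[c].isdisjoint(l_in[w]), l_in)
        _pruned_bfs(c, pred, lambda w: l_out[w].isdisjoint(l_in[c]), l_out)
    return l_in, l_out


def build_index(nodes, edges):
    comp_of, dag_edges = condense(nodes, edges)
    l_in, l_out = build_two_hop_labels(set(comp_of.values()), dag_edges)
    return comp_of, l_in, l_out


def is_descendant_or_self(index, u, v):
    """True if v equals u or is reachable from u over child and link edges."""
    comp_of, l_in, l_out = index
    return not l_out[comp_of[u]].isdisjoint(l_in[comp_of[v]])


# Elements a, b, c form a cycle through a link; d is shared by c and e.
example = build_index("abcde", [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "d")])
```

With this index, `is_descendant_or_self(example, "a", "d")` intersects two small label sets instead of traversing the graph. Because every node carries itself in both labels, the "or-self" case and nodes that lie on a common cycle need no special handling.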

References

  1. Cooper, B.F., Sample, N., Franklin, M.J., Hjaltason, G.R., Shadmon, M.: A fast index for semistructured data. In: VLDB 2001 Proceedings of 27th International Conference on Very Large Data Bases, Roma, Italy, September 11-14. Morgan Kaufmann, San Francisco (2001)

  2. Kaplan, H., Milo, T.: Short and simple labels for small distances and other functions. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 246–257. Springer, Heidelberg (2001)

  3. Barashev, D., et al.: Indexing XML to Support Path Expressions. In: 6th East-European Conference on Advances in Databases and Information Systems, ADBIS (2002)

  4. Cohen, E., et al.: Labeling dynamic XML trees. In: Symposium on Principles of Database Systems (PODS), pp. 271–281 (2002)

  5. Qun, C., et al.: D(K)-Index: An Adaptive Structural Summary for Graph-based Data. In: ACM SIGMOD Int. Conference on Management of Data, pp. 134–144 (2003)

  6. Milo, T., Suciu, D.: Index Structures for path expressions. In: 7th International Conference on Database Theory (ICDT), pp. 277–295 (1999)

  7. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. In: Proceedings Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 937–946. ACM Press, New York (2002)

  8. Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization in semistructured databases. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (eds.) VLDB 1997, Proceedings of 23rd International Conference on Very Large Data Bases, Athens, Greece, August 25-29, 1997, pp. 436–445. Morgan Kaufmann, San Francisco (1997)

  9. Kaushik, R., et al.: Covering indexes for Branching path queries. In: ACM SIGMOD Int. Conference on Management of Data, pp. 133–144 (2002)

  10. Chung, C.-W., Min, J.-K., Shim, K.: APEX: An adaptive path index for XML data. In: Franklin, et al. (eds.) [6], pp. 121–132

  11. Schenkel, R., et al.: HOPI: An efficient connection index for complex XML document collections. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 237–255. Springer, Heidelberg (2004)

  12. Kaushik, R., et al.: Exploiting Local Similarity for Indexing Paths in Graph-Structured Data. In: 18th Int. Conference on Data Engineering, ICDE (2002)

  13. Sayed, A., Unland, R.: Index-support on XML documents Containing Links. In: IEEE Midwest Symposium on Circuits and Systems (2003)

  14. Kaplan, H., et al.: A Comparison of labeling schemes for ancestor queries. In: 13th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 954–963 (2002)

  15. The Mondial Database, http://dbis.informatik.uni-goettingen.de/Mondial/

  16. Abiteboul, S., et al.: Compact labeling schemes for ancestor queries. In: 12th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 547–556 (2001)

  17. Cormen, T.H., et al.: Introduction to algorithms, 2nd edn., ch. 22-23 (2001)

  18. Nuutila, E., Soisalon-Soininen, E.: Efficient Transitive Closure Computation. Technical Report TKO-B113 (1993)

  19. Li, Q., Moon, B.: Indexing and querying XML Data for Regular Path Expressions. In: 27th Int. Conference on Very Large Data Bases (VLDB), pp. 361–370 (2001)

  20. Yoshikawa, M., Amagasa, T.: XRel: A Path-Index Based Approach to Storage and Retrieval of XML Documents Using Relational Databases. ACM Transactions on Internet Technology, TOIT (2001)

  21. Tatarinov, S.D., Zhang, C.: Storing and Querying Ordered XML Using a Relational Database System. In: ACM SIGMOD Int. Conference on Management of Data, pp. 204–215 (2002)

  22. Zhang, C., Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.: On Supporting Containment Queries in Relational Database Management Systems. In: ACM SIGMOD Int. Conference on Management of Data (2001)

  23. Chien, S.-Y., et al.: Efficient Structural Joins on Indexed XML Documents. In: 28th Int. Conference on Very Large Data Bases, VLDB (2002)

  24. XML Linking Language (XLink) Version 1.0, W3C Recommendation (June 27, 2001), http://www.W3.org/TR/xlink

  25. XML Pointer Language (XPointer), W3C Working Draft (August 16, 2002), http://www.w3.org/TR/xptr

  26. The Internet Movie Database, http://www.imdb.com

  27. The XML benchmark project, http://www.xml-benchmark.org

  28. Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web: from relations to semistructured data and XML. Morgan Kaufmann Publishers, Los Altos (1999)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sayed, A., Unland, R. (2005). HID: An Efficient Path Index for Complex XML Collections with Arbitrary Links. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2005. Lecture Notes in Computer Science, vol 3433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31970-2_6

  • DOI: https://doi.org/10.1007/978-3-540-31970-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25361-7

  • Online ISBN: 978-3-540-31970-2

  • eBook Packages: Computer Science, Computer Science (R0)
