Indexing Semistructured Data Using PATRICIA Tree

  • Conference paper
Database and Expert Systems Applications (DEXA 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1873)

Abstract

Information on the Web, such as HTML documents with images, video, and sound, is a collection of heterogeneous data. HTML documents are semistructured in nature. Semistructured data have structures that are less rigid or regular than those of data found in standard database systems. This study presents a novel means of using the Patricia Tree [14] to index semistructured data. To use the index, a query is translated into a regular expression, which is then evaluated over the Patricia Tree. The main strength of this approach is that it supports queries on content and structure simultaneously, while also giving fast query times for long paths and regular expressions.
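The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not describe how the paper encodes data, so the key format below (label path, then `=`, then the value) is an assumption made for illustration. The trie is a character-level compressed trie standing in for a bit-level Patricia tree, and the regular expression is evaluated with Brzozowski derivatives, a swapped-in technique that prunes any subtree no match can reach. All function and class names are hypothetical.

```python
from dataclasses import dataclass, field

# Tiny regex AST as tuples; smart constructors keep "matches nothing" equal to EMPTY.
EMPTY, EPS, ANY = ("empty",), ("eps",), ("any",)

def cat(a, b):
    if a == EMPTY or b == EMPTY: return EMPTY
    if a == EPS: return b
    if b == EPS: return a
    return ("cat", a, b)

def alt(a, b):
    if a == EMPTY or a == b: return b
    if b == EMPTY: return a
    return ("alt", a, b)

def star(a):
    if a in (EMPTY, EPS): return EPS
    return a if a[0] == "star" else ("star", a)

def lit(s):
    """Regex matching exactly the string s."""
    r = EPS
    for c in reversed(s):
        r = cat(("chr", c), r)
    return r

def nullable(r):
    """True if r matches the empty string."""
    t = r[0]
    if t in ("eps", "star"): return True
    if t == "cat": return nullable(r[1]) and nullable(r[2])
    if t == "alt": return nullable(r[1]) or nullable(r[2])
    return False

def deriv(r, c):
    """Regex matching the rest of any string in r that starts with c."""
    t = r[0]
    if t == "chr": return EPS if r[1] == c else EMPTY
    if t == "any": return EPS
    if t == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if t == "star": return cat(deriv(r[1], c), r)
    if t == "cat":
        d = cat(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    return EMPTY  # "empty" and "eps" have no non-empty matches

@dataclass
class Node:
    children: dict = field(default_factory=dict)  # edge label -> Node
    terminal: bool = False

def insert(root, key):
    """Insert key into the compressed trie, splitting an edge where needed."""
    node = root
    while key:
        for label in list(node.children):
            n = 0  # length of the common prefix of label and key
            while n < min(len(label), len(key)) and label[n] == key[n]:
                n += 1
            if n == 0:
                continue
            child = node.children[label]
            if n < len(label):  # key diverges inside this edge: split it
                mid = Node({label[n:]: child})
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, key = child, key[n:]
            break
        else:  # no edge shares a first character: add a new leaf
            node.children[key] = Node(terminal=True)
            return
    node.terminal = True

def search(node, r, prefix=""):
    """Yield every stored key matched by r, skipping dead subtrees."""
    if node.terminal and nullable(r):
        yield prefix
    for label, child in node.children.items():
        d = r
        for c in label:
            d = deriv(d, c)
            if d == EMPTY:
                break  # nothing below this edge can match
        else:
            yield from search(child, d, prefix + label)

# Assumed encoding: "label.path=value", so one key carries structure and content.
root = Node()
for key in ["book.title=TAOCP", "book.author.name=Knuth",
            "book.editor.name=Knuth", "article.author.name=Morrison"]:
    insert(root, key)

# Any path that ends in a "name" label whose value is "Knuth".
query = cat(star(ANY), lit(".name=Knuth"))
results = sorted(search(root, query))
```

Because each key carries both the label path and the value, a single regular expression constrains structure and content at once. A query with a fixed prefix, such as `cat(lit("book."), star(ANY))`, discards the whole `article` subtree after inspecting one character.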

References

  1. S. Abiteboul. Querying Semistructured Data. In Proceedings of the International Conference on Database Theory, pages 1–18, Delphi, Greece, January (1997).

  2. S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68–88, April (1997).

  3. R. Baeza-Yates and G. Gonnet. Fast Text Searching for Regular Expressions or Automaton Simulation over Tries. Journal of the ACM, 43(6):915–936, November (1996).

  4. R. Baeza-Yates and G. Gonnet. A Faster Algorithm for Approximate String Matching. In Combinatorial Pattern Matching (CPM’96), Irvine, CA, LNCS 1075, pages 1–23, June (1996).

  5. M. Fernandez and D. Suciu. Optimizing regular path expressions using graph schemas. In Proceedings of the Fourteenth International Conference on Data Engineering, Orlando, Florida, February (1998).

  6. G. Navarro. A language for queries on structure and contents of textual databases. Master’s thesis, Dept. of Computer Science, Univ. of Chile, April (1995).

  7. G.H. Gonnet, R.A. Baeza-Yates, and T. Snider. New Indices for Text: PAT Trees and PAT Arrays. In Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, New Jersey (1992).

  8. G.H. Gonnet, R.A. Baeza-Yates, and T. Snider. Examples of PAT applied to the Oxford English Dictionary. Technical Report OED-87-02, UW Centre for the New OED and Text Research, Univ. of Waterloo, (1987).

  9. R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 436–445, Athens, Greece, August (1997).

  10. D. Knuth. The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, Mass., (1973).

  11. J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman. Indexing Semistructured Data. Technical report, Stanford University Database Group, (1998). http://www-db.stanford.edu/pub/papers/semiindexing98.ps.

  12. J. McHugh and J. Widom. Query optimization in semistructured data. Technical report, Stanford University Database Group, (1997). http://www-db.stanford.edu/pub/papers/qo.ps.

  13. J. McHugh and J. Widom. Optimizing Branching Path Expressions. Technical report, Stanford University Database Group, (1999). http://www-db.stanford.edu/pub/papers/mp.ps.

  14. D. Morrison. PATRICIA: Practical Algorithm To Retrieve Information Coded in Alphanumeric. Journal of the ACM, 15:514–534, (1968).

  15. Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Proceedings of the 11th International Conference on Data Engineering, pages 251–260, Taipei, Taiwan (1995).

  16. D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom. Querying semistructured heterogeneous information. In Proceedings of the 4th International Conference on Deductive and Object-Oriented Databases (DOOD), Singapore, December (1995).

  17. M. Shishibori, M. Okuno, K. Ando, and J. Aoe. An Efficient Compression Method for Patricia Tries. In IEEE International Conference on Systems, Man, and Cybernetics, (1997).

  18. M. Sipser. Introduction to the Theory of Computation. PWS Publishing Company, (1997).

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, LC., Horng, JT., Liu, BJ., Wang, CY., Chen, GD. (2000). Indexing Semistructured Data Using PATRICIA Tree. In: Ibrahim, M., Küng, J., Revell, N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44469-6_80

  • DOI: https://doi.org/10.1007/3-540-44469-6_80

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67978-3

  • Online ISBN: 978-3-540-44469-5

  • eBook Packages: Springer Book Archive