Abstract
Information on the Web, such as HTML documents containing images, video, and sound, forms a collection of heterogeneous data. HTML documents are semistructured in nature: semistructured data describes structures that are less rigid or regular than the data found in standard database systems. This study presents a novel way of using the Patricia tree [14] to index semistructured data. A query is answered by translating it into a regular expression and evaluating that regular expression over the Patricia tree. The highlight of this approach is that it supports queries on content and structure simultaneously, while still answering queries on long paths and regular expressions quickly.
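The abstract does not describe the index layout in detail, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (hypothetically) that each object is flattened into `/path=value` strings so that one key carries both structure and content. It uses a character-level radix trie as a stand-in for the bit-level PATRICIA tree of Morrison [14]. Finally, instead of simulating an automaton over the trie as Baeza-Yates and Gonnet do, it uses only the regular expression's literal prefix to select a subtree and then tests each full key with Python's `re`. All names (`RadixTrie`, `literal_prefix`, `search`) are hypothetical.

```python
import re
from typing import Dict, Iterator, Optional, Set, Tuple

META = set(".^$*+?{}[]\\|()")  # characters that end the literal prefix of a regex

def literal_prefix(pattern: str) -> str:
    """Conservative literal prefix that every full match of `pattern` must start with."""
    if "|" in pattern:
        return ""  # alternation may start with anything; fall back to a full scan
    out = []
    for ch in pattern:
        if ch in META:
            if ch in "*?{" and out:
                out.pop()  # the preceding character may occur zero times
            break
        out.append(ch)
    return "".join(out)

class Node:
    """Radix-trie node: outgoing edges are labeled with non-empty strings."""
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}  # edge label -> child node
        self.oids: Set[int] = set()            # ids of objects whose key ends here

class RadixTrie:
    def __init__(self) -> None:
        self.root = Node()

    def _edge(self, node: Node, ch: str) -> Optional[str]:
        # Sibling edges start with distinct characters, so at most one can match.
        return next((lab for lab in node.children if lab[0] == ch), None)

    def insert(self, key: str, oid: int) -> None:
        node = self.root
        while key:
            label = self._edge(node, key[0])
            if label is None:  # no shared prefix: hang the rest of the key off this node
                leaf = Node()
                node.children[key] = leaf
                node = leaf
                break
            n = 0
            while n < min(len(label), len(key)) and label[n] == key[n]:
                n += 1
            child = node.children[label]
            if n < len(label):  # key diverges inside the edge: split it at position n
                mid = Node()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, key = child, key[n:]
        node.oids.add(oid)

    def _locate(self, prefix: str) -> Tuple[Optional[Node], str]:
        """Return the subtree holding every key that starts with `prefix`, plus its path."""
        node, path = self.root, ""
        while prefix:
            label = self._edge(node, prefix[0])
            if label is None:
                return None, ""
            if label.startswith(prefix):  # prefix ends inside (or at the end of) this edge
                return node.children[label], path + label
            if not prefix.startswith(label):  # mismatch in the middle of the edge
                return None, ""
            node, path, prefix = node.children[label], path + label, prefix[len(label):]
        return node, path

    def _walk(self, node: Node, path: str) -> Iterator[Tuple[str, Set[int]]]:
        if node.oids:
            yield path, node.oids
        for label, child in node.children.items():
            yield from self._walk(child, path + label)

    def search(self, pattern: str) -> Set[int]:
        """Ids of objects with at least one key that fully matches `pattern`."""
        start, path = self._locate(literal_prefix(pattern))
        if start is None:
            return set()
        rx = re.compile(pattern)
        hits: Set[int] = set()
        for key, oids in self._walk(start, path):
            if rx.fullmatch(key):
                hits |= oids
        return hits

# Hypothetical objects flattened into "/path=value" keys (structure plus content).
docs = {
    1: ["/html/head/title=Patricia trees", "/html/body/p=Indexing semistructured data"],
    2: ["/html/head/title=Lorel", "/html/body/p=Querying semistructured data"],
}
trie = RadixTrie()
for oid, keys in docs.items():
    for k in keys:
        trie.insert(k, oid)
print(sorted(trie.search(r"/html/head/title=.*Patricia.*")))  # [1]
print(sorted(trie.search(r"/html/.*/title=L.*")))             # [2]
```

The second query mixes a wildcard over structure (`/html/.*/`) with a condition on content (`L.*`), which is the kind of combined query the abstract highlights. The literal-prefix step only narrows the search to one subtree. A closer match to the automaton-over-trie technique would advance a regex automaton edge by edge and prune any branch on which the automaton can no longer accept.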
References
S. Abiteboul. Querying semistructured data. In Proceedings of the International Conference on Database Theory, pages 1–18, Delphi, Greece, January (1997).
S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68–88, April (1997).
R. Baeza-Yates and G. Gonnet. Fast text searching for regular expressions or automaton simulation over tries. Journal of the ACM, 43(6):915–936, November (1996).
R. Baeza-Yates and G. Navarro. A faster algorithm for approximate string matching. In Combinatorial Pattern Matching (CPM’96), LNCS 1075, pages 1–23, Irvine, CA, June (1996).
M. Fernandez and D. Suciu. Optimizing regular path expressions using graph schemas. In Proceedings of the Fourteenth International Conference on Data Engineering, Orlando, Florida, February (1998).
G. Navarro. A language for queries on structure and contents of textual databases. Master’s thesis, Dept. of Computer Science, Univ. of Chile, April (1995).
G.H. Gonnet, R.A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, New Jersey, (1992).
G.H. Gonnet, R.A. Baeza-Yates, and T. Snider. Examples of PAT applied to the Oxford English Dictionary. Technical Report OED-87-02, UW Centre for the New OED and Text Research, Univ. of Waterloo, (1987).
R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 436–445, Athens, Greece, August (1997).
D. Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, Reading, Mass., (1973).
J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman. Indexing semistructured data. Technical report, Stanford University Database Group, (1998). http://www-db.stanford.edu/pub/papers/semiindexing98.ps.
J. McHugh and J. Widom. Query optimization in semistructured data. Technical report, Stanford University Database Group, (1997). http://www-db.stanford.edu/pub/papers/qo.ps.
J. McHugh and J. Widom. Optimizing branching path expressions. Technical report, Stanford University Database Group, (1999). http://www-db.stanford.edu/pub/papers/mp.ps.
D. Morrison. PATRICIA: Practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM, 15:514–534, (1968).
Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Proceedings of the 11th International Conference on Data Engineering, pages 251–260, Taipei, Taiwan (1995).
D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom. Querying semistructured heterogeneous information. In Proceedings of the 4th International Conference on Deductive and Object-Oriented Databases (DOOD), Singapore, December (1995).
M. Shishibori, M. Okuno, K. Ando, and J. Aoe. An efficient compression method for Patricia tries. In IEEE International Conference on Systems, Man, and Cybernetics, (1997).
M. Sipser. Introduction to the Theory of Computation. PWS Publishing Company, (1997).
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, LC., Horng, JT., Liu, BJ., Wang, CY., Chen, GD. (2000). Indexing Semistructured Data Using PATRICIA Tree. In: Ibrahim, M., Küng, J., Revell, N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44469-6_80
DOI: https://doi.org/10.1007/3-540-44469-6_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67978-3
Online ISBN: 978-3-540-44469-5
eBook Packages: Springer Book Archive