Abstract
In structured text databases, documents are represented as parse trees, and different notions of tree matching can be used as primitives for query languages. Two useful notions of tree matching, tree inclusion and tree pattern matching, both seem to require superlinear time. In this paper we give a general sufficient condition for a tree matching problem to be solvable in linear time, and apply it to tree pattern matching and tree inclusion. The application is based on the notion of a nonperiodic parse tree. We argue that most text documents can be modeled in a natural way using grammars that yield nonperiodic parse trees. We show how the knowledge that the target tree is nonperiodic can be used to obtain linear-time algorithms for the tree matching problems. We also discuss the preprocessing of patterns for grammatical tree matching.
Work supported by the Academy of Finland
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kilpeläinen, P., Mannila, H. (1992). Grammatical tree matching. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_13
DOI: https://doi.org/10.1007/3-540-56024-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56024-1
Online ISBN: 978-3-540-47357-2
eBook Packages: Springer Book Archive