Abstract
Given two rooted, ordered, and labeled trees P and T, the tree inclusion problem is to determine whether P can be obtained from T by deleting nodes in T. This problem has recently been recognized as an important query primitive in XML databases. Kilpeläinen and Mannila (SIAM J. of Comp. 1995) presented the first polynomial-time algorithm, which uses quadratic time and space. Since then, several improved results have been obtained for special cases in which P and T have a small number of leaves or small depth. In the worst case, however, these algorithms still use quadratic time and space. In this paper we present a new approach to the problem that leads to an algorithm using optimal linear space and subquadratic running time. Our algorithm improves all previous time and space bounds. Most importantly, the space is improved by a linear factor. This makes it possible to query larger XML databases and speeds up queries, since more of the computation can be kept in main memory.
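To make the problem statement concrete, the following is a minimal sketch of a naive inclusion test. It is not the algorithm of this paper and makes no attempt at the space or time bounds claimed above; it only illustrates the definition. The tree representation and the names `tree`, `forest_included`, and `tree_included` are assumptions introduced for this illustration. Deleting a node is taken to mean that its children take its place, in order, among its siblings.

```python
from functools import lru_cache

# Hypothetical representation: a tree is a (label, children) pair,
# where children is a tuple of trees. A forest is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_included(fp, ft):
    """Return True if forest fp can be obtained from forest ft
    by deleting nodes of ft (naive memoized recursion)."""
    if not fp:
        return True          # delete everything that remains in ft
    if not ft:
        return False         # pattern nodes left, nothing to map them to
    v_label, v_children = ft[-1]   # root of the rightmost tree of ft
    w_label, w_children = fp[-1]   # root of the rightmost tree of fp
    # Option 1: delete v; its children take its place in the forest.
    if forest_included(fp, ft[:-1] + v_children):
        return True
    # Option 2: map w to v; labels must agree, and the children and the
    # remaining left parts of both forests must be included recursively.
    return (w_label == v_label
            and forest_included(w_children, v_children)
            and forest_included(fp[:-1], ft[:-1]))

def tree_included(p, t):
    """Return True if tree p can be obtained from tree t by deleting nodes."""
    return forest_included((p,), (t,))

# Example: T = a(b(c), d(e, f))
T = tree("a", tree("b", tree("c")), tree("d", tree("e"), tree("f")))
print(tree_included(tree("a", tree("c"), tree("f")), T))  # True: delete b, d, e
print(tree_included(tree("a", tree("f"), tree("c")), T))  # False: order is violated
```

The second query fails because the trees are ordered: c precedes f in T, and deletions cannot change the left-to-right order.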
References
Alonso, L., Schott, R.: On the tree inclusion problem. In: Proc. of Math. Foundations of Computer Science, pp. 211–221 (1993)
Alstrup, S., Holm, J., de Lichtenberg, K., Thorup, M.: Minimizing diameters of dynamic trees. In: Proc. of Intl. Coll. on Automata, Languages and Programming (ICALP), pp. 270–280 (1997)
Bille, P., Gørtz, I.: The tree inclusion problem: In optimal space and faster. Technical Report TR-2005-54, IT University of Copenhagen (January 2005)
Chen, W.: More efficient algorithm for ordered tree inclusion. J. Algorithms 26, 370–385 (1998)
Dietz, P.F.: Fully persistent arrays. In: Proc. of Workshop on Algorithms and Data Structures (WADS), pp. 67–74 (1989)
Hagerup, T., Miltersen, P.B., Pagh, R.: Deterministic dictionaries. J. Algorithms 41(1), 69–85 (2001)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
Kilpeläinen, P.: Tree Matching Problems with Applications to Structured Text Databases. PhD thesis, University of Helsinki, Department of Computer Science (1992)
Kilpeläinen, P., Mannila, H.: Retrieval from hierarchical texts by partial patterns. In: Proc. of Conf. on Research and Development in Information Retrieval, pp. 214–222 (1993)
Kilpeläinen, P., Mannila, H.: Ordered and unordered tree inclusion. SIAM J. Comp. 24, 340–356 (1995)
Knuth, D.E.: The Art of Computer Programming, vol. 1. Addison-Wesley, Reading (1969)
Mannila, H., Räihä, K.J.: On query languages for the p-string data model. Information Modelling and Knowledge Bases, 469–482 (1990)
Muthukrishnan, S., Müller, M.: Time and space efficient method-lookup for object-oriented programs. In: Proc. of Symp. on Discrete Algorithms, pp. 42–51 (1996)
Richter, T.: A new algorithm for the ordered tree inclusion problem. In: Proc. of Symp. on Combinatorial Pattern Matching (CPM), pp. 150–166 (1997)
Schlieder, T., Meuss, H.: Querying and ranking XML documents. J. Am. Soc. Inf. Sci. Technol. 53(6), 489–503 (2002)
Schlieder, T., Naumann, F.: Approximate tree embedding for querying XML data. In: Proc. of Workshop On XML and Information Retrieval (2000)
Yang, L.H., Lee, M.L., Hsu, W.: Finding hot query patterns over an XQuery stream. The VLDB Journal 13(4), 318–332 (2004)
Yang, L.H., Lee, M.L., Hsu, W.: Efficient mining of XML query patterns for caching. In: Proc. of Conference on Very Large Databases (VLDB), pp. 69–80 (2003)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bille, P., Li Gørtz, I. (2005). The Tree Inclusion Problem: In Optimal Space and Faster. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds) Automata, Languages and Programming. ICALP 2005. Lecture Notes in Computer Science, vol 3580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11523468_6
DOI: https://doi.org/10.1007/11523468_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27580-0
Online ISBN: 978-3-540-31691-6
eBook Packages: Computer Science, Computer Science (R0)