Abstract
Suffix trees find several applications in computer science and telecommunications, most notably in algorithms on strings, data compression, and codes. We consider, in a probabilistic framework, a family of generalized suffix trees, called b-suffix trees, built from the first n suffixes of a random word. In this family, a noncompact suffix tree (i.e., one in which every edge is labeled by a single symbol) corresponds to b = 1, and a compact suffix tree (i.e., one without unary nodes) is asymptotically equivalent to b → ∞. Several parameters of b-suffix trees are of interest, namely the typical depth, the depth of insertion, the height, the external path length, and so forth. We establish some results concerning the typical, that is, almost sure (a.s.), behavior of these parameters. These findings are used to obtain several insights into certain algorithms on words and universal data compression schemes.
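As a rough illustration of the parameters named above, the following sketch computes leaf depths, the height, and the external path length for the noncompact case (b = 1) only. It is not the paper's construction: it assumes the usual trie convention that the leaf for a suffix sits one symbol past its longest common prefix with any other inserted suffix, and it appends a sentinel symbol `$` (an assumption made here) so that no suffix is a prefix of another. The names `lcp` and `suffix_trie_depths` are hypothetical helpers introduced for this example.

```python
import random


def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def suffix_trie_depths(word, n):
    """Leaf depths in the noncompact trie of the first n suffixes of word.

    Assumes n >= 2 and that word ends with a unique sentinel, so that
    no suffix is a prefix of another. depths[i] belongs to word[i:].
    """
    # In sorted order, the longest common prefix of a suffix with any
    # other suffix is attained at one of its two neighbours.
    suffixes = sorted((word[i:], i) for i in range(n))
    depths = [0] * n
    for k, (s, i) in enumerate(suffixes):
        best = 0
        if k > 0:
            best = max(best, lcp(s, suffixes[k - 1][0]))
        if k + 1 < n:
            best = max(best, lcp(s, suffixes[k + 1][0]))
        depths[i] = best + 1  # one symbol past the shared prefix
    return depths


# Small deterministic example: first 6 suffixes of "banana$".
depths = suffix_trie_depths("banana$", 6)
print(depths)       # [1, 4, 3, 4, 3, 2]
print(max(depths))  # height: 4
print(sum(depths))  # external path length: 17

# Random binary word (symbol distribution chosen here for illustration).
rng = random.Random(0)
n = 1000
word = "".join(rng.choice("01") for _ in range(n + 64)) + "$"
d = suffix_trie_depths(word, n)
print(max(d), sum(d) / n)  # height and average leaf depth
```

Sorting the suffixes keeps the sketch short at the cost of efficiency; a linear-time suffix tree construction would be the natural choice for large n.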
This research was supported in part by NSF Grants CCR-8900305 and INT-8912631, AFOSR Grant 90-0107, NATO Grant 0057/89, and Grant R01 LM05118 from the National Library of Medicine.
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szpankowski, W. (1992). Probabilistic analysis of generalized suffix trees. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_1
DOI: https://doi.org/10.1007/3-540-56024-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56024-1
Online ISBN: 978-3-540-47357-2
eBook Packages: Springer Book Archive