Abstract
This paper studies, in a probabilistic framework, how words (strings) can overlap and how this overlap relates to the height of the digital trees built from a set of words. A word is defined as a random (possibly infinite) sequence of symbols over a finite alphabet. The key notion is the alignment matrix $\{C_{ij}\}_{i,j=1}^{n}$, where $C_{ij}$ is the length of the longest string that is a prefix of both the i-th and the j-th word. It is proved that the height of an associated digital tree is related in a simple way to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of a digital trie under the independent model (i.e., all words are statistically independent) is asymptotically equal to $2\log_\alpha n$, where n is the number of words stored in the trie and α is a parameter of the probabilistic model. Some extensions of the basic model to other digital trees, such as b-tries, tries with a random number of keys (Poisson model), and suffix trees (dependent keys!), are also briefly discussed.
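The alignment matrix described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper. It assumes finite, distinct words, none of which is a prefix of another, and it uses the common convention that the depth of a key in a trie is one more than its longest common prefix with any other key, so that the height is one plus the largest off-diagonal entry of the matrix. The function names are hypothetical, and the diagonal is left at 0 as a placeholder, since the abstract only uses $C_{ij}$ for pairs of words.

```python
from itertools import combinations


def lcp_length(u, v):
    """Return the length of the longest common prefix of u and v."""
    k = 0
    for a, b in zip(u, v):
        if a != b:
            break
        k += 1
    return k


def alignment_matrix(words):
    """Build the symmetric matrix C with C[i][j] = lcp_length(words[i], words[j]).

    Assumption: the diagonal is left at 0 as a placeholder.
    """
    n = len(words)
    C = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        C[i][j] = C[j][i] = lcp_length(words[i], words[j])
    return C


def trie_height(words):
    """Height of the trie under the stated convention (an assumption):
    one plus the maximum off-diagonal entry of the alignment matrix."""
    n = len(words)
    if n < 2:
        return 0
    C = alignment_matrix(words)
    return 1 + max(C[i][j] for i, j in combinations(range(n), 2))


words = ["0010", "0011", "0100", "1000"]
C = alignment_matrix(words)
height = trie_height(words)
print(C[0][1], height)
```

In this example the first two words share the prefix `001`, so `C[0][1]` is 3 and the two keys are only separated at depth 4, which is the height returned.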
This research was supported in part by NSF grants NCR-8846388 and CCR-8900305.
© 1989 Springer-Verlag Berlin Heidelberg
Szpankowski, W. (1989). Digital data structures and order statistics. In: Dehne, F., Sack, J.R., Santoro, N. (eds) Algorithms and Data Structures. WADS 1989. Lecture Notes in Computer Science, vol 382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-51542-9_18
DOI: https://doi.org/10.1007/3-540-51542-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-51542-5
Online ISBN: 978-3-540-48237-6