Abstract
The Color Set Size problem is as follows: given a rooted tree of size n with l leaves colored from 1 to m, m ≤ l, find for each vertex u the number of different leaf colors in the subtree rooted at u. This problem formulation, together with the Generalized Suffix Tree data structure, has applications to string matching. This paper gives an optimal sequential solution to the Color Set Size problem and describes string matching applications, including a linear-time algorithm for finding the longest substring common to at least k out of m input strings, for all k between 1 and m. In addition, parallel solutions to the above problems are given. These solutions may shed light on problems in computational biology, such as the multiple string alignment problem.
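To make the problem statement concrete, the following minimal Python sketch computes the color set size of every vertex of a small tree. It is a simple baseline that merges sets of colors bottom-up (always merging smaller sets into the largest one); it is not the optimal algorithm the paper refers to, which the abstract does not describe. The function name `color_set_sizes`, the `children` adjacency map, and the `leaf_color` map are illustrative assumptions, not names from the paper.

```python
from typing import Dict, Hashable, List, Set


def color_set_sizes(children: Dict[int, List[int]],
                    leaf_color: Dict[int, Hashable],
                    root: int) -> Dict[int, int]:
    """Return, for every vertex u, the number of distinct leaf colors
    in the subtree rooted at u (baseline, not the paper's method)."""
    # Iterative DFS: every vertex is recorded before its descendants,
    # so the reversed order visits children before their parents.
    order: List[int] = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))

    colors: Dict[int, Set[Hashable]] = {}  # vertex -> colors in its subtree
    sizes: Dict[int, int] = {}
    for u in reversed(order):
        kids = children.get(u, [])
        if not kids:
            # A leaf contributes exactly its own color.
            merged = {leaf_color[u]}
        else:
            # Reuse the largest child set and pour the smaller ones into it.
            merged = max((colors[c] for c in kids), key=len)
            for c in kids:
                if colors[c] is not merged:
                    merged |= colors[c]
            for c in kids:
                del colors[c]  # child sets are no longer needed
        colors[u] = merged
        sizes[u] = len(merged)
    return sizes


# Hypothetical example tree: vertex 0 is the root, vertices 3..6 are leaves.
example_children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
example_leaf_color = {3: 1, 4: 2, 5: 1, 6: 1}
example_sizes = color_set_sizes(example_children, example_leaf_color, 0)
# Vertex 1 sees colors {1, 2}, vertex 2 sees only {1}, the root sees {1, 2}.
```

In the string matching setting named in the abstract, the leaves would be suffixes in a generalized suffix tree and each color would identify the input string a suffix comes from, so the color set size of a vertex counts how many input strings contain the substring spelled out on the path to that vertex.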
This work was partially supported by NSF Grant CCR 87-22848 and by Department of Energy Grants DE-AC03-76SF00098 and DE-FG03-90ER60999.
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chi, L., Hui, K. (1992). Color Set Size problem with applications to string matching. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_19
DOI: https://doi.org/10.1007/3-540-56024-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56024-1
Online ISBN: 978-3-540-47357-2
eBook Packages: Springer Book Archive