Abstract
We present space-economical algorithms for finding maximal unique matches (MUMs) between two strings, which are important in large-scale genome sequence alignment. Our algorithms require only O(n) bits (O(n / log n) words), where n is the total length of the strings. We propose three algorithms for different inputs: the strings alone, their compressed suffix array, or their compressed suffix tree. The time complexities are O(n log n), O(n log^ε n), and O(n), respectively, where ε is any constant between 0 and 1. We also give an algorithm that constructs the compressed suffix tree from the compressed suffix array in O(n log^ε n) time and O(n) bits of space.
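To make the object being computed concrete, the following is a minimal sketch of a MUM finder. It assumes the usual definition: a substring that occurs exactly once in each string and cannot be extended to the left or right while still matching. It is not one of the space-economical algorithms of the paper. It uses a plain suffix array built by sorting, which takes far more than O(n) bits and more than O(n log n) time. The function name `naive_mums`, the `min_len` parameter, and the separator character are illustrative assumptions.

```python
def naive_mums(a, b, min_len=1):
    """Return sorted (pos_in_a, pos_in_b, length) triples, one per MUM.

    Illustrative only: uses an uncompressed suffix array built by
    sorting suffixes, so neither time nor space is economical.
    """
    sep = "\x00"  # assumption: this character occurs in neither input
    assert sep not in a and sep not in b
    s = a + sep + b
    n = len(s)

    # Plain suffix array: start positions sorted by the suffix text.
    sa = sorted(range(n), key=lambda i: s[i:])

    def lcp(i, j):
        # Length of the longest common prefix of s[i:] and s[j:].
        k = 0
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        return k

    # h[r] = lcp of the suffixes ranked r-1 and r; h[0] = h[n] = 0.
    h = [0] * (n + 1)
    for r in range(1, n):
        h[r] = lcp(sa[r - 1], sa[r])

    mums = []
    for r in range(1, n):
        length = h[r]
        # A strict local maximum of h means the shared prefix occurs
        # exactly twice in s and cannot be extended to the right.
        if length < min_len or length <= h[r - 1] or length <= h[r + 1]:
            continue
        pa, pb = sorted((sa[r - 1], sa[r]))
        # One occurrence must lie in a, the other in b.
        if not (pa < len(a) < pb):
            continue
        # Left-maximal: the preceding characters differ (or a starts here).
        if pa == 0 or s[pa - 1] != s[pb - 1]:
            mums.append((pa, pb - len(a) - 1, length))
    return sorted(mums)


print(naive_mums("abcxyz", "xyzabc"))
```

Here "abc" and "xyz" each occur once in both strings and cannot be extended, so both are reported. Shorter pieces such as "bc" are rejected because the characters to their left also match.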
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Hon, WK., Sadakane, K. (2002). Space-Economical Algorithms for Finding Maximal Unique Matches. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_13
DOI: https://doi.org/10.1007/3-540-45452-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43862-5
Online ISBN: 978-3-540-45452-6
eBook Packages: Springer Book Archive