A Fast Longest Common Subsequence Algorithm for Similar Strings

Arslan, Abdullah N.

doi:10.1007/978-3-642-13089-2_7

Abdullah N. Arslan¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6031))

Included in the following conference series:

International Conference on Language and Automata Theory and Applications

1090 Accesses
2 Citations

Abstract

The longest common subsequence problem is a very important computational problem for which there are many algorithms. We present a new algorithm for this problem. Let X and Y be any two given strings each of length O(n). We observe that a longest common subsequence can be obtained by using longest common prefixes of suffixes (longest common extensions) of X and Y. The longest common extension problem asks for the longest common prefix of suffixes starting in a given pair of positions in X and Y, respectively. Let e be the number of edit operations, insert, delete, and substitute to change X to Y (i.e. let e be the edit distance between X and Y). Our algorithm visits $O(\min\{en,(1+\sqrt{2})^{2e+1})$ nodes in the edit graph, and for every visited node, performs one longest common extension query. Each of these queries can be answered in constant time if we represent the strings by a suffix tree or a suffix array. These data structures can be created in linear time. We do not assume that the edit distance e is known beforehand, therefore we try values for e starting with e = 1 (without loss of generality X ≠ Y) and double e until our algorithm finds a longest common subsequence. The total time complexity of our algorithm is $O(\min\{en\log{n},n+e(1+\sqrt{2})^{2e+1}\})$. This is a better time complexity result compared to those of existing solutions for the problem when e is small. For example, when $e\leq \frac{1}{3}((\log_{(1+\sqrt{2})}~{n})-1)$ our algorithm finds an optimal solution in time O(n).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

An Efficient Algorithm for Enumerating Longest Common Increasing Subsequences

A Polynomial Time Algorithm for a Generalized Longest Common Subsequence Problem

Exact Algorithms for the Bounded Repetition Longest Common Subsequence Problem

References

Apostolico, A., Guerra, C.: The longest common subsequence problem revisited. Algorithmica (2), 315–336 (1987)
Article MATH MathSciNet Google Scholar
Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)
Chapter Google Scholar
Bergroth, L., Hakonen, H., Ratia, T.: A survey of longest common subsequence algorithms. In: SPIRE, pp. 39–48 (2000)
Google Scholar
Fischer, J., Heun, V.: Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 36–48. Springer, Heidelberg (2006)
Chapter Google Scholar
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
MATH Google Scholar
Ilie, L., Tinta, L.: Practical algorithms for the longest common extension problem. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 302–309. Springer, Heidelberg (2009)
Google Scholar
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest common prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Chapter Google Scholar
Kuo, S., Cross, G.R.: An algorithm to find the length of the longest common subsequence of two strings. ACM SIGIR Forum 23(3-4), 89–99 (1989)
Article Google Scholar
Masek, W.J., Paterson, M.S.: A faster algorithm for computing string-edit distances. Journal of Computer and System Sciences 20(1), 18–31 (1980)
Article MATH MathSciNet Google Scholar
Miller, W., Myers, E.W.: A file comparison program. Softw. Pract. Exp. 15(11), 1025–1040 (1985)
Article Google Scholar
Nakatsu, N., Kambayashi, Y., Yajima, S.: A longest common subsequence algorithm suitable for similar texts. Acta Informatica 18, 171–179 (1982)
Article MATH MathSciNet Google Scholar
Ukkonen, E.: Algorithms for approximate string matching. Information and Control 64, 100–118 (1985)
Article MATH MathSciNet Google Scholar
Wagner, R.A., Fisher, M.J.: The string-to-string correction problem. Journal of the ACM 21(1), 168–173 (1975)
Article Google Scholar
Wu, S., Manber, U., Myers, G., Miller, W.: An O(NP) sequence comparison algorithm. Inf. Proc. Lett. 35, 317–323 (1990)
Article MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Information Systems, Texas A & M University - Commerce, TX, 75428, USA
Abdullah N. Arslan

Authors

Abdullah N. Arslan
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Avinguda Catalunya, 35, 43002, Tarragona, Spain
Adrian-Horia Dediu & Carlos Martín-Vide &
Fachbereich IV - Informatik, Universität Trier, Campus II, Behringstraße, 54286, Trier, Germany
Henning Fernau

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Arslan, A.N. (2010). A Fast Longest Common Subsequence Algorithm for Similar Strings. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_7

Download citation

DOI: https://doi.org/10.1007/978-3-642-13089-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13088-5
Online ISBN: 978-3-642-13089-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics