Abstract
The longest common subsequence problem (LCS) and the closest substring problem (CSP) are two models for finding common patterns in strings, and both have been studied extensively. Although LCS and CSP are both NP-hard, they behave very differently with respect to polynomial-time approximation: LCS is hard to approximate within n^δ for some δ>0, whereas CSP admits a polynomial-time approximation scheme. In this paper, we study the longest common rigid subsequence problem (LCRS). This problem shares similarities with both LCS and CSP and has an important application in motif finding in biological sequences. We show that it is NP-hard to approximate LCRS within ratio n^δ, for some constant δ>0, where n is the maximum string length. We also show that it is NP-hard to approximate LCRS within ratio Ω(m), where m is the number of strings.
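The abstract does not spell out what makes a subsequence "rigid". As a rough illustration only, the sketch below assumes a fixed-gap reading: a common rigid subsequence is a pattern over the alphabet plus a wildcard '.', whose solid letters occur at the same relative positions in every input string, and its length is the number of solid letters. The function `lcrs_brute_force` is hypothetical and is not the paper's method. It tries one start offset per string and keeps the columns on which all strings agree, so it examines roughly n^m offset tuples. That exponential cost is in line with the hardness results above and limits the sketch to toy inputs.

```python
from itertools import product


def lcrs_brute_force(strings):
    """Exhaustively search for a longest common rigid subsequence (toy inputs only).

    Assumed definition (an illustration, not taken from the paper): a rigid
    pattern is a string over the alphabet plus the wildcard '.', and it is a
    common rigid subsequence if, for some start offset in each input string,
    every solid letter matches the letter at the same relative position in
    every string. Returns (number of solid letters, pattern).
    """
    if not strings:
        return 0, ""
    best_len, best_pattern = 0, ""
    # Try every combination of one start offset per string: about n**m tuples.
    for offsets in product(*(range(len(s)) for s in strings)):
        # Only columns that exist in every string can be compared.
        span = min(len(s) - o for s, o in zip(strings, offsets))
        columns = []
        for j in range(span):
            letters = {s[o + j] for s, o in zip(strings, offsets)}
            # A column is solid when all strings agree on it; otherwise a wildcard.
            columns.append(next(iter(letters)) if len(letters) == 1 else None)
        solid = [j for j, c in enumerate(columns) if c is not None]
        if len(solid) > best_len:
            # Trim leading/trailing wildcards so the pattern starts and ends solid.
            window = columns[solid[0]:solid[-1] + 1]
            best_pattern = "".join("." if c is None else c for c in window)
            best_len = len(solid)
    return best_len, best_pattern


if __name__ == "__main__":
    print(lcrs_brute_force(["abcd", "xaycd", "aqcd"]))  # (3, 'a.cd')
```

Here the three strings are aligned at offsets 0, 1 and 0. The first, third and fourth columns agree in all strings, giving the rigid pattern "a.cd" with three solid letters.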
References
Adebiyi, E.F., Kaufmann, M.: Extracting common motifs under the Levenshtein measure: theory and experimentation. In: Proceedings of the Workshop on Algorithms for Bioinformatics (WABI), pp. 140–156 (2002)
Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theor. Comput. Sci. 237, 123–134 (2000)
Alon, N., Feige, U., Wigderson, A., Zuckerman, D.: Derandomized graph products. Comput. Complex. 5(1), 60–75 (1995)
Alon, N., Spencer, J.: The Probabilistic Method. Wiley, New York (2000)
Arora, S., Lund, C.: Hardness of approximations. In: Hochbaum, D. (ed.) Approximation Algorithms for NP-hard Problems, pp. 399–446. PWS, Boston (1996)
Cheadle, C., Ivashchenko, Y., South, V., Searfoss, G., French, S., Howk, R., Ricca, G., Jaye, M.: Identification of a Src SH3 domain binding motif by screening a random phage display library. J. Biol. Chem. 269(39), 24034–24039 (1994)
Jiang, T., Li, M.: On the approximation of shortest common supersequence and longest common subsequences. SIAM J. Comput. 24(5), 1122–1139 (1995)
Keich, U., Pevzner, P.A.: Finding motifs in the twilight zone. In: Proceedings of the Sixth Annual International Conference on Computational Biology, pp. 195–204 (2002)
Lanctot, J.K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. Inf. Comput. 185(1), 41–55 (2003). Early version appeared in SODA’99
Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. In: Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing (STOC), Atlanta, May 1999, pp. 473–482 (1999)
Li, M., Ma, B., Wang, L.: Finding similar regions in many sequences. J. Comput. Syst. Sci. 65(1), 73–96 (2002). Early version appeared in STOC’99
Li, M., Ma, B., Wang, L.: On the closest string and substring problems. J. ACM 49(2), 157–171 (2002). Early versions appeared in STOC’99 and CPM’00
Ma, B.: A polynomial time approximation scheme for the closest substring problem. In: Giancarlo, R., Sankoff, D. (eds.) Combinatorial Pattern Matching, 11th Annual Symposium CPM, Montreal, Canada, June 21–23 2000. Lecture Notes in Computer Science, vol. 1848, pp. 99–107. Springer, Berlin (2000)
Ma, B., Zhang, K.: On the longest common rigid subsequence problem. In: Apostolico, A., Crochemore, M., Park, K. (eds.) Combinatorial Pattern Matching, 16th Annual Symposium CPM, Jeju Island, Korea, June 19–22 2005. Lecture Notes in Computer Science, vol. 3537, pp. 11–20. Springer, Berlin (2005)
Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25, 322–336 (1978)
Papadimitriou, C., Yannakakis, M.: Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43, 425–440 (1991)
Rajasekaran, S., Balla, S., Huang, C.: Exact algorithms for planted motif challenge problems. In: Proceedings of the 3rd Asia Pacific Bioinformatics Conference, pp. 249–259 (2005)
Rajasekaran, S., Balla, S., Huang, C., Thapar, V., Gryk, M., Maciejewski, M., Schiller, M.: Exact algorithms for motif search. In: Proceedings of the 3rd Asia Pacific Bioinformatics Conference, pp. 239–248 (2005)
Rigoutsos, I., Floratos, A.: Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics 14(1), 55–67 (1998)
Stormo, G., Hartzell, G.W.: Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl. Acad. Sci. USA 86(4), 1183–1187 (1989)
Waterman, M., Arratia, R., Galas, D.J.: Pattern recognition in several sequences: consensus and alignment. Bull. Math. Biol. 46(4), 515–527 (1984)
Additional information
Some results in this manuscript were previously presented at the CPM’05 conference [14].
About this article
Cite this article
Bansal, N., Lewenstein, M., Ma, B. et al. On the Longest Common Rigid Subsequence Problem. Algorithmica 56, 270–280 (2010). https://doi.org/10.1007/s00453-008-9175-1