Locating Longest Common Subsequences with Limited Penalty

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10178)

Abstract

Locating longest common subsequences is a typical and important problem. The original version of the problem stretches a longer alignment between a query and a database sequence and finds all alignments corresponding to the maximal length of common subsequences. However, the original version produces a large number of results, some of which are meaningless in practical applications and give rise to considerable time overhead. In this paper, we first define longest common subsequences with limited penalty, which computes the longest common subsequences whose penalty values are not larger than a threshold \(\tau\). This helps us find answers with good locality. We focus on the efficiency of this problem. We propose a basic approach for finding longest common subsequences with limited penalty. We further analyze the features of longest common subsequences with limited penalty and, based on them, propose a filter-refine approach to reduce the number of candidates. We also adopt a suffix array to efficiently generate common substrings, which helps in solving the problem. Experimental results on three real data sets show the effectiveness and efficiency of our algorithms.
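As a rough illustration of how a suffix array can expose common substrings between a query and a database sequence, the sketch below finds one longest common substring. It is not the procedure used in the paper: the function name `longest_common_substring`, the separator character, and the naive sort-based construction are assumptions made for brevity, whereas a practical system would use a linear-time suffix array construction.

```python
def longest_common_substring(query: str, text: str, sep: str = "\x00") -> str:
    """Return one longest common substring of query and text.

    Illustrative only: the suffix array is built by naive sorting, and
    `sep` is assumed to be a character that occurs in neither input.
    """
    if sep in query or sep in text:
        raise ValueError("separator must not occur in either input")

    joined = query + sep + text
    boundary = len(query)  # position of the separator in `joined`

    # Naive suffix array: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(len(joined)), key=lambda i: joined[i:])

    def common_prefix_length(i: int, j: int) -> int:
        """Length of the common prefix of the suffixes starting at i and j."""
        n = 0
        while (i + n < len(joined) and j + n < len(joined)
               and joined[i + n] == joined[j + n]):
            n += 1
        return n

    best_len, best_start = 0, 0
    for prev, cur in zip(suffix_array, suffix_array[1:]):
        # Only a pair with one suffix from each input yields a common substring.
        if (prev < boundary) == (cur < boundary):
            continue
        length = common_prefix_length(prev, cur)
        if length > best_len:
            best_len, best_start = length, cur
    return joined[best_start:best_start + best_len]


print(longest_common_substring("banana", "ananas"))  # anana
```

Because the separator occurs exactly once, a suffix that starts in the query and a suffix that starts in the text can never share a prefix that crosses the separator, so every reported substring lies entirely inside both inputs.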

This work is partially supported by the NSF of China for Outstanding Young Scholars under grant No. 61322208, the NSF of China under grant Nos. 61272178 and 61572122.

Notes

  1. Note that the definition of the penalty score of an LCS differs from edit distance even when \(\alpha =\beta =1\). The edit distance between two strings is the minimal number of edit operations needed to transform one string into the other, which does not guarantee finding an alignment with longest common subsequences as LCS does.
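To make the contrast in this note concrete, the following minimal sketch computes the classical LCS length and the Levenshtein edit distance with their textbook dynamic programs. The paper's penalty score (with weights \(\alpha\) and \(\beta\)) is not defined in this excerpt, so it is not implemented here; the function names `lcs_length` and `levenshtein` are illustrative choices.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)        # extend a match
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a character
        prev = cur
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            cur.append(min(prev[j] + 1,          # delete ch_a
                           cur[j - 1] + 1,       # insert ch_b
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


a, b = "kitten", "sitting"
lcs = lcs_length(a, b)
print(lcs)                          # 4
print(levenshtein(a, b))            # 3
print(len(a) + len(b) - 2 * lcs)    # 5 unmatched characters around the LCS
```

The edit distance minimizes the number of operations, and substitutions let it use fewer operations than the count of characters left unmatched by an LCS, so the two objectives are optimized by different recurrences and need not select the same alignment.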

Author information

Corresponding author

Correspondence to Bin Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, B., Yang, X., Li, J. (2017). Locating Longest Common Subsequences with Limited Penalty. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_12

  • DOI: https://doi.org/10.1007/978-3-319-55699-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55698-7

  • Online ISBN: 978-3-319-55699-4

  • eBook Packages: Computer Science (R0)
