Longest Common Prefixes with k-Mismatches and Applications

  • Conference paper
SOFSEM 2018: Theory and Practice of Computer Science (SOFSEM 2018)

Abstract

We propose a new algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size \(\sigma\) that occurs elsewhere in the string with Hamming distance at most k. Specifically, we show that the proposed algorithm requires time \(\mathcal{O}(n (\sigma R)^k \log \log n (\log k + \log \log n))\) on average, where \(R = \lceil (k+2)(\log_{\sigma} n + 1) \rceil\), and space \(\mathcal{O}(n)\). This improves upon the state-of-the-art average-case time complexity for the case when \(k=1\) [23] by a factor of \(\log n / \log^3 \log n\). In addition, we show how the proposed technique can be adapted and applied in order to compute the longest previous factors under the Hamming distance model within the same complexities. In terms of real-world applications, we show that our technique can be directly applied to the problem of genome mappability.
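
To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of the quantity being computed: for each suffix, the length of its longest prefix that occurs at some other position with at most k mismatches. This is not the paper's algorithm; it is a naive baseline with roughly cubic worst-case time, and the function name `lcp_k_naive` and the convention that an occurrence must fit entirely inside the string are assumptions made for illustration.

```python
def lcp_k_naive(s: str, k: int) -> list[int]:
    """Brute-force longest common prefix with at most k mismatches.

    result[i] is the length of the longest prefix of s[i:] that also
    occurs starting at some position j != i, allowing up to k
    mismatching characters (Hamming distance at most k).
    Hypothetical helper for illustration only, not the paper's method.
    """
    n = len(s)
    result = [0] * n
    for i in range(n):
        best = 0
        for j in range(n):
            if j == i:
                continue
            mismatches = 0
            length = 0
            # Extend the comparison while both windows stay inside s.
            while i + length < n and j + length < n:
                if s[i + length] != s[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            best = max(best, length)
        result[i] = best
    return result


if __name__ == "__main__":
    print(lcp_k_naive("banana", 0))
    print(lcp_k_naive("banana", 1))
```

With k = 0 this reduces to the exact longest repeated prefix of each suffix; raising k can only keep each entry the same or increase it. A baseline like this is mainly useful for checking a faster implementation on small inputs.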

P. Charalampopoulos—Supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London and by the A.G. Leventis Foundation.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discret. Algorithms 2(1), 53–86 (2004)

  2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

  3. Alzamel, M., Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P., Radoszewski, J., Sung, W.-K.: Faster algorithms for 1-mappability of a sequence. In: COCOA. LNCS, vol. 10628, pp. 109–121. Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-71147-8_8

  4. Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Trans. Algorithms 3(2), 19 (2007)

  5. Antoniou, P., Daykin, J.W., Iliopoulos, C.S., Kourie, D., Mouchard, L., Pissis, S.P.: Mapping uniquely occurring short sequences derived from high throughput technologies to a reference genome. In: ITAB, pp. 1–4. IEEE Computer Society (2009)

  6. Barthet, M., Plumbley, M.D., Kachkaev, A., Dykes, J., Wolff, D., Weyde, T.: Big chord data extraction and mining. In: CIM (2014)

  7. Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000). https://doi.org/10.1007/10719839_9

  8. Bufe, C.: Understandable Guide to Music Theory: The Most Useful Aspects of Theory for Rock, Jazz, and Blues Musicians. See Sharp Press, Tucson (1994)

  9. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC 2004, pp. 91–100. ACM (2004)

  10. Crochemore, M., Ilie, L., Iliopoulos, C.S., Kubica, M., Rytter, W., Waleń, T.: Computing the longest previous factor. Eur. J. Comb. 34(1), 15–26 (2013)

  11. Crochemore, M., Ilie, L., Smyth, W.F.: A simple algorithm for computing the Lempel Ziv factorization. In: DCC, pp. 482–488. IEEE Computer Society (2008)

  12. Derrien, T., Estellé, J., Sola, S.M., Knowles, D., Raineri, E., Guigó, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012)

  13. Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22300-6_32

  14. Fischer, J., Köppl, D., Kurpicz, F.: On the benefit of merging suffix array intervals for parallel pattern matching. In: CPM 2016. LIPIcs, vol. 54, pp. 26:1–26:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)

  15. Fonseca, N.A., Rung, J., Brazma, A., Marioni, J.C.: Tools for mapping high-throughput sequencing data. Bioinformatics 28(24), 3169–3177 (2012)

  16. Grabowski, S.: A note on the longest common substring with \(k\)-mismatches problem. Inf. Process. Lett. 115(6–8), 640–642 (2015)

  17. Kärkkäinen, J., Kempa, D.: Faster external memory LCP array construction. In: ESA. LIPIcs, vol. 57, pp. 61:1–61:16. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)

  18. Karlin, S., Ghandour, G., Ost, F., Tavare, S., Korn, L.J.: New approaches for computer analysis of nucleic acid sequences. Proc. Natl. Acad. Sci. U.S.A. 80(18), 5660–5664 (1983)

  19. Khmelev, D.V., Teahan, W.J.: A repetition based measure for verification of text collections and for text categorization. In: ACM SIGIR 2003, pp. 104–110. ACM (2003)

  20. Kolpakov, R., Bana, G., Kucherov, G.: MREPS: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31(13), 3672–3678 (2003)

  21. Liang, K.-H.: Bioinformatics for Biomedical Science and Clinical Applications. Woodhead Publishing Series in Biomedicine. Woodhead Publishing, Cambridge (2013)

  22. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

  23. Manzini, G.: Longest common prefix with mismatches. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds.) SPIRE 2015. LNCS, vol. 9309, pp. 299–310. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23826-5_29

  24. Médigue, C., Rose, M., Viari, A., Danchin, A.: Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence. Genome Res. 9(11), 1116–1127 (1999)

  25. Metzker, M.L.: Sequencing technologies - the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010)

  26. Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: DCC, pp. 193–202. IEEE (2009)

  27. Smit, A.F.A.: Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9(6), 657–663 (1999)

  28. Thankachan, S.V., Apostolico, A., Aluru, S.: A provably efficient algorithm for the k-mismatch average common substring problem. J. Comput. Biol. 23(6), 472–482 (2016)

  29. Thankachan, S.V., Chockalingam, S.P., Liu, Y., Apostolico, A., Aluru, S.: ALFRED: a practical method for alignment-free distance computation. J. Comput. Biol. 23(6), 452–460 (2016)

  30. Weiner, P.: Linear pattern matching algorithms. In: SWAT 1973, pp. 1–11. IEEE Computer Society (1973)

Author information

Corresponding author

Correspondence to Panagiotis Charalampopoulos.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Alamro, H., Ayad, L.A.K., Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P. (2018). Longest Common Prefixes with k-Mismatches and Applications. In: Tjoa, A., Bellatreche, L., Biffl, S., van Leeuwen, J., Wiedermann, J. (eds) SOFSEM 2018: Theory and Practice of Computer Science. SOFSEM 2018. Lecture Notes in Computer Science, vol 10706. Springer, Cham. https://doi.org/10.1007/978-3-319-73117-9_45

  • DOI: https://doi.org/10.1007/978-3-319-73117-9_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73116-2

  • Online ISBN: 978-3-319-73117-9

  • eBook Packages: Computer Science, Computer Science (R0)
