Longest Common Prefixes with k-Mismatches and Applications

Alamro, Hayam; Ayad, Lorraine A. K.; Charalampopoulos, Panagiotis; Iliopoulos, Costas S.; Pissis, Solon P.

doi:10.1007/978-3-319-73117-9_45


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10706)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Informatics


Abstract

We propose a new algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size \(\sigma \) that occurs elsewhere in the string with Hamming distance at most k. Specifically, we show that the proposed algorithm requires time \(\mathcal {O}(n (\sigma R)^k \log \log n (\log k+ \log \log n))\) on average, where \(R=\lceil (k+2) (\log _{\sigma } n+1) \rceil \), and space \(\mathcal {O}(n)\). This improves upon the state-of-the-art average-case time complexity for the case when \(k=1\) [23] by a factor of \(\log n / \log ^3 \log n\). In addition, we show how the proposed technique can be adapted and applied in order to compute the longest previous factors under the Hamming distance model within the same complexities. In terms of real-world applications, we show that our technique can be directly applied to the problem of genome mappability.
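The quantity described in the abstract can be made concrete with a brute-force reference implementation. This is a sketch under stated assumptions, not the authors' average-case algorithm: it compares every pair of suffixes directly, so it takes \(\mathcal{O}(n^3)\) time in the worst case and is mainly useful as a correctness oracle on small inputs. The function names `lcp_k` and `lcp_k_array` and the `previous_only` flag are hypothetical. It assumes that a comparison stops at the end of either suffix and, for the longest-previous-factor variant, that an earlier occurrence at \(j < i\) may overlap position \(i\), as in the exact-match definition of the longest previous factor.

```python
from typing import Iterable, List


def lcp_k(text: str, i: int, j: int, k: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:]
    when at most k mismatching positions (Hamming distance) are allowed."""
    n = len(text)
    mismatches = 0
    length = 0
    # Scan both suffixes in lockstep; stop at the end of either suffix.
    while i + length < n and j + length < n:
        if text[i + length] != text[j + length]:
            mismatches += 1
            if mismatches > k:
                break  # the (k+1)-th mismatch ends the common prefix
        length += 1
    return length


def lcp_k_array(text: str, k: int, previous_only: bool = False) -> List[int]:
    """Brute-force reference (hypothetical helper): entry i is the longest
    prefix of text[i:] that occurs at some other position j with at most
    k mismatches. With previous_only=True only j < i is considered, giving
    a longest-previous-factor array under the Hamming distance."""
    n = len(text)
    result = []
    for i in range(n):
        candidates: Iterable[int] = (
            range(i) if previous_only else (j for j in range(n) if j != i)
        )
        # max(..., default=0) covers i = 0 in the previous-only variant.
        result.append(max((lcp_k(text, i, j, k) for j in candidates), default=0))
    return result


if __name__ == "__main__":
    print(lcp_k_array("abcabd", 1))                      # [3, 2, 1, 3, 2, 1]
    print(lcp_k_array("abcabd", 1, previous_only=True))  # [0, 1, 1, 3, 2, 1]
```

On `abcabd` with \(k=1\), position 0 gets 3 because `abc` matches `abd` at position 3 with a single mismatch; with \(k=0\) the same entry drops to 2.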

P. Charalampopoulos—Supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London and by the A.G. Leventis Foundation.


References

[1] Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discret. Algorithms 2(1), 53–86 (2004)
[2] Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
[3] Alzamel, M., Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P., Radoszewski, J., Sung, W.-K.: Faster algorithms for 1-mappability of a sequence. In: COCOA. LNCS, vol. 10628, pp. 109–121. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71147-8_8
[4] Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Trans. Algorithms 3(2), 19 (2007)
[5] Antoniou, P., Daykin, J.W., Iliopoulos, C.S., Kourie, D., Mouchard, L., Pissis, S.P.: Mapping uniquely occurring short sequences derived from high throughput technologies to a reference genome. In: ITAB, pp. 1–4. IEEE Computer Society (2009)
[6] Barthet, M., Plumbley, M.D., Kachkaev, A., Dykes, J., Wolff, D., Weyde, T.: Big chord data extraction and mining. In: CIM (2014)
[7] Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000). https://doi.org/10.1007/10719839_9
[8] Bufe, C.: Understandable Guide to Music Theory: The Most Useful Aspects of Theory for Rock, Jazz, and Blues Musicians. See Sharp Press, Tucson (1994)
[9] Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC 2004, pp. 91–100. ACM (2004)
[10] Crochemore, M., Ilie, L., Iliopoulos, C.S., Kubica, M., Rytter, W., Waleń, T.: Computing the longest previous factor. Eur. J. Comb. 34(1), 15–26 (2013)
[11] Crochemore, M., Ilie, L., Smyth, W.F.: A simple algorithm for computing the Lempel Ziv factorization. In: DCC, pp. 482–488. IEEE Computer Society (2008)
[12] Derrien, T., Estellé, J., Sola, S.M., Knowles, D., Raineri, E., Guigó, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012)
[13] Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22300-6_32
[14] Fischer, J., Köppl, D., Kurpicz, F.: On the benefit of merging suffix array intervals for parallel pattern matching. In: CPM 2016. LIPIcs, vol. 54, pp. 26:1–26:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)
[15] Fonseca, N.A., Rung, J., Brazma, A., Marioni, J.C.: Tools for mapping high-throughput sequencing data. Bioinformatics 28(24), 3169–3177 (2012)
[16] Grabowski, S.: A note on the longest common substring with \(k\)-mismatches problem. Inf. Process. Lett. 115(6–8), 640–642 (2015)
[17] Kärkkäinen, J., Kempa, D.: Faster external memory LCP array construction. In: ESA. LIPIcs, vol. 57, pp. 61:1–61:16. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)
[18] Karlin, S., Ghandour, G., Ost, F., Tavare, S., Korn, L.J.: New approaches for computer analysis of nucleic acid sequences. Proc. Natl. Acad. Sci. U.S.A. 80(18), 5660–5664 (1983)
[19] Khmelev, D.V., Teahan, W.J.: A repetition based measure for verification of text collections and for text categorization. In: ACM SIGIR 2003, pp. 104–110. ACM (2003)
[20] Kolpakov, R., Bana, G., Kucherov, G.: MREPS: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31(13), 3672–3678 (2003)
[21] Liang, K.-H.: Bioinformatics for Biomedical Science and Clinical Applications. Woodhead Publishing Series in Biomedicine. Woodhead Publishing, Cambridge (2013)
[22] Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
[23] Manzini, G.: Longest common prefix with mismatches. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds.) SPIRE 2015. LNCS, vol. 9309, pp. 299–310. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23826-5_29
[24] Médigue, C., Rose, M., Viari, A., Danchin, A.: Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence. Genome Res. 9(11), 1116–1127 (1999)
[25] Metzker, M.L.: Sequencing technologies - the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010)
[26] Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: DCC, pp. 193–202. IEEE (2009)
[27] Smit, A.F.A.: Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9(6), 657–663 (1999)
[28] Thankachan, S.V., Apostolico, A., Aluru, S.: A provably efficient algorithm for the k-mismatch average common substring problem. J. Comput. Biol. 23(6), 472–482 (2016)
[29] Thankachan, S.V., Chockalingam, S.P., Liu, Y., Apostolico, A., Aluru, S.: ALFRED: a practical method for alignment-free distance computation. J. Comput. Biol. 23(6), 452–460 (2016)
[30] Weiner, P.: Linear pattern matching algorithms. In: SWAT 1973, pp. 1–11. IEEE Computer Society (1973)


Author information

Authors and Affiliations

Department of Informatics, King’s College London, London, UK
Hayam Alamro, Lorraine A. K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos & Solon P. Pissis


Corresponding author

Correspondence to Panagiotis Charalampopoulos.

Editor information

Editors and Affiliations

Vienna University of Technology, Vienna, Austria
A Min Tjoa
ISAE-ENSMA, Chasseneuil-du-Poitou, France
Ladjel Bellatreche
Vienna University of Technology, Vienna, Austria
Stefan Biffl
Utrecht University, Utrecht, The Netherlands
Jan van Leeuwen
Academy of Sciences, Prague, Czech Republic
Jiří Wiedermann


About this paper

Cite this paper

Alamro, H., Ayad, L.A.K., Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P. (2018). Longest Common Prefixes with k-Mismatches and Applications. In: Tjoa, A., Bellatreche, L., Biffl, S., van Leeuwen, J., Wiedermann, J. (eds) SOFSEM 2018: Theory and Practice of Computer Science. SOFSEM 2018. Lecture Notes in Computer Science, vol 10706. Springer, Cham. https://doi.org/10.1007/978-3-319-73117-9_45


DOI: https://doi.org/10.1007/978-3-319-73117-9_45
Published: 22 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73116-2
Online ISBN: 978-3-319-73117-9
eBook Packages: Computer Science, Computer Science (R0)
