Efficient Distributed Computation of Maximal Exact Matches

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7490)

Abstract

Given two long strings S and T, representing two genomic sequences, and a user-defined threshold ℓ, the problem of computing maximal exact matches (MEMs) is to find each triple (p1, p2, l) specifying two matching substrings S[p1..p1 + l − 1] = T[p2..p2 + l − 1], such that l ≥ ℓ, S[p1 − 1] ≠ T[p2 − 1], and S[p1 + l] ≠ T[p2 + l]. Computing MEMs is a major problem in bioinformatics, because it is a primary step in identifying regions of similarity among genomic sequences. Faster solutions to this problem are still in demand to keep pace with the ever-increasing number of genomic sequences that must be compared with each other. In this paper, we present a parallel version of the MEM algorithm running on a computer cluster. Our experimental results show that our algorithm is efficient and scalable.
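
To make the definition above concrete, the following is a minimal brute-force sketch in Python. It is not the parallel algorithm of the paper, which the abstract does not describe; it only enumerates the triples that satisfy the stated conditions. The function name maximal_exact_matches and the parameter min_len (standing for ℓ) are hypothetical, positions are 0-based, and a string boundary is assumed to count as a mismatch.

```python
def maximal_exact_matches(s, t, min_len):
    """Return (p1, p2, l) triples with s[p1:p1+l] == t[p2:p2+l] and l >= min_len.

    Illustrative quadratic-time sketch of the MEM definition only.
    Assumptions: 0-based positions; a string boundary counts as a mismatch.
    """
    mems = []
    for p1 in range(len(s)):
        for p2 in range(len(t)):
            if s[p1] != t[p2]:
                continue
            # Left-maximality: skip if the match could be extended to the left.
            if p1 > 0 and p2 > 0 and s[p1 - 1] == t[p2 - 1]:
                continue
            # Extend to the right until a mismatch or the end of either string.
            l = 0
            while p1 + l < len(s) and p2 + l < len(t) and s[p1 + l] == t[p2 + l]:
                l += 1
            # The loop stops at a mismatch or a boundary, so the match is right-maximal.
            if l >= min_len:
                mems.append((p1, p2, l))
    return mems


if __name__ == "__main__":
    print(maximal_exact_matches("xabcy", "zabcw", 2))
```

Because every candidate pair of positions is examined, this sketch is only practical for short strings; the index-based methods cited in the literature exist precisely to avoid this quadratic scan.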

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abouelhoda, M., Seif, S. (2012). Efficient Distributed Computation of Maximal Exact Matches. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_26

  • DOI: https://doi.org/10.1007/978-3-642-33518-1_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33517-4

  • Online ISBN: 978-3-642-33518-1

  • eBook Packages: Computer Science, Computer Science (R0)
