Efficient Distributed Computation of Maximal Exact Matches

Abouelhoda, Mohamed; Seif, Sondos

doi:10.1007/978-3-642-33518-1_26

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7490)

Abstract

Given two long strings S and T representing two genomic sequences, and a user-defined threshold ℓ, the problem of computing maximal exact matches (MEMs) is to find every triple (p₁, p₂, l) specifying two matching substrings S[p₁..p₁ + l − 1] = T[p₂..p₂ + l − 1] such that l ≥ ℓ, S[p₁ − 1] ≠ T[p₂ − 1], and S[p₁ + l] ≠ T[p₂ + l]; that is, the match cannot be extended to the left or to the right. Computing MEMs is a major problem in bioinformatics because it is a primary step in identifying regions of similarity among genomic sequences. Faster solutions are still needed to cope with the ever-increasing volume of genomic sequences that must be compared with one another. In this paper, we present a parallel version of the MEM algorithm that runs on a computer cluster. Our experimental results show that our algorithm is efficient and scalable.
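
The MEM definition above can be illustrated with a small brute-force sketch. This is not the algorithm of the paper, which is not described in the abstract; it is a quadratic-time dynamic program meant only to make the definition concrete. The function name `maximal_exact_matches`, the parameter `min_len` (standing in for ℓ), the use of 0-based positions, and the convention that a string boundary counts as a mismatch are all assumptions made for this example.

```python
from typing import List, Tuple


def maximal_exact_matches(S: str, T: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return all triples (p1, p2, l) with S[p1:p1+l] == T[p2:p2+l], l >= min_len,
    that cannot be extended to the left or to the right.

    Illustrative O(|S| * |T|) sketch; positions are 0-based (an assumption),
    and the start or end of a string is treated as a mismatch.
    """
    n, m = len(S), len(T)
    mems: List[Tuple[int, int, int]] = []
    # prev[j] holds the length of the longest common suffix of S[:i-1] and T[:j].
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if S[i - 1] == T[j - 1]:
                # Extending the longest common suffix makes the match left-maximal.
                cur[j] = prev[j - 1] + 1
                # Right-maximal: a string ends here or the next characters differ.
                if i == n or j == m or S[i] != T[j]:
                    length = cur[j]
                    if length >= min_len:
                        mems.append((i - length, j - length, length))
        prev = cur
    return sorted(mems)


if __name__ == "__main__":
    print(maximal_exact_matches("xabcy", "zabcw", 2))
```

For the two toy strings in the usage line, the only match of length at least 2 that cannot be extended is "abc", starting at position 1 in both strings. Index-based methods such as the suffix-array approaches used in practice avoid the quadratic table entirely; the sketch trades efficiency for a direct reading of the definition.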

Author information

Authors and Affiliations

Faculty of Engineering, Cairo University, Giza, Egypt
Mohamed Abouelhoda
Center for Informatics Sciences, Nile University, Giza, Egypt
Mohamed Abouelhoda & Sondos Seif

Authors

Mohamed Abouelhoda
Sondos Seif

Editor information

Editors and Affiliations

Faculty of Informatics, Institute of Information Systems, Research Group Parallel Computing, Vienna University of Technology / TU Wien, Favoritenstrasse 16, 1040, Vienna / Wien, Austria
Jesper Larsson Träff
Faculty of Computer Science, Research Group Scientific Computing, University of Vienna, Währinger Str. 29/6.21, 1090, Vienna / Wien, Austria
Siegfried Benkner
University of Tennessee, 37996, Knoxville, TN, USA
Jack J. Dongarra

About this paper

Cite this paper

Abouelhoda, M., Seif, S. (2012). Efficient Distributed Computation of Maximal Exact Matches. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_26

DOI: https://doi.org/10.1007/978-3-642-33518-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1
eBook Packages: Computer Science, Computer Science (R0)
