Abstract
Given two long strings S and T representing genomic sequences and a user-defined threshold ℓ, the problem of computing maximal exact matches (MEMs) is to find every triple (p₁, p₂, l) specifying two matching substrings S[p₁..p₁ + l − 1] = T[p₂..p₂ + l − 1] such that l ≥ ℓ, S[p₁ − 1] ≠ T[p₂ − 1], and S[p₁ + l] ≠ T[p₂ + l]. Computing MEMs is a major problem in bioinformatics because it is a primary step in identifying regions of similarity among genomic sequences. Faster solutions to this problem are still needed to keep pace with the ever-increasing number of genomic sequences to be compared with each other. In this paper, we present a parallel version of the MEM algorithm that runs on a computer cluster. Our experimental results show that our algorithm is efficient and scalable.
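To make the MEM definition concrete, the following is a minimal brute-force sketch in Python. It is not the parallel algorithm of the paper and is far too slow for real genomes. The function name `maximal_exact_matches`, the 0-based positions, and the convention that a string boundary counts as a mismatch are assumptions made for illustration only.

```python
def maximal_exact_matches(s, t, min_len):
    """Return (p1, p2, length) triples, 0-based, for every maximal exact
    match between s and t whose length is at least min_len.

    Assumption: running off either end of a string counts as a mismatch,
    so matches touching a boundary are treated as maximal on that side.
    """
    threshold = max(min_len, 1)  # ignore empty matches
    mems = []
    for p1 in range(len(s)):
        for p2 in range(len(t)):
            # Left-maximality: if the preceding characters agree, this
            # pair is the interior of a longer match starting earlier.
            if p1 > 0 and p2 > 0 and s[p1 - 1] == t[p2 - 1]:
                continue
            # Extend to the right while characters agree; stopping at the
            # first mismatch (or a boundary) gives right-maximality.
            length = 0
            while (p1 + length < len(s)
                   and p2 + length < len(t)
                   and s[p1 + length] == t[p2 + length]):
                length += 1
            if length >= threshold:
                mems.append((p1, p2, length))
    return mems


if __name__ == "__main__":
    print(maximal_exact_matches("xabcy", "zabcw", 2))
```

Each start pair is checked independently of the others, which is why the triples can be read directly against the three conditions in the definition: equal substrings, length at least ℓ, and a mismatch on both sides.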
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abouelhoda, M., Seif, S. (2012). Efficient Distributed Computation of Maximal Exact Matches. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_26
DOI: https://doi.org/10.1007/978-3-642-33518-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1
eBook Packages: Computer Science, Computer Science (R0)