SEAL: a divide-and-conquer approach for sequence alignment

Kandadi, Harini; Aygün, Ramazan Savas

doi:10.1007/s13721-015-0096-z

Original Article
Published: 23 August 2015

Volume 4, article number 25, (2015)

Network Modeling Analysis in Health Informatics and Bioinformatics

Harini Kandadi¹ & Ramazan Savas Aygün¹

Abstract

Sequence similarity search and sequence alignment are fundamental steps in comparative genomics and have a wide range of applications in medicine, agriculture, and environmental science. Dynamic programming alignment methods produce optimal alignments but are impractical for similarity search because of their long running times. Heuristic methods such as BLAST run much faster but may not produce optimal alignments. In this paper, we introduce a novel sequence alignment algorithm, SEAL. SEAL is a parallelizable algorithm that, unlike heuristic methods, does not require a gap penalty parameter. It combines the divide-and-conquer paradigm with the maximum contiguous subarray solution, and it is further improved by the use of borders in every contiguous segment. The alignment scores obtained by SEAL are consistently higher than those obtained by heuristic methods. Because dependencies among intermediate steps are minimized, the complexity of SEAL can be reduced to \(\Theta(\log^{2} n)\) given a sufficient number of parallel processors.
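
The abstract names two building blocks, divide-and-conquer and the maximum contiguous subarray solution. The sketch below illustrates only that classic combination, not SEAL itself: it finds the best-scoring contiguous run in an array of per-position match scores for two equal-length, ungapped sequences. The scoring scheme (+1 for a match, -1 for a mismatch) and the names `Summary`, `combine`, `max_subarray`, and `match_scores` are assumptions made for illustration and do not come from the paper.

```python
from typing import List, NamedTuple


class Summary(NamedTuple):
    total: int   # sum of the whole segment
    prefix: int  # best sum of a non-empty prefix
    suffix: int  # best sum of a non-empty suffix
    best: int    # best sum of any non-empty contiguous run


def combine(left: Summary, right: Summary) -> Summary:
    """Merge the summaries of two adjacent segments in constant time."""
    return Summary(
        total=left.total + right.total,
        prefix=max(left.prefix, left.total + right.prefix),
        suffix=max(right.suffix, right.total + left.suffix),
        # The best run lies in the left half, in the right half,
        # or straddles the boundary between them.
        best=max(left.best, right.best, left.suffix + right.prefix),
    )


def max_subarray(scores: List[int], lo: int, hi: int) -> Summary:
    """Summarise scores[lo:hi] (hi exclusive, hi > lo) by halving."""
    if hi - lo == 1:
        s = scores[lo]
        return Summary(s, s, s, s)
    mid = (lo + hi) // 2
    # The two recursive calls share no state, so they could run in parallel.
    return combine(max_subarray(scores, lo, mid),
                   max_subarray(scores, mid, hi))


def match_scores(a: str, b: str, match: int = 1, mismatch: int = -1) -> List[int]:
    """Hypothetical scoring: compare two sequences position by position."""
    return [match if x == y else mismatch for x, y in zip(a, b)]


scores = match_scores("ACGTTGCA", "ACCTTGGA")
result = max_subarray(scores, 0, len(scores))
print(result.best)
```

Because each half is summarised independently and `combine` takes constant time, the recursion depth is logarithmic in the array length, which is the general property that makes this kind of decomposition attractive on parallel hardware.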

Author information

Authors and Affiliations

Computer Science Department, University of Alabama in Huntsville, Huntsville, Alabama, USA
Harini Kandadi & Ramazan Savas Aygün

Corresponding author

Correspondence to Ramazan Savas Aygün.

About this article

Cite this article

Kandadi, H., Aygün, R.S. SEAL: a divide-and-conquer approach for sequence alignment. Netw Model Anal Health Inform Bioinforma 4, 25 (2015). https://doi.org/10.1007/s13721-015-0096-z

Received: 29 December 2014
Revised: 15 July 2015
Accepted: 02 August 2015
Published: 23 August 2015
DOI: https://doi.org/10.1007/s13721-015-0096-z
