Years and Authors of Summarized Original Work
2006; Ma, Li, Zhang
Problem Definition
Sequence alignment was introduced in the 1970s as a way to reveal the similarity between the sequences of genes and proteins [12]. A DNA sequence is a finite sequence over the four nucleotides adenine, guanine, cytosine, and thymine, whereas a protein sequence is a finite sequence over the 20 amino acids. Homologous proteins typically have similar biological functions. Because they evolved from a common ancestral sequence, the sequences of homologous proteins and of their encoding genes are often highly similar. Therefore, to infer the biological function of a protein, its DNA or amino acid sequence is often aligned with the sequences of well-studied proteins.
Formally, an alignment of two sequences S and T over an alphabet \(\mathcal{B}\) is a two-row matrix with the following properties:
1. The letters in S are listed in order, interspersed with space symbols “–,” in a row, where “–” represents the fact that a letter is missing at...
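The definition is cut off after its first property. As a hedged illustration, the sketch below checks whether two rows form an alignment of S and T under the usual textbook definition, which is assumed here rather than taken from the truncated text. Under that definition, deleting the spaces from the first row recovers S, deleting them from the second row recovers T, both rows have the same length, and no column consists of two spaces. The function name `is_alignment` and the use of ASCII `-` in place of the space symbol “–” are illustrative choices, not part of the source.

```python
# Assumption: ASCII "-" stands in for the space symbol "–" used in the text.
GAP = "-"


def is_alignment(row_s: str, row_t: str, s: str, t: str) -> bool:
    """Return True if (row_s, row_t) is a two-row alignment of s and t.

    Checks the assumed standard definition:
      1. deleting gaps from row_s yields s, letters in order (stated in the text);
      2. deleting gaps from row_t yields t, letters in order (assumed);
      3. the rows have equal length and no column is gap-over-gap (assumed).
    """
    # A two-row matrix needs rows of equal length.
    if len(row_s) != len(row_t):
        return False
    # Removing the space symbols must give back each original sequence.
    if row_s.replace(GAP, "") != s or row_t.replace(GAP, "") != t:
        return False
    # Reject any column in which both entries are space symbols.
    return all(not (a == GAP and b == GAP) for a, b in zip(row_s, row_t))


if __name__ == "__main__":
    print(is_alignment("AC-GT", "ACTG-", "ACGT", "ACTG"))  # valid alignment
    print(is_alignment("AC-GT", "AC-TG", "ACGT", "ACTG"))  # third column is "-" over "-"
```

The first call prints `True`. The second prints `False` because its third column pairs two space symbols.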
Recommended Reading
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
Brejová B, Brown D, Vinař T (2004) Optimal spaced seeds for homologous coding regions. J Bioinformatics Comput Biol 1:595–610
Buhler J, Keich U, Sun Y (2004) Designing seeds for similarity search in genomic DNA. J Comput Syst Sci 70:342–363
Choi KP, Zhang LX (2004) Sensitivity analysis and efficient method for identifying optimal spaced seeds. J Comput Syst Sci 68:22–40
Choi KP, Zeng F, Zhang LX (2004) Good spaced seeds for homology search. Bioinformatics 20:1053–1059
Intl Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562
Keich U, Li M, Ma B, Tromp J (2004) On spaced seeds for similarity search. Discret Appl Math 138(3):253–263
Li M, Ma B, Kisman D, Tromp J (2004) PatternHunter II: highly sensitive and fast homology search. J Bioinformatics Comput Biol 2:417–440
Ma B, Yao H (2009) Seed optimization for iid similarities is no easier than optimal Golomb ruler design. Inf Process Lett 109(19):1120–1124
Ma B, Tromp J, Li M (2002) PatternHunter: faster and more sensitive homology search. Bioinformatics 18:440–445
Ma B, Li M (2007) On the complexity of the spaced seeds. J Comput Syst Sci 73:1024–1034
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Sun Y, Buhler J (2004) Designing multiple simultaneous seeds for DNA similarity search. In: Proceedings RECOMB’04, 2004, San Diego, pp 76–85
Zhang LX (2007) Superiority of spaced seeds for homology search. IEEE/ACM Trans Comput Biol Bioinformatics (TCBB) 4:496–505
Copyright information
© 2016 Springer Science+Business Media New York
About this entry
Cite this entry
Zhang, L. (2016). Superiority and Complexity of the Spaced Seeds. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_803
DOI: https://doi.org/10.1007/978-1-4939-2864-4_803
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2863-7
Online ISBN: 978-1-4939-2864-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering