Abstract
Spaced seeds have been studied extensively in homology search. A spaced seed can be regarded as a special type of hash function on k-mers: two k-mers have the same hash value if and only if they are identical at the w (w < k) positions designated by the seed. Spaced seeds have substantially increased the sensitivity of homology search. It is therefore natural to ask whether some other hash function (called a hash seed) provides better sensitivity than a spaced seed. We study this question in this paper. We propose a strategy for classifying amino acids that leads to a better hash seed. Our results raise a new question: how to design the best hash seed.
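The hash-function view of a spaced seed can be made concrete with a minimal sketch. The seed pattern `1101`, the function names, and the amino acid grouping below are illustrative assumptions, not taken from the paper; in particular, the class-based hash only shows one simple way a classification could induce a hash function and is not the construction the authors propose.

```python
def spaced_seed_hash(kmer, seed):
    """Hash a k-mer by keeping only the letters at the seed's '1' positions.

    Two k-mers get the same value if and only if they agree at all
    w positions marked '1' (w < k when the seed contains a '0').
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length k")
    return "".join(c for c, s in zip(kmer, seed) if s == "1")


# Hypothetical grouping of the 20 amino acids, for illustration only.
CLASSES = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
CLASS_OF = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def class_hash(kmer):
    """Hash a protein k-mer by the class index of each residue (assumed scheme)."""
    return tuple(CLASS_OF[aa] for aa in kmer)


# Seed 1101 (k = 4, w = 3) ignores the third position.
same = spaced_seed_hash("ACGT", "1101") == spaced_seed_hash("ACTT", "1101")
diff = spaced_seed_hash("ACGT", "1101") == spaced_seed_hash("ACGA", "1101")
```

Here `same` is true because the two k-mers differ only at the position the seed ignores, while `diff` is false because they differ at a designated position. Under the assumed grouping, `class_hash` maps `ILK` and `VMR` to the same value, which a spaced seed would do only if the residues were identical.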
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, W., Ma, B., Zhang, K. (2009). Amino Acid Classification and Hash Seeds for Homology Search. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_6
DOI: https://doi.org/10.1007/978-3-642-00727-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00726-2
Online ISBN: 978-3-642-00727-9
eBook Packages: Computer Science, Computer Science (R0)