Amino Acid Classification and Hash Seeds for Homology Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)

Abstract

Spaced seeds have been studied extensively in the homology search field. A spaced seed can be regarded as a very special type of hash function on k-mers: two k-mers have the same hash value if and only if they are identical at the w (w < k) positions designated by the seed. Spaced seeds have substantially increased the sensitivity of homology search. It is therefore natural to ask whether there is a hash function (called a hash seed) that provides better sensitivity than the spaced seed. We study this question in this paper. We propose a strategy for classifying amino acids that leads to a better hash seed. Our results raise a new question of how to design the best hash seed.
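
To make the hash-function view concrete, the following is a minimal Python sketch, not the paper's implementation. The function `spaced_seed_hash` projects a k-mer onto the positions marked `1` in a seed pattern, so two k-mers collide exactly when they agree at those w positions. The function `class_hash` illustrates one way a hash could use an amino acid classification, by mapping each residue to a class label; the grouping in `EXAMPLE_GROUPS` and all names here are assumptions chosen for illustration and are not the classification proposed in the paper.

```python
from typing import Dict, List, Tuple


def spaced_seed_hash(kmer: str, seed: str) -> str:
    """Keep only the characters of `kmer` at positions where `seed` has '1'.

    Two k-mers get the same value iff they are identical at the w
    positions designated by the seed (w = number of '1' characters).
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length k")
    return "".join(c for c, s in zip(kmer, seed) if s == "1")


# Hypothetical grouping of the 20 amino acids, for illustration only.
# It is NOT the classification derived in the paper.
EXAMPLE_GROUPS: List[str] = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG", "C", "P"]


def build_class_map(groups: List[str]) -> Dict[str, int]:
    """Map each residue letter to the index of the group that contains it."""
    return {residue: i for i, group in enumerate(groups) for residue in group}


def class_hash(kmer: str, class_map: Dict[str, int]) -> Tuple[int, ...]:
    """Hash a k-mer by the class of each residue rather than its identity."""
    return tuple(class_map[c] for c in kmer)


if __name__ == "__main__":
    seed = "1101"  # k = 4, w = 3; the third position is ignored
    print(spaced_seed_hash("ACGT", seed) == spaced_seed_hash("ACTT", seed))
    print(spaced_seed_hash("ACGT", seed) == spaced_seed_hash("AGGT", seed))

    class_map = build_class_map(EXAMPLE_GROUPS)
    # Different residues, same classes at every position under this grouping.
    print(class_hash("LKDA", class_map) == class_hash("IREG", class_map))
```

Under the spaced seed, a mismatch at an ignored position does not change the hash value, while a mismatch at a designated position does. Under the class-based hash, two k-mers with no identical residues can still share a hash value when every pair of aligned residues falls in the same class, which is the kind of extra tolerance a hash function more general than a spaced seed can offer.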

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, W., Ma, B., Zhang, K. (2009). Amino Acid Classification and Hash Seeds for Homology Search. In: Rajasekaran, S. (ed.) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science (LNBI), vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_6

  • DOI: https://doi.org/10.1007/978-3-642-00727-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00726-2

  • Online ISBN: 978-3-642-00727-9

  • eBook Packages: Computer Science, Computer Science (R0)