Learning Significant Alignments: An Alternative to Normalized Local Alignment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2366)

Abstract

We describe a supervised learning approach to resolve difficulties in finding biologically significant local alignments. The O(n²) Smith-Waterman algorithm, the prevalent tool for computing local sequence alignments, has been observed to output long, meaningless alignments while ignoring shorter, biologically significant ones. Arslan et al. proposed an O(n² log n) algorithm that outputs a normalized local alignment, maximizing the degree of similarity rather than the total similarity score. Given a properly selected normalization parameter, that algorithm can discover significant alignments that the Smith-Waterman algorithm would miss. Unfortunately, determining a proper normalization parameter requires repeated executions with different parameter values, along with expert feedback to judge the usefulness of the resulting alignments. We propose a learning approach that uses existing biologically significant alignments to learn parameters for intelligently processing sub-optimal Smith-Waterman alignments. Our algorithm runs in O(n²) time and can discover biologically significant alignments without requiring expert feedback to produce meaningful results.
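
To make the contrast in the abstract concrete, the following is a minimal sketch of the quadratic-time Smith-Waterman recurrence together with a length-normalized score of the ratio form S / (|a'| + |b'| + L) used in normalized local alignment. It is not the learning method of this paper. The function names `smith_waterman` and `normalized_score`, the linear gap model, and the scoring values (match 1, mismatch -1, gap -1) are illustrative assumptions, not taken from the paper.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, start_a, end_a, start_b, end_b) for one best local
    alignment, so that a[start_a:end_a] aligns with b[start_b:end_b].

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # origin[i][j]: 0-based start indices of that alignment in a and b.
    origin = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (0, (i, j)),                               # restart (empty alignment)
                (H[i - 1][j - 1] + sub, origin[i - 1][j - 1]),  # match or mismatch
                (H[i - 1][j] + gap, origin[i - 1][j]),     # gap in b
                (H[i][j - 1] + gap, origin[i][j - 1]),     # gap in a
            ]
            score, start = max(candidates, key=lambda c: c[0])
            H[i][j] = score
            origin[i][j] = start
            if score > best[0]:
                best = (score, start[0], i, start[1], j)
    return best


def normalized_score(score, len_a, len_b, L):
    """Length-normalized similarity: score / (len_a + len_b + L).

    L is the normalization parameter; choosing it well is the difficulty
    the abstract describes.
    """
    return score / (len_a + len_b + L)


if __name__ == "__main__":
    score, sa, ea, sb, eb = smith_waterman("XXACGTXX", "YYACGTYY")
    print(score, sa, ea, sb, eb)
    print(normalized_score(score, ea - sa, eb - sb, L=8))
```

The two nested loops fill an (n+1) by (m+1) table, which is where the quadratic running time comes from. Ranking candidate alignments by `normalized_score` instead of the raw score favors short, dense alignments over long, sparse ones, and how strongly it does so depends on L.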

Supported by a grant from Rensselaer Polytechnic Institute.

References

  1. Alexandrov, N., Solovyev, V.: Statistical significance of ungapped alignments. Pacific Symp. on Biocomputing (1998) 463–472

  2. Altschul, S., Erickson, B.: Significance levels for biological sequence comparison using nonlinear similarity functions. Bulletin of Mathematical Biology 50 (1988) 77–92

  3. Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped Blast and Psi-Blast: a new generation of protein database search programs. Nucleic Acids Research 25 (1997) 3389–3402

  4. Arslan, A., Egecioglu, Ö., Pevzner, P.: A new approach to sequence comparison: normalized sequence alignment. Proceedings of the Fifth Annual International Conference on Molecular Biology (2001) 2–11

  5. Arslan, A., Egecioglu, Ö.: An efficient uniform-cost normalized edit distance algorithm. 6th Symp. on String Processing and Info. Retrieval (1999) 8–15

  6. Bafna, V., Huson, D.: The conserved exon method of gene finding. Proc. of the 8th Int. Conf. on Intelligent Systems for Molecular Bio. (2000) 3–12

  7. Barton, G.: An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps. Computer Applications in the Biosciences 9 (1993) 729–734

  8. Batzoglou, S., Pachter, L., Mesirov, J., Berger, B., Lander, E.: Comparative analysis of mouse and human DNA and application to exon prediction. Proc. of the 4th Annual Int. Conf. on Computational Molecular Biology (2000) 46–53

  9. Dinkelbach, W.: On nonlinear fractional programming. Management Science 13 (1967) 492–498

  10. Gelfand, M., Mironov, A., Pevzner, P.: Gene recognition via spliced sequence alignment. Proc. Natl. Acad. Sci. USA 93 (1996) 9061–9066

  11. Goad, W., Kanehisa, M.: Pattern recognition in nucleic acid sequences: a general method for finding local homologies and symmetries. Nucleic Acids Research 10 (1982) 247–263

  12. Huang, X., Pevzner, P., Miller, W.: Parametric recomputing in alignment graph. Proc. of the 5th Annual Symp. on Comb. Pat. Matching (1994) 87–101

  13. Oommen, B., Zhang, K.: The normalized string editing problem revisited. IEEE Trans. on PAMI 18 (1996) 669–672

  14. Sellers, P.: Pattern recognition in genetic sequences by mismatch density. Bull. of Math. Bio. 46 (1984) 501–504

  15. Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147 (1981) 195–197

  16. Vidal, E., Marzal, A., Aibar, P.: Fast computation of normalized edit distances. IEEE Trans. on PAMI 17 (1995) 899–902

  17. Zhang, Z., Berman, P., Miller, W.: Alignments without low-scoring regions. J. Comput. Biol. 5 (1998) 197–200

  18. Zhang, Z., Berman, P., Wiehe, T., Miller, W.: Post-processing long pairwise alignments. Bioinformatics 15 (1999) 1012–1019

  19. Zuker, M.: Suboptimal sequence alignment in molecular biology: alignment with error analysis. Journal of Molecular Biology 221 (1991) 403–420

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Breimer, E., Goldberg, M. (2002). Learning Significant Alignments: An Alternative to Normalized Local Alignment. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_6

  • DOI: https://doi.org/10.1007/3-540-48050-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43785-7

  • Online ISBN: 978-3-540-48050-1

  • eBook Packages: Springer Book Archive
