Learning to Align: A Statistical Approach

  • Conference paper
Advances in Intelligent Data Analysis VII (IDA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

We present a new machine learning approach to the inverse parametric sequence alignment problem: given as training examples a set of correct pairwise global alignments, find the parameter values that make these alignments optimal. We consider the distribution of the scores of all incorrect alignments and search for the parameters for which the score of the given alignments is as far as possible from the mean of this distribution, measured in number of standard deviations. This normalized distance is called the ‘Z-score’ in statistics. We show that the Z-score is a function of the parameters and can be computed with efficient dynamic programs similar to the Needleman-Wunsch algorithm. We also show that maximizing the Z-score reduces to a simple quadratic program. Experimental results demonstrate the effectiveness of the proposed approach.
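As a small illustration of the Z-score described above, the following Python sketch computes it for a toy pair of sequences. It is not the method of the paper: it assumes a three-parameter scoring model (match, mismatch, gap), takes the distribution over every global alignment including the given one, and enumerates alignments by brute force rather than using the dynamic programs the abstract mentions. The function names `enumerate_features` and `z_score` are hypothetical.

```python
import math


def enumerate_features(s, t):
    """Feature vectors (matches, mismatches, gaps) of every global alignment.

    Brute-force enumeration; only feasible for very short sequences.
    """
    feats = []

    def walk(i, j, m, x, g):
        if i == len(s) and j == len(t):
            feats.append((m, x, g))
            return
        if i < len(s) and j < len(t):
            # Align s[i] with t[j]: either a match or a mismatch.
            if s[i] == t[j]:
                walk(i + 1, j + 1, m + 1, x, g)
            else:
                walk(i + 1, j + 1, m, x + 1, g)
        if i < len(s):
            # Align s[i] with a gap.
            walk(i + 1, j, m, x, g + 1)
        if j < len(t):
            # Align t[j] with a gap.
            walk(i, j + 1, m, x, g + 1)

    walk(0, 0, 0, 0, 0)
    return feats


def z_score(w, phi_given, feats):
    """Distance of the given alignment's score from the mean, in std devs.

    w         : assumed parameters (match, mismatch, gap)
    phi_given : feature vector of the given (training) alignment
    feats     : feature vectors of all alignments of the same pair
    """
    scores = [sum(wk * fk for wk, fk in zip(w, f)) for f in feats]
    mean = sum(scores) / len(scores)
    var = sum((sc - mean) ** 2 for sc in scores) / len(scores)
    if var == 0:
        raise ValueError("all alignments score the same; Z-score undefined")
    given = sum(wk * fk for wk, fk in zip(w, phi_given))
    return (given - mean) / math.sqrt(var)


if __name__ == "__main__":
    feats = enumerate_features("ACG", "ACG")
    # The gap-free alignment of identical sequences has 3 matches.
    print(z_score((1.0, -1.0, -1.0), (3, 0, 0), feats))
```

Because the score is linear in the parameters, the Z-score under this toy model does not change when all parameters are multiplied by the same positive constant, so only the direction of the parameter vector matters when maximizing it.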


Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ricci, E., de Bie, T., Cristianini, N. (2007). Learning to Align: A Statistical Approach. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_3

  • DOI: https://doi.org/10.1007/978-3-540-74825-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0

  • eBook Packages: Computer Science, Computer Science (R0)
