Abstract
We present a new machine learning approach to the inverse parametric sequence alignment problem: given as training examples a set of correct pairwise global alignments, find the parameter values that make these alignments optimal. We consider the distribution of the scores of all incorrect alignments and search for the parameters for which the score of the given alignments is as far as possible from the mean of this distribution, measured in number of standard deviations. This normalized distance is known in statistics as the Z-score. We show that the Z-score is a function of the parameters and can be computed with efficient dynamic programs similar to the Needleman-Wunsch algorithm. We also show that maximizing the Z-score reduces to a simple quadratic program. Experimental results demonstrate the effectiveness of the proposed approach.
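To make the Z-score idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical three-parameter scoring model (`match`, `mismatch`, `gap`, with no affine gap penalties) and fixed parameter values. The function names `score_moments`, `alignment_score`, and `z_score` are illustrative only. The dynamic program fills a Needleman-Wunsch-style table, but each cell stores the number of alignments of the two prefixes together with the sum and the sum of squares of their scores, which is enough to recover the mean and variance. For simplicity the distribution here covers all global alignments, including the given one. The quadratic program that maximizes the Z-score over the parameters is not shown.

```python
from math import sqrt

def score_moments(a, b, match, mismatch, gap):
    """Return (count, mean, variance) of the scores of all global alignments of a and b."""
    n, m = len(a), len(b)
    # Each cell holds (N, S1, S2): the number of alignments of the prefixes
    # a[:i] and b[:j], the sum of their scores, and the sum of their squared scores.
    cell = [[(0, 0.0, 0.0)] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (1, 0.0, 0.0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            steps = []  # (predecessor cell, score of the last alignment column)
            if i > 0 and j > 0:
                steps.append((cell[i - 1][j - 1],
                              match if a[i - 1] == b[j - 1] else mismatch))
            if i > 0:
                steps.append((cell[i - 1][j], gap))
            if j > 0:
                steps.append((cell[i][j - 1], gap))
            N, S1, S2 = 0, 0.0, 0.0
            for (pn, p1, p2), c in steps:
                # Adding a column with score c to every alignment of the predecessor:
                # sum(s + c) = S1 + c*N and sum((s + c)^2) = S2 + 2*c*S1 + c^2*N.
                N += pn
                S1 += p1 + c * pn
                S2 += p2 + 2 * c * p1 + c * c * pn
            cell[i][j] = (N, S1, S2)
    N, S1, S2 = cell[n][m]
    mean = S1 / N
    return N, mean, S2 / N - mean * mean

def alignment_score(aligned_a, aligned_b, match, mismatch, gap):
    """Score a given alignment written as two equal-length strings with '-' for gaps."""
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total

def z_score(aligned_a, aligned_b, match, mismatch, gap):
    """Distance of the given alignment's score from the mean, in standard deviations."""
    a = aligned_a.replace("-", "")
    b = aligned_b.replace("-", "")
    _, mean, var = score_moments(a, b, match, mismatch, gap)
    # Undefined when every alignment has the same score (var == 0).
    return (alignment_score(aligned_a, aligned_b, match, mismatch, gap) - mean) / sqrt(var)
```

For example, `z_score("ACGT", "AC-T", 1, -1, -1)` scores the given alignment of `ACGT` and `ACT` against the distribution of all their global alignments. Because the table has one cell per pair of prefixes, the cost is proportional to the product of the sequence lengths, as for Needleman-Wunsch.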
References
Balaji, S., Sujatha, S., Kumar, S.S.C., Srinivasan, N.: PALI: a database of alignments and phylogeny of homologous protein structures. Nucleic Acids Research 29(1), 61–61 (2001)
Eppstein, D.: Setting parameters by example. In: ACM Computing Research Repository. In: 40th IEEE Symp. Foundations of Comp. Sci., pp. 309–318 (1999), SIAM J. Computing 32(3), 643–653 (2003)
Goldberg, M., Breimer, E.: Learning Significant Alignments: An Alternative to Normalized Local Alignment. In: Proceedings of the International Symposium on Methodologies for Intelligent Systems, pp. 37–45 (2002)
Gusfield, D., Balasubramanian, K., Naor, D.: Parametric optimization of sequence alignment. Algorithmica 12, 312–326 (1994)
Gusfield, D., Stelling, P.: Parametric and inverse-parametric sequence alignment with XPARAL. Methods in Enzymology 266, 481–494 (1996)
Kececioglu, J., Kim, E.: Simple and fast inverse alignment. In: Proc. of the 10th ACM Conference on Research in Computational Molecular Biology, pp. 441–455 (2006)
Joachims, T., Galor, T., Elber, R.: Learning to Align Sequences: A Maximum-Margin Approach. In: Leimkuhler, B. (ed.) New Algorithms for Macromolecular Simulation. LNCSE, vol. 49, Springer, Heidelberg (2005)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Pachter, L., Sturmfels, B.: Parametric inference for biological sequence analysis. In: Proceedings of the National Academy of Sciences USA, vol. 101(46), pp. 16138–16143 (2004)
Sun, F., Fernandez-Baca, D., Yu, W.: Inverse parametric sequence alignment. Journal of Algorithms 53, 36–54 (2004)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ricci, E., de Bie, T., Cristianini, N. (2007). Learning to Align: A Statistical Approach. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_3
DOI: https://doi.org/10.1007/978-3-540-74825-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74824-3
Online ISBN: 978-3-540-74825-0
eBook Packages: Computer Science, Computer Science (R0)