Abstract
Score functions (potential functions) have been used effectively in many problems in molecular biology. We propose a general method for deriving score functions that are consistent with example data, which yields polynomial time learning algorithms for several important problems in molecular biology (including sequence alignment). On the other hand, we show that deriving a score function for some problems (multiple alignment and protein threading) is computationally hard. However, we show that approximation algorithms for these optimization problems can also be used for deriving score functions.
Preview
Unable to display preview. Download preview PDF.
References
Akutsu, T., Miyano, S.: On the approximation of protein threading. Proc. Int. Conf. on Computational Molecular Biology, ACM (1997) 3–8
Akutsu, T., Tashimo, H.: Linear programming based approach to the derivation of a contact potential for protein threading. Proc. Pacific Symp. Biocomputing'98, World Scientific (1998) 413–424
Amaldi, E., Kann, V.: On the approximability of finding maximum feasible subsystems of linear systems. LNCS, Vol. 775 (1994) 521–532
Bowie, J. U., Lüthy, R., Eisenberg, D.: A method to identify protein sequences that fold into a known three-dimensional structures. Science 253 (1991) 164–170
Dayhoff, M. O., Schwartz, R. M. and Orcutt, B C.: A model of evolutionary change in proteins. Atlas of protein sequence and structure 5 (1978) 345–352
Gusfield, D.: Efficient method for multiple sequence alignment with guaranteed error bounds. Bull. Math. Biol. 55 (1993) 141–154
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge Univ. Press (1997)
Gusfield, D., Balasubramanian, K., Naor, D.: Parametric optimization of sequence alignment. Algorithmica 12 (1994) 312–326
Karmarkar, N. K.: A new polynomial-time algorithm for linear programming. Combinatorica 4 (1984) 373–395
Kyte, J., Doolittle, R. F.: A simple method of displaying the hydropathic character of a protein. J. Mol. Biol. 157 (1982) 105–132
Laird, P. D.: Learning from Good and Bad Data. Kluwer Academic Publishers (1988).
Lathrop, R. H.: The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng. 7 (1994) 1059–1068
Lathrop, R. H., Smith, T. F.: Global optimum protein threading with gapped alignment and empirical pair score functions. J. Mol. Biol. 255 (1996) 641–665
Maiorov, V. N., Crippen, G. M.: Contact potential that recognizes the correct folding of globular proteins. J. Mol. Biol. 277 (1992) 876–888
Middendorf, M.: More on the complexity of common superstring and supersequence problems. Theoretical Computer Science 125 (1994) 205–228
Natarajan, B. K.: Machine Learning — A Theoretical Approach. Morgan Kaufmann (1991)
Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. J. Comp. Biol. 1 (1994) 337–348
Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research 9 (1981) 133–148
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Akutsu, T., Yagiura, M. (1998). On the complexity of deriving score functions from examples for problems in molecular biology. In: Larsen, K.G., Skyum, S., Winskel, G. (eds) Automata, Languages and Programming. ICALP 1998. Lecture Notes in Computer Science, vol 1443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0055106
Download citation
DOI: https://doi.org/10.1007/BFb0055106
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64781-2
Online ISBN: 978-3-540-68681-1
eBook Packages: Springer Book Archive