On the complexity of deriving score functions from examples for problems in molecular biology

  • Conference paper
Automata, Languages and Programming (ICALP 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1443)

Abstract

Score functions (potential functions) have been used effectively in many problems in molecular biology. We propose a general method for deriving score functions that are consistent with example data, which yields polynomial time learning algorithms for several important problems in molecular biology (including sequence alignment). On the other hand, we show that deriving a score function for some problems (multiple alignment and protein threading) is computationally hard. However, we show that approximation algorithms for these optimization problems can also be used for deriving score functions.
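
The abstract does not spell out the method, so the following is only a minimal sketch of one way to read "consistent with example data". It assumes that a candidate solution is summarised by a feature vector (hypothetically: counts of matches, mismatches, and gaps in an alignment), that the score is a weighted sum of those features, and that each example pairs a preferred solution with a rejected one. Consistency then means every preferred solution outscores its rejected counterpart by a margin. The name `derive_score_function` and the perceptron-style update are illustrative stand-ins, not the authors' algorithm, and the update rule carries no polynomial-time guarantee.

```python
from typing import List, Optional, Sequence, Tuple

# A hypothetical example pair: (features of the preferred solution,
# features of a rejected solution), e.g. (matches, mismatches, gaps).
Pair = Tuple[Sequence[float], Sequence[float]]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score: weighted sum of the feature counts."""
    return sum(wi * xi for wi, xi in zip(w, x))


def derive_score_function(
    pairs: List[Pair], max_passes: int = 1000, margin: float = 1.0
) -> Optional[List[float]]:
    """Search for weights w with score(w, good) - score(w, bad) >= margin
    for every pair. Returns None if no such w is found within max_passes
    (the system may be infeasible, or the pass limit may be too small)."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(max_passes):
        updated = False
        for good, bad in pairs:
            if score(w, good) - score(w, bad) < margin:
                # Perceptron-style correction: move w toward (good - bad).
                for i in range(dim):
                    w[i] += good[i] - bad[i]
                updated = True
        if not updated:
            return w  # every example pair is satisfied
    return None


# Toy data, invented for illustration only.
pairs: List[Pair] = [
    ((5, 1, 0), (3, 1, 2)),
    ((4, 0, 2), (4, 2, 0)),
    ((6, 0, 1), (5, 2, 0)),
]
weights = derive_score_function(pairs)
print(weights)
```

Because each constraint is linear in the weights, the same system could instead be handed to a linear programming solver; the iterative update is used here only to keep the sketch free of external dependencies.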



Editor information

Kim G. Larsen, Sven Skyum, Glynn Winskel

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akutsu, T., Yagiura, M. (1998). On the complexity of deriving score functions from examples for problems in molecular biology. In: Larsen, K.G., Skyum, S., Winskel, G. (eds) Automata, Languages and Programming. ICALP 1998. Lecture Notes in Computer Science, vol 1443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0055106

  • DOI: https://doi.org/10.1007/BFb0055106

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64781-2

  • Online ISBN: 978-3-540-68681-1

  • eBook Packages: Springer Book Archive
