
Bidirectional Dynamics for Protein Secondary Structure Prediction

Chapter in: Sequence Learning

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1828)

Abstract

Connectionist models for learning in sequential domains are typically dynamical systems that use hidden states to store contextual information. In principle, these models can adapt to variable time lags and perform complex sequential mappings. In spite of several successful applications (mostly based on hidden Markov models), the general class of sequence learning problems is still far from being satisfactorily solved. In particular, learning sequential translations is generally a hard task, and current models seem to exhibit a number of limitations. One of these limitations, at least for some application domains, is the causality assumption. A dynamical system is said to be causal if the output at (discrete) time t does not depend on future inputs. Causality is easy to justify in dynamics that attempt to model the behavior of many physical systems: in these cases the response at time t clearly cannot depend on stimuli that the system has not yet received as input. As it turns out, non-causal dynamics over infinite time horizons cannot be realized by any physical or computational device. For certain categories of finite sequences, however, information from both the past and the future can be very useful for analysis and prediction at time t. This is the case, for example, with DNA and protein sequences, where the structure and function of a region may strongly depend on events located both upstream and downstream of that region, sometimes at considerable distances. Another good example is the off-line translation of one language into another.
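To make the contrast between causal and bidirectional processing concrete, the sketch below implements a minimal bidirectional recurrent architecture in Python/NumPy: a forward chain of hidden states carries context from the past, a backward chain carries context from the future, and the prediction at each position combines both with the local input. The tanh transition functions, softmax output, weight shapes, and the three-class output (e.g. helix/strand/coil) are illustrative assumptions, not the exact model presented in the chapter.

# Minimal sketch of bidirectional dynamics over a finite sequence (illustrative only).
# A forward (causal) chain propagates context from the past, a backward
# (anti-causal) chain propagates context from the future; the prediction at
# position t combines both chains with the local input.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bidirectional_predict(x, params):
    """x: (T, d) input sequence; returns (T, k) class probabilities."""
    Wf, Wxf, Wb, Wxb, Wo_f, Wo_b, Wo_x = params
    T, d = x.shape
    n = Wf.shape[0]          # hidden state size
    k = Wo_x.shape[0]        # number of output classes

    # Forward states: depend only on inputs up to and including t.
    hf = np.zeros((T, n))
    for t in range(T):
        prev = hf[t - 1] if t > 0 else np.zeros(n)
        hf[t] = np.tanh(Wf @ prev + Wxf @ x[t])

    # Backward states: depend only on inputs from t onward.
    hb = np.zeros((T, n))
    for t in reversed(range(T)):
        nxt = hb[t + 1] if t < T - 1 else np.zeros(n)
        hb[t] = np.tanh(Wb @ nxt + Wxb @ x[t])

    # Output at t sees past context (hf), future context (hb), and x[t].
    return np.array([softmax(Wo_f @ hf[t] + Wo_b @ hb[t] + Wo_x @ x[t])
                     for t in range(T)])

# Toy usage: T=5 positions with d=20 input features each, k=3 output classes
# (e.g. helix / strand / coil), n=8 hidden units; weights are random.
rng = np.random.default_rng(0)
d, n, k, T = 20, 8, 3, 5
params = (rng.standard_normal((n, n)) * 0.1, rng.standard_normal((n, d)) * 0.1,
          rng.standard_normal((n, n)) * 0.1, rng.standard_normal((n, d)) * 0.1,
          rng.standard_normal((k, n)) * 0.1, rng.standard_normal((k, n)) * 0.1,
          rng.standard_normal((k, d)) * 0.1)
probs = bidirectional_predict(rng.standard_normal((T, d)), params)
print(probs.shape)  # (5, 3)

Because the backward chain is simply a second pass over an already complete finite sequence, no physically impossible dependence on unseen future inputs is required; the non-causal behaviour is realized by running two causal recursions in opposite directions.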




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Baldi, P., Brunak, S., Frasconi, P., Pollastri, G., Soda, G. (2000). Bidirectional Dynamics for Protein Secondary Structure Prediction. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science, vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_5

  • DOI: https://doi.org/10.1007/3-540-44565-X_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41597-8

  • Online ISBN: 978-3-540-44565-4

  • eBook Packages: Springer Book Archive
