Abstract
Stochastic models are commonly used in bioinformatics, e.g., hidden Markov models for modeling sequence families or stochastic context-free grammars for modeling RNA secondary structure formation. Comparing data is a common task in bioinformatics, and it is thus natural to consider how to compare stochastic models. In this paper we present the first study of the problem of comparing a hidden Markov model and a stochastic context-free grammar. We describe how to compute their co-emission (or collision) probability, i.e., the probability that they independently generate the same sequence. We also consider the related problem of finding a run through a hidden Markov model and a derivation in a grammar that generate the same sequence and have maximal joint probability. We address this problem by a generalization of the CYK algorithm for parsing a sequence by a stochastic context-free grammar. We illustrate the methods by an experiment on RNA secondary structures.
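To make the definition of the co-emission probability concrete, the following minimal sketch computes it for two small, explicitly enumerated distributions over sequences. It only illustrates the definition (the sum over all sequences of the product of the two emission probabilities) and is not the algorithm of the paper, which is not reproduced on this page. The function name `co_emission` and both toy distributions are hypothetical stand-ins for a real hidden Markov model and a real stochastic context-free grammar.

```python
from fractions import Fraction


def co_emission(p, q):
    """Probability that two independent sources emit the same sequence.

    p and q map sequences to probabilities; sequences missing from a
    mapping are treated as having probability zero under that source.
    """
    return sum(prob * q[seq] for seq, prob in p.items() if seq in q)


# Hypothetical toy distributions (assumptions for illustration only).
hmm_dist = {"ACGU": Fraction(1, 2), "GGCC": Fraction(1, 4), "AU": Fraction(1, 4)}
scfg_dist = {"ACGU": Fraction(1, 3), "AU": Fraction(1, 3), "GCGC": Fraction(1, 3)}

# Shared sequences are "ACGU" and "AU": 1/2 * 1/3 + 1/4 * 1/3 = 1/4.
print(co_emission(hmm_dist, scfg_dist))  # prints 1/4
```

Real models generally assign probability to infinitely many sequences, so this enumeration does not apply to them directly and the sum has to be computed from the structure of the models instead.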
Supported by grants from Carlsbergfondet and the Program in Mathematics and Molecular Biology
Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT)
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jagota, A., Lyngsø, R.B., Pedersen, C.N.S. (2001). Comparing a Hidden Markov Model and a Stochastic Context-Free Grammar. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_6
DOI: https://doi.org/10.1007/3-540-44696-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive