A Method to Design Standard HMMs with Desired Length Distribution for Biological Sequence Analysis

Zhu, Hongmei; Wang, Jiaxin; Yang, Zehong; Song, Yixu

doi:10.1007/11851561_3

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4175)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Motivation: Hidden Markov Models (HMMs) have been widely used for biological sequence analysis. When modeling a phenomenon in which, for instance, the nucleotide distribution does not change across DNA segments of different lengths, there are two popular approaches to achieving a desired length distribution: explicit modeling and implicit modeling. Implicit modeling requires an elaborately designed model structure, and so far no general procedure is available for designing such a structure automatically from training data.

Results: We present an iterative algorithm for designing the structure of standard HMMs with a desired length distribution from training data. The basic idea behind the algorithm is to use multiple shifted negative binomial distributions to model the empirical length distribution. A negative binomial distribution is obtained from an array of n states, each with the same self-transition probability. We shift this negative binomial distribution by placing a series of linearly connected states before the negative binomial model.
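The state construction described above can be checked with a short sketch. It assumes (the abstract does not give the exact parameterization) that each of the n looping states emits one symbol per visit, stays with probability $p$ and otherwise advances, and that each of the $d$ shift states emits exactly one symbol and always advances. Each looping state then contributes a geometric number of symbols, so a segment has length $l$ with probability $P(l) = \binom{l-d-1}{n-1} p^{\,l-d-n} (1-p)^{n}$ for $l \ge d+n$ (and 0 otherwise), with mean $d + n/(1-p)$ and variance $np/(1-p)^2$. The Python below compares this closed form with direct probability propagation through the states and combines several components as a weighted mixture. The function names, the `(weight, d, n, p)` tuples, and the use of weights to combine components are hypothetical illustrations, not the authors' iterative fitting procedure.

```python
from math import comb

import numpy as np


def shifted_negbin_pmf(length: int, d: int, n: int, p: float) -> float:
    """P(segment length == length) for d linear states followed by n self-looping states.

    Each looping state emits a Geometric(1 - p) number of symbols (at least one),
    so the n-state chain yields a negative binomial; the d shift states add exactly d.
    """
    k = length - d  # symbols emitted inside the n-state chain
    if k < n:
        return 0.0
    return comb(k - 1, n - 1) * p ** (k - n) * (1 - p) ** n


def chain_length_pmf(d: int, n: int, p: float, max_len: int) -> np.ndarray:
    """Length distribution obtained by propagating probability through the states.

    States 0..d-1 are shift states (move on with probability 1); states d..d+n-1
    loop to themselves with probability p and move on with 1 - p. Leaving the last
    state ends the segment. Assumes d + n >= 1.
    """
    m = d + n
    trans = np.zeros((m, m))  # transitions that stay inside the submodel
    exit_prob = np.zeros(m)   # probability of leaving the submodel from each state
    for i in range(m):
        stay = p if i >= d else 0.0
        trans[i, i] = stay
        if i + 1 < m:
            trans[i, i + 1] = 1.0 - stay
        else:
            exit_prob[i] = 1.0 - stay
    pmf = np.zeros(max_len + 1)
    occ = np.zeros(m)
    occ[0] = 1.0  # the first symbol is emitted by state 0
    for length in range(1, max_len + 1):
        pmf[length] = occ @ exit_prob  # segment ends right after symbol `length`
        occ = occ @ trans              # distribution of the state emitting the next symbol
    return pmf


def mixture_pmf(length: int, components: list[tuple[float, int, int, float]]) -> float:
    """Hypothetical mixture of shifted negative binomials; weights assumed to sum to 1."""
    return sum(w * shifted_negbin_pmf(length, d, n, p) for w, d, n, p in components)


if __name__ == "__main__":
    d, n, p = 5, 3, 0.8
    propagated = chain_length_pmf(d, n, p, max_len=200)
    closed = np.array([shifted_negbin_pmf(l, d, n, p) for l in range(201)])
    print(np.allclose(propagated, closed))  # True
    lengths = np.arange(201)
    print(float(lengths @ propagated))  # close to d + n / (1 - p) = 20
```

Propagating mass through the transition matrix mirrors what an HMM with this topology does, so agreement with the closed form is a quick sanity check on a designed submodel. Choosing $d$, $n$, $p$ and the component weights from empirical lengths is the part the paper's iterative algorithm addresses and is not attempted here.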

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Hongmei Zhu, Jiaxin Wang, Zehong Yang & Yixu Song

Editor information

Editors and Affiliations

Ecole Polytechnique Fédérale de Lausanne, Switzerland
Philipp Bücher
Laboratory for Computational Biology and Bioinformatics, EPFL (Ecole Polytechnique Fédérale de Lausanne), Swiss Institute of Bioinformatics, Lausanne, Switzerland
Bernard M. E. Moret

About this paper

Cite this paper

Zhu, H., Wang, J., Yang, Z., Song, Y. (2006). A Method to Design Standard HMMs with Desired Length Distribution for Biological Sequence Analysis. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_3

DOI: https://doi.org/10.1007/11851561_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39583-6
Online ISBN: 978-3-540-39584-3
eBook Packages: Computer Science, Computer Science (R0)
