Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency

Brejová, Broňa; Brown, Daniel G.; Vinař, Tomáš

doi:10.1007/978-3-540-39763-2_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

We study new probabilistic models for signals in DNA. Our models allow dependencies between multiple non-adjacent positions, in a generative model we call a higher-order tree. In our context, computing the maximum-likelihood model is equivalent to computing a minimum directed spanning hypergraph, a problem we show is NP-complete. We instead compute good models using simple greedy heuristics. In practice, the advantage of our models over more standard models based on adjacent positions is modest. However, they notably improve the estimate of the probability that a given position is a signal, which is useful in probabilistic gene finding. We also show that incorporating multiple signals involved in gene structure into a composite signal model in our framework yields little improvement, though again it gives a better estimate of the probability that a site is an acceptor site signal.
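
To make the idea of a greedy heuristic for a bounded-dependency model concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a simple Prim-like strategy: positions are added one at a time, and each new position takes up to `order` parents from the positions already placed, chosen to maximize the empirical information gain H(X_i) - H(X_i | parents). The names `conditional_entropy`, `greedy_higher_order_tree`, and `order` are hypothetical, and the input is assumed to be a list of aligned, equal-length signal sequences.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(seqs, child, parents):
    """Empirical entropy H(X_child | X_parents) in bits, from aligned sequences."""
    n = len(seqs)
    context = Counter(tuple(s[p] for p in parents) for s in seqs)
    joint = Counter((tuple(s[p] for p in parents), s[child]) for s in seqs)
    return -sum(c / n * log2(c / context[ctx]) for (ctx, _), c in joint.items())


def greedy_higher_order_tree(seqs, order=2):
    """Assumed Prim-like greedy; returns {position: tuple of parent positions}."""
    length = len(seqs[0])
    # Marginal entropy of each position (empty parent set).
    marginal = [conditional_entropy(seqs, i, ()) for i in range(length)]
    parents_of = {}
    while len(parents_of) < length:
        placed = sorted(parents_of)
        # Adding parents never increases empirical conditional entropy,
        # so only parent sets of the largest allowed size are tried.
        k = min(order, len(placed))
        candidates = (
            (child, parents)
            for child in range(length) if child not in parents_of
            for parents in combinations(placed, k)
        )
        # Pick the (child, parents) pair with the largest information gain;
        # ties go to the first candidate encountered.
        child, parents = max(
            candidates,
            key=lambda cp: marginal[cp[0]] - conditional_entropy(seqs, cp[0], cp[1]),
        )
        parents_of[child] = parents
    return parents_of


if __name__ == "__main__":
    # Toy data: position 3 copies position 0; position 2 depends jointly on 0 and 1.
    toy = ["AAGA", "ACTA", "CATC", "CCGC"]
    print(greedy_higher_order_tree(toy, order=2))
```

Because parents are always drawn from positions placed earlier, the resulting dependency structure is acyclic by construction. Raw counts are used without pseudocounts, so on real data this sketch would overfit for larger values of `order`.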

Author information

Authors and Affiliations

School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Broňa Brejová, Daniel G. Brown & Tomáš Vinař

Editor information

Editors and Affiliations

Department of Biomathematical Sciences, The Mount Sinai School of Medicine, 10029-6574, New York, NY
Gary Benson
Institute of Biomedical and Life Sciences, Division of Environmental and Evolutionary Biology, University of Glasgow, G12 8QQ, Glasgow, Scotland
Roderic D. M. Page

About this paper

Cite this paper

Brejová, B., Brown, D.G., Vinař, T. (2003). Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_7

DOI: https://doi.org/10.1007/978-3-540-39763-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive
