Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

We study new probabilistic models for signals in DNA. Our models allow dependencies between multiple non-adjacent positions, in a generative model we call a higher-order tree. Computing the model of maximum likelihood is equivalent in our context to computing a minimum directed spanning hypergraph, a problem we show is NP-complete. We instead compute good models using simple greedy heuristics. In practice, the advantage of using our models over more standard models based on adjacent positions is modest. However, there is a notable improvement in the estimation of the probability that a given position is a signal, which is useful in the context of probabilistic gene finding. We also show that there is little improvement by incorporating multiple signals involved in gene structure into a composite signal model in our framework, though again this gives better estimation of the probability that a site is an acceptor site signal.
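
The abstract does not spell out the higher-order tree construction or the greedy heuristics, so the following is only a minimal sketch of a simpler, related idea: a classical first-order dependence tree in the style of Chow and Liu, where each position depends on at most one other position and the tree is chosen greedily as a maximum spanning tree over pairwise mutual information. It is not the authors' method. The function names `mutual_information` and `dependency_tree` and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(seqs, i, j):
    """Empirical mutual information (in bits) between columns i and j."""
    n = len(seqs)
    ci = Counter(s[i] for s in seqs)
    cj = Counter(s[j] for s in seqs)
    cij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in cij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (ci[a] * cj[b]))
    return mi


def dependency_tree(seqs):
    """Greedy (Kruskal-style) maximum spanning tree over positions,
    weighted by mutual information. Returns (i, j, weight) edges."""
    length = len(seqs[0])
    edges = sorted(
        ((mutual_information(seqs, i, j), i, j)
         for i, j in combinations(range(length), 2)),
        reverse=True,
    )
    parent = list(range(length))  # union-find forest over positions

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # accept the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Hypothetical toy alignment: positions 0 and 3 always carry the same base,
# a non-adjacent dependency that an adjacent-position model would miss.
toy = ["ACGA", "CAGC", "GCTG", "TATT", "ACTA", "CGGC", "GATG", "TCGT"]
tree = dependency_tree(toy)
for i, j, w in tree:
    print(f"positions {i} and {j}: {w:.3f} bits")
```

On the toy alignment the first edge selected links the non-adjacent positions 0 and 3. Allowing each position to depend on several others at once, as the abstract describes, is what makes the optimization NP-complete and motivates heuristics.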

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brejová, B., Brown, D.G., Vinař, T. (2003). Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_7

  • DOI: https://doi.org/10.1007/978-3-540-39763-2_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20076-5

  • Online ISBN: 978-3-540-39763-2

  • eBook Packages: Springer Book Archive