Characterising DNA/RNA Signals with Crisp Hypermotifs: A Case Study on Core Promoters

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4447)

Abstract

A common way to characterise important and conserved signals in nucleotide sequences, such as transcription factor binding sites, is to use consensus sequences or consensus patterns. A well-known example is the “TATA-box” commonly found in eukaryotic core promoters. Such patterns are valuable in that they offer insight into basic molecular biology processes and can support reasoning about the understanding, design and control of these processes. However, it is rare for such patterns to be accurate; instead, they represent a very approximate characterisation of the signal under study. At the opposite extreme, we may characterise such a signal via a neural network, a high-order Markov model, or a similar model. These have better sensitivity and specificity, but are unreadable, and consequently unhelpful for conveying an understanding of the underlying molecular biology processes that could support insight or reasoning. We describe a simple pattern language, called crisp hypermotifs (CHMs), that leads to highly readable patterns that can support understanding and reasoning, yet achieve greater sensitivity and specificity than the commonly used approaches to crisply characterising a signal. We use evolutionary computation to discover high-performance CHMs from data. We argue that CHMs should be used in place of classical consensus motifs, and justify this by presenting examples derived from a large dataset of mammalian core promoters. We provide CHM alternatives to the well-known core promoter TATA-box and Initiator patterns that have better sensitivity and specificity than their classical counterparts.
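To make the baseline concrete, the following is a minimal sketch of how a classical consensus pattern can be scored for sensitivity and specificity. It does not implement CHMs, whose syntax is not given in the abstract. The consensus string "TATAWAW" is one commonly quoted simplified form of the TATA-box and is used here as an assumption, the sequences are made-up toy data, and the function names are hypothetical. Sensitivity and specificity follow the standard definitions TP/(TP+FN) and TN/(TN+FP); the paper may define them differently.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def sensitivity_specificity(consensus, positives, negatives):
    """Score a consensus pattern on labelled sequences.

    A sequence counts as a hit if the pattern occurs anywhere in it.
    Returns (sensitivity, specificity) = (TP/(TP+FN), TN/(TN+FP)).
    """
    pattern = consensus_to_regex(consensus)
    tp = sum(1 for s in positives if pattern.search(s.upper()))
    fp = sum(1 for s in negatives if pattern.search(s.upper()))
    fn = len(positives) - tp
    tn = len(negatives) - fp
    sens = tp / (tp + fn) if positives else 0.0
    spec = tn / (tn + fp) if negatives else 0.0
    return sens, spec


# Toy data (invented for illustration, not taken from the paper).
positives = ["GGCTATAAAAGGC", "CCTATATAAGCGC", "GCGCGCATTGCGC"]
negatives = ["GCGCGCGCGCGCG", "ACGTTATAAAACG"]

# "TATAWAW" is an assumed simplified TATA-box consensus (W = A or T).
sens, spec = sensitivity_specificity("TATAWAW", positives, negatives)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data the pattern misses one positive sequence and fires on one negative, which illustrates the kind of approximation the abstract attributes to classical consensus patterns.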

Editor information

Elena Marchiori Jason H. Moore Jagath C. Rajapakse

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Pridgeon, C., Corne, D. (2007). Characterising DNA/RNA Signals with Crisp Hypermotifs: A Case Study on Core Promoters. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2007. Lecture Notes in Computer Science, vol 4447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71783-6_22

  • DOI: https://doi.org/10.1007/978-3-540-71783-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71782-9

  • Online ISBN: 978-3-540-71783-6

  • eBook Packages: Computer Science, Computer Science (R0)