Abstract
We present a new modular concept for constructing organizational models of transcriptional regulatory DNA units. The method requires a training set of at least 10 sequences and a simple initial model (e.g. two characteristic transcription factor binding sites). The final model is generated by computer analysis directly from the sequences. Starting from 20 Lentivirus long terminal repeats (LTRs) and an initial model consisting of only two elements (TATA box and polyA signal), the method produced a final model of 10 elements that recognized all of the more than 100 available Lentivirus LTRs while rejecting all other known LTR types. Database searches with this Lentivirus LTR model demonstrated the very high specificity of our method.
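To make the idea of an initial two-element model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a model is an ordered list of elements, each with a literal consensus string and an allowed distance window to the preceding element. The consensus strings TATAAA and AATAAA are the textbook forms of the TATA box and polyA signal; the distance window, the `Element` layout, and the function name `match_model` are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical element layout: (name, literal consensus, min_gap, max_gap).
# Gaps are counted from the end of the previous element to the start of this one.
Element = Tuple[str, str, int, int]

# Illustrative initial model; the 10-200 bp window is an assumption, not a
# value taken from the paper.
INITIAL_MODEL: List[Element] = [
    ("TATA box", "TATAAA", 0, 0),         # first element: gap values unused
    ("polyA signal", "AATAAA", 10, 200),
]


def match_model(seq: str, model: List[Element]) -> Optional[List[Tuple[str, int]]]:
    """Return (name, start) for each element if seq satisfies the model, else None."""
    seq = seq.upper()

    def extend(idx: int, prev_end: int, found: List[Tuple[str, int]]):
        if idx == len(model):
            return found
        name, consensus, min_gap, max_gap = model[idx]
        # A lookahead finds overlapping occurrences of the literal consensus.
        for m in re.finditer(f"(?={re.escape(consensus)})", seq):
            start = m.start()
            if idx > 0 and not (min_gap <= start - prev_end <= max_gap):
                continue
            result = extend(idx + 1, start + len(consensus), found + [(name, start)])
            if result is not None:
                return result
        return None

    return extend(0, 0, [])


# Synthetic test sequences, not real LTR data.
hit = match_model("GG" + "TATAAA" + "C" * 20 + "AATAAA" + "GG", INITIAL_MODEL)
wrong_order = match_model("AATAAA" + "C" * 20 + "TATAAA", INITIAL_MODEL)
too_close = match_model("TATAAA" + "CC" + "AATAAA", INITIAL_MODEL)
```

A sequence matches only if every element occurs in the stated order and within its distance window, which is why adding further elements to such a model can increase specificity. Exact string matching is used here for brevity; a weight-matrix score could be substituted for each element.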
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Frech, K., Quandt, K., Werner, T. (1997). A new method to develop highly specific models for regulatory DNA regions. In: Hofestädt, R., Lengauer, T., Löffler, M., Schomburg, D. (eds) Bioinformatics. GCB 1996. Lecture Notes in Computer Science, vol 1278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0033206
DOI: https://doi.org/10.1007/BFb0033206
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63370-9
Online ISBN: 978-3-540-69524-0
eBook Packages: Springer Book Archive