Abstract
In this paper, we present an application of the Sequence Alignment Kernel to the recognition of prokaryotic promoters with transcription start sites (TSS). An algorithm for computing this kernel in quadratic time is described. Using this algorithm, a “promoter map” of the E. coli genome has been computed: a curve reflecting the likelihood that each base of a given genomic sequence is a TSS. A viewer that shows this likelihood curve together with the positions of known and putative genes and of known TSS has also been developed and made available online.
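The abstract does not give the kernel's exact definition, so the sketch below illustrates the general idea with a simplified stand-in, not the authors' Sequence Alignment Kernel. It sums, over all global alignments of two sequences, the product of per-column weights, in the spirit of dynamic alignment kernels. This sum can be computed by a forward-style dynamic programme in O(nm) time, i.e. quadratic in sequence length. The scores are then turned into a promoter map by sliding a window across the genome and evaluating an SVM-style decision function at every candidate TSS. The weights `match`, `mismatch` and `gap`, the window offsets `upstream`/`downstream`, and the model parameters `support`, `coefs` and `bias` are hypothetical placeholders. Positive definiteness is not guaranteed for arbitrary weights.

```python
from typing import List, Sequence, Tuple


def alignment_sum_kernel(x: str, y: str, match: float = 2.0,
                         mismatch: float = 0.5, gap: float = 0.3) -> float:
    """Hypothetical stand-in kernel: the sum, over all global alignments of
    x and y, of the product of per-column weights. O(len(x) * len(y))."""
    n, m = len(x), len(y)
    # D[i][j] = summed weight of all alignments of x[:i] with y[:j]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 1.0  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # align x[i-1] with y[j-1]
                w = match if x[i - 1] == y[j - 1] else mismatch
                total += D[i - 1][j - 1] * w
            if i > 0:  # x[i-1] aligned to a gap
                total += D[i - 1][j] * gap
            if j > 0:  # y[j-1] aligned to a gap
                total += D[i][j - 1] * gap
            D[i][j] = total
    return D[n][m]


def decision_value(window: str, support: Sequence[str],
                   coefs: Sequence[float], bias: float) -> float:
    """SVM-style score f(z) = sum_i coef_i * K(x_i, z) + bias, where
    coef_i = alpha_i * y_i would come from a (hypothetical) trained model."""
    return sum(c * alignment_sum_kernel(s, window)
               for s, c in zip(support, coefs)) + bias


def promoter_map(genome: str, support: Sequence[str], coefs: Sequence[float],
                 bias: float, upstream: int, downstream: int
                 ) -> List[Tuple[int, float]]:
    """Score every candidate TSS position p whose window
    genome[p - upstream : p + downstream] lies fully inside the sequence."""
    curve = []
    for p in range(upstream, len(genome) - downstream + 1):
        window = genome[p - upstream:p + downstream]
        curve.append((p, decision_value(window, support, coefs, bias)))
    return curve
```

For example, `promoter_map("ACGTACGT", ["ACGT"], [1.0], 0.0, 2, 2)` scores positions 2 through 6. Because each kernel evaluation is quadratic in the window length L, a genome of length G scored against S support sequences costs O(G·S·L²) in this naive form. This is why computational efficiency matters at genome scale.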
Although visual analysis of promoter regions is very intuitive, we propose an automatic genome-wide promoter prediction scheme that removes the need to inspect the whole promoter map by eye. Computational efficiency and speed issues are also discussed.
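The abstract does not describe how the automatic scheme selects predictions. One common way to turn a per-base likelihood curve into discrete TSS calls is peak picking: keep local maxima above a score threshold, then suppress weaker peaks that fall too close to a stronger one. The sketch below shows that generic technique, not necessarily the authors' scheme. The function name `call_tss` and its parameters `threshold` and `min_separation` are hypothetical.

```python
from typing import List, Sequence


def call_tss(scores: Sequence[float], threshold: float,
             min_separation: int) -> List[int]:
    """Hypothetical peak caller: return indices of local maxima of the score
    curve that reach `threshold`, greedily keeping the strongest peaks and
    dropping any peak closer than `min_separation` to one already kept."""
    n = len(scores)
    peaks = [i for i in range(n)
             if scores[i] >= threshold
             and (i == 0 or scores[i] >= scores[i - 1])
             and (i == n - 1 or scores[i] >= scores[i + 1])]
    # Visit candidates from strongest to weakest (greedy suppression).
    peaks.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in peaks:
        if all(abs(i - k) >= min_separation for k in kept):
            kept.append(i)
    return sorted(kept)
```

Raising `threshold` trades sensitivity for specificity. `min_separation` keeps the caller from reporting several neighbouring bases of the same broad peak as separate TSS.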
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gordon, L., Chervonenkis, A.Y., Gammerman, A.J., Shahmuradov, I.A., Solovyev, V.V. (2003). Genome-Wide Prokaryotic Promoter Recognition Based on Sequence Alignment Kernel. In: Berthold, M.R., Lenz, HJ., Bradley, E., Kruse, R., Borgelt, C. (eds) Advances in Intelligent Data Analysis V. IDA 2003. Lecture Notes in Computer Science, vol 2810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45231-7_36
DOI: https://doi.org/10.1007/978-3-540-45231-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40813-0
Online ISBN: 978-3-540-45231-7
eBook Packages: Springer Book Archive