Abstract
We present a novel approach to encoding inputs to neural networks for the recognition of transcription start sites in RNA polymerase II promoter regions. The approach is based on Markov models that represent the TATA-box and Inr binding sites that characterize a promoter. The parameters of these Markov models are used as inputs to three neural networks, which learn potential distant relationships between nucleotides in promoter regions. This makes it possible to incorporate the biological context of the promoter sites into the neural network system and to implement higher-order Markov models of promoters. In experiments on a human promoter data set, available at [19], the approach achieved an average correlation coefficient of 0.69, which is better than the best previously reported value of 0.65, obtained by the NNPP 2.1 method.
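The encoding idea can be illustrated with a minimal sketch. It assumes a position-specific first-order Markov model estimated from aligned, equal-length binding-site sequences, and it encodes each nucleotide as its conditional probability given the preceding nucleotide. The function names `fit_position_markov` and `encode`, the pseudocount, and the toy training sequences are hypothetical and are not taken from the paper, which may use higher-order models and a different estimation scheme.

```python
ALPHABET = "ACGT"


def fit_position_markov(sequences, pseudocount=1.0):
    """Estimate position-specific first-order Markov parameters.

    Assumes all sequences are aligned and have the same length.
    Returns (initial, transitions), where initial[c] approximates
    P(s_0 = c) and transitions[i][p][c] approximates
    P(s_i = c | s_{i-1} = p) for i >= 1.
    """
    length = len(sequences[0])
    first = {c: pseudocount for c in ALPHABET}
    counts = [
        {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
        for _ in range(length)
    ]
    for seq in sequences:
        first[seq[0]] += 1
        for i in range(1, length):
            counts[i][seq[i - 1]][seq[i]] += 1

    first_total = sum(first.values())
    initial = {c: first[c] / first_total for c in ALPHABET}

    transitions = [None]  # position 0 has no predecessor
    for i in range(1, length):
        table = {}
        for p in ALPHABET:
            total = sum(counts[i][p].values())
            table[p] = {c: counts[i][p][c] / total for c in ALPHABET}
        transitions.append(table)
    return initial, transitions


def encode(seq, model):
    """Map a sequence to a vector of Markov probabilities, one per position.

    The resulting vector is what a neural network would receive as input
    in this sketch, in place of a plain one-hot nucleotide encoding.
    """
    initial, transitions = model
    features = [initial[seq[0]]]
    for i in range(1, len(seq)):
        features.append(transitions[i][seq[i - 1]][seq[i]])
    return features


# Toy example with made-up TATA-like sequences (illustrative only).
training = ["TATAAA", "TATATA", "TATAAA"]
model = fit_position_markov(training)
features = encode("TATAAA", model)
```

In this sketch each input value already carries information about the neighbouring nucleotide, so the network sees contextual probabilities rather than isolated symbols; extending the conditioning to more than one preceding nucleotide would give a higher-order variant.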
References
Bajic VB, Seah SH, Chong A, Krishnan SPT, Koh JLY, Brusic V (2003) Computer model for recognition of functional transcription start sites in RNA polymerase II promoters of vertebrates. J Mol Graph Modelling 21:323–332
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
Burset M, Guigo R (1996) Evaluation of gene structure prediction programs. Genomics 34:353–367
Corne D, Meade A, Sibly R (2001) Evolving core promoter signal motifs. In: Proceedings of the 2001 congress on evolutionary computation. IEEE Press, pp 1162–1169
Fickett JW, Hatzigeorgiou AG (1997) Eukaryotic promoter recognition. Genome Res 7:861–878
Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn.
Ho SL, Rajapakse JC (2003) Splice site detection with a higher-order Markov model implemented on a neural network. Genome Inf 14:64–72
Howard D, Benson K (2003) Evolutionary computation method for promoter site prediction in DNA. Genetic and evolutionary computation conference, Chicago 1690–1701
Nguyen D, Widrow B (1990) Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. International joint conference on neural networks, San Diego 3:21–26
Ohler U, Harbeck S, Niemann H, Noth E, Rubin GM (2001) Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17:199–206
Ohler U, Niemann H, Liao G, Reese MG (1999) Interpolated Markov chains for eukaryotic promoter recognition. Bioinformatics 15:362–369
Pedersen AG, Baldi P, Chauvin Y, Brunak S (1999) The biology of eukaryotic promoter prediction: a review. Comput Chem 23:191–207
Plagianakos VP, Magoulas GD, Vrahatis MN (2000) Learning rate adaptation in stochastic gradient descent. In: Advances in convex analysis and global optimization, chap 2, pp 15–26
Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numerica 143–195
Reese MG (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem 26:51–56
Salzberg SL, Delcher AL, Fasman K, Henderson J (1998) A decision tree system for finding genes in DNA. J Comput Biol 5:667–680
Scherf M, Klingenhoff A, Werner T (2000) Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel analysis approach. J Mol Biol 297:599–606
Zhang MQ (2002) Computational methods for promoter prediction. In: Current topics in computational molecular biology, chap 10, pp 249–267
Promoter dataset: http://www.fruitfly.org/seq_tools/datasets/Human/promoter/
Genie dataset: http://www.fruitfly.org/seq_tools/datasets/Human/
Cite this article
Ho, L., Rajapakse, J. Input encoding method for identifying transcription start sites in RNA polymerase II promoters by neural networks. Soft Comput 10, 331–337 (2006). https://doi.org/10.1007/s00500-005-0491-y
DOI: https://doi.org/10.1007/s00500-005-0491-y