Eukaryotic and prokaryotic promoter prediction using hybrid approach

  • Original Paper
  • Theory in Biosciences

Abstract

Promoters are modular DNA structures containing the complex regulatory elements required for the initiation of gene transcription. Identifying promoters with machine learning approaches is therefore important for improving genome annotation and understanding transcriptional regulation. In recent years, many methods have been proposed for predicting eukaryotic and prokaryotic promoters, but their performance is still far from satisfactory. In this article, we develop a hybrid approach (called IPMD) that combines a position correlation score function and the increment of diversity with a modified Mahalanobis discriminant to predict eukaryotic and prokaryotic promoters. Applying the proposed method to Drosophila melanogaster, Homo sapiens, Caenorhabditis elegans, Escherichia coli, and Bacillus subtilis promoter sequences, we achieve sensitivities and specificities of 90.6 and 97.4% for D. melanogaster, 88.1 and 94.1% for H. sapiens, 83.3 and 95.2% for C. elegans, 84.9 and 91.4% for E. coli, and 80.4 and 91.3% for B. subtilis. These high accuracies indicate that IPMD is an efficient method for identifying eukaryotic and prokaryotic promoters. The approach can also be extended to predict promoters in other species.
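
The increment of diversity named in the abstract can be illustrated with a short sketch. It assumes the standard definitions of the diversity measure, D(X) = N log N − Σ n_i log n_i, and of the increment, ID(X, Y) = D(X + Y) − D(X) − D(Y), applied here to k-mer count vectors. The function names, the choice of dinucleotides (k = 2), and the toy sequences are illustrative assumptions, not the features or parameters used in IPMD.

```python
import math
from itertools import product


def diversity(counts):
    """Diversity measure D = N*log(N) - sum(n_i*log(n_i)); zero counts contribute 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def kmer_counts(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-mers of seq in a fixed order (hypothetical helper).

    Windows containing characters outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    reference = kmer_counts("TATAAATATATAAATTATAA")
    query = kmer_counts("GCGCGGCGCCGCGGGCGCGC")
    print(increment_of_diversity(reference, reference))  # same composition
    print(increment_of_diversity(reference, query))      # different composition
```

Under these definitions, two count vectors with identical proportions give an increment of zero, and the value grows as the compositions diverge, so a small increment against a class's reference counts suggests the query resembles that class. How IPMD combines such scores with the position correlation score and the modified Mahalanobis discriminant is described in the full article, not in this sketch.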

Acknowledgments

The authors are grateful to the anonymous reviewers for their valuable suggestions and comments, which have led to the improvement of this article. This study was supported in part by the Fundamental Research Funds for the Central Universities (ZYGX2009J081) and the National Natural Science Foundation of China (61063016).

Author information

Corresponding author

Correspondence to Hao Lin.

About this article

Cite this article

Lin, H., Li, QZ. Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci. 130, 91–100 (2011). https://doi.org/10.1007/s12064-010-0114-8

  • DOI: https://doi.org/10.1007/s12064-010-0114-8
