
Computational localization of transcription factor binding sites using extreme learning machines

Soft Computing

Abstract

Computational localization of transcription factor binding sites (TFBSs, also termed motif instances) in DNA sequences helps biologists save experimental cost and time in motif discovery. The task can be formulated as a feature-based object location identification problem, which differs markedly from traditional pattern recognition tasks. This paper aims to develop a machine learning approach for TFBS location prediction through feature-based classifiers. Specific features are extracted to characterize the TFBSs and distinguish them from random k-mers. A sampling technique is then employed to generate dummy positives in the feature space to improve prediction performance. Three learner models are examined, and a simple ensemble method is adopted in our classifier design. Experimental results on eight benchmark datasets demonstrate that the proposed techniques have good potential for conserved motif detection. Comparative studies indicate that the extreme learning machine-based ensemble classifier outperforms the other learner models in terms of overall prediction accuracy and computational complexity.
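The abstract does not say which sampling technique generates the dummy positives. As a minimal sketch, assuming a SMOTE-style interpolation between known positives in feature space, the step could look like the following. The function name `smote_like`, the parameter `k`, and the two-dimensional feature vectors are hypothetical and used only for illustration; they are not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote_like(positives, n_new, k=3, seed=0):
    """Create n_new synthetic positives (assumed SMOTE-style scheme).

    Each synthetic sample lies on the line segment between a randomly chosen
    real positive and one of its k nearest positive neighbours.
    """
    if len(positives) < 2:
        raise ValueError("need at least two positive samples")
    rng = random.Random(seed)
    k = min(k, len(positives) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(positives))
        base = positives[i]
        # Rank the remaining positives by distance to the base sample.
        others = [p for j, p in enumerate(positives) if j != i]
        others.sort(key=lambda p: sq_dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for four known binding sites.
positives = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3], [0.7, 0.15]]
dummy = smote_like(positives, n_new=6)
print(len(dummy))  # 6
```

Because every synthetic point is a convex combination of two real positives, the dummy positives stay inside the region already occupied by the known binding sites, which enlarges the minority class without inventing feature values outside the observed range.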




Corresponding author

Correspondence to Dianhui Wang.


About this article

Cite this article

Wang, D., Do, H.T. Computational localization of transcription factor binding sites using extreme learning machines. Soft Comput 16, 1595–1606 (2012). https://doi.org/10.1007/s00500-012-0820-x


  • DOI: https://doi.org/10.1007/s00500-012-0820-x
