Computational localization of transcription factor binding sites using extreme learning machines

Wang, Dianhui; Do, Hai Thanh

doi:10.1007/s00500-012-0820-x


Focus
Published: 10 February 2012

Soft Computing, Volume 16, pages 1595–1606 (2012)

Dianhui Wang¹ & Hai Thanh Do¹


Abstract

Computational localization of transcription factor binding sites (TFBSs, also termed motif instances) in DNA sequences greatly helps biologists save experimental cost and time in motif discovery. The task can be formulated as a feature-based object location identification problem, which differs markedly from traditional pattern recognition tasks. This paper aims to develop a machine learning approach for predicting TFBS locations with feature-based classifiers. Specific features are extracted to characterize TFBSs and distinguish them from random k-mers. A sampling technique is then employed to generate dummy positives in the feature space to achieve better prediction performance. Three learner models are examined, and a simple ensemble method is adopted in our classifier design. Experimental results on eight benchmark datasets demonstrate that our proposed techniques have good potential for conserved motif detection. Comparative studies indicate that the extreme learning machine-based ensemble classifier outperforms the other learner models in terms of overall prediction accuracy and computational complexity.
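The pipeline outlined in the abstract (oversample the scarce positive class in feature space, train several learners, combine them with a simple ensemble) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its sampling technique, so SMOTE-style interpolation is swapped in as one well-known option. The sigmoid hidden layer, the small ridge term (used here for numerical stability instead of an explicit Moore–Penrose pseudoinverse), the ensemble size, and the names `smote_like_oversample`, `ELM` and `ensemble_predict` are all hypothetical choices. The toy two-dimensional vectors stand in for the TFBS-specific features, which are not described here.

```python
import math
import random


def smote_like_oversample(positives, n_new, k=3, rng=None):
    """Hypothetical helper: make n_new synthetic positives, SMOTE-style (an assumption).

    Each synthetic vector lies on the segment between a randomly chosen positive
    and one of its k nearest positive neighbours (squared Euclidean distance).
    Requires at least two positives.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(positives)
        others = sorted((p for p in positives if p is not base),
                        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


class ELM:
    """Single-hidden-layer network: random input weights, least-squares output weights."""

    def __init__(self, n_hidden=20, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activation of each random hidden unit.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
                for ws, b in zip(self.W, self.b)]

    def fit(self, X, y):
        d, L = len(X[0]), self.n_hidden
        # Input weights and biases are drawn once at random and never trained.
        self.W = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Output weights: beta = (H^T H + ridge * I)^{-1} H^T y, one linear solve.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = _solve(A, rhs)
        return self

    def decision(self, x):
        return sum(h * w for h, w in zip(self._hidden(x), self.beta))


def ensemble_predict(models, x):
    """Simple ensemble (an assumption): average real-valued outputs, threshold at 0."""
    score = sum(m.decision(x) for m in models) / len(models)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    rng = random.Random(42)
    neg = [[rng.gauss(-1, 0.3), rng.gauss(-1, 0.3)] for _ in range(30)]
    pos = [[rng.gauss(1, 0.3), rng.gauss(1, 0.3)] for _ in range(5)]
    pos_all = pos + smote_like_oversample(pos, n_new=25, rng=rng)  # balance the classes
    X, y = neg + pos_all, [-1] * len(neg) + [1] * len(pos_all)
    models = [ELM(n_hidden=20, seed=s).fit(X, y) for s in range(5)]
    print(ensemble_predict(models, [1.0, 1.0]), ensemble_predict(models, [-1.0, -1.0]))
```

Because each ELM in this sketch is trained with a single linear solve rather than iterative backpropagation, averaging several of them (differing only in their random hidden weights) stays cheap, which is one plausible reading of the abstract's remark about computational complexity.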



Author information

Authors and Affiliations

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, VIC, 3086, Australia
Dianhui Wang & Hai Thanh Do


Corresponding author

Correspondence to Dianhui Wang.


About this article

Cite this article

Wang, D., Do, H.T. Computational localization of transcription factor binding sites using extreme learning machines. Soft Comput 16, 1595–1606 (2012). https://doi.org/10.1007/s00500-012-0820-x


Issue Date: September 2012
DOI: https://doi.org/10.1007/s00500-012-0820-x
