Abstract
Enhancers are short DNA segments that bind proteins (transcription factors); this binding strengthens the transcription of a gene, so enhancers play an essential role in gene expression. Recently, machine learning-based methods have become widely used for identifying enhancers and their strength. In this study, we propose iEnhancer-BERT, a novel transfer learning method based on a DNA language model pre-trained on the whole human genome. More specifically, iEnhancer-BERT consists of a BERT layer for feature extraction and a CNN layer for classification. We initialize the parameters of the BERT layer from the pre-trained DNA language model and fine-tune them on the enhancer identification tasks. Unlike common fine-tuning strategies, we extract the outputs of all Transformer encoder layers to form the feature vector. Experiments show that our method achieves state-of-the-art results on both the enhancer identification task and the strong enhancer identification task. The code and data are publicly available at https://github.com/lhy0322/iEnhancer-BERT.
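The idea of feeding a CNN with the outputs of every encoder layer, rather than only the last one, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name AllLayerEncoderCNN, all sizes, the small randomly initialized encoder (standing in for the pre-trained DNA language model), the use of the first-token state from each layer, and the 1D convolution over the layer axis are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class AllLayerEncoderCNN(nn.Module):
    """Hypothetical toy model: a Transformer encoder whose per-layer outputs
    are all collected and classified by a small CNN.
    A real setup would load pre-trained weights instead of random ones."""

    def __init__(self, vocab_size=16, hidden=32, num_layers=4, num_heads=4,
                 max_len=64, num_classes=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=hidden, nhead=num_heads,
                                       dim_feedforward=4 * hidden,
                                       batch_first=True)
            for _ in range(num_layers)
        ])
        # Assumption: convolve along the layer axis, hidden units as channels.
        self.conv = nn.Conv1d(hidden, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16, num_classes)

    def encode(self, token_ids):
        """Return the first-token state of every layer: (batch, layers, hidden)."""
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.tok_emb(token_ids) + self.pos_emb(positions)
        per_layer = []
        for layer in self.layers:
            h = layer(h)
            per_layer.append(h[:, 0, :])  # keep this layer's output, not just the last
        return torch.stack(per_layer, dim=1)

    def forward(self, token_ids):
        feats = self.encode(token_ids)        # (batch, layers, hidden)
        x = feats.transpose(1, 2)             # (batch, hidden, layers)
        x = torch.relu(self.conv(x))          # (batch, 16, layers)
        x = x.max(dim=2).values               # max-pool over the layer axis
        return self.fc(x)                     # (batch, num_classes)


torch.manual_seed(0)
model = AllLayerEncoderCNN().eval()
token_ids = torch.randint(0, 16, (3, 20))     # 3 toy sequences of 20 token ids
with torch.no_grad():
    layer_feats = model.encode(token_ids)
    logits = model(token_ids)
print(layer_feats.shape, logits.shape)
```

Here the token ids are random placeholders; in practice they would come from tokenizing DNA sequences with the vocabulary of the pre-trained model, and the encoder weights would be loaded from that model before fine-tuning.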
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 62002154), the Hunan Provincial Natural Science Foundation of China (Grant Nos. 2019JJ50520 and 2021JJ40467), the Research Foundation of Hunan Educational Committee (Grant No. 20C1579), and the Scientific Research Startup Foundation of University of South China (Grant No. 190XQD096).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Luo, H., Chen, C., Shan, W., Ding, P., Luo, L. (2022). iEnhancer-BERT: A Novel Transfer Learning Architecture Based on DNA-Language Model for Identifying Enhancers and Their Strength. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13394. Springer, Cham. https://doi.org/10.1007/978-3-031-13829-4_13
DOI: https://doi.org/10.1007/978-3-031-13829-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13828-7
Online ISBN: 978-3-031-13829-4
eBook Packages: Computer Science (R0)