
iEnhancer-BERT: A Novel Transfer Learning Architecture Based on DNA-Language Model for Identifying Enhancers and Their Strength

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13394)

Abstract

Enhancers are short segments of DNA that bind proteins (transcription factors) and thereby strengthen the transcription of their target genes, playing an essential role in gene expression. Recently, machine learning-based methods have become a trend in identifying enhancers and their strength. In this study, we propose iEnhancer-BERT, a novel transfer learning method based on a DNA language model pre-trained on the whole human genome. More specifically, iEnhancer-BERT consists of a BERT layer for feature extraction and a CNN layer for classification. We initialize the parameters of the BERT layer from the pre-trained DNA language model and fine-tune them on the enhancer identification tasks. Unlike common fine-tuning strategies, we extract the output of all Transformer encoder layers to form the feature vector. Experiments show that our method achieves state-of-the-art results on both the enhancer identification task and the strong enhancer identification task. The code and data are publicly available at https://github.com/lhy0322/iEnhancer-BERT.
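
To make the architecture described above concrete, the sketch below shows one way to wire a DNABERT-style encoder to a small CNN head, keeping the [CLS] output of every Transformer encoder layer rather than only the last one. It is a minimal illustration, not the authors' released implementation (see the GitHub link above): the checkpoint name `zhihan1996/DNA_bert_6`, the 6-mer tokenization, and the CNN hyper-parameters are assumptions chosen for the example.

```python
# Minimal sketch of the iEnhancer-BERT idea from the abstract: a pre-trained
# DNA language model (DNABERT-style BERT) as feature extractor, pooling the
# [CLS] vector from EVERY encoder layer, followed by a small 1D CNN classifier.
# Checkpoint name, 6-mer tokenization, and CNN hyper-parameters are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "zhihan1996/DNA_bert_6"  # assumed DNABERT checkpoint on the HF hub

def seq_to_kmers(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mers (DNABERT-style input)."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

class EnhancerClassifier(nn.Module):
    def __init__(self, model_name: str = MODEL_NAME, n_classes: int = 2):
        super().__init__()
        # Return hidden states of all encoder layers, not only the last one.
        self.bert = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        hidden = self.bert.config.hidden_size
        self.conv = nn.Conv1d(hidden, 128, kernel_size=3, padding=1)
        self.head = nn.Sequential(
            nn.ReLU(), nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(128, n_classes)
        )

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # out.hidden_states = (embeddings, layer_1, ..., layer_N); keep the
        # [CLS] token of each encoder layer -> tensor [batch, hidden, n_layers].
        cls_stack = torch.stack([h[:, 0, :] for h in out.hidden_states[1:]], dim=2)
        return self.head(self.conv(cls_stack))  # logits: enhancer vs. non-enhancer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = EnhancerClassifier()
batch = tokenizer([seq_to_kmers("ACGT" * 50)], return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 2])
```

The same two-class head can be retrained for the second-stage task (strong vs. weak enhancers). Pooling all encoder layers lets the CNN combine the lower-level sequence features captured by early layers with the more abstract features of later layers, which is the motivation for departing from the usual last-layer-only fine-tuning.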

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 62002154), the Hunan Provincial Natural Science Foundation of China (Grant Nos. 2019JJ50520 and 2021JJ40467), the Research Foundation of Hunan Educational Committee (Grant No. 20C1579), and the Scientific Research Startup Foundation of University of South China (Grant No. 190XQD096).

Author information

Corresponding author

Correspondence to Lingyun Luo.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, H., Chen, C., Shan, W., Ding, P., Luo, L. (2022). iEnhancer-BERT: A Novel Transfer Learning Architecture Based on DNA-Language Model for Identifying Enhancers and Their Strength. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13394. Springer, Cham. https://doi.org/10.1007/978-3-031-13829-4_13

  • DOI: https://doi.org/10.1007/978-3-031-13829-4_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13828-7

  • Online ISBN: 978-3-031-13829-4

  • eBook Packages: Computer Science, Computer Science (R0)
