Inferring transcription factor regulatory networks from single-cell ATAC-seq data based on graph neural networks

Abstract

Sequence-specific transcription factors (TFs) are the key effectors of eukaryotic gene control, and they regulate hundreds to thousands of downstream genes. Of particular interest are interactions in which a given TF regulates other TFs; these interactions define the TF regulatory networks (TRNs) that underlie cellular identity and major functions. Chromatin accessibility describes whether a DNA sequence is physically accessible and provides a direct measurement of transcriptional regulation. Building on accumulating chromatin accessibility data and advances in deep learning, we developed a computational method named DeepTFni to infer TRNs from single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data. By implementing a graph neural network, which is well suited to network representation, DeepTFni shows outstanding performance in TRN inference and supports inference from limited numbers of cells. Furthermore, by applying DeepTFni we identified hub TFs in tissue development and tumorigenesis and found that many genes associated with mixed-phenotype acute leukemia undergo a prominent alteration in their TRN while showing only a moderate difference in messenger RNA level. The DeepTFni webserver is easy to use and provides predicted TRNs for several popular cell lines.
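To make the idea of graph-based TRN inference concrete, the following is a minimal sketch of link prediction with a graph auto-encoder in the general style of ref. 31: one graph convolution followed by an inner-product decoder. It is not the DeepTFni implementation. The function names, the toy undirected adjacency matrix, the two accessibility-like features per TF and the fixed, untrained weights are all assumptions made for illustration.

```python
import math

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(a_norm, features, weights):
    """One graph convolution with ReLU: max(0, A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

def decode_links(z):
    """Inner-product decoder: sigmoid(z_i . z_j) scores every TF pair."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    return [[sigmoid(sum(p * q for p, q in zip(zi, zj))) for zj in z] for zi in z]

# Toy input (hypothetical): 3 TFs, one known TF0-TF1 interaction,
# 2 made-up accessibility features per TF, and fixed untrained weights.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
features = [[1.0, 0.2], [0.8, 0.1], [0.1, 0.9]]
weights = [[0.5, -0.3], [0.2, 0.7]]

embeddings = gcn_layer(normalize_adjacency(adj), features, weights)
scores = decode_links(embeddings)  # scores[i][j]: predicted link strength
```

In a trained model the weights would be learned by reconstructing the known adjacency matrix, and candidate TF interactions would be called by thresholding `scores`. The sketch only shows the forward pass.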

Fig. 1: Overview of the DeepTFni workflow.
Fig. 2: Performance comparison of DeepTFni and other methods.
Fig. 3: DeepTFni can capture TRNs with a small number of cells.
Fig. 4: Graph-based model and chromatin accessibility contribute to TRN inference.
Fig. 5: Three modes of hub TFs in MPAL development.
Fig. 6: DeepTFni web tool for customized TRN inference.

Data availability

The data analysed in this study originated from public data repositories. The human PBMC dataset includes scATAC-seq data for 10,000 cells and scRNA-seq data for 8,000 cells, downloaded from the 10x Genomics website (https://support.10xgenomics.com/single-cell-atac/datasets/1.2.0/atac_v1_pbmc_10k, https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k). The MPAL dataset was obtained from the Gene Expression Omnibus (GEO) database (GSE139369). Source Data are provided with this paper.

Code availability

The code to implement DeepTFni (ref. 78) is available on GitHub (https://github.com/sunyolo/DeepTFni; https://doi.org/10.5281/zenodo.6050543).

References

  1. Faith, J. J. et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5, e8 (2007).

  2. Hamey, F. K. et al. Reconstructing blood stem cell regulatory network models from single-cell molecular profiles. Proc. Natl Acad. Sci. USA 114, 5822–5829 (2017).

  3. Goldman, J. A. & Poss, K. D. Gene regulatory programmes of tissue regeneration. Nat. Rev. Genet. 21, 511–525 (2020).

  4. Soutourina, J. Transcription regulation by the Mediator complex. Nat. Rev. Mol. Cell Biol. 19, 262 (2017).

  5. Arendt, D. et al. The origin and evolution of cell types. Nat. Rev. Genet. 17, 744–757 (2016).

  6. Peter, I. S. & Davidson, E. H. Genomic Control Process: Development and Evolution 41–77 (Elsevier, 2015).

  7. Chang, H. H., Hemberg, M., Barahona, M., Ingber, D. E. & Huang, S. Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453, 544–547 (2008).

  8. Landan, G. et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 44, 1207–1214 (2012).

  9. Specht, A. T. & Li, J. LEAP: constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering. Bioinformatics 33, 764–766 (2017).

  10. Nan, P. G., Minhaz, U. D. S. M., Olivier, G. & Rudiyanto, G. SINCERITIES: inferring gene regulatory networks from time-stamped single cell transcriptional expression profiles. Bioinformatics 34, 258–266 (2017).

  11. Chan, T. E., Stumpf, M. P. H. & Babtie, A. C. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 5, 251–267.e253 (2017).

  12. Matsumoto, H. & Kiryu, H. SCOUP: a probabilistic model based on the Ornstein–Uhlenbeck process to analyze single-cell expression data during differentiation. BMC Bioinformatics 17, 232 (2016).

  13. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

  14. Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, e12776 (2010).

  15. Sanchez-Castillo, M., Blanco, D., Tienda-Luna, I. M., Carrion, M. C. & Huang, Y. A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics 34, 964–970 (2018).

  16. Fiers, M. W. E. J. et al. Mapping gene regulatory networks from single-cell omics data. Brief. Funct. Genomics 17, 246–254 (2018).

  17. Crow, M., Paul, A., Ballouz, S., Huang, Z. J. & Gillis, J. Exploiting single-cell expression to characterize co-expression replicability. Genome Biol. 17, 101 (2016).

  18. Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).

  19. Minnoye, L. et al. Chromatin accessibility profiling methods. Nat. Rev. Methods Primers 1, 10 (2021).

  20. Hu, X., Hu, Y., Wu, F., Leung, R. W. T. & Qin, J. Integration of single-cell multi-omics for gene regulatory network inference. Comput. Struct. Biotechnol. J. 18, 1925–1938 (2020).

  21. Cusanovich, D. A. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018).

  22. Pijuan-Sala, B., Wilson, N. K., Xia, J., Hou, X. & Göttgens, B. Single-cell chromatin accessibility maps reveal regulatory programs driving early mouse organogenesis. Nat. Cell Biol. 22, 487–497 (2020).

  23. Fullard, J. F. et al. An atlas of chromatin accessibility in the adult human brain. Genome Res. 28, 1243–1252 (2018).

  24. Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).

  25. Ruslan et al. Single-nucleus chromatin accessibility reveals intratumoral epigenetic heterogeneity in IDH1 mutant gliomas. Acta Neuropathol. Commun. 7, 201–201 (2019).

  26. Ackermann, A. M., Wang, Z., Schug, J., Naji, A. & Kaestner, K. H. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes. Mol. Metabol. 5, 233–244 (2016).

  27. Qin, J., Hu, Y., Xu, F., Yalamanchili, H. K. & Wang, J. Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods. Methods 67, 294–303 (2014).

  28. Wang, P. et al. ChIP-Array 2: integrating multiple omics data to construct gene regulatory networks. Nucleic Acids Res. 43, W264–W269 (2015).

  29. Jansen, C., Ramirez, R. N., El-Ali, N. C., Gomez-Cabrero, D. & Mortazavi, A. Building gene regulatory networks from scATAC-seq and scRNA-seq using linked self organizing maps. PLoS Comput. Biol. 15, e1006555 (2019).

  30. Kamimoto, K., Hoffmann, C. M. & Morris, S. A. CellOracle: dissecting cell identity via network inference and in silico gene perturbation. Preprint at bioRxiv https://doi.org/10.1101/2020.02.17.947416 (2020).

  31. Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at https://arxiv.org/abs/1611.07308 (2016).

  32. Zhang, M. H. & Chen, Y. X. Link prediction based on graph neural networks. In Proc. 32nd International Conference on Advances in Neural Information Processing Systems 5165–5175 (NIPS, 2018).

  33. Yang, F., Fan, K., Song, D. & Lin, H. Graph-based prediction of protein–protein interactions with attributed signed graph embedding. BMC Bioinformatics 21, 323 (2020).

  34. Karimi, M., Hasanzadeh, A. & Shen, Y. Network-principled deep generative models for designing drug combinations as graph sets. Bioinformatics 36, i445–i454 (2020).

  35. Schlichtkrull, M. et al. Modeling relational data with graph convolutional networks. In The Semantic Web. ESWC 2018. Lecture Notes in Computer Science Vol. 10843 593–607 (Springer, 2018).

  36. Wang, J., Ma, A., Ma, Q., Xu, D. & Joshi, T. Inductive inference of gene regulatory network using supervised and semi-supervised graph neural networks. Comput. Struct. Biotechnol. J. 18, 3335–3343 (2020).

  37. Qin, Q., Fan, J., Zheng, R., Wan, C. & Liu, X. S. Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biol. 21, 32 (2020).

  38. Wang, C., Sun, D., Huang, X., Wan, C. & Liu, X. S. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol. 21, 198 (2020).

  39. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150, 1274–1286 (2012).

  40. Rendeiro, A. F. et al. Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks. Nat. Commun. 7, 11938 (2016).

  41. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, aba7612 (2020).

  42. Wang, Z., Zhang, J., Feng, J. & Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proc. 28th AAAI Conference on Artificial Intelligence Vol. 28 (AAAI, 2014).

  43. Hu, K., Liu, H. & Hao, T. Natural Language Processing and Chinese Computing 171–183 (Springer, 2019).

  44. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 701–710 (Association for Computing Machinery, 2014).

  45. Thomas, et al. GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics 35, 2159–2161 (2018).

  46. Ding, J., Smith, S. L., Orozco, G., Barton, A. & Martin, P. Characterisation of CD4+ T-cell subtypes using single cell RNA sequencing and the impact of cell number and sequencing depth. Sci. Rep. 10, 19825 (2020).

  47. Chen, H. et al. Effects of sample size on plant single-cell RNA profiling. Curr. Iss. Mol. Biol. 43, 1685–1697 (2021).

  48. Schmid, K. T. et al. scPower accelerates and optimizes the design of multi-sample single cell transcriptomic studies. Nat. Commun. 12, 6625 (2021).

  49. Prakash et al. Nonclassical monocytes in health and disease. Ann. Rev. Immunol. 37, 439–456 (2019).

  50. Jenner, R. G. et al. The transcription factors T-bet and GATA-3 control alternative pathways of T-cell differentiation through a shared set of target genes. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.0909357106 (2009).

  51. Lei, C. & Ruan, J. A novel link prediction algorithm for reconstructing protein–protein interaction networks by topological similarity. Bioinformatics 29, 355–364 (2012).

  52. Martinez, V., Berzal, F. & Cubero, J. C. A survey of link prediction in complex networks. ACM Comput. Surv. 49, 69:1–69:33 (2017).

  53. van Laarhoven, T., Nabuurs, S. B. & Marchiori, E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 27, 3036–3043 (2011).

  54. Wu, Y., Fletcher, M., Gu, Z., Wang, Q. & Radlwimmer, B. Glioblastoma epigenome profiling identifies SOX10 as a master regulator of molecular tumour subtype. Nat. Commun. 11, 6434 (2020).

  55. Shi, X. et al. EWS-FLI1 regulates and cooperates with core regulatory circuitry in Ewing sarcoma. Nucleic Acids Res. 48, 11434–11451 (2020).

  56. Chen, L. et al. Master transcription factors form interconnected circuitry and orchestrate transcriptional networks in oesophageal adenocarcinoma. Gut 69, 630–640 (2020).

  57. Stengel, K. R., Ellis, J. D., Spielman, C. L., Bomber, M. L. & Hiebert, S. W. Definition of a small core transcriptional circuit regulated by AML1-ETO. Mol. Cell 81, 530–545.e5 (2021).

  58. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

  59. Park, C. S. et al. A KLF4-DYRK2-mediated pathway regulating self-renewal in CML stem cells. Blood 134, 1960–1972 (2019).

  60. Meritxell et al. C/EBPγ deregulation results in differentiation arrest in acute myeloid leukemia. J. Clin. Invest. 122, 4490–4504 (2012).

  61. Duy, C., Teater, M., Garrett-Bakelman, F. E., Lee, T. C. & Melnick, A. M. Rational targeting of cooperating layers of the epigenome yields enhanced therapeutic efficacy against AML. Cancer Discov. 9, 872–889 (2019).

  62. Tosello, V., Bongiovanni, D., Liu, J., Pan, Q. & Piovan, E. Cross-talk between GLI transcription factors and FOXC1 promotes T-cell acute lymphoblastic leukemia dissemination. Leukemia 35, 984–1000 (2020).

  63. Li, F. et al. Prostaglandin E1 and its analog misoprostol inhibit human CML stem cell self-renewal via EP4 receptor activation and repression of AP-1. Cell Stem Cell 21, 359–373.e355 (2017).

  64. Somerville, T. D. D. et al. Derepression of the iroquois homeodomain transcription factor gene IRX3 confers differentiation block in acute leukemia. Cell Rep. 22, 638–652 (2018).

  65. Leon, T. E. et al. EZH2-deficient T-cell acute lymphoblastic leukemia is sensitized to CHK1 inhibition through enhanced replication stress. Cancer Discov. 10, 998–1017 (2020).

  66. Nagel, S. et al. Activation of paired-homeobox gene PITX1 by del(5)(q31) in T-cell acute lymphoblastic leukemia. Leuk. Lymphoma 52, 1348–1359 (2011).

  67. Durinck, K., Loocke, W. V., Meulen, J. V. D., Walle, I. V. D. & Vlierberghe, P. V. Characterization of the genome-wide TLX1 binding profile in T-cell acute lymphoblastic leukemia. Leukemia 29, 2317–2327 (2015).

  68. Alexander, T. B., Gu, Z., Iacobucci, I., Dickerson, K. & Mullighan, C. G. The genetic basis and cell of origin of mixed phenotype acute leukaemia. Nature 562, 373–379 (2018).

  69. Zhao, C. et al. Graph embedding ensemble methods based on the heterogeneous network for lncRNA–miRNA interaction prediction. BMC Genomics 21, 867 (2020).

  70. Zhao, X., Zhao, X. & Yin, M. Heterogeneous graph attention network based on meta-paths for lncRNA-disease association prediction. Brief. Bioinformatics 23, bbab407 (2021).

  71. Rao, A. et al. Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks. BMC Med. Genet. 11, 57 (2018).

  72. Stuart, T., Butler, A., Hoffman, P., Hafemeister, C. & Satija, R. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e1821 (2019).

  73. Lü, L. & Zhou, T. Link prediction in complex networks: a survey. Physica A 390, 1150–1170 (2011).

  74. Bonneau, R. et al. Leveraging chromatin accessibility for transcriptional regulatory network inference in T helper 17 cells. Genome Res. 29, 449–463 (2019).

  75. Bengio, Y. & Glorot, X. Understanding the difficulty of training deep feed forward neural networks. In Proc. 13th International Conference on Artificial Intelligence and Statistics 249–256 (PMLR, 2010).

  76. Jolliffe, I. T. Principal component analysis. J. Marketing Res. 87, 513 (2002).

  77. Laurens, V. D. M. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  78. Li, H., Sun, Y. & Hong, H. sunyolo/DeepTFni: (v1.0.0). Zenodo https://doi.org/10.5281/zenodo.6050543 (2022).

Acknowledgements

This work was supported by the Beijing Natural Science Foundation (http://kw.beijing.gov.cn/; grant no. 5204040 to H.L.), the National Natural Science Foundation of China (http://www.nsfc.gov.cn; grant nos 31900488, 31801112 and 61873276 to H.L., H.C. and X.B., respectively), and the Beijing Nova Program of Science and Technology (https://mis.kw.beijing.gov.cn; grant no. Z191100001119064 to H.C.).

Author information

Contributions

X.B. and H.C. conceived this study. H.L. and Y.S. designed the model. Y.S. and H.H. implemented the algorithm. H.L. and Y.S. analysed the data. H.T., X.H., Q.H., L.W., J.G. and K.X. assisted with the implementation of the study and data analysis. H.L. and Y.S. wrote the paper.

Corresponding authors

Correspondence to Hebing Chen or Xiaochen Bo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Markus List and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of DeepTFni on human PBMC scATAC-seq dataset.

(a) DeepTFni achieves an accuracy above 0.84 across different cell types in the human PBMC dataset. The numbers of TF interactions in the initial adjacency matrix and in the prediction matrix are listed. (b) DeepTFni performs well on a large scATAC-seq dataset containing ~800,000 cells from 15 human fetal organs. Blue bars indicate the cell number of each organ, green bars the number of ATAC peaks filtered in DeepTFni, and yellow bars the running time of DeepTFni for each organ. (c) Jaccard index of interactions before and after masked-positive disturbance. The red line indicates the Jaccard index of DeepTFni prediction results on the test set; the blue line indicates the Jaccard index of the disturbed training and validation sets. (d) Accuracy of DeepTFni prediction on disturbed datasets with different proportions of masked positives. The dashed line represents the accuracy without disturbance. (e) Number of false negatives in disturbed datasets with different proportions of masked positives. (f) Recovered ratios of masked positives in DeepTFni prediction results. Black dots represent five repetitions of the dataset disturbance.
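The masked-positive disturbance and the Jaccard index in panels (c) to (f) can be illustrated with a short sketch. It assumes that interactions are stored as pairs of TF names and that a disturbance hides a fixed proportion of known interactions; the function names and the toy edge list are hypothetical, and the exact procedure used in the study may differ.

```python
import random

def jaccard_index(edges_a, edges_b):
    """|A ∩ B| / |A ∪ B| for two sets of interactions."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mask_positives(edges, proportion, seed=0):
    """Hide a proportion of known interactions; return (kept, masked)."""
    rng = random.Random(seed)
    ordered = sorted(set(edges))          # sort for a reproducible sample
    n_mask = int(len(ordered) * proportion)
    masked = set(rng.sample(ordered, n_mask))
    return set(ordered) - masked, masked

def recovered_ratio(masked, predicted):
    """Fraction of masked interactions that reappear in the prediction."""
    masked = set(masked)
    return len(masked & set(predicted)) / len(masked) if masked else 0.0

# Toy interactions between made-up TF labels (not from the paper).
known = [("TF_A", "TF_B"), ("TF_A", "TF_C"), ("TF_B", "TF_C"), ("TF_C", "TF_D")]
kept, masked = mask_positives(known, proportion=0.5)

# Jaccard index between the original and the disturbed interaction sets.
disturbed_similarity = jaccard_index(known, kept)
```

With four known interactions and a masking proportion of 0.5, two interactions are hidden, so the disturbed set shares half of the original interactions. A prediction that returns all four original interactions would give a recovered ratio of 1.0.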

Extended Data Fig. 2 DeepTFni outperforms other methods.

(a) DeepTFni outperforms the other four methods. (b) Receiver operating characteristic curves of DeepTFni and the other four methods. (c) DeepTFni shows better precision and recall on the total dataset. For each method, the number of predicted TF interactions (links) and the network density are listed. (d) Benchmark of running time and memory usage. (e) t-SNE visualization of TF degrees calculated by DeepTFni, DeepWalk, GENIE3, SCENIC and GRNBoost2. Colors represent different cell types and shades represent cell number from small (light) to large (dark). Arrows connect the center nodes for each cell number. (f) Inter cell-type distance of t-SNE results for each cell number. R² was calculated as the Pearson correlation coefficient. (g) Inter cell-type distance of t-SNE results for each cell number. R² was calculated as the Pearson correlation coefficient.
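For readers unfamiliar with the quantities in panel (c), the sketch below shows how precision, recall and network density are commonly computed for a predicted set of links. The function names and toy inputs are hypothetical, and whether links are counted as directed or undirected in the study is an assumption left as a parameter.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted links against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

def network_density(n_nodes, n_links, directed=False):
    """Observed links divided by the number of possible links (no self-loops)."""
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2
    return n_links / possible if possible else 0.0

# Toy example with integer node ids standing in for TFs.
reference_links = [(1, 2), (2, 3), (3, 4), (4, 5)]
predicted_links = [(1, 2), (2, 3)]

precision, recall = precision_recall(predicted_links, reference_links)
density = network_density(n_nodes=4, n_links=3)
```

Here both predicted links are correct but only half of the reference links are found, so precision is 1.0 and recall is 0.5. Three undirected links among four nodes give a density of 0.5.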

Extended Data Fig. 3 Comparison of cell-type specific TRNs.

(a) Each bar plot shows the distribution of TFs with different degree levels; the background color indicates TFs of high degree (yellow) or low degree (blue). (b) 32 TFs with cell-type-specific regulatory networks. For each TF, its degree in the four cell types is listed. (c) Visualization of GATA3 regulatory networks predicted by DeepTFni. DeepTFni shows distinct cell-type specificity in CD14+ monocytes. (d) UMAP visualization of human PBMC scRNA-seq clusters. Left: colors represent the different cell types. Right: colors represent cells in which GATA3 is expressed (purple) or silent (grey). The number indicates the proportion of cells expressing GATA3 in each cell type. (e) Visualization of GATA3 regulatory networks predicted by SCENIC.

Extended Data Fig. 4 The Jaccard index of core TF interactions in PBMCs and MPALs.

The Jaccard index of core TF interactions in PBMCs and MPALs. Orange indicates core TFs whose TRN changes dramatically; navy blue indicates core TFs whose TRN changes moderately.

Source data

Source Data Fig. 2

Statistical data.

Source Data Fig. 3

Statistical data.

Cite this article

Li, H., Sun, Y., Hong, H. et al. Inferring transcription factor regulatory networks from single-cell ATAC-seq data based on graph neural networks. Nat Mach Intell 4, 389–400 (2022). https://doi.org/10.1038/s42256-022-00469-5
