
Gene Set Priorization Guided by Regulatory Networks with p-values through Kernel Mixed Model

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2022)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 13278))

Abstract

Transcriptome association studies have helped prioritize many causal genes for detailed study and have thereby aided the development of therapeutic strategies for multiple diseases. However, prioritizing causal genes alone does not always offer sufficient guidance for downstream analysis. Thus, in this paper, we propose to perform association studies from another perspective: we aim to prioritize genes with a tradeoff between the pursuit of causal evidence and the interest of the genes in the pathway. We introduce a new method for transcriptome association studies that incorporates information from gene regulatory networks. In addition to building the regularization directly into variable selection methods, we also expect the method to report p-values for the associated genes, since p-values have been empirically proven trustworthy by geneticists. Thus, we introduce a high-dimensional variable selection method with the following two merits: it has flexible modeling power that allows domain experts to consider the structure of the covariates, so that prior knowledge, such as the gene regulatory network, can be integrated; and it calculates p-values in a practical manner widely accepted by geneticists, so that the identified covariates can be directly assessed with statistical guarantees. With simulations, we demonstrate the empirical strength of our method against other high-dimensional variable selection methods. We further apply our method to Alzheimer’s disease, and our method identifies interesting sets of genes.



Author information


Correspondence to Wei Wu or Eric P. Xing.

Appendices

A Additional Simulation Experiments

Different Strengths of the Regulation. We further study how the strength of regulation affects the performance of our method. We model this change in strength by varying the parameter r in the data generation process, while the rest of the configuration remains the same as before. We continue to focus on the intermediate setting of the previous example, where we set \(v=16\).

Fig. 2. ROC curves of competing methods over different regulation strengths of the TF.

Similarly, we repeat the experiments three times and plot the ROC curves in Fig. 2, with the standard deviation shown as shaded areas.

As Fig. 2 shows, our method is on par with previous hypothesis testing methods over most correlation levels. When \(r=1\), the regulated genes are distributed in the same way as the TF, although they are associated with smaller effect sizes. Both LMM and KMM are good enough to uncover the associated genes in this case. When r is smaller (0.5 or 0.3), the regulated genes are less dependent on the TF, and the hypothesis testing methods all perform similarly, probably because the network structure offers no advantage when the regulated genes are more independent of the TF. However, when \(r=0.7\), the KMM method shows a clear advantage over the other methods. In summary, our proposed method can outperform other methods when there is a strong correlation between the TF and the regulated genes (but not so strong that the regulated genes and the TF are identically distributed). We believe this is the most frequently seen scenario in real-world data. In the other scenarios, our method does not perform worse than other methods, so there is no loss in using our method in general. In fact, if one calculates the area under the ROC curve for Fig. 2, our method performs best in all four tested scenarios, although its advantages in the other three scenarios are marginal.
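As a rough illustration of how such a comparison can be summarized, the sketch below computes an ROC curve and its area for each repeat, then averages the curves over a shared false-positive-rate grid to obtain a mean curve and a standard-deviation band. It is a minimal sketch on synthetic scores, not the evaluation code used in the paper; the function names (`roc_points`, `auc`, `mean_roc`) and the simulated scores are assumptions made for illustration. For a method that reports p-values, a score such as the negative log p-value could play the role of `scores`.

```python
import numpy as np

def roc_points(scores, labels):
    """FPR and TPR as the decision threshold sweeps from high to low scores.
    Assumes no tied scores and at least one positive and one negative label."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels, dtype=bool)[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def mean_roc(runs, grid=None):
    """Interpolate each run's ROC onto a shared FPR grid; return grid, mean TPR, std TPR."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    tprs = np.stack([np.interp(grid, *roc_points(s, y)) for s, y in runs])
    return grid, tprs.mean(axis=0), tprs.std(axis=0)

# Three simulated repeats (purely synthetic scores, not the paper's data).
rng = np.random.default_rng(0)
runs = []
for _ in range(3):
    labels = rng.random(500) < 0.1                # roughly 10% truly associated genes
    scores = rng.normal(size=500) + 1.5 * labels  # higher score = stronger evidence
    runs.append((scores, labels))

grid, mean_tpr, std_tpr = mean_roc(runs)
aucs = [auc(*roc_points(s, y)) for s, y in runs]
print("AUC per repeat:", np.round(aucs, 3))
```

Plotting `mean_tpr` against `grid` with a band of plus or minus `std_tpr` gives a figure of the same kind as the shaded ROC curves described above, and comparing the per-repeat AUC values is one way to detect small differences that are hard to see in the curves themselves.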

Misspecified Network Structure. Finally, as our method is built upon knowledge of the network structure, we are interested in what happens if the network structure is misspecified, since in practice we may not always be able to obtain a network structure faithful to the underlying regulatory mechanism. To simulate this, we introduce another hyperparameter q in the data generation process. When we generate the network structure N, we drop each edge in the network structure with probability \(1-q\). The rest of the data generation configuration is the same as the general one introduced in the preceding text.
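The edge-dropping step can be sketched as follows. This is a minimal illustration that assumes the network N is stored as a symmetric 0/1 adjacency matrix with a zero diagonal; the function name `drop_edges` and the toy network are hypothetical and not taken from the paper.

```python
import numpy as np

def drop_edges(adjacency, q, rng):
    """Keep each undirected edge independently with probability q,
    i.e. drop it with probability 1 - q.
    Assumes a symmetric 0/1 adjacency matrix with a zero diagonal."""
    upper = np.triu(adjacency, k=1)          # each undirected edge appears once
    keep = rng.random(upper.shape) < q       # independent keep/drop decision per entry
    kept = upper * keep
    return kept + kept.T                     # restore symmetry

# Toy network: one TF (node 0) regulating five genes (hypothetical example).
rng = np.random.default_rng(0)
N = np.zeros((6, 6), dtype=int)
N[0, 1:] = 1
N[1:, 0] = 1

for q in (1.0, 0.5, 0.3):
    N_q = drop_edges(N, q, rng)
    print(f"q={q}: {N_q.sum() // 2} of {N.sum() // 2} edges kept")
```

Drawing the decision on the upper triangle only ensures that an undirected edge is either kept in both directions or dropped in both, so the misspecified network remains a valid undirected graph.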

Fig. 3. ROC curves of competing methods when the prior network is misspecified (the edges of the network are dropped with probability \(1-q\)).

Again, we repeat the experiments three times and plot the ROC curves in Fig. 3, with the standard deviation shown as shaded areas.

As Fig. 3 shows, our method is surprisingly robust to misspecification of the prior network structure. When \(q=1\), the input network is faithful to the underlying regulatory network, and the KMM method certainly outperforms the competing methods. Interestingly, the advantage of the KMM method persists even when half of the edges of the input network are missing (\(q=0.5\)). When \(q=0.3\), which means that 70% of the edges of the underlying regulatory network are missing from the input network, the proposed method starts to perform similarly to the previous hypothesis testing methods. Even in this case, the area under the ROC curve of KMM is higher than that of the competing methods, although this advantage cannot be observed in the ROC curves.

B Covariate Regression

To demonstrate the successful correction of these factors, we compared the Spearman’s correlation between the gene expressions and the covariates before and after the correction. Figure 4 shows this comparison across the three different compartments studied in this work; the correlation between each gene and the age covariates drops significantly after the regression.

Fig. 4. The comparison of the Spearman’s correlation between the gene expressions and the covariates before and after the regression.
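The before-and-after check described in this appendix can be illustrated with a small sketch. It assumes the covariates are removed by ordinary least squares with an intercept, which may differ from the procedure actually used in the paper, and it runs on synthetic data; the helper names `regress_out` and `spearman` are hypothetical.

```python
import numpy as np

def regress_out(expression, covariates):
    """Residuals of each gene (column of `expression`) after an ordinary
    least-squares fit on the covariates plus an intercept."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return expression - design @ coef

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Synthetic example: 200 samples, 3 genes whose expression drifts with age.
rng = np.random.default_rng(0)
age = rng.uniform(60, 90, size=200)
expression = 0.2 * age[:, None] + rng.normal(size=(200, 3))

corrected = regress_out(expression, age)
before = [spearman(expression[:, g], age) for g in range(3)]
after = [spearman(corrected[:, g], age) for g in range(3)]
print("before:", np.round(before, 2))
print("after: ", np.round(after, 2))
```

In this toy setting the rank correlation with age is large before the regression and close to zero afterwards, which is the pattern the comparison in Fig. 4 is meant to show.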

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, H., Lopez, O.L., Wu, W., Xing, E.P. (2022). Gene Set Priorization Guided by Regulatory Networks with p-values through Kernel Mixed Model. In: Pe'er, I. (eds) Research in Computational Molecular Biology. RECOMB 2022. Lecture Notes in Computer Science, vol 13278. Springer, Cham. https://doi.org/10.1007/978-3-031-04749-7_7

  • DOI: https://doi.org/10.1007/978-3-031-04749-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04748-0

  • Online ISBN: 978-3-031-04749-7

  • eBook Packages: Computer Science, Computer Science (R0)
