An Efficient Nonlinear Regression Approach for Genome-Wide Detection of Marginal and Interacting Genetic Variations

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9029)

Abstract

Genome-wide association studies have revealed individual genetic variants associated with phenotypic traits such as disease risk and gene expression. However, detecting pairwise interaction effects of genetic variants on traits remains a challenge because of the large number of variant combinations (\(\sim 10^{11}\) SNP pairs in the human genome) and relatively small sample sizes (typically \(< 10^{4}\)). Despite recent breakthroughs in detecting interaction effects, several problems remain open, including: (1) how to quickly process a large number of SNP pairs, (2) how to distinguish true signals from SNPs and SNP pairs that are merely correlated with true signals, (3) how to detect non-linear associations between SNP pairs and traits given small sample sizes, and (4) how to control false positives. In this paper, we present a unified framework, called SPHINX, that addresses these challenges. We first propose a piecewise linear model for interaction detection, because it is simple enough for its parameters to be estimated from small samples yet complex enough to capture non-linear interaction effects. Then, based on the piecewise linear model, we introduce randomized group lasso under stability selection and a screening algorithm to address the statistical and computational challenges mentioned above. In our experiments, we first demonstrate that SPHINX achieves better power than existing methods for interaction detection under false positive control. We then apply SPHINX to a late-onset Alzheimer’s disease dataset and report 16 SNPs and 17 SNP pairs associated with gene traits. We also present a highly scalable implementation of our screening algorithm, which can screen \(\sim 118\) billion candidate associations on a 60-node cluster in \(< 5.5\) hours. SPHINX is available at http://www.cs.cmu.edu/~seunghak/SPHINX/.
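To put the scale figures in the abstract into perspective, the following back-of-the-envelope sketch relates the number of SNPs to the number of unordered SNP pairs, and converts the reported screening run into a minimum throughput. It is illustrative arithmetic only and is not part of SPHINX; the function names are hypothetical, and the even split of work across nodes is an assumption the abstract does not state.

```python
import math


def num_pairs(num_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return math.comb(num_snps, 2)


def snps_for_pairs(target_pairs: float) -> int:
    """Smallest number of SNPs that yields at least target_pairs pairs."""
    # Start near sqrt(2 * target) and step up to the first n that suffices.
    n = math.isqrt(int(2 * target_pairs))
    while num_pairs(n) < target_pairs:
        n += 1
    return n


def min_throughput(candidates: float, hours: float, nodes: int):
    """Lower bound on candidates screened per second, overall and per node.

    Assumes (hypothetically) that the work is spread evenly across nodes.
    The result is a lower bound because the abstract reports "< 5.5 hours".
    """
    per_second = candidates / (hours * 3600)
    return per_second, per_second / nodes


if __name__ == "__main__":
    n = snps_for_pairs(1e11)
    print(f"SNPs needed for ~1e11 pairs: {n}")
    total, per_node = min_throughput(118e9, 5.5, 60)
    print(f"At least {total:.3g} candidates/s overall, {per_node:.3g} per node")
```

Under these assumptions, about 4.5 × 10^5 SNPs are enough to produce 10^11 pairs, and screening 118 billion candidates in under 5.5 hours corresponds to roughly 6 million candidates per second across the cluster.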

Author information

Corresponding author

Correspondence to Eric P. Xing.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lee, S., Lozano, A., Kambadur, P., Xing, E.P. (2015). An Efficient Nonlinear Regression Approach for Genome-Wide Detection of Marginal and Interacting Genetic Variations. In: Przytycka, T. (ed.) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science, vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_17

  • DOI: https://doi.org/10.1007/978-3-319-16706-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16705-3

  • Online ISBN: 978-3-319-16706-0

  • eBook Packages: Computer Science (R0)
