Finding susceptible and protective interaction patterns in large-scale genetic association study

  • Research Article
Frontiers of Computer Science

Abstract

Interaction detection in large-scale genetic association studies has attracted intensive research interest, since many diseases have complex traits. Various approaches have been developed for finding significant genetic interactions. In this article, we propose a novel framework, SRMiner, to detect interacting susceptible and protective genotype patterns. SRMiner can discover not only probable combinations of single nucleotide polymorphisms (SNPs) that cause diseases but also the corresponding SNPs that suppress their pathogenic functions, which provides a better perspective for uncovering the underlying relevance between genetic variants and complex diseases. We have performed extensive experiments on several real Wellcome Trust Case Control Consortium (WTCCC) datasets. We use pathway-based and protein-protein interaction (PPI) network-based evaluation methods to verify the discovered patterns. The results show that SRMiner successfully identifies many disease-related genes verified by existing work. Furthermore, SRMiner can also infer some unconfirmed but highly probable disease-related genes.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272182, 61100028, and 61173029), New Century Excellent Talents (NCET-11-0085), Key Program of National Natural Science of China (61332014, U1401256), the Fundamental Research Funds for the Central Universities (130504001, 150402002), National Science Foundation (USA) (IIS-1218036 and IIS-1162374), and CAREER National Institutes of Health (USA) (R01 HG003053 and R01 GM115833). This work was completed while Yuan Li was visiting Case Western Reserve University, supervised by Dr. Xiang Zhang and supported by the China Scholarship Council (CSC).

Author information

Corresponding author

Correspondence to Yuhai Zhao.

Additional information

Yuan Li received his BE and ME degrees in computer science from Northeastern University (NEU), China in 2009 and 2011, respectively. Currently he is a PhD candidate in computer science, NEU. His major research interests include data mining and bioinformatics.

Yuhai Zhao received his BE, ME and PhD degrees in computer science from Northeastern University (NEU), China in 1999, 2004 and 2007, respectively. Currently he is an associate professor in the School of Information Science and Engineering, NEU. He is a member of IEEE, ACM, and CCF. His major research interests include data mining and bioinformatics.

Guoren Wang received his BS, MS and PhD degrees in computer science from Northeastern University (NEU), China in 1988, 1991 and 1996, respectively. Currently he is a professor in the School of Information Science and Engineering, NEU. His major research interests are XML data management, query processing and optimization, high-dimensional indexing, parallel database systems, P2P data management and uncertain data management.

Xiaofeng Zhu is currently a professor in the Department of Epidemiology and Biostatistics at Case Western Reserve University (CWRU), USA. He received his PhD degree in Biostatistics, Epidemiology and Biostatistics from CWRU in 1998. His research interests include genetic mapping studies of hypertension and obesity; development of statistical methods for association studies that avoid the effect of population stratification; and admixture mapping.

Xiang Zhang is currently the T&D Schroeder assistant professor in the Department of Electrical Engineering and Computer Science at Case Western Reserve University, USA. He received his PhD degree in computer science from the University of North Carolina at Chapel Hill, USA in 2011. His research bridges the areas of data mining, databases and bioinformatics.

Zhanghui Wang received his BE degree in computer science from Shenyang Institute of Aeronautical Engineering, China in 2007, and his ME degree in computer science from Northeastern University (NEU), China in 2010. Currently he is a PhD candidate in computer science, NEU. His major research interests include data mining and bioinformatics.

Jun Pang received his MS degree in computer science and technology from Wuhan University of Technology, China in 2010. Currently, he is a PhD candidate in computer software and theory at Northeastern University, China. His research interests include cloud computing and similarity join.

Cite this article

Li, Y., Zhao, Y., Wang, G. et al. Finding susceptible and protective interaction patterns in large-scale genetic association study. Front. Comput. Sci. 11, 541–554 (2017). https://doi.org/10.1007/s11704-016-5300-5

  • DOI: https://doi.org/10.1007/s11704-016-5300-5
