Abstract
Machine learning methods, such as Random Forest (RF), have been used to predict disease risk and to select sets of single nucleotide polymorphisms (SNPs) associated with disease in Genome-Wide Association Studies (GWAS). In this study, we extracted information from biological networks to select candidate SNPs, which RF then used for prediction and for ranking SNPs by importance measures. Starting from an initial set of genes already related to a disease, we used the tool GeneMANIA to construct gene interaction networks and find novel genes that might be associated with Alzheimer’s Disease (AD). This makes it possible to extract a small number of SNPs, which in turn makes the application of RF feasible. The experiments conducted in this study focus on investigating which SNPs may influence susceptibility to AD.
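The candidate-selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: GeneMANIA's own network scoring is more involved, and here it is replaced by a plain direct-neighbour expansion with a weight threshold. The gene names, SNP identifiers, edge weights, the `min_weight` parameter, and both function names are hypothetical and exist only for illustration.

```python
def expand_genes(seed_genes, edges, min_weight=0.5):
    """Return the seed genes plus their direct neighbours.

    `edges` is an iterable of (gene_a, gene_b, weight) tuples; an edge
    contributes a neighbour only if its weight is at least `min_weight`.
    This is a simplified stand-in for a network tool, not GeneMANIA itself.
    """
    seeds = set(seed_genes)
    selected = set(seeds)
    for gene_a, gene_b, weight in edges:
        if weight < min_weight:
            continue  # ignore weak interactions
        if gene_a in seeds:
            selected.add(gene_b)
        if gene_b in seeds:
            selected.add(gene_a)
    return selected


def candidate_snps(genes, snp_to_gene):
    """Keep only the SNPs that map to one of the selected genes."""
    return sorted(snp for snp, gene in snp_to_gene.items() if gene in genes)


# Toy, entirely hypothetical data.
seeds = {"GENE_A", "GENE_B"}
edges = [
    ("GENE_A", "GENE_C", 0.9),  # strong link to a seed: GENE_C is added
    ("GENE_B", "GENE_D", 0.2),  # below threshold: ignored
    ("GENE_C", "GENE_E", 0.8),  # neither endpoint is a seed: ignored
]
snp_to_gene = {
    "snp_1": "GENE_A",
    "snp_2": "GENE_C",
    "snp_3": "GENE_D",
    "snp_4": "GENE_E",
    "snp_5": "GENE_B",
}

genes = expand_genes(seeds, edges)
snps = candidate_snps(genes, snp_to_gene)
print(sorted(genes))
print(snps)
```

The reduced SNP list would then serve as the feature set for a Random Forest classifier, whose importance measures give the ranking.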
© 2013 Springer International Publishing Switzerland
Cite this paper
Araújo, G.S., Souza, M.R.B., Oliveira, J.R.M., Costa, I.G. (2013). Random Forest and Gene Networks for Association of SNPs to Alzheimer’s Disease. In: Setubal, J.C., Almeida, N.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2013. Lecture Notes in Computer Science(), vol 8213. Springer, Cham. https://doi.org/10.1007/978-3-319-02624-4_10
DOI: https://doi.org/10.1007/978-3-319-02624-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02623-7
Online ISBN: 978-3-319-02624-4
eBook Packages: Computer Science (R0)