ABSTRACT
Many methods are used for genotype-phenotype classification, including polygenic risk scores (PRS) and deep learning, each relying on a different computational technique. The performance of each method varies with the genetic variation in the data and is measured by accuracy or area under the curve (AUC). This article investigates, through extensive computation, how the performance of deep learning classifiers and polygenic risk scores for genotype-phenotype classification relates to variation in heritability, genetic variation, and the number of risk SNPs (400 different datasets of 5000 people). These variations help to identify an optimal classifier for a dataset with a specific heritability and an expected score for a specific case/control classification.
The AUC of the deep learning classifier decreases as heritability increases, whereas the AUC of polygenic risk scores improves. The machine-learning algorithm has a low AUC for high genetic variation and a high AUC for low genetic variation. PRS tools show the opposite behavior: their AUC is higher on datasets with high genetic variation than on datasets with low genetic variation.
The article gives a basic template showing whether deep learning or PRS tools should be used, depending on the heritability and genetic variation of the dataset. All code segments are publicly available for generating datasets with different parameters and exploring such patterns.
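As a rough illustration of the kind of experiment the abstract describes, the following Python sketch simulates a small case/control dataset with a chosen heritability and scores it with a simple weighted allele count. It is not the authors' published code: the liability-threshold model, the function names (`simulate_dataset`, `polygenic_score`, `auc`), the minor allele frequency of 0.3, and the use of the true simulated effect sizes as PRS weights are all assumptions made for illustration. Real PRS tools estimate weights from association statistics instead.

```python
import math
import random


def simulate_dataset(n_people, n_snps, n_risk, h2, maf=0.3, seed=0):
    """Simulate genotypes and case/control labels (assumed liability-threshold model).

    n_risk must be at least 1 and h2 must lie in [0, 1].
    """
    rng = random.Random(seed)
    # Genotypes are minor-allele counts (0, 1 or 2) drawn independently per SNP.
    genotypes = [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
        for _ in range(n_people)
    ]
    # Only the first n_risk SNPs carry a non-zero effect.
    effects = [rng.gauss(0.0, 1.0) if j < n_risk else 0.0 for j in range(n_snps)]
    genetic = [sum(g * b for g, b in zip(row, effects)) for row in genotypes]
    mean_g = sum(genetic) / n_people
    sd_g = math.sqrt(sum((x - mean_g) ** 2 for x in genetic) / n_people)
    # Liability = genetic part (variance h2) + environmental noise (variance 1 - h2).
    liability = [
        math.sqrt(h2) * (x - mean_g) / sd_g + math.sqrt(1.0 - h2) * rng.gauss(0.0, 1.0)
        for x in genetic
    ]
    # People whose liability exceeds the median-position value are cases,
    # which gives a roughly balanced case/control split.
    threshold = sorted(liability)[n_people // 2]
    labels = [1 if value > threshold else 0 for value in liability]
    return genotypes, effects, labels


def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts for each person."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum formula, with average ranks for ties."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Sweep heritability while holding the other (hypothetical) parameters fixed.
    for h2 in (0.2, 0.5, 0.8):
        genotypes, effects, labels = simulate_dataset(1000, 50, 10, h2, seed=1)
        print(h2, round(auc(polygenic_score(genotypes, effects), labels), 3))
```

Because the score here uses the true effect sizes, it only shows how the attainable AUC of an additive score depends on heritability in this toy model. It says nothing about how a deep learning classifier would behave on the same data.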
- Heritability, genetic variation, and the number of risk SNPs effect on deep learning and polygenic risk scores AUC