Abstract
A key question in human genetics is the proportion of SNPs that modulate a particular phenotype, or the proportion of susceptibility SNPs for a disease, a quantity termed polygenicity. Previous studies have observed that complex traits tend to be highly polygenic, contradicting the earlier belief that only a handful of SNPs contribute to a trait. Beyond these genome-wide estimates, how polygenicity is distributed across genomic regions, and which genomic factors affect regional polygenicity, remain poorly understood. One reason for this gap is that methods for estimating polygenicity use SNP effect sizes from GWAS, and estimating regional polygenicity from GWAS effect sizes requires untangling the correlation between SNPs due to linkage disequilibrium (LD), which leads to computations that become intractable even for a small number of SNPs. In this work, we propose BEAVR, a scalable method to estimate the regional polygenicity of a trait given marginal effect sizes from GWAS and LD information. We implement a Gibbs sampler to estimate the posterior distribution of regional polygenicity and derive a fast algorithmic update that circumvents the computational bottlenecks associated with LD. The runtime of our algorithm is 𝒪(MK) for M SNPs and K susceptibility SNPs, where typically K ≪ M. Because BEAVR models the full LD structure, we show that it provides unbiased estimates of polygenicity, in contrast to previous methods that only partially model LD. Finally, we show how estimates of regional polygenicity for BMI, eczema, and high cholesterol provide insight into the regional genetic architecture of each trait.
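The abstract names only the ingredients (a Gibbs sampler driven by marginal GWAS effects and LD, with an update costing 𝒪(MK)), so the following is a minimal Python sketch under assumptions that belong to this sketch, not to the paper: a summary-statistic likelihood β̂ | β ~ N(Vβ, V/N) with an LD matrix V of unit diagonal, a spike-and-slab prior on standardized effects, and a Beta prior on the causal proportion p. The function and parameter names (gibbs_regional_polygenicity, sigma2, a0, b0) are hypothetical. Under this model the conditional for each effect depends only on β̂ and Vβ, so V is never inverted. The sketch shows one plausible way an 𝒪(MK) cost per sweep can arise: Vβ is cached and updated only when a SNP is, or becomes, causal.

```python
import math
import random


def stable_sigmoid(x):
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_regional_polygenicity(beta_hat, V, n, sigma2=0.01, a0=1.0, b0=1.0,
                                n_iter=2000, burn_in=500, seed=0):
    """Hypothetical sketch: posterior samples of the causal proportion p.

    Assumed model (illustrative, not necessarily BEAVR's):
      beta_hat | beta ~ N(V beta, V / n), V = LD matrix with unit diagonal
      beta_m = gamma_m * effect_m, effect_m ~ N(0, sigma2)
      gamma_m ~ Bernoulli(p), p ~ Beta(a0, b0)
    """
    rng = random.Random(seed)
    M = len(beta_hat)
    beta = [0.0] * M
    Vb = [0.0] * M                  # cached V @ beta, kept in sync with beta
    s2 = 1.0 / (n + 1.0 / sigma2)   # conditional posterior variance of a causal effect
    p = 0.5
    samples = []
    for it in range(n_iter):
        log_prior_odds = math.log(p / (1.0 - p))
        for m in range(M):
            # Residual for SNP m with its own contribution removed (uses V[m][m] == 1).
            r = beta_hat[m] - Vb[m] + beta[m]
            mu = s2 * n * r
            # Log Bayes factor of "causal" vs "null" given all other effects.
            log_bf = 0.5 * math.log(s2 / sigma2) + mu * mu / (2.0 * s2)
            if rng.random() < stable_sigmoid(log_prior_odds + log_bf):
                new = rng.gauss(mu, math.sqrt(s2))
            else:
                new = 0.0
            delta = new - beta[m]
            if delta != 0.0:
                # O(M) work, paid only when SNP m is or becomes causal.
                for j in range(M):
                    Vb[j] += delta * V[j][m]
                beta[m] = new
        k = sum(1 for x in beta if x != 0.0)
        p = rng.betavariate(a0 + k, b0 + M - k)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the log-odds finite
        if it >= burn_in:
            samples.append(p)
    return samples
```

The posterior mean of the returned samples serves as a point estimate of the region's proportion of susceptibility SNPs. Each sweep costs O(M) for the scan plus O(M) for every causal SNP, roughly O(MK) in total, because non-causal SNPs that stay at zero trigger no update of the cached Vβ.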