Abstract:
In this study, we examined the impact of the variant database on base quality score recalibration and developed a database-generation model that gathers potential variant candidates directly from resequencing genome data. Using human genome data, we optimized the model's hyper-parameters and evaluated the performance improvements in terms of both recalibration and variant calling. To test whether our pseudo-database approach is applicable to species other than human, we constructed pseudo-databases for sheep, rice, and chickpea and compared their performance with that of dbSNP. We consistently found that our pseudo-databases provide improved recalibration and error rates. More importantly, the use of pseudo-databases led to the identification of additional genetic variants. Therefore, reanalysis with our pseudo-database approach effectively recalibrates base quality scores and consequently uncovers hidden genetic variations in published resequencing data.
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023