Abstract
The detection of structural variants (SVs) remains challenging due to inconsistencies in detected breakpoints and biological complexity of some rearrangements. Linked-reads have demonstrated their superiority in diploid genome assembly and SV detection. Recently developed tools Aquila and Aquila_stLFR use a reference sequence and linked-reads to generate a high quality diploid genome assembly, using which they then detect and phase personal genetic variations. However, they both produce a substantial proportion of false positive deletion SV calls. To take full advantage of linked-reads, an effective downstream filtering and refinement framework is needed pressingly. In this work, we propose AquilaDeepFilter to filter large deletion SVs from Aquila and Aquila_stLFR. AquilaDeepFilter relies on a deep learning ensemble approach by integrating six state-of-the-art CNN backbones. The filtering of deletion SVs is formulated as a binary classification task on image data that are generated through the extraction of multiple alignment signals, including read depth, split reads and discordant read pairs. Three linked-reads libraries sequenced from the well-studied sample NA24385 and the gold standard of GiaB benchmark were used to perform thorough experiments on our proposed method. The results demonstrated that AquilaDeepFilter could increase the precision rate of Aquila while the recall rate of Aquila decreased only slightly, and the overall F1 improved by 20%. Furthermore, AquilaDeepFilter outperformed another deep learning based method for SV filtering, DeepSVFilter. Even though we designed AquilaDeepFilter for linked-reads, the framework could also be used to improve SV detection on short reads.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
yunfei.hu{at}vanderbilt.edu
sanidhya.v.mangal{at}vanderbilt.edu
ericluzhang{at}hkbu.edu.hk
maizie.zhou{at}vanderbilt.edu
Identify applicable funding agency here. If none, delete this.