Abstract
The advent of third-generation sequencing technology provides large volumes of long-read data for calling structural variations (SVs). However, existing calling tools for long-read data achieve high precision but low recall. To address this problem, this paper proposes a new method called GcnSV. First, GcnSV maps every read in the genome sequencing data to a corresponding graph, which serves as input to a graph neural network. It then trains the network on these graphs to learn the characteristics of the variations themselves and of their upstream and downstream regions. Finally, a clustering algorithm produces the final calling results. We evaluate GcnSV and other calling tools on simulated and real data. The experimental results show that GcnSV achieves higher recall and F1-score across different coverage depths and different variant lengths.
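The abstract does not specify the clustering algorithm used in the final step, so the following is only a minimal sketch of the general idea of merging per-read SV candidates into consensus calls. It assumes a simple gap-based grouping on start position, which is a stand-in technique and not GcnSV's own algorithm. The function names, the `max_gap` parameter, the tuple layout, and the toy coordinates are all hypothetical.

```python
from statistics import median


def _consensus(group):
    """Summarise one group of candidates as (chrom, start, length, support)."""
    chrom = group[0][0]
    start = int(median(c[1] for c in group))
    length = int(median(c[2] for c in group))
    return (chrom, start, length, len(group))


def cluster_calls(calls, max_gap=100):
    """Group per-read SV candidates by proximity of their start positions.

    calls: list of (chrom, start, length) tuples, one per supporting read.
    max_gap: hypothetical threshold (in bases) between neighbouring starts.
    This is a generic gap-based merge, NOT the clustering used by GcnSV.
    """
    merged = []
    group = []
    for call in sorted(calls):
        # Close the current group when the chromosome changes or when the
        # distance to the previous candidate's start exceeds max_gap.
        if group and (call[0] != group[-1][0] or call[1] - group[-1][1] > max_gap):
            merged.append(_consensus(group))
            group = []
        group.append(call)
    if group:
        merged.append(_consensus(group))
    return merged


# Toy candidates (made-up coordinates, for illustration only).
candidates = [
    ("chr1", 10_000, 300),
    ("chr1", 10_020, 310),
    ("chr1", 10_045, 290),
    ("chr1", 50_000, 1_200),
    ("chr2", 10_010, 305),
]
consensus_calls = cluster_calls(candidates)
```

In this sketch the three nearby `chr1` candidates collapse into one call supported by three reads, while the distant `chr1` candidate and the `chr2` candidate each remain separate calls. The support count is the kind of evidence a caller could threshold on, and raising or lowering `max_gap` trades merged duplicates against wrongly fused neighbouring variants.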
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, M., Wang, H., Gao, J. (2023). GcnSV: A Method Based on Deep Learning of Calling Structural Variations from the Third-Generation Sequencing Data. In: Hong, W., Weng, Y. (eds) Computer Science and Education. ICCSE 2022. Communications in Computer and Information Science, vol 1813. Springer, Singapore. https://doi.org/10.1007/978-981-99-2449-3_35
DOI: https://doi.org/10.1007/978-981-99-2449-3_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2448-6
Online ISBN: 978-981-99-2449-3
eBook Packages: Computer Science, Computer Science (R0)