CIGenotyper: A Machine Learning Approach for Genotyping Complex Indel Calls

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10813)

Abstract

Complex insertions and deletions (complex indels) are a rare category of genomic structural variation. A complex indel presents as one or more DNA fragments inserted at a genomic location where a deletion occurs. Several studies have emphasized the importance of complex indels, and state-of-the-art approaches have been proposed to detect them from sequencing data. However, genotyping complex indel calls remains a challenging computational problem, because some features commonly used for genotyping indel calls from sequencing data can be invalid owing to the components that make up complex indels. In this article, we therefore propose CIGenotyper, a machine learning approach for estimating the genotypes of complex indel calls. CIGenotyper adopts a relevance vector machine (RVM) framework. For each candidate call, it first extracts a set of features from the candidate region, which usually includes the read depth, the variant allelic frequency of the aligned contigs, and the numbers of split reads and discordant paired-end reads, among others. Given the features of a complex indel call, the trained RVM outputs the genotype with the highest likelihood. We also propose an algorithm to train the RVM. We compare our approach with two popular approaches, Gindel and Pindel, on multiple groups of artificial datasets. Our model outperforms both in average success rate in most cases as we vary the coverage of the data, the read lengths, and the length distributions of the pre-set complex indels.
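
The prediction step described in the abstract (features in, most likely genotype out) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the kernel, the feature encoding, the multi-class scheme, or the training algorithm, so everything below is an assumption made for illustration. The names `CallFeatures`, `SparseKernelModel`, and `toy_model`, the RBF kernel, the softmax over three genotype labels, and all numeric values are hypothetical. The sketch only shows the general shape of an RVM-style classifier, namely a kernel expansion over a small set of retained training points (relevance vectors) whose scores are turned into per-genotype likelihoods.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Assumed genotype labels: homozygous reference, heterozygous, homozygous variant.
GENOTYPES = ("0/0", "0/1", "1/1")


@dataclass(frozen=True)
class CallFeatures:
    """Hypothetical feature record for one candidate complex indel call."""
    read_depth: float        # depth in the candidate region, scaled by mean coverage
    contig_vaf: float        # variant allelic frequency of the aligned contigs
    split_reads: float       # split-read count, scaled by mean coverage
    discordant_pairs: float  # discordant paired-end read count, scaled by mean coverage

    def as_vector(self) -> List[float]:
        return [self.read_depth, self.contig_vaf,
                self.split_reads, self.discordant_pairs]


def rbf_kernel(x: List[float], y: List[float], gamma: float) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


@dataclass(frozen=True)
class SparseKernelModel:
    """Prediction side of an RVM-style classifier (training is not shown)."""
    relevance_vectors: List[List[float]]  # training points kept by the sparse model
    weights: Dict[str, List[float]]       # per genotype, one weight per relevance vector
    bias: Dict[str, float]
    gamma: float = 1.0

    def likelihoods(self, call: CallFeatures) -> Dict[str, float]:
        x = call.as_vector()
        k = [rbf_kernel(x, rv, self.gamma) for rv in self.relevance_vectors]
        scores = {
            g: self.bias[g] + sum(w * ki for w, ki in zip(self.weights[g], k))
            for g in GENOTYPES
        }
        # Softmax over the scores, shifted by the maximum for numerical stability.
        top = max(scores.values())
        exps = {g: math.exp(s - top) for g, s in scores.items()}
        total = sum(exps.values())
        return {g: e / total for g, e in exps.items()}

    def genotype(self, call: CallFeatures) -> str:
        """Return the genotype with the highest likelihood."""
        probs = self.likelihoods(call)
        return max(probs, key=probs.get)


# Made-up parameters: one prototypical relevance vector per genotype.
toy_model = SparseKernelModel(
    relevance_vectors=[
        [1.0, 0.0, 0.0, 0.0],  # resembles a homozygous-reference call
        [0.8, 0.5, 0.3, 0.3],  # resembles a heterozygous call
        [0.5, 1.0, 0.6, 0.6],  # resembles a homozygous-variant call
    ],
    weights={"0/0": [4.0, 0.0, 0.0], "0/1": [0.0, 4.0, 0.0], "1/1": [0.0, 0.0, 4.0]},
    bias={"0/0": 0.0, "0/1": 0.0, "1/1": 0.0},
    gamma=2.0,
)

if __name__ == "__main__":
    call = CallFeatures(read_depth=0.8, contig_vaf=0.5,
                        split_reads=0.3, discordant_pairs=0.3)
    print(toy_model.genotype(call), toy_model.likelihoods(call))
```

In a real RVM the relevance vectors and weights would come out of sparse Bayesian training on labelled calls; here they are hand-set so that the argmax step is easy to follow.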

Acknowledgement

This work is supported by the National Science Foundation of China (Grant No: 31701150) and the Fundamental Research Funds for the Central Universities (CXTD2017003).

Author information

Corresponding author

Correspondence to Jiayin Wang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zheng, T. et al. (2018). CIGenotyper: A Machine Learning Approach for Genotyping Complex Indel Calls. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10813. Springer, Cham. https://doi.org/10.1007/978-3-319-78723-7_41

  • DOI: https://doi.org/10.1007/978-3-319-78723-7_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78722-0

  • Online ISBN: 978-3-319-78723-7

  • eBook Packages: Computer Science, Computer Science (R0)
