Abstract
Identification of somatic mutations based on next-generation DNA sequencing data has become one of the fundamental research strategies in oncology, with the goal of uncovering the mechanisms that underlie carcinogenesis and resistance to commonly used therapies. Despite significant advances in sequencing methods and data processing algorithms, the reproducibility of experiments remains relatively low and depends significantly on the methods used to identify changes in the structure of the DNA. This is mainly due to three factors: (1) high tumor heterogeneity, because of which some mutations are present in only a small number of cells, (2) bias associated with the process of exome isolation, and (3) the specificity of data pre-processing strategies.
The aim of this work was to determine the impact of these factors on the identification of somatic mutations, and thereby to establish the reasons for low reproducibility in such studies.
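The abstract does not state how reproducibility was quantified. As an illustration only, one common way to compare two sets of mutation calls (for example, from replicate sequencing runs or from two pre-processing pipelines) is the Jaccard index of the called variants. The sketch below assumes each variant is keyed by chromosome, position, reference allele and alternate allele; the function name and the example calls are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def jaccard_concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of variant calls."""
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty call sets are treated as fully concordant.
        return 1.0
    return len(set_a & set_b) / len(union)


# Made-up calls for illustration; these are not data from the study.
replicate_1 = [("chr1", 12345, "A", "T"), ("chr2", 67890, "G", "C"), ("chr7", 555, "C", "T")]
replicate_2 = [("chr1", 12345, "A", "T"), ("chr7", 555, "C", "T"), ("chr9", 42, "G", "A")]

# Two shared calls out of four distinct calls overall.
print(jaccard_concordance(replicate_1, replicate_2))  # 0.5
```

A value near 1 would indicate that both runs report nearly the same mutations, while low values would reflect the kind of poor reproducibility the abstract describes. Other measures, such as the fraction of calls in one set recovered in the other, may be equally appropriate.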
Acknowledgements
This work was partially supported by the National Centre for Research and Development grant No. Strategmed2/267398/4/NCBR/2015 (KPM), the National Science Centre grant No. 2016/23/D/ST7/03665 (RJ), and by an internal grant of the Institute of Automatic Control, BK-204/RAu1/2017 (AS).
Calculations were carried out using the infrastructure of the Ziemowit computer cluster (www.ziemowit.hpc.polsl.pl) in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 project and expanded in the POIG.02.03.01-00-040/13 project.
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Jaksik, R., Psiuk-Maksymowicz, K., Swierniak, A. (2018). Identification of Factors that Affect Reproducibility of Mutation Calling Methods in Data Originating from the Next-Generation Sequencing. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds) Computer and Information Sciences. ISCIS 2018. Communications in Computer and Information Science, vol 935. Springer, Cham. https://doi.org/10.1007/978-3-030-00840-6_29
DOI: https://doi.org/10.1007/978-3-030-00840-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00839-0
Online ISBN: 978-3-030-00840-6
eBook Packages: Computer Science, Computer Science (R0)