Abstract
We study the performance of a variant detection method that is based on a property of tumor phylogenetic trees. Our contributions are twofold. First, we show the property itself: the set of possible mutation patterns is restricted if a multi-regional mutation profile follows a corresponding tumor phylogenetic tree, where a multi-regional mutation profile is a matrix that lists the predicted somatic mutations for each tumor region. Second, we evaluate the lower and upper bounds of the specificity and sensitivity of a phylogeny-based somatic variant detection method under several conditions. In the evaluation, we perform patient-wise variant detection on a noisy multi-regional mutation profile matrix for some genomic positions by exploiting the phylogenetic property; we assume that the phylogenetic information can be extracted from another mutation profile matrix that contains accurate candidates at genomic positions different from the noisy ones. From the evaluation, we find that higher sensitivity is not guaranteed in phylogeny-based variant detection, but higher specificity is guaranteed in several cases. These findings indicate that tumor phylogeny offers more benefit for variant detection based on error-prone long-read sequencers (e.g., Oxford Nanopore sequencers) than for detection based on accurate short-read sequencers (e.g., Illumina sequencers).
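The restriction on mutation patterns can be illustrated with the classical pairwise test for a perfect phylogeny. The sketch below is not taken from the chapter: it assumes a binary profile with one row per tumor region and one column per mutation, an all-zero (normal) root, and no mutation loss, and the function name `follows_phylogeny` is hypothetical. Under those assumptions, a profile fits some tree exactly when the region sets of any two mutations are either disjoint or nested.

```python
from itertools import combinations


def follows_phylogeny(profile):
    """Pairwise perfect-phylogeny test on a 0/1 matrix.

    Assumed layout (not from the chapter): rows are regions,
    columns are mutations, and the root carries no mutation.
    """
    n_mut = len(profile[0]) if profile else 0
    # For each mutation, collect the set of regions that carry it.
    carriers = [
        {r for r, row in enumerate(profile) if row[m]} for m in range(n_mut)
    ]
    for s, t in combinations(carriers, 2):
        overlap = s & t
        # A partial overlap (neither disjoint nor nested) cannot be
        # explained by a single tree without mutation loss.
        if overlap and overlap != s and overlap != t:
            return False
    return True


compatible = [[1, 1, 0], [1, 1, 0], [1, 0, 1]]
conflicting = [[1, 1], [1, 0], [0, 1]]
print(follows_phylogeny(compatible), follows_phylogeny(conflicting))
```

In the second matrix the two mutations share only the first region, so no tree under these assumptions can produce both patterns.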
Acknowledgements
This work has been supported by the Grant-in-Aid for JSPS Research Fellow (17J08884) and MEXT/JSPS KAKENHI Grant (15H05912, hp180198, hp170227, 18H03329, hp190158).
Appendix
Proof
(Proof of Lemma 2) T has a phylogenetic tree, hence we can choose a phylogenetic tree \(\mathcal {T}\). We can assume \(|F_{\mathcal {T}}| \le c\) by removing every leaf of \(\mathcal {T}\) to which no cell is mapped by \(f:R \rightarrow F_{\mathcal {T}}\). We can also assume that the root node has only one outgoing edge by adding a new root node and connecting it to the previous root node.
For the last condition, we remove the following two types of internal nodes: i) internal nodes with only one outgoing edge, and ii) internal nodes with more than two outgoing edges. It is sufficient to show operations that remove the nodes satisfying i) or ii) from \(\mathcal {T}\) while preserving conditions a)-c).
For i), simply remove the node as in Fig. 6a. We can easily check that a)-c) still hold after this operation. For ii), remove the node as in Fig. 6b. If the number of outgoing edges is more than three, apply this operation recursively. We can also check that a)-c) still hold after these operations. Because the operations pictured in Fig. 6a and 6b only decrease the number of nodes of types i) and ii), repeating them removes all such nodes. \(\square \)
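The two node-removal operations can be sketched on a plain rooted tree. This is a minimal illustration under stated assumptions, not the chapter's construction: the tree is a hypothetical dictionary from node to list of children, the function name `normalize_tree` and the auxiliary node labels are invented here, and the code handles topology only, so it does not check conditions a)-c).

```python
from itertools import count


def normalize_tree(children, root):
    """Return a tree in which every internal node except the root is binary.

    children: dict mapping node -> list of child nodes (hypothetical layout).
    The root is kept as is, matching the assumption that it has one edge.
    """
    fresh = count()  # ids for auxiliary nodes created when splitting
    out = {}

    def resolve(node):
        """Normalize the subtree at `node`; return the node representing it."""
        kids = [resolve(k) for k in children.get(node, [])]
        if not kids:
            out[node] = []  # leaf: keep unchanged
            return node
        if len(kids) == 1:
            # Type i): skip a node with a single outgoing edge.
            return kids[0]
        # Type ii): split a node with more than two outgoing edges by
        # repeatedly grouping two children under a new auxiliary node.
        while len(kids) > 2:
            aux = ("aux", next(fresh))
            out[aux] = [kids.pop(), kids.pop()]
            kids.append(aux)
        out[node] = kids
        return node

    out[root] = [resolve(k) for k in children.get(root, [])]
    return out


tree = {"r": ["u"], "u": ["v"], "v": ["a", "b", "c", "d"]}
binary = normalize_tree(tree, "r")
```

In this example the unary node `u` disappears and the four-way split at `v` becomes a chain of binary splits, while the leaf set is unchanged.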
1.1 Performance Evaluation of \(R_r\)
1.2 Detailed Procedures for Performance Evaluation
Evaluation of \(\mathbb {E}_{B|A}[\mathrm {TN}(L, A, B)] / k_2\). We first evaluate the lower and upper bounds for \(\mathbb {E}_{B|A}[\mathrm {FP}(L, A, B)]\). Letting K be the total number of column patterns in A, the lower bound can be derived as follows.
where \(\underline{f_2} := \min (f_2, 1-f_2)\). The upper bound can also be derived as follows.
where \(\overline{f_2} := \max (f_2, 1-f_2)\). From this, we can evaluate \(\mathbb {E}_{B|A}[\mathrm {TN}(L, A, B)]\) as follows.
Evaluation of \(\mathbb {E}_{B|A}[\mathrm {TP}(L, A, B)] / k_1\). From the linearity of the expectation, the expected number of true positives can be written as follows.
The lower bound for \(\mathrm {Pr}( L(\textit{\textbf{c}}_{j}, A) = 1 )\) is as follows.
where \(w_{i} := \mathrm {Pr} \left( \textit{\textbf{c}}_{j} \,\,s.t.\; \sum _{n^{\prime } = 1}^{n} a_{I_{j}, n^{\prime }} = i \right) \). From this,
To obtain the upper bound of \(\mathbb {E}_{B|A}[\mathrm {TP}(L, A, B)]\), we focus on two facts, as shown in Fig. 8. First, the number of column vectors in A to which each \(\textit{\textbf{c}}_{j}\) can correspond is at most K. Second, the probability that each \(\textit{\textbf{c}}_{j}\) corresponds to one given column vector is at most \(\overline{f_1}^{n-i}\), where \(\overline{f_1} := \max (f_1, 1-f_1)\) and \(i = \sum _{n^{\prime } = 1}^{n} a_{I_{j}, n^{\prime }}\). From this, we can obtain the upper bound for the conditional probability as follows.
Then, the upper bound of \(\mathbb {E}_{B|A}[\mathrm {TP}(L, A, B)]\) is as follows.
Therefore,
Performance Evaluation of \(R_r\). We can evaluate the specificity and sensitivity for \(R_r\).
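As a rough illustration of the quantities TP / \(k_1\) and TN / \(k_2\), the sketch below computes empirical sensitivity and specificity for one simple filter. Several things here are assumptions rather than the chapter's definitions: the rule that a noisy candidate is called a variant exactly when its pattern across regions matches a pattern present in the accurate matrix, the reading of \(k_1\) and \(k_2\) as the numbers of true variants and true non-variants among the noisy candidates, and the function name `evaluate_filter`.

```python
def evaluate_filter(accurate_patterns, noisy_patterns, is_true_variant):
    """Empirical sensitivity and specificity of a pattern-matching filter.

    accurate_patterns: 0/1 tuples across regions taken from the accurate matrix.
    noisy_patterns:    0/1 tuples across regions for the noisy candidates.
    is_true_variant:   ground-truth label for each noisy candidate.
    The matching rule is an assumption made for this sketch.
    """
    allowed = {tuple(p) for p in accurate_patterns}
    # Assumed decision rule: accept a candidate iff its pattern is allowed.
    calls = [tuple(p) in allowed for p in noisy_patterns]

    tp = sum(1 for c, t in zip(calls, is_true_variant) if c and t)
    tn = sum(1 for c, t in zip(calls, is_true_variant) if not c and not t)
    k1 = sum(1 for t in is_true_variant if t)  # true variants
    k2 = len(is_true_variant) - k1             # true non-variants

    sensitivity = tp / k1 if k1 else float("nan")
    specificity = tn / k2 if k2 else float("nan")
    return sensitivity, specificity


accurate = [(1, 1, 1), (1, 1, 0), (0, 0, 1)]
noisy = [(1, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1)]
truth = [True, True, False, False]
print(evaluate_filter(accurate, noisy, truth))  # (0.5, 0.5)
```

In the example, one true variant is rejected because noise moved it off every allowed pattern, and one non-variant is accepted because it happens to match an allowed pattern, which are the two error sources the bounds above account for.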
© 2020 Springer Nature Switzerland AG
Moriyama, T., Imoto, S., Miyano, S., Yamaguchi, R. (2020). Theoretical Foundation of the Performance of Phylogeny-Based Somatic Variant Detection. In: Bebis, G., Alekseyev, M., Cho, H., Gevertz, J., Rodriguez Martinez, M. (eds) Mathematical and Computational Oncology. ISMCO 2020. Lecture Notes in Computer Science, vol 12508. Springer, Cham. https://doi.org/10.1007/978-3-030-64511-3_9
DOI: https://doi.org/10.1007/978-3-030-64511-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64510-6
Online ISBN: 978-3-030-64511-3
eBook Packages: Computer Science, Computer Science (R0)