Abstract
We review two streams of recent research results in this paper. The first is on converting a sequence A to another sequence B using the minimum number of tandem duplications. This research originates from copying systems in computer science in the early 1980s, and also from biology more than 40 years ago. We review our recent NP-hardness result for this problem, together with several open problems along this line. The second is on segmental duplications and deletions, which have recently received more attention in cancer research, where besides genomes (sequences), the so-called copy number profiles (vectors whose ith component is the number of occurrences of the ith segment in the genome, regardless of order) are also used. We again review some of our recent hardness results and preliminary positive results, together with some open problems. This paper is mostly self-contained.
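To make the two central notions concrete, the following minimal Python sketch illustrates a single tandem duplication, the copy number profile of a genome, and an exhaustive search for the tandem duplication distance on very small inputs. The function names and the convention of one character per segment are assumptions made for illustration only. The breadth-first search is not the method of the reviewed papers; since the problem is NP-hard, the search takes exponential time in general.

```python
from collections import Counter, deque
from typing import List, Optional


def tandem_duplicate(s: str, i: int, j: int) -> str:
    """Return s with the substring s[i:j] copied immediately after itself."""
    if not 0 <= i < j <= len(s):
        raise ValueError("need 0 <= i < j <= len(s)")
    return s[:j] + s[i:j] + s[j:]


def copy_number_profile(genome: str, segments: str) -> List[int]:
    """Count occurrences of each segment in the genome, ignoring order.

    Assumption: every segment is a single character, and `segments`
    fixes the order of the components of the profile.
    """
    counts = Counter(genome)
    return [counts[seg] for seg in segments]


def td_distance_bruteforce(a: str, b: str) -> Optional[int]:
    """Minimum number of tandem duplications turning a into b, or None.

    Illustrative breadth-first search only. Every duplication makes the
    string longer, so strings longer than b are never explored.
    """
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        s, dist = queue.popleft()
        if s == b:
            return dist
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                if len(s) + (j - i) > len(b):
                    break  # longer duplicated blocks would overshoot b
                t = tandem_duplicate(s, i, j)
                if t not in seen:
                    seen.add(t)
                    queue.append((t, dist + 1))
    return None


if __name__ == "__main__":
    print(tandem_duplicate("abc", 0, 2))          # duplicates "ab"
    print(copy_number_profile("abacab", "abc"))   # counts of a, b, c
    print(td_distance_bruteforce("abc", "ababcc"))
```

For example, duplicating "ab" in "abc" yields "ababc", and duplicating the final "c" then yields "ababcc", so two tandem duplications suffice for that pair. No single duplication works, because the only length-3 block of "abc" is the whole string, whose duplication gives "abcabc".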
Acknowledgments
I would like to thank my collaborators on this research: Manuel Lafond, Letu Qingge and Peng Zou. I also thank Prof. Henning Fernau and the organizers of CSR’2020 for giving me the chance to survey this research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, B. (2020). Tandem Duplications, Segmental Duplications and Deletions, and Their Applications. In: Fernau, H. (eds) Computer Science – Theory and Applications. CSR 2020. Lecture Notes in Computer Science, vol 12159. Springer, Cham. https://doi.org/10.1007/978-3-030-50026-9_6
DOI: https://doi.org/10.1007/978-3-030-50026-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50025-2
Online ISBN: 978-3-030-50026-9
eBook Packages: Computer Science, Computer Science (R0)