On Computing Breakpoint Distances for Genomes with Duplicate Genes

Shao, Mingfu; Moret, Bernard M. E.

doi:10.1007/978-3-319-31957-5_14

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9649)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology

Abstract

A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of their higher-level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of non-conserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property.
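To make the breakpoint distance and the exemplar formulation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes linear chromosomes written as lists of signed integers (the absolute value names the gene family, the sign gives the orientation), no telomere caps, and a conserved adjacency being one that appears in both genomes on either strand. The function names `adjacencies`, `breakpoint_distance`, and `exemplar_breakpoint_distance` are hypothetical, and the exemplar search is plain exhaustive enumeration, which is only feasible on tiny inputs.

```python
from itertools import product


def adjacencies(genome):
    """Return the adjacencies of a signed linear genome in canonical form."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) read on the opposite strand is (-b, -a); store one canonical form.
        adj.add(max((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not conserved in g2.

    Assumes both genomes contain exactly the same genes, each once.
    """
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must have identical gene content")
    return len(adjacencies(g1) - adjacencies(g2))


def exemplar_breakpoint_distance(g1, g2):
    """Brute-force exemplar distance: keep one copy per gene family in each
    genome and return the smallest resulting breakpoint distance.

    Exponential in the number of duplicates; illustration only.
    """
    families = sorted({abs(x) for x in g1})
    if families != sorted({abs(x) for x in g2}):
        raise ValueError("genomes must share the same gene families")

    def exemplars(genome):
        # For every family, list the positions of its copies in this genome.
        positions = [
            [i for i, x in enumerate(genome) if abs(x) == f] for f in families
        ]
        # Each choice keeps exactly one position per family.
        for choice in product(*positions):
            yield [genome[i] for i in sorted(choice)]

    return min(
        breakpoint_distance(e1, e2)
        for e1 in exemplars(g1)
        for e2 in exemplars(g2)
    )


if __name__ == "__main__":
    # Inverting the segment (2, 3) breaks two adjacencies.
    print(breakpoint_distance([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))
    # Gene family 2 is duplicated in the first genome.
    print(exemplar_breakpoint_distance([1, 2, -3, 2, 4], [1, 3, 2, 4]))
```

Under these assumptions, the maximum matching and any matching formulations differ only in which copies may be kept (as many matched pairs per family as possible, or at least one pair per family), so the enumeration step is what would change.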

The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although last year we provided a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this paper, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological datasets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignments, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the “any matching” formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.

Acknowledgements

We thank Daniel Dörr for helpful discussions.

Author information

Authors and Affiliations

Laboratory for Computational Biology and Bioinformatics, EPFL, Lausanne, Switzerland
Mingfu Shao & Bernard M. E. Moret

Corresponding authors

Correspondence to Mingfu Shao or Bernard M. E. Moret.

Editor information

Editors and Affiliations

Princeton University, Princeton, New Jersey, USA
Mona Singh

About this paper

Cite this paper

Shao, M., Moret, B.M.E. (2016). On Computing Breakpoint Distances for Genomes with Duplicate Genes. In: Singh, M. (ed.) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_14

DOI: https://doi.org/10.1007/978-3-319-31957-5_14
Published: 08 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31956-8
Online ISBN: 978-3-319-31957-5
eBook Packages: Computer Science (R0)