A new recombination lower bound and the minimum perfect phylogenetic forest problem

Journal of Combinatorial Optimization

Abstract

Understanding recombination is a central problem in population genetics. In this paper, we address an established computational problem in this area: computing lower bounds on the minimum number of historical recombinations needed to generate a set of sequences (Hudson and Kaplan in Genetics 111, 147–164, 1985; Myers and Griffiths in Genetics 163, 375–394, 2003; Gusfield et al. in Discrete Appl. Math. 155, 806–830, 2007; Bafna and Bansal in IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 78–90, 2004 and in J. Comput. Biol. 13, 501–521, 2006; Song et al. in Bioinformatics 21, i413–i422, 2005). In particular, we propose a new recombination lower bound: the forest bound. We show that the forest bound can be formulated as the minimum perfect phylogenetic forest problem, a natural extension of the classic binary perfect phylogeny problem, which may be of independent interest. We then show that the forest bound is provably higher than the optimal haplotype bound (Myers and Griffiths in Genetics 163, 375–394, 2003), a very good lower bound in practice (Song et al. in Bioinformatics 21, i413–i422, 2005). We prove that, like several other lower bounds (Bafna and Bansal in J. Comput. Biol. 13, 501–521, 2006), computing the forest bound is NP-hard. Finally, we describe an integer linear programming (ILP) formulation that computes the forest bound exactly for a certain range of data. Simulation results show that the forest bound may be useful in computing lower bounds for low-quality data.
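The abstract builds on two classical ideas: the binary perfect phylogeny problem and the Hudson–Kaplan lower bound. The sketch below is a minimal Python illustration of both, assuming binary sequences given as equal-length strings of `'0'`/`'1'` with the ancestral sequence unknown. Under that assumption, two sites are incompatible when all four gametes 00, 01, 10, 11 occur, and a matrix admits a perfect phylogeny exactly when no pair of sites is incompatible. The Hudson–Kaplan bound is the minimum number of breakpoints between adjacent sites needed to hit every incompatible interval. This is not the forest bound or the paper's ILP, and all function names are hypothetical.

```python
from itertools import combinations


def incompatible(seqs, i, j):
    """Four-gamete test (assumes only '0'/'1' characters): sites i and j
    conflict when all of 00, 01, 10, 11 occur among the sequences."""
    gametes = {(s[i], s[j]) for s in seqs}
    return len(gametes) == 4


def incompatible_pairs(seqs):
    """All site pairs (i, j) with i < j that fail the four-gamete test."""
    m = len(seqs[0])
    return [(i, j) for i, j in combinations(range(m), 2)
            if incompatible(seqs, i, j)]


def has_perfect_phylogeny(seqs):
    """True iff every pair of sites is compatible (unknown ancestral state)."""
    return not incompatible_pairs(seqs)


def hudson_kaplan_bound(seqs):
    """Minimum number of breakpoints hitting every incompatible interval.

    A breakpoint between sites k and k+1 hits interval (i, j) iff i <= k < j.
    Greedy interval stabbing: sort by right endpoint and, whenever an interval
    is not yet hit, place a breakpoint just before its right endpoint.
    """
    count = 0
    last = None  # right endpoint of the interval that forced the latest breakpoint
    for i, j in sorted(incompatible_pairs(seqs), key=lambda p: p[1]):
        # The latest breakpoint sits between sites last-1 and last; it hits
        # (i, j) iff i < last, so a new breakpoint is needed when i >= last.
        if last is None or i >= last:
            count += 1
            last = j
    return count


if __name__ == "__main__":
    seqs = ["0000", "0110", "1011", "1101"]
    print(incompatible_pairs(seqs))
    print(has_perfect_phylogeny(seqs), hudson_kaplan_bound(seqs))
```

For the four example sequences, five site pairs fail the four-gamete test (only sites 0 and 3 are compatible), the matrix has no perfect phylogeny, and the greedy sweep needs three breakpoints, one for each of the disjoint intervals (0, 1), (1, 2), (2, 3). Sorting by right endpoint is what makes the greedy choice optimal for interval stabbing.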

References

  • Bafna V, Bansal V (2004) The number of recombination events in a sample history: conflict graph and lower bounds. IEEE/ACM Trans Comput Biol Bioinf 1:78–90

  • Bafna V, Bansal V (2006) Inference about recombination from haplotype data: lower bounds and recombination hotspots. J Comput Biol 13:501–521

  • Bordewich M, Semple C (2004) On the computational complexity of the rooted subtree prune and regraft distance. Ann Comb 8:409–423

  • Foulds LR, Graham RL (1982) The Steiner tree in phylogeny is NP-complete. Adv Appl Math 3

  • Garey M, Johnson D (1979) Computers and intractability. Freeman, San Francisco

  • Griffiths RC, Marjoram P (1996) Ancestral inference from samples of DNA sequences with recombination. J Comput Biol 3:479–502

  • Gusfield D (1991) Efficient algorithms for inferring evolutionary history. Networks 21:19–28

  • Gusfield D, Eddhu S, Langley C (2004) Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinf Comput Biol 2:173–213

  • Gusfield D, Hickerson D, Eddhu S (2007) An efficiently-computed lower bound on the number of recombinations in phylogenetic networks: theory and empirical study. Discrete Appl Math 155:806–830

  • Hudson R (2002) Generating samples under the Wright-Fisher neutral model of genetic variation. Bioinformatics 18(2):337–338

  • Hudson R, Kaplan N (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164

  • Myers S (2003) The detection of recombination events using DNA sequence data. PhD dissertation, Dept of Statistics, University of Oxford, Oxford, England

  • Myers SR, Griffiths RC (2003) Bounds on the minimum number of recombination events in a sample history. Genetics 163:375–394

  • Song YS, Ding Z, Gusfield D, Langley C, Wu Y (2006) Algorithms to distinguish the role of gene-conversion from single-crossover recombination in the derivations of SNP sequences in populations. In: Proceedings of RECOMB 2006. LNBI, vol 3909

  • Song YS, Wu Y, Gusfield D (2005) Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences. Bioinformatics 21:i413–i422. Proceedings of ISMB 2005

  • Wang L, Zhang K, Zhang L (2001) Perfect phylogenetic networks with recombination. J Comput Biol 8:69–78

Author information

Correspondence to Yufeng Wu.

Additional information

A preliminary version of this paper appeared in the Proceedings of COCOON 2007, LNCS, vol. 4598, pp. 16–26.

The work was performed while Y. Wu was with UC Davis and was supported by grants CCF-0515278 and IIS-0513910 from the National Science Foundation.

D. Gusfield was supported by grants CCF-0515278 and IIS-0513910 from the National Science Foundation.

About this article

Cite this article

Wu, Y., Gusfield, D. A new recombination lower bound and the minimum perfect phylogenetic forest problem. J Comb Optim 16, 229–247 (2008). https://doi.org/10.1007/s10878-007-9129-6

DOI: https://doi.org/10.1007/s10878-007-9129-6