Abstract
When reconstructing a phylogenetic tree, one common representation for a species is a binary string indicating the presence or absence of selected genes/proteins. Until now, all existing methods have assumed that the presence of these genes/proteins is independent. In most cases, however, this assumption is not valid. In this paper, we consider the reconstruction problem while taking into account the dependency among proteins, i.e., protein linkage. We assume that the tree structure and leaf sequences are given, so we need only find an optimal assignment to the ancestral nodes. We prove that the Phylogenetic Tree Reconstruction with Protein Linkage (PTRPL) problem is NP-complete for three different versions of linkage distance. We provide an efficient dynamic programming algorithm that solves the general problem in O(4^m · n) and O(4^m · (m + n)) time (compared with the straightforward O(4^m · m · n) and O(4^m · m^2 · n) time algorithms), depending on the version of linkage distance used, where n is the number of species and m is the number of proteins, i.e., the length of the binary string. We also show experimentally that trees built with linkage information are more accurate than trees built using only Hamming distance to measure the differences between the binary strings, which validates the significance of linkage information.
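To make the problem setting concrete, the following is a minimal sketch of a straightforward dynamic program for assigning binary strings to ancestral nodes when the tree and leaf strings are fixed. It is not the authors' algorithm. The abstract does not define the linkage distances, so plain Hamming distance is used as a stand-in, and the names `small_parsimony`, `hamming`, `children`, and `leaf_seq` are hypothetical. Each node is scored against all 2^m candidate strings, so every edge costs about 4^m distance evaluations, which is in the spirit of the straightforward bound quoted above.

```python
from math import inf


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two presence/absence vectors."""
    return bin(a ^ b).count("1")


def small_parsimony(children, leaf_seq, root, m, dist=hamming):
    """Brute-force DP over all 2^m states per node (illustrative only).

    children: dict mapping each internal node to a list of its children
    leaf_seq: dict mapping each leaf to an int encoding its m-bit string
    dist:     pairwise distance on states; Hamming here is an assumption,
              a linkage-aware distance could be passed in instead
    Returns (optimal total cost, dict node -> assigned state).
    """
    states = range(1 << m)
    cost = {}  # cost[v][s]: best cost of the subtree below v if v has state s
    back = {}  # back[(v, c)][s]: best state of child c given v has state s

    def solve(v):
        if v in leaf_seq:
            # A leaf may only take its observed string.
            cost[v] = [0 if s == leaf_seq[v] else inf for s in states]
            return
        total = [0] * (1 << m)
        for c in children[v]:
            solve(c)
            choice = []
            for s in states:
                best_t = min(states, key=lambda t: dist(s, t) + cost[c][t])
                choice.append(best_t)
                total[s] += dist(s, best_t) + cost[c][best_t]
            back[(v, c)] = choice
        cost[v] = total

    def trace(v, s, assign):
        # Walk back down the tree, reading off the recorded best child states.
        assign[v] = s
        for c in children.get(v, []):
            trace(c, back[(v, c)][s], assign)

    solve(root)
    root_state = min(states, key=lambda s: cost[root][s])
    assign = {}
    trace(root, root_state, assign)
    return cost[root][root_state], assign


if __name__ == "__main__":
    # Toy instance: three species, three proteins.
    children = {"r": ["a", "L3"], "a": ["L1", "L2"]}
    leaf_seq = {"L1": 0b110, "L2": 0b100, "L3": 0b101}
    total, assign = small_parsimony(children, leaf_seq, "r", m=3)
    print(total, {v: format(s, "03b") for v, s in assign.items()})
```

Because `dist` is a parameter, a distance that accounts for dependencies between bit positions can be swapped in without changing the recursion. The recursion is exponential in m, so the sketch is only practical for short strings.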
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, J. et al. (2012). Phylogenetic Tree Reconstruction with Protein Linkage. In: Bleris, L., Măndoiu, I., Schwartz, R., Wang, J. (eds) Bioinformatics Research and Applications. ISBRA 2012. Lecture Notes in Computer Science, vol 7292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30191-9_29
DOI: https://doi.org/10.1007/978-3-642-30191-9_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30190-2
Online ISBN: 978-3-642-30191-9
eBook Packages: Computer Science, Computer Science (R0)