Abstract
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While ortholog assignment is usually done by sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for ortholog assignment on a genome scale, called MSOAR, and applied it to the genomes of human and mouse. As the results show, MSOAR is able to find 99 more true orthologs than the INPARANOID program. We have also compared MSOAR with the iterated exemplar algorithm on simulated data and found that MSOAR performed very well in terms of assignment accuracy. These test results indicate that our approach is very promising for genome-wide ortholog assignment.
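The parsimony principle described above can be illustrated with a toy sketch. This is not MSOAR: it assumes tiny single-chromosome genomes written as signed integers (the absolute value is the gene family, the sign is the strand), it enumerates every maximum one-to-one matching by brute force, and it substitutes a simple breakpoint count for a true rearrangement distance. The function names `breakpoints`, `matchings`, and `assign_orthologs` are hypothetical and chosen only for this illustration.

```python
from itertools import permutations, product


def breakpoints(p, q):
    """Count adjacencies of signed sequence p that do not occur in q.

    Assumption: a breakpoint count is used here as a crude stand-in for
    a rearrangement distance; it is not the distance used by MSOAR.
    """
    n = len(p)
    ext_p = [0] + list(p) + [n + 1]  # cap both ends so end moves count
    ext_q = [0] + list(q) + [n + 1]
    conserved = set()
    for x, y in zip(ext_q, ext_q[1:]):
        conserved.add((x, y))
        conserved.add((-y, -x))  # same adjacency read on the other strand
    return sum((x, y) not in conserved for x, y in zip(ext_p, ext_p[1:]))


def matchings(gpos, hpos):
    """Yield every maximum one-to-one matching between two position lists."""
    if len(gpos) <= len(hpos):
        for perm in permutations(hpos, len(gpos)):
            yield list(zip(gpos, perm))
    else:
        for perm in permutations(gpos, len(hpos)):
            yield list(zip(perm, hpos))


def assign_orthologs(G, H):
    """Return (cost, pairs) for the cheapest assignment found by brute force.

    cost  = breakpoints between the reduced genomes + number of unmatched genes
    pairs = list of (position in G, position in H) proposed as orthologs
    """
    sign = lambda x: 1 if x > 0 else -1
    families = sorted({abs(g) for g in G} & {abs(h) for h in H})
    per_family = []
    for f in families:
        gpos = [i for i, g in enumerate(G) if abs(g) == f]
        hpos = [j for j, h in enumerate(H) if abs(h) == f]
        per_family.append(list(matchings(gpos, hpos)))

    best = None
    for choice in product(*per_family):
        pairs = sorted(pair for fam in choice for pair in fam)
        # Give each matched pair a unique label, numbered in G order.
        h_label = {j: k + 1 for k, (_, j) in enumerate(pairs)}
        reduced_g = [sign(G[i]) * (k + 1) for k, (i, _) in enumerate(pairs)]
        reduced_h = [sign(H[j]) * h_label[j] for j in sorted(h_label)]
        # Unmatched genes are treated as extra copies (toy duplication cost).
        unmatched = (len(G) - len(pairs)) + (len(H) - len(pairs))
        cost = breakpoints(reduced_g, reduced_h) + unmatched
        if best is None or cost < best[0]:
            best = (cost, pairs)
    return best


# Family 1 has two copies in G; the copy at position 2 is left unmatched.
print(assign_orthologs([1, 2, 1, 3], [1, 2, 3]))
```

In this example the copy of family 1 at position 0 of the first genome is paired with its counterpart, because that choice leaves the gene order intact, while the copy at position 2 remains unmatched and would be read as a candidate post-speciation duplicate. Because the sketch always matches as many copies as possible, the unmatched-gene term is the same for every candidate assignment, so only the breakpoint term discriminates between them. Exhaustive enumeration is feasible only for very small inputs.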
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fu, Z., Chen, X., Vacic, V., Nan, P., Zhong, Y., Jiang, T. (2006). A Parsimony Approach to Genome-Wide Ortholog Assignment. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_47
DOI: https://doi.org/10.1007/11732990_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science (R0)