Abstract
For a genomic region containing a tandem gene cluster, a proper set of alignments should align only orthologous segments, i.e., those separated by a speciation event; otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as the graph problem of finding a minimum-cardinality set of cliques that together contain all edges. The problem is NP-hard and very difficult to approximate in the general case, but we provide an upper bound for an important class of graphs, and we use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.
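The graph formulation in the abstract can be illustrated with a small sketch. The code below is a generic greedy edge clique cover, written only to make the problem statement concrete; it is not either of the two heuristics evaluated in the paper, and the function name, the tie-breaking rule, and the vertex labels are assumptions introduced here. Vertices stand for genes or genomic segments, and an edge joins two vertices that are assumed to be orthologous.

```python
from itertools import combinations


def greedy_edge_clique_cover(edges):
    """Return a list of cliques (frozensets) that together contain every edge.

    Illustrative greedy heuristic only; it gives no guarantee of minimum size.
    Vertices must be mutually comparable (e.g. all strings) so that the
    iteration order is deterministic.
    """
    # Build adjacency sets from the undirected edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    uncovered = {frozenset((u, v)) for u, v in edges}
    cover = []

    # Visit edges in a fixed order; skip those an earlier clique already covers.
    for u, v in sorted(tuple(sorted(e)) for e in uncovered):
        if frozenset((u, v)) not in uncovered:
            continue
        clique = {u, v}
        # Candidates are the vertices adjacent to every current clique member.
        candidates = adj[u] & adj[v]
        while candidates:
            # Prefer the candidate that covers the most still-uncovered edges.
            w = max(
                sorted(candidates),
                key=lambda x: sum(frozenset((x, c)) in uncovered for c in clique),
            )
            clique.add(w)
            candidates &= adj[w]  # also drops w, since there are no self-loops
        cover.append(frozenset(clique))
        uncovered -= {frozenset(p) for p in combinations(clique, 2)}
    return cover


if __name__ == "__main__":
    # Hypothetical orthology graph: a triangle a-b-c plus a pendant edge c-d.
    example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    for clique in greedy_edge_clique_cover(example_edges):
        print(sorted(clique))
```

On the hypothetical four-vertex example, the triangle is covered by one clique and the pendant edge by a second, so two cliques suffice where listing each edge separately would need four. In the alignment setting, each clique would correspond to one group of mutually orthologous segments, which is why a small cover keeps the number of alignments under control.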
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hou, M., Berman, P., Zhang, L., Miller, W. (2006). Controlling Size When Aligning Multiple Genomic Sequences with Duplications. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_13
DOI: https://doi.org/10.1007/11851561_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39583-6
Online ISBN: 978-3-540-39584-3
eBook Packages: Computer Science, Computer Science (R0)