Article

A structural perspective on genome evolution

Authors:
David Lee, Alastair Grant, Ian Sillitoe, Mark Dibley, Juan Garcia Ranea, Christine Orengo

University College, London

RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
March 2004
Pages 336
https://doi.org/10.1145/974614.974658

Published: 27 March 2004

ABSTRACT

At UCL we have developed several automated protocols for generating protein family resources (CATH; Gene3D). These resources can be used to perform comparative genome analyses in order to understand the evolution of protein families, and to identify biologically and/or medically interesting families for which no structural data currently exist and which may therefore be important targets for structural genomics initiatives.

The CATH domain structure database, established by Orengo and Thornton in 1993, now contains a significant proportion of protein structures from the PDB clustered into 1400 evolutionary families. Relationships have been identified using robust structure comparison methods (SSAP, CATHEDRAL). We have also benchmarked and optimised various 1D-profile and HMM-based protocols for assigning genome sequences to families within the resource (e.g. SAM-T99, SAMOSA, CATH-ISL).

In this way we can assign structural data to a large proportion (up to 60%) of whole or partial sequences in completed genomes and to >80% of genes coding for enzymes and other proteins in biochemical pathways. However, in order to include all families regardless of whether their structure is known, a new protein family resource has been developed (Gene3D). In Gene3D, complete genes have been clustered according to sequence similarity alone, using a robust clustering method (Pfscape). 120 completed genomes from all kingdoms have been clustered into 220,000 gene families, 70,000 of which contain 2 or more sequences. Subsequently, we have labelled those gene families for which CATH structural or Pfam functional domain annotations can be provided for all or part of the gene.

Preliminary analysis of the genome annotations reveals that a significant proportion (up to 70%) of CATH-annotated genes or gene regions in genomes are assigned to domain families that are common to all three kingdoms of life. However, only 20% of the genome sequences are assigned to gene families common to all kingdoms. Since a large proportion of these genes are multidomain proteins, this supports the view that a great deal of functional diversity within the genomes has been achieved by combining domain modules in different ways.

In collaboration with Professor Janet Thornton, we have analysed a subset of 56 bacterial genomes to determine the recurrence of specific domain structure families within the genomes. This revealed a small but essential group of universal and, in some cases, highly recurring domain families. For some families (size-dependent), domain recurrence is highly correlated with increase in genome size, whilst for others (size-independent) no correlation is observed. Statistical analysis allowed us to distinguish three groups. Within the size-dependent families we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COGs revealed that these domains were predominantly involved in metabolism and regulation, respectively, whilst a third group of evenly distributed, size-independent domains is primarily involved in protein translation and biosynthesis.

By mapping CATH and Pfam domain families onto all the genome sequences in Gene3D we observe that a few hundred highly recurrent families account for at least 50% of whole or partial genome sequences. Many of these families are common to both prokaryotes and eukaryotes and perform essential generic functions. In many of the largest families, significant divergence in sequence has been accompanied by modifications in structure and function. Targeting representatives of these families for structure determination will allow the structural genomics initiatives to map both fold and function space and reveal the mechanisms by which divergence in protein families promotes evolution of new functions.
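The three-way grouping of domain families described above (linearly distributed, non-linearly distributed, and size-independent) can be illustrated with a minimal sketch. This is not the statistical analysis used by the authors, which the abstract does not specify. It assumes a Pearson correlation cut-off to separate size-dependent from size-independent families, and a log-log slope cut-off to separate linear from non-linear growth. The function names, the thresholds `r_min` and `slope_max`, and the toy counts are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), ignoring non-positive values."""
    pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    return sxy / sxx


def classify_family(genome_sizes, domain_counts, r_min=0.7, slope_max=1.3):
    """Assign a domain family to one of three groups.

    r_min and slope_max are illustrative thresholds (assumptions),
    not values taken from the study.
    """
    if pearson(genome_sizes, domain_counts) < r_min:
        return "size-independent"
    slope = loglog_slope(genome_sizes, domain_counts)
    return "linear" if slope <= slope_max else "non-linear"


# Toy data (made up): gene counts for four genomes, and the number of
# occurrences of three hypothetical domain families in each genome.
sizes = [1000, 2000, 4000, 8000]
families = {
    "metabolic_like": [10, 20, 40, 80],    # grows in proportion to genome size
    "regulatory_like": [1, 4, 16, 64],     # grows faster than genome size
    "translation_like": [20, 21, 20, 21],  # roughly constant
}
groups = {name: classify_family(sizes, counts) for name, counts in families.items()}
```

With these toy inputs the three families fall into the linear, non-linear, and size-independent groups respectively. On real genome data a formal model comparison would be a more defensible choice than fixed cut-offs.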

A structural perspective on genome evolution
1. Applied computing
  1. Life and medical sciences

Published in
RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
March 2004
370 pages
ISBN:1581137559
DOI:10.1145/974614
General Chair: Philip E. Bourne, University of California, San Diego
Program Chair: Dan Gusfield, University of California, Davis
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 148 of 538 submissions, 28%