Article

Accurate detection of very sparse sequence motifs

Authors:
Andreas Heger, University of Helsinki, Finland
Michael Lappe, EMBL-EBI, Cambridge, United Kingdom
Liisa Holm, University of Helsinki, Finland

RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular biology, April 2003, Pages 139–147
https://doi.org/10.1145/640075.640094

Published: 10 April 2003

ABSTRACT

Protein sequence alignments are more reliable the shorter the evolutionary distance. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any two proteins in a connected set, whether they are direct or indirect sequence neighbours in the underlying library of pairwise alignments. We have implemented a greedy algorithm, MaxFlow, using a novel consistency score to estimate the relative likelihood of alternative paths of transitive alignment. In contrast to traditional profile models of amino acid preferences, MaxFlow models the probability that two positions are structurally equivalent and retains high information content across large distances in sequence space. Thus, MaxFlow is able to identify sparse and narrow active-site sequence signatures which are embedded in high-entropy sequence segments in the structure-based multiple alignment of large diverse enzyme superfamilies. In a challenging benchmark, MaxFlow yields better reliability and double coverage compared to available sequence alignment software. This promises to increase information returns from functional and structural genomics, where reliable sequence alignment is a bottleneck to transferring the functional or structural characterization of model proteins to entire protein families and superfamilies.
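
The idea of transitive alignment through intermediate sequences can be illustrated with a minimal sketch. The code below is not the MaxFlow algorithm and does not reproduce its consistency score, neither of which is specified in the abstract. It only assumes that each pairwise alignment is a mapping from residue positions in one protein to positions in another, chains such mappings along several stepping-stone paths, and greedily keeps the target position supported by the most paths. The function names (`compose`, `transitive_support`, `consensus`) and the toy mappings are hypothetical.

```python
from collections import Counter


def compose(ab, bc):
    """Chain two residue mappings: positions in A -> B, then B -> C.

    Positions of A whose partner in B is unaligned in C are dropped.
    """
    return {i: bc[j] for i, j in ab.items() if j in bc}


def transitive_support(paths):
    """Count how many stepping-stone paths support each residue pair.

    `paths` is a list of paths; each path is a list of pairwise mappings
    (dicts) that are chained from the first protein to the last one.
    """
    support = Counter()
    for path in paths:
        mapping = path[0]
        for step in path[1:]:
            mapping = compose(mapping, step)
        support.update(mapping.items())  # each (source, target) pair counts once per path
    return support


def consensus(support, n_paths):
    """Greedily keep, for each source position, the best-supported target.

    Returns {source: (target, fraction_of_paths)}. The fraction is a simple
    vote share used for illustration (an assumption), not the paper's score.
    """
    best = {}
    for (i, k), count in support.items():
        if i not in best or count > best[i][1]:
            best[i] = (k, count)
    return {i: (k, count / n_paths) for i, (k, count) in best.items()}


if __name__ == "__main__":
    # Toy data (hypothetical): protein A aligned to C via intermediates B1 and B2.
    via_b1 = [{0: 0, 1: 1, 2: 2}, {0: 1, 1: 2, 2: 3}]
    via_b2 = [{0: 0, 1: 2, 2: 3}, {0: 1, 2: 2, 3: 4}]
    paths = [via_b1, via_b2]
    print(consensus(transitive_support(paths), len(paths)))
```

In this toy setting, residue pairs on which independent paths agree receive a high vote share, while pairs reached by only one path receive a low one. That agreement across paths is the intuition behind scoring alternative transitive alignments by consistency.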

Index Terms

1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory

Published in
RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular biology
April 2003
352 pages
ISBN: 1581136358
DOI: 10.1145/640075
Editors:
Martin Vingron, Max-Planck-Institute for Molecular Genetics, Germany
Sorin Istrail, Celera Genomics/Applied Biosystems
Pavel Pevzner, University of California at San Diego, CA
Michael Waterman, University of Southern California, CA
Program Chair:
Webb Miller, The Pennsylvania State University
Copyright © 2003 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
algorithm
consistency
protein evolution
sequence alignment
Acceptance Rates
RECOMB '03 paper acceptance rate: 35 of 175 submissions (20%). Overall acceptance rate: 148 of 538 submissions (28%).