DOI: 10.1145/640075.640094
Article

Accurate detection of very sparse sequence motifs

Published: 10 April 2003

ABSTRACT

Protein sequence alignments become more reliable as the evolutionary distance between the sequences decreases. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any two proteins in a connected set, whether they are direct or indirect sequence neighbours in the underlying library of pairwise alignments. We have implemented a greedy algorithm, MaxFlow, which uses a novel consistency score to estimate the relative likelihood of alternative paths of transitive alignment. In contrast to traditional profile models of amino acid preferences, MaxFlow models the probability that two positions are structurally equivalent, and it retains high information content across large distances in sequence space. MaxFlow is therefore able to identify sparse and narrow active-site sequence signatures that are embedded in high-entropy sequence segments in the structure-based multiple alignment of large, diverse enzyme superfamilies. In a challenging benchmark, MaxFlow yields better reliability and twice the coverage of available sequence alignment software. This promises to increase information returns from functional and structural genomics, where reliable sequence alignment is a bottleneck in transferring the functional or structural characterization of model proteins to entire protein families and superfamilies.
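To make the idea of a transitive alignment concrete, the following is a minimal sketch, not the MaxFlow algorithm or its consistency score, neither of which is specified in the abstract. It assumes a pairwise alignment can be modelled as a mapping between residue indices, chains such mappings through intermediate sequences, and counts how many alternative paths agree on each residue pair as a crude stand-in for path consistency. All names (`compose`, `transitive_alignment`, `path_support`) and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumption: a pairwise alignment is a mapping from a residue index in the
# first sequence to the equivalent residue index in the second sequence.
Alignment = Dict[int, int]


def compose(first: Alignment, second: Alignment) -> Alignment:
    """Chain A->B and B->C into A->C, keeping only positions aligned in both."""
    return {a: second[b] for a, b in first.items() if b in second}


def transitive_alignment(path: List[Alignment]) -> Alignment:
    """Compose the pairwise alignments along one path of intermediates."""
    result = path[0]
    for step in path[1:]:
        result = compose(result, step)
    return result


def path_support(paths: List[List[Alignment]]) -> Counter:
    """Count how many alternative paths agree on each residue pair.

    This simple vote is an illustrative placeholder, not the paper's score.
    """
    support: Counter = Counter()
    for path in paths:
        support.update(transitive_alignment(path).items())
    return support


# Hypothetical toy data: two routes from protein A to protein C,
# one via intermediate B and one via intermediate D.
a_to_b = {0: 0, 1: 1, 2: 3}
b_to_c = {0: 1, 1: 2, 3: 4}
a_to_d = {0: 2, 1: 3, 2: 5}
d_to_c = {2: 1, 3: 2, 5: 5}

via_b = transitive_alignment([a_to_b, b_to_c])
via_d = transitive_alignment([a_to_d, d_to_c])
support = path_support([[a_to_b, b_to_c], [a_to_d, d_to_c]])
```

In this toy case both routes agree on where residues 0 and 1 of A land in C but disagree on residue 2, so a method that weighs alternative paths would trust the first two pairings more than the third.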

    • Published in

      RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular biology
      April 2003
      352 pages
      ISBN: 1581136358
      DOI: 10.1145/640075

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      RECOMB '03 paper acceptance rate: 35 of 175 submissions, 20%. Overall acceptance rate: 148 of 538 submissions, 28%.
