ATDD: An Algorithmic Tool for Domain Discovery in Protein Sequences

  • Conference paper
Algorithms in Bioinformatics (WABI 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3240)

Abstract

Identifying sequence domains is essential for understanding protein function. Most current methods for protein domain identification rely on prior knowledge of homologous domains and on the construction of high-quality multiple sequence alignments. With the rapid accumulation of data from genome sequencing, it is important to be able to determine domain regions automatically from a set of proteins, based solely on sequence information.

We describe a new algorithm for automatic protein domain detection that does not require multiple sequence alignment. It differs from alignment-based methods by allowing arbitrary rearrangements (both in relative ordering and distance) of the domains within the set of proteins under study. Moreover, the algorithm extracts domains by simply performing a comparative analysis of a given set of sequences; no auxiliary information is required. The method views protein sequences as collections of overlapping fixed-length blocks. A pair of blocks within a sequence gets a “vote of confidence” to be part of a domain if several other sequences have similar pairs of blocks at roughly the same distance from each other. Candidate domains are then identified by discovering regions in each protein sequence where most block pairs get strong votes of confidence. We applied our method to several test data sets with a fixed choice of parameters. To evaluate the results, we computed sensitivity and specificity measures using SMART-derived domain annotations as a reference.
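The block-pair voting idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the parameters (`k`, `max_gap`, `slack`, `min_identity`, `min_votes`), and the similarity test (fraction of identical residues) are all assumptions made for illustration; the abstract does not specify how block similarity or the final region selection is computed. The last step is also simplified: it marks residues covered by any strongly supported pair, rather than finding regions where most pairs are strongly supported.

```python
def blocks(seq, k):
    """All overlapping blocks of fixed length k, indexed by start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def similar(a, b, min_identity):
    """Assumed similarity test: fraction of identical residues in two blocks."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) >= min_identity


def pair_votes(seqs, target, k=4, max_gap=10, slack=1, min_identity=0.75):
    """For each block pair (i, j) in seqs[target], count how many OTHER
    sequences contain a similar pair at roughly the same distance."""
    tb = blocks(seqs[target], k)
    # hits[s][i]: positions in sequence s whose block is similar to target block i
    hits = {}
    for s, other in enumerate(seqs):
        if s == target:
            continue
        ob = blocks(other, k)
        hits[s] = [
            [p for p, b in enumerate(ob) if similar(t, b, min_identity)]
            for t in tb
        ]
    votes = {}
    for i in range(len(tb)):
        for j in range(i + 1, min(i + max_gap, len(tb) - 1) + 1):
            d = j - i
            # One vote per supporting sequence, however many matches it has
            votes[(i, j)] = sum(
                any(abs((q - p) - d) <= slack for p in h[i] for q in h[j])
                for h in hits.values()
            )
    return votes


def supported_positions(votes, k, min_votes):
    """Simplified region step: residues covered by any strongly voted pair."""
    covered = set()
    for (i, j), v in votes.items():
        if v >= min_votes:
            covered.update(range(i, i + k))
            covered.update(range(j, j + k))
    return sorted(covered)


# Toy input: the motif MKVLHDEF sits at a different offset in each sequence.
seqs = ["AAAAMKVLHDEFGGGG", "CCMKVLHDEFCCCCCC", "TTTTTTMKVLHDEFTT"]
votes = pair_votes(seqs, target=0, k=4, min_identity=1.0)
print(supported_positions(votes, k=4, min_votes=2))
```

Because votes depend only on the distance between the two blocks of a pair, not on where the pair sits in each sequence, the shared region is recovered even though its offset differs across the three toy sequences. A realistic version would likely replace the identity test with a substitution-matrix score.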

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Angelov, S., Khanna, S., Li, L., Pereira, F. (2004). ATDD: An Algorithmic Tool for Domain Discovery in Protein Sequences. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_18

  • DOI: https://doi.org/10.1007/978-3-540-30219-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23018-2

  • Online ISBN: 978-3-540-30219-3

  • eBook Packages: Springer Book Archive
