Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development

Journal of Computer-Aided Molecular Design

Abstract

Protein function prediction is one of the central problems in computational biology. We present a novel automated method for structure-based protein function prediction that uses libraries of local residue packing patterns common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph in which vertices are residues, labeled by residue name, and edges connect residues that are in geometric proximity. The approach has two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann’s subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
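The graph representation and motif query described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 8.0 Å distance cutoff, the toy coordinates, and all function names are assumptions made for illustration, and the search is plain backtracking rather than the indexed subgraph isomorphism procedure the abstract names. The last function shows one possible way to compare a motif count with a background distribution; the abstract does not specify the statistic actually used.

```python
from itertools import combinations
from math import dist


def build_graph(residues, cutoff=8.0):
    """Build a labeled proximity graph.

    residues: list of (residue_name, (x, y, z)) tuples (hypothetical input format).
    cutoff:   assumed distance threshold for a proximity edge.
    Returns (labels, adjacency) where adjacency maps vertex index -> set of neighbors.
    """
    labels = [name for name, _ in residues]
    adj = {i: set() for i in range(len(residues))}
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][1], residues[j][1]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return labels, adj


def find_motif(motif_labels, motif_edges, labels, adj):
    """Find all occurrences of a labeled motif by simple backtracking.

    motif_edges is a set of (u, v) pairs with u < v. An occurrence is an
    injective map from motif vertices to protein vertices that preserves
    labels and every motif edge (extra protein edges are allowed).
    """
    matches = []

    def extend(mapping):
        k = len(mapping)  # index of the next motif vertex to place
        if k == len(motif_labels):
            matches.append(tuple(mapping))
            return
        for v in range(len(labels)):
            if v in mapping or labels[v] != motif_labels[k]:
                continue
            # Each motif edge from an already placed vertex u to k must exist.
            if all(mapping[u] in adj[v] for u in range(k) if (u, k) in motif_edges):
                extend(mapping + [v])

    extend([])
    return matches


def background_fraction(query_count, background_counts):
    """Fraction of background proteins with at least as many motifs as the query."""
    hits = sum(1 for c in background_counts if c >= query_count)
    return hits / len(background_counts)


# Toy structure: three residues packed together and one distant residue.
toy = [("HIS", (0, 0, 0)), ("ASP", (5, 0, 0)), ("SER", (0, 5, 0)), ("GLY", (30, 0, 0))]
labels, adj = build_graph(toy)
print(find_motif(["SER", "HIS", "ASP"], {(0, 1), (1, 2)}, labels, adj))  # [(2, 0, 1)]
```

A smaller value from `background_fraction` would suggest that the query's motif count is unusual relative to unrelated proteins, which is the intuition behind the confidence measure the abstract describes.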

Notes

  1. Enzymes database http://www.ebi.ac.uk/thornton-srv/databases/enzymes, and flat file downloaded from http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/data/seqdata.dat.

Acknowledgments

These studies were supported by NIH grant GM068665 and NSF grant CCF-0523875.

Author information

Corresponding authors

Correspondence to Deepak Bandyopadhyay or Alexander Tropsha.

Electronic supplementary material

Below is the link to the electronic supplementary material.

PDF (271 KB)

About this article

Cite this article

Bandyopadhyay, D., Huan, J., Prins, J. et al. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J Comput Aided Mol Des 23, 773–784 (2009). https://doi.org/10.1007/s10822-009-9273-4

  • DOI: https://doi.org/10.1007/s10822-009-9273-4
