Abstract
Protein function prediction is a central problem in computational biology. We present an automated, structure-based method for protein function prediction that uses libraries of local residue packing patterns shared by most proteins in a known functional family. The method depends on representing a protein structure as a graph in which each vertex is a residue, labeled by residue name, and edges connect residues that are geometrically close. It proceeds in two steps. First, a fast subgraph mining algorithm finds all occurrences of family-specific labeled subgraphs (motifs) for every well-characterized protein structural and functional family. Second, a new structure is queried for occurrences of the set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference depends on the number of family-specific motifs found in the query structure compared with the distribution of those motifs in a large non-redundant database of proteins. The method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition, and active site templates fail to provide accurate annotation.
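The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 8.0 distance cutoff for proximity edges, the plain backtracking matcher (used here in place of the indexed Ullmann algorithm), and all function names are assumptions made for illustration only. The sketch builds a residue-labeled contact graph, tests whether a labeled motif occurs in it, counts motif hits, and compares the count with a background distribution.

```python
from itertools import combinations
from math import dist


def build_contact_graph(residues, cutoff=8.0):
    """residues: list of (residue_name, (x, y, z)).
    Returns (labels, adjacency sets). The cutoff is an assumed value."""
    labels = [name for name, _ in residues]
    adj = [set() for _ in residues]
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][1], residues[j][1]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return labels, adj


def has_motif(motif, graph):
    """True if the labeled motif occurs in graph with all motif edges preserved.
    Simple backtracking; a stand-in for an indexed subgraph isomorphism search."""
    m_labels, m_adj = motif
    g_labels, g_adj = graph

    def extend(mapping):
        k = len(mapping)  # index of the next motif vertex to place
        if k == len(m_labels):
            return True
        for cand in range(len(g_labels)):
            if cand in mapping or g_labels[cand] != m_labels[k]:
                continue
            # each motif edge to an already placed vertex must exist in graph
            if all(mapping[p] in g_adj[cand] for p in m_adj[k] if p < k):
                if extend(mapping + [cand]):
                    return True
        return False

    return extend([])


def motif_count(motifs, graph):
    """Number of motifs in the library that occur in the query graph."""
    return sum(has_motif(m, graph) for m in motifs)


def background_fraction(query_count, background_counts):
    """Fraction of background structures with at least as many motif hits
    as the query; smaller values suggest a more family-specific match."""
    return sum(c >= query_count for c in background_counts) / len(background_counts)


# Toy usage with made-up coordinates (hypothetical data).
toy = [("HIS", (0, 0, 0)), ("ASP", (5, 0, 0)), ("SER", (0, 5, 0)), ("GLY", (30, 0, 0))]
graph = build_contact_graph(toy)
triad = (["SER", "HIS", "ASP"], [{1, 2}, {0, 2}, {0, 1}])
hits = motif_count([triad], graph)
```

In this toy case the three mutually close residues form a triangle, so the triad motif is found, while the distant glycine has no edges. A real system would replace the backtracking search with an indexed matcher and derive the background counts from a non-redundant structure set.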
Notes
Enzymes database http://www.ebi.ac.uk/thornton-srv/databases/enzymes, and flat file downloaded from http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/data/seqdata.dat.
Acknowledgments
These studies were supported by NIH grant GM068665 and NSF grant CCF-0523875.
Cite this article
Bandyopadhyay, D., Huan, J., Prins, J. et al. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J Comput Aided Mol Des 23, 773–784 (2009). https://doi.org/10.1007/s10822-009-9273-4