Abstract
Determining the function of all proteins is a recurring theme in modern biology and medicine, but the sheer number of proteins makes experimental approaches impractical. For this reason, current efforts have turned to in silico function prediction to guide and accelerate the function determination process. One approach to predicting protein function is to search functionally uncharacterized protein structures (targets) for substructures (matches) that are geometrically and chemically similar to known active sites (motifs). Finding a match can imply that the target has an active site similar to the motif, suggesting functional homology.
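As a rough illustration of what searching a target for a match can look like, the sketch below brute-forces every label-compatible substructure of a small target and scores it against a motif. It is not the comparison algorithm used in the paper: it swaps in distance RMSD (the root-mean-square difference of internal pairwise distances), which needs no superposition but cannot tell a structure from its mirror image. The point representation, the residue labels, the coordinates, the function names `drmsd` and `best_match`, and the 1.0 threshold are all assumptions made for the example.

```python
from itertools import permutations
from math import dist, sqrt

# A point is (residue_label, (x, y, z)). Labels and coordinates are illustrative.

def drmsd(motif, candidate):
    """RMS difference between corresponding internal pairwise distances.

    Both arguments must have the same length (at least two points).
    """
    n = len(motif)
    squared = [
        (dist(motif[i][1], motif[j][1]) - dist(candidate[i][1], candidate[j][1])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sqrt(sum(squared) / len(squared))

def best_match(motif, target, threshold=1.0):
    """Return (score, substructure) for the best match within threshold, else None."""
    best = None
    for candidate in permutations(target, len(motif)):
        # Chemical compatibility: labels must agree position by position.
        if any(m[0] != c[0] for m, c in zip(motif, candidate)):
            continue
        score = drmsd(motif, candidate)
        if score <= threshold and (best is None or score < best[0]):
            best = (score, candidate)
    return best

# Hypothetical three-point motif and a target holding a translated copy of it.
motif = [("H", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("S", (0.0, 4.0, 0.0))]
target = [
    ("S", (10.0, 14.0, 10.0)),
    ("G", (20.0, 20.0, 20.0)),
    ("H", (10.0, 10.0, 10.0)),
    ("D", (13.0, 10.0, 10.0)),
]
result = best_match(motif, target)
```

Exhaustive enumeration is only workable for toy inputs like this one. Practical matchers prune the search, for example with geometric hashing.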
An effective function predictor requires effective motifs: motifs whose geometric and chemical characteristics are detected by comparison algorithms within functionally homologous targets (sensitive motifs) and are not detected within functionally unrelated targets (specific motifs). Designing effective motifs is a difficult open problem. Current approaches select and combine structural, physical, and evolutionary properties to design motifs that mirror functional characteristics of active sites.
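The two properties can be made concrete with the standard definitions of sensitivity (the fraction of functionally homologous targets that are matched) and specificity (the fraction of functionally unrelated targets that are not matched). The sketch below assumes those standard definitions; the abstract does not state the exact measures used in the paper, and the function name and target identifiers are hypothetical.

```python
def sensitivity_specificity(matched_ids, homolog_ids, all_target_ids):
    """Return (sensitivity, specificity) of a motif over a set of targets."""
    matched = set(matched_ids)
    homologs = set(homolog_ids)
    unrelated = set(all_target_ids) - homologs

    true_positives = len(matched & homologs)    # homologs the motif matched
    true_negatives = len(unrelated - matched)   # unrelated targets left unmatched

    sensitivity = true_positives / len(homologs) if homologs else float("nan")
    specificity = true_negatives / len(unrelated) if unrelated else float("nan")
    return sensitivity, specificity

# Hypothetical search: ten targets, four of them functional homologs.
all_targets = [f"t{i}" for i in range(1, 11)]
homologs = ["t1", "t2", "t3", "t4"]
matched = ["t1", "t2", "t3", "t9"]  # one homolog missed, one unrelated target hit

sens, spec = sensitivity_specificity(matched, homologs, all_targets)
```

In this toy case the motif finds three of the four homologs and wrongly matches one of the six unrelated targets.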
We present a new approach, Geometric Sieving (GS), which refines candidate motifs into optimized motifs with maximal geometric and chemical dissimilarity from all known protein structures. The paper discusses both the usefulness and the efficiency of GS. We show that candidate motifs from six well-studied proteins, including α-Chymotrypsin, Dihydrofolate Reductase, and Lysozyme, can be optimized with GS into motifs that are among the most sensitive and specific motifs obtainable from those candidates. For the same proteins, we also report results that relate evolutionarily important motifs to motifs that exhibit maximal geometric and chemical dissimilarity from all known protein structures. Our current observations show that GS is a powerful tool that can complement existing work on motif design and protein function prediction.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Chen, B.Y. et al. (2006). Geometric Sieving: Automated Distributed Optimization of 3D Motifs for Protein Function Prediction. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science(), vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_42
DOI: https://doi.org/10.1007/11732990_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)