Geometric Sieving: Automated Distributed Optimization of 3D Motifs for Protein Function Prediction

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2006)

Abstract

Determining the function of all proteins is a recurring theme in modern biology and medicine, but the sheer number of proteins makes experimental approaches impractical. For this reason, current efforts have turned to in silico function prediction to guide and accelerate function determination. One approach to predicting protein function is to search functionally uncharacterized protein structures (targets) for substructures (matches) that are geometrically and chemically similar to known active sites (motifs). Finding a match can imply that the target has an active site similar to the motif, suggesting functional homology.
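The idea of a match can be illustrated with a small sketch. It is not the comparison algorithm used in the paper, which the abstract does not describe. It assumes a hypothetical representation in which each motif or target point is a residue label plus a 3D coordinate, and it treats a candidate as a match when the labels agree and every pairwise distance agrees within a tolerance `tol` (an assumed parameter). Comparing distances avoids an explicit superposition but cannot tell mirror images apart.

```python
from itertools import permutations
from math import dist

# Hypothetical representation: each point is (residue_label, (x, y, z)).


def is_match(motif, candidate, tol=1.0):
    """Return True if labels agree position by position and all
    inter-point distances differ by at most tol."""
    if len(motif) != len(candidate):
        return False
    if any(m[0] != c[0] for m, c in zip(motif, candidate)):
        return False
    n = len(motif)
    for i in range(n):
        for j in range(i + 1, n):
            d_motif = dist(motif[i][1], motif[j][1])
            d_cand = dist(candidate[i][1], candidate[j][1])
            if abs(d_motif - d_cand) > tol:
                return False
    return True


def find_matches(motif, target, tol=1.0):
    """Brute-force search over ordered subsets of target points.
    Only practical for very small inputs; shown for illustration."""
    return [
        list(c)
        for c in permutations(target, len(motif))
        if is_match(motif, c, tol)
    ]


# Toy data (invented): the target contains a translated copy of the motif.
motif = [("H", (0, 0, 0)), ("D", (3, 0, 0)), ("S", (0, 4, 0))]
target = [("S", (1, 5, 1)), ("G", (9, 9, 9)), ("H", (1, 1, 1)), ("D", (4, 1, 1))]
matches = find_matches(motif, target)
```

A practical matcher would prune the search and score the residual deviation rather than enumerate permutations.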

An effective function predictor requires effective motifs: motifs whose geometric and chemical characteristics are detected by comparison algorithms within functionally homologous targets (sensitive motifs) and are not detected within functionally unrelated targets (specific motifs). Designing effective motifs is a difficult open problem. Current approaches select and combine structural, physical, and evolutionary properties to design motifs that mirror functional characteristics of active sites.
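Sensitivity and specificity as described above can be made concrete with the standard definitions: the fraction of functionally homologous targets in which the motif is detected, and the fraction of unrelated targets in which it is not. The sketch below assumes targets are identified by hashable IDs; the function name and arguments are illustrative and do not come from the paper.

```python
def sensitivity_specificity(matched, homologs, all_targets):
    """Compute (sensitivity, specificity) for one motif.

    matched:     IDs of targets in which the motif was detected
    homologs:    IDs of targets known to be functionally homologous
    all_targets: IDs of every target that was searched
    """
    matched, homologs, all_targets = set(matched), set(homologs), set(all_targets)
    unrelated = all_targets - homologs

    true_positives = len(matched & homologs)
    true_negatives = len(unrelated - matched)

    sensitivity = true_positives / len(homologs) if homologs else float("nan")
    specificity = true_negatives / len(unrelated) if unrelated else float("nan")
    return sensitivity, specificity


# Toy example with invented IDs: one of two homologs is found,
# and one of three unrelated targets is falsely matched.
sens, spec = sensitivity_specificity(
    matched={"a", "c"},
    homologs={"a", "b"},
    all_targets={"a", "b", "c", "d", "e"},
)
```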

We present a new approach, Geometric Sieving (GS), which refines candidate motifs into optimized motifs with maximal geometric and chemical dissimilarity from all known protein structures. The paper discusses both the usefulness and the efficiency of GS. We show that candidate motifs from six well-studied proteins, including α-Chymotrypsin, Dihydrofolate Reductase, and Lysozyme, can be optimized with GS into motifs that are among the most sensitive and specific motifs obtainable from those candidates. For the same proteins, we also report results that relate evolutionarily important motifs to motifs that exhibit maximal geometric and chemical dissimilarity from all known protein structures. Our current observations show that GS is a powerful tool that can complement existing work on motif design and protein function prediction.
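The refinement step can be pictured as a subset search: among the sub-motifs that can be formed from a candidate motif's points, keep the one that scores as most dissimilar from a background of known structures. The abstract specifies neither how dissimilarity is measured nor how the search is organized, so the sketch below makes two assumptions: the caller supplies a hypothetical `dissimilarity` scoring function, and subsets of a fixed size `k` are enumerated exhaustively.

```python
from itertools import combinations


def sieve(candidate_points, k, dissimilarity):
    """Return (best_subset, best_score) over all k-point subsets.

    dissimilarity is a caller-supplied function (hypothetical here) that
    maps a tuple of motif points to a number; larger means less similar
    to the background of known structures.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(candidate_points, k):
        score = dissimilarity(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy scoring with invented per-point weights standing in for a real
# structure-based measure.
weights = {"A": 1, "B": 5, "C": 2, "D": 4}
best, score = sieve(
    ["A", "B", "C", "D"],
    k=2,
    dissimilarity=lambda pts: sum(weights[p] for p in pts),
)
```

Each subset is scored independently of the others, so the loop could be split across workers; the sketch keeps it serial for clarity.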

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, B.Y. et al. (2006). Geometric Sieving: Automated Distributed Optimization of 3D Motifs for Protein Function Prediction. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_42

  • DOI: https://doi.org/10.1007/11732990_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33295-4

  • Online ISBN: 978-3-540-33296-1

  • eBook Packages: Computer Science, Computer Science (R0)
