Analysis and prediction of loop segments in protein structures

https://doi.org/10.1016/j.compchemeng.2004.07.017

Abstract

The accurate modeling of loop segments in proteins is an important component of the overall protein folding problem. The challenge of the protein folding problem is to understand and predict the formation of the native three-dimensional structure of a protein given its primary amino acid sequence. In this paper, two methods are introduced to determine the structure of loop segments within the context of ASTRO-FOLD, an overall approach for the structure prediction of proteins. These approaches address a more difficult problem than traditional loop prediction, because the separation distances between the loop stem regions are not assumed to be known a priori. Even with these additional degrees of freedom, the proposed methods perform extremely well, owing to both new modeling and new algorithmic developments. The methods are validated on a testbed of benchmark protein systems, as well as on a number of blind predictions from the recent CASP5 experiment.

Introduction

Loops, the segments that connect elements of secondary structure in the protein fold, are often exposed, surface features of the protein structure. As a result, loops can determine differences in binding and activity within a fold family, because functional variability is often related to structural differences in these exposed regions.

Exploring the conformational space of a loop segment is difficult given the large structural variability often observed in the loop regions of experimentally determined protein structures. For example, it is not unusual for loop fragments with identical seven- or nine-residue sequences to adopt highly dissimilar structures. These difficulties are compounded by the typically low sequence identity among loop segments, which often makes comparative modeling techniques inaccurate. As a result, loop conformations are predicted in a manner similar to general protein structure prediction. Two types of approaches are typically pursued: those based on the optimization of energy functions (Fiser et al., 2000; Rapp & Friesner, 1999; Xiang et al., 2002), and those guided by statistical analyses of loop conformations (Donate et al., 1996; Tramontano & Lesk, 1992; Greer, 1980).

Optimization-based methods attempt to treat the loop prediction problem in a general manner. Loop segments can be described through a variety of all-atom, united-atom, or continuum representations. If the loop stems are fixed a priori, a number of algorithms can be used to generate feasible loop conformations (Shenkin et al., 1987; Wedemeyer & Scheraga, 1999). A free energy function models the interactions within the loop as well as those between the loop and its environment, and prediction requires minimizing this free energy to identify the most stable loop conformation. The principal difficulty is the need for a force field accurate enough to model the loop segment correctly.
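As a rough illustration of the fixed-stem, optimization-based route, the sketch below uses a hypothetical toy model: a planar chain of unit-length bonds whose N-terminal stem is pinned at one point and whose C-terminal stem is a fixed target point. Each random start is closed onto the target with cyclic coordinate descent (CCD, a standard loop-closure technique swapped in here, not one of the algorithms cited above), closed conformations are scored with a made-up Lennard-Jones-like energy, and the lowest-energy one is kept. The bead model, energy, and all function names are assumptions; picking the best of many closed samples only stands in for a true free energy minimization.

```python
import math
import random


def positions(start, angles, bond=1.0):
    """Bead coordinates of a planar chain built from absolute bond directions (radians)."""
    x, y = start
    pts = [(x, y)]
    for a in angles:
        x += bond * math.cos(a)
        y += bond * math.sin(a)
        pts.append((x, y))
    return pts


def ccd_close(start, target, angles, bond=1.0, tol=1e-3, max_sweeps=500):
    """Cyclic coordinate descent: rotate the chain tail about each joint so that the
    free end points at the fixed target stem. Returns (angles, closed)."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = positions(start, angles, bond)
            (px, py), (ex, ey) = pts[i], pts[-1]
            delta = (math.atan2(target[1] - py, target[0] - px)
                     - math.atan2(ey - py, ex - px))
            for j in range(i, len(angles)):  # rotating about joint i turns every later bond
                angles[j] += delta
        if math.dist(positions(start, angles, bond)[-1], target) < tol:
            return angles, True
    return angles, False


def toy_energy(pts, sigma=1.0, r_floor=0.5):
    """Made-up Lennard-Jones-like energy over beads at least two bonds apart."""
    e = 0.0
    for i in range(len(pts)):
        for j in range(i + 2, len(pts)):
            r = max(math.dist(pts[i], pts[j]), r_floor)  # floor avoids overflow as r -> 0
            s6 = (sigma / r) ** 6
            e += s6 * s6 - s6
    return e


def predict_loop(start, target, n_bonds, n_samples=50, seed=0):
    """Sample random starts, close each onto the stems, keep the lowest-energy loop."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        init = [rng.uniform(-math.pi, math.pi) for _ in range(n_bonds)]
        angles, closed = ccd_close(start, target, init)
        if not closed:
            continue  # discard starts that never reached the C-terminal stem
        pts = positions(start, angles)
        e = toy_energy(pts)
        if best is None or e < best[0]:
            best = (e, pts)
    return best  # (energy, bead coordinates), or None if no start closed


# Eight-bond loop between stems fixed 4 bond lengths apart.
best = predict_loop((0.0, 0.0), (4.0, 0.0), n_bonds=8)
```

Note that this sketch depends on knowing the stem positions in advance, which is exactly the assumption the approaches in this paper do not impose.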

Statistical methods rely on identifying database-derived loop segments that fit the flanking residues on either side of a loop. A number of candidate segments are first identified and then discriminated according to geometric or energetic criteria, with structural refinement used to rank the remaining candidates. Statistical methods can be accurate for a specific class of loops, or for loops that are well represented among a suite of homologous sequences. However, these approaches suffer from their dependence on the database and from limited sampling, since the number of allowable conformations grows exponentially with segment length.
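A minimal sketch of the database-driven filtering step, assuming each stored fragment is summarized by its loop length, the distance between its two flanking stem Cα atoms, and a precomputed energetic score. The `Fragment` record, its fields, the tolerance, and the database entries are all hypothetical and made up for illustration. The second function makes the exponential growth concrete: with k discrete backbone states per residue, an n-residue loop has k**n conformations.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str      # hypothetical identifier of a database loop
    length: int    # number of loop residues
    span: float    # distance (Å) between the two flanking stem C-alpha atoms
    score: float   # hypothetical energetic score; lower is better


def candidate_loops(database, loop_length, stem_span, tol=1.0):
    """Geometric filter on length and stem-to-stem span, then rank by score."""
    hits = [f for f in database
            if f.length == loop_length and abs(f.span - stem_span) <= tol]
    return sorted(hits, key=lambda f: f.score)


def conformation_count(states_per_residue, n_residues):
    """Discrete conformations of a loop with k states per residue: k**n."""
    return states_per_residue ** n_residues


database = [  # made-up entries for illustration only
    Fragment("frag1", 5, 9.8, -3.1),
    Fragment("frag2", 5, 12.5, -4.0),
    Fragment("frag3", 5, 10.4, -2.2),
    Fragment("frag4", 7, 10.1, -5.0),
]
print([f.name for f in candidate_loops(database, 5, 10.0)])  # ['frag1', 'frag3']
print(conformation_count(3, 7), conformation_count(3, 12))    # 2187 531441
```

Even at an assumed three states per residue, a 12-residue loop already has 531,441 discrete conformations, far more than a fixed-size fragment database can cover.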

In this work, two novel approaches are introduced to explore the conformations of loop segments. Their goal is to aid the successful ab initio structure prediction of proteins using the ASTRO-FOLD methodology (Klepeis & Floudas, 2003a, 2003b). Because both methods are designed for a truly ab initio framework, only minimal information about the structure of the residues flanking the loop segment is assumed to be known. Most importantly, an assumption common to many existing loop models, namely that the orientation and distance between the flanking loop stem residues are fixed, is not imposed. Both methods are based directly on the optimization of energy functions.

In what follows, a brief introduction to the ASTRO-FOLD methodology is first given to describe the conditions under which the loop prediction approaches are designed to operate. This is followed by a detailed description of the modeling and optimization components of each approach. Finally, loop prediction results are presented for a set of benchmark proteins, as well as for a number of proteins from the recent CASP5 experiment.


Modeling and computational methodology

Before providing the details of the loop prediction approaches, it is instructive to understand the context in which these methods are used. Although these loop prediction methods can be employed independently, they were developed specifically for application to the ASTRO-FOLD ab initio structure prediction approach (Klepeis & Floudas, 2003a, 2003b). A schematic illustration of the ASTRO-FOLD methodology is given in Fig. 1. ASTRO-FOLD is a four-stage approach that

Results and discussion

The loop prediction approaches have been applied, within the ASTRO-FOLD framework, to a number of test systems. Initial tests included a number of benchmark case studies for protein structure prediction (Klepeis & Floudas, 2003a). More recently, a set of results has been compiled from blind predictions for a number of protein systems in the CASP5 experiment (Klepeis & Floudas, 2003b).

Conclusions

The presented loop prediction approaches play an important role in restraining and focusing the conformational searches used in treating the overall three-dimensional structure prediction problem. In particular, these restraints take the form of reduced ϕ and ψ domains as well as internal interatomic distance restraints for those residues connecting consecutive elements of secondary structure. The bounds are extracted from the set of low free energy conformers identified from conformational
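A minimal sketch of that bound extraction, assuming each conformer carries a free energy, per-residue ϕ and ψ values in degrees, and one interatomic distance (for example between two Cα atoms of the connecting residues). The energy window `delta_e`, the dictionary layout, the data, and the function names are hypothetical. Because ϕ and ψ are periodic, each reduced domain is taken as the smallest arc covering the selected values rather than a plain min/max.

```python
def circular_bounds(angles_deg):
    """Smallest arc (degrees) containing every angle. Returns (lo, hi) with
    lo in [-180, 180) and hi = lo + width, so hi > 180 means the arc wraps."""
    a = sorted(x % 360.0 for x in angles_deg)
    n = len(a)
    gaps = [a[i + 1] - a[i] for i in range(n - 1)] + [a[0] + 360.0 - a[-1]]
    k = max(range(n), key=gaps.__getitem__)  # the largest empty gap lies outside the arc
    start = a[(k + 1) % n]
    lo = (start + 180.0) % 360.0 - 180.0
    return lo, lo + (360.0 - gaps[k])


def extract_restraints(conformers, delta_e):
    """Reduced phi/psi domains and a distance bound from the conformers
    within delta_e of the lowest free energy."""
    e_min = min(c["energy"] for c in conformers)
    low = [c for c in conformers if c["energy"] <= e_min + delta_e]
    n_res = len(low[0]["phi"])
    phi = [circular_bounds([c["phi"][r] for c in low]) for r in range(n_res)]
    psi = [circular_bounds([c["psi"][r] for c in low]) for r in range(n_res)]
    dists = [c["dist"] for c in low]
    return phi, psi, (min(dists), max(dists))


conformers = [  # made-up two-residue loop conformers
    {"energy": -10.0, "phi": [-170.0, -60.0], "psi": [-40.0, 140.0], "dist": 6.1},
    {"energy": -9.5, "phi": [175.0, -70.0], "psi": [-50.0, 150.0], "dist": 6.8},
    {"energy": -9.8, "phi": [-160.0, -65.0], "psi": [-45.0, 135.0], "dist": 6.4},
    {"energy": -2.0, "phi": [60.0, 60.0], "psi": [60.0, 60.0], "dist": 12.0},
]
phi_b, psi_b, d_b = extract_restraints(conformers, delta_e=1.0)
print(phi_b)  # [(175.0, 200.0), (-70.0, -60.0)]
print(psi_b)  # [(-50.0, -40.0), (135.0, 150.0)]
print(d_b)    # (6.1, 6.8)
```

The first ϕ domain, (175, 200), wraps through ±180 and corresponds to 175° to -160°; a naive min/max over the raw values would instead return nearly the full circle.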

Acknowledgments

CAF gratefully acknowledges financial support from the National Science Foundation and the National Institutes of Health (R01 GM52032).

References (32)

  • Deisenhofer, J., et al. (1975). Crystallographic refinement of the structure of bovine pancreatic trypsin inhibitor at 1.5 Å resolution. Acta Crystallographica Section B.
  • Donate, L., et al. (1996). Conformational analysis and clustering of short and medium size loops connecting regular secondary structures: a database for modeling and prediction. Protein Science.
  • Fiser, A., et al. (2000). Modeling of loops in protein structures. Protein Science.
  • Floudas, C. Deterministic global optimization in design, control, and computational chemistry.
  • Floudas, C.A. (2000). Deterministic global optimization: theory, methods and applications. Nonconvex optimization and its applications.
  • Gallagher, T., et al. (1994). Two crystal structures of the B1 immunoglobulin-binding domain of streptococcal protein G and comparison with NMR. Biochemistry.