Research Article
Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction

https://doi.org/10.1016/j.compbiolchem.2018.02.003

Highlights

  • A plug-in method for guiding exploration in conformational space with Lipschitz underestimation is proposed for protein structure prediction (PSP).

  • The constructed lower-bound estimates can be used to guide exploration.

  • The invalid sampling areas can be eliminated in advance.

  • The number of energy function evaluations can be reduced.

  • Tests on 15 target proteins verify the effectiveness of the proposed method.

Abstract

Computing protein conformations, which are essential for associating structural and functional information with gene sequences, is challenging because of the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the conformational space should be reduced to a proper level, and an effective exploration algorithm is needed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) is proposed for ab-initio protein structure prediction. The conformational space is first converted into an ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space is further converted into an underestimation space according to Lipschitz estimation theory in order to guide exploration. As a consequence of using the underestimation model, tight lower-bound estimates can be used to guide exploration, invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique for the exploration problem in protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm available in Rosetta. When LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. The LUE-enhanced algorithms are then compared with DE and MMC on 15 small-to-medium, structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with LUE.

Introduction

Proteins are a class of macromolecules that play a vital role in all biological organisms. They form the basis of cellular and molecular life and significantly affect the structural and functional characteristics of cells and genes (Gu and Bourne, 2009). Determining the structure and biological activity state of a protein molecule is important for understanding and curing many diseases caused by changes in protein structure. Predicting the 3D structure of protein molecules, that is, predicting the tertiary structure directly from the amino acid sequence according to the principle of Anfinsen (Anfinsen, 1973), has become a major research topic and an important task in current bioinformatics. Because of the complexity and high dimensionality of the protein conformational space, obtaining near-native protein conformations by computation is an NP-hard problem. To overcome this bottleneck in ab-initio protein structure prediction, an algorithm that explores the conformational space efficiently needs to be developed (Kim et al., 2009).

Because it predicts structure from sequence information alone, ab-initio protein structure prediction is of great significance for protein molecular design and protein folding research. A variety of methods have been developed for ab-initio protein structure construction (Xu and Zhang, 2013), ranging from atomic-level molecular dynamics simulation (Brooks et al., 1983, Case et al., 1997) to reduced-level physics-based (Liwo et al., 1993, Klepeis and Floudas, 2003) and knowledge-based (Simons et al., 1997, Xu and Zhang, 2012) Monte Carlo (MC) assembly, to topology-level fold enumeration (Taylor et al., 2008), and to residue-contact-constrained conformational reconstruction (Wu et al., 2011, Marks et al., 2011). Rosetta (Leaver et al., 2011), developed by the research group of Baker, and QUARK (Xu and Zhang, 2012), developed by Zhang, are ab-initio methods that performed well in previous CASP rounds and are among the leading ab-initio prediction methods internationally.

Owing to the Lennard-Jones energy term of the protein molecule, the number of local minima in the conformational space grows quadratically with sequence length. The energy surface therefore becomes too rugged to explore easily because of the increased energy barriers in conformational space. The time needed to compute a conformation also grows quadratically with chain length, so the exploration process consumes a large amount of computing resources in evaluating the energy of generated conformations. However, not every conformation generated during exploration helps in finding near-native conformations. Excluding invalid regions and meaningless conformations in advance, saving energy evaluations, and leading the exploration toward more promising areas is therefore an important task in protein structure prediction.

ACUE (Hao et al., 2016), proposed in our previous work, is an independent algorithm that helps reduce the number of function evaluations. It relies mainly on the abstract convex underestimation technique (Zhou et al., 2016a), a deterministic global optimization method originally used mainly for solving integer programming problems. The technique achieves asymptotic convergence of the upper and lower estimate bounds of the original optimization problem by constructing a series of convex relaxed models, so that the bounds eventually converge to the global optimal solution of the original problem. Abstract convexity theory (Andramonov et al., 1999) provides strong theoretical support for deterministic global optimization (Rubinov, 2000).

LUE, a new plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction, is proposed on the basis of Lipschitz estimation theory. Unlike ACUE (Hao et al., 2016), LUE uses the Lipschitz method (Rubinov and Andramonov, 1999) to obtain a more accurate underestimation of the original problem without model transformation or sample training, and it can be ported to other prediction algorithms. The most significant contributions of LUE are the use of tight lower-bound estimates for exploration guidance, the advance elimination of invalid sampling areas, and the reduction in the number of energy evaluations as a consequence of using the underestimation model. LUE provides a novel technique for the exploration problem in protein conformational space. In this study, LUE is applied to the DE algorithm (Storn and Price, 1997) and to the MMC algorithm (Leaver et al., 2011) available in Rosetta. When LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. The LUE-enhanced algorithms are then compared with DE and MMC on 15 small-to-medium, structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with LUE.

Differential evolution

Differential evolution (Storn and Price, 1997) is a population-based stochastic search algorithm for global optimization that has been successfully applied to protein structure prediction problems (Hao et al., 2016, Hao et al., 2017, Xin et al., 2010, Zhou et al., 2016b, Zhang et al., 2017). Basic DE proceeds as follows. An initial population of N individuals is randomly sampled from the feasible solution space Ω. During each generation, mutation operation is performed for each target
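The basic DE loop outlined above can be sketched as follows. This is a minimal illustration of the common DE/rand/1/bin variant, not the authors' implementation: the function name, the parameter names (`f_scale`, `cr`), their default values, the clipping step, and the toy `sphere` objective standing in for a protein energy function are all assumptions made for the example.

```python
import random


def differential_evolution(energy, bounds, pop_size=20, f_scale=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimise `energy` over the box `bounds` with DE/rand/1/bin (sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population: N individuals sampled uniformly from the feasible box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [energy(x) for x in pop]
    for _ in range(generations):
        next_pop, next_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f_scale * (pop[b][d] - pop[c][d])
                      for d in range(dim)]
            # Binomial crossover; j_rand guarantees one component from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Clip the trial back into the feasible box (an assumed repair rule).
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Greedy selection: the trial replaces the target only if not worse.
            f_trial = energy(trial)
            if f_trial <= fit[i]:
                next_pop[i], next_fit[i] = trial, f_trial
        pop, fit = next_pop, next_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    """Toy objective used in place of a protein energy function."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Every trial vector costs one call to `energy`, which is the step an underestimation-based screen would aim to skip for unpromising trials.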

Lipschitz UnderEstimate (LUE) method

The main ideas of LUE are shown in Fig. 1. The most significant contributions of the proposed LUE are the use of tight lower-bound estimates for exploration guidance, the advance elimination of invalid sampling areas, and the reduction in the number of evaluations as a consequence of using the underestimation model. When LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. LUE provides a novel technique to solve the
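The screening idea can be illustrated with the standard Lipschitz lower bound. If the energy f satisfies |f(x) - f(y)| <= K * ||x - y||, then each already-evaluated point x_k gives f(x) >= f(x_k) - K * ||x - x_k||, and the maximum over all evaluated points is the tightest such bound. The sketch below is a generic illustration under that assumption, not the paper's exact model: the function names, the use of Euclidean distance, the constant K, and the 1D toy data are hypothetical, and in LUE the points would be feature vectors rather than raw coordinates.

```python
import math


def lipschitz_lower_bound(x, samples, lipschitz_k):
    """Lower bound on f(x) from evaluated samples [(x_k, f_k), ...].

    Valid only if lipschitz_k is a true Lipschitz constant of f; an
    underestimated constant can make the bound exceed the real energy.
    """
    return max(f_k - lipschitz_k * math.dist(x, x_k) for x_k, f_k in samples)


def worth_evaluating(trial, target_energy, samples, lipschitz_k):
    """Return False when the bound already proves the trial cannot win.

    If the lower bound exceeds the target's energy, the true energy of the
    trial must also exceed it, so the expensive evaluation can be skipped.
    """
    return lipschitz_lower_bound(trial, samples, lipschitz_k) <= target_energy


# Toy example: f(x) = |x| is 1-Lipschitz; two points have been evaluated.
samples = [((-2.0,), 2.0), ((1.0,), 1.0)]

bound_far = lipschitz_lower_bound((-3.0,), samples, 1.0)   # 1.0
bound_near = lipschitz_lower_bound((0.0,), samples, 1.0)   # 0.0

skip_far = not worth_evaluating((-3.0,), 0.5, samples, 1.0)   # True: skipped
keep_near = worth_evaluating((0.0,), 0.5, samples, 1.0)       # True: evaluated
```

In this sketch a rejected trial costs only distance computations against the stored samples. The quality of the screen depends on how well K is estimated: a larger K is safer but gives a looser bound and rejects fewer trials.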

Experiments and results

The ability of a method to reproduce conformations that populate the protein native state provides an important benchmark in protein structure prediction (Ding et al., 2008). LUE is applied to the local enhancement DE (LEDE) algorithm presented in Hao et al. (2016) and to the Rosetta-MMC algorithm (we name the integrated algorithms DELUE and RoLUE, respectively), and the integrated algorithms are compared with the original ones on 15 structurally diverse protein sequences of varying lengths. The computed conformations are compared with

Discussion

Computing conformations, which are essential for associating structural and functional information with gene sequences, is difficult owing to the high dimensionality and rugged energy surface of the protein conformational space. LUE is proposed for guiding exploration in conformational feature space with Lipschitz underestimation in ab-initio protein structure prediction, and thus provides a way to address the exploration problem.

LUE, proposed in this study, transforms the high

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61773346 and 61573317). The authors would like to thank the anonymous reviewers for their insightful comments and useful suggestions.

References (31)

  • P.J. Ballester et al. Ultrafast shape recognition to search compound databases for similar molecular shapes. J. Comput. Chem. (2007)

  • G. Beliakov. Extended cutting angle method of global optimization. Pac. J. Optim. (2008)

  • B.R. Brooks et al. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. (1983)

  • D.A. Case et al. AMBER 5.0 (1997)

  • J. Gu et al. Structural Bioinformatics (Methods of Biochemical Analysis) (2009)