Research Article
Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction
Introduction
Proteins are a class of macromolecules present in all biological organisms. They form the basis of cellular and molecular life and significantly affect the structural and functional characteristics of cells and genes (Gu and Bourne, 2009). Determining the structure and biological activity state of a protein molecule helps in understanding and treating the many diseases caused by changes in protein structure. Predicting the 3D structure of protein molecules has therefore become a major research topic and an important task in bioinformatics; following Anfinsen's principle (Anfinsen, 1973), the tertiary structure is predicted directly from the amino acid sequence. Because the protein conformational space is complex and high-dimensional, computing near-native protein conformations is an NP-hard problem. To overcome this bottleneck in ab-initio protein structure prediction, an algorithm that explores the conformational space efficiently needs to be developed (Kim et al., 2009).
Because it predicts structure from sequence information alone, ab-initio protein structure prediction is of great significance for protein molecular design and protein folding research. A variety of methods have been developed for ab-initio protein structure construction (Xu and Zhang, 2013), ranging from atomic-level molecular dynamics simulation (Brooks et al., 1983, Case et al., 1997), through reduced-level physics-based (Liwo et al., 1993, Klepeis and Floudas, 2003) and knowledge-based (Simons et al., 1997, Xu and Zhang, 2012) Monte Carlo (MC) assembly, to topology-level fold enumeration (Taylor et al., 2008) and residue-contact-constrained conformational reconstruction (Wu et al., 2011, Marks et al., 2011). Rosetta (Leaver et al., 2011), developed by the research group of Baker, and QUARK (Xu and Zhang, 2012), developed by Zhang, are ab-initio methods that performed well in previous CASP experiments and are among the leading ab-initio prediction methods internationally.
Owing to the Lennard-Jones energy term of the protein molecule, the number of local minima in the conformational space grows quadratically with sequence length. The energy surface thus becomes too rugged to explore because of the increased energy barriers in the conformational space. The time needed to compute a conformation also grows quadratically with chain length, so the exploration process consumes a large amount of computing resources in evaluating the energy of generated conformations. However, not every conformation generated during exploration helps in finding near-native conformations. Excluding invalid regions and meaningless conformations in advance, reducing the number of energy evaluations, and leading the exploration toward more promising areas is therefore an important task in protein structure prediction.
ACUE (Hao et al., 2016), proposed in our previous work, is an independent algorithm that helps reduce the number of function evaluations. It mainly relies on the abstract convex underestimate technique (Zhou et al., 2016a), a deterministic global optimization method originally used for solving integer programming problems. The technique constructs a series of convex relaxed models whose upper and lower estimate bounds on the original optimization problem converge asymptotically, eventually reaching the global optimal solution of the original problem. Abstract convex theory (Andramonov et al., 1999) provides strong theoretical support for deterministic global optimization (Rubinov, 2000).
In this study we propose LUE, a new plug-in method based on Lipschitz estimation theory that guides exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction. Unlike ACUE (Hao et al., 2016), LUE uses the Lipschitz method (Rubinov and Andramonov, 1999) to obtain a more accurate underestimation of the original problem without requiring model transformation or sample training, and it can be ported to other prediction algorithms. The most significant contributions of the proposed LUE are as follows: the use of tight lower-bound estimate information for exploration guidance, the advance elimination of invalid sampling areas, and the reduction of the number of evaluations as a consequence of using the underestimation model. LUE provides a novel technique for exploring the protein conformational space. In this study, LUE is applied to the DE algorithm (Storn and Price, 1997) and to the MMC algorithm (Leaver et al., 2011) available in Rosetta. When LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. LUE is then compared with DE and MMC on 15 small-to-medium, structurally diverse proteins. The test results show that LUE obtains near-native protein structures with higher accuracy more rapidly and efficiently.
Section snippets
Differential evolution
Differential evolution (Storn and Price, 1997) is a population-based stochastic search algorithm for global optimization, which has been successfully used for protein structure prediction problems (Hao et al., 2016, Hao et al., 2017, Xin et al., 2010, Zhou et al., 2016b, Zhang et al., 2017). Basic DE is conducted as follows. An initial population of N individuals is randomly sampled from the feasible solution space Ω. During each generation, a mutation operation is performed for each target
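The basic DE loop described above can be illustrated with a short sketch. Because the description here is cut off after the mutation step, the remaining steps below (binomial crossover and greedy one-to-one selection, i.e. the common DE/rand/1/bin scheme) are an assumption based on standard DE rather than the authors' exact variant. The function name `differential_evolution`, the parameter names `n`, `f_scale` and `cr`, and the toy sphere energy used to exercise it are all hypothetical.

```python
import random


def differential_evolution(energy, bounds, n=20, f_scale=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin sketch (assumed variant, not the paper's exact one).

    energy: callable mapping a list of floats to a float (lower is better).
    bounds: list of (low, high) pairs defining the feasible space.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population of n individuals sampled uniformly from the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    fit = [energy(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = [], []
        for i in range(n):
            # Mutation: combine three distinct individuals other than the target.
            a, b, c = rng.sample([k for k in range(n) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = list(pop[i])
            for j in range(dim):
                # Binomial crossover between the target and the mutant.
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f_scale * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    trial[j] = min(max(v, lo), hi)  # clip to the feasible space
            # Greedy selection: keep the trial only if it is not worse.
            e = energy(trial)
            if e <= fit[i]:
                new_pop.append(trial)
                new_fit.append(e)
            else:
                new_pop.append(pop[i])
                new_fit.append(fit[i])
        pop, fit = new_pop, new_fit
    best = min(range(n), key=lambda i: fit[i])
    return pop[best], fit[best]


if __name__ == "__main__":
    # Toy stand-in for an energy function; a real run would score conformations.
    sphere = lambda x: sum(v * v for v in x)
    print(differential_evolution(sphere, [(-5.0, 5.0)] * 3))
```

Note that every trial vector costs one call to `energy`; in structure prediction this call is the expensive step, which is what an underestimation-based screen aims to avoid.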
Lipschitz UnderEstimate (LUE) method
The main ideas of LUE are shown in Fig. 1. The most significant contributions of the proposed LUE are as follows: the use of tight lower-bound estimate information for exploration guidance, the advance elimination of invalid sampling areas, and the reduction of the number of evaluations as a consequence of using the underestimation model. When LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. LUE provides a novel technique to solve the
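The screening idea can be sketched with the classical Lipschitz lower bound: if the energy f has Lipschitz constant L, then for any already evaluated points x_k, f(x) >= max_k (f(x_k) - L * ||x - x_k||). The code below is a minimal illustration of that bound and of how it could gate an expensive energy call. It is not the authors' construction, whose details (feature-space mapping, choice of support points, estimation of L) are not given in this excerpt. The names `lipschitz_lower_bound` and `screen_trial`, the Euclidean distance, and the assumption that L is known are all hypothetical.

```python
import math


def lipschitz_lower_bound(x, samples, lipschitz_const):
    """Lower bound on f(x) from evaluated samples [(x_k, f(x_k)), ...].

    Valid only under the assumption that f is Lipschitz continuous with
    constant lipschitz_const with respect to the Euclidean distance.
    """
    return max(fx - lipschitz_const * math.dist(x, xs) for xs, fx in samples)


def screen_trial(trial, target_energy, samples, lipschitz_const):
    """Return True if the trial is worth an energy evaluation.

    If even the lower bound exceeds the target's energy, the trial cannot
    replace the target, so the expensive evaluation can be skipped.
    """
    return lipschitz_lower_bound(trial, samples, lipschitz_const) <= target_energy


if __name__ == "__main__":
    # Toy 1-D example with f(x) = |x|, which has Lipschitz constant 1.
    samples = [((-2.0,), 2.0), ((2.0,), 2.0)]
    for trial in [(0.0,), (3.0,)]:
        print(trial, screen_trial(trial, 0.5, samples, 1.0))
```

In a DE or MMC loop, a check like `screen_trial` would run before the energy call in the selection step, so rejected trials cost only distance computations. The tightness of the bound, and hence how many evaluations are saved, depends on how close the assumed constant is to the true one.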
Experiments and results
The ability of a method to reproduce conformations that populate the protein native state provides an important benchmark in protein structure prediction (Ding et al., 2008). LUE is applied to the local enhancement DE (LEDE) algorithm presented in Hao et al. (2016) and to the Rosetta-MMC algorithm (we name the integrated algorithms DELUE and RoLUE, respectively), and is compared with them on 15 structurally diverse protein sequences of varying lengths. The computed conformations are compared with
Discussion
Computing the conformations that are essential for associating structural and functional information with gene sequences is difficult owing to the high dimensionality and the rugged energy surface of the protein conformational space. LUE is proposed for guiding exploration in conformational feature space with the use of Lipschitz underestimation in ab-initio protein structure prediction, and thus provides a way to address the exploration problem.
LUE, proposed in this study, transforms the high
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61773346, 61573317). The authors would like to thank the anonymous reviewers for their insightful comments and useful suggestions.
References (31)
- et al. Cutting angle methods in global optimization. Appl. Math. Lett. (1999)
- et al. Ab-initio folding of proteins with all-atom discrete molecular dynamics. Structure (2008)
- et al. Sampling bottlenecks in de novo protein structure prediction. J. Mol. Biol. (2009)
- et al. ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. Biophys. J. (2003)
- et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. (1997)
- et al. Improving protein structure prediction using multiple sequence-based contact predictions. Structure (2011)
- et al. A novel differential evolution algorithm using local abstract convex underestimate strategy for global optimization. Comput. Oper. Res. (2016)
- et al. Enhanced differential evolution using local Lipschitz underestimate strategy for computationally expensive optimization problems. Appl. Soft Comput. (2016)
- Principles that govern the folding of protein chains. Science (1973)
- et al. Global minimization of increasing positively homogeneous functions over the unit simplex. Ann. Oper. Res. (2000)