Abstract
Protein design algorithms that model continuous sidechain flexibility and conformational ensembles better approximate the in vitro and in vivo behavior of proteins. The previous state of the art, iMinDEE-\(A^*\)-\(K^*\), computes provable \(\varepsilon \)-approximations to partition functions of protein states (e.g., bound vs. unbound) by computing provable, admissible pairwise-minimized energy lower bounds on protein conformations and using the \(A^*\) enumeration algorithm to return a gap-free list of lowest-energy conformations. iMinDEE-\(A^*\)-\(K^*\) runs in time sublinear in the number of conformations, but can be trapped in loosely bounded, low-energy conformational wells containing many conformations with highly similar energies. That is, iMinDEE-\(A^*\)-\(K^*\) is unable to exploit the correlation between protein conformation and energy: similar conformations often have similar energies. We introduce two new concepts that exploit this correlation: Minimization-Aware Enumeration and Recursive \(K^{*}\). We combine these two insights into a novel algorithm, Minimization-Aware Recursive \(K^{*}\) (\({ MARK}^{*}\)), that tightens bounds not on single conformations, but instead on distinct regions of the conformation space. We compare the performance of iMinDEE-\(A^*\)-\(K^*\) vs. \({ MARK}^{*}\) by running the \(BBK^*\) algorithm, which provably returns sequences in order of decreasing \(K^{*}\) score, using either iMinDEE-\(A^*\)-\(K^*\) or \({ MARK}^{*}\) to approximate partition functions. We show on 200 design problems that \({ MARK}^{*}\) not only enumerates and minimizes vastly fewer conformations than the previous state of the art, but also runs up to two orders of magnitude faster. Finally, we show that \({ MARK}^{*}\) not only efficiently approximates the partition function, but also provably approximates the energy landscape. To our knowledge, \({ MARK}^{*}\) is the first algorithm to do so.
We use \({ MARK}^{*}\) to analyze the change in energy landscape of the bound and unbound states of the HIV-1 capsid protein C-terminal domain in complex with camelid V\(_{\mathrm{{H}}}\)H, and measure the change in conformational entropy induced by binding. Thus, \({ MARK}^{*}\) both accelerates existing designs and offers new capabilities not possible with previous algorithms.
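The role of energy bounds in an \(\varepsilon \)-approximation can be illustrated with a minimal sketch. It is not the \({ MARK}^{*}\) implementation and does not use the OSPREY API: it only shows how per-conformation energy bounds translate into bounds on a partition function, and how a \(K^{*}\)-style score is formed as a ratio of partition functions. The function names, the toy energies, and the temperature are all assumptions made for illustration.

```python
import math

# Assumed constants (not taken from the paper): gas constant in kcal/(mol*K)
# and room temperature in kelvin.
R_KCAL = 0.0019872
TEMPERATURE_K = 298.15
RT = R_KCAL * TEMPERATURE_K


def partition_bounds(energy_bounds, rt=RT):
    """Return (z_lower, z_upper) for a list of (e_lower, e_upper) pairs.

    A lower bound on a conformation's energy gives an upper bound on its
    Boltzmann weight, and an upper bound on energy gives a lower bound.
    """
    z_lower = sum(math.exp(-e_up / rt) for _, e_up in energy_bounds)
    z_upper = sum(math.exp(-e_lo / rt) for e_lo, _ in energy_bounds)
    return z_lower, z_upper


def is_epsilon_approximation(z_lower, z_upper, epsilon):
    """Sufficient check: z_lower >= (1 - epsilon) * z_upper.

    Because the true partition function Z lies in [z_lower, z_upper],
    this implies z_lower >= (1 - epsilon) * Z.
    """
    return z_lower >= (1.0 - epsilon) * z_upper


def kstar_score(z_complex, z_protein, z_ligand):
    """Ratio of the bound-state partition function to the unbound ones."""
    return z_complex / (z_protein * z_ligand)


if __name__ == "__main__":
    # Hypothetical (lower, upper) energy bounds in kcal/mol.
    loose = [(-10.0, -6.0), (-9.5, -7.0), (-3.0, -1.0)]
    # Same conformations after tightening only the two low-energy ones.
    tight = [(-10.0, -9.99), (-9.5, -9.49), (-3.0, -1.0)]
    for name, bounds in (("loose", loose), ("tight", tight)):
        z_lo, z_hi = partition_bounds(bounds)
        print(name, is_epsilon_approximation(z_lo, z_hi, epsilon=0.1))
```

In this toy data, the high-energy conformation keeps its loose bounds in both cases, yet the check passes once the two low-energy conformations are tightened, since they dominate the Boltzmann sum.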
J. D. Jou and G. T. Holt contributed equally to this work.
Acknowledgements
We thank Goke Ojewole, Mark Hallen, Jeffrey Martin, Marcel Frenkel, Terrence Oas, Jane and Dave Richardson, Hong Niu, and all members of the lab for helpful discussions; Jeffrey Martin for software optimizations; and the NIH (R01-GM078031 and R01-GM118543 to BRD) for funding.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jou, J.D., Holt, G.T., Lowegard, A.U., Donald, B.R. (2019). Minimization-Aware Recursive \(K^{*}\) (\({ MARK}^{*}\)): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape. In: Cowen, L. (ed.) Research in Computational Molecular Biology. RECOMB 2019. Lecture Notes in Computer Science, vol 11467. Springer, Cham. https://doi.org/10.1007/978-3-030-17083-7_7
DOI: https://doi.org/10.1007/978-3-030-17083-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17082-0
Online ISBN: 978-3-030-17083-7
eBook Packages: Computer Science, Computer Science (R0)