Abstract
In this paper we investigate the protein sequence design (PSD) problem, also known as the inverse protein folding problem, under the Canonical model on 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph; (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P); (iii) an energy function Φ, defined in terms of the target structure, that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, Φ assigns a value of –1 to each H-H residue contact in the contact graph and a value of 0 to every other contact); and (iv) an upper bound λ on the ratio between H and P amino acids, which limits the number of H residues in the sequence S and so prevents the solution from being a biologically meaningless all-H sequence. The sequence S is designed by specifying which residues are H and which are P so that the global minimum of the energy function Φ is attained. In this paper, we prove the following results:
(1) An earlier proof [12] of the NP-completeness of finding the global energy minimum for the PSD problem on 3D lattices was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct, and we show that the global energy minimum for the PSD problem on 2D lattices can in fact be found in polynomial time. Nevertheless, we show that finding the global energy minimum for the PSD problem on 3D lattices is indeed NP-complete, by providing a different reduction from the problem of finding the largest clique in a graph.
(2) Even though finding the global energy minimum on 3D lattices is NP-complete, we show that an arbitrarily close approximation to it can be found efficiently, by providing a polynomial-time approximation scheme (PTAS) that takes appropriate combinations of optimal solutions for substrings of the sequence S. Our technique for designing this PTAS uses the shifted slice-and-dice approach of [6,17,18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minimum [12], which has a performance ratio of 1/2.
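To make the model in (i) to (iv) and the shifting idea behind result (2) concrete, here is a small Python sketch. It is our own illustration, not the authors' algorithm, and several details are assumptions: contacts are taken to be pairs of residues at unit lattice distance that are not consecutive along the chain, the ratio bound λ is converted into a cap of ⌊λn/(1+λ)⌋ H residues, and all function names are hypothetical. Because Φ contributes −1 per H-H contact, minimizing Φ with at most k H residues amounts to choosing k residues that induce the most contact-graph edges, i.e. a densest k-subgraph problem on the contact graph. `exact_design` solves this by brute force. `shifted_design` shows only a one-dimensional version of shifted slicing: for each of the w offsets it cuts the lattice into slabs of width w along x, solves every slab exactly for each H count, and splits the H budget across slabs with a knapsack-style dynamic program. A unit lattice edge is cut by at most one of the w offsets, so the best offset keeps at least a (1 - 1/w) fraction of the optimal contacts. The slabs are solved by exhaustive search, so this sketch is not polynomial-time and is not the paper's slice-and-dice PTAS.

```python
from fractions import Fraction
from itertools import combinations
import math


def contact_graph(coords):
    """Assumed convention: residues at unit lattice distance that are not chain neighbours."""
    index = {tuple(p): i for i, p in enumerate(coords)}
    edges = set()
    for i, p in enumerate(coords):
        for d in range(len(p)):  # works for 2D or 3D coordinates
            for step in (-1, 1):
                q = list(p)
                q[d] += step
                j = index.get(tuple(q))
                if j is not None and abs(i - j) > 1:
                    edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def h_budget(n, lam):
    """Assumed reading of the bound: largest |H| with |H| / |P| <= lam, where |P| = n - |H|."""
    lam = Fraction(lam)
    return math.floor(lam * n / (1 + lam))


def energy(contacts, h_set):
    """Canonical-model energy Phi: -1 per H-H contact, 0 for every other contact."""
    return -sum(1 for u, v in contacts if u in h_set and v in h_set)


def best_subset(members, contacts, size):
    """Brute force: the size-`size` subset of `members` inducing the most contacts."""
    best = (-1, ())
    for h in combinations(members, size):
        c = -energy(contacts, set(h))
        if c > best[0]:
            best = (c, h)
    return best


def exact_design(n, contacts, k):
    """Exact minimum of Phi with at most k H residues (exponential time).
    Extra H residues never remove contacts, so using exactly min(k, n) suffices."""
    c, h = best_subset(range(n), contacts, min(k, n))
    return -c, frozenset(h)


def shifted_design(coords, contacts, k, w):
    """Hypothetical 1D shifting sketch: slabs of width w along x, exact per-slab tables,
    a DP over how many H residues each slab receives, best result over all w offsets."""
    best_count, best_h = -1, ()
    for s in range(w):
        slab = [(p[0] - s) // w for p in coords]
        groups = {}
        for i, b in enumerate(slab):
            groups.setdefault(b, []).append(i)
        dp = {0: (0, ())}  # H residues used so far -> (contacts kept, H residues chosen)
        for b, members in groups.items():
            # contacts crossing a slab boundary are ignored for this offset
            inner = [(u, v) for u, v in contacts if slab[u] == b and slab[v] == b]
            table = [best_subset(members, inner, j) for j in range(min(k, len(members)) + 1)]
            new_dp = {}
            for used, (val, hs) in dp.items():
                for j, (c, h) in enumerate(table):
                    if used + j > k:
                        break
                    if used + j not in new_dp or val + c > new_dp[used + j][0]:
                        new_dp[used + j] = (val + c, hs + h)
            dp = new_dp
        count, hs = max(dp.values(), key=lambda t: t[0])
        if count > best_count:
            best_count, best_h = count, hs
    # report the true energy, which also counts any H-H contacts across slab boundaries
    return energy(contacts, set(best_h)), frozenset(best_h)


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]  # hypothetical 2D target fold
    contacts = contact_graph(coords)  # [(0, 5), (1, 4)]
    k = h_budget(len(coords), 2)  # lambda = 2 -> at most 4 H residues
    print(exact_design(len(coords), contacts, k))  # (-2, frozenset({0, 1, 4, 5}))
    print(shifted_design(coords, contacts, k, w=2))  # (-2, frozenset({0, 1, 4, 5}))
```

In this toy fold both contacts run along the y-axis, so no offset cuts them and the shifted sketch matches the exact optimum; on larger inputs it is only guaranteed the (1 - 1/w) fraction stated above.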
References
[1] Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily Finding a Dense Subgraph. Journal of Algorithms 34, 203–221 (2000)
[2] Asahiro, Y., Hassin, R., Iwama, K.: Complexity of finding dense subgraphs. Discrete Applied Mathematics 121, 15–26 (2002)
[3] Atkins, J., Hart, W.E.: On the intractability of protein folding with a finite alphabet of amino acids. Algorithmica 25(2–3), 279–294 (1999)
[4] Banavar, J., Cieplak, M., Maritan, A., Nadig, G., Seno, F., Vishveshwara, S.: Structure-based design of model proteins. Proteins: Structure, Function, and Genetics 31, 10–20 (1998)
[5] Berger, B., Leighton, T.: Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. Journal of Computational Biology 5(1), 27–40 (1998)
[6] Berman, P., DasGupta, B., Muthukrishnan, S.: Approximation Algorithms For MAX-MIN Tiling. Journal of Algorithms 47(2), 122–134 (2003)
[7] Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A., Yannakakis, M.: On the complexity of protein folding. Journal of Computational Biology, 423–466 (1998)
[8] Deutsch, J.M., Kurosky, T.: New algorithm for protein design. Physical Review Letters 76, 323–326 (1996)
[9] Dill, K.A., Bromberg, S., Yue, K., Fiebig, K.M., Yee, D.P., Thomas, P.D., Chan, H.S.: Principles of protein folding: A perspective from simple exact models. Protein Science 4, 561–602 (1995)
[10] Drexler, K.E.: Molecular engineering: An approach to the development of general capabilities for molecular manipulation. Proceedings of the National Academy of Sciences of the U.S.A. 78, 5275–5278 (1981)
[11] Feige, U., Seltser, M.: On the densest k-subgraph problems. Technical Report CS97-16, Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Israel, available online at http://citeseer.nj.nec.com/feige97densest.html
[12] Hart, W.E.: On the computational complexity of sequence design problems. In: Proceedings of the 1st Annual International Conference on Computational Molecular Biology, pp. 128–136 (1997)
[13] Hart, W.E., Istrail, S.: Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. Journal of Computational Biology 3(1), 53–96 (1996)
[14] Hart, W.E., Istrail, S.: Invariant patterns in crystal lattices: Implications for protein folding algorithms (extended abstract). In: Hirschberg, D.S., Myers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 288–303. Springer, Heidelberg (1996)
[15] Hart, W.E., Istrail, S.: Lattice and off-lattice side chain models of protein folding: Linear time structure prediction better than 86% of optimal. Journal of Computational Biology 4(3), 241–260 (1997)
[16] Heun, V.: Approximate protein folding in the HP side chain model on extended cubic lattices. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 212–223. Springer, Heidelberg (1999)
[17] Hochbaum, D.: Approximation Algorithms for NP-hard problems. PWS Publishing Company (1997)
[18] Hochbaum, D.S., Maass, W.: Approximation schemes for covering and packing problems in image processing and VLSI. Journal of the ACM 32(1), 130–136 (1985)
[19] Kleinberg, J.: Efficient Algorithms for Protein Sequence Design and the Analysis of Certain Evolutionary Fitness Landscapes. In: Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, pp. 226–237 (1999)
[20] Lau, K.F., Dill, K.A.: A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22, 3986–3997 (1989)
[21] Mauri, G., Pavesi, G., Piccolboni, A.: Approximation algorithms for protein folding prediction. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 945–946 (1999)
[22] Merz, K.M., Le Grand, S.M.: The Protein Folding Problem and Tertiary Structure Prediction. Birkhäuser, Boston (1994)
[23] Ponder, J., Richards, F.M.: Tertiary templates for proteins. Journal of Molecular Biology 193, 63–89 (1987)
[24] Sun, S.J., Brem, R., Chan, H.S., Dill, K.A.: Designing amino acid sequences to fold with good hydrophobic cores. Protein Engineering 8(12), 1205–1213 (1995)
[25] Shakhnovich, E.I., Gutin, A.M.: Engineering of stable and fast-folding sequences of model proteins. Proceedings of the National Academy of Sciences of the U.S.A. 90, 7195–7199 (1993)
[26] Smith, T.F., Conte, L.L., Bienkowska, J., Rogers, B., Gaitatzes, C., Lathrop, R.H.: The threading approach to the inverse protein folding problem. In: Proceedings of the 1st Annual International Conference on Computational Molecular Biology, pp. 287–292 (1997)
[27] Yue, K., Dill, K.A.: Inverse protein folding problem: Designing polymer sequences. Proceedings of the National Academy of Sciences of the U.S.A. 89, 4163–4167 (1992)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berman, P., DasGupta, B., Mubayi, D., Sloan, R., Turán, G., Zhang, Y. (2004). The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_18
DOI: https://doi.org/10.1007/978-3-540-27801-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive