The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices

Berman, Piotr; DasGupta, Bhaskar; Mubayi, Dhruv; Sloan, Robert; Turán, György; Zhang, Yi

doi:10.1007/978-3-540-27801-6_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model on 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy function Φ, defined in terms of the target structure, that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, Φ gives an H-H residue contact in the contact graph a value of –1 and all other contacts a value of 0), and (iv) an upper bound λ on the ratio between H and P amino acids, which limits the number of H residues in the sequence S and so prevents the solution from being a biologically meaningless all-H sequence. The sequence S is designed by specifying which residues are H and which are P in a way that realizes the global minima of the energy function Φ. In this paper, we prove the following results:
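As a concrete reading of the model above, the following minimal Python sketch evaluates Φ on a contact graph and searches exhaustively for a minimum-energy H/P assignment. It is an illustration only, not the paper's algorithm: the names `energy` and `brute_force_design`, the edge-list representation of the contact graph, and the parameter `max_h` (an absolute cap on H residues standing in for the ratio bound λ) are all assumptions made here, and the search is exponential in the sequence length.

```python
from itertools import combinations

def energy(contacts, h_set):
    """Phi: -1 for each contact whose two residues are both H, 0 otherwise."""
    return -sum(1 for u, v in contacts if u in h_set and v in h_set)

def brute_force_design(n, contacts, max_h):
    """Try every set of at most max_h H residues (tiny inputs only).

    max_h is a hypothetical stand-in for the bound that lambda places
    on the number of H residues.
    """
    best_energy, best_h = 0, frozenset()
    for k in range(max_h + 1):
        for chosen in combinations(range(n), k):
            h_set = frozenset(chosen)
            e = energy(contacts, h_set)
            if e < best_energy:  # keep the first assignment reaching a new minimum
                best_energy, best_h = e, h_set
    sequence = "".join("H" if i in best_h else "P" for i in range(n))
    return best_energy, sequence

# Hypothetical target: a 6-residue chain folded as a 2x3 "snake" on the 2D
# lattice. Residues 0-5 and 1-4 are lattice neighbours without being
# consecutive in the chain, so they form the contact graph.
contacts = [(0, 5), (1, 4)]
print(brute_force_design(6, contacts, max_h=2))  # (-1, 'HPPPPH')
print(brute_force_design(6, contacts, max_h=4))  # (-2, 'HHPPHH')
```

With at most two H residues only one contact can be made hydrophobic; raising the cap to four allows both contacts to contribute, which is why the bound on H residues is what makes the problem nontrivial.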

(1) An earlier proof in [12] that finding the global energy minima for the PSD problem on 3D lattices is NP-complete was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct: we show that the global energy minima for the PSD problem on 2D lattices can be found in polynomial time. We also show that finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete, by providing a different reduction from the problem of finding the largest clique in a graph.

(2) Even though finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can be found efficiently: we provide a polynomial-time approximation scheme (PTAS) that takes appropriate combinations of optimal global energy minima of substrings of the sequence S. Our algorithmic technique for designing this PTAS uses the shifted slice-and-dice approach of [6,17,18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minima, given in [12], which has a performance ratio of \(\frac{1}{2}\).
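Because every energy value is nonpositive, the two guarantees in (2) are easiest to compare in terms of the number of H-H contacts, \(|\Phi|\). The block below is a hedged restatement under the usual convention for approximation ratios, not a formula quoted from the paper; the symbols \(\Phi^{*}\) (the optimal energy), \(S_{\varepsilon}\), \(S_{1/2}\) and \(\varepsilon\) are notation introduced here.

```latex
\[
  |\Phi(S_{\varepsilon})| \;\ge\; (1-\varepsilon)\,|\Phi^{*}|
  \quad \text{for every fixed } \varepsilon > 0
  \qquad \text{(PTAS)},
\]
\[
  |\Phi(S_{1/2})| \;\ge\; \tfrac{1}{2}\,|\Phi^{*}|
  \qquad \text{(ratio-}\tfrac{1}{2}\text{ algorithm of [12])}.
\]
```

In a PTAS the running time is polynomial in the sequence length for each fixed \(\varepsilon\), but it may grow as \(\varepsilon\) shrinks.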

References

1. Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily Finding a Dense Subgraph. Journal of Algorithms 34, 203–221 (2000)
2. Asahiro, Y., Hassin, R., Iwama, K.: Complexity of finding dense subgraphs. Discrete Applied Mathematics 121, 15–26 (2002)
3. Atkins, J., Hart, W.E.: On the intractability of protein folding with a finite alphabet of amino acids. Algorithmica 25(2-3), 279–294 (1999)
4. Banavar, J., Cieplak, M., Maritan, A., Nadig, G., Seno, F., Vishveshwara, S.: Structure-based design of model proteins. Proteins: Structure, Function, and Genetics 31, 10–20 (1998)
5. Berger, B., Leighton, T.: Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. Journal of Computational Biology 5(1), 27–40 (1998)
6. Berman, P., DasGupta, B., Muthukrishnan, S.: Approximation Algorithms For MAX-MIN Tiling. Journal of Algorithms 47(2), 122–134 (2003)
7. Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A., Yannakakis, M.: On the complexity of protein folding. Journal of Computational Biology, 423–466 (1998)
8. Deutsch, J.M., Kurosky, T.: New algorithm for protein design. Physical Review Letters 76, 323–326 (1996)
9. Dill, K.A., Bromberg, S., Yue, K., Fiebig, K.M., Yee, D.P., Thomas, P.D., Chan, H.S.: Principles of protein folding — A perspective from simple exact models. Protein Science 4, 561–602 (1995)
10. Drexler, K.E.: Molecular engineering: An approach to the development of general capabilities for molecular manipulation. Proceedings of the National Academy of Sciences of the U.S.A. 78, 5275–5278 (1981)
11. Feige, U., Seltser, M.: On the densest k-subgraph problems. Technical Report # CS97-16, Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Israel, available online at http://citeseer.nj.nec.com/feige97densest.html
12. Hart, W.E.: On the computational complexity of sequence design problems. In: Proceedings of the 1st Annual International Conference on Computational Molecular Biology, pp. 128–136 (1997)
13. Hart, W.E., Istrail, S.: Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. Journal of Computational Biology 3(1), 53–96 (1996)
14. Hart, W.E., Istrail, S.: Invariant patterns in crystal lattices: Implications for protein folding algorithms (extended abstract). In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 288–303. Springer, Heidelberg (1996)
15. Hart, W.E., Istrail, S.: Lattice and off-lattice side chain models of protein folding: Linear time structure prediction better than 86% of optimal. Journal of Computational Biology 4(3), 241–260 (1997)
16. Heun, V.: Approximate protein folding in the HP side chain model on extended cubic lattices. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 212–223. Springer, Heidelberg (1999)
17. Hochbaum, D.: Approximation Algorithms for NP-hard problems. PWS Publishing Company (1997)
18. Hochbaum, D.S., Maass, W.: Approximation schemes for covering and packing problems in image processing and VLSI. Journal of ACM 32(1), 130–136 (1985)
19. Kleinberg, J.: Efficient Algorithms for Protein Sequence Design and the Analysis of Certain Evolutionary Fitness Landscapes. In: Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, pp. 226–237 (1999)
20. Lau, K.F., Dill, K.A.: A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22, 3986–3997 (1989)
21. Mauri, G., Pavesi, G., Piccolboni, A.: Approximation algorithms for protein folding prediction. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 945–946 (1999)
22. Merz, K.M., Grand, S.M.L.: The Protein Folding Problem and Tertiary Structure Prediction. Birkhauser, Boston (1994)
23. Ponder, J., Richards, F.M.: Tertiary templates for proteins. Journal of Molecular Biology 193, 63–89 (1987)
24. Sun, S.J., Brem, R., Chan, H.S., Dill, K.A.: Designing amino acid sequences to fold with good hydrophobic cores. Protein Engineering 8(12), 1205–1213 (1995)
25. Shakhnovich, E.I., Gutin, A.M.: Engineering of stable and fast-folding sequences of model proteins. Proc. Natl. Acad. Sci. 90, 7195–7199 (1993)
26. Smith, T.F., Conte, L.L., Bienkowska, J., Rogers, B., Gaitatzes, C., Lathrop, R.H.: The threading approach to the inverse protein folding problem. In: Proceedings of the 1st Annual International Conference on Computational Molecular Biology, pp. 287–292 (1997)
27. Yue, K., Dill, K.A.: Inverse protein folding problem: Designing polymer sequences. In: Proceedings of the National Academy of Sciences of the U.S.A., vol. 89, pp. 4163–4167 (1992)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
Piotr Berman
Department of Computer Science, University of Illinois at Chicago, Chicago, IL, 60607-7053, USA
Bhaskar DasGupta, Robert Sloan & Yi Zhang
Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, Chicago, IL, 60607-7045, USA
Dhruv Mubayi & György Turán

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Suleyman Cenk Sahinalp
Google Inc., 76 9th Av, 4th Fl., 10011, New York, NY
S. Muthukrishnan
Tom Sawyer Software, 94612, Oakland, CA, USA
Ugur Dogrusoz

About this paper

Cite this paper

Berman, P., DasGupta, B., Mubayi, D., Sloan, R., Turán, G., Zhang, Y. (2004). The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_18

DOI: https://doi.org/10.1007/978-3-540-27801-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive
