The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model on 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy function Φ, defined in terms of the target structure, that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, Φ assigns a value of −1 to each H-H residue contact in the contact graph and a value of 0 to all other contacts), and (iv) an upper bound λ on the ratio between H and P amino acids in the sequence S, which limits the number of H residues and prevents the solution from being a biologically meaningless all-H sequence. The sequence S is designed by specifying which residues are H and which are P so as to attain the global minimum of the energy function Φ. In this paper, we prove the following results:

(1) An earlier proof of NP-completeness of finding the global energy minima for the PSD problem on 3D lattices in [12] was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct: we show that the problem of finding the global energy minima for the PSD problem on 2D lattices can be solved in polynomial time. We also show that the problem of finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete, by providing a different reduction from the problem of finding the largest clique in a graph.

(2) Even though the problem of finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can be found efficiently by providing a polynomial-time approximation scheme (PTAS), which takes appropriate combinations of optimal global energy minima of substrings of the sequence S. Our algorithmic technique for designing this PTAS uses the shifted slice-and-dice approach of [6,17,18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minima, given in [12], which has a performance ratio of \(\frac{1}{2}\).
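To make the Canonical model's objective concrete, the following is a minimal illustrative sketch in Python and not the algorithm of the paper. It evaluates Φ on a contact graph given as a list of residue-index pairs and finds an optimal H/P labeling by exhaustive search, which takes exponential time and is only usable on tiny instances. The function names (`energy`, `max_h_count`, `design_brute_force`) are hypothetical, and reading the bound λ as "number of H divided by number of P is at most λ" is an assumption made for this example.

```python
from fractions import Fraction
from itertools import combinations


def energy(contacts, hydrophobic):
    """Canonical-model energy: -1 for every contact whose two residues are both H."""
    return -sum(1 for u, v in contacts if u in hydrophobic and v in hydrophobic)


def max_h_count(n, lam):
    """Largest h with h / (n - h) <= lam, i.e. h <= lam * n / (1 + lam).

    Assumption: lam bounds the ratio (#H) / (#P); exact arithmetic avoids
    floating-point rounding at the boundary.
    """
    lam = Fraction(lam)
    return int((lam * n) // (1 + lam))


def design_brute_force(n, contacts, lam):
    """Try every H set of the maximum allowed size and keep a lowest-energy one.

    Adding an H residue never raises the energy, so only the largest
    allowed H count needs to be examined.
    """
    h = max_h_count(n, lam)
    best = min(
        (frozenset(c) for c in combinations(range(n), h)),
        key=lambda s: energy(contacts, s),
    )
    sequence = "".join("H" if i in best else "P" for i in range(n))
    return energy(contacts, best), sequence


# Hypothetical target: six residues placed on a 2 x 3 patch of the square
# lattice, with one contact per pair of lattice-adjacent residues.
grid_contacts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
best_energy, best_sequence = design_brute_force(6, grid_contacts, 1)
```

With λ = 1 at most three of the six residues may be H; because the square lattice is bipartite, three residues can induce at most two contacts, so the search returns an energy of −2 here.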


References

  1. Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily Finding a Dense Subgraph. Journal of Algorithms 34, 203–221 (2000)

  2. Asahiro, Y., Hassin, R., Iwama, K.: Complexity of finding dense subgraphs. Discrete Applied Mathematics 121, 15–26 (2002)

  3. Atkins, J., Hart, W.E.: On the intractability of protein folding with a finite alphabet of amino acids. Algorithmica 25(2-3), 279–294 (1999)

  4. Banavar, J., Cieplak, M., Maritan, A., Nadig, G., Seno, F., Vishveshwara, S.: Structure-based design of model proteins. Proteins: Structure, Function, and Genetics 31, 10–20 (1998)

  5. Berger, B., Leighton, T.: Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. Journal of Computational Biology 5(1), 27–40 (1998)

  6. Berman, P., DasGupta, B., Muthukrishnan, S.: Approximation Algorithms For MAX-MIN Tiling. Journal of Algorithms 47(2), 122–134 (2003)

  7. Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A., Yannakakis, M.: On the complexity of protein folding. Journal of Computational Biology, 423–466 (1998)

  8. Deutsch, J.M., Kurosky, T.: New algorithm for protein design. Physical Review Letters 76, 323–326 (1996)

  9. Dill, K.A., Bromberg, S., Yue, K., Fiebig, K.M., Yee, D.P., Thomas, P.D., Chan, H.S.: Principles of protein folding — A perspective from simple exact models. Protein Science 4, 561–602 (1995)

  10. Drexler, K.E.: Molecular engineering: An approach to the development of general capabilities for molecular manipulation. Proceedings of the National Academy of Sciences of the U.S.A. 78, 5275–5278 (1981)

  11. Feige, U., Seltser, M.: On the densest k-subgraph problems. Technical Report # CS97-16, Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Israel, available online at http://citeseer.nj.nec.com/feige97densest.html

  12. Hart, W.E.: On the computational complexity of sequence design problems. In: Proceedings of the 1st Annual International Conference on Computational Molecular Biology, pp. 128–136 (1997)

  13. Hart, W.E., Istrail, S.: Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. Journal of Computational Biology 3(1), 53–96 (1996)

  14. Hart, W.E., Istrail, S.: Invariant patterns in crystal lattices: Implications for protein folding algorithms (extended abstract). In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 288–303. Springer, Heidelberg (1996)

  15. Hart, W.E., Istrail, S.: Lattice and off-lattice side chain models of protein folding: Linear time structure prediction better than 86% of optimal. Journal of Computational Biology 4(3), 241–260 (1997)

  16. Heun, V.: Approximate protein folding in the HP side chain model on extended cubic lattices. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 212–223. Springer, Heidelberg (1999)

  17. Hochbaum, D.: Approximation Algorithms for NP-hard problems. PWS Publishing Company (1997)

  18. Hochbaum, D.S., Maass, W.: Approximation schemes for covering and packing problems in image processing and VLSI. Journal of ACM 32(1), 130–136 (1985)

  19. Kleinberg, J.: Efficient Algorithms for Protein Sequence Design and the Analysis of Certain Evolutionary Fitness Landscapes. In: Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, pp. 226–237 (1999)

  20. Lau, K.F., Dill, K.A.: A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22, 3986–3997 (1989)

  21. Mauri, G., Pavesi, G., Piccolboni, A.: Approximation algorithms for protein folding prediction. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 945–946 (1999)

  22. Merz, K.M., Grand, S.M.L.: The Protein Folding Problem and Tertiary Structure Prediction. Birkhauser, Boston (1994)

  23. Ponder, J., Richards, F.M.: Tertiary templates for proteins. Journal of Molecular Biology 193, 63–89 (1987)

  24. Sun, S.J., Brem, R., Chan, H.S., Dill, K.A.: Designing amino acid sequences to fold with good hydrophobic cores. Protein Engineering 8(12), 1205–1213 (1995)

  25. Shakhnovich, E.I., Gutin, A.M.: Engineering of stable and fast-folding sequences of model proteins. Proc. Natl. Acad. Sci. 90, 7195–7199 (1993)

  26. Smith, T.F., Conte, L.L., Bienkowska, J., Rogers, B., Gaitatzes, C., Lathrop, R.H.: The threading approach to the inverse protein folding problem. In: Proceedings of the 1st Annual International Conference on Computational Molecular Biology, pp. 287–292 (1997)

  27. Yue, K., Dill, K.A.: Inverse protein folding problem: Designing polymer sequences. In: Proceedings of the National Academy of Sciences of the U.S.A., vol. 89, pp. 4163–4167 (1992)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berman, P., DasGupta, B., Mubayi, D., Sloan, R., Turán, G., Zhang, Y. (2004). The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_18

  • DOI: https://doi.org/10.1007/978-3-540-27801-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive
