CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Over the past several decades, biologists have conducted numerous studies of both the general and specific functions of proteins. In general, if two proteins are similar in either structure or amino acid sequence, they are expected to share a biological function. Protein function is determined primarily by structure rather than by amino acid sequence. Algorithms for protein structure alignment are an essential tool for this research. The quality of such an algorithm depends on the quality of the similarity measure it uses, which is the objective function that determines the best alignment. However, no existing similarity measure has become a gold standard, because each has its own strengths and weaknesses. They require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment but multiple alignments of different lengths. This strategy has the clear benefit of producing high-quality alignments, but it introduces a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that locate a common region (CORE) of multiple alignment candidates and then extend the CORE into multiple alignments. Because the CORE is defined from a final alignment, we introduce CORE*, which is similar to CORE, and propose an algorithm to identify it. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In our experiments, the alignments identified by our algorithm are on average 17% and 15.48% longer than those obtained by TM-align when the comparison is conducted at the superfamily and fold levels, respectively.
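As a rough illustration of what a "common region of multiple alignment candidates" could mean, the sketch below models each candidate alignment as a set of matched residue-index pairs and takes the pairs shared by all candidates. This is only one possible reading of the abstract, not the authors' algorithm: the `Alignment` type, the `common_region` function, and the example data are all hypothetical, and the paper's CORE* identification and dynamic-programming extension are not reproduced here.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: an alignment is a set of (i, j) pairs, meaning residue i of
# protein A is matched with residue j of protein B.
Alignment = FrozenSet[Tuple[int, int]]


def common_region(candidates: Iterable[Alignment]) -> Alignment:
    """Return the residue pairs that appear in every candidate alignment."""
    candidates = list(candidates)
    if not candidates:
        return frozenset()
    core = set(candidates[0])
    for aln in candidates[1:]:
        core &= aln  # keep only pairs also present in this candidate
    return frozenset(core)


# Three hypothetical candidate alignments of different lengths.
short = frozenset({(3, 5), (4, 6), (5, 7)})
medium = frozenset({(2, 4), (3, 5), (4, 6), (5, 7)})
long_ = frozenset({(1, 2), (3, 5), (4, 6), (5, 7), (8, 10)})

core = common_region([short, medium, long_])
print(sorted(core))  # [(3, 5), (4, 6), (5, 7)]
```

In this toy setting the shared pairs form a seed that each longer alignment contains. The abstract describes extending such a seed into multiple alignments rather than searching for each alignment from scratch.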

Author information

Correspondence to Jung-Im Won.

Additional information

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology of Korea under Grant No.2012R1A1A3013084.

The preliminary version of the paper was published in the Proceedings of EDB2012.

About this article

Cite this article

Kim, WC., Park, S. & Won, JI. CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution. J. Comput. Sci. Technol. 28, 647–656 (2013). https://doi.org/10.1007/s11390-013-1365-x

  • DOI: https://doi.org/10.1007/s11390-013-1365-x
