CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution

Kim, Woo-Cheol; Park, Sanghyun; Won, Jung-Im

doi:10.1007/s11390-013-1365-x

Regular Paper
Published: 05 July 2013

Volume 28, pages 647–656 (2013)

Journal of Computer Science and Technology

Abstract

Over the past several decades, biologists have conducted numerous studies examining both the general and specific functions of proteins. Generally, if two proteins are similar in either the structure or the sequence of their amino acids, they are expected to share a common biological function. Protein function is determined primarily by structure rather than by amino acid sequence, which makes protein structure alignment algorithms essential tools for this research. The quality of such an algorithm depends on the quality of its similarity measure, the objective function used to determine the best alignment. However, no existing similarity measure has become a gold standard, because each has its own strengths and weaknesses; as a result, finding a single alignment with them requires excessive filtering. In this paper, we introduce a new strategy that finds not a single alignment but multiple alignments of different lengths. This strategy has the clear benefit of producing high-quality alignments, but it introduces a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that locate a common region (CORE) of multiple alignment candidates and then extend the CORE into multiple alignments. Because the CORE can be defined only from a final alignment, we introduce CORE*, which is similar to the CORE, and propose an algorithm to identify CORE*. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are, on average, 17% and 15.48% longer than those obtained by TM-align when the comparison is conducted at the superfamily and fold levels, respectively.
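
The abstract relies on two ideas that can be made concrete: a similarity measure acting as the objective function (TM-align, the baseline, is built around the TM-score) and a common region shared by several candidate alignments. The Python sketch below is only an illustration under stated assumptions: an alignment is modeled as a set of aligned residue-index pairs, and the "common region" is taken to be the plain intersection of the candidates. That definition and the names `common_region` and `tm_score` are hypothetical; this is not the authors' CORE* algorithm or their dynamic-programming extension. The TM-score formula follows Zhang and Skolnick (2004); the 0.5 Å floor on d0 is an assumption added here to keep d0 positive for short chains.

```python
from typing import FrozenSet, Iterable, Sequence, Tuple

# Hypothetical representation: an alignment is a set of pairs (i, j) that
# align residue i of protein A with residue j of protein B.
Alignment = FrozenSet[Tuple[int, int]]


def common_region(candidates: Sequence[Alignment]) -> Alignment:
    """Return the residue pairs shared by every candidate alignment.

    Illustrative definition only (plain set intersection); the paper's
    CORE and CORE* are not reproduced here.
    """
    if not candidates:
        return frozenset()
    shared = set(candidates[0])
    for candidate in candidates[1:]:
        shared &= candidate  # keep only pairs that this candidate also contains
    return frozenset(shared)


def tm_score(distances: Iterable[float], target_length: int) -> float:
    """TM-score from the distances (in angstroms) between aligned residue
    pairs after superposition, normalized by the target length L:

        TM = (1 / L) * sum_i 1 / (1 + (d_i / d0) ** 2)
        d0 = 1.24 * (L - 15) ** (1/3) - 1.8

    The 0.5 angstrom floor on d0 is an assumption of this sketch.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # max(..., 0) avoids raising a negative number to a fractional power.
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    a = frozenset({(1, 1), (2, 2), (3, 4)})
    b = frozenset({(1, 1), (2, 2), (4, 5)})
    print(sorted(common_region([a, b])))  # [(1, 1), (2, 2)]
    print(tm_score([0.0] * 100, 100))     # 1.0 (perfect superposition of all residues)
```

Because the TM-score is normalized by the target length, it rewards alignments that cover more residues at small distances, which is why alignment length is a natural quantity to compare against TM-align.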

References

Ginalski K, Grishin N V, Godzik A, Rychlewski L. Practical lessons from protein structure prediction. Nucleic Acids Research, 2005, 33(6): 1874–1891.
Roytberg M, Gambin A, Noe L et al. On subset seeds for protein alignment. IEEE/ACM Trans. Computational Biology and Bioinformatics, 2009, 6(3): 483–494.
Mayr G, Domingues F, Lackner P. Comparative analysis of protein structure alignments. BMC Structural Biology, 2007, 7: Article No.50.
Zhang Y. Protein structure prediction: When is it useful? Current Opinion in Structural Biology, 2009, 19(2): 145–155.
Holm L, Sander C. Protein structure comparison by alignment of distance matrices. Journal of Molecular Biology, 1993, 233(1): 123–138.
Dahiyat B I, Mayo S L. De novo protein design: Fully automated sequence selection. Science, 1997, 278(5335): 82–87.
Yakunin A F, Yee A A, Savchenko A, Edwards A M, Arrowsmith C H. Structural proteomics: A tool for genome annotation. Current Opinion in Chemical Biology, 2004, 8(1): 42–48.
Menke M, Berger B, Cowen L. Matt: Local flexibility aids protein multiple structure alignment. PLoS Computational Biology, 2008, 4(1): e10.
Gu J, Bourne P. Structural Bioinformatics (2nd edition). John Wiley, 2009.
Arun K S, Huang T S, Blostein S D. Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Analysis and Machine Intelligence, 1987, 9(5): 698–700.
Sippl M J, Wiederstein M. A note on difficult structure alignment problems. Bioinformatics, 2008, 24(3): 426–427.
Chen L, Zhou T, Tang Y. Protein structure alignment by deterministic annealing. Bioinformatics, 2005, 21: 51–62.
Glasgow J, Kuo T, Davies J. Protein structure from contact maps: A case-based reasoning approach. Information Systems Frontiers, 2006, 8(1): 29–36.
Bhattacharya S, Bhattacharyya C, Chandra N R. Comparison of protein structures by growing neighborhood alignments. BMC Bioinformatics, 2007, 8: Article No.77.
Kolbeck B, May P, Schmidt-Goenner T, Steinke T, Knapp E W. Connectivity independent protein-structure alignment: A hierarchical approach. BMC Bioinformatics, 2006, 7: Article No.510.
Eidhammer I, Jonassen I, Taylor W R. Structure comparison and structure patterns. Journal of Computational Biology, 2000, 7(5): 685–716.
Shindyalov I N, Bourne P E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering, 1998, 11(9): 739–747.
Taylor W R, Orengo C A. Protein structure alignment. Journal of Molecular Biology, 1989, 208(1): 1–22.
Taylor W R. Protein structure comparison using iterated double dynamic programming. Protein Science, 1999, 8(3): 654–665.
Jewett A I, Huang C C, Ferrin T E. MINRMS: An efficient algorithm for determining protein structure similarity using root-mean-squared-distance. Bioinformatics, 2003, 19(5): 625–634.
Lotan I, Schwarzer F. Approximation of protein structure for fast similarity measures. Journal of Computational Biology, 2004, 11(2/3): 299–317.
Gibrat J F, Madej T, Bryant S H. Surprising similarities in structure comparison. Current Opinion in Structural Biology, 1996, 6(3): 377–385.
Kabsch W, Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983, 22(12): 2577–2637.
Frishman D, Argos P. Knowledge-based protein secondary structure assignment. Proteins: Structure, Function, and Genetics, 1995, 23(4): 566–579.
Holm L, Sander C. 3-D lookup: Fast protein structure database searches at 90% reliability. In Proc. the 3rd Int. Conference on Intelligent Systems for Molecular Biology, July 1995, Vol.3, pp.179–187.
Nussinov R, Wolfson H J. Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques. Proc. National Academy of Sciences of USA, 1991, 88(23): 10495–10499.
Le Q, Pollastri G, Koehl P. Structural alphabets for protein structure classification: A comparison study. Journal of Molecular Biology, 2009, 387(2): 431–450.
Erdmann M A. Protein similarity from knot theory: Geometric convolution and line weavings. Journal of Computational Biology, 2005, 12(6): 609–637.
Zhang Y, Skolnick J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Research, 2005, 33(7): 2302–2309.
Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins, 2004, 57(4): 702–710.
Godzik A. The structural alignment between two proteins: Is there a unique answer? Protein Science, 1996, 5(7): 1325–1338.
Murzin A G, Brenner S E, Hubbard T, Chothia C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 1995, 247(4): 536–540.
Berman H M, Westbrook J, Feng Z et al. The Protein Data Bank. Nucleic Acids Research, 2000, 28(1): 235–242.

Author information

Authors and Affiliations

College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, 16802, U.S.A.
Woo-Cheol Kim
Department of Computer Science, Yonsei University, Seoul, 120-749, Korea
Sanghyun Park
Research Center of Information and Electronic Engineering, Hallym University, Chuncheon, Gangwon, 200-702, Korea
Jung-Im Won

Corresponding author

Correspondence to Jung-Im Won.

Additional information

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology of Korea under Grant No.2012R1A1A3013084.

A preliminary version of this paper was published in the Proceedings of EDB2012.

Electronic Supplementary Material

Electronic supplementary material (DOC, 41.5 KB) accompanies the online version of this article.

About this article

Cite this article

Kim, WC., Park, S. & Won, JI. CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution. J. Comput. Sci. Technol. 28, 647–656 (2013). https://doi.org/10.1007/s11390-013-1365-x

Received: 08 September 2012
Revised: 10 May 2013
Published: 05 July 2013
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11390-013-1365-x