High performance clustering algorithm for analysis of protein family clusters

Han, Seok-Hyeon; Yi, Gangman

doi:10.1007/s11227-016-1706-y

High performance clustering algorithm for analysis of protein family clusters

Published: 29 March 2016

Volume 72, pages 1878–1896, (2016)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Seok-Hyeon Han¹ &
Gangman Yi¹

249 Accesses
1 Citation
Explore all metrics

Abstract

Techniques for analyzing genome sequences in high performance environments to predict the function and structure of a protein have been developing. The function of a protein is determined by its characteristics and the sequence pattern, and a protein is classified as belonging to a family according to its genealogy and structure. This study determines the protein family of unknown proteins by analyzing the sequence database of the proteins, which is classified using a clustering algorithm. The analysis of the experimental clustering results verified that, by applying the proposed pf_cluster algorithm, the protein family of new proteins can be found using their sequence information.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

RAFTS3G: an efficient and versatile clustering software to analyses in large protein datasets

Article Open access 15 July 2019

Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

Article Open access 05 February 2015

MidClustpy: A Clustering Approach to Predict Coding Region in a Biological Sequence

References

Bork P, Koonin EV (1998) Predicting functions from Protein sequences–where are the bottlenecks? Nat Genet 18(4):313–318
Article Google Scholar
Chargaff E (1950) Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia 6:201–209
Article Google Scholar
Watson JD, Crick FHC (1953) A structure for deoxyribose nucleic acid. Nature 171:737–738
Article Google Scholar
Altschul SF (1990) Basic local alignment search tool. J Mol Biol 215.3:403–410
Article Google Scholar
Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96:2896–2901
Article Google Scholar
Wu CH (2003) Protein family classification and functional annotation. Comput Biol Chem 27(1):37–47
Article Google Scholar
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two Proteins. J Mol Biol 48(3):443–453
Article Google Scholar
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
Article Google Scholar
Enright AJ, Ouzounis CA (2000) GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16(5):451–457
Article Google Scholar
Yona G, Linial N, Linial M (1999) ProtoMap: automatic classification of protein sequences, a hierarchy of Protein families, and local maps of the Protein space. Proteins 37(3):360–378
Article Google Scholar
Sasson O et al (2003) ProtoNet: hierarchical classification of the protein space. Nucleic Acids Res 31(1):348–352
Article Google Scholar
Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30(7):1575–1584
Article Google Scholar
Chen Y et al (2006) SEQOPTICS: a protein sequence clustering system. BMC Bioinformatics 7(Suppl 4):S10
Article Google Scholar
Finn RD et al (2013) Pfam: the protein families database. Nucleic Acids Res. doi:10.1093/nar/gkt1223
Bateman A et al (2002) The Pfam protein families database. Nucleic Acids Res 30:276–280
Article Google Scholar

Download references

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2063006).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Gangneung-Wonju National University, Wonju, Gangwon-do, 220-711, South Korea
Seok-Hyeon Han & Gangman Yi

Authors

Seok-Hyeon Han
View author publications
You can also search for this author in PubMed Google Scholar
Gangman Yi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gangman Yi.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Han, SH., Yi, G. High performance clustering algorithm for analysis of protein family clusters. J Supercomput 72, 1878–1896 (2016). https://doi.org/10.1007/s11227-016-1706-y

Download citation

Published: 29 March 2016
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11227-016-1706-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

High performance clustering algorithm for analysis of protein family clusters

Abstract

Access this article

Similar content being viewed by others

RAFTS3G: an efficient and versatile clustering software to analyses in large protein datasets

Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

MidClustpy: A Clustering Approach to Predict Coding Region in a Biological Sequence

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

High performance clustering algorithm for analysis of protein family clusters

Abstract

Access this article

Similar content being viewed by others

RAFTS3G: an efficient and versatile clustering software to analyses in large protein datasets

Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

MidClustpy: A Clustering Approach to Predict Coding Region in a Biological Sequence

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation