High performance clustering algorithm for analysis of protein family clusters

The Journal of Supercomputing

Abstract

Techniques for analyzing genome sequences in high-performance computing environments, with the aim of predicting the function and structure of proteins, continue to advance. The function of a protein is determined by its characteristics and its sequence pattern, and a protein is assigned to a family according to its genealogy and structure. This study determines the protein family of unknown proteins by analyzing a protein sequence database that has been classified using a clustering algorithm. Analysis of the experimental clustering results verified that, by applying the proposed pf_cluster algorithm, the protein family of a new protein can be found from its sequence information.
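The general idea of assigning an unknown protein to a family by sequence similarity can be sketched as follows. This is only a toy illustration and is not the pf_cluster algorithm, whose details are not given in this abstract. The function names, the k-mer similarity measure, the family labels, and the sequences are all hypothetical and chosen for brevity; a real pipeline would more likely rely on alignment scores.

```python
from collections import Counter


def kmer_profile(seq, k=2):
    """Count the overlapping k-mers in an amino-acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(a, b):
    """Multiset Jaccard similarity of two k-mer profiles, between 0.0 and 1.0."""
    shared = sum((a & b).values())  # multiset intersection
    total = sum((a | b).values())   # multiset union
    return shared / total if total else 0.0


def assign_family(query, families, k=2):
    """Return (family_name, score) for the family whose members are,
    on average, most similar to the query sequence.

    Assumes every family has at least one member sequence.
    """
    q = kmer_profile(query, k)
    best_name, best_score = None, -1.0
    for name, members in families.items():
        score = sum(similarity(q, kmer_profile(m, k)) for m in members) / len(members)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Made-up sequences standing in for an already clustered database (assumption).
families = {
    "family_A": ["MKTAYIAKQR", "MKTAYIAKQK"],
    "family_B": ["GSHMLEDPVD", "GSHMLEDPAD"],
}
best, score = assign_family("MKTAYIARQR", families)
print(best, round(score, 3))
```

Averaging over all members keeps the sketch short; comparing against one representative per cluster would be a cheaper alternative for large databases.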

References

  1. Bork P, Koonin EV (1998) Predicting functions from protein sequences: where are the bottlenecks? Nat Genet 18(4):313–318
  2. Chargaff E (1950) Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia 6:201–209
  3. Watson JD, Crick FHC (1953) A structure for deoxyribose nucleic acid. Nature 171:737–738
  4. Altschul SF et al (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
  5. Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96:2896–2901
  6. Wu CH (2003) Protein family classification and functional annotation. Comput Biol Chem 27(1):37–47
  7. Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453
  8. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
  9. Enright AJ, Ouzounis CA (2000) GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16(5):451–457
  10. Yona G, Linial N, Linial M (1999) ProtoMap: automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space. Proteins 37(3):360–378
  11. Sasson O et al (2003) ProtoNet: hierarchical classification of the protein space. Nucleic Acids Res 31(1):348–352
  12. Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30(7):1575–1584
  13. Chen Y et al (2006) SEQOPTICS: a protein sequence clustering system. BMC Bioinformatics 7(Suppl 4):S10
  14. Finn RD et al (2013) Pfam: the protein families database. Nucleic Acids Res. doi:10.1093/nar/gkt1223
  15. Bateman A et al (2002) The Pfam protein families database. Nucleic Acids Res 30:276–280

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2063006).

Author information

Corresponding author

Correspondence to Gangman Yi.

About this article

Cite this article

Han, SH., Yi, G. High performance clustering algorithm for analysis of protein family clusters. J Supercomput 72, 1878–1896 (2016). https://doi.org/10.1007/s11227-016-1706-y

  • DOI: https://doi.org/10.1007/s11227-016-1706-y
