A Parallel Multiple K-Means Clustering and Application on Detect Near Native Model

Wu, Hongjie; Wu, Chuang; cheng, Chen; Song, Longfei; Jiang, Min

doi:10.1007/978-3-319-42294-7_78

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9772)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Protein structure clustering is an essential step in protein 3D structure prediction. However, two issues limit the ability of current methods to identify near-native models: the large number of candidate models in a decoy set, and similarity metrics that discriminate poorly between models. In this paper we propose a novel method, MK-means, based on parallel multiple K-means clustering, to identify near-native structures. Parallelism is introduced to reduce memory and time consumption, and multiple K-means runs are used to fuse different metrics of protein 3D similarity. Tested on 56 proteins, MK-means selects models that are as good as or better than those selected by SPICKER for 33 proteins (58.9 %); for 10 of these 33 proteins the results are the same as SPICKER's. This indicates that the performance of MK-means is similar to that of SPICKER, a leading protein structure clustering tool.
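The abstract does not say how the per-metric clusterings are computed or fused, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each similarity metric supplies one fixed-length feature vector per candidate model, that one K-means run is executed per metric in parallel, and that fusion is a simple vote counting how often a model falls in the largest cluster. The function names (`kmeans`, `largest_cluster`, `mk_means_votes`), the farthest-first initialisation, and the toy data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's K-means; returns one cluster label per point.

    Assumption: deterministic farthest-first initialisation, chosen here
    only so the toy example is reproducible.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def largest_cluster(labels):
    """Indices of the points in the most populated cluster."""
    biggest = max(set(labels), key=labels.count)
    return {i for i, lab in enumerate(labels) if lab == biggest}


def mk_means_votes(feature_sets, k):
    """Run one K-means per metric in parallel and fuse by voting.

    Assumption: a model earns one vote for each metric under which it
    lies in the largest cluster.
    """
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(lambda pts: kmeans(pts, k), feature_sets))
    votes = [0] * len(feature_sets[0])
    for labels in runs:
        for i in largest_cluster(labels):
            votes[i] += 1
    return votes


# Toy data: six candidate models described under two hypothetical metrics.
metric_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
metric_b = [(0.0,), (0.2,), (5.0,), (0.1,), (5.1,), (0.3,)]
votes = mk_means_votes([metric_a, metric_b], k=2)
print(votes)
```

In this toy run, models 0, 1 and 3 fall in the largest cluster under both metrics and so receive the most votes. A real pipeline would replace the toy vectors with structural similarity measures and would likely use process-level or OpenMP-style parallelism rather than Python threads.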

This paper is supported by grant nos. 61540058 and 61202290 from the National Natural Science Foundation of China (http://www.nsfc.gov.cn) and grant no. BK20131154 from the Natural Science Foundation of Jiangsu Province.

Acknowledgments

This paper is supported by grant nos. 61540058 and 61202290 from the National Natural Science Foundation of China (http://www.nsfc.gov.cn) and grant no. BK20131154 from the Natural Science Foundation of Jiangsu Province. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper. Chuang Wu and Longfei Song wrote the code, implemented the experiments, and wrote the paper; Hongjie Wu designed the algorithm and experiments and wrote the paper; Min Jiang prepared the datasets.

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
Hongjie Wu, Chuang Wu, Chen cheng & Longfei Song
The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
Min Jiang

Corresponding author

Correspondence to Hongjie Wu.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo

About this paper

Cite this paper

Wu, H., Wu, C., cheng, C., Song, L., Jiang, M. (2016). A Parallel Multiple K-Means Clustering and Application on Detect Near Native Model. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9772. Springer, Cham. https://doi.org/10.1007/978-3-319-42294-7_78

DOI: https://doi.org/10.1007/978-3-319-42294-7_78
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42293-0
Online ISBN: 978-3-319-42294-7
eBook Packages: Computer Science, Computer Science (R0)