Abstract
Protein fragment libraries play an important role in a wide variety of structural biology applications. In this work, we use a spectral clustering algorithm to analyze fixed-length protein backbone fragment sets derived from the continuously growing Protein Data Bank (PDB) and construct libraries of protein fragments. Incorporating the rank-revealing randomized singular value decomposition algorithm into spectral clustering to quickly approximate the dominant eigenvectors of the fragment affinity matrix enables the clustering algorithm to handle large-scale fragment sample sets. Compared with the widely used protein fragment libraries developed by Kolodny et al., the fragments in our new libraries better represent diverse protein structures in the PDB. Moreover, by using much larger fragment sample sets, we also generate libraries of longer fragments, with lengths of up to 20 residues. Our fragment libraries can be found at http://hpcr.cs.odu.edu/frag/.
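The pipeline described in the abstract (affinity matrix, approximate dominant eigenvectors, clustering in the spectral embedding) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: fragments are stood in for by plain feature vectors compared with Euclidean distance and a Gaussian kernel, the eigenvectors come from a basic fixed-rank randomized subspace iteration in the style of Halko et al. rather than the rank-revealing R3SVD algorithm, and the function names (`gaussian_affinity`, `randomized_top_eigvecs`, `kmeans`, `spectral_cluster`) are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Dense Gaussian affinity from pairwise squared Euclidean distances (assumed metric)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def randomized_top_eigvecs(S, k, oversample=10, n_iter=2, seed=0):
    """Approximate the k dominant eigenvectors of a symmetric PSD matrix S."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    ell = min(n, k + oversample)
    # Random projection followed by a few power iterations, re-orthonormalizing each time.
    Q, _ = np.linalg.qr(S @ rng.standard_normal((n, ell)))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(S @ Q)
    # Solve the small projected eigenproblem and lift the vectors back.
    w, V = np.linalg.eigh(Q.T @ S @ Q)
    top = np.argsort(w)[::-1][:k]
    return Q @ V[:, top]

def kmeans(Y, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_cluster(X, k, sigma=1.0):
    """Normalized spectral clustering using the randomized eigensolver above."""
    A = gaussian_affinity(X, sigma)
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))          # D^{-1/2} A D^{-1/2}
    U = randomized_top_eigvecs(S, k)
    norms = np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U / norms, k)              # cluster the row-normalized embedding
```

In this sketch the only products with the full n-by-n matrix are a handful of multiplications by a thin n-by-(k + oversample) block, which is what makes randomized eigensolvers attractive for large sample sets. The sketch still forms the affinity matrix densely, so it illustrates the idea rather than the scale reported in the paper.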
References
Munoz, V., Serrano, L.: Local versus nonlocal interactions in protein folding and stability – an experimentalist’s point of view. Fold. Des. 1(4), R71–R77 (1996)
Chikenji, G., Fujitsuka, Y., Takada, S.: Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study. Proc. Natl. Acad. Sci. 103(9), 3141–3146 (2006)
Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997)
de Oliveira, S.H.P., Shi, J., Deane, C.M.: Building a better fragment library for de novo protein structure prediction. PLoS ONE 10(4), e0123998 (2015)
Rata, I., Li, Y., Jakobsson, E.: Backbone statistical potential from local sequence-structure interactions in protein loops. J. Phys. Chem. B 114(5), 1859–1869 (2010)
Li, Y., Rata, I., Jakobsson, E.: Sampling multiple scoring functions can improve protein loop structure prediction accuracy. J. Chem. Inf. Model. 51(7), 1656–1666 (2011)
Li, Y.: Conformational sampling in template-free protein loop structure modeling: an overview. Comput. Struct. Biotechnol. J. 5(6), e201302003 (2013)
Di Maio, F., Shavlik, J., Phillips, G.: A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 22(14), 81–89 (2006)
Terwilliger, T.C.: Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr. D Biol. Crystallogr. 59(1), 38–44 (2003)
Budowski-Tal, I., Nov, Y., Kolodny, R.: FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. 107, 3481–3486 (2010)
Keasar, C., Kolodny, R.: Using protein fragments for searching and data-mining protein databases. In: Proceedings of AAAI workshop of Artificial Intelligence and Robotics Methods in Computational Biology (2013)
Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small libraries of protein fragments model native protein structures accurately. J. Mol. Biol. 323, 297–307 (2002)
Denise, C.: Structural GENOMICS exploring the 3D protein landscape. Simbios (2010)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Wang, G.L., Dunbrack, R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural. Inf. Process. Syst. 14, 849–856 (2001)
Ji, H., Weinberg, S., Li, Y.: A revisit of block power methods for finite state Markov chain applications. arXiv:1610.08881 (2016)
Ji, H., Yu, W., Li, Y.: A rank revealing randomized singular value decomposition (R3SVD) algorithm for low-rank matrix approximations. arXiv:1605.08134 (2016)
Gu, Y., Yu, W., Li, Y.: Efficient randomized algorithms for adaptive low-rank factorizations of large matrices. arXiv:1606.09402 (2016)
Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
Chiang, Y.S., Gelfand, T.I., Kister, A.E., Gelfand, I.M.: New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage. Proteins 68(4), 915–921 (2007)
Elhefnawy, W., Chen, L., Han, Y., Li, Y.: ICOSA: a distance-dependent, orientation-specific coarse-grain contact potential for protein structure modeling. J. Mol. Biol. 427(15), 2562–2576 (2015)
Li, Y., Liu, H., Rata, I., Jakobsson, E.: Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. J. Chem. Inf. Model. 53(2), 500–508 (2013)
Acknowledgements
Y. Li acknowledges support from the National Science Foundation through Grant No. CCF-1066471. W. Elhefnawy acknowledges support from an Old Dominion University Modeling and Simulation Fellowship.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Elhefnawy, W., Li, M., Wang, J., Li, Y. (2017). Construction of Protein Backbone Fragments Libraries on Large Protein Sets Using a Randomized Spectral Clustering Algorithm. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_10
DOI: https://doi.org/10.1007/978-3-319-59575-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59574-0
Online ISBN: 978-3-319-59575-7
eBook Packages: Computer Science, Computer Science (R0)