Construction of Protein Backbone Fragments Libraries on Large Protein Sets Using a Randomized Spectral Clustering Algorithm

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10330)

Abstract

Protein fragment libraries play an important role in a wide variety of structural biology applications. In this work, we use a spectral clustering algorithm to analyze sets of fixed-length protein backbone fragments derived from the continuously growing Protein Data Bank (PDB) and construct protein fragment libraries from them. Incorporating the rank-revealing randomized singular value decomposition algorithm into spectral clustering, to rapidly approximate the dominant eigenvectors of the fragment affinity matrix, enables the clustering algorithm to handle large-scale fragment sample sets. Compared with the widely used protein fragment libraries developed by Kolodny et al., the fragments in our new libraries better represent the diverse protein structures in the PDB. Moreover, by using much larger fragment sample sets, we also generate libraries of longer fragments, up to 20 residues in length. Our fragment libraries are available at http://hpcr.cs.odu.edu/frag/.
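
The abstract only outlines the pipeline, so the following is a minimal sketch of the general randomized spectral clustering idea, not the authors' implementation. Several choices are assumptions not taken from the paper: fragments are fixed-length C-alpha traces; the fragment distance is the superposition-free distance RMSD (dRMSD); affinities use a Gaussian kernel with a hypothetical bandwidth `sigma`; the normalization and row-normalized k-means step follow Ng et al. [17]; and the dominant eigenvectors are approximated with the fixed-rank randomized range finder of Halko et al. [21] rather than the adaptive rank-revealing randomized SVD the abstract describes (cf. [19]). Choosing each cluster's medoid as its library fragment is also an assumption. All function names and the synthetic helix/strand data are hypothetical.

```python
import numpy as np


def pairwise_drmsd(frags):
    """All-pairs distance RMSD (dRMSD) for an (n, L, 3) array of C-alpha traces.

    Assumption: dRMSD is used because it needs no superposition; it is a
    stand-in, not necessarily the fragment distance used in the paper.
    """
    length = frags.shape[1]
    intra = np.linalg.norm(frags[:, :, None, :] - frags[:, None, :, :], axis=-1)
    iu = np.triu_indices(length, k=1)
    feats = intra[:, iu[0], iu[1]]  # (n, m) internal distances per fragment
    sq = np.sum((feats[:, None, :] - feats[None, :, :]) ** 2, axis=-1)
    return np.sqrt(sq / feats.shape[1])


def normalized_affinity(dist, sigma):
    """Gaussian affinity W with zero diagonal, normalized as D^-1/2 W D^-1/2."""
    w = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(w.sum(axis=1), 1e-12))
    return w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def randomized_top_eigvecs(a, k, oversample=10, n_power=3, seed=0):
    """Approximate the k algebraically largest eigenvectors of symmetric `a`.

    Fixed-rank randomized range finder with power iterations (Halko et al.),
    then Rayleigh-Ritz on the small projected matrix. This is NOT the adaptive
    rank-revealing R3SVD; it only illustrates the randomized idea.
    """
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    omega = rng.standard_normal((n, min(n, k + oversample)))
    q, _ = np.linalg.qr(a @ omega)
    for _ in range(n_power):
        q, _ = np.linalg.qr(a @ q)  # re-orthonormalize after each power step
    evals, evecs = np.linalg.eigh(q.T @ a @ q)
    top = np.argsort(evals)[::-1][:k]
    return q @ evecs[:, top]


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def spectral_cluster_fragments(frags, k, sigma=1.0, seed=0):
    """Return cluster labels and one representative (medoid) index per cluster."""
    dist = pairwise_drmsd(frags)
    u = randomized_top_eigvecs(normalized_affinity(dist, sigma), k, seed=seed)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)  # row-normalize (Ng et al.)
    labels = kmeans(u, k, seed=seed)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:  # assumption: a library entry is the cluster medoid
            sub = dist[np.ix_(members, members)]
            reps.append(int(members[np.argmin(sub.sum(axis=1))]))
    return labels, reps


def make_demo_fragments(n_each=20, seed=1):
    """Hypothetical synthetic data: noisy helix-like and strand-like 5-residue traces."""
    rng = np.random.default_rng(seed)
    i = np.arange(5)
    ang = np.deg2rad(100.0 * i)
    helix = np.stack([2.3 * np.cos(ang), 2.3 * np.sin(ang), 1.5 * i], axis=1)
    strand = np.stack([3.3 * i, 0.9 * (-1.0) ** i, np.zeros(5)], axis=1)
    helix_set = helix + rng.normal(0.0, 0.2, size=(n_each, 5, 3))
    strand_set = strand + rng.normal(0.0, 0.2, size=(n_each, 5, 3))
    return np.concatenate([helix_set, strand_set])
```

For example, `spectral_cluster_fragments(make_demo_fragments(), k=2)` should place the 20 helix-like and 20 strand-like toy traces in two separate clusters. The randomized step touches the affinity matrix only through a few matrix products, which is what makes it attractive compared with a full dense eigendecomposition; this sketch still builds the dense n-by-n affinity matrix, so it illustrates the idea rather than a large-scale implementation.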

References

  1. Munoz, V., Serrano, L.: Local versus nonlocal interactions in protein folding and stability – an experimentalist’s point of view. Fold. Des. 1(4), R71–R77 (1996)

  2. Chikenji, G., Fujitsuka, Y., Takada, S.: Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study. Proc. Natl. Acad. Sci. 103(9), 3141–3146 (2006)

  3. Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997)

  4. de Oliveira, S.H.P., Shi, J., Deane, C.M.: Building a better fragment library for de novo protein structure prediction. PLoS ONE 10(4), e0123998 (2015)

  5. Rata, I., Li, Y., Jakobsson, E.: Backbone statistical potential from local sequence-structure interactions in protein loops. J. Phys. Chem. B 114(5), 1859–1869 (2010)

  6. Li, Y., Rata, I., Jakobsson, E.: Sampling multiple scoring functions can improve protein loop structure prediction accuracy. J. Chem. Inf. Model. 51(7), 1656–1666 (2011)

  7. Li, Y.: Conformational sampling in template-free protein loop structure modeling: an overview. Comput. Struct. Biotechnol. J. 5(6), e201302003 (2013)

  8. Di Maio, F., Shavlik, J., Phillips, G.: A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 22(14), 81–89 (2006)

  9. Terwilliger, T.C.: Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr. D Biol. Crystallogr. 59(1), 38–44 (2003)

  10. Budowski-Tal, I., Nov, Y., Kolodny, R.: FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. 107, 3481–3486 (2010)

  11. Keasar, C., Kolodny, R.: Using protein fragments for searching and data-mining protein databases. In: Proceedings of AAAI workshop of Artificial Intelligence and Robotics Methods in Computational Biology (2013)

  12. Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small libraries of protein fragments model native protein structures accurately. J. Mol. Biol. 323, 297–307 (2002)

  13. Denise, C.: Structural genomics: exploring the 3D protein landscape. Simbios (2010)

  14. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

  15. Wang, G.L., Dunbrack, R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)

  16. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  17. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849–856 (2001)

  18. Ji, H., Weinberg, S., Li, Y.: A revisit of block power methods for finite state Markov chain applications. arXiv:1610.08881 (2016)

  19. Ji, H., Yu, W., Li, Y.: A rank revealing randomized singular value decomposition (R3SVD) algorithm for low-rank matrix approximations. arXiv:1605.08134 (2016)

  20. Gu, Y., Yu, W., Li, Y.: Efficient randomized algorithms for adaptive low-rank factorizations of large matrices. arXiv:1606.09402 (2016)

  21. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)

  22. Chiang, Y.S., Gelfand, T.I., Kister, A.E., Gelfand, I.M.: New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage. Proteins 68(4), 915–921 (2007)

  23. Elhefnawy, W., Chen, L., Han, Y., Li, Y.: ICOSA: a distance-dependent, orientation-specific coarse-grain contact potential for protein structure modeling. J. Mol. Biol. 427(15), 2562–2576 (2015)

  24. Li, Y., Liu, H., Rata, I., Jakobsson, E.: Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. J. Chem. Inf. Model. 53(2), 500–508 (2013)

Acknowledgements

Y. Li acknowledges support from National Science Foundation through Grant No. CCF-1066471. W. Elhefnawy acknowledges support from Old Dominion University Modeling and Simulation Fellowship.

Author information

Corresponding author

Correspondence to Yaohang Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Elhefnawy, W., Li, M., Wang, J., Li, Y. (2017). Construction of Protein Backbone Fragments Libraries on Large Protein Sets Using a Randomized Spectral Clustering Algorithm. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_10

  • DOI: https://doi.org/10.1007/978-3-319-59575-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59574-0

  • Online ISBN: 978-3-319-59575-7

  • eBook Packages: Computer Science, Computer Science (R0)