High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies

  • Conference paper
Supercomputing (RuSCDays 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1129)

Abstract

A multiple alignment of proteins that implement different functions within the common structural fold of a superfamily is a valuable tool in bioinformatics, but its construction represents a challenge. The process can be seen as a pipeline of independent sequential steps of equivalent computational complexity, each performed by a different set of algorithms. In this work, the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. This HPC installation was used to collect and superimpose, within 12 h, a representative set of 299,976 sequences and structures of the fold-type I PLP-dependent enzymes, which appears to be the largest alignment of a protein superfamily ever constructed. The use of hybrid acceleration strategies provided routine access to sequence/structure comparison of evolutionarily related proteins at a scale that would previously have been intractable, making it possible to study the structure-function relationship and solve practically relevant problems, and thus promoting the value of bioinformatics and HPC in protein engineering and drug discovery.

References

  1. Beerens, K., et al.: Evolutionary analysis as a powerful complement to energy calculations for protein stabilization. ACS Catal. 8(10), 9420–9428 (2018)

  2. Bornscheuer, U.T.: The fourth wave of biocatalysis is approaching. Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci. 376(2110), 20170063 (2017)

  3. Buß, O., Buchholz, P.C., Gräff, M., Klausmann, P., Rudat, J., Pleiss, J.: The ω-transaminase engineering database (oTAED): a navigation tool in protein sequence and structure space. Proteins Struct. Funct. Bioinf. 86(5), 566–580 (2018)

  4. Hendrikse, N.M., Charpentier, G., Nordling, E., Syrén, P.O.: Ancestral diterpene cyclases show increased thermostability and substrate acceptance. FEBS J. 285(24), 4660–4673 (2018)

  5. Lutz, S., Iamurri, S.M.: Protein engineering: past, present, and future. In: Bornscheuer, U.T., Höhne, M. (eds.) Protein Engineering. MMB, vol. 1685, pp. 1–12. Springer, New York (2018). https://doi.org/10.1007/978-1-4939-7366-8_1

  6. Pellis, A., Cantone, S., Ebert, C., Gardossi, L.: Evolving biocatalysis to meet bioeconomy challenges and opportunities. New Biotechnol. 40, 154–169 (2018)

  7. Suplatov, D., Voevodin, V., Švedas, V.: Robust enzyme design: bioinformatic tools for improved protein stability. Biotechnol. J. 10(3), 344–355 (2015)

  8. Armougom, F., et al.: Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-coffee. Nucleic Acids Res. 34(suppl-2), W604–W608 (2006)

  9. Krieger, E., Vriend, G.: YASARA view–molecular graphics for all devices–from smartphones to workstations. Bioinformatics 30(20), 2981–2982 (2014)

  10. Kuipers, R.K., et al.: 3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins Struct. Funct. Bioinf. 78(9), 2101–2113 (2010)

  11. Papadopoulos, J.S., Agarwala, R.: COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23(9), 1073–1079 (2007)

  12. Pei, J., Kim, B., Grishin, N.: PROMALS3D: a tool for multiple sequence and structure alignment. Nucleic Acids Res. 36(7), 2295–2300 (2008)

  13. Suplatov, D.A., Kopylov, K.E., Popova, N.N., Voevodin, V.V., Švedas, V.K.: Mustguseal: a server for multiple structure-guided sequence alignment of protein families. Bioinformatics 34(9), 1583–1585 (2018)

  14. Pleiss, J.: Systematic analysis of large enzyme families: identification of specificity-and selectivity-determining hotspots. ChemCatChem 6(4), 944–950 (2014)

  15. Sumbalova, L., Stourac, J., Martinek, T., Bednar, D., Damborsky, J.: HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 46(W1), W356–W362 (2018)

  16. Suplatov, D., Kirilin, E., Arbatsky, M., Takhaveev, V., Švedas, V.: pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families. Nucleic Acids Res. 42(W1), W344–W349 (2014)

  17. Suplatov, D., Kirilin, E., Švedas, V.: Bioinformatic analysis of protein families to select function-related variable positions. In: Understanding Enzymes, pp. 375–410. Pan Stanford (2016)

  18. Suplatov, D., Kirilin, E., Takhaveev, V., Švedas, V.: Zebra: a web server for bioinformatic analysis of diverse protein families. J. Biomol. Struct. Dyn. 32(11), 1752–1758 (2014)

  19. Suplatov, D., Shalaeva, D., Kirilin, E., Arzhanik, V., Švedas, V.: Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J. Biomol. Struct. Dyn. 32(1), 75–87 (2014)

  20. Suplatov, D., Sharapova, Y., Timonina, D., Kopylov, K., Švedas, V.: The visualCMAT: a web-server to select and interpret correlated mutations/co-evolving residues in protein families. J. Bioinf. Comput. Biol. 16(02), 1840005 (2018)

  21. Fesko, K., Suplatov, D., Švedas, V.: Bioinformatic analysis of the fold type I PLP-dependent enzymes reveals determinants of reaction specificity in l-threonine aldolase from Aeromonas jandaei. FEBS Open Bio 8(6), 1013–1028 (2018)

  22. Genz, M., et al.: Engineering the Amine Transaminase from Vibrio fluvialis towards Branched-Chain substrates. ChemCatChem 8(20), 3199–3202 (2016)

  23. Steffen-Munsberg, F., et al.: Bioinformatic analysis of a PLP-dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol. Adv. 33(5), 566–604 (2015)

  24. Knight, A.M., et al.: Bioinformatic analysis of fold-type III PLP-dependent enzymes discovers multimeric racemases. Appl. Microbiol. Biotechnol. 101(4), 1499–1507 (2017)

  25. Bezsudnova, E.Y., et al.: Biochemical and structural insights into PLP fold type IV transaminase from Thermobaculum terrenum. Biochimie 158, 130–138 (2019)

  26. Bezsudnova, E.Y., Dibrova, D.V., Nikolaeva, A.Y., Rakitina, T.V., Popov, V.O.: Identification of branched-chain amino acid aminotransferases active towards (R)-(+)-1-phenylethylamine among PLP fold type IV transaminases. J. Biotechnol. 271, 26–28 (2018)

  27. Bezsudnova, E.Y., Stekhanova, T.N., Suplatov, D.A., Mardanov, A.V., Ravin, N.V., Popov, V.O.: Experimental and computational studies on the unusual substrate specificity of branched-chain amino acid aminotransferase from Thermoproteus uzoniensis. Arch. Biochem. Biophys. 607, 27–36 (2016)

  28. Jochens, H., Aerts, D., Bornscheuer, U.T.: Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Eng. Des. Sel. 23(12), 903–909 (2010)

  29. Kourist, R., et al.: The α/β-hydrolase fold 3DM database (ABHDB) as a tool for protein engineering. ChemBioChem 11(12), 1635–1643 (2010)

  30. Pleiss, J., Fischer, M., Peiker, M., Thiele, C., Schmid, R.D.: Lipase engineering database: understanding and exploiting sequence-structure-function relationships. J. Mol. Catal. B Enzym. 10(5), 491–508 (2000)

  31. Rauwerdink, A., Kazlauskas, R.J.: How the same core catalytic machinery catalyzes 17 different reactions: the serine-histidine-aspartate catalytic triad of α/β-hydrolase fold enzymes. ACS Catal. 5(10), 6153–6176 (2015)

  32. Suplatov, D., Besenmatter, W., Švedas, V., Svendsen, A.: Bioinformatic analysis of alpha/beta-hydrolase fold enzymes reveals subfamily-specific positions responsible for discrimination of amidase and lipase activities. Protein Eng. Des. Sel. 25(11), 689–697 (2012)

  33. Widmann, M., Juhl, P.B., Pleiss, J.: Structural classification by the Lipase Engineering Database: a case study of Candida antarctica lipase A. BMC Genom. 11(1), 123 (2010)

  34. Deaguero, A.L., Blum, J.K., Bommarius, A.S.: Biocatalytic synthesis of β-lactam antibiotics. Encycl. Ind. Biotechnol. Bioprocess Bioseparation Cell Technol., 1–18 (2009)

  35. Suplatov, D., Panin, N., Kirilin, E., Shcherbakova, T., Kudryavtsev, P., Švedas, V.: Computational design of a pH stable enzyme: understanding molecular mechanism of penicillin acylase’s adaptation to alkaline conditions. PLoS ONE 9(6), e100643 (2014)

  36. Grienke, U., et al.: Discovery of prenylated flavonoids with dual activity against influenza virus and Streptococcus pneumoniae. Sci. Rep. 6, 27156 (2016)

  37. Sharapova, Y.A., Švedas, V.: Molecular modeling of the binding of the allosteric inhibitor optactin at a new binding site in neuraminidase A from Streptococcus pneumoniae. Mosc. Univ. Chem. Bull. 73(5), 205–211 (2018)

  38. Sharapova, Y., Suplatov, D., Švedas, V.: Neuraminidase A from Streptococcus pneumoniae has a modular organization of catalytic and lectin domains separated by a flexible linker. FEBS J. 285(13), 2428–2445 (2018)

  39. Walther, E., et al.: Dual acting neuraminidase inhibitors open new opportunities to disrupt the lethal synergism between Streptococcus pneumoniae and influenza virus. Frontiers Microbiol. 7, 357 (2016)

  40. Xu, Z., et al.: Sequence diversity of NanA manifests in distinct enzyme kinetics and inhibitor susceptibility. Sci. Rep. 6, 25169 (2016)

  41. Karasev, D., Veselovsky, A., Lagunin, A., Filimonov, D., Sobolev, B.: Determination of amino acid residues responsible for specific interaction of protein kinases with small molecule inhibitors. Mol. Biol. 52(3), 478–487 (2018)

  42. Korbee, C.J., et al.: Combined chemical genetics and data-driven bioinformatics approach identifies receptor tyrosine kinase inhibitors as host-directed antimicrobials. Nat. Commun. 9(1), 358 (2018)

  43. Song, J., et al.: PhosphoPredict: a bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection. Sci. Rep. 7(1), 6862 (2017)

  44. Suplatov, D., Kopylov, K., Sharapova, Y., Švedas, V.: Human p38α mitogen-activated protein kinase in the Asp168-Phe169-Gly170-in (DFG-in) state can bind allosteric inhibitor doramapimod. J. Biomol. Struct. Dyn. 37(8), 2049–2060 (2019)

  45. UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47(D1), D506–D515 (2018)

  46. Burley, S.K., Berman, H.M., Kleywegt, G.J., Markley, J.L., Nakamura, H., Velankar, S.: Protein Data Bank (PDB): the single global macromolecular structure archive. In: Wlodawer, A., Dauter, Z., Jaskolski, M. (eds.) Protein Crystallography. MMB, vol. 1607, pp. 627–641. Springer, New York (2017). https://doi.org/10.1007/978-1-4939-7000-1_26

  47. Sadovnichy, V., Tikhonravov, A., Voevodin, V., Opanasenko, V.I.: “Lomonosov”: supercomputing at Moscow State University. Contemporary High Performance Computing: From Petascale toward Exascale (Chapman & Hall/CRC Computational Science), pp. 283–307 (2013)

  48. Krissinel, E., Henrick, K.: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. Sect. D: Biol. Crystallogr. 60(12), 2256–2268 (2004)

  49. Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28(23), 3150–3152 (2012)

  50. Suplatov, D., Popova, N., Zhumatiy, S., Voevodin, V., Švedas, V.: Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer. J. Bioinf. Comput. Biol. 14(02), 1641008 (2016)

  51. Obe, R.O., Hsu, L.S.: PostgreSQL: Up and Running: A Practical Guide to the Advanced Open Source Database. O’Reilly Media Inc., Sebastopol (2017)

  52. Shegay, M.V., Suplatov, D.A., Popova, N.N., Švedas, V.K., Voevodin, V.V.: parMATT: parallel multiple alignment of protein 3D-structures with translations and twists for distributed-memory systems. Bioinformatics 35(21), 4456–4458 (2019)

  53. Menke, M., Berger, B., Cowen, L.: Matt: local flexibility aids protein multiple structure alignment. PLoS Comput. Biol. 4(1), e10 (2008)

  54. Kalaimathy, S., Sowdhamini, R., Kanagarajadurai, K.: Critical assessment of structure-based sequence alignment methods at distant relationships. Briefings Bioinf. 12(2), 163–175 (2011)

  55. Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2), 182–188 (2010)

  56. Katoh, K., Standley, D.M.: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013)

  57. Söding, J., Biegert, A., Lupas, A.N.: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33(suppl-2), W244–W248 (2005)

  58. Fischer, J., Mayer, C.E., Söding, J.: Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24(5), 613–620 (2008)

  59. Nobile, M.S., Cazzaniga, P., Tangherloni, A., Besozzi, D.: Graphics processing units in bioinformatics, computational biology and systems biology. Briefings Bioinf. 18(5), 870–885 (2016)

  60. Vega-Rodríguez, M.A., Rubio-Largo, A.: Parallelism in computational biology: a view from diverse high-performance computing applications. Int. J. High Perform. Comput. Appl. 32(3), 317–320 (2018)

  61. Götz, A.W., Williamson, M.J., Xu, D., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theor. Comput. 8(5), 1542–1555 (2012)

  62. Salomon-Ferrer, R., Götz, A.W., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theor. Comput. 9(9), 3878–3888 (2013)

  63. Sharapova, Y.A., Suplatov, D.A., Švedas, V.K.: Simulating the long-timescale structural behavior of bacterial and influenza neuraminidases with different HPC resources. Supercomput. Frontiers Innovations 5(3), 30–33 (2018)

  64. Suplatov, D., Sharapova, Y., Popova, N., Kopylov, K., Voevodin, V., Švedas, V.: Molecular dynamics in the force field FF14SB in water TIP4P-EW, and in the force field FF15IPQ in water SPC/EB: a comparative analysis on GPU and CPU (in Russian). Bull. South Ural State University Ser. Comput. Math. Softw. Eng. 8(1), 71–88 (2019)

  65. Imbernón, B., Prades, J., Giménez, D., Cecilia, J.M., Silla, F.: Enhancing large-scale docking simulation on heterogeneous systems: an MPI vs rCUDA study. Future Gener. Comput. Syst. 79, 26–37 (2018)

  66. Prakhov, N.D., Chernorudskiy, A.L., Gainullin, M.R.: VSDocker: a tool for parallel high-throughput virtual screening using autodock on windows-based computer clusters. Bioinformatics 26(10), 1374–1375 (2010)

  67. Suplatov, D., Timonina, D., Sharapova, Y., Švedas, V.: Yosshi: a web-server for disulfide engineering by bioinformatic analysis of diverse protein families. Nucleic Acids Res. 47(W1), W308–W314 (2019)

Acknowledgements

This work was supported by the Russian Foundation for Basic Research grant #18-29-13060 and carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University supported by the project RFMEFI62117X0011 [47].

Corresponding author

Correspondence to Dmitry Suplatov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Suplatov, D. et al. (2019). High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2019. Communications in Computer and Information Science, vol 1129. Springer, Cham. https://doi.org/10.1007/978-3-030-36592-9_21

  • DOI: https://doi.org/10.1007/978-3-030-36592-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36591-2

  • Online ISBN: 978-3-030-36592-9

  • eBook Packages: Computer Science, Computer Science (R0)
