High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies

Suplatov, Dmitry; Sharapova, Yana; Shegay, Maxim; Popova, Nina; Fesko, Kateryna; Voevodin, Vladimir; Švedas, Vytas

doi:10.1007/978-3-030-36592-9_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1129)

Included in the following conference series:

Russian Supercomputing Days

Abstract

A multiple alignment of proteins that implement different functions within the common structural fold of a superfamily is a valuable tool in bioinformatics, but constructing one is a challenge. The process can be seen as a pipeline of independent sequential steps of equivalent computational complexity, each performed by a different set of algorithms. In this work the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. The resulting HPC installation was used to collect and superimpose, within 12 h, a representative set of 299,976 sequences and structures of the fold-type I PLP-dependent enzymes, which appears to be the largest alignment of a protein superfamily ever constructed. Hybrid acceleration strategies thus provide routine access to sequence/structure comparison of evolutionarily related proteins at a scale that was previously intractable, making it possible to study structure-function relationships and solve practically relevant problems, and promoting the value of bioinformatics and HPC in protein engineering and drug discovery.
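
The idea of a sequential pipeline in which each step gets its own optimization strategy can be illustrated with a minimal sketch. This is not the Mustguseal code: the `Stage` class, the strategy labels `"serial"` and `"threads"`, and the toy functions `normalize` and `mask_gaps` are all hypothetical names introduced only for illustration, and a thread pool stands in for whatever acceleration a real step would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stage:
    """One pipeline step plus the execution strategy chosen for it."""
    name: str
    func: Callable[[str], str]  # hypothetical per-item work
    strategy: str               # "serial" or "threads" (illustrative labels)


def run_stage(stage: Stage, items: Sequence[str], workers: int = 4) -> List[str]:
    """Apply one stage to every item using the strategy assigned to it."""
    if stage.strategy == "serial":
        return [stage.func(item) for item in items]
    if stage.strategy == "threads":
        # Executor.map returns results in input order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(stage.func, items))
    raise ValueError(f"unknown strategy: {stage.strategy}")


def run_pipeline(stages: Sequence[Stage], items: Sequence[str]) -> List[str]:
    """Run stages one after another; each consumes the previous output."""
    data = list(items)
    for stage in stages:
        data = run_stage(stage, data)
    return data


# Toy stand-ins for real steps (assumptions, not Mustguseal algorithms).
def normalize(seq: str) -> str:
    return seq.strip().upper()


def mask_gaps(seq: str) -> str:
    return seq.replace("-", "")


PIPELINE = [
    Stage("normalize", normalize, "serial"),
    Stage("mask_gaps", mask_gaps, "threads"),
]

result = run_pipeline(PIPELINE, [" ac-de ", "fg-hi"])
print(result)
```

Keeping the strategy as data on each stage, rather than hard-coding it in the driver loop, is one way to let every step be tuned independently while the steps still run in a fixed order.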

References

1. Beerens, K., et al.: Evolutionary analysis as a powerful complement to energy calculations for protein stabilization. ACS Catal. 8(10), 9420–9428 (2018)
2. Bornscheuer, U.T.: The fourth wave of biocatalysis is approaching. Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci. 376(2110), 20170063 (2017)
3. Buß, O., Buchholz, P.C., Gräff, M., Klausmann, P., Rudat, J., Pleiss, J.: The \(\omega\)-transaminase engineering database (oTAED): a navigation tool in protein sequence and structure space. Proteins Struct. Funct. Bioinf. 86(5), 566–580 (2018)
4. Hendrikse, N.M., Charpentier, G., Nordling, E., Syrén, P.O.: Ancestral diterpene cyclases show increased thermostability and substrate acceptance. FEBS J. 285(24), 4660–4673 (2018)
5. Lutz, S., Iamurri, S.M.: Protein engineering: past, present, and future. In: Bornscheuer, U.T., Höhne, M. (eds.) Protein Engineering. MMB, vol. 1685, pp. 1–12. Springer, New York (2018). https://doi.org/10.1007/978-1-4939-7366-8_1
6. Pellis, A., Cantone, S., Ebert, C., Gardossi, L.: Evolving biocatalysis to meet bioeconomy challenges and opportunities. New Biotechnol. 40, 154–169 (2018)
7. Suplatov, D., Voevodin, V., Švedas, V.: Robust enzyme design: bioinformatic tools for improved protein stability. Biotechnol. J. 10(3), 344–355 (2015)
8. Armougom, F., et al.: Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Res. 34(suppl-2), W604–W608 (2006)
9. Krieger, E., Vriend, G.: YASARA View - molecular graphics for all devices - from smartphones to workstations. Bioinformatics 30(20), 2981–2982 (2014)
10. Kuipers, R.K., et al.: 3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins Struct. Funct. Bioinf. 78(9), 2101–2113 (2010)
11. Papadopoulos, J.S., Agarwala, R.: COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23(9), 1073–1079 (2007)
12. Pei, J., Kim, B., Grishin, N.: PROMALS3D: a tool for multiple sequence and structure alignment. Nucleic Acids Res. 36(7), 2295–2300 (2008)
13. Suplatov, D.A., Kopylov, K.E., Popova, N.N., Voevodin, V.V., Švedas, V.K.: Mustguseal: a server for multiple structure-guided sequence alignment of protein families. Bioinformatics 34(9), 1583–1585 (2018)
14. Pleiss, J.: Systematic analysis of large enzyme families: identification of specificity- and selectivity-determining hotspots. ChemCatChem 6(4), 944–950 (2014)
15. Sumbalova, L., Stourac, J., Martinek, T., Bednar, D., Damborsky, J.: HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 46(W1), W356–W362 (2018)
16. Suplatov, D., Kirilin, E., Arbatsky, M., Takhaveev, V., Švedas, V.: pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families. Nucleic Acids Res. 42(W1), W344–W349 (2014)
17. Suplatov, D., Kirilin, E., Švedas, V.: Bioinformatic analysis of protein families to select function-related variable positions. In: Understanding Enzymes, pp. 375–410. Pan Stanford (2016)
18. Suplatov, D., Kirilin, E., Takhaveev, V., Švedas, V.: Zebra: a web server for bioinformatic analysis of diverse protein families. J. Biomol. Struct. Dyn. 32(11), 1752–1758 (2014)
19. Suplatov, D., Shalaeva, D., Kirilin, E., Arzhanik, V., Švedas, V.: Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J. Biomol. Struct. Dyn. 32(1), 75–87 (2014)
20. Suplatov, D., Sharapova, Y., Timonina, D., Kopylov, K., Švedas, V.: The visualCMAT: a web-server to select and interpret correlated mutations/co-evolving residues in protein families. J. Bioinf. Comput. Biol. 16(02), 1840005 (2018)
21. Fesko, K., Suplatov, D., Švedas, V.: Bioinformatic analysis of the fold type I PLP-dependent enzymes reveals determinants of reaction specificity in l-threonine aldolase from Aeromonas jandaei. FEBS Open Bio 8(6), 1013–1028 (2018)
22. Genz, M., et al.: Engineering the amine transaminase from Vibrio fluvialis towards branched-chain substrates. ChemCatChem 8(20), 3199–3202 (2016)
23. Steffen-Munsberg, F., et al.: Bioinformatic analysis of a PLP-dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol. Adv. 33(5), 566–604 (2015)
24. Knight, A.M., et al.: Bioinformatic analysis of fold-type III PLP-dependent enzymes discovers multimeric racemases. Appl. Microbiol. Biotechnol. 101(4), 1499–1507 (2017)
25. Bezsudnova, E.Y., et al.: Biochemical and structural insights into PLP fold type IV transaminase from Thermobaculum terrenum. Biochimie 158, 130–138 (2019)
26. Bezsudnova, E.Y., Dibrova, D.V., Nikolaeva, A.Y., Rakitina, T.V., Popov, V.O.: Identification of branched-chain amino acid aminotransferases active towards (R)-(+)-1-phenylethylamine among PLP fold type IV transaminases. J. Biotechnol. 271, 26–28 (2018)
27. Bezsudnova, E.Y., Stekhanova, T.N., Suplatov, D.A., Mardanov, A.V., Ravin, N.V., Popov, V.O.: Experimental and computational studies on the unusual substrate specificity of branched-chain amino acid aminotransferase from Thermoproteus uzoniensis. Arch. Biochem. Biophys. 607, 27–36 (2016)
28. Jochens, H., Aerts, D., Bornscheuer, U.T.: Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Eng. Des. Sel. 23(12), 903–909 (2010)
29. Kourist, R., et al.: The \(\alpha\)/\(\beta\)-hydrolase fold 3DM database (ABHDB) as a tool for protein engineering. ChemBioChem 11(12), 1635–1643 (2010)
30. Pleiss, J., Fischer, M., Peiker, M., Thiele, C., Schmid, R.D.: Lipase engineering database: understanding and exploiting sequence-structure-function relationships. J. Mol. Catal. B Enzym. 10(5), 491–508 (2000)
31. Rauwerdink, A., Kazlauskas, R.J.: How the same core catalytic machinery catalyzes 17 different reactions: the serine-histidine-aspartate catalytic triad of \(\alpha\)/\(\beta\)-hydrolase fold enzymes. ACS Catal. 5(10), 6153–6176 (2015)
32. Suplatov, D., Besenmatter, W., Švedas, V., Svendsen, A.: Bioinformatic analysis of alpha/beta-hydrolase fold enzymes reveals subfamily-specific positions responsible for discrimination of amidase and lipase activities. Protein Eng. Des. Sel. 25(11), 689–697 (2012)
33. Widmann, M., Juhl, P.B., Pleiss, J.: Structural classification by the Lipase Engineering Database: a case study of Candida antarctica lipase A. BMC Genom. 11(1), 123 (2010)
34. Deaguero, A.L., Blum, J.K., Bommarius, A.S.: Biocatalytic synthesis of \(\beta\)-lactam antibiotics. Encycl. Ind. Biotechnol. Bioprocess Bioseparation Cell Technol., 1–18 (2009)
35. Suplatov, D., Panin, N., Kirilin, E., Shcherbakova, T., Kudryavtsev, P., Švedas, V.: Computational design of a pH stable enzyme: understanding molecular mechanism of penicillin acylase’s adaptation to alkaline conditions. PLoS ONE 9(6), e100643 (2014)
36. Grienke, U., et al.: Discovery of prenylated flavonoids with dual activity against influenza virus and Streptococcus pneumoniae. Sci. Rep. 6, 27156 (2016)
37. Sharapova, Y.A., Švedas, V.: Molecular modeling of the binding of the allosteric inhibitor optactin at a new binding site in neuraminidase A from Streptococcus pneumoniae. Mosc. Univ. Chem. Bull. 73(5), 205–211 (2018)
38. Sharapova, Y., Suplatov, D., Švedas, V.: Neuraminidase A from Streptococcus pneumoniae has a modular organization of catalytic and lectin domains separated by a flexible linker. FEBS J. 285(13), 2428–2445 (2018)
39. Walther, E., et al.: Dual acting neuraminidase inhibitors open new opportunities to disrupt the lethal synergism between Streptococcus pneumoniae and influenza virus. Frontiers Microbiol. 7, 357 (2016)
40. Xu, Z., et al.: Sequence diversity of NanA manifests in distinct enzyme kinetics and inhibitor susceptibility. Sci. Rep. 6, 25169 (2016)
41. Karasev, D., Veselovsky, A., Lagunin, A., Filimonov, D., Sobolev, B.: Determination of amino acid residues responsible for specific interaction of protein kinases with small molecule inhibitors. Mol. Biol. 52(3), 478–487 (2018)
42. Korbee, C.J., et al.: Combined chemical genetics and data-driven bioinformatics approach identifies receptor tyrosine kinase inhibitors as host-directed antimicrobials. Nat. Commun. 9(1), 358 (2018)
43. Song, J., et al.: PhosphoPredict: a bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection. Sci. Rep. 7(1), 6862 (2017)
44. Suplatov, D., Kopylov, K., Sharapova, Y., Švedas, V.: Human p38\(\alpha\) mitogen-activated protein kinase in the Asp168-Phe169-Gly170-in (DFG-in) state can bind allosteric inhibitor doramapimod. J. Biomol. Struct. Dyn. 37(8), 2049–2060 (2019)
45. The UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47(D1), D506–D515 (2018)
46. Burley, S.K., Berman, H.M., Kleywegt, G.J., Markley, J.L., Nakamura, H., Velankar, S.: Protein Data Bank (PDB): the single global macromolecular structure archive. In: Wlodawer, A., Dauter, Z., Jaskolski, M. (eds.) Protein Crystallography. MMB, vol. 1607, pp. 627–641. Springer, New York (2017). https://doi.org/10.1007/978-1-4939-7000-1_26
47. Sadovnichy, V., Tikhonravov, A., Voevodin, V., Opanasenko, V.I.: “Lomonosov”: supercomputing at Moscow State University. Contemporary High Performance Computing: From Petascale toward Exascale (Chapman & Hall/CRC Computational Science), pp. 283–307 (2013)
48. Krissinel, E., Henrick, K.: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. Sect. D: Biol. Crystallogr. 60(12), 2256–2268 (2004)
49. Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28(23), 3150–3152 (2012)
50. Suplatov, D., Popova, N., Zhumatiy, S., Voevodin, V., Švedas, V.: Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer. J. Bioinf. Comput. Biol. 14(02), 1641008 (2016)
51. Obe, R.O., Hsu, L.S.: PostgreSQL: Up and Running: A Practical Guide to the Advanced Open Source Database. O’Reilly Media Inc., Sebastopol (2017)
52. Shegay, M.V., Suplatov, D.A., Popova, N.N., Švedas, V.K., Voevodin, V.V.: parMATT: parallel multiple alignment of protein 3D-structures with translations and twists for distributed-memory systems. Bioinformatics 35(21), 4456–4458 (2019)
53. Menke, M., Berger, B., Cowen, L.: Matt: local flexibility aids protein multiple structure alignment. PLoS Comput. Biol. 4(1), e10 (2008)
54. Kalaimathy, S., Sowdhamini, R., Kanagarajadurai, K.: Critical assessment of structure-based sequence alignment methods at distant relationships. Briefings Bioinf. 12(2), 163–175 (2011)
55. Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2), 182–188 (2010)
56. Katoh, K., Standley, D.M.: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013)
57. Söding, J., Biegert, A., Lupas, A.N.: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33(suppl-2), W244–W248 (2005)
58. Fischer, J., Mayer, C.E., Söding, J.: Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24(5), 613–620 (2008)
59. Nobile, M.S., Cazzaniga, P., Tangherloni, A., Besozzi, D.: Graphics processing units in bioinformatics, computational biology and systems biology. Briefings Bioinf. 18(5), 870–885 (2016)
60. Vega-Rodríguez, M.A., Rubio-Largo, A.: Parallelism in computational biology: a view from diverse high-performance computing applications. Int. J. High Perform. Comput. Appl. 32(3), 317–320 (2018)
61. Götz, A.W., Williamson, M.J., Xu, D., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theor. Comput. 8(5), 1542–1555 (2012)
62. Salomon-Ferrer, R., Götz, A.W., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theor. Comput. 9(9), 3878–3888 (2013)
63. Sharapova, Y.A., Suplatov, D.A., Švedas, V.K.: Simulating the long-timescale structural behavior of bacterial and influenza neuraminidases with different HPC resources. Supercomput. Frontiers Innovations 5(3), 30–33 (2018)
64. Suplatov, D., Sharapova, Y., Popova, N., Kopylov, K., Voevodin, V., Švedas, V.: Molecular dynamics in the force field FF14SB in water TIP4P-Ew, and in the force field FF15IPQ in water SPC/Eb: a comparative analysis on GPU and CPU (in Russian). Bull. South Ural State University Ser. Comput. Math. Softw. Eng. 8(1), 71–88 (2019)
65. Imbernón, B., Prades, J., Giménez, D., Cecilia, J.M., Silla, F.: Enhancing large-scale docking simulation on heterogeneous systems: an MPI vs rCUDA study. Future Gener. Comput. Syst. 79, 26–37 (2018)
66. Prakhov, N.D., Chernorudskiy, A.L., Gainullin, M.R.: VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters. Bioinformatics 26(10), 1374–1375 (2010)
67. Suplatov, D., Timonina, D., Sharapova, Y., Švedas, V.: Yosshi: a web-server for disulfide engineering by bioinformatic analysis of diverse protein families. Nucleic Acids Res. 47(W1), W308–W314 (2019)

Acknowledgements

This work was supported by the Russian Foundation for Basic Research grant #18-29-13060 and was carried out using the equipment of the shared research facilities of HPC resources at Lomonosov Moscow State University, supported by the project RFMEFI62117X0011 [47].

Author information

Authors and Affiliations

Lomonosov Moscow State University, Moscow, Russia
Dmitry Suplatov, Yana Sharapova, Maxim Shegay, Nina Popova, Vladimir Voevodin & Vytas Švedas
Institute of Organic Chemistry, Graz University of Technology, Graz, Austria
Kateryna Fesko

Corresponding author

Correspondence to Dmitry Suplatov.

Editor information

Editors and Affiliations

Research Computing Center, Moscow State University, Moscow, Russia
Vladimir Voevodin
Research Computing Center, Moscow State University, Moscow, Russia
Sergey Sobolev

About this paper

Cite this paper

Suplatov, D. et al. (2019). High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2019. Communications in Computer and Information Science, vol 1129. Springer, Cham. https://doi.org/10.1007/978-3-030-36592-9_21

DOI: https://doi.org/10.1007/978-3-030-36592-9_21
Published: 10 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36591-2
Online ISBN: 978-3-030-36592-9
eBook Packages: Computer Science, Computer Science (R0)
