Abstract
Construction of a multiple alignment of proteins that implement different functions within a common structural fold of a superfamily is a valuable tool in bioinformatics, but remains a challenge. The process can be seen as a pipeline of independent sequential steps of equivalent computational complexity, each performed by a different set of algorithms. In this work the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. This HPC installation was used to collect and superimpose, within 12 h, a representative set of 299,976 sequences and structures of the fold-type I PLP-dependent enzymes, which appears to be the largest alignment of a protein superfamily ever constructed. The use of hybrid acceleration strategies provided routine access to sequence/structure comparison of evolutionarily related proteins at a scale that would previously have been intractable, making it possible to study the structure-function relationship and solve practically relevant problems, and thus promoting the value of bioinformatics and HPC in protein engineering and drug discovery.
References
Beerens, K., et al.: Evolutionary analysis as a powerful complement to energy calculations for protein stabilization. ACS Catal. 8(10), 9420–9428 (2018)
Bornscheuer, U.T.: The fourth wave of biocatalysis is approaching. Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci. 376(2110), 20170063 (2017)
Buß, O., Buchholz, P.C., Gräff, M., Klausmann, P., Rudat, J., Pleiss, J.: The \(\omega \)-transaminase engineering database (oTAED): a navigation tool in protein sequence and structure space. Proteins Struct. Funct. Bioinf. 86(5), 566–580 (2018)
Hendrikse, N.M., Charpentier, G., Nordling, E., Syrén, P.O.: Ancestral diterpene cyclases show increased thermostability and substrate acceptance. FEBS J. 285(24), 4660–4673 (2018)
Lutz, S., Iamurri, S.M.: Protein engineering: past, present, and future. In: Bornscheuer, U.T., Höhne, M. (eds.) Protein Engineering. MMB, vol. 1685, pp. 1–12. Springer, New York (2018). https://doi.org/10.1007/978-1-4939-7366-8_1
Pellis, A., Cantone, S., Ebert, C., Gardossi, L.: Evolving biocatalysis to meet bioeconomy challenges and opportunities. New Biotechnol. 40, 154–169 (2018)
Suplatov, D., Voevodin, V., Švedas, V.: Robust enzyme design: bioinformatic tools for improved protein stability. Biotechnol. J. 10(3), 344–355 (2015)
Armougom, F., et al.: Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-coffee. Nucleic Acids Res. 34(suppl-2), W604–W608 (2006)
Krieger, E., Vriend, G.: YASARA view–molecular graphics for all devices–from smartphones to workstations. Bioinformatics 30(20), 2981–2982 (2014)
Kuipers, R.K., et al.: 3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins Struct. Funct. Bioinf. 78(9), 2101–2113 (2010)
Papadopoulos, J.S., Agarwala, R.: COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23(9), 1073–1079 (2007)
Pei, J., Kim, B., Grishin, N.: PROMALS3D: a tool for multiple sequence and structure alignment. Nucleic Acids Res. 36(7), 2295–2300 (2008)
Suplatov, D.A., Kopylov, K.E., Popova, N.N., Voevodin, V.V., Švedas, V.K.: Mustguseal: a server for multiple structure-guided sequence alignment of protein families. Bioinformatics 34(9), 1583–1585 (2018)
Pleiss, J.: Systematic analysis of large enzyme families: identification of specificity-and selectivity-determining hotspots. ChemCatChem 6(4), 944–950 (2014)
Sumbalova, L., Stourac, J., Martinek, T., Bednar, D., Damborsky, J.: HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 46(W1), W356–W362 (2018)
Suplatov, D., Kirilin, E., Arbatsky, M., Takhaveev, V., Švedas, V.: pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families. Nucleic Acids Res. 42(W1), W344–W349 (2014)
Suplatov, D., Kirilin, E., Švedas, V.: Bioinformatic analysis of protein families to select function-related variable positions. In: Understanding Enzymes, pp. 375–410. Pan Stanford (2016)
Suplatov, D., Kirilin, E., Takhaveev, V., Švedas, V.: Zebra: a web server for bioinformatic analysis of diverse protein families. J. Biomol. Struct. Dyn. 32(11), 1752–1758 (2014)
Suplatov, D., Shalaeva, D., Kirilin, E., Arzhanik, V., Švedas, V.: Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J. Biomol. Struct. Dyn. 32(1), 75–87 (2014)
Suplatov, D., Sharapova, Y., Timonina, D., Kopylov, K., Švedas, V.: The visualCMAT: a web-server to select and interpret correlated mutations/co-evolving residues in protein families. J. Bioinf. Comput. Biol. 16(02), 1840005 (2018)
Fesko, K., Suplatov, D., Švedas, V.: Bioinformatic analysis of the fold type I PLP-dependent enzymes reveals determinants of reaction specificity in l-threonine aldolase from Aeromonas jandaei. FEBS Open Bio 8(6), 1013–1028 (2018)
Genz, M., et al.: Engineering the Amine Transaminase from Vibrio fluvialis towards Branched-Chain substrates. ChemCatChem 8(20), 3199–3202 (2016)
Steffen-Munsberg, F., et al.: Bioinformatic analysis of a PLP-dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol. Adv. 33(5), 566–604 (2015)
Knight, A.M., et al.: Bioinformatic analysis of fold-type III PLP-dependent enzymes discovers multimeric racemases. Appl. Microbiol. Biotechnol. 101(4), 1499–1507 (2017)
Bezsudnova, E.Y., et al.: Biochemical and structural insights into PLP fold type IV transaminase from Thermobaculum terrenum. Biochimie 158, 130–138 (2019)
Bezsudnova, E.Y., Dibrova, D.V., Nikolaeva, A.Y., Rakitina, T.V., Popov, V.O.: Identification of branched-chain amino acid aminotransferases active towards (R)-(+)-1-phenylethylamine among PLP fold type IV transaminases. J. Biotechnol. 271, 26–28 (2018)
Bezsudnova, E.Y., Stekhanova, T.N., Suplatov, D.A., Mardanov, A.V., Ravin, N.V., Popov, V.O.: Experimental and computational studies on the unusual substrate specificity of branched-chain amino acid aminotransferase from Thermoproteus uzoniensis. Arch. Biochem. Biophys. 607, 27–36 (2016)
Jochens, H., Aerts, D., Bornscheuer, U.T.: Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Eng. Des. Sel. 23(12), 903–909 (2010)
Kourist, R., et al.: The \(\alpha \)/\(\beta \)-hydrolase fold 3DM database (ABHDB) as a tool for protein engineering. ChemBioChem 11(12), 1635–1643 (2010)
Pleiss, J., Fischer, M., Peiker, M., Thiele, C., Schmid, R.D.: Lipase engineering database: understanding and exploiting sequence-structure-function relationships. J. Mol. Catal. B Enzym. 10(5), 491–508 (2000)
Rauwerdink, A., Kazlauskas, R.J.: How the same core catalytic machinery catalyzes 17 different reactions: the serine-histidine-aspartate catalytic triad of \(\alpha \)/\(\beta \)-hydrolase fold enzymes. ACS Catal. 5(10), 6153–6176 (2015)
Suplatov, D., Besenmatter, W., Švedas, V., Svendsen, A.: Bioinformatic analysis of alpha/beta-hydrolase fold enzymes reveals subfamily-specific positions responsible for discrimination of amidase and lipase activities. Protein Eng. Des. Sel. 25(11), 689–697 (2012)
Widmann, M., Juhl, P.B., Pleiss, J.: Structural classification by the Lipase Engineering Database: a case study of Candida antarctica lipase A. BMC Genom. 11(1), 123 (2010)
Deaguero, A.L., Blum, J.K., Bommarius, A.S.: Biocatalytic synthesis of \(\beta \)-lactam antibiotics. Encycl. Ind. Biotechnol. Bioprocess Bioseparation Cell Technol., 1–18 (2009)
Suplatov, D., Panin, N., Kirilin, E., Shcherbakova, T., Kudryavtsev, P., Švedas, V.: Computational design of a pH stable enzyme: understanding molecular mechanism of penicillin acylase’s adaptation to alkaline conditions. PLoS ONE 9(6), e100643 (2014)
Grienke, U., et al.: Discovery of prenylated flavonoids with dual activity against influenza virus and Streptococcus pneumoniae. Sci. Rep. 6, 27156 (2016)
Sharapova, Y.A., Švedas, V.: Molecular modeling of the binding of the allosteric inhibitor optactin at a new binding site in neuraminidase A from Streptococcus pneumoniae. Mosc. Univ. Chem. Bull. 73(5), 205–211 (2018)
Sharapova, Y., Suplatov, D., Švedas, V.: Neuraminidase A from Streptococcus pneumoniae has a modular organization of catalytic and lectin domains separated by a flexible linker. FEBS J. 285(13), 2428–2445 (2018)
Walther, E., et al.: Dual acting neuraminidase inhibitors open new opportunities to disrupt the lethal synergism between Streptococcus pneumoniae and influenza virus. Frontiers Microbiol. 7, 357 (2016)
Xu, Z., et al.: Sequence diversity of NanA manifests in distinct enzyme kinetics and inhibitor susceptibility. Sci. Rep. 6, 25169 (2016)
Karasev, D., Veselovsky, A., Lagunin, A., Filimonov, D., Sobolev, B.: Determination of amino acid residues responsible for specific interaction of protein kinases with small molecule inhibitors. Mol. Biol. 52(3), 478–487 (2018)
Korbee, C.J., et al.: Combined chemical genetics and data-driven bioinformatics approach identifies receptor tyrosine kinase inhibitors as host-directed antimicrobials. Nat. Commun. 9(1), 358 (2018)
Song, J., et al.: PhosphoPredict: a bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection. Sci. Rep. 7(1), 6862 (2017)
Suplatov, D., Kopylov, K., Sharapova, Y., Švedas, V.: Human p38\(\alpha \) mitogen-activated protein kinase in the Asp168-Phe169-Gly170-in (DFG-in) state can bind allosteric inhibitor doramapimod. J. Biomol. Struct. Dyn. 37(8), 2049–2060 (2019)
Consortium, U.: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47(D1), D506–D515 (2018)
Burley, S.K., Berman, H.M., Kleywegt, G.J., Markley, J.L., Nakamura, H., Velankar, S.: Protein Data Bank (PDB): the single global macromolecular structure archive. In: Wlodawer, A., Dauter, Z., Jaskolski, M. (eds.) Protein Crystallography. MMB, vol. 1607, pp. 627–641. Springer, New York (2017). https://doi.org/10.1007/978-1-4939-7000-1_26
Sadovnichy, V., Tikhonravov, A., Voevodin, V., Opanasenko, V.I.: “Lomonosov”: supercomputing at Moscow State University. Contemporary High Performance Computing: From Petascale toward Exascale (Chapman & Hall/CRC Computational Science), pp. 283–307 (2013)
Krissinel, E., Henrick, K.: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. Sect. D: Biol. Crystallogr. 60(12), 2256–2268 (2004)
Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28(23), 3150–3152 (2012)
Suplatov, D., Popova, N., Zhumatiy, S., Voevodin, V., Švedas, V.: Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer. J. Bioinf. Comput. Biol. 14(02), 1641008 (2016)
Obe, R.O., Hsu, L.S.: PostgreSQL: Up and Running: A Practical Guide to the Advanced Open Source Database. O’Reilly Media Inc., Sebastopol (2017)
Shegay, M.V., Suplatov, D.A., Popova, N.N., Švedas, V.K., Voevodin, V.V.: parMATT: parallel multiple alignment of protein 3D-structures with translations and twists for distributed-memory systems. Bioinformatics 35(21), 4456–4458 (2019)
Menke, M., Berger, B., Cowen, L.: Matt: local flexibility aids protein multiple structure alignment. PLoS Comput. Biol. 4(1), e10 (2008)
Kalaimathy, S., Sowdhamini, R., Kanagarajadurai, K.: Critical assessment of structure-based sequence alignment methods at distant relationships. Briefings Bioinf. 12(2), 163–175 (2011)
Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2), 182–188 (2010)
Katoh, K., Standley, D.M.: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013)
Söding, J., Biegert, A., Lupas, A.N.: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33(suppl-2), W244–W248 (2005)
Fischer, J., Mayer, C.E., Söding, J.: Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24(5), 613–620 (2008)
Nobile, M.S., Cazzaniga, P., Tangherloni, A., Besozzi, D.: Graphics processing units in bioinformatics, computational biology and systems biology. Briefings Bioinf. 18(5), 870–885 (2016)
Vega-Rodríguez, M.A., Rubio-Largo, A.: Parallelism in computational biology: a view from diverse high-performance computing applications. Int. J. High Perform. Comput. Appl. 32(3), 317–320 (2018)
Götz, A.W., Williamson, M.J., Xu, D., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theor. Comput. 8(5), 1542–1555 (2012)
Salomon-Ferrer, R., Götz, A.W., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theor. Comput. 9(9), 3878–3888 (2013)
Sharapova, Y.A., Suplatov, D.A., Švedas, V.K.: Simulating the long-timescale structural behavior of bacterial and influenza neuraminidases with different HPC resources. Supercomput. Frontiers Innovations 5(3), 30–33 (2018)
Suplatov, D., Sharapova, Y., Popova, N., Kopylov, K., Voevodin, V., Švedas, V.: Molecular dynamics in the force field FF14SB in water TIP4P-EW, and in the force field FF15IPQ in water SPC/EB: a comparative analysis on GPU and CPU (in Russian). Bull. South Ural State University Ser. Comput. Math. Softw. Eng. 8(1), 71–88 (2019)
Imbernón, B., Prades, J., Giménez, D., Cecilia, J.M., Silla, F.: Enhancing large-scale docking simulation on heterogeneous systems: an MPI vs rCUDA study. Future Gener. Comput. Syst. 79, 26–37 (2018)
Prakhov, N.D., Chernorudskiy, A.L., Gainullin, M.R.: VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters. Bioinformatics 26(10), 1374–1375 (2010)
Suplatov, D., Timonina, D., Sharapova, Y., Švedas, V.: Yosshi: a web-server for disulfide engineering by bioinformatic analysis of diverse protein families. Nucleic Acids Res. 47(W1), W308–W314 (2019)
Acknowledgements
This work was supported by the Russian Foundation for Basic Research grant #18-29-13060 and carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University supported by the project RFMEFI62117X0011 [47].
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Suplatov, D. et al. (2019). High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2019. Communications in Computer and Information Science, vol 1129. Springer, Cham. https://doi.org/10.1007/978-3-030-36592-9_21
DOI: https://doi.org/10.1007/978-3-030-36592-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36591-2
Online ISBN: 978-3-030-36592-9
eBook Packages: Computer Science, Computer Science (R0)