Abstract
Enzyme functional annotation has been a challenging problem in bioinformatics for many years, and deep learning has recently emerged as an efficient alternative. Here, we analyse the use of recurrent neural networks trained on sequence data and enhanced with attention mechanisms. We assess the consequences of parameter choices that are often not mentioned in previous studies, such as the sequence length and the type of truncation. We also compare different amino acid encoding schemes for describing the protein: one-hot, z-scales and BLOSUM62 encodings, as well as embedding layers. Lastly, we try to understand what the network is learning and inferring. Our results show that, for enzyme classification, networks built with bidirectional recurrent layers and attention lead to better results. In addition, simpler encoding schemes (e.g. one-hot) lead to higher performance. Using attention and embedding layers, we demonstrate that the model is capable of learning biologically meaningful representations.
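To make the preprocessing choices named in the abstract concrete, the following is a minimal sketch of one-hot encoding combined with a fixed sequence length and a choice of truncation side. It is an illustration under stated assumptions, not the authors' pipeline: the 20-letter alphabet, the function names `fix_length` and `one_hot_encode`, the zero-padding at the end of short sequences, and the all-zero row for unknown residues are all hypothetical choices made here.

```python
# Illustrative sketch only; names and conventions are assumptions, not the paper's code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def fix_length(seq, max_len, truncating="post"):
    """Cut seq to at most max_len residues.

    'post' drops residues from the end, 'pre' drops them from the start.
    """
    if truncating not in ("pre", "post"):
        raise ValueError("truncating must be 'pre' or 'post'")
    if len(seq) <= max_len:
        return seq
    return seq[:max_len] if truncating == "post" else seq[-max_len:]


def one_hot_encode(seq, max_len, truncating="post"):
    """Return a max_len x 20 matrix as a list of lists.

    Short sequences are zero-padded at the end; residues outside the
    assumed alphabet are left as all-zero rows.
    """
    seq = fix_length(seq.upper(), max_len, truncating)
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


# Usage: the same sequence keeps different residues depending on the truncation side.
post = one_hot_encode("MKVLA", max_len=4, truncating="post")  # encodes M, K, V, L
pre = one_hot_encode("MKVLA", max_len=4, truncating="pre")    # encodes K, V, L, A
```

With a maximum length shorter than the protein, the two truncation modes feed different residues to the network, which is one reason the choice can affect results.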
Acknowledgements
This study was supported by the European Regional Development Fund under the scope of Norte2020, through the project DeepBio (ref. NORTE-01-0247-FEDER-039831). This study was also supported by the PhD scholarship with reference 2020.07867.BD, granted by the Portuguese Foundation for Science and Technology and the European Social Fund under the scope of Norte2020.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sequeira, A.M., Rocha, M. (2022). Recurrent Deep Neural Networks for Enzyme Functional Annotation. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). PACBB 2021. Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_7
DOI: https://doi.org/10.1007/978-3-030-86258-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86257-2
Online ISBN: 978-3-030-86258-9
eBook Packages: Intelligent Technologies and Robotics (R0)