A hybrid deep learning model for classification of plant transcription factor proteins

Öncül, Ali Burak; Çelik, Yüksel

doi:10.1007/s11760-022-02419-5

Original Paper
Published: 01 December 2022

Signal, Image and Video Processing, volume 17, pages 2055–2061 (2023)

Abstract

Understanding amino acid sequences, protein structure, and the relationships between amino acids remains a large and challenging problem in biology. Although bioinformatics studies have made progress, determining the relationships between amino acids and the type of protein they form is still not fully solved. This limits the use of some of the available protein sequences. This study proposes a hybrid deep learning model that classifies amino acid sequences of unknown species using the amino acid sequences in the plant transcription factor database. The model achieved a 98.23% success rate in the tests performed. With the hybrid model, transcription factor proteins in the plant kingdom can be classified easily. Because the model is hybrid, its layers are lighter; training time decreased and the success rate increased. When compared with a bidirectional LSTM developed on a dataset similar to ours and with ProtCNN, a ResNet-based CNN model, the proposed model was more successful. In addition, we found that the hybrid model, which uses vectors created with Word2Vec, is more successful than other LSTM- or CNN-based models. The model can be used to classify other proteins, especially transcription factor proteins, enabling species identification to be carried out efficiently and successfully. The use of such a triplet hybrid structure to classify plant transcription factors is a novel contribution to the literature.
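
The abstract states only that sequences are turned into vectors with Word2Vec before classification; it does not describe the tokenization. As a minimal sketch, assuming overlapping 3-residue "words" (a common choice for protein embeddings, not confirmed here as the authors' setting), the preprocessing that would feed such an embedding might look like the following. The function names `to_kmers` and `build_vocab` and the toy sequences are hypothetical.

```python
from collections import Counter


def to_kmers(sequence, k=3):
    """Split an amino acid sequence into overlapping k-residue 'words'.

    k=3 is an assumption for illustration, not a value taken from the paper.
    """
    sequence = sequence.strip().upper()
    if len(sequence) < k:
        return []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(tokenized_sequences):
    """Map each distinct k-mer to an integer id, most frequent first.

    Ids start at 1 so that 0 stays free for padding.
    """
    counts = Counter(tok for seq in tokenized_sequences for tok in seq)
    # Sort by descending count, then alphabetically, for a deterministic order.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered, start=1)}


# Hypothetical toy sequences in one-letter amino acid notation.
sequences = ["MKRLLEA", "MKRAA"]
tokenized = [to_kmers(s) for s in sequences]
vocab = build_vocab(tokenized)
encoded = [[vocab[tok] for tok in seq] for seq in tokenized]
```

The token lists in `tokenized` are the kind of input a Word2Vec implementation would treat as sentences, and the integer ids in `encoded` are what an embedding layer in front of CNN or LSTM layers would typically consume.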

Data availability

After the article is published, the data set can be accessed by e-mailing the corresponding author.

Funding

Not applicable.

Author information

Yüksel Çelik contributed equally to this work.

Authors and Affiliations

Department of Computer Engineering, Faculty of Engineering and Architecture, Kastamonu University, 37150, Kastamonu, Turkey
Ali Burak Öncül
Department of Computer Engineering, Faculty of Engineering, Karabük University, 78050, Karabük, Turkey
Ali Burak Öncül & Yüksel Çelik

Contributions

All authors contributed equally to the article.

Corresponding author

Correspondence to Ali Burak Öncül.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests related to this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Öncül, A.B., Çelik, Y. A hybrid deep learning model for classification of plant transcription factor proteins. SIViP 17, 2055–2061 (2023). https://doi.org/10.1007/s11760-022-02419-5

Received: 23 April 2022
Revised: 15 November 2022
Accepted: 21 November 2022
Published: 01 December 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11760-022-02419-5
