A novel improved prediction of protein structural class using deep recurrent neural network

  • Special Issue
Evolutionary Intelligence

Abstract

For the last few decades, the amino acid sequence of a protein has been used to predict its secondary structure. Recent methods have applied high-dimensional features derived from natural language processing techniques in machine learning models. The performance of machine learning models is strongly affected by dataset size and feature dimensionality, so developing a generic model that can be trained to perform well on both small and large datasets within a low-dimensional framework remains a major challenge. In the present research, we propose a low-dimensional representation suitable for both small and large datasets. Amino acid sequences are represented in a hybrid space of Atchley’s factors II, IV and V, the electron-ion interaction potential (EIIP) and SkipGram-based word2vec embeddings. The Stockwell transform (S-transform) is then applied to this representation to preserve features in both the time and frequency domains. Finally, a deep gated recurrent network trained with dropout, a categorical cross-entropy loss and the Adam optimizer is used for classification. The proposed method achieves better prediction accuracy on both small (204, 277 and 498) and large (PDB25, Protein 640 and FC699) benchmark datasets of low sequence similarity (25–40%). The classification accuracies obtained on the PDB25, 640, FC699, 498, 277 and 204 datasets are 84.2%, 94.31%, 93.1%, 95.9%, 94.5% and 85.36%, respectively. The major contributions of this research are twofold. First, for the first time, we evaluate protein secondary structural class prediction in a very low-dimensional (18-D) feature space using a novel feature representation method. Second, also for the first time, we examine the behaviour of deep networks on low-dimensional, small datasets.
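
The hybrid representation maps every residue to a short numeric vector by concatenating several per-residue descriptors. The sketch below shows only that concatenation step, assuming lookup tables for Atchley’s factors II, IV and V, the EIIP and a word2vec vector have already been prepared. The function name `encode_sequence` is hypothetical, the per-residue word2vec lookup is an illustrative simplification, and the toy tables use placeholder numbers, not the published Atchley or EIIP values. The abstract does not say how the 18 dimensions are split between the descriptors, so the widths here are illustrative only.

```python
import numpy as np

def encode_sequence(seq, tables):
    """Map a protein sequence to an (L, D) matrix.

    `tables` is a list of dicts, each mapping a one-letter residue code to a
    scalar or 1-D vector (e.g. Atchley factors, EIIP, a word2vec vector).
    Each row of the result is the concatenation of those values for one residue.
    An unknown residue code raises KeyError.
    """
    rows = []
    for residue in seq.upper():
        parts = [np.atleast_1d(np.asarray(t[residue], dtype=float)) for t in tables]
        rows.append(np.concatenate(parts))
    return np.vstack(rows)

# Toy tables with PLACEHOLDER numbers (not the published values).
atchley_2_4_5 = {"A": [0.1, 0.2, 0.3], "C": [0.4, 0.5, 0.6], "G": [0.7, 0.8, 0.9]}
eiip = {"A": 0.01, "C": 0.02, "G": 0.03}
w2v = {"A": [1.0, 0.0], "C": [0.0, 1.0], "G": [0.5, 0.5]}

X = encode_sequence("ACGA", [atchley_2_4_5, eiip, w2v])
print(X.shape)  # (4, 6)
```

Passing the tables as arguments keeps the encoder independent of any particular descriptor set, so the real factor values can be dropped in without changing the code.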

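SkipGram-based word2vec learns an embedding by predicting the context tokens around each centre token. The sketch below only shows how (centre, context) training pairs could be extracted from a protein sequence. Splitting the sequence into overlapping 3-mers is an assumption: the abstract does not state the tokenisation, and ProtVec-style embeddings (Asgari and Mofrad, 2015) instead use three non-overlapping reading frames. The names `kmers`, `skipgram_pairs`, `k` and `window` are hypothetical, and training the embedding itself is not shown.

```python
def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers ("words")."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every token within `window` positions."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

words = kmers("MKVLA", k=3)               # ['MKV', 'KVL', 'VLA']
pairs = skipgram_pairs(words, window=1)
print(pairs)  # [('MKV', 'KVL'), ('KVL', 'MKV'), ('KVL', 'VLA'), ('VLA', 'KVL')]
```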

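The Stockwell transform localises the spectrum of a signal with a Gaussian window whose width scales with frequency (Stockwell, Mansinha and Lowe, 1996). A common FFT-based discrete implementation is sketched below for a single 1-D signal, for example one column of the encoded sequence treated as a signal along the residue index. How the paper reduces the resulting time-frequency matrix to its final features is not described in the abstract, so the sketch stops at the transform; the function name `stockwell` is hypothetical.

```python
import numpy as np

def stockwell(h):
    """Discrete S-transform of a real 1-D signal via the FFT.

    Returns a complex array of shape (N//2 + 1, N): row n is frequency
    voice n, column j is position j along the sequence.
    """
    h = np.asarray(h, dtype=float)
    N = h.size
    H = np.fft.fft(h)
    m = np.fft.fftfreq(N) * N             # integer frequency offsets in FFT order
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = h.mean()                    # zero-frequency voice is the signal mean
    for n in range(1, N // 2 + 1):
        gauss = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)  # Gaussian window in frequency
        S[n, :] = np.fft.ifft(np.roll(H, -n) * gauss)         # H[m + n] * G(m, n), back to positions
    return S

# Example: a signal whose dominant frequency changes halfway along the sequence.
t = np.arange(32)
x = np.concatenate([np.sin(2 * np.pi * 2 * t / 32), np.sin(2 * np.pi * 8 * t / 32)])
S = stockwell(x)
print(S.shape)  # (33, 64)
```

Summing each row of `S` over positions returns the corresponding unnormalised FFT coefficient of the signal, which is a quick sanity check that the transform is implemented correctly and remains invertible.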
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
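
The classifier is a deep gated recurrent network. To illustrate what a single GRU layer computes, the NumPy sketch below runs the standard GRU update (as written by Chung et al., 2014) over an (L, D) feature matrix, applies a softmax output layer to the final hidden state and evaluates the categorical cross-entropy against a one-hot label. The layer sizes, random weights, use of the last hidden state and the four output classes (assuming the all-α, all-β, α/β and α+β classes usually used with these benchmarks) are assumptions; dropout, layer stacking and training are omitted, and in practice a deep-learning framework would be used instead.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())  # subtract the max for numerical stability
    return e / e.sum()

def init_gru(d_in, d_hid, rng):
    """Small random weights for the update (z), reset (r) and candidate (h) parts."""
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0.0, 0.1, (d_hid, d_in))
        p["U" + g] = rng.normal(0.0, 0.1, (d_hid, d_hid))
        p["b" + g] = np.zeros(d_hid)
    return p

def gru_forward(X, p):
    """Run one GRU layer over the rows of X (shape (L, d_in)); return the final hidden state."""
    h = np.zeros(p["bz"].shape[0])
    for x in X:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])              # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])              # reset gate
        h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
        h = (1.0 - z) * h + z * h_tilde                               # interpolate old and new state
    return h

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    return -np.sum(one_hot * np.log(probs + eps))

# Hypothetical sizes: 50 residues, 18-D features, 32 hidden units, 4 classes.
rng = np.random.default_rng(0)
L, D, H, C = 50, 18, 32, 4
X = rng.normal(size=(L, D))
params = init_gru(D, H, rng)
V = rng.normal(0.0, 0.1, (C, H))                      # output (dense) layer weights
h_last = gru_forward(X, params)
probs = softmax(V @ h_last)
loss = categorical_cross_entropy(probs, np.eye(C)[2])  # true class index 2
print(probs, loss)
```

Note that frameworks differ in which side of the interpolation the update gate weights; Keras, for instance, uses `z * h + (1 - z) * h_tilde`, which is equivalent up to relabelling the gate.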

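Training uses the Adam optimiser (Kingma and Ba, 2014). A minimal NumPy version of one Adam update is sketched below with the default hyper-parameters from that paper (β1 = 0.9, β2 = 0.999, ε = 1e-8), applied to a toy quadratic objective rather than to the GRU loss. The function `adam_step` and the toy objective are hypothetical, and the learning rate and other settings the authors actually used are not given in the abstract.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment (uncentred variance) estimate
    m_hat = m / (1 - beta1 ** t)               # bias corrections for zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = ||theta - target||^2 with gradient 2 * (theta - target).
target = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 5001):
    grad = 2 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(theta)
```

Because of the bias correction, the very first step moves each parameter by roughly `lr` in the direction opposite to its gradient, regardless of the gradient's scale.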

Author information

Contributions

Both authors contributed equally. Both authors read and approved the final manuscript.

Corresponding authors

Correspondence to Bishnupriya Panda or Babita Majhi.

Ethics declarations

Conflict of interest

We declare that we have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary material 1 (DOCX 31 KB)

About this article

Cite this article

Panda, B., Majhi, B. A novel improved prediction of protein structural class using deep recurrent neural network. Evol. Intel. 14, 253–260 (2021). https://doi.org/10.1007/s12065-018-0171-3

  • DOI: https://doi.org/10.1007/s12065-018-0171-3
