Abstract
For the last few decades, the amino acid sequence of a protein has been used to predict its secondary structure. Recent methods apply high-dimensional, natural-language-based features in machine learning models. The performance of machine learning models is strongly affected by data size and data dimensionality, and it is a major challenge to develop a generic model that can be trained to perform well on both small and large datasets in a low-dimensional framework. In the present research, we propose a low-dimensional representation for both small and large datasets. A hybrid space of Atchley's factors II, IV and V, the electron-ion interaction potential (EIIP) and SkipGram-based word2vec embeddings is employed to represent amino acid sequences. The Stockwell transform is then applied to this representation to preserve features in both the time and frequency domains. Finally, a deep gated recurrent network with dropout, categorical cross-entropy loss and Adam optimization is used for classification. The proposed method yields improved prediction accuracies on both small (204, 277 and 498) and large (PDB25, Protein 640 and FC699) benchmark datasets of low sequence similarity (25–40%). The classification accuracies obtained for the PDB25, 640, FC699, 498, 277 and 204 datasets are 84.2%, 94.31%, 93.1%, 95.9%, 94.5% and 85.36%, respectively. The major contributions of this research are twofold. First, we verify, for the first time, protein structural class prediction in a very low-dimensional (18-D) feature space with a novel feature representation method. Second, we verify, for the first time, the behaviour of deep networks on low-dimensional, small datasets.
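The processing chain summarised above (numeric encoding of residues, Stockwell transform, gated recurrent classifier with a softmax output scored by categorical cross-entropy) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation and rests on several assumptions: it uses a single EIIP channel instead of the 18-D hybrid of Atchley factors, EIIP and word2vec; feeding the magnitudes of the first eight S-transform voices at each position to the GRU, the GRU gate convention, the layer sizes and the random, untrained weights are all illustrative choices; dropout and Adam training are omitted. The helper names (`encode_eiip`, `stockwell`, `init_params`, `gru_last_state`, `predict_proba`, `categorical_cross_entropy`) are hypothetical, and the EIIP values are reproduced from commonly used tables and should be checked against Veljkovic et al. before real use.

```python
import numpy as np

# EIIP value per amino acid, as commonly tabulated from Veljkovic et al. (1985).
# Assumption: verify these values against the original source before real use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}
# The four usual structural classes (hypothetical label order).
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def encode_eiip(seq):
    """Map a protein sequence to a 1-D numeric signal (one EIIP value per residue)."""
    return np.array([EIIP[aa] for aa in seq.upper()], dtype=float)


def stockwell(h):
    """Discrete S-transform computed with the FFT. Returns an (N//2 + 1, N) complex
    matrix: rows are frequency voices, columns are sequence positions."""
    h = np.asarray(h, dtype=float)
    N = h.size
    H = np.fft.fft(h)
    m = np.fft.fftfreq(N) * N                          # signed offsets 0..N/2-1, -N/2..-1
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = h.mean()                                 # zero-frequency voice is the mean
    for n in range(1, N // 2 + 1):
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)  # Gaussian window in frequency
        S[n, :] = np.fft.ifft(np.roll(H, -n) * gauss)  # H[m + n] * G(m, n), then invert
    return S


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_params(d_in, d_h, n_classes, rng):
    """Random, untrained GRU and softmax-layer weights (illustration only)."""
    def w(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))
    gru = []
    for _ in range(3):                                 # update gate, reset gate, candidate
        gru += [w(d_h, d_in), w(d_h, d_h), np.zeros(d_h)]
    return gru, w(n_classes, d_h), np.zeros(n_classes)


def gru_last_state(x_seq, gru):
    """Run one GRU layer over x_seq of shape (T, d_in); return the final hidden state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = gru
    h = np.zeros(Uz.shape[0])
    for x in x_seq:
        z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
        h = (1.0 - z) * h + z * h_cand
    return h


def predict_proba(seq, gru, W_out, b_out, k_voices=8):
    """EIIP signal -> |S-transform| (first k voices per position) -> GRU -> softmax."""
    S = np.abs(stockwell(encode_eiip(seq)))
    if S.shape[0] < k_voices:
        raise ValueError("sequence too short for the requested number of voices")
    x_seq = S[:k_voices, :].T                          # shape (T, k_voices)
    logits = W_out @ gru_last_state(x_seq, gru) + b_out
    e = np.exp(logits - logits.max())                  # numerically stable softmax
    return e / e.sum()


def categorical_cross_entropy(p, y_onehot, eps=1e-12):
    """Loss of predicted probabilities p against a one-hot target."""
    return -float(np.sum(y_onehot * np.log(p + eps)))


rng = np.random.default_rng(0)
gru, W_out, b_out = init_params(d_in=8, d_h=16, n_classes=len(CLASSES), rng=rng)
p = predict_proba("MKTAYIAKQRQISFVKSHFSRQ", gru, W_out, b_out)
print(dict(zip(CLASSES, p.round(3))))
print(categorical_cross_entropy(p, np.eye(len(CLASSES))[0]))
```

With untrained weights the printed probabilities are close to uniform; in a real setting the recurrent layers would be stacked, regularised with dropout and trained with the Adam optimiser on labelled benchmark sequences, normally through a deep learning framework rather than hand-written NumPy.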
Author information
Contributions
Both authors contributed equally to this work. Both authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
We declare that we have no competing interests or conflicts of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Panda, B., Majhi, B. A novel improved prediction of protein structural class using deep recurrent neural network. Evol. Intel. 14, 253–260 (2021). https://doi.org/10.1007/s12065-018-0171-3
DOI: https://doi.org/10.1007/s12065-018-0171-3