Abstract
Protein succinylation is a novel type of post-translational modification reported within the past decade. Experiments have verified that it plays an important role in biological structure and function. However, experimental identification of succinylation sites is time-consuming and laborious, and traditional techniques cannot keep pace with the rapid growth of sequence data sets. We therefore propose a new computational method, SuccSPred, that predicts succinylation sites in a given protein sequence by fusing multiple feature representations with a feature ranking method. SuccSPred follows a two-step strategy. First, linear discriminant analysis is used to reduce the feature dimensions and prevent overfitting. Second, the predictor is built by combining incremental feature selection with classifiers to identify succinylation sites. After the classifiers were compared in ten-fold cross-validation experiments, the selected model achieved a promising improvement. Comparative experiments showed that SuccSPred significantly outperformed previous tools and has a strong ability to identify succinylation sites in given proteins.
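The two-step idea in the abstract (rank or reduce features with a discriminant criterion, then grow the feature set incrementally and keep the size that cross-validates best) can be illustrated with a minimal sketch. This is not the SuccSPred implementation. It makes several assumptions: a per-feature Fisher discriminant ratio stands in for linear discriminant analysis, a nearest-centroid rule stands in for the classifiers the paper compares, the data are synthetic, and every function name here is hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Univariate Fisher ratio (assumed stand-in for LDA):
    squared gap between class means over the sum of class variances."""
    pos = [v for v, t in zip(column, labels) if t == 1]
    neg = [v for v, t in zip(column, labels) if t == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / (spread + 1e-12)


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def centroid_predict(train_X, train_y, test_X):
    """Nearest-centroid classifier (assumed stand-in for the paper's classifiers)."""
    dims = len(train_X[0])
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [mean(r[d] for r in rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min((0, 1), key=lambda c: sq_dist(row, centroids[c])) for row in test_X]


def cv_accuracy(X, y, folds=10):
    """Pooled accuracy over `folds` interleaved cross-validation folds."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        preds = centroid_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)


def incremental_feature_selection(X, y, folds=10):
    """Add ranked features one at a time; keep the prefix with the best CV accuracy."""
    ranked = rank_features(X, y)
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        subset = [[row[j] for j in ranked[:k]] for row in X]
        acc = cv_accuracy(subset, y, folds)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


def make_toy_data(n=200, n_features=6, seed=0):
    """Synthetic two-class data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 0 if i < n // 2 else 1
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] += 1.5 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, accuracy = incremental_feature_selection(X, y)
print("selected feature indices:", selected)
print("ten-fold CV accuracy:", round(accuracy, 3))
```

For brevity the sketch ranks features once on the full data set. Ranking inside each training fold would avoid an optimistic bias in the cross-validated estimate.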
Acknowledgment
This research was supported in part by the National Natural Science Foundation of China (No. 61702146, 61841104), the National Key Research and Development Program of China (No. 2019YFC0118404), the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China (No. U1909210, U20A20386), the Zhejiang Provincial Natural Science Foundation of China (No. LY21F020017), and the Zhejiang Provincial Science and Technology Program in China (No. 2021C01108).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ge, R. et al. (2021). SuccSPred: Succinylation Sites Prediction Using Fused Feature Representation and Ranking Method. In: Wei, Y., Li, M., Skums, P., Cai, Z. (eds) Bioinformatics Research and Applications. ISBRA 2021. Lecture Notes in Computer Science, vol 13064. Springer, Cham. https://doi.org/10.1007/978-3-030-91415-8_17
DOI: https://doi.org/10.1007/978-3-030-91415-8_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91414-1
Online ISBN: 978-3-030-91415-8
eBook Packages: Computer Science, Computer Science (R0)