Abstract
Lysine malonylation is a newly discovered type of protein post-translational modification that plays an essential role in many biological activities. Knowledge of malonylation sites can guide work on a large number of biological problems, such as disease diagnosis and drug discovery. Several experimental approaches already exist for identifying modification sites, but they are relatively expensive. In this work, we propose three novel machine learning models and utilize several effective feature description methods. The models are trained using a cross-validation method named Split to Equal Validation (SEV). The experiments show that our models outperform the other methods considerably.
Acknowledgments
This work was supported by grants from the National Science Foundation of China (Nos. 61902337, 61702445), the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20120072110040), and the Shandong Provincial Natural Science Foundation, China (No. ZR2018LF005).
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Author Contribution Statement
W.B. conceived the method. Z.L. designed the method. B.Y. designed the website for this algorithm. Y.Z. conducted the experiments, and W.B. wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, J., Bao, W., Cao, Y., Chen, Y. (2020). Classification of Protein Modification Sites with Machine Learning. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2020. Lecture Notes in Computer Science, vol 12464. Springer, Cham. https://doi.org/10.1007/978-3-030-60802-6_38
DOI: https://doi.org/10.1007/978-3-030-60802-6_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60801-9
Online ISBN: 978-3-030-60802-6
eBook Packages: Computer Science, Computer Science (R0)