Combining feature engineering and feature selection to improve the prediction of methionine oxidation sites in proteins

  • S.I. : IWANN2017: Learning algorithms with real world applications
Neural Computing and Applications

Abstract

Methionine is a proteinogenic amino acid that can be post-translationally modified. It is now well established that reactive oxygen species can oxidise methionine residues within living cells. For a long time, this modification was thought to be merely inevitable damage derived from aerobic metabolism. However, several authors have begun to contemplate a possible role for methionine oxidation in cell signalling. In recent years, a number of proteomic studies have been carried out with the purpose of detecting proteins containing oxidised methionines. Although these proteomic studies make it possible to pinpoint the methionines being oxidised, they are also arduous, expensive and time-consuming. For these reasons, computational approaches aimed at predicting methionine oxidation sites in proteins are an appealing alternative. In the current work, we address methionine oxidation prediction by combining computational intelligence methods with feature engineering and feature selection techniques to improve the efficacy of several machine learning models while reducing the number of input features needed to reach high accuracy rates. We compare random forests, support vector machines, neural networks and flexible discriminant analysis models. Random forests give the best AUC (\(0.8124 \pm 0.0334\)) and accuracy (\(0.7590 \pm 0.0551\)) using a reduced set of only 16 features. These results surpass the outcomes of previous works. In addition, we present an end-user script that takes a protein ID as input and returns a list with the predicted oxidation state of all the methionine residues found in the analysed protein. Finally, to illustrate the applicability of this tool, we have selected the human \(\alpha 1\)-antitrypsin protein as a case study. This protein was selected because it was not among the proteins used to build the predictive models, yet it has been well characterised experimentally in terms of methionine oxidation. The prediction returned by our script fully matches the empirical evidence. Out of the nine methionine residues found in this protein, our model predicts the oxidation of only two, M351 and M358, which have been reported, on the basis of mass spectrometry analyses, to be particularly susceptible to oxidation.
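As an illustration of the kind of per-residue output the end-user script is described as returning, the following Python sketch lists every methionine in a sequence and passes a sequence window around it to a classifier. It is not the authors' script: the function names, the window-based representation, the flank size and the placeholder classifier are all assumptions made for this example, and the sequence is passed in directly rather than being looked up from a protein ID.

```python
from typing import Callable, List, Tuple


def methionine_sites(sequence: str) -> List[int]:
    """Return the 1-based positions of every methionine (M) in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence.upper()) if aa == "M"]


def window(sequence: str, position: int, flank: int = 5, pad: str = "X") -> str:
    """Return the residues around a 1-based position, padded at the termini.

    The window-based representation is an assumption of this sketch,
    not a description of the features used in the paper.
    """
    padded = pad * flank + sequence.upper() + pad * flank
    centre = position - 1 + flank  # index of the residue in the padded string
    return padded[centre - flank: centre + flank + 1]


def predict_sites(
    sequence: str,
    classify: Callable[[str], bool],
    flank: int = 5,
) -> List[Tuple[int, bool]]:
    """Apply a classifier to every methionine and return (position, oxidised)."""
    return [
        (pos, classify(window(sequence, pos, flank)))
        for pos in methionine_sites(sequence)
    ]


def placeholder_classifier(win: str) -> bool:
    """Hypothetical stand-in for a trained model; it has no biological meaning."""
    return win.count("M") > 1


if __name__ == "__main__":
    toy_sequence = "MKTAMLLM"  # made-up sequence, not a real protein
    for pos, oxidised in predict_sites(toy_sequence, placeholder_classifier, flank=2):
        print(f"M{pos}: {'oxidised' if oxidised else 'not oxidised'}")
```

In practice the placeholder would be replaced by a trained model such as the random forest discussed above, and the window by whatever feature vector that model expects.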

Acknowledgements

This work was partially supported by the project TIN2017-88728-C2-1-R, MINECO, Plan Nacional de I+D+I.

Author information

Corresponding author

Correspondence to Francisco J. Veredas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 343 KB)

About this article

Cite this article

Veredas, F.J., Urda, D., Subirats, J.L. et al. Combining feature engineering and feature selection to improve the prediction of methionine oxidation sites in proteins. Neural Comput & Applic 32, 323–334 (2020). https://doi.org/10.1007/s00521-018-3655-2

  • DOI: https://doi.org/10.1007/s00521-018-3655-2
