MethEvo: an accurate evolutionary information-based methylation site predictor

Islam, Sadia; Mugdha, Shafayat Bin Shabbir; Dipta, Shubhashis Roy; Arafat, MD. Easin; Shatabda, Swakkhar; Alinejad-Rokny, Hamid; Dehzangi, Iman

doi:10.1007/s00521-022-07738-9

S.I.: Improving Healthcare outcomes using Multimedia Big Data Analytics
Published: 22 September 2022

Volume 36, pages 201–212, (2024)
Neural Computing and Applications

Sadia Islam¹,
Shafayat Bin Shabbir Mugdha¹,
Shubhashis Roy Dipta²,
MD. Easin Arafat³,
Swakkhar Shatabda¹,
Hamid Alinejad-Rokny⁴,⁵,⁶ (ORCID: orcid.org/0000-0002-2189-9153) &
Iman Dehzangi⁷,⁸

Abstract

Post-translational modifications (PTMs) play an essential role in biological and molecular mechanisms and are considered vital elements of cell signaling and networking pathways. Among the different PTMs, methylation is regarded as one of the most important. It plays a crucial role in maintaining the dynamic balance, stability, and remodeling of chromatin. Methylation can also lead to various abnormalities in cells and is responsible for many serious diseases. Methylation sites can be detected experimentally, for example with methylation-specific antibodies, mass spectrometry, or radioactive labeling. However, these approaches are time-consuming and costly, so there is a demand for fast and accurate computational alternatives. This study proposes a novel machine learning approach called MethEvo to predict methylation sites in proteins. To build the model, we extract features with an evolutionary-based bi-gram profile approach and use a support vector machine (SVM) as the classifier. Our results demonstrate that MethEvo achieves 98.7% accuracy, 98.8% specificity, 98.4% sensitivity, and a Matthews Correlation Coefficient (MCC) of 0.974. MethEvo and its source code are publicly available at: https://github.com/islamsadia88/MethEvo.
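To make the two technical ingredients of the abstract concrete, the following is a minimal sketch rather than the MethEvo implementation. It assumes the commonly used bi-gram definition over a position-specific scoring matrix (PSSM), in which entry (m, n) sums the product of column m at position i and column n at position i+1, and it computes the four reported measures from a confusion matrix. The function names `bigram_features` and `binary_metrics`, the toy two-column matrix, and the counts are hypothetical; the abstract does not specify the window size, normalization, or SVM settings, so none are shown.

```python
import math


def bigram_features(pssm):
    """Flatten the bi-gram matrix B[m][n] = sum_i pssm[i][m] * pssm[i+1][n].

    `pssm` is a list of rows, one per residue position (assumed layout).
    A real PSSM has 20 columns, which gives 400 features.
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive position pairs (i, i+1).
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += row[m] * next_row[n]
    return [value for line in bigram for value in line]


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, sensitivity, and MCC from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy two-column "PSSM" with three positions (illustrative values only).
toy_pssm = [[1, 0], [0, 1], [1, 0]]
print(bigram_features(toy_pssm))        # [0.0, 1.0, 1.0, 0.0]

# Hypothetical counts, not the paper's results.
print(binary_metrics(tp=9, tn=9, fp=1, fn=1))
```

The resulting fixed-length vector is what makes variable-length sequence windows usable by a standard classifier such as an SVM. MCC is reported alongside accuracy because it accounts for all four confusion-matrix cells and therefore stays informative when the positive and negative classes are imbalanced.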

Data availability

MethEvo and its source code are publicly available at: https://github.com/islamsadia88/MethEvo.

Funding

This research received no external funding. Rutgers, The State University of New Jersey, p321243, Abdollah Dehzangi.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
Sadia Islam, Shafayat Bin Shabbir Mugdha & Swakkhar Shatabda
Department of Computer Science, University of Maryland Baltimore County, Baltimore, MD, 21250, USA
Shubhashis Roy Dipta
Institute of Information Technology, Jahangirnagar University, Savar, Dhaka, Bangladesh
MD. Easin Arafat
BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW SYDNEY, Sydney, NSW, 2052, Australia
Hamid Alinejad-Rokny
UNSW Data Science Hub, The University of New South Wales (UNSW Sydney), Sydney, NSW, 2052, Australia
Hamid Alinejad-Rokny
AI-Enabled Processes (AIP) Research Centre, Macquarie University, Sydney, 2109, Australia
Hamid Alinejad-Rokny
Department of Computer Science, Rutgers University, Camden, NJ, 08102, USA
Iman Dehzangi
Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08102, USA
Iman Dehzangi

Contributions

Conceptualization, I.D., H.A.R. and S.S.; methodology, S.I., S.B.S.M., S.R.D. and MD.E.A.; software, S.I. and R.R.D.; validation, S.B.S.M., S.I. and S.S.; formal analysis, S.I.; investigation, I.D.; resources, S.S. and I.D.; data curation, I.D.; writing (original draft preparation), S.I., I.D. and S.S.; writing (review and editing), I.D. and H.A.R.; supervision, S.S., H.A.R. and I.D.

Corresponding authors

Correspondence to Swakkhar Shatabda, Hamid Alinejad-Rokny or Iman Dehzangi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Informed consent

Informed consent was obtained from all subjects involved in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Islam, S., Mugdha, S.B.S., Dipta, S.R. et al. MethEvo: an accurate evolutionary information-based methylation site predictor. Neural Comput & Applic 36, 201–212 (2024). https://doi.org/10.1007/s00521-022-07738-9

Received: 21 May 2022
Accepted: 17 August 2022
Published: 22 September 2022
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00521-022-07738-9

Part of a collection:

S.I.: Improving Healthcare outcomes using Multimedia Big Data Analytics (vol 36, issue 1)