Mal_PCASVM: Malonylation Residues Classification with Principal Component Analysis Support Vector Machine

Meng, Tong; Chen, Yuehui; Bao, Wenzheng; Cao, Yi

doi:10.1007/978-3-030-84529-2_57

Conference paper
First Online: 09 August 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12837)

Abstract

Post-translational modification (PTM) is a significant biological process with a major impact on protein function in both eukaryotic and prokaryotic cells. Lysine malonylation is a newly discovered PTM that is associated with many diseases, such as type 2 diabetes and several types of cancer. Compared with experimental identification of malonylation sites, computational methods save time and reduce cost. In this paper, we combine principal component analysis (PCA) with a support vector machine (SVM) to propose a new computational model, Mal-PCASVM, for malonylation site prediction. First, one-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs (CKSAAP) are used to extract sequence features. Second, we preprocess the data, select the best feature subset by PCA, and predict malonylation sites with an SVM. In five-fold cross-validation, Mal-PCASVM achieves better prediction performance than other methods. In 10-fold cross-validation on independent data sets, the AUC (area under the receiver operating characteristic curve) reaches 96.39%. Mal-PCASVM is thus a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms the existing prediction tools reported in the literature and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins.
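
Two of the sequence encodings named in the abstract, one-hot encoding and the composition of k-spaced amino acid pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the peptide window, the maximum gap `k_max`, the treatment of non-standard symbols, and the function names `one_hot` and `cksaap` are all assumptions chosen for illustration. The physicochemical features, the PCA step, and the SVM classifier are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def one_hot(window):
    """Encode each residue as a 20-dimensional indicator vector.

    Symbols outside AMINO_ACIDS (an assumed padding convention)
    become all-zero vectors.
    """
    features = []
    for residue in window:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count the pairs (window[i], window[i + k + 1]) and
    divide by the number of pair positions, len(window) - k - 1.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs with non-standard symbols
                counts[pair] += 1
        denom = max(n_positions, 1)  # guard against very short windows
        features.extend(counts[p] / denom for p in PAIRS)
    return features


# Hypothetical peptide window; real windows would be centred on a lysine.
window = "AKLMKAKLVK"
x = one_hot(window) + cksaap(window, k_max=2)
```

With a window of length 10 and `k_max=2`, the concatenated vector has 10 × 20 one-hot entries plus 3 × 400 pair frequencies. A vector of this size is the kind of input that a dimensionality-reduction step such as PCA would compress before classification.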

Acknowledgments

This work was supported in part by the University Innovation Team Project of Jinan (2019GXRC015), the Key Science & Technology Innovation Project of Shandong Province (2019JZZY010324), the Natural Science Foundation of China (No. 61902337), the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), and Young Talents of Science and Technology in Jiangsu.

Author information

Authors and Affiliations

School of Information Science and Engineering, University of Jinan, Jinan, China
Tong Meng
School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
Yuehui Chen
School of Information Engineering (School of Big Data), Xuzhou University of Technology, Xuzhou, China
Wenzheng Bao
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, School of Information Science and Engineering, University of Jinan, Jinan, China
Yi Cao

Corresponding author

Correspondence to Wenzheng Bao.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Shenzhen University, Shenzhen, China
Jianqiang Li
Far Eastern Branch of the Russian Academy of Sciences, Vladivostok, Russia
Valeriya Gribova
Department of Computer Science, Liverpool John Moores University, Liverpool, UK
Abir Hussain

Supplementary Information

Supplementary Table 1. The performance of the proposed method using different CKSAAP features.

Supplementary Table 2. The performance of the proposed method using different CKSAAP combinations.

Supplementary Table 3. The performance of 5-fold cross-validation without PCA.

Supplementary Table 4. The performance comparison of different single features.

Supplementary Table 5. The performance comparison of different feature combinations.

About this paper

Cite this paper

Meng, T., Chen, Y., Bao, W., Cao, Y. (2021). Mal_PCASVM: Malonylation Residues Classification with Principal Component Analysis Support Vector Machine. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837. Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_57

DOI: https://doi.org/10.1007/978-3-030-84529-2_57
Published: 09 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84528-5
Online ISBN: 978-3-030-84529-2
eBook Packages: Computer Science, Computer Science (R0)
