Abstract
The Golgi apparatus is an important eukaryotic organelle. It plays a key role in protein synthesis in eukaryotic cells, and its dysfunction can lead to various genetic and neurodegenerative diseases. Identifying the types of Golgi-resident proteins is therefore a key problem in developing drugs to treat these diseases. Previous methods have mostly extracted features from the physicochemical properties of Golgi proteins, but accurate identification of sub-Golgi protein types remains a challenge. In this article, we use the TAPE-BERT model to extract features from Golgi proteins. Because the Golgi dataset is imbalanced, we apply the SMOTE oversampling method to create a balanced dataset. In addition, we select the 300 most important feature dimensions for identifying Golgi protein types. In 10-fold cross-validation and on an independent test set, the accuracy reached 90.6% and 95.31%, respectively.
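The balancing step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' code. The function names `smote` and `balance`, the value k = 5, the class labels, and the 4-dimensional toy vectors (standing in for the real embedding features) are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy stand-in for feature vectors: 30 majority and 8 minority samples.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
X += [[rng.gauss(3, 1) for _ in range(4)] for _ in range(8)]
y = ["majority"] * 30 + ["minority"] * 8  # hypothetical labels

X_bal, y_bal = balance(X, y)
print(y_bal.count("majority"), y_bal.count("minority"))
```

In practice, oversampling is usually applied only to the training portion of each cross-validation fold, so that synthetic points derived from test samples do not leak into training. Whether the paper does this is not stated in the abstract.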
Acknowledgments
This work was supported in part by the University Innovation Team Project of Jinan (2019GXRC015), the Key Science & Technology Innovation Project of Shandong Province (2019JZZY010324), the Natural Science Foundation of China (No. 61902337), the talent project of “Qingtan scholar” of Zaozhuang University, the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), and Young Talents of Science and Technology in Jiangsu.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cui, Q., Bao, W., Cao, Y., Yang, B., Chen, Y. (2021). RF_Bert: A Classification Model of Golgi Apparatus Based on TAPE_BERT Extraction Features. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837. Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_59
DOI: https://doi.org/10.1007/978-3-030-84529-2_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84528-5
Online ISBN: 978-3-030-84529-2
eBook Packages: Computer Science, Computer Science (R0)