Abstract
Mitochondria serve as the energy-production centers of the cell and comprise four major subcellular regions: the matrix, the outer membrane, the inner membrane, and the intermembrane space. Mitochondrial proteins carry out distinct physiological functions within these regions, so accurate subcellular localization prediction is essential for uncovering pathological disease mechanisms and guiding drug development. Traditional approaches analyze the physical and chemical properties of protein sequences and couple them with machine learning or deep learning algorithms, but such methods require considerable time and effort in data preprocessing. To overcome these challenges, this study proposes a novel approach that efficiently predicts mitochondrial protein subcellular localization by directly perceiving the semantic information of protein sequences, using the ESM2-35B pre-trained model based on the Transformer architecture. The study uses four datasets to compare two models: a Transformer-encoder-only model trained from scratch and a classification predictor built by fine-tuning the ESM2-35B pre-trained model. Results show that fine-tuning the large pre-trained model yields superior performance on the downstream mitochondrial protein subcellular localization task compared with the Transformer-encoder-only model. In conclusion, the ESM2-35B pre-trained model offers broad application prospects for mitochondrial protein subcellular localization prediction.
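The from-scratch baseline described in the abstract can be sketched as a minimal Transformer-encoder-only classifier in PyTorch. This is an illustrative sketch only: the vocabulary handling, layer sizes, positional embedding, and mean-pooling classification head are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
NUM_CLASSES = 4  # matrix, outer membrane, inner membrane, intermembrane space

class EncoderOnlyClassifier(nn.Module):
    """Minimal Transformer-encoder-only classifier for submitochondrial localization."""
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        # Index 0 is reserved for padding/unknown residues.
        self.embed = nn.Embedding(len(AMINO_ACIDS) + 1, d_model, padding_idx=0)
        self.pos = nn.Embedding(max_len, d_model)  # learned positional embedding
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, NUM_CLASSES)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer residue codes, 0 = padding
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        x = self.encoder(x, src_key_padding_mask=(tokens == 0))
        return self.head(x.mean(dim=1))  # mean-pool residue states, then classify

def encode(seq, max_len=512):
    """Map a protein sequence to integer codes (non-standard residues -> 0)."""
    idx = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
    codes = [idx.get(aa, 0) for aa in seq[:max_len]]
    return torch.tensor(codes).unsqueeze(0)

logits = EncoderOnlyClassifier()(encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
print(logits.shape)  # one score per localization region: torch.Size([1, 4])
```

The fine-tuned alternative replaces the randomly initialized encoder with a pre-trained ESM-2 language model and trains only (or mostly) the classification head, which is what gives it the performance advantage the abstract reports.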
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), Xuzhou Science and Technology Plan Project (KC21047), Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016) and Young Talents of Science and Technology in Jiangsu and ghfund202302026465.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, B., He, L., Wang, Q., Wang, Z., Bao, W., Cheng, H. (2023). Mit Protein Transformer: Identification Mitochondrial Proteins with Transformer Model. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4748-5
Online ISBN: 978-981-99-4749-2
eBook Packages: Computer Science, Computer Science (R0)