Abstract
Mitochondria serve as the energy-production centers of the cell and comprise four major subcellular regions: the matrix, the outer membrane, the inner membrane, and the intermembrane space. Mitochondrial proteins carry out distinct physiological functions within these regions, so accurate subcellular localization prediction is essential for uncovering pathological disease mechanisms and guiding drug development. Traditional approaches analyze the physical and chemical properties of protein sequences and couple them with machine learning or deep learning algorithms, but such methods require considerable time and effort in data preprocessing. To overcome these challenges, this study proposes a novel approach that efficiently predicts mitochondrial protein subcellular localization by directly perceiving the semantic information of protein sequences, using the ESM2-35B pre-trained model based on the Transformer architecture. The study uses four datasets to compare two models: a Transformer-encoder-only model trained from scratch and a classification predictor built by fine-tuning the ESM2-35B pre-trained model. Results show that fine-tuning the large pre-trained model yields superior performance on the downstream mitochondrial protein subcellular localization task compared with the Transformer-encoder-only model. In conclusion, the ESM2-35B pre-trained model offers broad application prospects for mitochondrial protein subcellular localization prediction.
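The from-scratch baseline described in the abstract can be sketched as a minimal Transformer-encoder-only classifier in PyTorch. This is an illustrative sketch only: the vocabulary handling, layer sizes, positional embedding, and mean-pooling classification head are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
NUM_CLASSES = 4  # matrix, outer membrane, inner membrane, intermembrane space

class EncoderOnlyClassifier(nn.Module):
    """Minimal Transformer-encoder-only classifier for submitochondrial localization."""
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        # Index 0 is reserved for padding/unknown residues.
        self.embed = nn.Embedding(len(AMINO_ACIDS) + 1, d_model, padding_idx=0)
        self.pos = nn.Embedding(max_len, d_model)  # learned positional embedding
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, NUM_CLASSES)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer residue codes, 0 = padding
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        x = self.encoder(x, src_key_padding_mask=(tokens == 0))
        return self.head(x.mean(dim=1))  # mean-pool residue states, then classify

def encode(seq, max_len=512):
    """Map a protein sequence to integer codes (non-standard residues -> 0)."""
    idx = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
    codes = [idx.get(aa, 0) for aa in seq[:max_len]]
    return torch.tensor(codes).unsqueeze(0)

logits = EncoderOnlyClassifier()(encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
print(logits.shape)  # one score per localization region: torch.Size([1, 4])
```

The fine-tuned alternative replaces the randomly initialized encoder with a pre-trained ESM-2 language model and trains only (or mostly) the classification head, which is what gives it the performance advantage the abstract reports.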
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), Xuzhou Science and Technology Plan Project (KC21047), Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016) and Young Talents of Science and Technology in Jiangsu and ghfund202302026465.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, B., He, L., Wang, Q., Wang, Z., Bao, W., Cheng, H. (2023). Mit Protein Transformer: Identification Mitochondrial Proteins with Transformer Model. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4748-5
Online ISBN: 978-981-99-4749-2
eBook Packages: Computer Science, Computer Science (R0)