
Mit Protein Transformer: Identification Mitochondrial Proteins with Transformer Model

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14088)

Abstract

Mitochondrial proteins carry out distinct physiological functions within their subcellular localization regions, making accurate subcellular localization prediction essential for uncovering pathological disease mechanisms and guiding drug development. Mitochondria serve as the energy production centers of cells and comprise four major subcellular localization regions: the matrix, the outer membrane, the inner membrane, and the intermembrane space. Traditional research methods analyze physical and chemical properties of protein sequences and couple them with machine learning or deep learning algorithms, but such methods require considerable time and effort in data preprocessing. To overcome these challenges, this study proposes a novel approach that predicts mitochondrial protein subcellular localization efficiently by perceiving the semantic information of protein sequences directly, using the transformer-based ESM2-35B pre-trained model. The study used four datasets to compare two models: a Transformer-Encoder-only model trained from scratch and a classification predictor built by fine-tuning the ESM2-35B pre-trained model. Results show that fine-tuning the large pre-trained model yields superior performance on the downstream mitochondrial protein subcellular localization task compared with the Transformer-Encoder-only model. In conclusion, the transformer-based ESM2-35B pre-trained model offers broad application prospects for mitochondrial protein subcellular localization prediction.
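
To make the fine-tuning approach concrete, the sketch below adapts a pre-trained ESM-2 checkpoint into a four-class submitochondrial localization classifier (matrix, outer membrane, inner membrane, intermembrane space) using the Hugging Face Transformers library. This is a minimal sketch, not the authors' released code: the checkpoint the paper calls ESM2-35B is not publicly distributed under that name, so the example assumes the small public checkpoint facebook/esm2_t12_35M_UR50D, and the protein sequence and label are hypothetical placeholders.

```python
# Minimal sketch: fine-tune an ESM-2 checkpoint for 4-class submitochondrial
# localization. Assumptions: the public checkpoint facebook/esm2_t12_35M_UR50D
# stands in for the paper's ESM2-35B model; the sequence/label pair is a toy.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["matrix", "outer membrane", "inner membrane", "intermembrane space"]

checkpoint = "facebook/esm2_t12_35M_UR50D"  # assumed stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(LABELS)
)

# Toy training example: one truncated protein sequence and its region label.
sequences = ["MKTLLLTLVVVTIVCLDLGYT"]  # hypothetical amino-acid sequence
labels = torch.tensor([0])             # index 0 == "matrix"

# ESM-2 supports up to 1024 positions; 1022 leaves room for special tokens.
batch = tokenizer(sequences, padding=True, truncation=True,
                  max_length=1022, return_tensors="pt")

# One optimization step; a real run would loop over a DataLoader for epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # cross-entropy over the 4 classes
outputs.loss.backward()
optimizer.step()

# Inference: report the highest-scoring localization region.
model.eval()
with torch.no_grad():
    logits = model(**batch).logits
print(LABELS[logits.argmax(dim=-1).item()])
```

The paper's from-scratch baseline (the Transformer-Encoder-only model) would swap the pre-trained weights for a randomly initialized encoder of comparable shape; the classification head and training loop stay the same, which is what isolates the contribution of pre-training.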

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), the Xuzhou Science and Technology Plan Project (KC21047), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), Young Talents of Science and Technology in Jiangsu, and ghfund202302026465.

Author information

Correspondence to Zhuo Wang or Wenzheng Bao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, B., He, L., Wang, Q., Wang, Z., Bao, W., Cheng, H. (2023). Mit Protein Transformer: Identification Mitochondrial Proteins with Transformer Model. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_52

  • DOI: https://doi.org/10.1007/978-981-99-4749-2_52

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4748-5

  • Online ISBN: 978-981-99-4749-2

  • eBook Packages: Computer Science, Computer Science (R0)
