A Comparative Study of Pre-trained Gene Embeddings for COVID-19 mRNA Vaccine Degradation Prediction

  • Conference paper
Proceedings of the Seventh International Conference on Mathematics and Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1412)

Abstract

Messenger ribonucleic acid (mRNA) vaccines are structurally unstable, which makes their production challenging. The sequence of an mRNA vaccine can indicate possible degradation sites. Recently, deep learning techniques from areas such as natural language processing have shown great promise in modeling such sequences. Applying deep learning methods effectively requires an appropriate sequence-to-vector representation. In this paper, pre-trained dna2vec, rna2vec, and lshvec gene embeddings are compared to identify the best vector representation for predicting the amount of degradation from mRNA vaccine sequences. The comparison shows that the dna2vec embedding performs best.
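The sequence-to-vector step mentioned in the abstract can be pictured with a minimal sketch. It assumes overlapping 3-mers and a randomly initialised lookup table standing in for a pre-trained embedding; the k-mer length, the embedding dimension, the example sequence, and the helper names (`kmers`, `toy_embedding_table`, `embed_sequence`) are all illustrative assumptions, not details taken from the paper or from the dna2vec, rna2vec, or lshvec packages.

```python
import itertools
import random

K = 3          # k-mer length (assumed; not stated in the abstract)
EMBED_DIM = 8  # toy embedding dimension (assumed for illustration)


def kmers(seq, k=K):
    """Split a sequence into overlapping k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding_table(alphabet="ACGU", k=K, dim=EMBED_DIM, seed=0):
    """Random stand-in for a pre-trained k-mer embedding table (hypothetical)."""
    rng = random.Random(seed)
    return {
        "".join(p): [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for p in itertools.product(alphabet, repeat=k)
    }


def embed_sequence(seq, table, k=K):
    """Map a sequence to a list of vectors, one per k-mer, by table lookup."""
    return [table[km] for km in kmers(seq, k)]


table = toy_embedding_table()
vectors = embed_sequence("GGAAAAGCUCUAAUA", table)  # made-up RNA fragment
print(len(vectors), len(vectors[0]))  # 13 k-mers, each an 8-dimensional vector
```

In a comparison like the one described, the random table would be replaced by each pre-trained embedding in turn, and the resulting vector sequences fed to the same downstream regression model.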

Author information

Corresponding author

Correspondence to B. Premjith.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Krishna, U.V., Premjith, B., Soman, K.P. (2022). A Comparative Study of Pre-trained Gene Embeddings for COVID-19 mRNA Vaccine Degradation Prediction. In: Giri, D., Raymond Choo, KK., Ponnusamy, S., Meng, W., Akleylek, S., Prasad Maity, S. (eds) Proceedings of the Seventh International Conference on Mathematics and Computing. Advances in Intelligent Systems and Computing, vol 1412. Springer, Singapore. https://doi.org/10.1007/978-981-16-6890-6_22
