Paper

Authors: Marco Nicolini (1); Dario Malchiodi (1); Alberto Cabri (1); Emanuele Cavalleri (1); Marco Mesiti (1); Alberto Paccanaro (2); Peter Robinson (3); Justin Reese (4); Elena Casiraghi (1, 5) and Giorgio Valentini (1, 5)

Affiliations: (1) AnacletoLab, Dept. of Computer Science, University of Milan, Italy; (2) School of Applied Mathematics (EMAp) - FGV, Rio de Janeiro, Brazil; (3) Berlin Institute of Health at Charité (BIH), Germany; (4) Environmental Genomics and Systems Biology Bioscience, Lawrence Berkeley National Laboratory, U.S.A.; (5) ELLIS European Laboratory for Learning and Intelligent Systems

Keyword(s): Large Language Models, Protein Language Models, Conditional Transformers, Protein Design and Modeling.

Abstract: Conditional transformers improve the generative capabilities of large language models (LLMs) by processing specific control tags that drive the generation of texts with specific features. Recently, a similar approach has been applied to the generation of functionally characterized proteins by adding specific tags to the protein sequence to qualify their functions (e.g., Gene Ontology terms) or other characteristics (e.g., their family or the species to which they belong). In this work, we show that fine-tuning conditional transformers, pre-trained on large corpora of proteins, on specific protein families can significantly enhance the prediction accuracy of the pre-trained models and can also generate new, potentially functional proteins that could enlarge the protein space explored by natural evolution. We obtained encouraging results on the phage lysozyme family of proteins, achieving statistically significantly better prediction results than the original pre-trained model. A comparative analysis of the primary and tertiary structures of the synthetic proteins generated by our model and of the natural ones shows that the resulting fine-tuned model is able to generate biologically plausible proteins. Our results suggest that fine-tuned conditional transformers can be applied to other functionally characterized proteins for possible industrial and pharmacological applications.

CC BY-NC-ND 4.0

Paper citation in several formats:
Nicolini, M.; Malchiodi, D.; Cabri, A.; Cavalleri, E.; Mesiti, M.; Paccanaro, A.; Robinson, P.; Reese, J.; Casiraghi, E. and Valentini, G. (2024). Fine-Tuning of Conditional Transformers Improves the Generation of Functionally Characterized Proteins. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS; ISBN 978-989-758-688-0; ISSN 2184-4305, SciTePress, pages 561-568. DOI: 10.5220/0012567900003657

@conference{bioinformatics24,
author={Marco Nicolini and Dario Malchiodi and Alberto Cabri and Emanuele Cavalleri and Marco Mesiti and Alberto Paccanaro and Peter Robinson and Justin Reese and Elena Casiraghi and Giorgio Valentini},
title={Fine-Tuning of Conditional Transformers Improves the Generation of Functionally Characterized Proteins},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS},
year={2024},
pages={561-568},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012567900003657},
isbn={978-989-758-688-0},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS
TI - Fine-Tuning of Conditional Transformers Improves the Generation of Functionally Characterized Proteins
SN - 978-989-758-688-0
IS - 2184-4305
AU - Nicolini, M.
AU - Malchiodi, D.
AU - Cabri, A.
AU - Cavalleri, E.
AU - Mesiti, M.
AU - Paccanaro, A.
AU - Robinson, P.
AU - Reese, J.
AU - Casiraghi, E.
AU - Valentini, G.
PY - 2024
SP - 561
EP - 568
DO - 10.5220/0012567900003657
PB - SciTePress
ER -