
Deep Learning to optimize viral vector production for human gene therapy

File: TM_João_Ferraz.pdf (1.79 MB, Adobe PDF)

Abstract

This work explores the potential of Protein Language Models (PLMs) to advance the design of novel Adeno-Associated Virus 2 (AAV2) sequences, focusing on two primary objectives: sequence classification and generative design. For the classification task, we fine-tuned a PLM (ProtBERT) to differentiate between viable and non-viable AAV2 sequences. Results demonstrated high classification performance across multiple trained models, supporting the hypothesis that domain-specific fine-tuning enables PLMs to capture important AAV2 sequence features. For sequence generation, we fine-tuned a conditional generative PLM (ProGen) to design viable AAV2 capsid protein sequences. While the model generated structurally diverse sequences, extensive evaluations indicated that additional refinements are necessary for the generated sequences to consistently meet viability criteria. The classification model highlights the potential of PLMs in predicting sequence viability, offering a reliable approach that could help reduce experimental costs. The generative approach, though requiring further optimization, introduces a novel avenue for designing diverse AAV2 variants. Future efforts will focus on refining the generative framework by incorporating explicit viability tags, classifier feedback, and more extensive testing of generation hyperparameters, as well as extending it to additional AAV2 properties. This work lays a foundation for leveraging PLMs in AAV2 sequence engineering and, more broadly, for using language models in viral vector design.

Description

Master's thesis, Informatics Engineering, 2025, Universidade de Lisboa, Faculdade de Ciências

Keywords

Deep learning; Protein language models; Transfer learning; Representations; Protein sequence research; Master's theses - 2025
