Publication
Deep Learning to optimize viral vector production for human gene therapy
| datacite.subject.fos | Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering | pt_PT |
| dc.contributor.advisor | Pesquita, Cátia Luísa Santana Calisto | |
| dc.contributor.advisor | Rodrigues, Ana Filipa | |
| dc.contributor.author | Ferraz, João Lucas Figueiredo | |
| dc.date.accessioned | 2025-04-04T15:08:40Z | |
| dc.date.available | 2025-04-04T15:08:40Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | 2025 | |
| dc.description | Master's thesis, Informatics Engineering, 2025, Universidade de Lisboa, Faculdade de Ciências | pt_PT |
| dc.description.abstract | This work explores the potential of Protein Language Models (PLMs) to advance the design of novel Adeno-Associated Virus 2 (AAV2) sequences, focusing on two objectives: sequence classification and generative design. For the classification task, we fine-tuned a PLM (ProtBERT) to differentiate between viable and non-viable AAV2 sequences. Results demonstrated high classification performance across multiple trained models, supporting the hypothesis that domain-specific fine-tuning enables PLMs to capture important AAV2 sequence features. For sequence generation, we fine-tuned a conditional generative PLM (ProGen) to design viable AAV2 capsid protein sequences. While the model generated structurally diverse sequences, extensive evaluations indicated that further refinement is needed before the generated sequences consistently meet viability criteria. The classification model highlights the potential of PLMs in predicting sequence viability, offering a reliable approach that could help reduce experimental costs. The generative approach, though it requires further optimization, introduces a novel avenue for designing diverse AAV2 variants. Future efforts will focus on refining the generative framework by incorporating explicit viability tags, classifier feedback, and more extensive testing of generation hyperparameters, as well as extending it to additional AAV2 properties. This work lays a foundation for leveraging PLMs in AAV2 sequence engineering and points to promising prospects for the use of language models in viral vector design. | pt_PT |
| dc.identifier.uri | http://hdl.handle.net/10400.5/100019 | |
| dc.language.iso | eng | pt_PT |
| dc.subject | Deep learning | pt_PT |
| dc.subject | Protein language models | pt_PT |
| dc.subject | Transfer learning | pt_PT |
| dc.subject | Representations | pt_PT |
| dc.subject | Protein sequence research | pt_PT |
| dc.subject | Master's theses - 2025 | pt_PT |
| dc.title | Deep Learning to optimize viral vector production for human gene therapy | pt_PT |
| dc.type | master thesis | |
| dspace.entity.type | Publication | |
| rcaap.rights | openAccess | pt_PT |
| rcaap.type | masterThesis | pt_PT |
| thesis.degree.name | Master's in Informatics Engineering | pt_PT |
