Publication

Deep Learning for Discovery of Drug Binding Activities for Orphan Targets

datacite.subject.fos: Departamento de Informática
dc.contributor.advisor: Falcão, André Osório e Cruz de Azerêdo, 1969-
dc.contributor.author: Gorman, Sean
dc.date.accessioned: 2025-02-19T15:33:24Z
dc.date.available: 2025-02-19T15:33:24Z
dc.date.issued: 2025
dc.date.submitted: 2024
dc.description: Master's thesis, Bioinformatics and Computational Biology, 2025, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract: The vastness of chemical space poses a daunting challenge in drug discovery, particularly when predicting drug-target interactions (DTIs) for novel or orphan protein targets. Graph Neural Networks (GNNs) have emerged as powerful deep learning models for modeling complex biological interactions, yet they often struggle with the cold-start problem, defined here as generalisation to unseen proteins or ligands. This thesis addresses that challenge by developing a GNN-based model, built with PyTorch Geometric, to predict inhibitory constants (Ki) and half-maximal inhibitory concentrations (IC50) for protein-ligand pairs. Comprehensive datasets were extracted from the ChEMBL database, focusing on Swiss-Prot verified proteins to ensure high-quality interaction data. After removing entries that lacked canonical SMILES, had missing activity values, or contained ambiguous activity measurements, refined datasets of 276,098 Ki entries (70.83% retention) and 412,726 IC50 entries (82.29% retention) were obtained. Molecules and proteins were represented as graph objects, enabling the GNN to capture intricate structural and relational features. The model architecture consisted of dual encoders, one for molecules and one for proteins, whose learned features were fused by concatenation and fed into a multilayer perceptron head for activity prediction. Experiments revealed a clear discrepancy in model performance between traditional random splits and a more stringent cold-start evaluation. While the model showed strong predictive capabilities on a validation set randomly sampled from the training data, it performed worse on a 'blinded' cold-start dataset in which entire proteins and their interactions were held out before splitting. The model detects some signal in the blind dataset, yet this decline highlights its struggle to generalise to entirely new proteins, a common scenario in drug discovery when seeking ligands for orphan targets. These results highlight the limitations of current GNN approaches in addressing the cold-start problem and emphasize the need for novel strategies to enhance model generalisation. Future work should explore techniques such as transfer learning, incorporation of protein domain knowledge and knowledge graphs, and data augmentation to mitigate this issue and improve performance. Overcoming the cold-start challenge is crucial for broadening the scope of targetable proteins and expediting the development of treatments for previously unstudied targets.
dc.identifier.uri: http://hdl.handle.net/10400.5/98571
dc.language.iso: eng
dc.subject: Deep learning
dc.subject: drug-target interaction (DTI)
dc.subject: graph neural networks (GNNs)
dc.subject: virtual ligand screening (VLS)
dc.subject: orphan targets
dc.subject: Master's theses - 2025
dc.title: Deep Learning for Discovery of Drug Binding Activities for Orphan Targets
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess
rcaap.type: masterThesis
thesis.degree.name: Master's thesis in Bioinformatics and Computational Biology
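
The abstract contrasts a random split with a cold-start evaluation in which whole proteins are held out. The sketch below is one possible reading of that protocol, not the thesis's actual code: the record fields (`protein_id`, `smiles`, `activity`), the function names, the 20% fractions, and the toy data are all assumptions made for illustration.

```python
import random


def random_split(records, val_fraction=0.2, seed=0):
    """Random split: a protein can appear in both training and validation."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def cold_start_split(records, held_out_fraction=0.2, seed=0):
    """Cold-start split: hold out entire proteins with all their interactions."""
    rng = random.Random(seed)
    proteins = sorted({r["protein_id"] for r in records})
    rng.shuffle(proteins)
    n_held = max(1, int(len(proteins) * held_out_fraction))
    held_out = set(proteins[:n_held])
    train = [r for r in records if r["protein_id"] not in held_out]
    blind = [r for r in records if r["protein_id"] in held_out]
    return train, blind


# Toy data (hypothetical): 10 proteins with 5 ligands each.
records = [
    {"protein_id": f"P{p}", "smiles": "C" * (l + 1) + "O", "activity": 5.0 + 0.1 * l}
    for p in range(10)
    for l in range(5)
]

rand_train, rand_val = random_split(records)
train, blind = cold_start_split(records)

train_proteins = {r["protein_id"] for r in train}
blind_proteins = {r["protein_id"] for r in blind}

# No held-out protein leaks into the training set.
print(train_proteins.isdisjoint(blind_proteins))
```

In the cold-start variant every protein in the blind set is unseen during training, which is what makes it a stricter test of generalisation than the random split, where records for the same protein can land on both sides.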

Files

Main
Name: TM_Sean_Gorman.pdf
Size: 3.75 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed upon to submission