Deep Learning for Discovery of Drug Binding Activities for Orphan Targets

Gorman, Sean

http://hdl.handle.net/10400.5/98571

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
TM_Sean_Gorman.pdf		3.75 MB	Adobe PDF

Authors

Gorman, Sean

Advisor(s)

Falcão, André Osório e Cruz de Azerêdo, 1969-

Abstract(s)

The vastness of chemical space poses a daunting challenge in drug discovery, particularly when predicting drug-target interactions (DTIs) for novel or orphan protein targets. Graph Neural Networks (GNNs) have emerged as powerful deep learning models for modelling complex biological interactions, yet they often struggle with the cold-start problem, defined here as generalisation to unseen proteins or ligands. This thesis addresses that challenge by developing a GNN-based model in PyTorch Geometric to predict inhibition constants (Ki) and half-maximal inhibitory concentrations (IC50) for protein-ligand pairs. Comprehensive datasets were extracted from the ChEMBL database, focusing on Swiss-Prot verified proteins to ensure high-quality interaction data. After filtering out entries with no canonical SMILES, missing activity values, or ambiguous activity measurements, refined datasets of 276,098 Ki entries (70.83% retention) and 412,726 IC50 entries (82.29% retention) were obtained. Molecules and proteins were represented as graph objects, enabling the GNN to capture intricate structural and relational features. The model architecture consisted of two encoders, one for molecules and one for proteins, whose learned features were fused by concatenation and fed into a multilayer perceptron head for activity prediction. Experiments revealed a clear discrepancy in model performance between traditional random splits and a more stringent cold-start evaluation. While the model showed strong predictive capabilities on a validation set randomly sampled from the training data, it performed worse on a 'blinded' cold-start dataset from which entire proteins and their interactions were excluded before splitting. The model detects some signal in the blind dataset, yet this decline highlights its struggle to generalise to entirely new proteins, a scenario that is common in drug discovery when seeking ligands for orphan targets.
These results highlight the limitations of current GNN approaches in addressing the cold-start problem and emphasise the need for novel strategies to enhance model generalisation. Future work should explore advanced techniques such as transfer learning, incorporation of protein domain knowledge and knowledge graphs, and data augmentation to mitigate this issue and improve performance. Overcoming the cold-start challenge is crucial for broadening the scope of targetable proteins and expediting the development of treatments for previously unstudied targets.
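The difference between the two evaluation settings named in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: the record fields (`protein`, `smiles`, `activity`), the function names, the split fractions and the toy data are all assumptions made for illustration. The only detail taken from the abstract is that, for the blinded set, entire proteins and their interactions are held out before any further splitting.

```python
import random


def random_split(records, val_fraction=0.2, seed=0):
    """Shuffle all protein-ligand pairs and split them.

    The same protein can appear on both sides of the split.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def cold_start_split(records, blind_fraction=0.2, seed=0):
    """Hold out whole proteins together with all of their interactions.

    No protein in the blind set is ever seen during training.
    """
    rng = random.Random(seed)
    proteins = sorted({r["protein"] for r in records})
    rng.shuffle(proteins)
    n_blind = max(1, int(len(proteins) * blind_fraction))
    blind_ids = set(proteins[:n_blind])
    train = [r for r in records if r["protein"] not in blind_ids]
    blind = [r for r in records if r["protein"] in blind_ids]
    return train, blind


# Toy data (hypothetical): 10 proteins with 5 ligands each.
records = [
    {"protein": f"P{p}", "smiles": f"mol{p}_{m}", "activity": 5.0 + 0.1 * m}
    for p in range(10)
    for m in range(5)
]

train, blind = cold_start_split(records)
train_proteins = {r["protein"] for r in train}
blind_proteins = {r["protein"] for r in blind}

rand_train, rand_val = random_split(records)
```

In the cold-start split, `train_proteins` and `blind_proteins` never overlap, whereas the random split usually places pairs from the same protein on both sides, which is one plausible reason a randomly sampled validation set looks easier than the blinded one.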

Description

Master's thesis, Bioinformatics and Computational Biology, 2025, Universidade de Lisboa, Faculdade de Ciências

Keywords

Deep learning; drug-target interaction (DTI); graph neural networks (GNNs); virtual ligand screening (VLS); orphan targets; Master's theses - 2025

URI

http://hdl.handle.net/10400.5/98571

Collections

FC-DI - Master Thesis (dissertation)