BioKnowQA: Prompt-tuning for biomedical QA with knowledge graphs

Lopes, Paulo Rodrigo Coelho

Publication

2026, Master's dissertation

dc.contributor.advisor	Couto, Francisco José Moreira
dc.contributor.advisor	Fernandes, Maria Isabel Mou Sequeira
dc.contributor.author	Lopes, Paulo Rodrigo Coelho
dc.contributor.institution	Faculty of Sciences
dc.date.accessioned	2026-04-30T14:30:04Z
dc.date.available	2026-04-30T14:30:04Z
dc.date.issued	2026
dc.description.abstract	The rapid proliferation of biomedical literature, evidenced by the addition of over 1.5 million new citations to PubMed in 2023 alone, creates an urgent need for automated tools for efficient information extraction and synthesis. While Large Language Models (LLMs) demonstrate promising capabilities in rapid text processing, they frequently suffer from hallucinations and rely on static knowledge, limiting their application in rigorous scientific domains. This dissertation addresses these challenges through the development of BioKnowQA, a modular Biomedical Question Answering system that integrates LLMs with structured knowledge derived from Knowledge Graphs (KGs). The central hypothesis is that integrating curated knowledge into the answer generation process enhances its precision, reliability, and interpretability. The system was empirically evaluated through participation in two distinct competitions. Within the context of BioASQ Task B, the evaluation focused on Information Extraction and answer generation. In Phase A (Information Extraction), a hybrid neural reranking strategy was implemented. Ablation studies demonstrated that, although neural models improve precision by filtering noise, applying strict similarity thresholds results in reduced coverage (recall), confirming that the purity of the retrieved context is prioritized for subsequent generation. In Phase B (Answer Generation), graph-constrained decoding mechanisms (Graph Constrained Reasoning) were explored. Results demonstrated that anchoring the model (e.g., Mistral-7B) to generate reasoning paths consistent with the Monarch Knowledge Graph leads to more robust answers, particularly for complex Factoid and List-type questions. In parallel, in the MedHopQA task of BioCreative IX, the focus was on the quality of external knowledge for multi-hop reasoning. Experiments revealed that enriching models with structured definitions derived from ontologies outperforms the use of unstructured text from Wikipedia. This result highlights the critical importance of knowledge curation in grounding complex clinical inferences. In summary, this work contributes a proven methodology for developing biomedical question answering systems, demonstrating that while LLMs provide the necessary linguistic fluency, the generated answers require structured and validated knowledge to ensure the factual correctness demanded in scientific and clinical domains.	en
dc.format	application/pdf
dc.identifier.uri	http://hdl.handle.net/10400.5/118320
dc.language.iso	eng
dc.subject	Biomedical Question Answering
dc.subject	Large Language Models
dc.subject	Knowledge Graphs
dc.subject	Retrieval Augmented Generation
dc.subject	Information Retrieval
dc.title	BioKnowQA: Prompt-tuning for biomedical QA with knowledge graphs	en
dc.type	master thesis
dspace.entity.type	Publication
rcaap.rights	openAccess

Files

Main

Name: TM_Paulo_Lopes.pdf
Size: 11.71 MB
Format: Adobe Portable Document Format

Collections

Pure > Dspace
PURE > Dspace - Faculdade de Ciências