Publication
BioKnowQA: Prompt-tuning for biomedical QA with knowledge graphs
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Couto, Francisco José Moreira | |
| dc.contributor.advisor | Fernandes, Maria Isabel Mou Sequeira | |
| dc.contributor.author | Lopes, Paulo Rodrigo Coelho | |
| dc.contributor.institution | Faculty of Sciences | |
| dc.date.accessioned | 2026-04-30T14:30:04Z | |
| dc.date.available | 2026-04-30T14:30:04Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | The rapid growth of the biomedical literature, with over 1.5 million new citations added to PubMed in 2023 alone, creates an urgent need for automated tools that extract and synthesize information efficiently. Large Language Models (LLMs) process text quickly and show promising capabilities. However, they frequently hallucinate and rely on static knowledge, which limits their use in rigorous scientific domains. This dissertation addresses these challenges by developing BioKnowQA, a modular Biomedical Question Answering system that integrates LLMs with structured knowledge derived from Knowledge Graphs (KGs). The central hypothesis is that integrating curated knowledge into the answer generation process improves the precision, reliability, and interpretability of the answers. The system was evaluated empirically through participation in two distinct competitions. In BioASQ Task B, the evaluation covered Information Extraction and answer generation. In Phase A (Information Extraction), a hybrid neural reranking strategy was implemented. Ablation studies showed that neural models improve precision by filtering noise, but strict similarity thresholds reduce coverage (recall). This is consistent with prioritizing the purity of the retrieved context for subsequent generation. In Phase B (Answer Generation), graph-constrained decoding mechanisms (Graph Constrained Reasoning) were explored. The results showed that anchoring the model (e.g., Mistral-7B) to generate reasoning paths consistent with the Monarch Knowledge Graph yields more robust answers, particularly for complex factoid and list questions. In parallel, the MedHopQA task of BioCreative IX focused on the quality of external knowledge for multi-hop reasoning. Experiments revealed that enriching models with structured definitions derived from ontologies outperforms enriching them with unstructured text from Wikipedia. This result highlights the critical importance of knowledge curation in grounding complex clinical inferences. In summary, this work contributes an empirically validated methodology for developing biomedical question answering systems. It shows that LLMs provide the necessary linguistic fluency, but their answers require structured and validated knowledge to ensure the factual correctness demanded in scientific and clinical domains. | en |
| dc.format | application/pdf | |
| dc.identifier.uri | http://hdl.handle.net/10400.5/118320 | |
| dc.language.iso | eng | |
| dc.subject | Biomedical Question Answering | |
| dc.subject | Large Language Models | |
| dc.subject | Knowledge Graphs | |
| dc.subject | Retrieval Augmented Generation | |
| dc.subject | Information Retrieval | |
| dc.title | BioKnowQA: Prompt-tuning for biomedical QA with knowledge graphs | en |
| dc.type | master thesis | |
| dspace.entity.type | Publication | |
| rcaap.rights | openAccess | |
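The abstract reports a Phase A trade-off: strict similarity thresholds on reranked passages raise precision but lower recall. A toy sketch of threshold-based filtering can illustrate this. It is a minimal illustration under stated assumptions, not the BioKnowQA implementation. The scores, relevance labels, and the helper name `precision_recall_at_threshold` are all hypothetical and do not come from the thesis.

```python
# Hypothetical (similarity score, is_relevant) pairs for candidate passages,
# as a reranker might produce them. Invented for illustration only.
candidates = [
    (0.92, True),
    (0.85, True),
    (0.71, False),
    (0.64, True),
    (0.40, False),
    (0.35, True),
    (0.20, False),
]


def precision_recall_at_threshold(scored, threshold):
    """Keep passages with score >= threshold and return (precision, recall).

    Hypothetical helper: precision is the fraction of kept passages that are
    relevant; recall is the fraction of all relevant passages that were kept.
    """
    kept = [relevant for score, relevant in scored if score >= threshold]
    total_relevant = sum(relevant for _, relevant in scored)
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Sweep a few thresholds to show precision rising as recall falls.
    for t in (0.0, 0.5, 0.8):
        p, r = precision_recall_at_threshold(candidates, t)
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.0 to 0.8 moves precision from about 0.57 to 1.00 while recall falls from 1.00 to 0.50. This is the same direction of effect the abstract describes.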
Files
Main
- Name: TM_Paulo_Lopes.pdf
- Size: 11.71 MB
- Format: Adobe Portable Document Format
