LLM fine-tuning with biomedical open-source data

Couto, Francisco José Moreira; Fernandes, Maria Isabel Mou Sequeira; Anaya, Christopher
2026-01-15; 2025
http://hdl.handle.net/10400.5/116644
Master's thesis, Data Science, 2025, Universidade de Lisboa, Faculdade de Ciências

Biomedical question answering (QA) systems aim to support researchers and clinicians by providing accurate, context-aware answers to complex information needs. Recent advances in large language models (LLMs) have significantly improved QA performance across domains, yet the biomedical domain remains challenging because of complex terminology, limited data availability, and the risk of hallucinated content. This thesis investigates parameter-efficient fine-tuning techniques for adapting LLMs to biomedical QA, focusing on BioASQ challenge Task B, Phase B, which includes yes/no, factoid, list, and ideal questions. A review of biomedical QA datasets and LLM adaptations highlights the evolving landscape of knowledge-infused models.

The thesis presents a fine-tuning pipeline based on QLoRA, a memory-efficient method that adapts the Mistral-7B-Instruct-v0.1 model using quantized weights. Domain-specific prompt templates were designed for each question type to optimize answer formatting and reduce hallucinations. The model was trained on a curated dataset combining the BioASQ training set with examples derived from Gene Ontology, DrugBank, and BiQA.

Results show that the proposed system achieves competitive performance across question types, particularly on yes/no questions, where it attains an F1 score of 0.76 and structured JSON outputs enabled reliable automatic evaluation. For ideal (free-text) questions, the system produced fluent but occasionally speculative responses, highlighting the trade-off between informativeness and factual grounding. Evaluation metrics such as F1, MRR, and ROUGE were complemented by qualitative error analysis to assess system robustness.
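The evaluation step described above, in which structured JSON outputs are scored automatically with F1 and MRR, can be illustrated with a minimal sketch. The record does not give the thesis's output schema or its exact metric definitions, so everything here is an assumption: the `exact_answer` key as the model's output field, macro-averaging of F1 over the yes and no classes, and the helper names `parse_yesno`, `macro_f1`, and `mean_reciprocal_rank`, which are hypothetical.

```python
import json
from typing import List, Optional, Sequence, Set


def parse_yesno(raw: str) -> Optional[str]:
    """Extract a yes/no label from a JSON model output; None if unusable.

    Assumes (hypothetically) that the model emits {"exact_answer": "yes"}.
    """
    try:
        answer = json.loads(raw).get("exact_answer")
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return None
    if isinstance(answer, str) and answer.strip().lower() in {"yes", "no"}:
        return answer.strip().lower()
    return None


def f1_for_label(gold: Sequence[str], pred: Sequence[Optional[str]], label: str) -> float:
    """F1 for one class; unparseable predictions (None) count as misses."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[Optional[str]]) -> float:
    """Unweighted mean of the per-class F1 scores for 'yes' and 'no'."""
    return (f1_for_label(gold, pred, "yes") + f1_for_label(gold, pred, "no")) / 2


def mean_reciprocal_rank(gold_sets: List[Set[str]], ranked: List[List[str]]) -> float:
    """MRR for factoid questions: 1/rank of the first correct candidate.

    gold_sets holds lowercase accepted answers per question; a question
    with no correct candidate contributes 0.
    """
    if not gold_sets:
        return 0.0
    total = 0.0
    for gold, candidates in zip(gold_sets, ranked):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate.lower() in gold:
                total += 1.0 / rank
                break
    return total / len(gold_sets)
```

Treating unparseable outputs as wrong rather than dropping them keeps the score honest about formatting failures, which is one reason a fixed JSON schema makes automatic evaluation more reliable.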
The study concludes that combining domain-adapted prompts with QLoRA fine-tuning offers a promising approach for deploying efficient and effective biomedical QA systems. Future work should explore retrieval-augmented generation, deeper integration of biomedical ontologies, and improved evaluation frameworks tailored to the nuances of clinical and research settings.

application/pdf
eng
Biomedical Question Answering; Large Language Models; Parameter-Efficient Fine-Tuning; BioASQ; Evaluation Metrics
master thesis
204174821