Publication

Extracting n-ary Relations from Biomedical Literature using Deep-Learning Techniques

dc.contributor.author: Fernandes, João Lucas Matias
dc.contributor.institution: Department of Chemistry and Biochemistry
dc.contributor.supervisor: Couto, Francisco José Moreira
dc.date.accessioned: 2025-12-30T12:10:58Z
dc.date.available: 2025-12-30T12:10:58Z
dc.date.issued: 2025
dc.description: Master's thesis, Biochemistry and Biomedicine, 2025, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract: The rapid growth of biomedical literature makes it challenging for researchers to stay up to date. Text mining has become essential for efficiently extracting knowledge from unstructured texts. Abstracts offer a focused alternative to full-text articles, but extracting meaningful insights from them remains difficult. Key tasks such as Named Entity Recognition (NER) and Named Entity Linking (NEL) face issues like ambiguous terminology, entity variability, and incomplete knowledge bases, especially when handling novel or NIL (not-in-lexicon) entities. Relation Extraction (RE) systems also face challenges, including limited scope, lack of interpretability, and a focus on binary relations that do not fully capture complex biomedical interactions. This thesis introduces a small gold-standard dataset created by expanding 31 abstracts from the 600-document BioRED corpus. The dataset adds CellTypeOrAnatomicalConcept and NIL entities, serving as a resource to test and improve the Biomedical Entity Annotator (BENT) tool for NER and NEL. It also enables the extension of relation extraction from binary to n-ary relations, starting with ternary relations. Compared to BioRED, NER performance was lower for most entity types, while NEL showed particularly low scores for GeneOrGeneProduct, CellTypeOrAnatomicalConcept, and NIL entities, reflecting the challenges of annotating novel entities. For n-ary relation extraction, the K-RET system, which builds on BERT-based models, was used with SciBERT and BioMedBERT. In the binary setting, the system achieved an F1-score of 0.775, compared to BioRED's 0.7562. Ternary-relation results were compared against BioRex, a state-of-the-art approach, yielding F1-scores of approximately 0.65. Although lower than BioRex, these results provide a promising baseline for n-ary relation extraction across a broader set of entity types. [en]
dc.format: application/pdf
dc.identifier.uri: http://hdl.handle.net/10400.5/116469
dc.language.iso: eng
dc.subject: Machine Learning Methods
dc.subject: Natural Language Processing
dc.subject: Relation Extraction
dc.subject: Text Mining
dc.subject: Biomedical Literature
dc.title: Extracting n-ary Relations from Biomedical Literature using Deep-Learning Techniques [en]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess
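The abstract contrasts binary with ternary (n-ary) relation extraction and reports F1-scores for both. As a minimal illustrative sketch only, and not the thesis's code, the K-RET system, or the BioRED data format, the Python below assumes that a relation is a type label plus an unordered set of typed entity arguments, and computes micro-averaged precision, recall and F1 under exact matching, which is one common evaluation convention. The helper names make_relation and precision_recall_f1 and the entity identifiers CHEM_A, GENE_B and DIS_C are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation (an assumption, NOT the BioRED/K-RET format):
# a relation is a type label plus an unordered set of (entity_type, entity_id)
# arguments. Binary relations have 2 arguments, ternary relations have 3.
@dataclass(frozen=True)
class Relation:
    rel_type: str
    args: FrozenSet[Tuple[str, str]]

    @property
    def arity(self) -> int:
        # Number of entity arguments: 2 = binary, 3 = ternary, and so on.
        return len(self.args)


def make_relation(rel_type: str, *args: Tuple[str, str]) -> Relation:
    # Hypothetical helper: argument order is ignored because args is a frozenset.
    return Relation(rel_type, frozenset(args))


def precision_recall_f1(
    gold: Iterable[Relation], pred: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Micro-averaged scores with exact matching: a prediction is a true
    positive only if its type and ALL of its arguments match a gold relation."""
    gold_set: Set[Relation] = set(gold)
    pred_set: Set[Relation] = set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with hypothetical entity identifiers.
chem = ("ChemicalEntity", "CHEM_A")
gene = ("GeneOrGeneProduct", "GENE_B")
dis = ("DiseaseOrPhenotypicFeature", "DIS_C")

gold = [
    make_relation("Association", chem, gene, dis),  # ternary
    make_relation("Association", gene, dis),        # binary
]
pred = [
    make_relation("Association", gene, dis),        # correct binary
    make_relation("Association", chem, dis),        # only 2 of the 3 ternary arguments
]

print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under this exact-match assumption a ternary relation is scored as a single unit: the prediction that recovers only two of its three arguments earns no partial credit, counting both as a false positive and as a missed gold relation. Other evaluation schemes (for example, partial-argument matching) would score the same predictions differently.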

Files

Original bundle
Name: TM_Joao_Fernandes.pdf
Size: 1.12 MB
Format: Adobe Portable Document Format