Publication
Semantic Indexing of Descriptors, Partitioning and Classification for Mapping Biomedical Entities
| dc.contributor.author | Vedor, João Pedro Ramos | |
| dc.contributor.institution | Faculty of Sciences | |
| dc.contributor.institution | Department of Informatics | |
| dc.contributor.supervisor | Couto, Francisco José Moreira | |
| dc.date.accessioned | 2026-02-09T17:55:02Z | |
| dc.date.available | 2026-02-09T17:55:02Z | |
| dc.date.issued | 2025 | |
| dc.description | Master's thesis, Informatics Engineering, 2025, Universidade de Lisboa, Faculdade de Ciências | |
| dc.description.abstract | Imagine reading a clinical note and encountering the term “ALS.” Is it amyotrophic lateral sclerosis (a disease), advanced life support (a procedure), or even a gene symbol? Traditional Named Entity Recognition (NER) can identify that “ALS” is a mention of a biomedical entity, but it does not disambiguate which specific concept it refers to. Named Entity Linking (NEL) is required to map the mention to a unique identifier in an ontology such as UMLS or MeSH, ensuring that downstream systems understand precisely which disease, procedure, or gene is referenced. For example, correctly linking “ALS” to the UMLS concept for amyotrophic lateral sclerosis allows consistent retrieval, reasoning, and integration across biomedical datasets. Scaling NEL across articles, clinical narratives, and curated resources is challenging due to the size of ontologies, acronym collisions, synonyms, and ambiguous mentions (e.g., “Lou Gehrig’s disease” vs. “ALS”). Motivated by these challenges, this thesis introduces XMR4EL, a modular and reproducible framework that treats NEL as eXtreme Multi-label Ranking (XMR): “organize → route → rank.” XMR4EL decouples semantic indexing, hierarchical routing, and label-level ranking behind stable interfaces, enabling plug-and-play components while preserving deterministic preprocessing, sparse-first modeling, and persisted artifacts for reproducibility. On automatically labeled disease corpora (Inst-100/Inst-500 for training; BC5CDR for testing), a 4-layer hierarchy achieves 61.7% Hit@100 at 195 ms/mention with beam=40, revealing a clear trade-off point between speed and quality near beams 30–40. Increasing per-label synonyms from 100 to 500 instances yields +8–13 Hit@100 points at practical beams, and added depth improves recall at matched latency by shrinking leaf scopes. Conclusion: XMR4EL demonstrates that an open, sparse-first XMR design is practical today. By combining effective mention detection (NER) with accurate concept grounding (NEL), it provides a reliable pipeline for linking disease mentions to their unique IDs, supporting high-quality biomedical information retrieval and integration. Opportunities for further improvements include calibration, document-level coherence, and label-side text encoders to boost top-1/5 accuracy. | en |
| dc.format | application/pdf | |
| dc.identifier.tid | 204177677 | |
| dc.identifier.uri | http://hdl.handle.net/10400.5/116943 | |
| dc.language.iso | eng | |
| dc.subject | Biomedical Entity Linking | |
| dc.subject | Extreme Multi-label Ranking | |
| dc.subject | Candidate Generation | |
| dc.subject | Beam Search | |
| dc.subject | Hard Negative Mining | |
| dc.title | Semantic Indexing of Descriptors, Partitioning and Classification for Mapping Biomedical Entities | en |
| dc.type | master thesis | |
| dspace.entity.type | Publication | |
| rcaap.rights | openAccess |
Files
Main
- Name: TM_Joao_Vedor.pdf
- Size: 568.57 KB
- Format: Adobe Portable Document Format
