Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching

Amorim, Sofia Pessoa de

Publication

2022 Master's dissertation

datacite.subject.fos	Departamento de Informática	pt_PT
dc.contributor.advisor	Pesquita, Cátia, 1980-
dc.contributor.author	Amorim, Sofia Pessoa de
dc.date.accessioned	2022-07-22T08:59:36Z
dc.date.available	2022-07-22T08:59:36Z
dc.date.issued	2022
dc.date.submitted	2021
dc.description	Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2022	pt_PT
dc.description.abstract	Ontology matching is the process of discovering mappings between concepts from two distinct ontologies, a source and a target. It is a fundamental step in integrating heterogeneous data sources that are described by ontologies. The problem is even more challenging for complex data such as biomedical data. Motivated by the need to keep improving ontology matching techniques, this dissertation implemented a new approach in the AML pipeline for calculating similarities between entities from two distinct ontologies. We used several OAEI tracks, such as Anatomy and LargeBio, to apply the new algorithm and to evaluate whether it improves AML's results against a reference alignment. The new approach uses five different sets of pre-trained word embeddings: BioWordVec Extrinsic, BioWordVec Intrinsic, PubMed+PC, PubMed+PC+Wikipedia and English Wikipedia. These pre-trained word embeddings use a machine learning technique, Word2Vec, and were chosen because each vector carries the semantic meaning of the word it represents. With word embeddings, each concept of each ontology was represented by a corresponding vector, to see whether that information could improve how relations between concepts are determined in the AML system. The similarity between concepts was calculated through the cosine distance, and the new alignment was evaluated with precision, recall and F-measure. Although we could not show that word embeddings improve AML's current results, the implementation could be refined, and the technique remains an option to consider in future work if applied in some other way.	pt_PT
dc.identifier.tid	203205685	pt_PT
dc.identifier.uri	http://hdl.handle.net/10451/53906
dc.language.iso	eng	pt_PT
dc.subject	Word Embeddings	pt_PT
dc.subject	Ontology Matching	pt_PT
dc.subject	Biomedical Ontologies	pt_PT
dc.subject	Master's theses - 2022	pt_PT
dc.title	Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching	pt_PT
dc.type	master thesis
dspace.entity.type	Publication
rcaap.rights	openAccess	pt_PT
rcaap.type	masterThesis	pt_PT
thesis.degree.name	Master's thesis in Data Science	pt_PT
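
The abstract mentions two computations: a cosine-based similarity between concept vectors and an evaluation of the resulting alignment with precision, recall and F-measure. The following is a minimal sketch of both, not the dissertation's or AML's actual implementation. The function names, the toy vectors and the representation of an alignment as a set of (source, target) pairs are assumptions made for illustration. Cosine distance is taken here as 1 minus cosine similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def evaluate_alignment(predicted, reference):
    """Return (precision, recall, F-measure) for two sets of (source, target) pairs."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy vectors standing in for concept embeddings (made-up values).
    source_concept = [0.2, 0.7, 0.1]
    target_concept = [0.25, 0.65, 0.05]
    print("similarity:", cosine_similarity(source_concept, target_concept))

    # Toy alignments with hypothetical concept identifiers.
    predicted = {("src:heart", "tgt:heart"), ("src:lung", "tgt:liver")}
    reference = {("src:heart", "tgt:heart"), ("src:lung", "tgt:lung")}
    print("precision, recall, F-measure:", evaluate_alignment(predicted, reference))
```

In a setup like the one the abstract describes, a candidate mapping would typically be kept only when its similarity exceeds some threshold; the abstract does not state the threshold used, so none is assumed here.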

Files

Main

Name: TM_Sofia_Amorim.pdf
Size: 483.1 KB
Format: Adobe Portable Document Format

Collections

FC-DI - Master Thesis (dissertation)