Pesquita, Cátia, 1980-
Aveiro, Lina Andreia Gama
2022-07-18
2022-07-18
2022
2021
http://hdl.handle.net/10451/53810
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2022
Classical semantic similarity measures do not consider negative annotations when computing similarity, and the impact that these annotations can have on this data mining technique is not well studied. This work therefore aims to understand how adding negative annotations affects semantic similarity. To do so, two pairwise similarity measures, Best-Match Average and Resnik, were adapted to create the polar measures PolarBMA and PolarResnik. These were evaluated against the original measures in two currently relevant scopes: protein-protein interaction (PPI) prediction and disease prediction. Pairs of proteins known either to interact or not to interact were taken from STRING and enriched with positive and negative annotations from the Gene Ontology. Synthetic patients were created as sets of annotations drawn from the Mendelian diseases they were designed to have, optionally with added noise or imprecise annotations. Semantic similarity was then computed with both polar and non-polar measures between the proteins in each pair, and between patients and candidate diseases, which included the Mendelian diseases as well as random diseases taken from the Human Phenotype Ontology. To evaluate how the polar measures performed relative to the baseline, a ranking by semantic similarity was produced for each measure and scope, and the rank cumulative frequencies were plotted. ROC AUC and precision-recall curves were also determined for PPI prediction, as well as average precision for the disease prediction dataset.
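For orientation, the two baseline (non-polar) measures named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the toy ontology, the term names, and the annotation probabilities are all hypothetical, and the polar variants PolarBMA and PolarResnik are not reproduced here because the abstract does not define them. Resnik similarity is taken in its usual form (the information content of the most informative common ancestor), and Best-Match Average as the mean of the two directional best-match averages.

```python
import math

# Hypothetical toy ontology: term -> set of direct parents (not real GO/HPO terms).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical annotation probabilities p(term), assumed for illustration only.
FREQ = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25, "B1": 0.25}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: log2(1 / p(term))."""
    return math.log2(1.0 / FREQ[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic(t) for t in common)


def bma(terms_a, terms_b):
    """Best-Match Average over two non-empty annotation sets."""
    best_a = [max(resnik(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(resnik(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) / len(best_a) + sum(best_b) / len(best_b)) / 2


# Two hypothetical entities described by sets of ontology terms.
score = bma(["A1", "B1"], ["A2"])
```

In this sketch, candidates would be ranked by descending `bma` score against a query entity, which is the kind of ranking the evaluation described above relies on.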
In PPI prediction, the polar measures performed better in the Molecular Function branch in both experiments where negative annotations were added, and in one of the experiments with the Cellular Component branch. In disease prediction, the polar measures improved performance by approximately ten percent. This improvement was observed in all disease prediction experiments, even with the addition of noise and imprecision. Given these results, this work concludes that negative annotations have an impact on semantic similarity, but the magnitude of this impact requires further study.
eng
Semantic similarity
Biomedical ontology
Negative annotation
Protein-protein interaction prediction
Disease prediction
Master's theses - 2022
To be or NOT to be: The Impact of Negative Annotation in Biomedical Semantic Similarity
master thesis
203217659