Sequential recommender systems for chemical data

Afonso, Cláudia Filipa Martins

http://hdl.handle.net/10400.5/116634

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
TM_Claudia_Afonso.pdf		9.29 MB	Adobe PDF

Authors

Afonso, Cláudia Filipa Martins

Abstract

Recommender systems are fundamental tools that leverage machine learning algorithms to provide personalized suggestions of items based on past user behaviour. Despite their widespread use across various business domains, such systems are still rarely employed in scientific fields. This limited adoption is mainly driven by two key aspects, namely the scarcity of available datasets with information about researchers’ interests in scientific items and the sequential nature of these interactions, which is essential to capture the evolution of preferences over time and thus provide more precise suggestions. To tackle the first issue, prior work established the LIBRETTI methodology to generate datasets of implicit feedback based on research literature. One of the scientific areas this methodology addressed was Chemistry, with the creation of the respective dataset using the Chemical Entities of Biological Interest (ChEBI) database and ontology as a source of items and article identifiers associated with chemical compounds. Regarding the second issue, existing literature proposed the sequential enrichment (SeEn) approach, which consists of adding the n most similar items after each original one in a sequence before passing it as input to a sequential recommender algorithm. This work focuses on evaluating whether the recommendation task can be improved by targeting the LIBRETTI-generated chemical dataset for the specific biomedical purpose of drug discovery, as well as validating the SeEn approach using a structural similarity strategy based on fingerprints. The results showed that the performance of the SASRec sequential recommender decreased when using the LIBRETTI-generated chemical dataset filtered on data from the DrugBank repository and enriched with the most similar compound in the training set. 
Additionally, the performance of the SASRec sequential recommender using the SeEn-enriched datasets only increased when the validation position was also occupied by a similar rather than a real compound of the sequence.
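
The enrichment step described in the abstract (adding the n most similar items after each original item of a sequence) can be illustrated with a minimal sketch. The abstract only states that similarity is structural and fingerprint-based, so everything else here is an assumption made for illustration: fingerprints are represented as sets of on-bit indices, similarity is the Tanimoto coefficient, and the names `tanimoto`, `most_similar`, `enrich_sequence` and `toy_fps` are hypothetical. It is not the thesis implementation.

```python
from typing import Dict, FrozenSet, List

# Assumption: a binary fingerprint is stored as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar(item: str, fps: Dict[str, Fingerprint], n: int) -> List[str]:
    """Return the n catalogue items most similar to `item`, excluding itself."""
    candidates = [other for other in fps if other != item]
    # Descending similarity; ties are broken by identifier for determinism.
    candidates.sort(key=lambda other: (-tanimoto(fps[item], fps[other]), other))
    return candidates[:n]


def enrich_sequence(seq: List[str], fps: Dict[str, Fingerprint], n: int = 1) -> List[str]:
    """Insert the n most similar items after each original item of the sequence."""
    enriched: List[str] = []
    for item in seq:
        enriched.append(item)
        enriched.extend(most_similar(item, fps, n))
    return enriched


# Toy fingerprints, made up for illustration (not real ChEBI or DrugBank data).
toy_fps = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({1, 2, 4}),
    "C": frozenset({7, 8}),
    "D": frozenset({7, 8, 9}),
}
print(enrich_sequence(["A", "C"], toy_fps, n=1))  # ['A', 'B', 'C', 'D']
```

In this sketch the enriched sequence, rather than the original one, would then be passed to a sequential recommender such as SASRec. With n = 1 the sequence length doubles, so the position held out for validation may end up occupied by an inserted similar compound instead of a real one, which is the situation the second result in the abstract refers to.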

Description

Master's thesis, Data Science, 2025, Universidade de Lisboa, Faculdade de Ciências

Keywords

chemical recommender system; sequential recommendation; fingerprint-based molecular similarity; sequential enrichment approach

URI

http://hdl.handle.net/10400.5/116634

