Research Project

Untitled

Publications

SPECTRE: Standardised Global Spatial Data on Terrestrial SPecies and ECosystems ThREats
Publication . Branco, Vasco V.; Capinha, César; Rocha, Jorge; Correia, Luís; Cardoso, Pedro
Motivation: SPECTRE is an open-source database containing standardised spatial data on global environmental and anthropogenic variables that are potential threats to terrestrial species and ecosystems. Its goal is to allow users to swiftly access spatial data on multiple threats at a resolution of 30 arc-seconds for all terrestrial areas. Following the standard set by WorldClim, these data allow full comparability and ease of use under common statistical frameworks for global change studies, species distribution modelling, threat assessments, quantification of ecosystem services and disturbance, among multiple other uses. A web user interface, a persistent online repository and an accompanying R package with functions for downloading and manipulating data are provided.
Main Types of Variable Contained: SPECTRE is a GIS product, currently with 21 geoTiff raster layers with an approximate 1 × 1 km resolution.
Spatial Location and Grain: Global (longitude −180–180, latitude −60–90) terrestrial database with a resolution of 30 arc-seconds (approximately 1 × 1 km at the equator), converted from global sources of different original spatial grain, from 0.03 × 0.03 to 10 × 10 km.
Time Period and Grain: The known time period for all sources present in SPECTRE varies from 1976 to 2020 (all but three after 1990), with a minimum temporal grain of 1 year.
Major Taxa and Level of Measurement: Non-taxa-specific.
Software Format: geoTiff and R.
Triclustering three-way temporal and heterogeneous data
Publication . Soares, Diogo F.; Madeira, Sara; Henriques, Rui
Triclustering, targeting the discovery of coherent subspaces within three-way data, is becoming increasingly relevant in data science, especially for pattern discovery and knowledge acquisition from complex datasets in the biomedical field. This technique can reveal hidden patterns such as putative regulatory modules, disease progression profiles, and individuals with coherent behaviors. When applied to labeled data, triclustering aids in class differentiation and supports real-world decision-making. However, learning from three-way (3W) biomedical data is typically challenged by its rich temporal and heterogeneous nature, with mixed-type features and differing structural compositions. In response to these challenges, this thesis establishes the foundations for pattern-centric 3W data analysis, focusing on triclustering for temporal and heterogeneous three-way data, targeting both descriptive and predictive tasks. In this context, this thesis includes six major contributions. It provides a literature review and comparative study of current triclustering algorithms for temporal data, highlighting the strengths and weaknesses of existing methods. It presents new tools to support the development and assessment of pattern discovery approaches in descriptive and predictive contexts, including a new data generator capable of creating heterogeneous three-way datasets with annotated triclustering solutions and benchmark datasets for comparative evaluation. It proposes a novel approach to capture time-contiguous triclusters, enhancing the search for temporal coherence. It introduces a new triclustering approach able to handle heterogeneous data by applying sequential pattern mining principles to identify relevant patterns and derive triclusters capturing temporal data dynamics. Additionally, it presents a new method for learning pattern-centric predictors. Finally, it proposes an extension and integration of principles for learning from static and temporal data structures.
The developed methods were comprehensively validated in concrete real-world clinical scenarios, showing promising results concerning two progressive diseases. They were used to predict clinically relevant endpoints and identify disease-specific progression patterns, supporting medical decisions and identifying significant patient profiles.
Semantic perspectives for learning over biomedical knowledge graphs
Publication . Sousa, Rita Isabel Torres de; Pesquita, Cátia Luísa Santana Calisto; Silva, Sara Guilherme Oliveira da
Knowledge graphs represent an unparalleled opportunity for machine learning in the biomedical domain, given their ability to enrich data with meaningful context through semantic representations, such as knowledge graph embeddings and semantic similarity. However, the specificity of many biomedical tasks contrasts with the broad domains covered by large and successful biomedical knowledge graphs, which describe entities according to several perspectives (semantic aspects). This is particularly challenging for predicting specific relations between entities described in the knowledge graph when the graph itself does not encode these relations. Current semantic representation methods consider the knowledge graph as a whole, ignoring the different semantic aspects. This thesis hypothesizes that semantic representations that are able to distinguish semantic aspects can improve the performance and explainability of biomedical relation prediction tasks. This work investigated different paradigms for defining semantic aspects based on classes and properties and developed multiple semantic representation techniques for both individual entities and entity pairs, with a focus on their explainability. Extensive experiments in protein-protein interaction and gene-disease association predictions supported the empirical evaluation of the proposed methods and demonstrated that semantic aspect-oriented representations improve both predictive performance and explainability, fostering biomedical research. This work further highlights that in complex and multi-disciplinary domains, where a single knowledge graph is used to support a wide variety of tasks, it is essential to shift from viewing knowledge graphs as a whole to focusing on specific semantic perspectives.
OSINT
Publication . Alves, Fernando Baptista Leal; Bessani, Alysson Neves; Ferreira, Pedro Miguel Frazão Fernandes
Cybersecurity is a topic of growing concern as the number and gravity of cyberattacks are continuously increasing. Receiving the latest updates, patches, and news is crucial to maintaining a high security level for an IT infrastructure. An alternative to purchasing expensive security news feeds is to collect Open Source Intelligence: a wealth of knowledge published daily by users, security companies, researchers, and hackers, among others. In particular, Twitter has become an information hub for obtaining cutting-edge information about many subjects, including cybersecurity. This thesis is focused on the collection and processing of cybersecurity-related tweets. Firstly, we conducted a qualitative and quantitative study about the security data found on Twitter and compared it to databases that publish confirmed vulnerabilities or exploits. Our study shows that Twitter is a relevant cybersecurity source. The remainder of the work is about developing a framework for collecting, processing, and delivering security tweets. Its pipeline comprises text filtering, text feature extraction, a binary classifier, clustering, and Indicator of Compromise generation. We show how to obtain a tweet classifier model following tweet characteristics and machine learning best practices. Our clustering strategy adapts the k-means algorithm to an unknown number of clusters and to clustering and updating over a stream of tweets instead of the classical batch operation. From the clusters we generate Indicators of Compromise, which are structured data formats used in cybersecurity; this step eases the integration of our tool with existing cybersecurity tools. Finally, we showcase one such integration with the Security Information and Event Management system of a nation-wide electrical utility company.

Funders

Funding agency

Fundação para a Ciência e a Tecnologia

Funding programme

Concurso de avaliação no âmbito do Programa Plurianual de Financiamento de Unidades de I&D (2017/2018) - Financiamento Programático

Funding Award Number

UIDP/00408/2020
