Research Project
Untitled
Publications
Learning prognostic models using a mixture of biclustering and triclustering: predicting the need for non-invasive ventilation in amyotrophic lateral sclerosis
Publication . Soares, Diogo F.; Henriques, Rui; Gromicho, Marta; Carvalho, Mamede; Madeira, Sara C.
Longitudinal cohort studies of disease progression generally combine temporal features produced by periodic assessments (clinical follow-up) with static features associated with single-time assessments and with genetic, psychophysiological, and demographic profiles. Subspace clustering, including biclustering and triclustering approaches, enables the discovery of local and discriminative patterns from such multidimensional cohort data. These highly interpretable patterns are relevant for identifying groups of patients with similar traits or progression patterns. Despite their potential, their use for improving predictive tasks in clinical domains remains unexplored. In this work, we propose to learn predictive models from static and temporal data using discriminative patterns, obtained via biclustering and triclustering, as features within a state-of-the-art classifier, thus enhancing model interpretation. We extend triCluster to find time-contiguous triclusters in temporal data (temporal patterns) and use a biclustering algorithm to discover coherent patterns in static data. The transformed data space, composed of bicluster and tricluster features, captures local and cross-variable associations with discriminative power, yielding unique statistical properties of interest. As a case study, we applied our methodology to follow-up data from Portuguese patients with Amyotrophic Lateral Sclerosis (ALS) to predict the need for non-invasive ventilation (NIV) since the last appointment. The results showed that, in general, our methodology outperformed baselines that use the original features. Furthermore, the bicluster/tricluster-based patterns used by the classifier help clinicians understand the models by highlighting relevant prognostic patterns.
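The idea of using discovered patterns as classifier features can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pattern_features`, the toy values, and the choice of mean absolute deviation as the feature value are all assumptions made for illustration, and the biclusters are given by hand rather than mined by a biclustering algorithm.

```python
# Illustrative sketch only: a hypothetical pattern-based feature transformation.
# Each bicluster is assumed to be (column indices, expected values on those columns).

def pattern_features(rows, biclusters):
    """Return one feature per bicluster for every row: the mean absolute
    deviation between the row and the bicluster pattern on its columns.
    Smaller values mean the row matches the pattern more closely."""
    features = []
    for row in rows:
        features.append([
            sum(abs(row[c] - p) for c, p in zip(cols, pattern)) / len(cols)
            for cols, pattern in biclusters
        ])
    return features

# Toy static data: 4 patients x 3 features (hypothetical values).
rows = [
    [1.0, 2.0, 0.0],
    [1.0, 2.0, 5.0],
    [4.0, 0.0, 0.0],
    [3.0, 2.0, 1.0],
]

# Two hand-written biclusters (hypothetical): a pattern on columns 0-1
# and a pattern on column 2.
biclusters = [
    ([0, 1], [1.0, 2.0]),
    ([2], [0.0]),
]

transformed = pattern_features(rows, biclusters)
for original, new in zip(rows, transformed):
    print(original, "->", new)
```

The transformed matrix has one column per pattern, so it could be passed to any standard classifier in place of, or alongside, the original features. Tricluster features could be built the same way by adding a time dimension to the pattern.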
Triclustering three-way temporal and heterogeneous data
Publication . Soares, Diogo F.; Madeira, Sara; Henriques, Rui
Triclustering, targeting the discovery of coherent subspaces within three-way data, is becoming increasingly relevant in data science, especially for pattern discovery and knowledge acquisition from complex datasets in the biomedical field. This technique can reveal hidden patterns such as putative regulatory modules, disease progression profiles, and individuals with coherent behaviors. When applied to labeled data, triclustering aids in class differentiation and supports real-world decision-making. However, learning from three-way (3W) biomedical data is typically challenged by its rich temporal and heterogeneous nature, with mixed-type features and different structure compositions. In response to these challenges, this thesis establishes the foundations for pattern-centric 3W data analysis, focusing on triclustering for temporal and heterogeneous three-way data, targeting both descriptive and predictive tasks. In this context, this thesis includes six major contributions. It provides a literature review and comparative study of current triclustering algorithms for temporal data, highlighting the strengths and weaknesses of existing methods. It presents new tools to support the development and assessment of pattern discovery approaches in descriptive and predictive contexts, including a new data generator capable of creating heterogeneous three-way datasets with annotated triclustering solutions and benchmark datasets for comparative evaluation. It proposes a novel approach to capture time-contiguous triclusters, enhancing the search for temporal coherence. It introduces a new triclustering approach able to handle heterogeneous data by applying sequential pattern mining principles to identify relevant patterns and derive triclusters capturing temporal data dynamics. Additionally, it presents a new method for learning pattern-centric predictors. Finally, it proposes an extension and integration of principles for learning from static and temporal data structures. The developed methods were comprehensively validated in concrete real-world clinical scenarios, showing promising results concerning two progressive diseases. They were used to predict clinically relevant endpoints and identify disease-specific progression patterns, supporting medical decisions and identifying significant patient profiles.
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
3599-PPCDT
Funding Award Number
PTDC/CCI-CIF/4613/2020