Research project

A DATA MINING APPROACH TO STUDY DISEASE PRESENTATION AND PROGRESSION PATTERNS IN PPA AND MCI.

Authors

Publications

Targeting the uncertainty of predictions at patient-level using an ensemble of classifiers coupled with calibration methods, Venn-ABERS, and Conformal Predictors: a case study in AD
Publication . Pereira, Telma; Cardoso, Sandra; Guerreiro, Manuela; De Mendonça, Alexandre; Madeira, Sara C.
Despite being able to make accurate predictions, most existing prognostic models lack a proper indication about the uncertainty of each prediction, that is, the risk of prediction error for individual patients. This hampers their translation to primary care settings through decision support systems. To address this problem, we studied different methods for transforming classifiers into probabilistic/confidence-based predictors (here called uncertainty methods), where predictions are complemented with probability estimates/confidence regions reflecting their uncertainty (uncertainty estimates). We tested several uncertainty methods: two well-known calibration methods (Platt Scaling and Isotonic Regression), Conformal Predictors, and Venn-ABERS predictors. We evaluated whether these methods produce valid predictions, where uncertainty estimates reflect the ground truth probabilities. Furthermore, we assessed the proportion of valid predictions made at high-certainty thresholds (predictions with uncertainty measures above a given threshold) since this impacts their usefulness in clinical decisions. Finally, we proposed an ensemble-based approach where predictions from multiple pairs of (classifier, uncertainty method) are combined to predict whether a given MCI patient will convert to AD. This ensemble should putatively provide predictions for a larger number of patients while releasing users from deciding which pair of (classifier, uncertainty method) is more appropriate for data under study. The analysis was performed with a Portuguese cohort (CCC) of around 400 patients and validated in the publicly available ADNI cohort. Despite our focus on MCI to AD prognosis, the proposed approach can be applied to other diseases and prognostic problems.
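As a rough illustration of how a Conformal Predictor attaches a confidence region to a single patient's prediction, the following is a minimal sketch of an inductive (split) conformal predictor for a binary outcome. It is not the authors' implementation: the function names, the nonconformity score (1 minus the classifier's estimated probability of a label), and the calibration scores are all assumptions chosen for illustration.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of scores (calibration plus the test one) at least as nonconforming."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_region(prob_positive, calibration_scores, epsilon):
    """Return the labels whose p-value exceeds the significance level epsilon.

    Assumption: nonconformity of a label = 1 - estimated probability of that label.
    """
    region = set()
    for label, prob in ((1, prob_positive), (0, 1.0 - prob_positive)):
        score = 1.0 - prob
        if conformal_p_value(calibration_scores, score) > epsilon:
            region.add(label)
    return region


# Hypothetical nonconformity scores from a held-out calibration set,
# each computed with the true label of the calibration example.
calibration = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70]

confident = prediction_region(0.92, calibration, epsilon=0.2)  # single label
uncertain = prediction_region(0.50, calibration, epsilon=0.2)  # both labels
print(confident, uncertain)
```

A region with a single label corresponds to a prediction made at the chosen certainty level; a region with both labels signals that the classifier cannot commit for that patient at that level.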
Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer’s disease: a feature selection ensemble combining stability and predictability
Publication . Pereira, Telma; Ferreira, Francisco L.; Cardoso, Sandra; Silva, Dina; De Mendonça, Alexandre; Guerreiro, Manuela; Madeira, Sara C.
Background: Predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is a major open issue in AD-related research. Neuropsychological assessment has proven useful in identifying MCI patients who are likely to convert to dementia. However, the large battery of neuropsychological tests (NPTs) performed in clinical practice and the limited number of training examples are a challenge to machine learning when learning prognostic models. In this context, it is paramount to pursue approaches that effectively seek reduced sets of relevant features. Subsets of NPTs from which prognostic models can be learnt should not only be good predictors, but also stable, promoting generalizable and explainable models. Methods: We propose a feature selection (FS) ensemble combining stability and predictability to choose the most relevant NPTs for prognostic prediction in AD. First, we combine the outcome of multiple (filter and embedded) FS methods. Then, we use a wrapper-based approach optimizing both stability and predictability to compute the number of selected features. We use two large prospective studies (ADNI and the Portuguese Cognitive Complaints Cohort, CCC) to evaluate the approach and assess the predictive value of a large number of NPTs. Results: The best subsets of features include approximately 30 and 20 (from the original 79 and 40) features, for ADNI and CCC data, respectively, yielding stability above 0.89 and 0.95, and AUC above 0.87 and 0.82. Most NPTs learnt using the proposed feature selection ensemble have been identified in the literature as strong predictors of conversion from MCI to AD. Conclusions: The FS ensemble approach was able to 1) identify subsets of stable and relevant predictors from a consensus of multiple FS methods using baseline NPTs and 2) learn reliable prognostic models of conversion from MCI to AD using these subsets of features. The machine learning models learnt from these features outperformed the models trained without FS and achieved competitive results when compared to commonly used FS algorithms. Furthermore, the selected features are derived from a consensus of methods and are thus more robust, while releasing users from choosing the most appropriate FS method for their classification task.
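The two ingredients described above, a consensus over several FS methods and a stability score for the selected subsets, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which aggregation rule or stability measure was used, so mean-rank aggregation and average pairwise Jaccard similarity are stand-ins, and the feature names and rankings are invented.

```python
from itertools import combinations


def aggregate_rankings(rankings):
    """Consensus ranking by mean rank position (assumed aggregation rule).

    rankings: list of rankings over the same features, best feature first.
    """
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))


def jaccard(a, b):
    """Jaccard similarity between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def stability(subsets):
    """Average pairwise Jaccard similarity (assumed stability measure)."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings produced by three different FS methods.
rankings = [
    ["memory", "fluency", "attention", "naming"],
    ["fluency", "memory", "naming", "attention"],
    ["memory", "attention", "fluency", "naming"],
]
consensus = aggregate_rankings(rankings)
top_two = consensus[:2]

# Hypothetical subsets selected on three resamples of the training data.
resampled_subsets = [
    {"memory", "fluency"},
    {"memory", "fluency"},
    {"memory", "attention"},
]
stability_score = stability(resampled_subsets)
print(top_two, round(stability_score, 3))
```

In a wrapper-style search, a score of this kind would be computed for each candidate subset size alongside a predictive metric such as AUC, and the size that best balances both would be kept.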
Classification of primary progressive aphasia: do unsupervised data mining methods support a logopenic variant?
Publication . Maruta, Carolina; Pereira, Telma; Madeira, Sara C.; De Mendonça, Alexandre; Guerreiro, Manuela
Our objective was to test whether data mining techniques, through an unsupervised learning approach, support the three-group diagnostic model of primary progressive aphasia (PPA) versus the existence of two main/classic groups. A series of 155 PPA patients observed in a clinical setting and subjected to at least one neuropsychological/language assessment was studied. Several demographic, clinical and neuropsychological attributes, grouped in distinct sets, were used as input to unsupervised learning methods (Expectation Maximization, K-Means, X-Means, Hierarchical Clustering and Consensus Clustering). The unsupervised learning methods consistently revealed two main groups across all the analyses (with different algorithms and different sets of attributes). One group included most of the agrammatic/non-fluent and some logopenic cases, while the other was mainly composed of semantic and logopenic cases. Clustering the patients into a larger number of groups (k > 2) revealed some clusters composed mostly of non-fluent or of semantic cases. However, we found no group chiefly composed of logopenic cases. In conclusion, unsupervised data mining approaches do not support a clear distinction of logopenic PPA as a separate variant.
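One way to check whether any cluster is "chiefly composed" of a given variant is to cross-tabulate cluster assignments against the clinical labels and look at the dominant variant in each cluster. The sketch below assumes the cluster assignments have already been produced by some clustering algorithm; the function names and the toy data are hypothetical and do not reproduce the study's results.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_ids, variants):
    """Count how many patients of each clinical variant fall in each cluster."""
    counts = defaultdict(Counter)
    for cluster, variant in zip(cluster_ids, variants):
        counts[cluster][variant] += 1
    return counts


def dominant_variant(counts):
    """For each cluster, return the most frequent variant and its share."""
    result = {}
    for cluster, counter in counts.items():
        variant, n = counter.most_common(1)[0]
        result[cluster] = (variant, n / sum(counter.values()))
    return result


# Toy data (invented): cluster assignment and clinical label per patient.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 1]
variants = [
    "nonfluent", "nonfluent", "nonfluent", "logopenic",
    "semantic", "semantic", "semantic", "logopenic", "logopenic",
]

dominant = dominant_variant(cluster_composition(cluster_ids, variants))
for cluster, (variant, share) in sorted(dominant.items()):
    print(f"cluster {cluster}: mostly {variant} ({share:.0%})")
```

In this toy example the logopenic cases are split between the two clusters and dominate neither, which is the kind of pattern the abstract describes.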

Organizational units

Description

Keywords

Contributors

Funders

Funding entity

Fundação para a Ciência e a Tecnologia

Funding programme

OE

Award number

SFRH/BD/95846/2013

ID