Research project
Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa
Publications
Robust identification of target genes and outliers in triple-negative breast cancer data
Publication. Segaert, Pieter; Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J.
Correct classification of breast cancer subtypes is of high importance, as it directly affects the therapeutic options. We focus on triple-negative breast cancer, which has the worst prognosis among breast cancer types. Using cutting-edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly available from The Cancer Genome Atlas data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermore, we illustrate that classical statistical methods may fail to identify outliers because the outliers themselves heavily influence the estimates, which prompts the need for robust statistics. Using robust sparse logistic regression, we obtain 36 relevant genes, of which ca. 60% have been previously reported as biologically relevant to triple-negative breast cancer, reinforcing the validity of the method. The remaining 14 genes identified are new potential biomarkers for triple-negative breast cancer. Of these, JAM3, SFT2D2, and PAPSS1 were previously associated with breast tumors or other types of cancer. The relevance of these genes is confirmed by the new DetectDeviatingCells outlier detection technique. A comparison of gene networks on the selected genes showed significant differences between triple-negative breast cancer and non-triple-negative breast cancer data. The individual role of FOXA1 in triple-negative and non-triple-negative breast cancer, and the strong FOXA1-AGR2 connection in triple-negative breast cancer, stand out. The goal of our paper is to contribute to the understanding and management of breast cancer and triple-negative breast cancer. At the same time, it demonstrates that robust regression and outlier detection constitute key strategies to cope with high-dimensional clinical data such as omics data.
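The masking effect mentioned in this abstract, where outliers distort classical estimates enough to hide themselves, can be seen on a toy example. This is not the authors' pipeline: it swaps in a simple univariate robust z-score (median and MAD) in place of the robust sparse logistic regression and DetectDeviatingCells used in the paper, the expression values are invented for illustration, and the cutoff of 2.5 is an assumed choice.

```python
import statistics

def classical_z(xs):
    """z-scores from the mean and standard deviation (both sensitive to outliers)."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]

def robust_z(xs):
    """z-scores from the median and the MAD scaled by 1.4826 (consistent at the normal)."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    if mad == 0:
        raise ValueError("MAD is zero; robust scale is undefined")
    return [(x - med) / (1.4826 * mad) for x in xs]

def flag(zs, cutoff=2.5):
    """Indices whose absolute z-score exceeds the (assumed) cutoff."""
    return [i for i, z in enumerate(zs) if abs(z) > cutoff]

# Hypothetical expression values for one gene: eight typical samples, two outliers.
values = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 12.0, 12.5]

print(flag(classical_z(values)))  # []
print(flag(robust_z(values)))     # [8, 9]
```

The two large values pull the mean upward and inflate the standard deviation, so neither exceeds the classical cutoff (their classical z-scores are just under 2), while the median and MAD are barely affected and both values are flagged.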
Integrative biomarker discovery in neurodegenerative diseases
Publication. Carreiro, André V.; De Mendonça, Alexandre; Carvalho, Mamede; Madeira, Sara C.
Data mining has been widely applied in biomarker discovery, resulting in significant findings of different clinical and biological biomarkers. With developments in technology, from genomics to proteomics analysis, a deluge of data has become available, as have standardized data repositories. Nonetheless, researchers still face important challenges in analyzing the data, especially given the complexity of the pathways involved in biological processes and diseases. Data from single sources appear unable to explain complex processes, such as those involved in brain-related disorders including Alzheimer's disease, Parkinson's disease and amyotrophic lateral sclerosis, which raises the need for a more comprehensive perspective. A possible solution lies in data and model integration, where several data types are combined to provide complementary views. This in turn can lead to the discovery of previously unknown biomarkers, by unraveling otherwise hidden relationships between data from different sources, and/or to the validation of such composite biomarkers in more powerful predictive models.
Prognostic models based on patient snapshots and time windows: predicting disease progression to assisted ventilation in Amyotrophic Lateral Sclerosis
Publication. Carreiro, André V.; Amaral, Pedro; Pinto, Susana; Tomás, Pedro; Carvalho, Mamede; Madeira, Sara C.
Amyotrophic Lateral Sclerosis (ALS) is a devastating disease and the most common neurodegenerative disorder of young adults. ALS patients present with rapidly progressive motor weakness, which usually leads to death by respiratory failure within a few years. The correct prediction of respiratory insufficiency is thus key for patient management. In this context, we propose an innovative approach for prognostic prediction based on patient snapshots and time windows. We first cluster temporally related tests to obtain snapshots of the patient's condition at a given time (patient snapshots). We then use the snapshots to predict the probability that an ALS patient will require assisted ventilation k days after the clinical evaluation (time window). This probability is based on the patient's current condition, evaluated using clinical features, including functional impairment assessments and a complete set of respiratory tests. The prognostic models include three time windows, enabling short-, medium- and long-term prognosis of progression to assisted ventilation. Experimental results show an area under the receiver operating characteristic curve (AUC) on the test set of approximately 79% for time windows of 90, 180 and 365 days. Creating patient snapshots using hierarchical clustering with constraints outperforms the state of the art, and the proposed prognostic model is the first non-population-based approach for prognostic prediction in ALS. The results are promising and should enhance current clinical practice, which is largely supported by non-standardized tests and clinicians' experience.
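A minimal sketch of how snapshots might be turned into class labels for the three time windows, under stated assumptions. The field names (eval_day, niv_day, last_followup_day) are hypothetical, a snapshot is assumed positive when assisted ventilation starts within k days of the evaluation, and snapshots whose outcome cannot be determined (patient already ventilated, or follow-up shorter than the window with no ventilation observed) are left unlabeled. The constrained hierarchical clustering that builds the snapshots is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snapshot:
    # Hypothetical fields; days are counted from a common per-patient origin.
    patient_id: str
    eval_day: int               # day of the clinical evaluation the snapshot summarizes
    niv_day: Optional[int]      # day assisted ventilation started; None if never observed
    last_followup_day: int      # last day the patient was observed

def label_snapshot(s: Snapshot, k: int) -> Optional[int]:
    """1 = ventilation starts within k days of the evaluation,
    0 = no ventilation within k days, None = label cannot be determined."""
    if s.niv_day is not None:
        delta = s.niv_day - s.eval_day
        if delta <= 0:
            return None  # already ventilated at the evaluation: nothing to predict
        return 1 if delta <= k else 0
    # Never ventilated during follow-up: negative only if follow-up covers the window.
    return 0 if s.last_followup_day - s.eval_day >= k else None

WINDOWS = (90, 180, 365)  # the short-, medium- and long-term windows from the abstract

snapshots = [
    Snapshot("p1", eval_day=0, niv_day=120, last_followup_day=400),
    Snapshot("p2", eval_day=10, niv_day=None, last_followup_day=200),
]
for s in snapshots:
    print(s.patient_id, {k: label_snapshot(s, k) for k in WINDOWS})
```

Unlabeled (None) snapshots would be dropped before training the model for that window, so each of the three windows can end up with a different training set; here p2 contributes a negative example to the 90- and 180-day models but nothing to the 365-day model.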
Funders
Funding entity
Fundação para a Ciência e a Tecnologia
Funding programme
6817 - DCRRNI ID
Grant number
UID/CEC/50021/2013
