Publication

Ensemble outlier detection and gene selection in triple-negative breast cancer data

dc.contributor.author: Lopes, Marta B.
dc.contributor.author: Veríssimo, André
dc.contributor.author: Carrasquinha, Eunice
dc.contributor.author: Casimiro, Sandra
dc.contributor.author: Beerenwinkel, Niko
dc.contributor.author: Vinga, Susana
dc.date.accessioned: 2022-09-23T14:02:59Z
dc.date.available: 2022-09-23T14:02:59Z
dc.date.issued: 2018
dc.description: © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. [pt_PT]
dc.description.abstract: Background: Learning accurate models from 'omics data poses many challenges due to their inherent high dimensionality, e.g. the number of gene expression variables, and comparatively small sample sizes, which lead to ill-posed inverse problems. Furthermore, the presence of outliers, either experimental errors or interesting abnormal clinical cases, may severely hamper the correct classification of patients and the identification of reliable biomarkers for a particular disease. We propose to address this problem through an ensemble classification setting based on distinct feature selection and modeling strategies, including logistic regression with elastic net regularization, Sparse Partial Least Squares - Discriminant Analysis (SPLS-DA) and Sparse Generalized PLS (SGPLS), coupled with an evaluation of the individuals' outlierness based on Cook's distance. The consensus is achieved with the Rank Product statistic corrected for multiple testing, which gives a final list of observations sorted by their outlierness level.
Results: We applied this strategy to the classification of Triple-Negative Breast Cancer (TNBC) RNA-Seq and clinical data from The Cancer Genome Atlas (TCGA). The 24 detected outliers were identified as putative mislabeled samples, corresponding to individuals with discrepant clinical labels for the HER2 receptor, but also individuals with abnormal expression values of ER, PR and HER2 that contradict the corresponding clinical labels, which may invalidate the initial TNBC label. Moreover, the model consensus approach leads to the selection of a set of genes that may be linked to the disease. These results are robust to a resampling approach, either by selecting a subset of patients or a subset of genes, with a significant overlap of the outlier patients identified.
Conclusions: The proposed ensemble outlier detection approach constitutes a robust procedure to identify abnormal cases and consensus covariates, which may improve biomarker selection for precision medicine applications. The method can also be easily extended to other regression models and datasets. [pt_PT]
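The consensus step described in the abstract can be illustrated with a minimal sketch. It assumes each model has already produced one outlierness score per observation (for example a Cook's distance), ranks the observations within each model, and combines the ranks by their geometric mean (the rank product). The function name `rank_product` and the toy scores are hypothetical, and the significance test and multiple-testing correction mentioned in the abstract are omitted; this is not the authors' implementation.

```python
import numpy as np

def rank_product(scores):
    """Combine per-model outlierness scores into a rank product.

    scores: array of shape (n_samples, n_models); larger = more outlying.
    Returns one value per sample; smaller = more consistently outlying.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    # Rank 1 is the most outlying sample within each model
    # (ties are broken by original order, a simplifying assumption).
    order = np.argsort(-scores, axis=0, kind="stable")
    ranks = np.empty_like(order)
    for j in range(k):
        ranks[order[:, j], j] = np.arange(1, n + 1)
    # Geometric mean of the ranks, computed through logarithms.
    return np.exp(np.log(ranks).mean(axis=1))

# Hypothetical scores: 4 samples scored by 3 models.
toy = np.array([[0.9, 0.8, 0.7],
                [0.1, 0.2, 0.1],
                [0.5, 0.1, 0.3],
                [0.2, 0.5, 0.2]])
rp = rank_product(toy)
ranking = np.argsort(rp, kind="stable")  # most outlying sample first
```

Here sample 0 is ranked first by all three models, so its rank product is 1 and it heads the consensus list.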
dc.description.sponsorship: This work was partially supported by the European Union Horizon 2020 research and innovation program under grant agreement No. 633974 (SOUND project), and the Portuguese Foundation for Science & Technology (FCT), through IDMEC, under LAETA, projects UID/EMS/50022/2013 and PERSEIDS (PTDC/EMS-SIS/0642/2014). André Veríssimo acknowledges support from FCT (SFRH/BD/97415/2013). Susana Vinga acknowledges support by Program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH). [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: BMC Bioinformatics. 2018 May 4;19(1):168 [pt_PT]
dc.identifier.doi: 10.1186/s12859-018-2149-7 [pt_PT]
dc.identifier.eissn: 1471-2105
dc.identifier.uri: http://hdl.handle.net/10451/54572
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer Nature [pt_PT]
dc.relation: Statistical multi-Omics UNDerstanding of Patient Samples
dc.relation: Associate Laboratory of Energy, Transports and Aeronautics
dc.relation: PERSEIDS - Personalizing cancer therapy through integrated modeling and decision
dc.relation: Adaptive clinical decision support system for personalized medicine
dc.relation: Integrative computational physiology
dc.relation.publisherversion: https://bmcbioinformatics.biomedcentral.com/ [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Ensemble modeling [pt_PT]
dc.subject: High-dimensionality [pt_PT]
dc.subject: Outlier detection [pt_PT]
dc.subject: Rank product test [pt_PT]
dc.subject: Triple-negative breast cancer [pt_PT]
dc.title: Ensemble outlier detection and gene selection in triple-negative breast cancer data [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.awardNumber: 633974
oaire.awardNumber: UID/EMS/50022/2013
oaire.awardNumber: PTDC/EMS-SIS/0642/2014
oaire.awardNumber: SFRH/BD/97415/2013
oaire.awardNumber: IF/00653/2012/CP0161/CT0002
oaire.awardTitle: Statistical multi-Omics UNDerstanding of Patient Samples
oaire.awardTitle: Associate Laboratory of Energy, Transports and Aeronautics
oaire.awardTitle: PERSEIDS - Personalizing cancer therapy through integrated modeling and decision
oaire.awardTitle: Adaptive clinical decision support system for personalized medicine
oaire.awardTitle: Integrative computational physiology
oaire.awardURI: info:eu-repo/grantAgreement/EC/H2020/633974/EU
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UID%2FEMS%2F50022%2F2013/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/3599-PPCDT/PTDC%2FEMS-SIS%2F0642%2F2014/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/OE/SFRH%2FBD%2F97415%2F2013/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/Investigador FCT/IF%2F00653%2F2012%2FCP0161%2FCT0002/PT
oaire.citation.issue: 1 [pt_PT]
oaire.citation.title: BMC Bioinformatics [pt_PT]
oaire.citation.volume: 19 [pt_PT]
oaire.fundingStream: H2020
oaire.fundingStream: 6817 - DCRRNI ID
oaire.fundingStream: 3599-PPCDT
oaire.fundingStream: OE
oaire.fundingStream: Investigador FCT
person.familyName: B. Lopes
person.familyName: Veríssimo
person.familyName: Ganhão Carrasquinha Trigueirão
person.familyName: Cara de Anjo Casimiro
person.familyName: Vinga
person.givenName: Marta
person.givenName: André
person.givenName: Eunice Isabel
person.givenName: Sandra Cristina
person.givenName: Susana
person.identifier: 1195979
person.identifier: D-3834-2018
person.identifier: B-8450-2008
person.identifier.ciencia-id: FD16-A07F-7B12
person.identifier.ciencia-id: 311E-7D8E-6A69
person.identifier.ciencia-id: 0F12-5181-0B22
person.identifier.ciencia-id: 9713-F74D-4805
person.identifier.orcid: 0000-0002-4135-1857
person.identifier.orcid: 0000-0002-2212-339X
person.identifier.orcid: 0000-0003-3465-4347
person.identifier.orcid: 0000-0002-6917-4477
person.identifier.orcid: 0000-0002-1954-5487
person.identifier.rid: F-5378-2011
person.identifier.rid: H-7446-2017
person.identifier.scopus-author-id: 55489480400
person.identifier.scopus-author-id: 56461291700
person.identifier.scopus-author-id: 14043403400
person.identifier.scopus-author-id: 55893670600
project.funder.identifier: http://doi.org/10.13039/501100008530
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: European Commission
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: 5a511048-f9f4-471c-93d0-5c637df97ac8
relation.isAuthorOfPublication: 3004199b-b55d-41bb-b1e5-f7aefadf4acd
relation.isAuthorOfPublication: f1ddef46-d7dc-4b47-8c95-8c6c8c9cca04
relation.isAuthorOfPublication: fd665de1-a7e6-4638-94c3-ecc645f84607
relation.isAuthorOfPublication: d7e30d0b-702b-4588-8bb1-5dcd3ee6f6e7
relation.isAuthorOfPublication.latestForDiscovery: 3004199b-b55d-41bb-b1e5-f7aefadf4acd
relation.isProjectOfPublication: 9e22a12f-e560-41c4-8be3-43b3b3d03b85
relation.isProjectOfPublication: bb9aa881-57b9-4183-bc41-603fce6552db
relation.isProjectOfPublication: d34c78e2-6886-4f1d-84af-e109874bca55
relation.isProjectOfPublication: 26b45fd9-6acc-441f-995f-c5a2190ec79d
relation.isProjectOfPublication: 5c381b3f-c270-4deb-b7b0-b2e1a621262a
relation.isProjectOfPublication.latestForDiscovery: d34c78e2-6886-4f1d-84af-e109874bca55

Files

Main
Showing 1 - 1 of 1
Name: Ensemble_outlier.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
License
Showing 1 - 1 of 1
Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed upon to submission