Publication

Bioinformatics toolbox for comparative clustering evaluation of Whole-Genome Sequencing (WGS) pipelines for bacteria routine surveillance

dc.contributor.author: Pereira, Joana Vanessa Gomes
dc.contributor.institution: Faculty of Sciences
dc.contributor.supervisor: Mixão, Verónica de Pinho
dc.contributor.supervisor: Couto, Francisco José Moreira
dc.date.accessioned: 2026-02-24T15:20:02Z
dc.date.available: 2026-02-24T15:20:02Z
dc.date.issued: 2025
dc.description: Master's thesis, Bioinformatics and Computational Biology, 2025, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract: Whole-Genome Sequencing (WGS) provides higher resolution than traditional typing to distinguish closely related isolates. As a result, disease surveillance increasingly adopts WGS, with international agencies recommending its use in reference laboratories. However, the heterogeneity of workflows and unequal resources raise concerns about inter-laboratory result comparability and, consequently, data sharing and communication. To address these issues, this thesis project developed EvalTree, a Python-based command-line tool to compare clustering results from two typing solutions, including traditional and genome-scale approaches, assessing their congruence at all possible resolution levels. EvalTree accepts two input folders or clustering files, processes them, and produces multiple outputs, including a user-friendly HTML report. When a folder generated by ReporTree, a tool to identify genetic clusters at all possible distance thresholds, is provided as input, EvalTree enables not only the inter-pipeline clustering comparison, but also detection of stable clustering regions, cluster characterization using metadata, and assessment of outbreak signal overlap. EvalTree was validated and benchmarked using a large (2946 isolates) and diverse dataset of Salmonella enterica, showing that it accurately reproduces a recently published large-scale evaluation of inter-pipeline congruence at the European level. Its running time was mainly affected by dataset diversity rather than size. To further demonstrate its applicability, EvalTree supported the implementation of the S. enterica genomic surveillance pipeline at the Portuguese National Institute of Health (INSA), by comparing that pipeline's performance with that of the European Food Safety Authority (EFSA) pipeline, revealing high cluster congruence and similar resolution power.
In summary, EvalTree is a novel bioinformatics tool (available through conda installation) that offers a practical, flexible solution to evaluate cluster congruence between the pipelines of different laboratories, supporting inter-laboratory communication in a One Health framework. It also promotes the long-term sustainability of any pipeline by enabling informed decision-making throughout its life cycle (e.g., evaluating software updates). [en]
dc.format: application/pdf
dc.identifier.tid: 204173167
dc.identifier.uri: http://hdl.handle.net/10400.5/117273
dc.language.iso: eng
dc.subject: EvalTree
dc.subject: Clustering Congruence
dc.subject: Genomic Surveillance
dc.subject: Whole-Genome Sequencing
dc.subject: Outbreaks
dc.title: Bioinformatics toolbox for comparative clustering evaluation of Whole-Genome Sequencing (WGS) pipelines for bacteria routine surveillance [en]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess

Files

Main
Name: TM_Joana_Pereira.pdf
Size: 3.59 MB
Format: Adobe Portable Document Format