| Name | Description | Size | Format |
|---|---|---|---|
| | | 3.59 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Whole-Genome Sequencing (WGS) provides higher resolution than traditional typing methods for distinguishing closely related isolates. As a result, disease surveillance increasingly relies on WGS, and international agencies recommend its use in reference laboratories. However, heterogeneous workflows and unequal resources raise concerns about the comparability of results across laboratories and, consequently, about data sharing and communication. To address these issues, this thesis project developed EvalTree, a Python-based command-line tool that compares the clustering results of two typing solutions, including traditional and genome-scale approaches, and assesses their congruence at all possible resolution levels. EvalTree accepts two input folders or clustering files, processes them, and produces multiple outputs, including a user-friendly HTML report. When a folder generated by ReporTree (a tool that identifies genetic clusters at all possible distance thresholds) is provided as input, EvalTree enables not only inter-pipeline clustering comparison but also the detection of stable clustering regions, cluster characterization using metadata, and assessment of outbreak signal overlap. EvalTree was validated and benchmarked on a large (2946 isolates) and diverse dataset of Salmonella enterica, and it accurately reproduced a recently published large-scale evaluation of inter-pipeline congruence at the European level. Its running time depended mainly on dataset diversity rather than on dataset size. To further demonstrate its applicability, EvalTree supported the implementation of the S. enterica genomic surveillance pipeline at the Portuguese National Institute of Health (INSA) by comparing that pipeline's performance with the pipeline of the European Food Safety Authority (EFSA), revealing high cluster congruence and similar resolving power.
In summary, EvalTree is a novel bioinformatics tool (available for installation via conda) that offers a practical, flexible solution for evaluating cluster congruence between the pipelines of different laboratories, supporting inter-laboratory communication within a One Health framework. It also promotes the long-term sustainability of any pipeline by enabling informed decision-making throughout its lifecycle (e.g., when evaluating software updates).
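To make "congruence at all possible resolution levels" concrete, the sketch below scores the agreement between every pair of distance thresholds from two pipelines. It is not EvalTree's code: the metric (the adjusted Rand index, one common measure of agreement between two partitions), the function names `adjusted_rand_index` and `congruence_matrix`, and the data layout (each pipeline as a dict mapping a threshold to `{isolate: cluster}`) are all hypothetical choices made for illustration, and the tool's actual congruence metrics may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat partitions of the same isolates."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same isolates")
    n = len(labels_a)
    # Pairs of isolates grouped together in both partitions.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions all singletons): treat as identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def congruence_matrix(pipeline_a, pipeline_b):
    """Score every (threshold_a, threshold_b) pair.

    Hypothetical layout: each pipeline maps a distance threshold to
    {isolate_id: cluster_name}, and all partitions cover the same isolates.
    """
    isolates = sorted(next(iter(pipeline_a.values())))
    scores = {}
    for t_a, part_a in pipeline_a.items():
        labels_a = [part_a[i] for i in isolates]
        for t_b, part_b in pipeline_b.items():
            labels_b = [part_b[i] for i in isolates]
            scores[(t_a, t_b)] = adjusted_rand_index(labels_a, labels_b)
    return scores


# Toy example with two hypothetical pipelines and four isolates.
pipeline_1 = {
    0: {"s1": "A", "s2": "B", "s3": "C", "s4": "D"},
    5: {"s1": "A", "s2": "A", "s3": "C", "s4": "C"},
}
pipeline_2 = {
    0: {"s1": "x", "s2": "y", "s3": "z", "s4": "w"},
    10: {"s1": "x", "s2": "x", "s3": "z", "s4": "z"},
}
scores = congruence_matrix(pipeline_1, pipeline_2)
for (t_a, t_b), value in sorted(scores.items()):
    print(f"threshold {t_a} vs {t_b}: {value:.2f}")
```

In this toy input, threshold 5 of the first pipeline and threshold 10 of the second group the isolates identically, so that pair scores 1.0, while pairing a singleton-only partition with a clustered one scores 0.0. Scanning such a matrix for high-scoring threshold pairs is one way to find corresponding resolution levels between two pipelines.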
Description
Master's thesis, Bioinformatics and Computational Biology, 2025, Universidade de Lisboa, Faculdade de Ciências
Keywords
EvalTree; Clustering Congruence; Genomic Surveillance; Whole-Genome Sequencing; Outbreaks
