| Name | Description | Size | Format |
|---|---|---|---|
| | | 901.48 KB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Several recent works focus on clustering based on infectious disease
data (e.g., Maugeri et al., 2020; Mahmoudi et al., 2020; Zarikas et al., 2020). These studies did not use
the effective reproduction number in their clustering methods, did not consider different types of infected
individuals, and did not thoroughly address the problem of ties in clustering.
We develop a method to cluster regions based on the type-specific prevalence and type-specific reproduction numbers of an infectious disease. To incorporate these two characteristics into one formula, the beginning
of an epidemic is modelled by a two-type Galton-Watson branching process. We define the
model parameter as the expected total number of infections arising in a finite number of generations
from one infected individual whose type is unknown. The nonparametric bootstrap is used to estimate
the model parameter. The empirical bootstrap distributions of the model parameter are then clustered using
the supremum distance and the variable-group hierarchical agglomerative single linkage clustering technique.
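As an illustration only, the following minimal Python sketch shows one way the model parameter and the supremum distance could be computed. It is not the thesis implementation. The names `expected_total_infections`, `sup_distance`, `mean_matrix` and `type_probs` are hypothetical, and the sketch assumes that (a) entry `mean_matrix[i][j]` is the mean number of type-j infections caused by one type-i individual, (b) the unknown type of the initial individual is drawn with probabilities `type_probs`, (c) the initial individual is not counted in the total, and (d) the supremum distance is taken between empirical cumulative distribution functions. The variable-group clustering step is not shown.

```python
from bisect import bisect_right


def mat_vec(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def expected_total_infections(mean_matrix, type_probs, generations):
    """Expected number of infections in generations 1..generations.

    Assumptions (not taken from the thesis): mean_matrix[i][j] is the mean
    number of type-j infections caused by one type-i individual,
    type_probs[i] is the probability that the initial individual has type i,
    and the initial individual itself is not counted.
    """
    current = [1.0] * len(mean_matrix)   # M^0 * 1
    totals = [0.0] * len(mean_matrix)    # running sum of M^k * 1
    for _ in range(generations):
        # Entry i is the expected size of the next generation
        # when the initial individual has type i.
        current = mat_vec(mean_matrix, current)
        totals = [t + c for t, c in zip(totals, current)]
    # Average over the unknown type of the initial individual.
    return sum(p * t for p, t in zip(type_probs, totals))


def sup_distance(sample_a, sample_b):
    """Supremum distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both CDFs are step functions, so the supremum is attained
    # at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
```

Under these assumptions, `sup_distance` could be applied to each pair of regional bootstrap samples to build the dissimilarity matrix that is passed to a hierarchical clustering routine.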
In a simulation study, we examined how well the clusters obtained from bootstrap sampling distributions resemble the clusters obtained by using transformed multinomial distributions as a reference. Using
the scaled version of the transfer distance as the performance measure, we found that the best performance
was observed in scenarios where the prevalence was uniformly distributed, the sample size was 500, and
two clusters were retained. Problematic ties occurred in approximately 0.5% of the simulations.
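For readers unfamiliar with the transfer distance, the sketch below computes the unscaled version: the minimum number of items that must be moved between clusters to turn one partition into another. It is a hedged illustration, not the thesis code. The function name `transfer_distance` is hypothetical, the scaling used in the thesis is not specified in this abstract and is therefore not applied, and the brute-force search over cluster matchings is an assumption that is only practical for a handful of clusters.

```python
from itertools import permutations


def transfer_distance(labels_a, labels_b):
    """Unscaled transfer distance between two partitions of the same items.

    labels_a[i] and labels_b[i] are the cluster labels of item i in each
    partition. The distance equals the number of items minus the largest
    total overlap achievable by matching clusters one-to-one.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    clusters_a = list(dict.fromkeys(labels_a))   # distinct labels, in order
    clusters_b = list(dict.fromkeys(labels_b))
    k = max(len(clusters_a), len(clusters_b))
    index_a = {c: i for i, c in enumerate(clusters_a)}
    index_b = {c: j for j, c in enumerate(clusters_b)}
    # overlap[i][j] counts the items in cluster i of A and cluster j of B;
    # the matrix is padded with empty clusters so that it is square.
    overlap = [[0] * k for _ in range(k)]
    for la, lb in zip(labels_a, labels_b):
        overlap[index_a[la]][index_b[lb]] += 1
    # Brute-force search over all one-to-one matchings of clusters
    # (assumed acceptable because only a few clusters are retained).
    best = max(
        sum(overlap[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return len(labels_a) - best
```

A distance of zero means that the two partitions are identical up to a relabelling of the clusters.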
The results suggest that our method performs well in some circumstances. When a large
proportion of countries have low disease prevalence, the number of individuals sampled in each country
should be increased. In addition, unless there is an important reason to retain four clusters,
three or preferably two clusters should be retained to get better performance.
Description
Master's thesis, Biostatistics, 2022, Universidade de Lisboa, Faculdade de Ciências
Keywords
Galton-Watson branching process; hierarchical clustering; nonparametric bootstrap; simulation; transfer distance; Master's theses - 2022
