Hierarchical Clustering Based on a Two-Type Branching Process Model: A Simulation Study

Varmann, Lauri

2022, Master's dissertation

datacite.subject.fos	Departamento de Estatística e Investigação Operacional	pt_PT
dc.contributor.advisor	Nunes, Maria Helena Mouriño Silva, 1969-
dc.contributor.author	Varmann, Lauri
dc.date.accessioned	2022-08-31T14:29:15Z
dc.date.available	2024-05-30T00:31:07Z
dc.date.issued	2022
dc.date.submitted	2022
dc.description	Master's thesis, Biostatistics, 2022, Universidade de Lisboa, Faculdade de Ciências	pt_PT
dc.description.abstract	Several recent works have focused on clustering based on infectious disease data (e.g., Maugeri et al., 2020; Mahmoudi et al., 2020; Zarikas et al., 2020). These studies did not use the effective reproduction number in their clustering methods, did not consider different types of infected individuals, and did not thoroughly address the problem of ties in clustering. We develop a method to cluster regions based on type-specific disease prevalence and type-specific reproduction numbers. To combine these two characteristics in one formula, the beginning of an epidemic is modelled by a two-type Galton-Watson branching process. We define the model parameter as the expected total number of infections arising in a finite number of generations from one infected individual whose type is unknown. The model parameter is estimated by nonparametric bootstrap. The empirical bootstrap distributions of the model parameter are then clustered using the supremum distance and the variable-group hierarchical agglomerative single linkage technique. In a simulation study, we examined how closely the clusters obtained from bootstrap sampling distributions resemble the clusters obtained from transformed multinomial distributions used as a reference. Using the scaled version of the transfer distance as the performance measure, we found that the best performance occurred in scenarios where the prevalence was uniformly distributed, the sample size was 500, and two clusters were retained. Problematic ties occurred in approximately 0.5% of the simulations. The results suggest that our method performs well in some circumstances. When a large proportion of countries have low disease prevalence, the number of individuals sampled in each country should be increased. In addition, unless there is an important reason to retain four clusters, three or preferably two clusters should be retained to get better performance.	pt_PT
dc.identifier.tid	203200551	pt_PT
dc.identifier.uri	http://hdl.handle.net/10451/54256
dc.language.iso	eng	pt_PT
dc.subject	Galton-Watson branching process	pt_PT
dc.subject	hierarchical clustering	pt_PT
dc.subject	nonparametric bootstrap	pt_PT
dc.subject	simulation	pt_PT
dc.subject	transfer distance	pt_PT
dc.subject	Master's theses - 2022	pt_PT
dc.title	Hierarchical Clustering Based on a Two-Type Branching Process Model: A Simulation Study	pt_PT
dc.type	master thesis
dspace.entity.type	Publication
rcaap.rights	openAccess	pt_PT
rcaap.type	masterThesis	pt_PT
thesis.degree.name	Master's in Biostatistics	pt_PT
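
The abstract describes a pipeline: a branching-process parameter, its bootstrap distribution per region, supremum distances between those distributions, and single linkage clustering. The following is a minimal sketch of those steps, not the thesis's implementation. Several points are assumptions: the parameter is taken to be p^T (M + M^2 + ... + M^n) 1, where p is the type distribution of the initial individual and M is the 2x2 mean offspring matrix; the supremum distance is taken between empirical distribution functions; and the clustering is ordinary pairwise single linkage rather than the variable-group variant. All function and parameter names are hypothetical.

```python
import random
from bisect import bisect_right


def expected_total_infections(p, M, n):
    """Assumed form of the model parameter: p^T (M + M^2 + ... + M^n) 1.

    p: probabilities that the initial individual is of type 0 or type 1.
    M: 2x2 mean matrix, M[i][j] = expected type-j infections per type-i case.
    n: number of generations counted (the initial individual is excluded).
    """
    total = 0.0
    row = list(p)  # expected number of individuals of each type, generation 0
    for _ in range(n):
        row = [row[0] * M[0][j] + row[1] * M[1][j] for j in range(2)]
        total += sum(row)
    return total


def bootstrap_distribution(sample, statistic, reps, rng: random.Random):
    """Nonparametric bootstrap: resample with replacement, recompute statistic."""
    n = len(sample)
    return [
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(reps)
    ]


def sup_distance(xs, ys):
    """Supremum of |F_x(t) - F_y(t)| for two empirical distribution functions.

    Both functions are right-continuous step functions, so the supremum is
    attained at one of the pooled sample points.
    """
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )


def single_linkage(dist, k):
    """Plain agglomerative single linkage on a symmetric distance matrix.

    Merges the two closest clusters until k clusters remain. Ties are broken
    by index order here; the variable-group technique named in the abstract
    handles ties differently and is not reproduced.
    """
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a] |= merged
    return clusters
```

Under these assumptions, one would compute a bootstrap distribution per region with `bootstrap_distribution`, fill a matrix with `sup_distance` for each pair of regions, and pass it to `single_linkage` with the number of clusters to retain (two to four in the abstract).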

Files

Name: TM_Lauri Varmann.pdf
Size: 901.48 KB
Format: Adobe Portable Document Format

Collections

FC - Dissertações de Mestrado