| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.58 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Advances in DNA sequencing technologies, particularly next-generation sequencing (NGS), have significantly increased the speed and scale of genomic data production. However, the vast amounts of data generated pose considerable privacy risks if not properly safeguarded. Genomes are unique, remain largely stable throughout life, and reveal sensitive information not only about individuals but also about their relatives. Genomic data therefore require strong protection, yet any protection measure must also preserve the high performance that sequencing workflows demand. This thesis systematically maps privacy-sensitive regions of the human genome and derives location-based quantitative metrics to support selective, post-alignment protection strategies, strengthening privacy while helping preserve workflow efficiency. By analyzing genomic elements that have been exploited in documented privacy attacks, namely Tandem Repeats (TRs), Disease-related Genes (DGs), and Genomic Variants (GVs), and relating them to the cytobands in which they lie, we construct density maps that highlight regions of higher sensitivity. These maps are then combined with re-identification and attribute-disclosure attack scenarios to derive privacy sensitivity values for different genomic regions. The resulting maps and metrics provide a fine-grained view of genomic privacy risk, allowing selective protection measures to be applied only where they are most needed. Empirically, we find that TR densities cluster at centromeres (≈ 70–100% vs. ≈ 1–10% elsewhere), that a Y-STR surname-inference map isolates five chrY cytobands (especially q11.21) as hotspots, and that an Alzheimer’s membership-inference map peaks at a few cytobands on chromosomes 6 and 19. This approach paves the way for integrating privacy sensitivity mapping into privacy-aware genomic workflows, strengthening privacy protection without compromising sequencing efficiency or limiting data sharing.
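The abstract does not state exactly how densities and sensitivity values are computed, so the following is only a minimal Python sketch under stated assumptions. It assumes that a cytoband's density for one element type (e.g., TRs) is the fraction of the band's bases covered by at least one element of that type, with BED-style 0-based, half-open coordinates, and that a scenario's sensitivity value is a weighted sum of those per-type densities. The toy coordinates, the helper names `merge_intervals`, `band_density`, and `sensitivity`, and the scenario weights are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical toy cytobands: (chrom, start, end, band), 0-based half-open like BED.
CYTOBANDS = [
    ("chrY", 0, 1000, "q11.21"),
    ("chrY", 1000, 3000, "q11.22"),
]


def merge_intervals(intervals):
    """Merge overlapping or adjacent [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def band_density(cytobands, elements):
    """Assumed density: fraction of each band's bases covered by any element."""
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))
    # Merging first avoids double-counting bases covered by overlapping elements.
    merged = {chrom: merge_intervals(iv) for chrom, iv in by_chrom.items()}
    density = {}
    for chrom, b_start, b_end, band in cytobands:
        covered = 0
        for s, e in merged.get(chrom, []):
            covered += max(0, min(e, b_end) - max(s, b_start))
        density[(chrom, band)] = covered / (b_end - b_start)
    return density


def sensitivity(densities_by_type, weights):
    """Hypothetical scenario score: weighted sum of per-type band densities."""
    bands = set().union(*(d.keys() for d in densities_by_type.values()))
    return {
        band: sum(weights.get(t, 0.0) * d.get(band, 0.0)
                  for t, d in densities_by_type.items())
        for band in bands
    }


# Toy usage: hypothetical TR and DG intervals and hypothetical scenario weights.
tr_density = band_density(CYTOBANDS, [("chrY", 100, 600), ("chrY", 500, 900),
                                      ("chrY", 1500, 1700)])
dg_density = band_density(CYTOBANDS, [("chrY", 2000, 3000)])
scores = sensitivity({"TR": tr_density, "DG": dg_density},
                     weights={"TR": 0.9, "DG": 0.1})

for band in sorted(scores):
    print(band, round(scores[band], 3))
```

In this toy run the two overlapping TRs merge into one 800 bp interval, so q11.21 gets a TR density of 0.8 and dominates its score, mirroring in miniature how a repeat-driven scenario would single out TR-rich bands. Weighting per attack scenario is one plausible way to turn several density maps into a single value per band; the thesis may combine them differently.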
Description
Master's Thesis, Informatics, 2025, Universidade de Lisboa, Faculdade de Ciências
Keywords
Genomic data privacy; Genomics; Genomic data; Privacy-sensitivity
