Publication

COSMIC: A Framework for taking the census of star clusters in the Milky Way

dc.contributor.author: Dias, Ariana Ferreira
dc.contributor.institution: Faculty of Sciences
dc.contributor.institution: Department of Informatics
dc.contributor.supervisor: Barros, Márcia Cristina Afonso
dc.contributor.supervisor: Almeida, André Maria da Silva Dias Moitinho de
dc.date.accessioned: 2026-01-20T10:50:01Z
dc.date.available: 2026-01-20T10:50:01Z
dc.date.issued: 2025
dc.description: Master's thesis, Informatics, 2025, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract: With the wealth of data provided by the Gaia mission of the European Space Agency (ESA), including astrometry and photometry for approximately two billion stars, interest in open star clusters has grown significantly, making them a highly relevant research topic. The availability of precise Gaia data, advances in machine learning, increased computational power, and open-source software have led to a surge in cluster discoveries. These are typically published in separate catalogues with varying levels of crossmatching rigour, and they frequently duplicate previously reported clusters. To address this challenge, we developed a framework to integrate, clean, and crossmatch multiple catalogues, producing a final compiled catalogue based on cluster memberships rather than solely on centre coordinates and radii. The system uses three interlinked databases: raw data storage, a data warehouse with cleaned and normalised data, and a final compiled catalogue. Data extraction uses the NASA/ADS and CDS/VizieR APIs, and Gaia Archive queries validate member IDs, recovering approximately 97% of stars via cone searches when Gaia IDs are missing. Crossmatching is guided by similarity metrics (Jaccard, Dice, Overlap), a clustering quality measure (the silhouette score), and distribution tests (the Kolmogorov–Smirnov test and the Jensen–Shannon divergence) for parallax, proper motions, magnitude, and BP-RP colour. A baseline dataset of manually labelled cluster pairs was used to train supervised machine learning models, including logistic regression, random forest, XGBoost, and SVM, with XGBoost performing best. The framework reduced 25,577 initial clusters to 12,310 unique clusters, of which 48.1% are new and 51.9% previously known, totalling 3,456,379 members. A web application allows querying the final catalogue and accessing original catalogue data.
Future improvements include integrating additional surveys, exploring alternative machine learning models, optimising the Extraction, Transformation and Loading (ETL) process, refining cluster merging rules, improving the determination of cluster membership probabilities, and enhancing the web application's functionality. Overall, the framework provides a scalable and robust pipeline for consolidating open cluster catalogues, producing a curated dataset essential for galactic structure and evolution studies. [en]
dc.format: application/pdf
dc.identifier.tid: 204176140
dc.identifier.uri: http://hdl.handle.net/10400.5/116725
dc.language.iso: eng
dc.subject: Open Clusters
dc.subject: ETL Process
dc.subject: Crossmatching
dc.subject: Machine Learning
dc.subject: Framework
dc.title: COSMIC: A Framework for taking the census of star clusters in the Milky Way [en]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess
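
The abstract states that crossmatching compares cluster memberships using the Jaccard, Dice, and Overlap similarity metrics. The sketch below shows the standard set-based definitions of these three metrics, assuming each cluster is represented as a set of Gaia source IDs. The function names and the example IDs are hypothetical and are not taken from the thesis, which may compute or combine these quantities differently.

```python
from typing import Dict, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared members divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: Set[int], b: Set[int]) -> float:
    """Twice the shared members divided by the sum of both sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a: Set[int], b: Set[int]) -> float:
    """Shared members divided by the size of the smaller cluster."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def membership_similarity(a: Set[int], b: Set[int]) -> Dict[str, float]:
    """Collect the three metrics for one candidate cluster pair."""
    return {
        "jaccard": jaccard(a, b),
        "dice": dice(a, b),
        "overlap": overlap(a, b),
    }


# Hypothetical member IDs for two catalogue entries that share two stars.
cluster_a = {101, 102, 103, 104}
cluster_b = {103, 104, 105}
scores = membership_similarity(cluster_a, cluster_b)
```

The Overlap coefficient reaches 1 whenever the smaller cluster is fully contained in the larger one, which is why it is useful alongside Jaccard and Dice when one catalogue reports only a subset of another catalogue's members.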

Files

Main
Name: TM_Ariana_Dias.pdf
Size: 21.93 MB
Format: Adobe Portable Document Format