
Understanding a data-based model: Insights from extreme misclassifications of machine-assisted selection of AGN

File: TM_Bruno_Carrazedo.pdf (5.82 MB, Adobe PDF)

Abstract

Active Galactic Nuclei (AGNs) and star-forming galaxies (SFGs) are key populations for understanding galaxy evolution, yet separating them is challenging because their photometric and spectroscopic properties overlap. In this work, we evaluate a supervised Machine Learning (ML) classifier that distinguishes AGNs from SFGs using multiwavelength data from the Panoramic Survey Telescope and Rapid Response System Data Release 1 (Pan-STARRS DR1), the Wide-field Infrared Survey Explorer (WISE), and the Two-Micron All-Sky Survey (2MASS), together with spectroscopic labels from the Sloan Digital Sky Survey Data Release 16 (SDSS-DR16) and the Hobby-Eberly Telescope Dark Energy Experiment Spring Field (HETDEX) survey. We analyze both global classification metrics and extreme misprediction cases, focusing on high-confidence predictions where the model disagrees with survey labels. Color–color diagrams and spectral inspection are used to assess whether these mispredictions reflect model limitations or misassigned survey labels. Our results show that the classifier achieves high overall accuracy, with AGNs typically predicted with > 99% confidence across most redshift intervals. Misclassifications are concentrated among the SFGs, which lowers the Matthews Correlation Coefficient (MCC) in regions where SFGs are statistically negligible. Spectral analysis of extreme cases reveals that a fraction of sources labeled as AGNs but predicted as SFGs lack characteristic AGN features, supporting the model's prediction. Conversely, some sources labeled as SFGs but predicted as AGNs exhibit broad emission lines and blue continua, consistent with AGN profiles. These findings suggest that the ML model can be used to improve survey classifications, particularly when redshift assignments are uncertain. We conclude that integrating ML with optical/mid-infrared (MIR) surveys enhances source classification and helps identify mislabeled objects. Future work should include feature importance analysis and expanded confidence intervals to further interpret the model's decision-making and refine its application to large extragalactic samples.
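
The abstract's point that accuracy can stay high while the MCC drops when SFGs are rare, and the idea of flagging high-confidence disagreements with survey labels, can be illustrated with a small sketch. This is not the thesis pipeline: the toy sample, the function names, the 0.5 decision threshold and the 0.99 confidence cut (borrowed loosely from the "> 99% confidence" figure above) are all assumptions made for illustration.

```python
import math

def confusion_counts(labels, preds):
    """Count TP, TN, FP, FN with AGN = 1 (positive) and SFG = 0 (negative)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, tn, fp, fn

def matthews_cc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def extreme_mispredictions(labels, agn_probs, confidence_cut=0.99):
    """Indices where the predicted class disagrees with the label
    and the model's confidence in its prediction is at least confidence_cut."""
    flagged = []
    for i, (y, p) in enumerate(zip(labels, agn_probs)):
        pred = 1 if p >= 0.5 else 0           # assumed decision threshold
        confidence = p if pred == 1 else 1.0 - p
        if pred != y and confidence >= confidence_cut:
            flagged.append(i)
    return flagged

# Hypothetical, strongly imbalanced toy sample: 98 AGNs and 2 SFGs.
labels = [1] * 98 + [0, 0]
# Assumed AGN probabilities: every AGN is confidently recovered, one SFG is
# correctly rejected, and one SFG is confidently predicted to be an AGN.
agn_probs = [0.999] * 98 + [0.02, 0.995]
preds = [1 if p >= 0.5 else 0 for p in agn_probs]

tp, tn, fp, fn = confusion_counts(labels, preds)
accuracy = (tp + tn) / len(labels)
mcc = matthews_cc(tp, tn, fp, fn)
flagged = extreme_mispredictions(labels, agn_probs)

print(f"accuracy = {accuracy:.2f}, MCC = {mcc:.2f}, flagged indices = {flagged}")
```

In this toy sample a single misclassified SFG leaves the accuracy at 0.99 but pulls the MCC down to about 0.70, because the MCC weighs the minority class as heavily as the majority class. The flagged index is the kind of source that would then go to color–color and spectral inspection to decide whether the model or the survey label is at fault.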

Description

Master's thesis, Physics (Astrophysics and Cosmology), 2025, Universidade de Lisboa, Faculdade de Ciências

Keywords

Active galactic nuclei; Photometry; Machine learning; Data analysis
