Understanding a data-based model: Insights from extreme misclassifications of machine-assisted selection of AGN

Carrazedo, Bruno Manuel Teixeira

Publication

2025, Master's dissertation

dc.contributor.author	Carrazedo, Bruno Manuel Teixeira
dc.contributor.institution	Faculty of Sciences
dc.contributor.institution	Department of Physics
dc.contributor.supervisor	Troncoso, Israel Matute
dc.contributor.supervisor	Carvajal, Rodrigo
dc.date.accessioned	2026-02-16T15:05:01Z
dc.date.available	2026-02-16T15:05:01Z
dc.date.issued	2025
dc.description	Master's thesis, Physics (Astrophysics and Cosmology), 2025, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract	Active Galactic Nuclei (AGNs) and star-forming galaxies (SFGs) are key populations for understanding galaxy evolution, yet their separation is challenging due to overlapping photometry and spectroscopy. In this work, we evaluate a supervised Machine Learning (ML) classifier that distinguishes AGNs from SFGs using multiwavelength data from the Panoramic Survey Telescope and Rapid Response System Data Release 1 (Pan-STARRS DR1), the Wide-field Infrared Survey Explorer (WISE), and the Two-Micron All-Sky Survey (2MASS), with spectroscopic labels from the Sloan Digital Sky Survey Data Release 16 (SDSS-DR16) and the Hobby-Eberly Telescope Dark Energy Experiment Spring Field (HETDEX) survey. We analyze both global classification metrics and extreme misprediction cases, focusing on high-confidence predictions where the model disagrees with survey labels. Color–color diagrams and spectral inspection are used to assess whether these mispredictions reflect model limitations or misassigned survey labels. Our results show that the classifier achieves high overall accuracy, with AGNs typically predicted with >99% confidence across most redshift intervals. Misclassifications are concentrated in the SFGs, lowering the Matthews Correlation Coefficient (MCC) in regions where SFGs are statistically negligible. Spectral analysis of extreme cases reveals that a fraction of sources labeled as AGNs but predicted as SFGs lack characteristic AGN features, supporting the model’s prediction. Conversely, some sources labeled as SFGs but predicted as AGNs exhibit broad emission lines and blue continua, consistent with AGN profiles. These findings suggest that the ML model can be used to improve survey classifications, particularly when redshift assignments are uncertain. We conclude that integrating ML with optical/mid-infrared (MIR) surveys enhances source classification and helps identify mislabeled objects. Future work should include feature importance analysis and expanded confidence intervals to further interpret the model’s decision-making and refine its application to large extragalactic samples.	en
dc.format	application/pdf
dc.identifier.tid	204175186
dc.identifier.uri	http://hdl.handle.net/10400.5/117107
dc.language.iso	eng
dc.subject	Active galactic nuclei
dc.subject	Photometry
dc.subject	Machine learning
dc.subject	Data analysis
dc.title	Understanding a data-based model: Insights from extreme misclassifications of machine-assisted selection of AGN	en
dc.type	master thesis
dspace.entity.type	Publication
rcaap.rights	openAccess
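
The abstract makes two points that a small numerical sketch may help illustrate: the MCC can be low in a bin where SFGs are rare even though accuracy is high, and "extreme" mispredictions are those where the model disagrees with the survey label at high confidence. The Python below is only an illustration under stated assumptions. The confusion-matrix counts and probabilities are invented, the function names are hypothetical, and the 0.99 threshold simply echoes the ">99% confidence" figure in the abstract. None of it is taken from the thesis pipeline, which this record does not describe.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of sources whose predicted class matches the label."""
    return (tp + tn) / (tp + tn + fp + fn)


def extreme_mispredictions(labels, probs, threshold=0.99):
    """Indices where the model disagrees with the label at high confidence.

    labels: 1 = AGN, 0 = SFG (survey label)
    probs:  model probability that the source is an AGN
    """
    selected = []
    for i, (label, p_agn) in enumerate(zip(labels, probs)):
        predicted = 1 if p_agn >= 0.5 else 0
        confidence = p_agn if predicted == 1 else 1.0 - p_agn
        if predicted != label and confidence >= threshold:
            selected.append(i)
    return selected


# Invented counts for one redshift bin with 990 AGNs and only 10 SFGs
# (AGN is the positive class). These are NOT results from the thesis.
tp, fn = 985, 5   # AGNs predicted as AGN / as SFG
tn, fp = 4, 6     # SFGs predicted as SFG / as AGN

bin_accuracy = accuracy(tp, tn, fp, fn)
bin_mcc = mcc(tp, tn, fp, fn)
print(f"accuracy = {bin_accuracy:.3f}, MCC = {bin_mcc:.3f}")

# Invented labels and probabilities for four sources.
labels = [1, 0, 1, 0]
probs = [0.995, 0.999, 0.002, 0.40]
extreme = extreme_mispredictions(labels, probs)
print("extreme mispredictions at indices:", extreme)
```

In this toy bin nearly all sources are classified correctly, yet the handful of misclassified SFGs dominates the MCC because the metric weights both classes regardless of their size.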

Files

Main

Name: TM_Bruno_Carrazedo.pdf
Size: 5.82 MB
Format: Adobe Portable Document Format

Collections

Pure > Dspace
PURE > Dspace - Faculdade de Ciências