Publication

Improving Machine Learning Pipeline Creation using Visual Programming and Static Analysis

datacite.subject.fos: Departamento de Informática [pt_PT]
dc.contributor.advisor: Fonseca, Alcides Miguel Cachulo Aguiar
dc.contributor.author: David, João Pedro Vieira
dc.date.accessioned: 2022-03-25T13:48:33Z
dc.date.available: 2022-03-25T13:48:33Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.description: Master's thesis, Informatics Engineering (Software Engineering), Universidade de Lisboa, Faculdade de Ciências, 2021 [pt_PT]
dc.description.abstract: ML pipelines are composed of several steps that load data, clean it, process it, apply learning algorithms, and either produce reports or deploy inference systems into production. In real-world scenarios, pipelines can take days, weeks, or months to train with large quantities of data. Unfortunately, current tools to design and orchestrate ML pipelines are oblivious to the semantics of each step, allowing developers to easily introduce errors when connecting two components that might not work together, either syntactically or semantically. Data scientists and engineers often find these bugs during or after the lengthy execution, which decreases their productivity. We propose a Visual Programming Language (VPL) enriched with semantic constraints regarding the behavior of each component and a verification methodology that checks entire pipelines to detect common ML bugs that existing visual and textual programming languages do not. We evaluate this methodology on a set of six bugs taken from a data science company focused on preventing financial fraud on big data. We were able to detect these data engineering and data balancing bugs, as well as unnecessary computation in the pipelines. [pt_PT]
dc.identifier.tid: 202934071
dc.identifier.uri: http://hdl.handle.net/10451/51973
dc.language.iso: eng [pt_PT]
dc.subject: Visual Programming [pt_PT]
dc.subject: Machine Learning [pt_PT]
dc.subject: Pipeline [pt_PT]
dc.subject: Type Checking [pt_PT]
dc.subject: Compiler [pt_PT]
dc.subject: Master's theses - 2021 [pt_PT]
dc.title: Improving Machine Learning Pipeline Creation using Visual Programming and Static Analysis [pt_PT]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess [pt_PT]
rcaap.type: masterThesis [pt_PT]
thesis.degree.name: Master's thesis in Informatics Engineering (Software Engineering) [pt_PT]
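
The abstract describes annotating each pipeline component with semantic constraints and checking the whole pipeline before it runs. The Python sketch below is only a rough illustration of that general idea and is not the thesis's VPL or its verification methodology. The `Step` class, the `verify` function, and the property names ("loaded", "split", and so on) are all hypothetical, and only linear pipelines are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A pipeline component annotated with hypothetical semantic properties."""
    name: str
    requires: frozenset = frozenset()  # properties the incoming data must have
    adds: frozenset = frozenset()      # properties this step establishes


def verify(pipeline):
    """Check a linear pipeline without executing it; return error messages."""
    state = set()   # properties known to hold for the data so far
    errors = []
    for step in pipeline:
        missing = step.requires - state
        if missing:
            errors.append(f"{step.name}: input lacks {sorted(missing)}")
        # A step whose effects all hold already is unnecessary computation.
        if step.adds and step.adds <= state:
            errors.append(f"{step.name}: redundant, {sorted(step.adds)} already holds")
        state |= step.adds
    return errors


# Hypothetical components; the constraints are assumptions for illustration.
LOAD = Step("load", adds=frozenset({"loaded"}))
IMPUTE = Step("impute", requires=frozenset({"loaded"}), adds=frozenset({"no_missing"}))
SPLIT = Step("split", requires=frozenset({"loaded"}), adds=frozenset({"split"}))
BALANCE = Step("balance", requires=frozenset({"split"}), adds=frozenset({"balanced"}))
TRAIN = Step("train", requires=frozenset({"no_missing", "split"}), adds=frozenset({"model"}))

good = [LOAD, IMPUTE, SPLIT, BALANCE, TRAIN]
bad = [LOAD, BALANCE, SPLIT, IMPUTE, IMPUTE, TRAIN]  # balances too early, imputes twice

good_errors = verify(good)
bad_errors = verify(bad)
```

In this sketch, both problems in `bad` are reported before any data is loaded or any model is trained, which is the kind of early feedback the abstract argues for. A real checker would need richer constraints than sets of property names, for example column types or dataset roles.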

Files

Main
Name: TM_João_David.pdf
Size: 1.9 MB
Format: Adobe Portable Document Format