Publication

Explicitly Involving the User in a Data Cleaning Process


Abstract

Data cleaning and Extract-Transform-Load (ETL) processes are usually modeled as graphs of data transformations. These graphs typically involve a large number of data transformations and must handle large volumes of data. Involving the users responsible for executing the corresponding programs over real data is important, both to tune the data transformations and to manually correct data items that cannot be handled automatically. In this paper, we extend the notion of a data cleaning graph to better support user involvement in data cleaning processes. We propose that data cleaning graphs include: (i) data quality constraints, which help users identify the points of the graph and the records that need their attention; and (ii) manual data repairs, which represent the way users can provide the feedback required to manually clean some data items. We provide preliminary experimental results showing, for a real-world data cleaning process, the significant gains obtained with our approach in terms of both the quality of the data produced and the cost incurred by users in data visualization and updating tasks.
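
The two additions proposed above can be illustrated with a minimal Python sketch. The first is data quality constraints that flag records needing the user's attention. The second is manual data repairs that feed the user's corrections back into the process. This sketch is not the authors' formalism. The graph is simplified to a linear chain of transformations. Node, run_pipeline, manual_repairs, the country-code transformation and the sample records are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
RepairKey = Tuple[str, str]  # (node name, record id)


@dataclass
class Node:
    """Hypothetical node of a data cleaning graph: one data transformation."""
    name: str
    transform: Callable[[Record], Record]
    # Data quality constraints on this node's output; each returns True if the record is acceptable.
    constraints: List[Callable[[Record], bool]] = field(default_factory=list)


def run_pipeline(
    nodes: List[Node],
    records: List[Record],
    manual_repairs: Dict[RepairKey, Record],
) -> Tuple[List[Record], List[RepairKey]]:
    """Apply the nodes in order (a linear chain, the simplest graph shape).

    A record that violates a node's constraints is replaced by the user's manual
    repair if one exists; otherwise it is held back and reported for review.
    """
    to_review: List[RepairKey] = []
    current = records
    for node in nodes:
        output: List[Record] = []
        for rec in current:
            out = node.transform(rec)
            if all(check(out) for check in node.constraints):
                output.append(out)
                continue
            key = (node.name, out["id"])
            if key in manual_repairs:
                output.append(manual_repairs[key])  # user feedback applied
            else:
                to_review.append(key)  # needs the user's attention
        current = output
    return current, to_review


# Illustrative transformation and constraint (hypothetical data).
KNOWN_CODES = {"PT", "ES"}


def normalize_country(rec: Record) -> Record:
    mapping = {"portugal": "PT", "spain": "ES"}
    new = dict(rec)
    new["country"] = mapping.get(rec["country"].strip().lower(), rec["country"])
    return new


nodes = [Node("normalize_country", normalize_country, [lambda r: r["country"] in KNOWN_CODES])]
records = [
    {"id": "1", "country": "Portugal"},
    {"id": "2", "country": "Portgal"},   # typo the transformation cannot fix
    {"id": "3", "country": "Espanha"},   # unmapped name
]
repairs = {("normalize_country", "2"): {"id": "2", "country": "PT"}}

clean, pending = run_pipeline(nodes, records, repairs)
print(clean)    # [{'id': '1', 'country': 'PT'}, {'id': '2', 'country': 'PT'}]
print(pending)  # [('normalize_country', '3')]
```

A record that violates a constraint and has no repair is held back rather than passed downstream. The pending list therefore names exactly the graph points and records the user still has to inspect. Once the user supplies a repair for record 3, rerunning the pipeline lets it through.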

Description

Reviewed by Mário Silva

Keywords

Data Cleaning; User feedback; Data Transformation
