Logo do repositório
 
A carregar...
Logótipo do projeto
Projeto de investigação

Resource Aware Programming

Autores

Publicações

Optimization of feature learning through grammar-guided genetic programming
Publication . Ingelse, Leon Kornelis; Fonseca, Alcides Aguiar
Machine Learning (ML) is becoming more prominent in daily life. A key aspect in ML is Feature Engineering (FE), which can entail a long and tedious process. Therefore, the automation of FE, known as Feature Learning (FL), can be highly rewarding. FL methods need not only have high prediction performance, but should also produce interpretable methods. Many current high-performance ML methods that can be considered FL methods, such as Neural Networks and PCA, lack interpretability. A popular ML used for FL that produces interpretable models is Genetic Programming (GP), with multiple successful applications and methods like M3GP. In this thesis, I present two new GP-based FL methods, namely M3GP with Domain Knowledge (DK-M3GP) and DK-M3GP with feature Aggregation (DKA-M3GP). Both use grammars to enhance the search process of GP, in a method called GrammarGuided GP (GGGP). DK-M3GP uses grammars to incorporate domain knowledge in the search process. In particular, I use DK-M3GP to define what solutions are humanly valid, in this case by disallowing operating arithmetically on categorical features. For example, the multiplication of the postal code of an individual with their wage is not deemed sensible and thus disallowed. In DKA-M3GP, I use grammars to include a feature aggregation method in the search space. This method can be used for time series and panel datasets, to aggregate the target value of historic data based on a known feature value of a new data point. For example, if I want to predict the number of bikes seen daily in a city, it is interesting to know how many were seen on average in the last week. Furthermore, DKA-M3GP allows for filtering the aggregation based on some other feature value. For example, we can include the average number of bikes seen on past Sundays. I evaluated my FL methods for two ML problems in two environments. First, I evaluate the independent FL process, and, after that, I evaluate the FL steps within four ML pipelines. Independently, DK-M3GP shows a two-fold advantage over normal M3GP; better interpretability in general, and higher prediction performance for one problem. DKA-M3GP has a much better prediction performance than M3GP for one problem, and a slightly better one for the other. Furthermore, within the ML pipelines it performed well in one of two problems. Overall, my methods show potential for FL. Both methods are implemented in Genetic Engine an individual-representation-independent GGGP framework, created as part of this thesis. Genetic Engine is completely implemented in Python and shows competing performance with the mature GGGP framework PonyGE2.

Unidades organizacionais

Descrição

Palavras-chave

Contribuidores

Financiadores

Entidade financiadora

Fundação para a Ciência e a Tecnologia

Programa de financiamento

3599-PPCDT

Número da atribuição

EXPL/CCI-COM/1306/2021

ID