Classificação automática de locais de acidentes rodoviários em Lisboa - no âmbito dos desafios do LxDataLab

Borges, Diogo Miguel Correia

http://hdl.handle.net/10451/51820

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
TM_Diogo_Borges.pdf		3.33 MB	Adobe PDF

Authors

Borges, Diogo Miguel Correia

Advisor(s)

Teixeira, Carlos J. C.

Abstract

Road accidents have a strong negative socioeconomic impact on populations, so preventing them is a major objective. This study was carried out in collaboration with LxDataLab to identify the locations and factors associated with a higher incidence of road accidents in the municipality of Lisbon. To this end, a dataset was provided with information on accidents that occurred in the municipality in 2019. Several existing works in this area focus on classifying road accidents, namely by severity. Other studies aim to identify the most dangerous locations or roads, both by the number of accidents and by their severity. Machine learning techniques were used to address these problems, namely clustering, association rules, and classification. The first goal was to identify black spots, that is, locations with a high number of accidents of some severity. A large number of black spots were identified with the DBSCAN algorithm, along with some of the characteristics associated with them, which allows a more detailed analysis of these locations. Another goal was to find relevant rules and patterns in the data using FP-Growth, which made it possible to observe which characteristics are most associated with certain types of accidents. Finally, accidents were classified by severity and by nature, using algorithms that are simple, interpretable, and that have given good results in previous studies with this objective. The four algorithms chosen were decision trees, Random Forest, logistic regression, and Naïve Bayes. The resulting models were compared with each other, and information was extracted about the chosen variables and the importance assigned to them.
Although all the models had difficulty with the classification tasks, it was possible to draw relevant conclusions from the available data, which may be of interest for further, more in-depth studies.

Description

Master's thesis, Data Science, 2021, Universidade de Lisboa, Faculdade de Ciências

Keywords

Road accident; Black spot; DBSCAN; FP-Growth; Accident classification; Master's theses - 2021

URI

http://hdl.handle.net/10451/51820

Collections

FC-DI - Master Thesis (dissertation)
