Repository logo
 
No Thumbnail Available
Publication

Development of a Website for Creation of Vulnerability Datasets

Use this identifier to reference this record.
Name:Description:Size:Format: 
TM_Miguel_Ferreira.pdf794.84 KBAdobe PDF Download

Abstract(s)

With the evolution of the digital era, guaranteeing the robustness and security of software has become a major concern. In order to address this subject, it is important to effectively not only detect, but also mitigate software vulnerabilities. Static Analysis Tools (SATs) present a cost-effective solution to this, being able to achieve a cheap and fast analysis, but often incur in a high percentage of false positives and negatives. Recent studies suggest that machine learning (ML) techniques could enhance the effectiveness of these tools, but this requires trustworthy and reliable datasets to train the ML models. This dissertation aims to provide a way of create the aforesaid datasets that can help with the development of ML models capable of identifying vulnerabilities in computer programs. To achieve this, we propose a novel approach to construct these datasets, which consists in collecting inputs from the crowd as a way of mitigating the false positives and negatives generated by the SATs, but at the same time leverage from their deterministic classifications. This approach is applied within the context of web vulnerabilities that appear in applications built with the PHP programming language. To facilitate crowdsourcing, we developed a user-friendly website called BugSpotting where users can classify PHP code snippets, indicating whether these are vulnerable (or not vulnerable) to a set of vulnerability classes. With the results obtained both from the crowd and the SATs, we are able to obtain a reliable and trustworthy dataset comprised of accurately classified PHP code snippets. We evaluated BugSpotting in terms of UI and UX and the results obtained were very satisfactory. Moreover, although we were not able to reach a consensus about the code snippet’s final label, we still manage to analyse the data we have collected until the moment, showing promising results.

Description

Tese de Mestrado, Engenharia Informática, 2024, Universidade de Lisboa, Faculdade de Ciências

Keywords

Vulnerabilidades em aplicações web Deteção de vulnerabilidades Análise estática Aprendizagem automática Contribuição coletiva Teses de mestrado - 2024

Pedagogical Context

Citation

Research Projects

Organizational Units

Journal Issue

Publisher

CC License