Publication
Development of a Website for Creation of Vulnerability Datasets
| datacite.subject.fos | Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering | pt_PT |
| dc.contributor.advisor | Neves, Nuno Fuentecilla Maia Ferreira | |
| dc.contributor.advisor | Medeiros, Ibéria Vitória de Sousa | |
| dc.contributor.author | Ferreira, Miguel Pinto da Silva | |
| dc.date.accessioned | 2024-02-16T17:23:14Z | |
| dc.date.available | 2024-02-16T17:23:14Z | |
| dc.date.issued | 2024 | |
| dc.date.submitted | 2023 | |
| dc.description | Master's thesis, Informatics Engineering, 2024, Universidade de Lisboa, Faculdade de Ciências | pt_PT |
| dc.description.abstract | With the evolution of the digital era, guaranteeing the robustness and security of software has become a major concern. To address it, software vulnerabilities must be not only detected but also mitigated effectively. Static Analysis Tools (SATs) offer a cost-effective solution, providing cheap and fast analysis, but they often produce a high percentage of false positives and false negatives. Recent studies suggest that machine learning (ML) techniques could enhance the effectiveness of these tools, but this requires trustworthy and reliable datasets to train the ML models. This dissertation aims to provide a way of creating such datasets, which can support the development of ML models capable of identifying vulnerabilities in computer programs. To achieve this, we propose a novel approach to constructing these datasets, which consists of collecting inputs from the crowd to mitigate the false positives and negatives generated by the SATs while still leveraging their deterministic classifications. This approach is applied in the context of web vulnerabilities that appear in applications built with the PHP programming language. To facilitate crowdsourcing, we developed a user-friendly website called BugSpotting, where users can classify PHP code snippets, indicating whether or not they are vulnerable to each of a set of vulnerability classes. With the results obtained from both the crowd and the SATs, we are able to obtain a reliable and trustworthy dataset composed of accurately classified PHP code snippets. We evaluated BugSpotting in terms of user interface (UI) and user experience (UX), and the results were very satisfactory. Moreover, although we were not able to reach a consensus on the final labels of the code snippets, we were still able to analyse the data collected so far, which shows promising results. | pt_PT |
| dc.identifier.tid | 203882067 | |
| dc.identifier.uri | http://hdl.handle.net/10451/62676 | |
| dc.language.iso | eng | pt_PT |
| dc.relation | LASIGE - Extreme Computing | |
| dc.subject | Web application vulnerabilities | pt_PT |
| dc.subject | Vulnerability detection | pt_PT |
| dc.subject | Static analysis | pt_PT |
| dc.subject | Machine learning | pt_PT |
| dc.subject | Crowdsourcing | pt_PT |
| dc.subject | Master's theses - 2024 | pt_PT |
| dc.title | Development of a Website for Creation of Vulnerability Datasets | pt_PT |
| dc.type | master thesis | |
| dspace.entity.type | Publication | |
| oaire.awardTitle | LASIGE - Extreme Computing | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F00408%2F2020/PT | |
| oaire.fundingStream | 6817 - DCRRNI ID | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| rcaap.rights | openAccess | pt_PT |
| rcaap.type | masterThesis | pt_PT |
| relation.isProjectOfPublication | b429b8f0-500f-4a0b-8e91-33e0a200ad1c | |
| relation.isProjectOfPublication.latestForDiscovery | b429b8f0-500f-4a0b-8e91-33e0a200ad1c | |
| thesis.degree.name | Master's in Informatics Engineering | pt_PT |
