Publication

Implementation of a Data Lake in a Microservices Architecture

Name: TM_Ricardo_Macedo.pdf
Size: 1.7 MB
Format: Adobe PDF

Abstract

As the world entered the big data era, companies faced many challenges caused by the growing volume, velocity, and variety of data, and traditional relational databases were no longer a viable option for tackling this problem. In response to the pressing demand for a new approach, the concept of the data lake emerged in the early 2010s. This thesis presents a solution for integrating a data lake into microservices-driven projects, aiming to equip a company with insights and strategies for using data lakes in modern data management. Its objectives are to build a comprehensive understanding of big data challenges, analyze evolving data lake trends and technologies in detail, compare data lakes with traditional data warehouses, and design and implement a tailored data lake solution. The thesis begins by exploring the concepts of big data, data lakes, and microservices architecture, then reviews the literature on current trends in data lake architectures and technologies. Through a comparative analysis, it highlights the advantages of data lakes over other solutions, such as their ability to handle diverse data types, their scalability, and their performance, while addressing challenges related to data governance. The core of the thesis is the design and implementation of a data lake solution built to integrate with the company’s microservices projects. The solution uses an on-premises architecture and incorporates technologies such as Apache Spark, Hadoop, Apache Superset, and the Spring framework. Finally, a case study demonstrates the data lake’s processing and reporting capabilities.

Description

Master's thesis, Engenharia Informática (Informatics Engineering), 2024, Universidade de Lisboa, Faculdade de Ciências

Keywords

Data lake; Big Data; Microservices; Spark; Hadoop; Master's theses - 2024
