Publication

Exploring Causal Attention Models in Transformers for Large Language Models

datacite.subject.fos: Departamento de Informática
dc.contributor.advisor: Falcão, André Osório e Cruz de Azerêdo, 1969-
dc.contributor.author: Terroa, João Filipe Gonçalves Vieira
dc.date.accessioned: 2025-01-07T10:01:42Z
dc.date.available: 2025-01-07T10:01:42Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.description: Master's thesis, Informatics, 2024, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract: Transformers have been central to Natural Language Processing (NLP) since 2017, enabling major advances in applications such as machine translation and text generation. Despite this success, they face challenges such as high computational cost and the environmental impact of their energy consumption. Recent research has therefore focused on more compact and efficient models that can run on resource-limited devices without relying on cloud infrastructure, seeking to balance performance against computational resources. The fundamental structure of transformers has remained largely unchanged, and our understanding of it rests mainly on empirical observation, leaving theoretical gaps. Studies suggest that components considered essential can be removed without compromising performance, indicating that a reevaluation of transformer components is worthwhile.

This study analyzes the self-attention mechanism theoretically and empirically, examining existing optimizations and evaluating alternatives for improvement. Five modifications were designed: Simple Self-Attention (SSA), Layered Self-Attention (LSA), Variable Self-Attention (VSA), Simple Layered Self-Attention (SLSA), and Variable Layered Self-Attention (VLSA). Implementation proceeded in an exploratory phase and a confirmatory phase. The exploratory phase altered the self-attention mechanism and extensively tested promising alternatives using the nanoGPT repository; the confirmatory phase fine-tuned the best-performing mechanisms, evaluating the modified versions against the original self-attention as a baseline. Results indicate that Variable Layered Self-Attention (VLSA) models, especially with higher k values, outperformed the standard self-attention mechanism, achieving lower validation losses and better generalization even with fewer training iterations.

These findings suggest that alternative attention mechanisms can improve transformer-based language models without extensive architectural changes, offering a practical route to better efficiency and accuracy. In conclusion, this study demonstrates that the current self-attention implementation may not be optimal and that exploring alternative mechanisms can yield significant improvements. Future work proposes applying these mechanisms to larger models and datasets, and expanding the evaluation to broader benchmarks, in order to generalize the improvements and better understand the benefits of the proposed attention mechanisms.
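The abstract compares the proposed variants against the standard causal self-attention baseline (as used in nanoGPT). For orientation, here is a minimal single-head sketch of that baseline in NumPy; the five variants (SSA, LSA, VSA, SLSA, VLSA) are defined only in the thesis itself and are not reproduced here, and the function and parameter names below are illustrative, not taken from the thesis code.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product causal self-attention (single head).

    x: (T, d) matrix of T token embeddings; w_q, w_k, w_v: (d, d)
    projection matrices. Names are illustrative, not from the thesis.
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # query/key/value projections
    scores = (q @ k.T) / np.sqrt(d)               # scaled dot products
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                        # causal mask: no attending to future tokens
    # Row-wise softmax (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # attention-weighted sum of values
```

Because of the causal mask, the first token can attend only to itself, so the first output row equals its own value projection; later rows mix value vectors of all preceding positions.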
dc.identifier.tid: 203878442
dc.identifier.uri: http://hdl.handle.net/10400.5/96896
dc.language.iso: eng
dc.subject: transformers
dc.subject: self-attention mechanisms
dc.subject: natural language processing
dc.subject: language models
dc.subject: performance optimization
dc.subject: Master's theses - 2024
dc.title: Exploring Causal Attention Models in Transformers for Large Language Models
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess
rcaap.type: masterThesis
thesis.degree.name: Master's thesis in Informatics

Files

Primary
- Name: TM_João_Terroa.pdf
- Size: 3.41 MB
- Format: Adobe Portable Document Format

License
- Name: license.txt
- Size: 1.2 KB
- Format: Item-specific license agreed upon at submission