Abstract
Transformers have played a central role in Natural Language Processing (NLP) since 2017, enabling significant advances in applications such as machine translation and text generation. Despite their success, they face challenges such as high computational costs and the environmental impact of their energy consumption.
Recent research has focused on developing more compact and efficient models that can run on resource-limited devices without relying on cloud infrastructure. Optimizing transformers seeks to balance performance with computational cost.
The fundamental structure of transformers has remained largely unchanged, and current understanding rests mainly on empirical observation, leaving theoretical gaps. Studies suggest that components considered essential can be removed without compromising performance, indicating that transformer components deserve reevaluation.
This study analyzes the self-attention mechanism theoretically and empirically, examining existing optimizations and evaluating alternatives for improvement. Five modifications were designed: Simple Self-Attention (SSA), Layered Self-Attention (LSA), Variable Self-Attention (VSA), Simple Layered Self-Attention (SLSA), and Variable Layered Self-Attention (VLSA).
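The five variants above all modify the standard scaled dot-product self-attention introduced by Vaswani et al. (2017), which also serves as the study's baseline. As a point of reference only (the variants' own definitions are not reproduced here), a minimal single-head NumPy sketch of that baseline, without masking or multi-head projections; all names are illustrative:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention (single head).

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of value vectors

# tiny example: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # → (4, 8)
```

Each output row is a convex combination of the value vectors, with mixing weights given by the softmax over query-key similarities; the proposed variants alter how these weights are computed.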
Implementation involved exploratory and confirmatory phases. The exploratory phase altered the self-attention mechanism and extensively tested promising alternatives using the nanoGPT repository. The confirmatory phase fine-tuned the best-performing mechanisms, evaluating the modified versions against the original self-attention as a baseline.
Results indicated that Variable Layered Self-Attention (VLSA) models, especially with higher
k values, outperformed the standard self-attention mechanism, achieving lower validation losses
and improved generalization, even with fewer training iterations.
These findings suggest that alternative attention mechanisms can enhance transformer-based
language models without extensive architectural changes, offering a practical approach to improving efficiency and accuracy.
In conclusion, this study demonstrates that the current self-attention implementation may not
be optimal, and exploring alternative mechanisms can lead to significant improvements. Future
work proposes applying these mechanisms to larger models and datasets and expanding evaluation metrics to include broader benchmarks, aiming to generalize the improvements and better
understand the benefits of the proposed attention mechanisms.
Description
Master's thesis, Informatics, 2024, Universidade de Lisboa, Faculdade de Ciências
Keywords
transformers; self-attention mechanisms; natural language processing; language models; performance optimization; Master's theses - 2024
