Publication

Exploring Causal Attention Models in Transformers for Large Language Models

datacite.subject.fos: Departamento de Informática
dc.contributor.advisor: Falcão, André Osório e Cruz de Azerêdo, 1969-
dc.contributor.author: Terroa, João Filipe Gonçalves Vieira
dc.date.accessioned: 2025-01-07T10:01:42Z
dc.date.available: 2025-01-07T10:01:42Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.description: Master's thesis, Informatics, 2024, Universidade de Lisboa, Faculdade de Ciências
dc.description.abstract: Transformers have been central to Natural Language Processing (NLP) since 2017, enabling major advances in applications such as machine translation and text generation. Despite this success, they face challenges such as high computational cost and the environmental impact of their energy consumption. Recent research has therefore focused on more compact and efficient models that can run on resource-limited devices without relying on cloud infrastructure, seeking to balance performance against computational resources. The fundamental structure of transformers has remained largely unchanged, and our understanding of it rests mainly on empirical observation, leaving theoretical gaps. Studies suggest that components considered essential can be removed without compromising performance, indicating that a reevaluation of transformer components is worthwhile.

This study analyzes the self-attention mechanism theoretically and empirically, examining existing optimizations and evaluating alternatives for improvement. Five modifications were designed: Simple Self-Attention (SSA), Layered Self-Attention (LSA), Variable Self-Attention (VSA), Simple Layered Self-Attention (SLSA), and Variable Layered Self-Attention (VLSA). Implementation proceeded in an exploratory phase and a confirmatory phase. The exploratory phase altered the self-attention mechanism and extensively tested promising alternatives using the nanoGPT repository; the confirmatory phase fine-tuned the best-performing mechanisms, evaluating the modified versions against the original self-attention as a baseline. Results indicate that Variable Layered Self-Attention (VLSA) models, especially with higher k values, outperformed the standard self-attention mechanism, achieving lower validation losses and better generalization even with fewer training iterations.

These findings suggest that alternative attention mechanisms can improve transformer-based language models without extensive architectural changes, offering a practical route to better efficiency and accuracy. In conclusion, this study demonstrates that the current self-attention implementation may not be optimal and that exploring alternative mechanisms can yield significant improvements. Future work proposes applying these mechanisms to larger models and datasets, and expanding the evaluation to broader benchmarks, in order to generalize the improvements and better understand the benefits of the proposed attention mechanisms.
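The abstract compares the proposed variants against the standard causal self-attention baseline (as used in nanoGPT). For orientation, here is a minimal single-head sketch of that baseline in NumPy; the five variants (SSA, LSA, VSA, SLSA, VLSA) are defined only in the thesis itself and are not reproduced here, and the function and parameter names below are illustrative, not taken from the thesis code.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product causal self-attention (single head).

    x: (T, d) matrix of T token embeddings; w_q, w_k, w_v: (d, d)
    projection matrices. Names are illustrative, not from the thesis.
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # query/key/value projections
    scores = (q @ k.T) / np.sqrt(d)               # scaled dot products
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                        # causal mask: no attending to future tokens
    # Row-wise softmax (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # attention-weighted sum of values
```

Because of the causal mask, the first token can attend only to itself, so the first output row equals its own value projection; later rows mix value vectors of all preceding positions.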
dc.identifier.tid: 203878442
dc.identifier.uri: http://hdl.handle.net/10400.5/96896
dc.language.iso: eng
dc.subject: transformers
dc.subject: self-attention mechanisms
dc.subject: natural language processing
dc.subject: language models
dc.subject: performance optimization
dc.subject: Master's theses - 2024
dc.title: Exploring Causal Attention Models in Transformers for Large Language Models
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess
rcaap.type: masterThesis
thesis.degree.name: Master's thesis in Informatics

Files

Primary
- Name: TM_João_Terroa.pdf
- Size: 3.41 MB
- Format: Adobe Portable Document Format

License
- Name: license.txt
- Size: 1.2 KB
- Format: Item-specific license agreed upon at submission