COMBINA-PT: a Large Corpus-extracted and Hand-checked Lexical Database of Portuguese Multiword Expressions

Abstract

This paper presents the COMBINA-PT project, a study of corpus-extracted Portuguese multiword (MW) expressions. The objective of this ongoing project is to compile a large lexical database of MW units of the Portuguese language, automatically extracted from a balanced 50-million-word corpus, interpreted with lexical association measures and manually validated. The MW expressions considered in the database include named entities and lexical associations with different degrees of cohesion, ranging from frozen groups, which undergo little or no variation, to lexical collocations composed of words that tend to occur together and that constitute syntactic dependencies, although with a low degree of fixedness. This new resource has a two-fold objective: (i) to be an important research tool that supports the development of typologies of MW expressions and their lexicographic treatment; (ii) to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
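
To make the idea of a lexical association measure concrete, here is a minimal sketch in Python. The abstract does not say which measures the project uses, so pointwise mutual information (PMI) over adjacent word pairs is only an assumed, commonly used example. The function name `bigram_pmi`, the `min_count` threshold and the toy sentence are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs with pointwise mutual information (PMI).

    Illustrative assumption only: PMI stands in for whichever association
    measures the project actually applies. Pairs seen fewer than
    `min_count` times are skipped, since PMI overrates rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_pairs
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input (hypothetical); a real run would use a tokenised corpus.
tokens = "o primeiro ministro falou e o primeiro ministro saiu de casa".split()
scores = bigram_pmi(tokens)

# Highest-scoring pairs first; these would be candidates for manual validation.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

A score like this only produces a ranked list of candidates. In the workflow the abstract describes, candidates are still validated by hand before entering the database.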

Citation

Mendes, A., Antunes, S., Bacelar do Nascimento, M. F., Casteleiro, J. M., Pereira, L. & Sá, T. (2006): "COMBINA-PT: a Large Corpus-extracted and Hand-checked Lexical Database of Portuguese Multiword Expressions", in Proceedings of the V International Conference on Language Resources and Evaluation - LREC2006, Genoa, May 22-28, 2006.

Publisher

European Language Resources Association
