Publication

A Lexical Database of Portuguese Multiword Expressions

Name: antunes_propor2006.pdf
Size: 472.6 KB
Format: Adobe PDF

Abstract(s)

This presentation focuses on an ongoing project that aims to create a large lexical database of Portuguese multiword (MW) units. The units are automatically extracted from a balanced 50-million-word corpus, statistically interpreted with lexical association measures, and validated by hand. The database covers different types of MW units, such as named entities, and lexical associations ranging from sets of favoured co-occurring forms with high corpus frequency and low cohesion to strongly lexicalized expressions with little or no variation. This new resource has a twofold objective: to serve as a research tool that supports the development of collocation typologies and their integration into a larger theory of MW units, and to help in developing and evaluating language processing tools capable of dealing with MW expressions.

Citation

Antunes, S., Bacelar do Nascimento, M. F., Casteleiro, J. M., Mendes, A., Pereira, L. & Sá, T. (2006). "A Lexical Database of Portuguese Multiword Expressions". In Vieira, R. et al. (2006), PROPOR 2006, LNAI 3960. Berlin: Springer-Verlag.

Publisher

Springer-Verlag
