Publication

Providing on-line access to Portuguese language resources: corpora and lexicons

Abstract(s)

Several Language Resources (LRs) for Portuguese, developed at the Center of Linguistics of the University of Lisbon (CLUL), are available on-line at CLUL’s webpage: www.clul.ul.pt/english/sectores/projecto_rld.html. These LRs have been extracted from, or developed on the basis of, the Reference Corpus of Contemporary Portuguese (CRPC), a monitor corpus that currently contains more than 350 million words, sampled from several types of written text (literary, newspaper, technical, didactic, juridical, parliamentary, etc.) and spoken text (informal and formal), and covering national and regional varieties of Portuguese (including European, Brazilian, African and Asian Portuguese). The LRs available on-line include: a) several subcorpora (written and spoken, tagged and untagged) compiled and extracted from CRPC for specific CLUL projects and now available for on-line queries; b) a published sample of “Português Fundamental”, a spoken CRPC subcorpus, whose texts are available for download; c) a frequency lexicon extracted from a CRPC subcorpus, available for both on-line queries and download. Other LRs available for Portuguese are also mentioned: C-ORAL-ROM - Integrated Reference Corpora for Spoken Romance Languages, a CD-ROM edition of a spoken corpus with text-to-sound alignment; the LE-PAROLE corpus; the LE-PAROLE Lexicon; and the SIMPLE Lexicon.

Citation

Bacelar do Nascimento, M. F., Mendes, A. & Pereira, L. (2004): "Providing on-line access to Portuguese language resources: corpora and lexicons", in Proceedings of the IV International Conference on Language Resources and Evaluation - LREC2004, Lisbon, Centro Cultural de Belém, May 26-28, 2004.

Publisher

European Language Resources Association