The COPLE2 Corpus: a Learner Corpus for Portuguese

Mendes, Amália; Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela

http://hdl.handle.net/10451/30692

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
Mendes_et_al_COPLE2_LREC_2016.pdf		194.88 KB	Adobe PDF

Abstract

We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently contains 182,474 tokens in 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML and retain all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment provides different views of the same document (XML, student version, corrected version), a CQP-based search interface, and POS tagging, lemmatization and normalization of the tokens; it will soon be used for error annotation in stand-off format. The corpus has already served as a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level.

Keywords

Learner corpus; Corpus compilation; Language learning; Language teaching

Citation

Mendes, Amália, Sandra Antunes, Maarten Janssen & Anabela Gonçalves (2016) The COPLE2 Corpus: A Learner Corpus for Portuguese. In: Proceedings of the Tenth Language Resources and Evaluation Conference – LREC’16, 23-28 May 2016, Portorož, Slovenia, 3207-3214.

Publisher

European Language Resources Association

Collections

FL - CLUL - Livros de Actas