Towards error annotation in a learner corpus of Portuguese

del Río, Iria; Antunes, Sandra; Mendes, Amália; Janssen, Maarten

http://hdl.handle.net/10451/31214



Abstract

In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research applied to less commonly taught languages, we aim to expand the learner data available for Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools that support students in their learning process. The corpus is available online through the TEITOK environment, a web-based framework for corpus treatment that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, token normalization, error annotation) for automatically processing and annotating texts in XML format. A CQP-based search interface allows searching the corpus by different fields, such as words, lemmas, POS tags or error tags. We describe the work in progress on the compilation and linguistic annotation of this corpus, focusing particularly on error annotation.


Citation

Río, Iria del; Antunes, Sandra; Mendes, Amália & Janssen, Maarten (2016). Towards error annotation in a learner corpus of Portuguese. 5th NLP4CALL and 1st NLP4LA workshop in Sixth Swedish Language Technology Conference (SLTC). Umeå University, Sweden, 17-18 November.


Publisher

Linköping University Electronic Press

Collections

FL - CLUL - Livros de Actas
