| Name | Description | Size | Format |
|---|---|---|---|
| | | 446.65 KB | Adobe PDF |
Abstract
In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research applied to less commonly taught languages, we aim to enhance the learner data available for Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools to support students in their learning process. The corpus is available online through the TEITOK environment, a web-based framework for corpus treatment that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, token normalization, error annotation) to automatically process and annotate texts in XML format. A CQP-based search interface allows the corpus to be searched by different fields, such as words, lemmas, POS tags or error tags. We describe the work in progress on the constitution and linguistic annotation of this corpus, focusing in particular on error annotation.
Citation
Río, Iria del; Antunes, Sandra; Mendes, Amália & Janssen, Maarten (2016). Towards error annotation in a learner corpus of Portuguese. 5th NLP4CALL and 1st NLP4LA workshop in Sixth Swedish Language Technology Conference (SLTC). Umeå University, Sweden, 17-18 November.
