A Portuguese Native Language Identification Dataset

del Río, Iria; Zampieri, Marcos; Malmasi, Shervin

Publication

2018, conference paper

dc.contributor.author	del Río, Iria
dc.contributor.author	Zampieri, Marcos
dc.contributor.author	Malmasi, Shervin
dc.date.accessioned	2018-05-24T10:58:38Z
dc.date.available	2018-05-24T10:58:38Z
dc.date.issued	2018
dc.description.abstract	In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author’s first language based on their second language writing. The dataset includes 1,868 student essays written by learners of European Portuguese, native speakers of the following L1s: Chinese, English, Spanish, German, Russian, French, Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian, and Swedish. NLI-PT includes the original student text and four different types of annotation: POS, fine-grained POS, constituency parses, and dependency parses. NLI-PT can be used not only in NLI but also in research on several topics in the field of Second Language Acquisition and educational NLP. We discuss possible applications of this dataset and present the results obtained for the first lexical baseline system for Portuguese NLI.	pt_PT
dc.description.version	info:eu-repo/semantics/publishedVersion	pt_PT
dc.identifier.citation	del Río, Iria; Zampieri, Marcos; Malmasi, Shervin (2018): A Portuguese Native Language Identification Dataset in "The Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications". The Association for Computational Linguistics: New Orleans	pt_PT
dc.identifier.uri	http://hdl.handle.net/10451/33644
dc.language.iso	eng	pt_PT
dc.publisher	The Association for Computational Linguistics	pt_PT
dc.relation	DETECÇÃO E CORREÇÃO AUTOMÁTICA DE ERROS EM PORTUGUÊS SEGUNDA LÍNGUA/LÍNGUA ESTRANGEIRA
dc.title	A Portuguese Native Language Identification Dataset	pt_PT
dc.type	conference object
dspace.entity.type	Publication
oaire.awardNumber	SFRH/BPD/109914/2015
oaire.awardTitle	DETECÇÃO E CORREÇÃO AUTOMÁTICA DE ERROS EM PORTUGUÊS SEGUNDA LÍNGUA/LÍNGUA ESTRANGEIRA
oaire.awardURI	info:eu-repo/grantAgreement/FCT/OE/SFRH%2FBPD%2F109914%2F2015/PT
oaire.citation.conferencePlace	New Orleans	pt_PT
oaire.citation.title	The Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications	pt_PT
oaire.fundingStream	OE
project.funder.identifier	http://doi.org/10.13039/501100001871
project.funder.name	Fundação para a Ciência e a Tecnologia
rcaap.rights	openAccess	pt_PT
rcaap.type	conferenceObject	pt_PT
relation.isProjectOfPublication	a663ff8b-f624-4c4d-af6f-69ab41fcbe3b
relation.isProjectOfPublication.latestForDiscovery	a663ff8b-f624-4c4d-af6f-69ab41fcbe3b

Files

Main

Name: A Portuguese Native Language Identification Dataset.pdf
Size: 97.93 KB
Format: Adobe Portable Document Format

License

Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed upon to submission

Collections

FL - CLUL - Livros de Actas