Publication

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

File: 2016ArantxaAranberriBrancoEtAl.pdf (Adobe PDF, 145.28 KB)

Abstract

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all the parallel corpora, with translations in five languages, namely Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, and give annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.

Keywords

Annotated parallel corpora; Named-entity disambiguation; Word sense disambiguation; Coreference

Citation

Otegi, A., N. Aranberri, A. Branco, J. Hajic, S. Neale, P. Osenova, Rita Valadas Pereira, M. Popel, J. Silva, K. Simov, & E. Agirre. "QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages". In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia, 23-28 May 2016.

Publisher

European Language Resources Association
