Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank

Zeyrek, Deniz; Mendes, Amália; Kurfali, Murathan

http://hdl.handle.net/10451/37351

Use this identifier to cite this record.

Authors

Zeyrek, Deniz

Mendes, Amália

Kurfali, Murathan

Abstract

We introduce TED-Multilingual Discourse Bank, a corpus of TED talks transcripts in 6 languages (English, German, Polish, European Portuguese, Russian and Turkish), where the ultimate aim is to provide a clearly described level of discourse structure and semantics in multiple languages. The corpus is manually annotated following the goals and principles of PDTB, involving explicit and implicit discourse connectives, entity relations, alternative lexicalizations and no relations. In the corpus, we also aim to capture the characteristics of spoken language that exist in the transcripts and adapt the PDTB scheme according to our aims; for example, we introduce hypophora. We spot other aspects of spoken discourse such as the discourse marker use of connectives to keep them distinct from their discourse connective use. TED-MDB is, to the best of our knowledge, one of the few multilingual discourse treebanks and is hoped to be a source of parallel data for contrastive linguistic analysis as well as language technology applications. We describe the corpus, the annotation procedure and provide preliminary corpus statistics.

Keywords

Discourse Parallel Multilingual corpus

Citation

Zeyrek, Deniz, Amália Mendes, Murathan Kurfalı (2018) Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank. In Proceedings of the 11th Language Resources and Evaluation Conference - LREC’2018, 7-12 May 2018, Miyazaki, Japan, pp. 1913-1919.

Publisher

European Language Resources Association

Collections

FL - CLUL - Livros de Actas
