Please use this identifier to cite or link to this record: http://hdl.handle.net/10451/30690
Title: The Gulf of Guinea Creole Corpora
Authors: Hagemeijer, Tjerk
Généreux, Michel
Hendrickx, Iris
Mendes, Amália
Tiny, Abigail
Zamora, Armando
Keywords: Gulf of Guinea creoles
Corpus annotation and management
Language documentation
Date: 2014
Publisher: European Language Resources Association
Citation: Hagemeijer, Tjerk, Michel Généreux, Iris Hendrickx, Amália Mendes, Abigail Tiny, Armando Zamora (2014) “The Gulf of Guinea Creole Corpora” in Proceedings of the Ninth International Conference on Language Resources and Evaluation – LREC’14, May 26-31, Reykjavik, Iceland, pp. 523-529
Abstract: We present the process of building linguistic corpora of the Portuguese-related Gulf of Guinea creoles, a cluster of four historically related languages: Santome, Angolar, Principense and Fa d’Ambô. We faced the typical difficulties of languages lacking an official status, such as the lack of a standard spelling, language variation, the lack of basic language instruments, and small data sets, which comprise data from the late 19th century to the present. To tackle these problems, the compiled written data and the transcribed spoken data collected during fieldwork trips were adapted to a normalized spelling that was applied to the four languages. For the corpus compilation we followed corpus linguistics standards. We recorded metadata for each file and added morphosyntactic information based on a part-of-speech tag set that was designed to deal with the specificities of these languages. The corpora of three of the four creoles are already available and searchable via a web interface.
URI: http://hdl.handle.net/10451/30690
ISBN: 978-2-9517408-8-4
Appears in collections: FL - CLUL - Livros de Actas

Files in this record:
File: Hagemeijer_et_al_LREC_2014.pdf
Size: 524.95 kB
Format: Adobe PDF



All records in the repository are protected by copyright law, with all rights reserved.