Abstract(s)
The first contribution of this study is a description of the prosodic behavior of discourse markers in two speech corpora of European Portuguese (EP) from different domains (university lectures and map-task dialogues). The second contribution is a multiclass classification that determines, from their prosodic features, which words in both corpora are discourse markers, which are disfluencies, and which are neither markers nor disfluencies (chunks). Our goal is to automatically predict discourse markers and include them in rich transcripts, along with other structural metadata events (e.g., disfluencies and punctuation marks) that are already encompassed in the language models of our in-house speech recognizer. Results show that the automatic classification of discourse markers is better for the lectures corpus (87%) than for the dialogue corpus (84%). Nonetheless, in both corpora, discourse markers are more easily confused with chunks than with disfluencies.
Keywords
Discourse markers; Prosody; Lectures; Dialogues; Structural metadata events
Citation
Cabarrão, V., Moniz, H., Ferreira, J., Batista, F., Trancoso, I., Mata, A. I. & Curto, S. (2015) "Prosodic Classification of Discourse Markers", in International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, Scotland, UK.