Browsing by Author "Mata, Ana Isabel"
- Classificação prosódica de marcadores discursivos
  Cabarrão, Vera; Moniz, Helena; Ferreira, Jaime; Batista, Fernando; Trancoso, Isabel; Mata, Ana Isabel; Curto, Sérgio
  This work describes the discourse markers present in two corpora for European Portuguese, in different domains (university lectures and map-task dialogues). In this study, we also perform a multiclass automatic classification task based on prosodic features to verify in both corpora which words are discourse markers, which are disfluencies, and which are sentence-like units (SUs). Results show that the selection of discourse markers varies across domains and between speakers. As for the classification task, results show that the discourse markers are better classified in the lectures corpus (87%) than in the dialogue corpus (84%). However, cross-domain experiments showed that models trained on the dialogue corpus better predict the events in the lecture corpus, since the dialogue domain displays more speakers and therefore more complex patterns. In both corpora, markers are more easily classified as SUs than as disfluencies.
- Extending AuToBI to prominence detection in European Portuguese
  Moniz, Helena; Mata, Ana Isabel; Hirschberg, Julia; Batista, Fernando; Rosenberg, Andrew; Trancoso, Isabel
  This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however, is almost impractical for extensive data sets. For that reason, automatic systems such as AuToBI stand as an alternative solution. We started by applying the AuToBI prosodic event detection system, using the existing English models, to the prediction of prominent prosodic events (accents) in European Portuguese. This approach achieved an overall accuracy of 74% for prominence detection, similar to state-of-the-art results for other languages. We then trained new models using prepared and spontaneous Portuguese data, achieving a considerable improvement of about 6% accuracy (absolute) over the existing English models. The results are quite encouraging and provide a starting point for automatically predicting prominent events in European Portuguese.
- Global analysis of entrainment in dialogues
  Cabarrão, Vera; Trancoso, Isabel; Mata, Ana Isabel; Moniz, Helena; Batista, Fernando
  This paper performs a global analysis of entrainment between dyads in map-task dialogues in European Portuguese (EP), including 48 dialogues, between 24 speakers. Our main goals focus on the acoustic-prosodic similarities between speakers, namely if there are global entrainment cues displayed in the dialogues, if there are degrees of entrainment manifested in distinct sets of features shared amongst the speakers, if entrainment depends on the role of the speaker as either giver or follower, and also if speakers tend to entrain more with specific pairs regardless of the role. Results show global entrainment in almost all the dyads, but the degrees of entrainment (stronger within the same gender), and the role effects tend to be less striking than the interlocutors’ effect. Globally, speakers tend to be more similar to their own speech in other dialogues than to their partners. However, speakers are also more similar to their interlocutors than to speakers with whom they never spoke.
- Intonational Grammar in Ibero-Romance: Approaches across linguistic subfields
  Moniz, Helena; Batista, Fernando; Mata, Ana Isabel; Trancoso, Isabel
  This work describes a framework that encompasses multi-layered linguistic information, focusing on prosodic features (pitch, energy, and tempo patterns), uses such features to distinguish between sentence-form types and disfluency/fluency repairs, and contributes to the characterization of intonational patterns of spontaneous and prepared speech in European Portuguese. Different machine learning methods have been applied for discriminating between structural metadata events, both in university lectures and in map-task dialogues, containing large amounts of spontaneous speech. Results show that prosodic features, and particularly a set of very informative features, are crucial to distinguish between sentence-form types and disfluency/fluency repair events. This is the first work for European Portuguese on both fully automatic processing of multilayered linguistic description of spoken corpora and intonational labeling.
- Prosódia, variação e processamento automático
  Mata, Ana Isabel; Moniz, Helena
  In this chapter we present an overview of prosodic variation and of its interface with the field of automatic speech processing. Drawing mainly on research carried out on European Portuguese with corpora of spontaneous and prepared speech, in expository and interactive contexts, namely in the standard variety spoken in Lisbon, we analyze intonational variation in declarative and interrogative contexts, and we address the pragmatic-discourse functions that may also be associated with other prosodic parameters. Building on comparative studies across styles (with greater/lesser degrees of spontaneity and planning, and a more interactive/expository nature) and across speakers (geographic area, gender, age group/status), we highlight the role of stylistic and sociolinguistic variation in the prosody of European Portuguese. We also show the role of variation in the automatic processing of prosodic prominence, punctuation, disfluencies, and emotions.
- Prosodic Classification of Discourse Markers
  Cabarrão, Vera; Moniz, Helena; Ferreira, Jaime; Batista, Fernando; Trancoso, Isabel; Mata, Ana Isabel; Curto, Sérgio
  The first contribution of this study is the description of the prosodic behavior of discourse markers present in two speech corpora of European Portuguese (EP) in different domains (university lectures, and map-task dialogues). The second contribution is a multiclass classification to verify, given their prosodic features, which words in both corpora are classified as discourse markers, which are disfluencies, and which correspond to words that are neither markers nor disfluencies (chunks). Our goal is to automatically predict discourse markers and include them in rich transcripts, along with other structural metadata events (e.g., disfluencies and punctuation marks) that are already encompassed in the language models of our in-house speech recognizer. Results show that the automatic classification of discourse markers is better for the lectures corpus (87%) than for the dialogue corpus (84%). Nonetheless, in both corpora, discourse markers are more easily confused with chunks than with disfluencies.
- Prosodic, Syntactic, Semantic Guidelines for Topic Structures Across Domains and Corpora
  Mata, Ana Isabel; Moniz, Helena; Móia, Telmo; Gonçalves, Anabela; Silva, Fátima; Batista, Fernando; Duarte, Inês; Oliveira, Fátima; Falé, Isabel
  This paper presents the annotation guidelines applied to naturally occurring speech, aiming at an integrated account of contrast and parallel structures in European Portuguese. These guidelines were defined to allow for the empirical study of interactions among intonation and syntax-discourse patterns in selected sets of different corpora (monologues and dialogues, by adults and teenagers). In this paper we focus on the multilayer annotation process of left periphery structures by using a small sample of highly spontaneous speech in which the distinct types of topic structures are displayed. The analysis of this sample provides fundamental training and testing material for further application in a wider range of domains and corpora. The annotation process comprises the following time-linked levels (manual and automatic): phone, syllable and word level transcriptions (including co-articulation effects); tonal events and break levels; part-of-speech tagging; syntactic-discourse patterns (construction type; construction position; syntactic function; discourse function); and disfluency events. Speech corpora with such a multi-level annotation are a valuable resource to look into grammar module relations in language use from an integrated viewpoint. Such a viewpoint is innovative in our language, and has not often been adopted by studies for other languages.
- Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach
  Cabarrão, Vera; Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David
  This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process consisted mainly of annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resulting revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system, and all the subsequent modules. The resulting data has also been recently used in disfluency studies across domains.
- Stylistic variation in the intonation of European Portuguese teenagers and adults
  Mata, Ana Isabel; Moniz, Helena; Batista, Fernando
  The present study aims to investigate intonation contours in phrase-final position, in a corpus of spontaneous and prepared unscripted presentations from teenagers (14-15 years old) and adults, collected in a school context. Taking into account the differences between phrasing levels (ToBI breaks 3 and 4), we show that the frequency of low/falling vs. high/rising contours – mainly (H+)L* L and (L+)H* H – varies across oral presentation types. Adults and teenagers follow distinct strategies, though cross-gender differences are also a source of variation. We interpret these changes as an adaptation effect to the speaking styles specifically required at school, which call for the speaker's effort to speak clearly and to keep the listeners' attention, and ultimately as "intelligibility-oriented" speaking style changes.
- Teenage and Adult Speech in School Context: Building and Processing a Corpus of European Portuguese
  Mata, Ana Isabel; Moniz, Helena; Batista, Fernando; Hirschberg, Julia
  We present a corpus of European Portuguese spoken by teenagers and adults in a school context, CPE-FACES, with an overview of the differential characteristics of high school oral presentations and the challenges this data poses to automatic speech processing. The CPE-FACES corpus has been created with two main goals: to provide a resource for the study of prosodic patterns in both spontaneous and prepared unscripted speech, and to capture inter-speaker and speaking style variations common at school, for research on oral presentations. Research on speaking styles is still largely based on adult speech. References to teenagers are sparse and cross-analyses of speech types comparing teenagers and adults are rare. We expect CPE-FACES, currently a unique resource in this domain, will contribute to filling this gap in European Portuguese. Focusing on disfluencies and phrase-final phonetic-phonological processes, we show the impact of teenage speech on the automatic segmentation of oral presentations. Analyzing fluent final intonation contours in declarative utterances, we also show that communicative situation specificities, speaker status, and cross-gender differences are key factors in speaking style variation at school.
