Research Project
Untitled
Publications
Automatic Recognition of Prosodic Patterns in Semantic Verbal Fluency Tests - an Animal Naming Task for Edutainment Applications
Publication . Moniz, Helena; Pompili, Anna; Batista, Fernando; Trancoso, Isabel; Abad, Alberto; Amorim, Cristiana
This paper automatically detects prosodic patterns in the domain of semantic fluency tests. Verbal fluency tests aim at evaluating the spontaneous production of words under constrained conditions. Although mostly used for assessing cognitive impairment, they can be applied in many other domains, such as edutainment applications or games with educational purposes. This work discriminates between list effects, disfluencies, and other linguistic events in an animal naming task. Recordings from 42 Portuguese speakers were automatically recognized, and AuToBI was applied to detect prosodic patterns, using both European Portuguese and English models. Both models made it possible to differentiate list effects from the other events, mostly represented by the tunes L* H/L(-%) (English models) or L*+H H/L(-%) (Portuguese models). However, the English models proved to be more suitable because they rely on substantially more training material.
Prosodic Classification of Discourse Markers
Publication . Cabarrão, Vera; Moniz, Helena; Ferreira, Jaime; Batista, Fernando; Trancoso, Isabel; Mata, Ana Isabel; Curto, Sérgio
The first contribution of this study is a description of the prosodic behavior of discourse markers in two speech corpora of European Portuguese (EP) from different domains (university lectures and map-task dialogues). The second contribution is a multiclass classification that determines, from prosodic features, which words in both corpora are discourse markers, which are disfluencies, and which are neither markers nor disfluencies (chunks). Our goal is to automatically predict discourse markers and include them in rich transcripts, along with other structural metadata events (e.g., disfluencies and punctuation marks) that are already encompassed in the language models of our in-house speech recognizer. Results show that the automatic classification of discourse markers is better for the lectures corpus (87%) than for the dialogue corpus (84%). Nonetheless, in both corpora, discourse markers are more easily confused with chunks than with disfluencies.
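The confusion pattern mentioned at the end of this abstract (markers mislabeled as chunks more often than as disfluencies) can be read off a per-class error count. The following is a minimal sketch of that bookkeeping only; the label sequences are invented for illustration and are not the paper's data, and the function names are hypothetical.

```python
from collections import Counter


def recall_for(gold, predicted, label):
    """Fraction of items whose gold class is `label` that were predicted as `label`."""
    total = sum(1 for g in gold if g == label)
    correct = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    return correct / total if total else 0.0


def confusions_for(gold, predicted, label):
    """Count the wrong labels assigned to items whose gold class is `label`."""
    return Counter(p for g, p in zip(gold, predicted) if g == label and p != label)


# Toy labels, invented for this sketch (NOT results from the study).
gold = ["marker", "marker", "marker", "marker", "disfluency", "chunk", "chunk"]
pred = ["marker", "marker", "chunk", "marker", "disfluency", "chunk", "marker"]

marker_recall = recall_for(gold, pred, "marker")
marker_errors = confusions_for(gold, pred, "marker")
print(marker_recall, dict(marker_errors))
```

In this toy run, the one misclassified marker is labeled as a chunk and none as a disfluency, which is the kind of asymmetry the abstract reports.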
Detection of vulnerabilities and automatic protection for web applications
Publication . Medeiros, Ibéria; Correia, Miguel Nuno Dias Alves Pupo, 1968-
In less than three decades of existence, the Web evolved from a platform for accessing hypermedia to a framework for running complex web applications. These applications appear in many forms, from small home-made applications to large-scale commercial services such as Gmail, Office 365, and Facebook. Although a significant research effort on web application security has been ongoing for a while, these applications have been a major source of problems and their security continues to be challenged. An important part of the problem derives from vulnerable source code, often written in unsafe languages like PHP and programmed by people without the appropriate knowledge of secure coding, who leave flaws in the applications. Nowadays the most exploited vulnerability category is input validation, which is directly related to the user inputs inserted in web application forms. The thesis proposes methodologies and tools for the detection of input validation vulnerabilities in source code and for the protection of web applications written in PHP, using source code static analysis, machine learning, and runtime protection techniques. An approach based on source code static analysis is used to identify vulnerabilities in applications programmed in PHP. The user inputs are tracked with taint analysis to determine whether they reach a PHP function susceptible to exploitation. Then, machine learning is applied to determine whether the identified flaws are actually vulnerabilities. If they are, the results of static analysis are used to remove the flaws, correcting the source code automatically and thus protecting the web application. A new technique for source code static analysis is suggested to automatically learn about vulnerabilities and then to detect them.
Machine learning applied to natural language processing is used first to learn characteristics of flaws in the source code, classifying it as vulnerable or not, and then to discover and identify the vulnerabilities. A runtime protection technique is also proposed to flag and block injection attacks against databases. The technique is implemented inside the database management system to improve the effectiveness of attack detection, avoiding a semantic mismatch. Source code identifiers are employed so that, when an attack is flagged, the vulnerability is localized in the source code. Overall, this work allowed the identification of about 1200 vulnerabilities in open source web applications available on the Internet, 560 of which were previously unknown. The unknown vulnerabilities were reported to the corresponding software developers, and most of them have already been removed.
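The taint-tracking step described in this abstract can be illustrated with a toy example. The sketch below is not the thesis's tool: it assumes a straight-line program already reduced to (target, function, arguments) triples, and the sets SOURCES, SANITIZERS, and SINKS, as well as the `concat` stand-in for string concatenation, are hypothetical choices made for illustration.

```python
# Toy taint analysis over a straight-line program (illustrative only).
# Each statement is (target, function_or_None, [argument names]).
# SOURCES, SANITIZERS and SINKS are assumptions chosen for this sketch.

SOURCES = {"$_GET", "$_POST"}                         # entry points for user input
SANITIZERS = {"mysqli_real_escape_string", "intval"}  # functions assumed to clean data
SINKS = {"mysqli_query", "echo"}                      # sensitive functions


def find_tainted_sinks(program):
    """Return a list of (statement_index, sink, tainted_argument) findings."""
    tainted = set(SOURCES)
    findings = []
    for index, (target, func, args) in enumerate(program):
        tainted_args = [a for a in args if a in tainted]
        if func in SINKS:
            # A tainted value reaching a sink is reported as a candidate flaw.
            findings.extend((index, func, a) for a in tainted_args)
        if target is None:
            continue
        if tainted_args and func not in SANITIZERS:
            tainted.add(target)      # taint propagates through the assignment
        else:
            tainted.discard(target)  # clean or sanitized value overwrites the variable
    return findings


program = [
    ("$id", None, ["$_GET"]),                     # $id = $_GET[...]
    ("$safe", "intval", ["$id"]),                 # $safe = intval($id)
    ("$q1", "concat", ["SELECT ... ", "$safe"]),  # query built from sanitized data
    ("$q2", "concat", ["SELECT ... ", "$id"]),    # query built from raw input
    (None, "mysqli_query", ["$q1"]),
    (None, "mysqli_query", ["$q2"]),
]

findings = find_tainted_sinks(program)
print(findings)  # [(5, 'mysqli_query', '$q2')]
```

Only the query built from unsanitized input is reported. In the approach the abstract describes, such candidates would then be passed to a machine learning step that decides whether they are real vulnerabilities; that step is not modeled here.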
Combining Multiple Approaches to Predict the Degree of Nativeness
Publication . Ribeiro, Eugénio; Ferreira, Jaime; Olcoz, Julia; Abad, Alberto; Moniz, Helena; Batista, Fernando; Trancoso, Isabel
Automatic speaker nativeness assessment has multiple applications, such as second language learning and IVR systems. In this paper we treat it as a regression problem, since the available labels are on a continuous scale. Multiple approaches were applied, such as phonotactic models, i-vectors, and goodness of pronunciation, covering both segmental and suprasegmental features. Different phonotactic models were adopted, either trained with the challenge data or using additional multilingual data from other domains. The obtained values were later combined in multiple ways and fed to a support vector machine regressor. Results on the test set surpass the provided baseline and are in line with the results obtained on the remaining sets. This suggests that our models generalize well to other datasets.
Classificação prosódica de marcadores discursivos
Publication . Cabarrão, Vera; Moniz, Helena; Ferreira, Jaime; Batista, Fernando; Trancoso, Isabel; Mata, Ana Isabel; Curto, Sérgio
This work describes the discourse markers present in two corpora of European Portuguese from different domains (university lectures and map-task dialogues). We also perform a multiclass automatic classification task based on prosodic features to determine, in both corpora, which words are discourse markers, which are disfluencies, and which are sentence-like units (SUs). Results show that the selection of discourse markers varies across domains and between speakers. As for the classification task, discourse markers are better classified in the lectures corpus (87%) than in the dialogue corpus (84%). However, cross-domain experiments showed that models trained on the dialogue corpus better predict the events in the lecture corpus, since the dialogue domain has more speakers and therefore more complex patterns. In both corpora, markers are more easily classified as SUs than as disfluencies.
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
5876
Funding Award Number
UID/CEC/50021/2013
