Untitled

Funder

Organization

Publications

OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries

Authors: Barreiro, Anabela; Batista, Fernando; Ribeiro, Ricardo; Moniz, Helena; Trancoso, Isabel

This paper presents three sets of OpenLogos resources: the English-German, English-French, and English-Italian bilingual dictionaries. Most currently available dictionaries offer part-of-speech information, plus gender and number for nouns. Beyond this, the OpenLogos bilingual dictionaries have several distinctive features that make them unique. They contain cross-language morphological information (inflectional and derivational), semantico-syntactic knowledge, and an indication of the head word in multiword units. They also record whether a source word corresponds to a homograph, which auxiliary a verb takes, alternate words (i.e., predicate or process nouns), causatives, reflexivity, and verb aspect, among other properties. The paper focuses on the semantico-syntactic knowledge that is important for disambiguation and translation precision. The resources are publicly available on the METANET platform for free use by the research community.

2014 · Scientific article · Open access

Modality annotation for Portuguese: from manual annotation to automatic labeling

Authors: Mendes, Amália; Hendrickx, Iris; Ávila, Luciana; Quaresma, Paulo; Gonçalves, Teresa; Sequeira, João

We investigate modality in Portuguese, combining a linguistic perspective with an application-oriented one. We design an annotation scheme that reflects theoretical linguistic concepts. We then apply it to a small corpus sample to show how the scheme handles real-world language use. We present two schemes for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. We use the annotated data not only to study the linguistic phenomena of modality but also to train a practical text-mining tool that detects modality in text automatically. The modality tagger uses a machine learning classifier trained on features extracted automatically from the output of a syntactic parser. Because only a small annotated sample is available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several insights into the complexity of modality as a semantic concept. These insights derive from the manual annotation of the corpus and from the analysis of the automatic labeling results. They concern ambiguity, the semantic and syntactic properties typically associated with a given modal meaning in context, and the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified modality scheme. It applies to both Portuguese varieties and covers both written and spoken data.

2016 · Scientific article · Open access

Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach

Authors: Cabarrão, Vera; Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David

This paper presents a linguistic revision process applied to a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription. It also reports on the impact of the revised data on the performance of several modules. The revision process consisted mainly of annotating and revising structural metadata events, such as disfluencies and punctuation marks. The revised data are now used extensively and were crucial for improving the performance of several modules. This applies especially to the punctuation and capitalization modules, but also to the speech recognition system and all subsequent modules. The revised data have also recently been used in cross-domain disfluency studies.

2014 · Scientific article · Open access

Automatic Tagging of Modality: identifying triggers and modal values

Authors: Quaresma, Paulo; Mendes, Amália; Hendrickx, Iris; Gonçalves, Teresa

We present an experiment in the automatic tagging of modality in Portuguese. We currently lack a suitable resource with detailed modal information for Portuguese. We therefore experiment with a small sample of 160,000 tokens, manually annotated according to the modality scheme that we previously developed for European Portuguese (Hendrickx et al., 2012). We consider modality to be the expression of the attitude of the speaker (or subject) towards the proposition. Our modality scheme accounts for seven major modal values and nine sub-values. This experiment focuses on three modal verbs, all of which may have more than one modal value: poder ‘may/can’, dever ‘shall/might’, and conseguir ‘manage to/succeed in/be able to’. We first report on the task of correctly detecting the modal uses of poder and dever, since these two verbs may also have non-modal meanings. To identify the modal value of each occurrence of the three verbs, we applied a machine learning approach that takes into account all the features available from a syntactic parser's output. The best performance was obtained using an SVM with a string kernel. The system outperformed the baseline for all three verbs, with a maximum F-score of 76.2.

2014 · Scientific article · Open access

TypOn: the microbial typing ontology

Authors: Vaz, Cátia; Francisco, Alexandre P.; Silva, Mickael; Jolley, Keith A.; Bray, James E.; Pouseele, Hannes; Rothganger, Joerg; Ramirez, Mário; Carrico, Joao Andre

Bacterial identification and characterization at the subspecies level is commonly known as microbial typing. These methodologies are currently fundamental tools in clinical microbiology and bacterial population genetics. They are used to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resistance. Owing to advances in DNA sequencing technology, these methods have evolved to focus on sequence-based methodologies. Continued development of portable and accurate sequence-based typing methods increasingly requires two things: a common understanding of the concepts described, and the ability to share results across the community at a global level. This is especially true since the recent introduction of Next Generation Sequencing (NGS) technologies. In this paper, we present an ontology designed for the sequence-based microbial typing field. It is capable of describing any of the sequence-based typing methodologies currently in use or under development, including novel NGS-based methods. This is a fundamental step toward accurately describing, analyzing, curating, and managing microbial typing information produced by sequence-based typing methods.

2014 · Scientific article · Open access

Funding entity: Fundação para a Ciência e a Tecnologia

Funding programme: 3599-PPCDT

Grant number: PEst-OE/EEI/LA0021/2013
