Research Project
Untitled
Publications
Lexical semantics annotation for enriched Portuguese corpora
Publication . Neale, Steven; Valadas, Rita; Silva, João; Branco, António
The semantic annotation of corpora has an important role to play in ensuring that sentences occurring in natural language texts are correctly understood based on their intended context. Two examples of lexical semantic units that contribute to this knowledge are word senses – which allow words with multiple meanings to be understood based on the context in which they are used – and named entities – which can be disambiguated and linked back to the specific encyclopedic resources that describe them. In this paper, we describe the construction of lexical semantically-annotated corpora for Portuguese, annotated with both word senses linked to senses in a Portuguese wordnet and named entities linked to Portuguese Wikipedia entries using DBpedia. The result is a gold-standard lexical semantically-annotated resource that is useful in supporting the training and evaluation of tools for the disambiguation of these lexical units in Portuguese.
Named Entities in the QTLeap Corpus of Online Helpdesk Interactions
Publication . Querido, Andreia; Carvalho, Rita de; Rodrigues, João; Silva, João; Neale, Steven; Pereira, Rita; Gomes, Patrícia; Correia, Catarina; Amaral, Diana; Branco, António
In this paper we present the annotation of a corpus with named entities that are classified into semantic types and disambiguated by linking them to their corresponding entries in the Portuguese DBpedia. This corpus, the QTLeap Corpus, is a multilingual collection of question and answer pairs from a chat-based helpdesk service for Information and Communication Technologies. The resulting annotated corpus is a gold-standard lexical resource annotated with named entities that is useful in supporting the training and evaluation of named entity annotation and disambiguation tools for Portuguese.
CINTIL DependencyBank PREMIUM Handbook: Design options for the representation of grammatical dependencies
Publication . Branco, António; Silva, João; Querido, Andreia; Carvalho, Rita
CINTIL DependencyBank PREMIUM. A corpus of grammatical dependencies for Portuguese
Publication . Carvalho, Rita de; Querido, Andreia; Campos, Marisa; Valadas, Rita; Silva, João; Branco, António
This paper presents a new linguistic resource for the study and computational processing of Portuguese. CINTIL DependencyBank PREMIUM is a corpus of Portuguese news text, manually and accurately annotated with a wide range of linguistic information (morpho-syntax, named entities, syntactic functions and semantic roles), making it an invaluable resource, especially for the development and evaluation of data-driven natural language processing tools. The corpus is under active development, reaching 4,000 sentences in its current version. The paper also reports on the training and evaluation of a dependency parser over this corpus. CINTIL DependencyBank PREMIUM is freely available for research purposes through META-SHARE.
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
3599-PPCDT
Funding Award Number
PTDC/EEI-SII/1940/2012