Publication

Corpus-based extraction and identification of Portuguese Multiword Expressions

dc.contributor.author: Antunes, Sandra
dc.contributor.author: Nascimento, Maria Fernanda Bacelar do
dc.contributor.author: Casteleiro, João Miguel
dc.contributor.author: Mendes, Amália
dc.contributor.author: Pereira, Luísa
dc.contributor.author: Sá, Tiago
dc.date.accessioned: 2019-03-13T16:20:58Z
dc.date.available: 2019-03-13T16:20:58Z
dc.date.issued: 2006
dc.description.abstract: This presentation reports the methodology followed and the results attained in an ongoing project aimed at building a large lexical database of corpus-extracted multiword (MW) expressions for the Portuguese language. MW expressions were automatically extracted from a balanced 50-million-word corpus compiled for this project, then statistically interpreted using lexical association measures, and are currently undergoing manual validation. The lexical database covers different types of MW expressions, from named entities to lexical associations with different degrees of cohesion, ranging from totally frozen idioms to favoured co-occurring forms such as collocations. We aim to achieve two main objectives with this resource: to build on the large set of data covering different types of MW expressions in order to revise existing typologies of collocations and integrate them into a larger theory of MW units; and to use the extensive hand-checked data as training data to evaluate existing statistical lexical association measures. [pt_PT]
dc.description.abstract: This article presents the methodology followed and the results obtained in a project whose objective is to build a large database of multiword expressions of the Portuguese language. These multiword expressions were automatically extracted from a balanced corpus of 50 million words, statistically interpreted using lexical association measures, and then manually verified. The lexical database covers different types of multiword expressions with different degrees of cohesion, ranging from almost total fixedness to groups of words that preferentially occur together, such as collocations. The large dataset in this resource will allow a revision of the typologies of multiword units in Portuguese and the evaluation of different lexical association measures. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: Antunes, S., Bacelar do Nascimento, M. F., Casteleiro, J. M., Mendes, A., Pereira, L. & Sá, T. (2006): "Corpus-based extraction and identification of Portuguese Multiword Expressions", in Traitement Automatique des Langues Naturelles - TALN 2006, Leuven, April 10-13, 2006. [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10451/37497
dc.language.iso: eng [pt_PT]
dc.publisher: Université Catholique de Louvain [pt_PT]
dc.relation: Word combinations in Portuguese language COMBINA-PT
dc.subject: Multiword expressions [pt_PT]
dc.subject: Collocations [pt_PT]
dc.subject: Information extraction [pt_PT]
dc.subject: Lexical database [pt_PT]
dc.subject: Lexical association measures [pt_PT]
dc.subject: Typology of multiword expressions [pt_PT]
dc.title: Corpus-based extraction and identification of Portuguese Multiword Expressions [pt_PT]
dc.type: conference object
dspace.entity.type: Publication
oaire.awardTitle: Word combinations in Portuguese language COMBINA-PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/POCI/POCTI%2FLIN%2F48465%2F2002/PT
oaire.citation.conferencePlace: Leuven [pt_PT]
oaire.citation.title: Traitement Automatique des Langues Naturelles - TALN 2006 [pt_PT]
oaire.fundingStream: POCI
person.familyName: Mendes
person.givenName: Amália
person.identifier.ciencia-id: 4018-7A6F-1873
person.identifier.orcid: 0000-0001-6815-2674
person.identifier.scopus-author-id: 14035817100
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: conferenceObject [pt_PT]
relation.isAuthorOfPublication: 94be597b-a42a-42f4-8f1d-822fa454b910
relation.isAuthorOfPublication.latestForDiscovery: 94be597b-a42a-42f4-8f1d-822fa454b910
relation.isProjectOfPublication: 72c89123-b496-47ac-928f-b20b984f092a
relation.isProjectOfPublication.latestForDiscovery: 72c89123-b496-47ac-928f-b20b984f092a
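
The abstract says the extracted candidates were statistically interpreted with lexical association measures, but the record does not say which measures were used. As a purely illustrative sketch, the Python below computes pointwise mutual information (PMI), one widely used association measure, for adjacent word pairs in a token list. The function name `pmi_scores`, the two-word window, the maximum-likelihood probability estimates and the toy Portuguese tokens are all assumptions for illustration, not the project's actual pipeline.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return base-2 PMI for every adjacent word pair in `tokens`.

    Hypothetical helper for illustration: probabilities are plain
    relative frequencies and only immediately adjacent pairs are counted.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)       # number of word tokens
    n_bi = len(tokens) - 1    # number of adjacent pairs
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        p12 = f12 / n_bi               # joint probability of the pair
        p1 = unigrams[w1] / n_uni      # marginal probability of w1
        p2 = unigrams[w2] / n_uni      # marginal probability of w2
        # Positive when the pair co-occurs more often than independence predicts.
        scores[(w1, w2)] = math.log2(p12 / (p1 * p2))
    return scores


if __name__ == "__main__":
    toks = "a casa branca e a casa azul".split()
    # Print pairs from most to least strongly associated.
    for pair, s in sorted(pmi_scores(toks).items(), key=lambda kv: -kv[1]):
        print(pair, round(s, 3))
```

PMI is known to overrate pairs made of rare words, which is one reason a large hand-checked dataset, like the one the abstract describes, is useful for comparing several association measures against each other.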

Files

Original bundle
Name: paper_taln_2006_antunes_final_version.pdf
Size: 766.81 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed to upon submission