Publication

Validation of Automated Protein Annotation

dc.contributor.author: Couto, Francisco M.
dc.contributor.author: Silva, Mário J.
dc.contributor.author: Coutinho, Pedro M.
dc.date.accessioned: 2009-02-10T13:11:48Z
dc.date.accessioned: 2014-11-14T16:24:13Z
dc.date.available: 2009-02-10T13:11:48Z
dc.date.available: 2014-11-14T16:24:13Z
dc.date.issued: 2005-12
dc.description.abstract: Given the large amount of data stored in biological databases, the management of uncertainty and incompleteness in them is a non-trivial problem. To cope with the large number of sequences being produced, a significant number of genes and proteins have been functionally characterized by automated tools. However, these tools have also produced a significant number of misannotations that are now present in the databases. This paper proposes a new approach for validating the automated annotations, which uses the large amount of publicly available information to compare automated annotations with preexisting curated annotations. To test the proposed approach, we developed a novel unsupervised method for filtering misannotations provided by automated annotation systems. We evaluated our method using the automated annotations submitted to BioCreAtIvE, a joint evaluation of state-of-the-art text-mining systems in Biology. The method scored each of these annotations, and those scoring below a certain threshold were discarded. The results have shown a small trade-off in recall for a large improvement in precision. For example, we were able to discard 44.6%, 66.8% and 81% of the misannotations, maintaining 96.9%, 84.2%, and 47.8% of the correct annotations, respectively. Moreover, we were able to outperform each individual submission to BioCreAtIvE by proper adjustment of the threshold. These results show the effectiveness of our approach in assisting curators of large biological databases in the use of contemporary tools for automatic identification of annotations.
dc.identifier.uri: http://hdl.handle.net/10451/14256
dc.identifier.uri: http://repositorio.ul.pt/handle/10455/2957
dc.language.iso: por
dc.publisher: Department of Informatics, University of Lisbon
dc.relation.ispartofseries: di-fcul-tr-05-24
dc.subject: data mining
dc.subject: text mining
dc.subject: gene and protein annotation
dc.title: Validation of Automated Protein Annotation
dc.type: report
dspace.entity.type: Publication
rcaap.rights: openAccess
rcaap.type: report
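
The abstract describes scoring each automated annotation and discarding those below a threshold, trading some recall for precision. The following is a minimal sketch of that filtering step only, not the report's actual scoring method: the function names, the (score, is_correct) representation and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (score, is_correct). In practice the
# correctness label would come from curated annotations.
Scored = Tuple[float, bool]


def filter_by_threshold(annotations: List[Scored], threshold: float) -> List[Scored]:
    """Keep annotations scoring at or above the threshold; discard the rest."""
    return [a for a in annotations if a[0] >= threshold]


def precision_recall(kept: List[Scored], everything: List[Scored]) -> Tuple[float, float]:
    """Precision over the kept set; recall relative to all correct annotations."""
    correct_kept = sum(1 for _, ok in kept if ok)
    total_correct = sum(1 for _, ok in everything if ok)
    precision = correct_kept / len(kept) if kept else 0.0
    recall = correct_kept / total_correct if total_correct else 0.0
    return precision, recall


# Made-up toy data, not results from the report.
toy = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False), (0.1, False)]

for t in (0.0, 0.5, 0.7):
    kept = filter_by_threshold(toy, t)
    p, r = precision_recall(kept, toy)
    print(f"threshold={t:.1f} kept={len(kept)} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold in this toy example removes more incorrect annotations at the cost of losing some correct ones, which is the kind of trade-off the abstract reports.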

Files

Main
Name: 05-24.pdf
Size: 276.53 KB
Format: Adobe Portable Document Format