Publication

Development of a text mining approach to disease network discovery

datacite.subject.fos: Natural Sciences::Biological Sciences [pt_PT]
dc.contributor.author: Lamurias, Andre
dc.date.accessioned: 2020-03-11T16:15:26Z
dc.date.available: 2020-03-11T16:15:26Z
dc.date.issued: 2019-02
dc.date.submitted: 2019-01
dc.description.abstract: Scientific literature is one of the major sources of knowledge for systems biology, in the form of papers, patents and other types of written reports. Text mining methods aim at automatically extracting relevant information from the literature. The hypothesis of this thesis was that biological systems could be elucidated by the development of text mining solutions that can automatically extract relevant information from documents. The first objective consisted of developing software components to recognize biomedical entities in text, which is the first step in generating a network about a biological system. To this end, a machine learning solution was developed, which can be trained for specific biological entities using an annotated dataset, obtaining high-quality results. Additionally, a rule-based solution was developed, which can be easily adapted to various types of entities. The second objective consisted of developing an automatic approach to link the recognized entities to a reference knowledge base. A solution based on the PageRank algorithm was developed in order to match the entities to the concepts that most contribute to the overall coherence. The third objective consisted of automatically extracting relations between entities, to generate knowledge graphs about biological systems. Due to the lack of annotated datasets available for this task, distant supervision was employed to train a relation classifier on a corpus of documents and a knowledge base. The applicability of this approach was demonstrated in two case studies: microRNA-gene relations for cystic fibrosis, obtaining a network of 27 relations using the abstracts of 51 recently published papers; and cell-cytokine relations for tolerogenic cell therapies, obtaining a network of 647 relations from 3264 abstracts. Through a manual evaluation, the information contained in these networks was determined to be relevant. Additionally, a solution combining deep learning techniques with ontology information was developed, to take advantage of the domain knowledge provided by ontologies. This thesis contributed several solutions that demonstrate the usefulness of text mining methods for systems biology by extracting domain-specific information from the literature. These solutions make it easier to integrate various areas of research, leading to a better understanding of biological systems. [pt_PT]
dc.identifier.tid: 101544804 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10451/42317
dc.language.iso: eng [pt_PT]
dc.relation: Development of a Text Mining Approach to Disease Network Discovery
dc.subject: Text Mining [pt_PT]
dc.subject: Information Extraction [pt_PT]
dc.subject: Systems Biology [pt_PT]
dc.subject: Machine Learning [pt_PT]
dc.title: Development of a text mining approach to disease network discovery [pt_PT]
dc.type: doctoral thesis
dspace.entity.type: Publication
oaire.awardNumber: PD/BD/106083/2015
oaire.awardTitle: Development of a Text Mining Approach to Disease Network Discovery
oaire.awardURI: info:eu-repo/grantAgreement/FCT//PD%2FBD%2F106083%2F2015/PT
person.familyName: Martins Lamúrias
person.givenName: André Francisco
person.identifier: CusKTEIAAAAJ
person.identifier.ciencia-id: C61E-8B6E-A1B4
person.identifier.orcid: 0000-0001-7965-6536
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: doctoralThesis [pt_PT]
relation.isAuthorOfPublication: 7a26492c-00ef-4a2f-ac7c-e8e454ee2814
relation.isAuthorOfPublication.latestForDiscovery: 7a26492c-00ef-4a2f-ac7c-e8e454ee2814
relation.isProjectOfPublication: 7850df3e-e7de-4743-90af-e30dad7a1997
relation.isProjectOfPublication.latestForDiscovery: 7850df3e-e7de-4743-90af-e30dad7a1997
thesis.degree.name: Doctoral thesis, Biology (Systems Biology), Universidade de Lisboa, Faculdade de Ciências, 2019 [pt_PT]

Files

Main
Name: ULSD733942_td_Andre_Lamurias.pdf
Size: 4.58 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed upon to submission