 
Publication

Identification of biotechnological potential on genomic nonfunctionalized orthologs elements

datacite.subject.fos: Natural Sciences::Biological Sciences [pt_PT]
dc.contributor.advisor: Dias, Ricardo Pedro Moreira
dc.contributor.advisor: Henry, Christopher S.
dc.contributor.author: Escudeiro, Pedro Miguel Agostinho
dc.date.accessioned: 2023-01-03T15:58:17Z
dc.date.available: 2024-12-01T01:30:33Z
dc.date.issued: 2022-07
dc.date.submitted: 2022-02
dc.description.abstract: A staggering number of gene products predicted from prokaryotic sequence data cannot be annotated by mainstream computational methods. This has led to an ever-growing number of gene products of unknown function in public databases. Evidence suggests that these gene products might provide biotechnological solutions to pressing societal concerns. In this thesis, we sought to determine whether the putative functions of these uncharacterized gene products are of biotechnological interest. To this end, we collected more than 134 million protein sequences of unknown function from public databases, thereby creating the first worldwide repository of prokaryotic gene products of unknown molecular function. Clustering these sequences yielded a representative dataset of ~12 million proteins. We annotated 99.97% of this dataset with at least one term. Of this dataset, 2.78% (351,917 sequences) were annotated with at least one Enzyme Commission (EC) number; we postulate that these are putative enzymes. The most abundant enzyme classes were transferases (182,797) and hydrolases (100,475). We also found that 9,622 putative enzymes may exhibit catalytic promiscuity or even multiple distinct catalytic functions. We then developed a new family of information-theoretic metrics that quantify the annotation content and quality of a given sequence. These metrics enabled us to systematically organize our dataset along distinct spectra of annotation, and they expedite the selection of the best-annotated sequences for subsequent experimental validation. Finally, we provide a proof of concept of the usefulness of this work by characterizing a putative enzyme subclass of great societal significance.
We conclude that there is a tremendous quantity and diversity of uncharacterized gene products whose predicted functions are directly linked to well-established biotechnological applications. This reservoir can be tapped now, potentially helping to address several societal needs. [pt_PT]
dc.identifier.tid: 101615310 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10451/55593
dc.language.iso: eng [pt_PT]
dc.relation: FCT/PD/00065/2012 [pt_PT]
dc.relation: Identification of biotechnological on genomic nonfunctionalized orthologs elements.
dc.subject: Genómica Funcional (Functional Genomics) [pt_PT]
dc.subject: Imputação de Função Molecular (Molecular Function Prediction) [pt_PT]
dc.subject: Matéria Negra Microbiana (Microbial Dark Matter) [pt_PT]
dc.subject: Conteúdo de Informação (Information Content) [pt_PT]
dc.subject: Biotecnologia (Biotechnology) [pt_PT]
dc.subject: Functional Genomics [pt_PT]
dc.subject: Molecular Function Prediction [pt_PT]
dc.subject: Microbial Dark Matter [pt_PT]
dc.subject: Information Content [pt_PT]
dc.subject: Biotechnology [pt_PT]
dc.title: Identification of biotechnological potential on genomic nonfunctionalized orthologs elements [pt_PT]
dc.type: doctoral thesis
dspace.entity.type: Publication
oaire.awardNumber: PD/BD/131416/2017
oaire.awardTitle: Identification of biotechnological on genomic nonfunctionalized orthologs elements.
oaire.awardURI: info:eu-repo/grantAgreement/FCT//PD%2FBD%2F131416%2F2017/PT
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: doctoralThesis [pt_PT]
relation.isProjectOfPublication: 25160a86-93d8-4b65-820d-1efb1f3499ce
relation.isProjectOfPublication.latestForDiscovery: 25160a86-93d8-4b65-820d-1efb1f3499ce
thesis.degree.name: Doctoral thesis, Biology (Systems Biology), Universidade de Lisboa, Faculdade de Ciências, 2022 [pt_PT]

Files

Main
Name: scnd740248_td_Pedro_Escudeiro.pdf
Size: 27.88 MB
Format: Adobe Portable Document Format
License
Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed upon to submission