| Name | Description | Size | Format |
|---|---|---|---|
| | | 27.88 MB | Adobe PDF |
Advisor(s)
Abstract(s)
A staggering number of gene products inferred from prokaryotic sequence data cannot be annotated by mainstream computational methods. As a result, the number of gene products of unknown function in public databases keeps growing. Evidence suggests that these gene products might provide biotechnological solutions to pressing societal concerns.
In this thesis we sought to understand whether the putative functions of these uncharacterized gene products are of biotechnological interest. To this end, we collected >134 million protein sequences of unknown function from public databases. By doing so, we created the first worldwide repository of prokaryotic gene products of unknown molecular function. Upon clustering these sequences, we generated a representative dataset containing ~12 million proteins. We annotated 99.97% of this dataset with at least one term. Of these, 2.78% (351,917) were annotated with at least one Enzyme Commission (EC) number; we postulate that these are putative enzymes. The most abundant enzyme classes were Transferases (182,797) and Hydrolases (100,475). We also found that 9,622 putative enzymes might exhibit catalytic promiscuity or multiple catalytic functions. We then developed a new family of information-theoretic metrics that quantify the annotation content and quality of a given sequence. These metrics enabled us to systematize our dataset along distinct spectra of annotation, and they expedite the selection of the best-annotated sequences for subsequent experimental validation. Finally, we provide a proof of concept for the usefulness of this work by characterizing a putative enzyme subclass of utmost societal significance.
We conclude that there is both a tremendous quantity and diversity of uncharacterized gene products whose predicted functions are directly linked to well-established biotechnological applications. This reservoir can be tapped now, conceivably helping to address several societal demands.
Description
Keywords
Functional Genomics; Molecular Function Prediction; Microbial Dark Matter; Information Content; Biotechnology
