Publication

A new generation of user-friendly and machine learning-accelerated methods for protein pKa calculations

datacite.subject.fos: Natural Sciences::Biological Sciences [pt_PT]
dc.contributor.advisor: Machuqueiro, Miguel Ângelo dos Santos
dc.contributor.advisor: Viçosa, Diogo Ruivo dos Santos Vila
dc.contributor.advisor: Rocchia, Walter
dc.contributor.author: Reis, Pedro B P S
dc.date.accessioned: 2023-10-17T14:20:00Z
dc.date.available: 2023-10-17T14:20:00Z
dc.date.issued: 2023-05
dc.date.submitted: 2022-10
dc.description.abstract: The ability to sense and react to external and internal pH changes is a survival requirement for any cell. pH homeostasis is tightly regulated, and even minor disruptions can severely impact cell metabolism, function, and survival. The pH dependence of proteins can be attributed to only 7 of the 20 canonical amino acids: the titratable amino acids, which can exchange protons with water in the usual 0-14 pH range. These amino acids make up approximately 31% of all amino acids in the human proteome. On average, then, roughly one-third of each protein is sensitive not only to the pH of the medium but also to changes in the electrostatics of its surroundings. Unsurprisingly, protonation switches have been associated with a wide array of protein behaviors. These include modulating binding affinity in protein-protein, protein-ligand, and protein-lipid systems, modifying enzymatic activity and function, and even altering protein stability and subcellular location. Despite this importance, our molecular understanding of pH-dependent effects in proteins and other biomolecules remains very limited, particularly in large macromolecular complexes such as protein-protein or membrane protein systems. Over the years, several classes of methods have been developed to provide molecular insight into the protonation preferences and pH dependence of biomolecules. Empirical methods offer cheap and competitive predictions when time or resources are constrained. Continuum electrostatics (CE)-based methods are more computationally expensive, but they are a cost-effective way to estimate microscopic equilibrium constants, pKhalf, and macroscopic pKa values. To study pH-dependent conformational transitions, constant-pH molecular dynamics (CpHMD) is the appropriate methodology.
Unfortunately, CE-based methods and CpHMD are computationally costly and, in many cases, difficult to use. As a result, most researchers overuse empirical methods or neglect the effect of pH in their studies. Here, we address these issues by proposing multiple pKa prediction methods and tools at different levels of theory, designed to be faster and accessible to more users. First, we introduced PypKa, a flexible tool to predict Poisson–Boltzmann/Monte Carlo-based (PB/MC) pKa values of titratable sites in proteins. It was validated against a large set of experimental values and showed competitive performance. PypKa supports parallel CPU computing. It can be applied directly to protein structures obtained from the Protein Data Bank (PDB) repository or from molecular dynamics (MD) simulations. A simple, reusable, and extensible Python API is provided, so pKa calculations can be added to existing protocols with a few extra lines of code. This capability was exploited in the development of PypKa-MD, an easy-to-use implementation of the stochastic titration CpHMD method. PypKa-MD supports the GROMOS and CHARMM force fields, as well as modern versions of GROMACS. Building on PypKa's API, and on the resulting abstraction of the PB/MC step, gave PypKa-MD a greatly simplified modular architecture that will serve as the foundation for future developments. The new implementation was validated on two systems: alanine-based tetrapeptides with closely interacting titratable residues, and four commonly used benchmark proteins. In both, its pKa predictions were highly similar to, and correlated with, those of a previously validated implementation. As in most structure-based computational studies, the majority of pKa calculations are performed on experimental structures deposited in the PDB. Furthermore, there is a growing imbalance between the scarce experimental pKa values and the rapidly increasing number of resolved structures.
To save the countless hours and resources that would otherwise be spent on repeated calculations, we released pKPDB. This database contains over 12M theoretical pKa values obtained by running PypKa on over 120k protein structures from the PDB. The precomputed pKa estimates can be retrieved instantly via our web application, the PypKa Server. If the protein of interest is not in pKPDB, users can run PypKa in the cloud in one of two ways: by uploading a custom structure, or by submitting a PDB or UniProtKB identifier. The server can also provide structures with representative pH-dependent protonation states for use in other computational methods, such as molecular dynamics. The advent of artificial intelligence in the biological sciences offered an opportunity to drastically accelerate pKa predictors using our previously generated database of pKa values. With pKAI, we introduced the first deep learning-based predictor of pKa shifts in proteins trained on continuum electrostatics data. pKAI combines a reasonable understanding of the underlying physics, an accuracy comparable to that of physics-based methods, and inference speedups of more than 1000×. It therefore provided a game-changing solution for fast estimation of macroscopic pKa values from ensembles of microscopic values. However, several limitations had to be addressed before pKAI could be integrated into the CpHMD framework as a replacement for PypKa. Hence, we proposed pKAI-MD, a new graph neural network for protein pKa prediction suitable for CpHMD. This model estimates pH-independent energies, which a Monte Carlo routine then uses to sample representative microscopic protonation states. While developing the new model, we explored different graph representations of proteins built from multiple electrostatics-driven properties.
Many new features remain to be introduced and many developments remain to be expanded. Even so, the methods and tools presented in this work represent a significant improvement over the alternatives and effectively constitute a new generation of user-friendly and machine learning-accelerated methods for pKa calculations. [pt_PT]
dc.identifier.tid: 101661797 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10451/59848
dc.language.iso: eng [pt_PT]
dc.relation: Improving the efficiency of constant-pH molecular dynamics simulations in complex biological systems
dc.subject: pKa [pt_PT]
dc.subject: Protonation [pt_PT]
dc.subject: Constant pH [pt_PT]
dc.subject: Machine learning [pt_PT]
dc.subject: Protonation [pt_PT]
dc.subject: Constant-pH [pt_PT]
dc.title: A new generation of user-friendly and machine learning-accelerated methods for protein pKa calculations [pt_PT]
dc.type: doctoral thesis
dspace.entity.type: Publication
oaire.awardNumber: SFRH/BD/136226/2018
oaire.awardTitle: Improving the efficiency of constant-pH molecular dynamics simulations in complex biological systems
oaire.awardURI: info:eu-repo/grantAgreement/FCT//SFRH%2FBD%2F136226%2F2018/PT
person.familyName: de Brito Pires Santos Reis
person.givenName: Pedro
person.identifier.ciencia-id: BA1A-8F17-5F20
person.identifier.orcid: 0000-0003-3563-6239
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: doctoralThesis [pt_PT]
relation.isAuthorOfPublication: 2c5580ce-a063-4c41-8a82-c060520a43e3
relation.isAuthorOfPublication.latestForDiscovery: 2c5580ce-a063-4c41-8a82-c060520a43e3
relation.isProjectOfPublication: e3ee07f0-ea72-4c82-85a0-9e8e42986fcf
relation.isProjectOfPublication.latestForDiscovery: e3ee07f0-ea72-4c82-85a0-9e8e42986fcf
thesis.degree.name: Doctoral thesis, Biochemistry (Molecular Biophysics), Universidade de Lisboa, Faculdade de Ciências, 2023 [pt_PT]
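The abstract describes PB/MC pKa calculations in which pH-independent energies feed a Monte Carlo routine that samples microscopic protonation states, and pKhalf values are then read from the resulting titration curves. What follows is a minimal, generic Python sketch of that idea. It is not PypKa's or pKAI-MD's implementation.

It assumes a textbook energy model. The intrinsic pKa values and pairwise interaction energies (in kT) are hypothetical input numbers; in a real PB/MC workflow they would come from Poisson–Boltzmann calculations. The function names `energy`, `exact_protonation`, `mc_protonation` and `pk_half` are illustrative only. States are sampled with Metropolis single-site flips and checked against exact enumeration. pKhalf is then estimated by linear interpolation of the titration curve.

```python
import math
import random
from itertools import product

# Generic PB/MC-style sketch (hypothetical model, not PypKa's implementation).
LN10 = math.log(10.0)


def energy(state, ph, pk_int, z, w):
    """Free energy (kT) of a protonation microstate at a given pH.

    state[i] is 1 (protonated) or 0 (deprotonated); z[i] is the charge of the
    deprotonated form (-1 for acids, 0 for bases); w[i][j] is a pairwise
    interaction energy in kT. All parameter values are hypothetical.
    """
    g = sum(LN10 * x * (ph - pk) for x, pk in zip(state, pk_int))
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            g += w[i][j] * (state[i] + z[i]) * (state[j] + z[j])
    return g


def exact_protonation(ph, pk_int, z, w):
    """Mean protonation per site by enumerating all 2**N microstates."""
    n = len(pk_int)
    partition, avg = 0.0, [0.0] * n
    for state in product((0, 1), repeat=n):
        b = math.exp(-energy(state, ph, pk_int, z, w))  # Boltzmann weight
        partition += b
        for k in range(n):
            avg[k] += b * state[k]
    return [a / partition for a in avg]


def mc_protonation(ph, pk_int, z, w, n_steps=20000, seed=0):
    """Mean protonation per site from Metropolis MC with single-site flips."""
    rng = random.Random(seed)
    n = len(pk_int)
    state = [1] * n
    g = energy(state, ph, pk_int, z, w)
    totals = [0] * n
    for _ in range(n_steps):
        trial = state.copy()
        i = rng.randrange(n)
        trial[i] = 1 - trial[i]  # flip the protonation of one site
        g_trial = energy(trial, ph, pk_int, z, w)
        # Metropolis criterion: accept downhill moves, uphill with exp(-dG)
        if g_trial <= g or rng.random() < math.exp(g - g_trial):
            state, g = trial, g_trial
        for k in range(n):
            totals[k] += state[k]
    return [t / n_steps for t in totals]


def pk_half(phs, fractions):
    """pH at which the protonated fraction crosses 0.5 (linear interpolation)."""
    points = list(zip(phs, fractions))
    for (p0, f0), (p1, f1) in zip(points, points[1:]):
        if f0 != f1 and (f0 - 0.5) * (f1 - 0.5) <= 0:
            return p0 + (0.5 - f0) * (p1 - p0) / (f1 - f0)
    return None


if __name__ == "__main__":
    # Two hypothetical acidic sites (intrinsic pKa 4.0 and 4.5) that repel
    # each other by 1 kT when both are deprotonated.
    pk_int, z, w = [4.0, 4.5], [-1, -1], [[0.0, 1.0], [1.0, 0.0]]
    phs = [i * 0.25 for i in range(57)]  # pH 0 to 14
    curves = [exact_protonation(ph, pk_int, z, w) for ph in phs]
    for site in range(len(pk_int)):
        fractions = [c[site] for c in curves]
        print(f"site {site}: pKhalf = {pk_half(phs, fractions):.2f}")
    print("MC at pH 4.5:", mc_protonation(4.5, pk_int, z, w))
    print("exact at pH 4.5:", exact_protonation(4.5, pk_int, z, w))
```

For a single non-interacting site, the estimated pKhalf recovers the intrinsic pKa, as Henderson–Hasselbalch predicts. A repulsive interaction between two acidic sites raises their pKhalf, because the doubly deprotonated state is penalized. Exact enumeration is only feasible for a handful of sites, which is why Monte Carlo sampling is used for whole proteins.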

Files

Main
Name: scnd741463_td_Pedro_Reis.pdf
Size: 21.25 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 1.2 KB
Format: Item-specific license agreed upon to submission
Description: