A new generation of user-friendly and machine learning-accelerated methods for protein pKa calculations

Reis, Pedro B P S

Publication

2023-05, Doctoral thesis

datacite.subject.fos	Natural Sciences::Biological Sciences	pt_PT
dc.contributor.advisor	Machuqueiro, Miguel Ângelo dos Santos
dc.contributor.advisor	Viçosa, Diogo Ruivo dos Santos Vila
dc.contributor.advisor	Rocchia, Walter
dc.contributor.author	Reis, Pedro B P S
dc.date.accessioned	2023-10-17T14:20:00Z
dc.date.available	2023-10-17T14:20:00Z
dc.date.issued	2023-05
dc.date.submitted	2022-10
dc.description.abstract	The ability to sense and react to external and internal pH changes is a survival requirement for any cell. pH homeostasis is tightly regulated, and even minor disruptions can severely impact cell metabolism, function, and survival. The pH dependence of proteins can be attributed to only 7 of the 20 canonical amino acids: the titratable amino acids, which can exchange protons with water in the usual 0-14 pH range. These amino acids make up approximately 31% of all amino acids in the human proteome, meaning that, on average, roughly one-third of each protein is sensitive not only to the medium pH but also to alterations in the electrostatics of its surroundings. Unsurprisingly, protonation switches have been associated with a wide array of protein behaviors, including modulating the binding affinity in protein-protein, protein-ligand, or protein-lipid systems, modifying enzymatic activity and function, and even altering protein stability and subcellular location. Despite its importance, our molecular understanding of pH-dependent effects in proteins and other biomolecules is still very limited, particularly in large macromolecular complexes such as protein-protein or membrane protein systems. Over the years, several classes of methods have been developed to provide molecular insights into the protonation preference and pH dependence of biomolecules. Empirical methods offer cheap and competitive predictions for time- or resource-constrained situations. Albeit more computationally expensive, continuum electrostatics (CE)-based methods are a cost-effective solution for estimating microscopic equilibrium constants, pKhalf, and macroscopic pKa values. To study pH-dependent conformational transitions, constant-pH molecular dynamics (CpHMD) is the appropriate methodology.
Unfortunately, given the computational cost and, in many cases, the difficulty of using CE-based and CpHMD methods, most researchers overuse empirical methods or neglect the effect of pH in their studies. Here, we address these issues by proposing multiple pKa prediction methods and tools, at different levels of theory, designed to be faster and accessible to more users. First, we introduced PypKa, a flexible tool to predict Poisson–Boltzmann/Monte Carlo-based (PB/MC) pKa values of titratable sites in proteins. It was validated against a large set of experimental values and exhibited competitive performance. PypKa supports CPU parallel computing and can be used directly on proteins obtained from the Protein Data Bank (PDB) repository or from molecular dynamics (MD) simulations. A simple, reusable, and extensible Python API is provided, allowing pKa calculations to be added to existing protocols with a few extra lines of code. This capability was exploited in the development of PypKa-MD, an easy-to-use implementation of the stochastic titration CpHMD method. PypKa-MD supports GROMOS and CHARMM force fields, as well as modern versions of GROMACS. Using PypKa’s API, and the consequent abstraction of the PB/MC calculations, contributed to its greatly simplified modular architecture, which will serve as the foundation for future developments. The new implementation was validated on alanine-based tetrapeptides with closely interacting titratable residues and on four commonly used benchmark proteins, displaying highly similar and correlated pKa predictions compared to a previously validated implementation. Like most structure-based computational studies, the majority of pKa calculations are performed on experimental structures deposited in the PDB. Furthermore, there is an ever-growing imbalance between the scarce experimental pKa values and the increasing number of resolved structures.
To save countless hours and resources that would otherwise be spent on repeated calculations, we have released pKPDB, a database of over 12M theoretical pKa values obtained by running PypKa over 120k protein structures from the PDB. The precomputed pKa estimations can be retrieved instantaneously via our web application, the PypKa Server. If the protein of interest is not in pKPDB, the user may easily run PypKa in the cloud, either by uploading a custom structure or by submitting an identifier code from the PDB or UniProtKB. It is also possible to use the server to obtain structures with representative pH-dependent protonation states to be used in other computational methods, such as molecular dynamics. The advent of artificial intelligence in the biological sciences presented an opportunity to drastically accelerate pKa predictors using our previously generated database of pKa values. With pKAI, we introduced the first deep learning-based predictor of pKa shifts in proteins trained on continuum electrostatics data. By combining a reasonable understanding of the underlying physics, an accuracy comparable to that of physics-based methods, and inference time speedups of more than 1000×, pKAI provided a game-changing solution for fast estimations of macroscopic pKa values from ensembles of microscopic values. However, several limitations needed to be addressed before it could be integrated within the CpHMD framework as a replacement for PypKa. Hence, we proposed pKAI-MD, a new graph neural network for protein pKa predictions suitable for CpHMD. This model estimates pH-independent energies to be used in a Monte Carlo routine to sample representative microscopic protonation states. While developing the new model, we explored different graph representations of proteins using multiple electrostatics-driven properties.
While there are certainly many new features to be introduced and many developments still to be pursued, the selection of methods and tools presented in this work represents a significant improvement over the alternatives and effectively constitutes a new generation of user-friendly and machine learning-accelerated methods for pKa calculations.	pt_PT
dc.identifier.tid	101661797	pt_PT
dc.identifier.uri	http://hdl.handle.net/10451/59848
dc.language.iso	eng	pt_PT
dc.relation	Melhoramento da eficiência das simulações de dinâmica molecular a pH constante em sistemas biológicos complexos
dc.subject	pKa	pt_PT
dc.subject	Protonação	pt_PT
dc.subject	pH constante	pt_PT
dc.subject	Machine learning	pt_PT
dc.subject	Protonation	pt_PT
dc.subject	Constant-pH	pt_PT
dc.title	A new generation of user-friendly and machine learning-accelerated methods for protein pKa calculations	pt_PT
dc.type	doctoral thesis
dspace.entity.type	Publication
oaire.awardNumber	SFRH/BD/136226/2018
oaire.awardTitle	Melhoramento da eficiência das simulações de dinâmica molecular a pH constante em sistemas biológicos complexos
oaire.awardURI	info:eu-repo/grantAgreement/FCT//SFRH%2FBD%2F136226%2F2018/PT
person.familyName	de Brito Pires Santos Reis
person.givenName	Pedro
person.identifier.ciencia-id	BA1A-8F17-5F20
person.identifier.orcid	0000-0003-3563-6239
project.funder.identifier	http://doi.org/10.13039/501100001871
project.funder.name	Fundação para a Ciência e a Tecnologia
rcaap.rights	openAccess	pt_PT
rcaap.type	doctoralThesis	pt_PT
relation.isAuthorOfPublication	2c5580ce-a063-4c41-8a82-c060520a43e3
relation.isAuthorOfPublication.latestForDiscovery	2c5580ce-a063-4c41-8a82-c060520a43e3
relation.isProjectOfPublication	e3ee07f0-ea72-4c82-85a0-9e8e42986fcf
relation.isProjectOfPublication.latestForDiscovery	e3ee07f0-ea72-4c82-85a0-9e8e42986fcf
thesis.degree.name	Doctoral thesis, Biochemistry (Molecular Biophysics), Universidade de Lisboa, Faculdade de Ciências, 2023	pt_PT
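
The abstract describes sampling protonation states with a Monte Carlo routine and reading pKhalf values off the resulting titration curves. As a rough illustration of that idea only, the following minimal sketch runs a Metropolis simulation over binary protonation states. It is not PypKa's or pKAI-MD's code or API: the function names, the intrinsic pKa values (`pk_int`), and the pairwise interaction matrix (`coupling`, in kT units) are all hypothetical, and the energy function is the generic textbook form rather than anything taken from the thesis.

```python
import math
import random

LN10 = math.log(10.0)


def state_energy(state, pH, pk_int, coupling):
    """Energy, in kT units, of one protonation state (1 = protonated).

    pk_int and coupling are hypothetical inputs chosen for illustration.
    """
    energy = 0.0
    for i, protonated in enumerate(state):
        if not protonated:
            continue
        # Cost of holding a proton at this pH, relative to the deprotonated site.
        energy += LN10 * (pH - pk_int[i])
        # Pairwise term, counted once per pair of protonated sites.
        for j in range(i):
            if state[j]:
                energy += coupling[i][j]
    return energy


def mean_protonation(pH, pk_int, coupling, n_steps=20000, seed=0):
    """Metropolis estimate of the average protonation of each site."""
    rng = random.Random(seed)
    n_sites = len(pk_int)
    state = [1] * n_sites
    energy = state_energy(state, pH, pk_int, coupling)
    totals = [0] * n_sites
    for _ in range(n_steps):
        site = rng.randrange(n_sites)
        state[site] ^= 1  # trial move: flip one site
        trial = state_energy(state, pH, pk_int, coupling)
        delta = trial - energy
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            energy = trial  # accept the flip
        else:
            state[site] ^= 1  # reject: undo the flip
        for k in range(n_sites):
            totals[k] += state[k]
    return [t / n_steps for t in totals]


def pk_half(site, pk_int, coupling, step=0.25):
    """pH at which the curve of `site` crosses 0.5 (linear interpolation)."""
    prev_ph, prev_occ = None, None
    n_points = int(round(14.0 / step)) + 1
    for idx in range(n_points):
        ph = idx * step
        occ = mean_protonation(ph, pk_int, coupling)[site]
        if prev_occ is not None and prev_occ > 0.5 >= occ:
            return prev_ph + step * (prev_occ - 0.5) / (prev_occ - occ)
        prev_ph, prev_occ = ph, occ
    return None  # no crossing inside the 0-14 pH range


if __name__ == "__main__":
    # Two hypothetical sites that penalize simultaneous protonation by 2 kT.
    pk_int = [6.0, 6.0]
    coupling = [[0.0, 2.0], [2.0, 0.0]]
    print(mean_protonation(6.0, pk_int, coupling))
    print(pk_half(0, pk_int, coupling))
```

In this toy model a positive coupling lowers the average protonation of both sites at a given pH, so the pKhalf of each site shifts away from its intrinsic value. Per the abstract, a real PB/MC calculation would obtain the site energies from Poisson–Boltzmann electrostatics, which this sketch does not attempt.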

Files

Main

Name: scnd741463_td_Pedro_Reis.pdf
Size: 21.25 MB
Format: Adobe Portable Document Format

Collections

FC - Teses de Doutoramento