- DePristo MA (2007) The subtle benefits of being promiscuous: adaptive evolution potentiated by enzyme promiscuity. HFSP Journal. [PDF (302.95 KB)]
- DePristo MA, Hartl DL, Weinreich DM (2007) Mutational reversions during adaptive protein evolution. Mol Biol Evol. [Medline] [PDF (155.15 KB)]
Adaptation is often regarded as the sequential fixation of individually, intrinsically beneficial mutations. Contrary to this expectation, we find a surprisingly large number of evolutionary trajectories on which natural selection first favors a mutation, then favors its removal, and later still favors its ultimate restoration during the course of antibiotic resistance evolution. The existence of reversion trajectories implies that natural selection may not follow the most parsimonious path separating two alleles, even during adaptation. Altogether, this discovery highlights the unusual and potentially circuitous routes natural selection can follow during adaptation.
- Furnham N, Dore AS, Chirgadze DY, de Bakker PI, DePristo MA, Blundell TL (2006) Knowledge-based real-space explorations for low-resolution structure determination. Structure. [Medline] [PDF (442.79 KB)]
- Best RB, Lindorff-Larsen K, DePristo MA, Vendruscolo M (2006) Relation between native ensembles and experimental structures of proteins. PNAS. [Medline] [PDF (1.16 MB)]
Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of "high-sequence similarity Protein Data Bank" (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble.
- DePristo MA, Zilversmit MM, Hartl DL (2006) On the abundance, amino acid composition, and evolutionary dynamics of low-complexity regions in proteins. Gene. [Medline] [PDF (607.15 KB)]
Protein sequences frequently contain regions composed of a reduced number of amino acids. Despite their presence in about half of all proteins and their unusual prevalence in the malaria parasite Plasmodium falciparum, the function and evolution of such low-complexity regions (LCRs) remain unclear. Here we show that LCR abundance and amino acid composition depend largely, but not exclusively, on genomic A+T content and obey power-law growth dynamics. Further, our results indicate that LCRs are analogous to microsatellites in that DNA replication slippage and unequal crossover recombination are important molecular mechanisms for LCR expansion. We support this hypothesis by demonstrating that the size of LCR insertions/deletions among orthologous genes depends upon length. Moreover, we show that LCRs enable intra-exonic recombination in a key family of cell-surface antigens in P. falciparum and thus likely facilitate the generation of antigenic diversity. We conclude with a mechanistic model for LCR evolution that links the pattern of LCRs within P. falciparum to its high genomic A+T content and recombination rate.
- Marcaida MJ, DePristo MA, Chandran V, Carpousis AJ, Luisi BF (2006) The RNA degradosome: life in the fast lane of adaptive molecular evolution. Trends Biochem Sci. [Medline] [PDF (306.41 KB)]
In Escherichia coli, the multi-enzyme RNA degradosome contributes to the global, posttranscriptional regulation of gene expression. The degradosome components are recognized through natively unstructured microdomains comprising as few as 15-40 amino acids. Consequently, the degradosome might experience a comparatively smaller number of evolutionary constraints, because there is little requirement to maintain a folded state for the interaction sites. New regulatory properties of the degradosome could arise with relative rapidity, because partners that modify its function could be recruited by quickly evolving microdomains. The unusual combination of the centrality of RNA degradation in gene expression and the generality of natively unstructured microdomains in recognition can fortuitously confer a capacity for efficacious adaptive change to degradosome-like assemblies in eubacteria.
- Weinreich DM, Delaney NF, DePristo MA, Hartl DL (2006) Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312 (5770) 111-114. [Medline] [PDF (186.04 KB)]
Five point mutations in a particular beta-lactamase allele jointly increase bacterial resistance to a clinically important antibiotic by a factor of approximately 100,000. In principle, evolution to this high-resistance beta-lactamase might follow any of the 120 mutational trajectories linking these alleles. However, we demonstrate that 102 trajectories are inaccessible to Darwinian selection and that many of the remaining trajectories have negligible probabilities of realization, because four of these five mutations fail to increase drug resistance in some combinations. Pervasive biophysical pleiotropy within the beta-lactamase seems to be responsible, and because such pleiotropy appears to be a general property of missense mutations, we conclude that much protein evolution will be similarly constrained. This implies that the protein tape of life may be largely reproducible and even predictable.
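The counting in this abstract can be sketched in a few lines: five mutations can be fixed in 5! = 120 different orders, and an order is accessible to selection only if every single step increases fitness. The example below is a minimal illustration under stated assumptions, not the authors' analysis. The landscape `toy_fitness` is invented for the example (it is not the measured resistance data), and the names `is_accessible` and `count_accessible` are hypothetical.

```python
from itertools import permutations

N_MUTATIONS = 5  # number of point mutations separating the two alleles


def toy_fitness(genotype):
    """Hypothetical fitness landscape (NOT the paper's measured data).

    Each mutation adds 1, but mutation 0 carries a penalty of 2 unless
    mutation 1 is already present (a simple case of sign epistasis).
    """
    fitness = len(genotype)
    if 0 in genotype and 1 not in genotype:
        fitness -= 2
    return fitness


def is_accessible(order, fitness):
    """True if every step along the mutational order strictly raises fitness."""
    genotype = frozenset()
    current = fitness(genotype)
    for mutation in order:
        genotype = genotype | {mutation}
        following = fitness(genotype)
        if following <= current:
            return False
        current = following
    return True


def count_accessible(n_mutations, fitness):
    """Count the orderings of n_mutations that are selectively accessible."""
    return sum(
        is_accessible(order, fitness)
        for order in permutations(range(n_mutations))
    )


total = sum(1 for _ in permutations(range(N_MUTATIONS)))
print(total)                                       # 120 orderings (5!)
print(count_accessible(N_MUTATIONS, toy_fitness))  # 60 under this toy landscape
```

In this toy landscape a single epistatic interaction already rules out half of the orderings, namely every order in which mutation 0 arises before mutation 1.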
- Furnham N, Blundell TL, DePristo MA, Terwilliger TC (2006) Is one solution good enough? Nature Structural & Molecular Biology 13 (3) 184-185. [Medline] [PDF (82.96 KB)]
- Paul I.W. de Bakker, Nick Furnham, Tom L. Blundell, Mark A. DePristo (2006) Conformer generation under restraints. Current Opinion in Structural Biology 16 (2) 160-165. [Medline] [PDF (1.64 MB)]
Conformational sampling by direct optimization of an all-atom energy function is ineffective and inefficient because of the ruggedness of the energy landscape. Discrete sampling schemes represent an attractive alternative for generating ensembles of conformers consistent with spatial restraints derived from empirical data. Conformational sampling is becoming increasingly important for structure prediction as the bottleneck in accurate prediction shifts from energy functions to the methods used to find low-energy conformers. Experimental structure determination remains a perennial challenge as investigators tackle larger macromolecular systems, and begin to incorporate more complete descriptions of uncertainty, heterogeneity and dynamics into their models. Computational approaches that combine dense, discrete sampling with all-atom energy evaluation and refinement may help to overcome the remaining barriers to solving these problems.
- Mark A. DePristo, Paul I.W. de Bakker, Russell J. Johnson, Tom L. Blundell (2005) Crystallographic refinement by knowledge-based exploration of complex energy landscapes. Structure 13 (9) 1311-1319. [Medline] [PDF (504.96 KB)]
Although X-ray crystallography remains the most versatile method to determine the three-dimensional atomic structure of proteins and much progress has been made in model building and refinement techniques, it remains a challenge to elucidate accurately the structure of proteins in medium-resolution crystals. This is largely due to the difficulty of exploring an immense conformational space to identify the set of conformers that collectively best fits the experimental diffraction pattern. We show here that combining knowledge-based conformational sampling in RAPPER with molecular dynamics/simulated annealing (MD/SA) vastly improves the quality and power of refinement compared to MD/SA alone. The utility of this approach is highlighted by the automated determination of a lysozyme mutant from a molecular replacement solution that is in congruence with a model prepared independently by crystallographers. Finally, we discuss the implications of this work on structure determination in particular and conformational sampling and energy minimization in general.
This article has been highlighted in a number of journals!
- Randy Read (2005) Liberating Crystallographers. Structure 13(9) 1236-1237. [PDF (70.75 KB)]
- Mark A. DePristo, Dan M. Weinreich, Daniel L. Hartl (2005) Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Reviews Genetics 6 (9) 678-687. [Medline] [PDF (674.49 KB)]
Proteins are finicky molecules; they are barely stable and are prone to aggregate, but they must function in a crowded environment that is full of degradative enzymes bent on their destruction. It is no surprise that many common diseases are due to missense mutations that affect protein stability and aggregation. Here we review the literature on biophysics as it relates to molecular evolution, focusing on how protein stability and aggregation affect organismal fitness. We then advance a biophysical model of protein evolution that helps us to understand phenomena that range from the dynamics of molecular adaptation to the clock-like rate of protein evolution.
- Kresten Lindorff-Larsen, Robert B. Best, Mark A. DePristo, Christopher M. Dobson & Michele Vendruscolo (2005) Simultaneous determination of protein structure and dynamics. Nature 433 (7022) 128-32. [Medline] [PDF (293.59 KB)]
We present a protocol for the experimental determination of ensembles of protein conformations that represent simultaneously the native structure and its associated dynamics. The procedure combines the strengths of nuclear magnetic resonance spectroscopy--for obtaining experimental information at the atomic level about the structural and dynamical features of proteins--with the ability of molecular dynamics simulations to explore a wide range of protein conformations. We illustrate the method for human ubiquitin in solution and find that there is considerable conformational heterogeneity throughout the protein structure. The interior atoms of the protein are tightly packed in each individual conformation that contributes to the ensemble but their overall behaviour can be described as having a significant degree of liquid-like character. The protocol is completely general and should lead to significant advances in our ability to understand and utilize the structures of native proteins.
- M.A. DePristo, P.I.W. de Bakker, T.L. Blundell (2004) Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. Structure 12(5) 831-838. [Medline] [PDF (280.16 KB)]
Proteins are dynamic molecules, exhibiting structural heterogeneity in the form of anisotropic motion and discrete conformational substates, often of functional importance. In protein structure determination by X-ray crystallography, the observed diffraction pattern results from the scattering of X-rays by an ensemble of heterogeneous molecules, ordered and oriented by packing in a crystal lattice. The majority of proteins diffract to resolutions where heterogeneity is difficult to identify and model, and are therefore approximated by a single, average conformation with isotropic variance. Here we show that disregarding structural heterogeneity introduces degeneracy into the structure determination process, as many single, isotropic models exist that explain the diffraction data equally well. The large differences among these models imply that the accuracy of crystallographic structures has been widely overestimated. Further, it suggests that analyses that depend on small differences in the relative positions of atoms may be flawed.
Our article has been highlighted in a number of journals!
- Uncertainty in Structures (2004) Journal of Cell Biology 165(5) 605. [PDF (270.68 KB)]
- Jane Richardson (2004) The Protein Surface is a Moving Target. Structure 12(6) 912-913. [PDF (137.65 KB)]
- Joanna Owens (2004) Not so crystal clear. Nature Reviews Drug Discovery 3 552. [PDF (383.85 KB)]
- R.P. Shetty, P.I.W. de Bakker, M.A. DePristo, T.L. Blundell (2003) On the advantages of fine-grained side chain conformer library for protein modelling. Protein Engineering 16(12) 963-969. [Medline] [PDF (173.61 KB)]
We compare the modelling accuracy of two common rotamer libraries--the Dunbrack-Cohen and the "Penultimate" rotamer libraries--to that of a novel library of discrete side chain conformations extracted from the Protein Data Bank. These side chain conformer libraries are extracted automatically from high-quality protein structures using stringent filters, and maintain crystallographic bond lengths and angles. This contrasts with traditional rotamer libraries defined in terms of χ angles under the assumption of idealized covalent geometry. We demonstrate that side chain modelling onto native and near-native main chain conformations is significantly more successful with the conformer libraries than with the rotamer libraries when solely considering excluded-volume interactions. The rotamer libraries are inadequate to model side chains without atomic clashes on over 20% of targets if the backbone is held fixed in the native conformation. An algorithm is described for simultaneously modelling both main chain and side chain atoms during discrete ab initio sampling. The resulting models have equivalent root-mean-square deviations from the experimentally determined protein loops as models from backbone-only ensembles, indicating that all-atom modelling does not detract from the accuracy of conformational sampling.
- M.A. DePristo, P.I.W. de Bakker, R.P. Shetty, T.L. Blundell. (2003) Discrete restraint-based protein modeling and the Cα-trace problem. Protein Science 12(9):2032-46 [Medline] [PDF (251.4 KB)]
We present a novel de novo method to generate protein models from sparse, discretized restraints on the conformation of the main chain and side chain atoms. We focus on Cα-trace generation, the problem of constructing an accurate and complete model from approximate knowledge of the positions of the Cα atoms and, in some cases, the side chain centroids. Spatial restraints on the Cα atoms and side chain centroids are supplemented by constraints on main chain geometry, φ/ψ angles, rotameric side chain conformations, and inter-atomic separations derived from analyses of known protein structures. A novel conformational search algorithm, combining features of tree-search and genetic algorithms, generates models consistent with these restraints by propensity-weighted dihedral angle sampling. Models with ideal geometry, good φ/ψ angles, and no inter-atomic overlaps are produced with 0.8 Å main chain and, with side chain centroid restraints, 1.0 Å all-atom root-mean-square deviation (RMSD) from the crystal structure over a diverse set of target proteins. The mean model derived from 50 independently generated models is closer to the crystal structure than any individual model, with 0.5 Å main chain RMSD under only Cα restraints and 0.7 Å all-atom RMSD under both Cα and centroid restraints. The method is insensitive to randomly distributed errors of up to 4 Å in the Cα restraints. The conformational search algorithm is efficient, with computational cost increasing linearly with protein size. Issues relating to decoy set generation, experimental structure determination, efficiency of conformational sampling, and homology modeling are discussed.
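The RMSD figures and the "mean model" mentioned in this abstract can be illustrated with a small sketch. It assumes that all models are already expressed in the same coordinate frame as the reference, so no superposition is performed, and that atoms are listed in the same order in every model. The function names `rmsd` and `mean_model` and the three-atom coordinates are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
from math import sqrt


def rmsd(model, reference):
    """RMSD between two equal-length lists of (x, y, z) tuples, without superposition."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model, reference)
        for a, b in zip(atom_m, atom_r)
    )
    return sqrt(squared / len(model))


def mean_model(models):
    """Average the coordinates of each atom across all models."""
    n_models = len(models)
    return [
        tuple(sum(axis) / n_models for axis in zip(*atoms))
        for atoms in zip(*models)
    ]


# Toy data: two models displaced from the reference in opposite directions.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
model_a = [(x + 0.5, y, z) for x, y, z in reference]
model_b = [(x - 0.5, y, z) for x, y, z in reference]

print(rmsd(model_a, reference))                         # 0.5
print(rmsd(mean_model([model_a, model_b]), reference))  # 0.0
```

Here the errors of the two toy models cancel exactly, which is the idealized version of why an averaged model can sit closer to the reference than any individual one. Averaged coordinates generally no longer have ideal bond geometry, so this is a comparison device, not a refined structure.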
- P.I.W. de Bakker, M.A. DePristo, D.F. Burke, T.L. Blundell (2003) Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins: Structure, Function, and Genetics 51:21-40 [Medline] [PDF (2.09 MB)]
The accuracy of model selection from decoy ensembles of protein loop conformations was explored by comparing the performance of the Samudrala-Moult all-atom statistical potential (RAPDF) and the AMBER molecular mechanics force field, including the Generalized Born/surface area solvation model. Large ensembles of consistent loop conformations, represented at atomic detail with idealized geometry, were generated for a large test set of protein loops 2 to 12 residues long by a novel ab initio method called RAPPER that relies on fine-grained residue-specific φ/ψ propensity tables for conformational sampling. Ranking the conformers on the basis of RAPDF scores resulted in selected conformers that had an average global, non-superimposed RMSD for all heavy mainchain atoms ranging from 1.2 Å for 4-mers to 2.9 Å for 8-mers to 6.2 Å for 12-mers. After filtering on the basis of anchor geometry and RAPDF scores, ranking by energy minimization of the AMBER/GBSA potential energy function selected conformers that had global RMSD values of 0.5 Å for 4-mers, 2.3 Å for 8-mers, and 5.0 Å for 12-mers. Minimized fragments had, on average, consistently lower RMSD values (by 0.1 Å) than their initial conformations. The importance of the Generalized Born solvation energy term is reflected by the observation that the average RMSD accuracy for all loop lengths was worse when this term was omitted. There are, however, still many cases where the AMBER gas-phase minimization selected conformers of lower RMSD than the AMBER/GBSA minimization. The AMBER/GBSA energy function had better correlation with RMSD to native than the RAPDF. When the ensembles were supplemented with conformations extracted from experimental structures, a dramatic improvement in selection accuracy was observed at longer lengths (average RMSD of 1.3 Å for 8-mers) when scoring with the AMBER/GBSA force field. This work provides the basis for a promising hybrid approach of ab initio and knowledge-based methods for loop modeling.
- M.A. DePristo, P.I.W. de Bakker, S.C. Lovell, T.L. Blundell (2003) Ab initio construction of polypeptide fragments: Efficient generation of accurate, representative ensembles. Proteins: Structure, Function, and Genetics 51:41-55 [Medline] [PDF (408.48 KB)]
We describe a novel method to generate ensembles of conformations of the main-chain atoms [N, Cα, C, O, Cβ] for a sequence of amino acids within the context of a fixed protein framework. Each conformation satisfies fundamental stereo-chemical restraints such as idealized geometry, favorable phi/psi angles, and excluded volume. The ensembles include conformations both near and far from the native structure. Algorithms for effective conformational sampling and constant time overlap detection permit the generation of thousands of distinct conformations in minutes. Unlike previous approaches, our method samples dihedral angles from fine-grained phi/psi state sets, which we demonstrate is superior to exhaustive enumeration from coarse phi/psi sets. Applied to a large set of loop structures, our method samples consistently near-native conformations, averaging 0.4, 1.1, and 2.2 Å main-chain root-mean-square deviations for four, eight, and twelve residue long loops, respectively. The ensembles make ideal decoy sets to assess the discriminatory power of a selection method. Using these decoy sets, we conclude that quality of anchor geometry cannot reliably identify near-native conformations, though the selection results are comparable to previous loop prediction methods. In a subsequent study (de Bakker et al.: Proteins 2003;51:21-40), we demonstrate that the AMBER forcefield with the Generalized Born solvation model identifies near-native conformations significantly better than previous methods.
- M.A. DePristo, R. Zubek (2001) being-in-the-world. In Proceedings of the 2001 AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment. [PDF (23.43 KB)]
being-in-the-world is an intelligent agent capable of living autonomously in a Multi-User Dungeon world. In this paper we present a hybrid-architecture approach to building such an agent, discuss the successes and pitfalls of this technique, and potential improvements.
- M.A. DePristo. (2000) SINTL: A Strongly-Typed Generic Intermediate Language for Scheme. Northwestern University, Computer Science Honors Thesis. [PDF (3.39 MB)]
This paper describes SINTL, a strongly-typed generic intermediate language for Scheme. The paper begins by outlining the motivation for developing a sophisticated intermediate language for a new Scheme compiler. We consider two novel aspects of SINTL as an intermediate language for Scheme: a declarative type system and type inference algorithm, and a register to stack machine conversion algorithm. We then demonstrate the effectiveness of these two techniques by discussing the JVM-SINTL backend, a fully functional backend to the Java Virtual Machine. Following a discussion of the representational choices and compilation techniques used in JVM-SINTL, we compare the performance of the JVM-SINTL backend relative to several popular Scheme compilers. Ultimately, this thesis demonstrates that a sophisticated compiler with well-selected analyses and optimizations can generate high-quality code running on the JVM, and achieve performance an order of magnitude better than current Scheme compilers to the JVM.