Computational Methods To Study The Structure and Dynamics of Biomolecules and Biomolecular Processes
Computational Methods
to Study the Structure
and Dynamics of
Biomolecules and
Biomolecular Processes
From Bioinformatics to Molecular
Quantum Mechanics
Second Edition
Springer Series on Bio- and Neurosystems
Volume 8
Series editor
Nikola Kasabov, Knowledge Engineering and Discovery Research Institute,
Auckland University of Technology, Penrose, New Zealand
The Springer Series on Bio- and Neurosystems publishes fundamental principles
and state-of-the-art research at the intersection of biology, neuroscience, informa-
tion processing and the engineering sciences. The series covers general informatics
methods and techniques, together with their use to answer biological or medical
questions. Of interest are both basics and new developments on traditional methods
such as machine learning, artificial neural networks, statistical methods, nonlinear
dynamics, information processing methods, and image and signal processing. New
findings in biology and neuroscience obtained through informatics and engineering
methods, topics in systems biology, medicine, neuroscience and ecology, as well as
engineering applications such as robotic rehabilitation, health information tech-
nologies, and many more, are also examined. The main target group includes
informaticians and engineers interested in biology, neuroscience and medicine, as
well as biologists and neuroscientists using computational and engineering tools.
Volumes published in the series include monographs, edited volumes, and selected
conference proceedings. Books purposely devoted to supporting education at the
graduate and post-graduate levels in bio- and neuroinformatics, computational
biology and neuroscience, systems biology, systems neuroscience and other related
areas are of particular interest.
Editor
Adam Liwo
Faculty of Chemistry
University of Gdańsk
Gdańsk, Poland
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface to the Second Edition
In silico studies of biomolecular systems are now routinely performed to aid
experiment, as well as to gain knowledge of the systems and the processes that
occur in them, in situations in which the experiment requires too much cost and labor
or gives fragmentary information (e.g., the details of protein dynamics). Such
studies constitute a truly interdisciplinary field, which comprises quantum
mechanics, molecular physics, molecular biology, numerical mathematics, and
computer science; this makes it virtually impossible for anyone to be an expert in
all these diverse domains.
The motivation behind the shape and structure of the book, starting from its first
edition published 4 years ago, was the old Latin proverb, which says “Verba
docent, exempla trahunt” or, slightly rephrasing, it is best to learn by looking at
good examples. Therefore, this book is a collection of chapters written by leading
scientists in the field, who are developers of the methods or experts in applying the
existing methods to solve concrete problems. As in the first edition of this book, the
chapters are grouped into four thematic sections (methodology, applications of
molecular simulations, bioinformatics, and molecular quantum mechanics), plus the
introduction written by Harold A. Scheraga, one of the pioneers of the
application of theoretical methods in studying biological systems. The book is
addressed both to end users and to method developers; researchers who are starting
to apply or develop computational methods can learn, from the case studies
reported in the consecutive chapters, how to proceed and how to avoid errors, while
advanced researchers in the field can pick up good solutions.
The considerable attention that the first edition of the book received was the motivation
to work on the second one. Because the field is advancing rapidly, many chapters were
updated, often extended in scope. These are the chapters authored by the scientists
from the laboratories of Andrzej Koliński, Mariusz Makowski, Joanna Trylska,
Ulrich Hansmann, Marek Cieplak, Marta Pasenkiewicz-Gierula, Sławomir Filipek,
Anders Irbäck, Patrick Senet, Istvan Simon, Irena Roterman, and Giovanni La Penna.
Two more chapters have been added, one about all-atom MD studies of peptide
aggregation, authored by Maksim Kouza, Andrzej Koliński, Irina Buhimschi, and
Andrzej Kloczkowski, and another one, pertaining to the bioinformatics section,
Preface to the First Edition
Since the second half of the twentieth century, machine computations have played a
continuously increasing role in science and engineering. Computer simulations are
particularly important in studying biological systems at the molecular level, because
they are often the only way to get an idea of the behavior of the whole system. The
difference in timescale and size scale, as well as in the required accuracy of
description, demands the use of different approaches, from comparative analysis of
sequence and structural databases or analyzing the networks of interdependence
between cell components and processes, through coarse-grained modeling where
individual molecules come into play, although at an approximate level, to atomi-
cally detailed simulations and, finally, molecular quantum mechanics.
Aside from contributing to our understanding of the complex machinery of living
cells and organisms, the computation of three-dimensional structure and dynamic
behavior of biomacromolecules and their complexes with ligands is slowly
becoming an alternative to expensive screening experiments, which are vital in the
search for lead compounds in drug design. The variety of available techniques made
it necessary to set up systems with which to test the progress of existing approaches
and the quality of new ones. For the prediction of protein structure, such a system,
known as the Community-Wide Experiment on the Critical Assessment of Techniques
for Protein Structure Prediction (CASP; see http://www.predictioncenter.org), was
established in 1994 by John Moult and colleagues, and the 10th edition of this
experiment was held in 2012. Similar systems to test the performance of
protein-docking algorithms (CAPRI), the prediction of crystal structures of small
organic molecules, and the prediction of RNA structures were established later,
following the successful example of CASP. Consequently,
the computational techniques are constantly subject to rigorous verification.
This book provides an overview of modern computer-based techniques for the
calculations of structure, properties, and dynamics of biomolecules and biomolec-
ular processes. Its 22 chapters have been contributed by leading scientists from all
over the world and address computer simulation techniques for studying biological
phenomena from the perspective of both methodology and applications. The
chapters are grouped into four thematic issues on the methodology of molecular
protein conformation are presented. Chapter 7 from the Yuko Okamoto group
discusses optimization of force field parameters.
The last two chapters of the simulation methods part of the book are devoted to
techniques for conformational search and dynamics. Chapter 8 from the group of
Ulrich Hansmann, who is one of the leading developers of conformational sampling
techniques, discusses approaches for the enhancement of the capability of Monte
Carlo and molecular dynamics methods to search the conformational space. The
theory and applications of generalized ensemble sampling methods, including the
widely used replica-exchange method and multicanonical sampling, are discussed.
In chapter 9, written by Alfredo Cardenas, methods for construction of the entire
trajectory from short independently simulated fragments are discussed with
emphasis on the milestoning method developed largely by the author and Ron Elber.
These approaches enable us to parallelize the otherwise serial task of computing a
dynamic trajectory of a system through initial conversion of the initial-value prob-
lem to minimization of the action of a system, which is a parallelizable boundary-
value problem, and then determination of the timescale of subsequent events by
using, e.g., the milestoning method. Such an approach is likely to become a viable
alternative if not replacement for molecular dynamics because of its potential to be
implemented on distributed computing architectures.
The next section of the book, composed of chapters 10–15, is devoted to bio-
logical applications of molecular simulation techniques. In chapter 10, written by
Marek Cieplak, application of the structure-based (Gō-like) models of proteins in
simulating mechanostability of virus capsids is discussed. A comprehensive review
of modeling lipid membranes by means of all-atom molecular dynamics is provided
in chapter 11 from the Marta Pasenkiewicz-Gierula group. This chapter is followed
by a review of the molecular modeling of membrane proteins contributed by the
Sławomir Filipek group. Chapters 13 and 14 from the Anders Irbäck and Sylwia
Rodziewicz-Motowidło groups, respectively, discuss simulations of amyloid
formation. Finally, chapter 15 from the Patrick Senet group discusses the application
of molecular dynamics to study functionally important motions of the human
Hsp70 chaperone. A procedure for verification of the calculated dynamic profiles
based on neutron-scattering measurements is also outlined.
Chapters 16–19 describe examples of the use of structural databases or
experimental information in molecular simulations, a topic commonly termed
bioinformatics. Chapter 16, contributed by the Istvan Simon group, addresses the important
issue of intrinsically disordered proteins, the discovery of which has overthrown the
old paradigm that a protein must have a well-defined 3D structure to exert its
biological function. The authors give a comprehensive overview of bioinformatics
methods for the prediction of intrinsically disordered regions from amino acid
sequence of a protein. The importance of the topic is best demonstrated by the fact
that blind prediction of intrinsically disordered regions in proteins is a separate
category in recent CASP experiments. In Chapter 17 from the Bogdan Lesyng
group, techniques for finding the alignment (similarities) between protein structures
are discussed and a new method thereof is introduced based on local descriptors. In
Chapter 18, contributed by the Irena Roterman group, a new method for the
Contents
Part I Introduction
Simulations of the Folding of Proteins: A Historical Perspective
Harold A. Scheraga
¹³C Chemical Shifts in Proteins: A Rich Source of Encoded Structural Information
Jorge A. Vila and Yelena A. Arnautova
Protein Secondary Structure Assignments and Their Usefulness for Dihedral Angle Prediction
Eshel Faraggi and Andrzej Kloczkowski
Simulations of the Folding of Proteins: A Historical Perspective
Harold A. Scheraga
1 Introduction
H. A. Scheraga
Baker Laboratory of Chemistry and Chemical Biology, Cornell University,
Ithaca, NY 14853-1301, USA
denaturation. Much later, Anfinsen [8] provided convincing evidence for the refolding
of unfolded bovine pancreatic ribonuclease A (RNase A) to the native conformation
in experiments that were later followed up in many other laboratories with many
other proteins.
Later, Scheraga implemented use of the theory to determine rotational diffusion
constants from flow-birefringence measurements [9] and, with Mandelkern [10],
made use of Flory’s theory of the hydrodynamic properties of solutions of syn-
thetic polymers [11, 12], to modify Neurath’s and Oncley’s treatment, and showed
that proteins of various asymmetric shapes were not rigid molecules, that they had
asymmetries different from those computed by Neurath and Oncley, and that their
hydrodynamic properties depend not only on their asymmetries but also on their
flexible volumes which swell considerably upon application of increasing amounts
of denaturing agents such as urea.
This was followed by a series of attempts by Brant and Flory [37], Ooi et al. [38],
Gibson and Scheraga [39, 40], Scott et al. [41], Yan et al. [42, 43], Momany et al.
[44, 45], Levitt and Lifson [46], and Hagler et al. [47] to derive improved all-atom
potential functions. Our effort in this regard led to our Empirical Conformational
Energy Program for Peptides (ECEPP) [48], which was subsequently upgraded sev-
eral times as ECEPP/2 [49, 50], ECEPP/3 [51], and ECEPP-05 [52]. Several other
all-atom empirical potentials have since been introduced, for example, CHARMM
[53], AMBER [54], and GROMOS [55]. Efforts continue in many laboratories to
improve the current potentials. These potential functions are augmented by either
explicit or continuum treatments of hydration, e.g., those of Jorgensen et al. [56],
Ooi et al. [57], and Vila et al. [58].
Whereas an all-atom approach could be used for simulating the folding of protein A
[79], the presently-available computer facilities cannot be used for larger proteins.
Therefore, a coarse-grained approach is used [89, 90] to extend the computational
ability to proteins ranging in size up to several hundred amino acid residues. Early
efforts to use such an approach, but applied to small proteins, are those of Levitt and
Warshel [91] and Pincus and Scheraga [92].
As cited by Sieradzan et al. [93], a UNited RESidue (UNRES) model was devel-
oped in our laboratory to compute the structures of large native proteins [93–114]. In
the UNRES model, a polypeptide chain is represented as a sequence of α-carbon (Cα)
atoms with attached united side chains (SC’s) and united peptide groups (p’s) positioned
halfway between two consecutive Cα’s. Only the united side chains and united
peptide groups act as interaction sites, while the Cα atoms assist only in the definition
of geometry (Fig. 1). The effective energy function is defined as the restricted free
energy (RFE) or the potential of mean force (PMF) of the chain constrained to a
given coarse-grained conformation along with the surrounding solvent [109]. This
effective energy function is expressed by Eq. (1).
Fig. 1 The UNRES model of polypeptide chains. The interaction sites are peptide-group centers
(p) and side-chain centers (SC) attached to the corresponding α-carbons with different Cα···SC
bond lengths, $d_{SC}$. The peptide groups are represented as gray circles, and the side chains are
represented as gray ellipsoids of different sizes. The α-carbon atoms are represented by small open
circles. The geometry of the chain can be described either by the virtual-bond vectors ($dC_i$ from
$C^{\alpha}_i$ to $C^{\alpha}_{i+1}$, $i = 1, 2, \ldots, n-1$, and $dX_i$ from $C^{\alpha}_i$ to $SC_i$,
$i = 1, 2, \ldots, n-1$), represented by thick lines, where $n$ is the number of residues, or in terms of
virtual-bond lengths, backbone virtual-bond angles $\theta_i$, $i = 1, 2, \ldots, n-2$, backbone
virtual-bond-dihedral angles $\gamma_i$, $i = 1, 2, \ldots, n-3$, and the angles $\alpha_i$ and $\beta_i$,
$i = 1, 2, \ldots, n-1$, that describe the location of a side chain with respect to the coordinate
frame defined by $C^{\alpha}_{i-1}$, $C^{\alpha}_i$, and $C^{\alpha}_{i+1}$. Reprinted with permission from
J. Chem. Phys., 115, 2323–2347 (2001). Copyright 2001 American Institute of Physics
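The reduced geometry described in the text and in the caption of Fig. 1 can be illustrated with a short Python sketch. It places peptide-group centers halfway between consecutive Cα atoms and computes the backbone virtual-bond angles and virtual-bond-dihedral angles from the Cα trace alone. This is only a sketch under stated assumptions: the function names and the toy coordinates are hypothetical and are not taken from the UNRES package, indices are zero-based, and the dihedral sign follows the common IUPAC-style convention.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def peptide_centers(ca):
    """Midpoints of consecutive C-alpha atoms (n - 1 united peptide groups)."""
    return [tuple((x + y) / 2.0 for x, y in zip(ca[i], ca[i + 1]))
            for i in range(len(ca) - 1)]

def bond_angles(ca):
    """Virtual-bond angles theta in degrees (n - 2 values)."""
    angles = []
    for i in range(1, len(ca) - 1):
        u = sub(ca[i - 1], ca[i])
        v = sub(ca[i + 1], ca[i])
        cos_t = dot(u, v) / (norm(u) * norm(v))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding errors
        angles.append(math.degrees(math.acos(cos_t)))
    return angles

def dihedrals(ca):
    """Virtual-bond-dihedral angles gamma in degrees (n - 3 values)."""
    out = []
    for i in range(len(ca) - 3):
        b1 = sub(ca[i + 1], ca[i])
        b2 = sub(ca[i + 2], ca[i + 1])
        b3 = sub(ca[i + 3], ca[i + 2])
        n1 = cross(b1, b2)
        n2 = cross(b2, b3)
        y = dot(b2, cross(n1, n2)) / norm(b2)
        x = dot(n1, n2)
        out.append(math.degrees(math.atan2(y, x)))
    return out

# Toy four-residue trace with unit virtual bonds (hypothetical coordinates;
# real C-alpha to C-alpha virtual bonds are about 3.8 angstroms long).
ca_trace = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
centers = peptide_centers(ca_trace)
thetas = bond_angles(ca_trace)
gammas = dihedrals(ca_trace)
```

For a chain of n residues the sketch returns n − 1 peptide-group centers, n − 2 angles θ and n − 3 angles γ, matching the index ranges given in the caption.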
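$$
\begin{aligned}
U ={}& w_{SC}\sum_{i<j} U_{SC_iSC_j} + w_{SCp}\sum_{i\neq j} U_{SC_ip_j}
      + w^{VDW}_{pp}\sum_{i<j-1} U^{VDW}_{p_ip_j} + w^{el}_{pp}\, f_2(T)\sum_{j<i-1} U^{el}_{p_ip_j} \\
     &+ w_{tor}\, f_2(T)\sum_i U_{tor}(\gamma_i) + w_{tord}\, f_3(T)\sum_i U_{tord}(\gamma_i,\gamma_{i+1}) \\
     &+ w_b\sum_i U_b(\theta_i) + w_{rot}\sum_i U_{rot}(\alpha_{SC_i},\beta_{SC_i}) + w_{bond}\sum_i U_{bond}(d_i) \\
     &+ w^{(3)}_{corr}\, f_3(T)\, U^{(3)}_{corr} + w^{(4)}_{corr}\, f_4(T)\, U^{(4)}_{corr}
      + w^{(5)}_{corr}\, f_5(T)\, U^{(5)}_{corr} + w^{(6)}_{corr}\, f_6(T)\, U^{(6)}_{corr}
\end{aligned}
\tag{1}
$$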
where the $U$’s are energy terms, $\theta_i$ is the backbone virtual-bond angle, $\gamma_i$ is the
backbone virtual-bond-dihedral angle, $\alpha_i$ and $\beta_i$ are the angles defining the location of
the center of the united side chain of residue $i$ (Fig. 1), and $d_i$ is the length of the
$i$th virtual bond, which is either a Cα···Cα virtual bond or a Cα···SC virtual bond. Each
energy term is multiplied by an appropriate weight, $w_x$, and the terms corresponding
to factors of order higher than 1 are additionally multiplied by the respective
temperature factors [107], which reflect the dependence of the first generalized-cumulant
term in those factors on temperature, as discussed in refs. [107, 115]. The factors
$f_n$ are defined by Eq. (2).
$$
f_n(T) = \frac{\ln\left[\exp(1) + \exp(-1)\right]}
              {\ln\left\{\exp\left[(T/T_\circ)^{n-1}\right] + \exp\left[-(T/T_\circ)^{n-1}\right]\right\}}
\tag{2}
$$
where $T_\circ = 300$ K. The term $U_{SC_iSC_j}$ represents the mean free energy of the
hydrophobic (hydrophilic) interactions between the side chains, which implicitly contains the
contributions from the interactions of the side chain with the solvent. The term $U_{SC_ip_j}$
denotes the excluded-volume potential of the side-chain–peptide-group interactions.
The peptide-group interaction potential is split into two parts: the Lennard-Jones
interaction energy between peptide-group centers ($U^{VDW}_{p_ip_j}$) and the average
electrostatic energy between peptide-group dipoles ($U^{el}_{p_ip_j}$); the second of these terms
accounts for the tendency to form backbone hydrogen bonds between peptide groups
$p_i$ and $p_j$. The terms $U_{tor}$, $U_{tord}$, $U_b$, $U_{rot}$, and $U_{bond}$ are the
virtual-bond-dihedral angle torsional terms, virtual-bond dihedral angle double-torsional terms,
virtual-bond angle bending terms, side-chain rotamer terms, and virtual-bond-deformation terms,
respectively; these terms account for the local properties of the polypeptide chain. The terms
$U^{(m)}_{corr}$ represent correlation or multibody contributions from the coupling between
backbone-local and backbone-electrostatic interactions, and the terms $U^{(m)}_{turn}$ are
correlation contributions involving $m$ consecutive peptide groups; they are, therefore,
termed turn contributions. The multibody terms are indispensable for reproduction
of regular α-helical and β-sheet structures [98, 101, 116].
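The temperature factors of Eq. (2) are simple to evaluate numerically. The following Python sketch computes $f_n(T)$ with the reference temperature of 300 K quoted in the text and applies it to a single weighted energy term in the manner of Eq. (1). The function names, and the idea of passing an energy term in as a plain number, are illustrative assumptions rather than part of any UNRES implementation.

```python
import math

T_REF = 300.0  # reference temperature in kelvin, as quoted in the text

def temperature_factor(n, temperature, t_ref=T_REF):
    """Evaluate f_n(T) of Eq. (2) for a term of order n."""
    x = (temperature / t_ref) ** (n - 1)
    numerator = math.log(math.exp(1.0) + math.exp(-1.0))
    denominator = math.log(math.exp(x) + math.exp(-x))
    return numerator / denominator

def scaled_term(weight, order, energy, temperature):
    """Weighted, temperature-scaled contribution w * f_order(T) * U (hypothetical helper)."""
    return weight * temperature_factor(order, temperature) * energy

# f_n equals 1 at the reference temperature for every order n,
# and f_1 equals 1 at every temperature.
factors_at_reference = [temperature_factor(n, 300.0) for n in range(1, 7)]
```

By construction the factors leave all terms unchanged at 300 K; for n greater than 1 they fall below 1 at higher temperatures and rise above 1 at lower temperatures, so higher-order terms are damped progressively as the temperature increases.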
The energy-term weights are determined by force-field calibration to reproduce
the structure and folding thermodynamics of selected training proteins [107, 116].
Initially, the UNRES surface was searched with a conformational space annealing
(CSA) algorithm [117] to identify the region of the global minimum at constant
temperature, i.e., with constant values of the $f_n(T)$ terms of Eq. (1). The
UNRES representation of this region was then converted to an all-atom one [118, 119], and
the global-minimization search was continued with an all-atom potential. Alternatively,
a procedure developed by Elber and coworkers [120, 121], in which the action is
minimized with appropriate constraints, can convert UNRES trajectories to all-atom
trajectories. Later, canonical molecular dynamics was used (see Sect. 7) to identify
the global-minima conformations in the UNRES representation. The $f_n(T)$ terms
were also included in Eq. (1) [107, 122], based on Kubo’s cumulant series in powers
of $(RT)^{-1}$ [123], in order to introduce the entropy and thereby evaluate
thermodynamic quantities and proper folding temperatures as well as the native structure and
stable intermediates leading to it.
Early calculations were carried out for single-chain proteins. Subsequently, the
methodology was extended to apply UNRES to molecular dynamics calculations
(see Sect. 7) of multi-chain proteins [124].
Successful early applications to compute structure with UNRES and CSA [95, 97,
117] encouraged us to submit predicted protein structures for evaluation in the CASP
(Critical Assessment of Structure Prediction) blind tests. An example of our initial
submissions [125] to CASP is shown in Fig. 2.
Fig. 2 Structure of HDEA from a CASP blind test. Superposition of the crystal (red) and predicted
(yellow) structures. Top: Helices 3, 4 and 5 (between residues D25 and I85) are indicated as H-3,
H-4 and H-5, respectively. Bottom: Helices 2 and 3 (between residues W16 and K42) are indicated
as H-2 and H-3, respectively. Reproduced with permission from Figs. 1 and 2 of ref. [125].
Copyright 1999 Proceedings of the National Academy of Sciences, U.S.A.
The quantity $(A^T M A + H)$ is the inertia matrix, where $A$ is the matrix of a linear
transformation from the space of generalized coordinates and velocities ($q$ and $\dot{q}$) to
the space of the Cartesian coordinates and velocities of the interacting sites, $M$ is the
diagonal matrix of the masses of the interacting sites, and $H$ is the part of the inertia
matrix that corresponds to the internal (stretching) motions of the virtual bonds. The
quantity $U$ is the UNRES potential energy (Eq. 1), $\nabla_q U$ is the gradient of $U$, $\Gamma$ is the
friction matrix (with elements given by Stokes’ law), and the random forces are

$$
f_{rand} = \sqrt{\frac{2RT}{\delta t}}\; A^T\, \Gamma^{1/2}\, N(0, 1)
\tag{4}
$$

where $R$ is the universal gas constant, $T$ is the absolute temperature, $\delta t$ is the
integration time step, and $N(0, 1)$ is a 3D vector whose components are sampled
independently from a normal distribution with zero mean and unit variance. Together,
the last two terms (friction and random forces) of Eq. (3) constitute a thermostat that
maintains the average temperature at the preset value.
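As a minimal sketch of how the random forces of Eq. (4) might be generated, the Python function below draws independent standard normal numbers, scales them by the square root of the friction coefficients, and projects the result onto the generalized coordinates with the transpose of the transformation matrix. Several things here are assumptions made only for illustration: the friction matrix is taken to be diagonal (so that its square root is element-wise), the matrices are plain nested lists, the units are kJ/mol and kelvin, and the function and parameter names are hypothetical.

```python
import math
import random

R_GAS = 8.314462618e-3  # universal gas constant in kJ/(mol K)

def random_forces(a_matrix, gamma_diag, temperature, dt, rng):
    """Sample f_rand = sqrt(2RT/dt) * A^T * Gamma^(1/2) * N(0, 1).

    a_matrix   : rows index Cartesian coordinates, columns index generalized ones
    gamma_diag : diagonal of the friction matrix (assumed diagonal in this sketch)
    rng        : any object with a gauss(mu, sigma) method, e.g. random.Random
    """
    scale = math.sqrt(2.0 * R_GAS * temperature / dt)
    # Cartesian-space noise: Gamma^(1/2) applied to a standard normal vector
    xi = [math.sqrt(g) * rng.gauss(0.0, 1.0) for g in gamma_diag]
    n_generalized = len(a_matrix[0])
    # Multiply by A^T to obtain forces acting on the generalized coordinates
    return [scale * sum(a_matrix[k][j] * xi[k] for k in range(len(xi)))
            for j in range(n_generalized)]

# Toy usage with a 2 x 2 transformation and hypothetical friction coefficients
forces = random_forces([[1.0, 0.0], [0.0, 1.0]], [4.0, 9.0], 300.0, 0.5,
                       random.Random(2024))
```

The variance of each sampled component grows with the temperature and with the friction coefficient and shrinks with the time step, which is what lets the friction and random-force terms balance each other as a thermostat.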
Applications of Eq. (3) to several single-chain [104] and multiple-chain [124] pro-
tein systems led to the conclusion [107] that the UNRES/MD approach can facilitate
microsecond and, possibly, millisecond simulations of protein folding and, conse-
quently, of the folding process of proteins in real time.
In addition to the computation of protein structure, UNRES/MD has been applied
to the calculation of folding kinetics [126] and, with the introduction of temperature
dependence in Eq. (1) [107], UNRES/MD has been applied to the calculation of
thermodynamic properties.
To speed up and extend the exploration of conformational space, replica exchange
molecular dynamics (REMD) and multiplexed replica exchange molecular dynamics
(MREMD) have been introduced.
In the REMD method [127–129], M canonical MD simulations are carried
out simultaneously, each one at a different temperature. Initially the temperatures
increase with the sequential number of the simulation (trajectory). After every M
steps, an exchange of temperatures between neighboring trajectories is attempted,
the decision about the exchange being based on the Metropolis criterion applied
to a quantity Δ. If Δ ≤ 0, the two temperatures are exchanged; otherwise, the exchange is
performed with probability exp(−Δ).
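The exchange step can be sketched in Python as follows. The text does not give the expression for Δ, so the sketch assumes the form commonly used in replica exchange, Δ = (β_j − β_i)(E_i − E_j) with β = 1/(RT), where E_i and E_j are the potential energies of the two trajectories currently at temperatures T_i and T_j. The function names, the units (kJ/mol) and the simple pairing of neighbors are likewise illustrative assumptions.

```python
import math
import random

R_GAS = 8.314462618e-3  # universal gas constant in kJ/(mol K)

def exchange_delta(energy_i, temp_i, energy_j, temp_j):
    """Assumed standard REMD quantity: (beta_j - beta_i) * (E_i - E_j)."""
    beta_i = 1.0 / (R_GAS * temp_i)
    beta_j = 1.0 / (R_GAS * temp_j)
    return (beta_j - beta_i) * (energy_i - energy_j)

def accept_exchange(delta, rng):
    """Metropolis rule: always accept if delta <= 0, else with probability exp(-delta)."""
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta)

def sweep_exchanges(energies, temperatures, rng):
    """Attempt temperature swaps between disjoint pairs of temperature neighbors."""
    temps = list(temperatures)
    order = sorted(range(len(temps)), key=lambda k: temps[k])
    for a, b in zip(order[::2], order[1::2]):
        delta = exchange_delta(energies[a], temps[a], energies[b], temps[b])
        if accept_exchange(delta, rng):
            temps[a], temps[b] = temps[b], temps[a]
    return temps

# Toy usage: the colder trajectory has the higher energy, so the swap is certain.
new_temps = sweep_exchanges([-10.0, -50.0], [300.0, 400.0], random.Random(1))
```

Practical implementations usually alternate between the two possible sets of neighbor pairs on successive attempts; the sketch pairs only one set to stay short.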
The multiplexed variant (MREMD) developed by Rhee and Pande [130] differs
from the original REMD method in that several trajectories are run at a given tem-
perature. Each set of trajectories run at a different temperature constitutes a layer.
Exchanges are attempted not only within a single layer but also between layers. It has
been demonstrated [131] that MREMD increases the power of REMD considerably,
and convergence of the thermodynamic quantities is achieved much faster.
Many biological problems involve proteins of large size that cannot be simulated with
an all-atom force field. For this reason, UNRES has been used to
treat the following biological problems: the aggregation of Aβ [132], the structure
of PICK1 [133], and the opening and closing of the Hsp70 chaperone [134].
Figure 3 illustrates the starting configuration in which one monomer was removed
from the native structure of a 7-chain fibril of Aβ and arranged in an extended confor-
mation. At various times in the simulation, the monomer undergoes conformational
changes, including formation of a partial α-helix, and ends up in a hairpin conforma-
tion bound as the native structure of the fibril. In Fig. 4, even though the PDZ domain
of PICK1 was started from opposite sides of the BAR domains in the simulation, both
starting structures end in the same stable configuration, namely, near the center of the
concave surface of the BAR domains. In simulations of the opening and closing of
an Hsp70 chaperone, UNRES does not include parameters for ADP and ATP which
participate in the configurational changes of the substrate binding domain (SBD) and
the nucleotide binding domain (NBD). Therefore, the simulations were carried out
by constraining the NBD domain to the structures with ADP and ATP, respectively,
Fig. 3 Selected snapshots, between t = 0.01 and 14.70 ns, along a representative trajectory of an
initially fully-extended monomer at t = 0 binding to a 7-chain fibril of Aβ. After forming an
α-helical portion at t = 0.14 ns, the monomer docks at t = 0.26 ns, with native orientation. At
t = 1.8 ns, the N-terminal strand is locked into the template. Meanwhile the C-terminus, which is still
free to move, bends and makes a β strand with itself. This conformation is very stable but, at
t = 14.49 ns, the β strand is finally disrupted. Shortly after that, at t = 14.7 ns, the monomer binds with
the native conformation. Reprinted from J. Mol. Biol., 404 (3), A. Rojas, A. Liwo, D. Browne, H.A.
Scheraga, Mechanism of Fiber Assembly: Treatment of Aβ Peptide Aggregation with a
Coarse-Grained United-Residue Force Field, 537–552 (2010), with permission from Elsevier
bound. SBD is split into the β-sheet (SBD-β) and α-helical (SBD-α) subdomains,
and NBD consists of NBD-I and NBD-II subdomains. Binding of ATP to the NBD
facilitates a transformation (Fig. 5) in which a substrate protein is released from the
SBD.
In a landmark experiment, Paul Doty and Julius Marmur provided experimental sup-
port for the Watson and Crick double-helical model of DNA by demonstrating that
DNA could be unfolded and then re-folded thermally [135]. In order to simulate these
processes, and also to be able to treat protein-DNA interactions (in connection with
UNRES), it is necessary to have a coarse-grained model of nucleic acids. Several
coarse-grained models of nucleic acids have already been reported [136–141]. We
have also formulated a coarse-grained model of nucleic acid bases [142]. Each base
is represented by 3–5 interaction centers. The interactions between bases are divided
into a van der Waals component modeled with a Lennard-Jones 12-6 energy function
Fig. 5 Illustration of the rotation of NBD-I with respect to NBD-II of Hsp70, which brings SBD-
β close to the backside of NBD-II. a Hsp70 structure in which NBD-I crosses NBD-II; b NBD
structure in which NBD-I moves closer to a parallel orientation with respect to NBD-II; c The
structure after 10 ps of simulations, in which NBD-I is nearly parallel to NBD-II. The “switch”
(SW) α-helix, which runs from E369 to G380, rotates with respect to NBD-II, following the rotation
of NBD-I, with which it is associated through interactions with the E171–Y179 “holder” (HO) α-
helix. The SW α-helix is connected to SBD-β by a linker. Consequently, its motion switches the
orientation of SBD-β from the top of NBD to the back side of this domain and brings it closer to
NBD-I; it also brings the linker segment closer to the β-sheet of NBD-II and enables it to join it
as a β-strand in the Hsp70 structure, thus fixing SBD at a short distance from the NBD. Reprinted
with permission from the Journal of Chemical Theory and Computation, 8, 1750–1764 (2012).
Copyright 2012 American Chemical Society
Other simulations of large structures besides Aβ, PICK1 and Hsp70 have been carried
out. Klaus Schulten has treated proteins in the lipid environment of a membrane, the
structure of tobacco mosaic virus, and several other viruses [144]. The feasibility
of such simulations was enhanced by the development in Schulten’s group of the
NAMD parallelized software [144].
Brooks and coworkers obtained an atomically detailed picture of functionally
important structural rearrangements that occur during translocation by combin-
ing structural data for a ribosome from X-ray crystallography and cryo-electron
microscopy with dynamic models based on elastic network normal mode analysis
[145].
Jernigan and coworkers used a coarse-grained elastic network model to explore
how well conformational transitions in proteins can be predicted by normal mode
motions [146]. They concluded that the applicability of an elastic network model to
explore conformational changes depends strongly on how collective the transition is.
12 Conclusions
The foregoing brief historical perspective has traced the development of our consider-
ations of the physical interactions in proteins, and the applications of this information
to the formulation of simulation approaches to treat biological processes. In the near
future, we may expect to see simulations of protein-protein and protein-DNA inter-
actions and the treatment of very large biological complexes that make up the living
cell. Hopefully, this will facilitate the treatment of many diseases that originate from
malfunction of protein systems.
Further reading about simulations of biological systems can be found in the rest
of this book, in Gregory Voth’s book [90], and in a new two-volume treatise edited
by Tamar Schlick [150].
References
1. Cohn, E.J., Edsall, J.T.: Proteins, Amino Acids and Peptides as Ions and Dipolar Ions. Reinhold
publishers, New York (1943)
2. Linderstrøm-Lang, K.U.: On the ionisation state of proteins. Compt. Rend. Trav. Lab.
Carlsberg 15, 1–29 (1924)
3. Debye, P., Hückel, E.: Zur Theorie der Elektrolyte. Phys. Z. 24, 185–206 (1923)
4. Svedberg, T., Pedersen, K.O.: The Ultracentrifuge. Clarendon Press, Oxford (1940)
5. Neurath, H., Saum, A.M.: The denaturation of serum albumin: diffusion and viscosity mea-
surements of serum albumin in the presence of urea. J. Biol. Chem. 128, 347–362 (1939)
6. Oncley, J.L.: Evidence from physical chemistry regarding the size and shape of protein
molecules from ultra-centrifugation, diffusion, viscosity, dielectric dispersion, and double
refraction of flow. Annals N.Y. Acad. Sci. 41, 121–150 (1941)
7. Edsall, J.T.: On the laboratory that produced the book, proteins, amino acids and peptides.
AIChE J. 44, 949–953 (1995)
8. Anfinsen, C.B.: Principles that govern the folding of protein chains. Science 181, 223–230
(1973)
9. Scheraga, H.A., Edsall, J.T., Gadd Jr., J.O.: Double refraction of flow: numerical evaluation
of extinction angle and birefringence as a function of velocity gradient. J. Chem. Phys. 19,
1101–1108 (1951)
10. Scheraga, H.A., Mandelkern, L.: Consideration of the hydrodynamic properties of proteins.
J. Am. Chem. Soc. 75, 179–184 (1953)
11. Flory, P.J., Fox Jr., T.G.: Treatment of intrinsic viscosities. J. Am. Chem. Soc. 73, 1904–1908
(1951)
12. Mandelkern, L., Krigbaum, W.R., Scheraga, H.A., Flory, P.J.: Sedimentation behavior of
flexible chain molecules: polyisobutylene. J. Chem. Phys. 20, 1392–1397 (1952)
13. Pauling, L., Corey, R.B., Branson, H.R.: The structure of proteins: two hydrogen-bonded
helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. U.S.A. 37, 205–211
(1951)
14. Pauling, L., Corey, R.B.: Configurations of polypeptide chains with favored orientations
around single bonds: Two new pleated sheets. Proc. Natl. Acad. Sci. U.S.A. 37, 729–740
(1951)
15. Sanger, F.: The arrangement of amino acids in proteins. Adv. Protein Chem. 7, 1–66 (1952)
16. Ryle, A.P., Sanger, F., Smith, I.F., Kitai, R.: The disulfide bonds of insulin. Biochem. J. 60,
542–556 (1955)
17. Perutz, M.F., Rossman, M.G., Cullis, A.F., Muirhead, H., Will, G., North, A.C.T.: Structure
of haemoglobin, a three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by x-ray
analysis. Nature 185, 416–422 (1960)
18. Kendrew, J.C., Dickerson, R.E., Strandberg, B.E., Hart, R.G., Davies, D.R., Philips, D.C.,
Shore, V.C.: Structure of myoglobin, a three-dimensional Fourier synthesis at 2 Å resolution.
Nature 185, 422–427 (1960)
19. Laskowski Jr., M., Scheraga, H.A.: Thermodynamic considerations of protein reactions. I.
Modified reactivity of polar groups. J. Am. Chem. Soc. 76, 6305–6319 (1954)
20. Laskowski Jr., M., Scheraga, H.A.: Thermodynamic considerations of protein reactions. II.
Modified reactivity of primary valence bonds. J. Am. Chem. Soc. 78, 5793–5798 (1956)
21. Némethy, G., Scheraga, H.A. :The structure of water and hydrophobic bonding in proteins.
III. The thermodynamic properties of hydrophobic bonds in proteins. J. Phys. Chem. 66,
1773–1789 (1962). Erratum: J. Phys. Chem. 67:2888 (1963)
22. Griffith, J.H., Scheraga, H.A.: Statistical thermodynamics of aqueous solutions. I. Water
structure, solutions with non-polar solutes, and hydrophobic interactions. J. Mol. Struct. 682,
97–113 (2004)
23. Owicki, J.C., Scheraga, H.A.: Monte Carlo calculations in the isothermal isobaric ensemble.
2. Dilute aqueous solution of methane. J. Am. Chem. Soc. 99, 7413–7418 (1977)
24. Rapaport, D.C., Scheraga, H.A.: Hydration of inert solutes. A molecular dynamics study. J.
Phys. Chem. 86, 873–880 (1982)
25. Kendrew, J.C.: The structure of globular proteins. Comp. Biochem. Physiol. 4, 249–252 (1962)
26. Scheraga, H.A.: Theory of hydrophobic interactions. J. Biomol. Struct. Dyn. 16, 447–460
(1998)
27. Sturtevant, J.M., Laskowski Jr., M., Donnelly, T.H., Scheraga, H.A.: Equilibria in the
fibrinogen-fibrin conversion. III. Heats of polymerization and clotting of fibrin monomer.
J. Am. Chem. Soc. 77, 6168–6172 (1955)
28. Scheraga, H.A.: Structural studies of pancreatic ribonuclease. Fed. Proc. 26, 1380–1387
(1967)
29. Wlodawer, A., Svensson, L.A., Sjölin, L., Gilliland, G.L.: Structure of phosphate-free ribonu-
clease A refined at 1.26Å. Biochemistry 27, 2705–2717 (1988)
30. Némethy, G., Scheraga, H.A.: Theoretical determination of sterically allowed conformations
of a polypeptide chain by a computer method. Biopolymers 3, 155–184 (1965)
18 H. A. Scheraga
31. Scheraga, H.A.: Calculations of conformations of polypeptides. Adv. Phys. Org. Chem. 6,
103–184 (1968)
32. Ramachandran, G.N., Ramakrishnan, C., Sasisekharan, V.: Stereochemistry of polypeptide
chain configurations. J. Mol. Biol. 7, 95–99 (1963)
33. Scheraga, H.A., Leach, S.J., Scott, R.A., Némethy, G.: Intramolecular forces and protein
conformation. Disc Faraday Soc. 40, 268–277 (1965)
34. Némethy, G., Leach, S.J., Scheraga, H.A.: The influence of amino acid side chains on the free
energy of helix coil transitions. J. Phys. Chem. 70, 998–1004 (1966)
35. Leach, S.J., Némethy, G., Scheraga, H.A.: Computation of the sterically allowed conforma-
tions of peptides. Biopolymers 4, 369–407 (1966)
36. de Santis, P., Giglio, E., Liquori, A.M., Ripamonti, A.: Stability of helical conformations of
simple linear polymers. J. Polym. Sci. Part A 1, 1383–1404 (1963)
37. Brant, D.A., Flory, P.J.: The configuration of random polypeptide chains. II. Theory. J. Am.
Chem. Soc. 87, 2791–2800 (1965)
38. Ooi, T., Scott, R.A., Vanderkooi, G., Scheraga, H.A.: Conformational analysis of macro-
molecules. IV. Helical structures of poly-L-alanine, poly-L-valine, poly-β-methyl L-aspartate,
poly-γ-methyl-L-glutamate, and poly-L-tyrosine. J. Chem. Phys. 46, 4410–4426 (1967)
39. Gibson, K.D., Scheraga, H.A.: Minimization of polypeptide energy. II. Preliminary structures
of oxytocin, vasopressin and an octapeptide from ribonuclease. Proc. Natl. Acad. Sci. U.S.A.
58, 1317–1323 (1967)
40. Gibson, K.D., Scheraga, H.A.: Minimization of polypeptide energy. VII. Second derivatives
and statistical weights of energy minima for deca–L–alanine. Proc. Natl. Acad. Sci. U.S.A.
63, 242–245 (1969)
41. Scott, R.A., Vanderkooi, G., Tuttle, R.W., Shames, P.M., Scheraga, H.A.: Minimization of
polypeptide energy. III. Application of a rapid energy minimization technique to the calcula-
tion of preliminary structures of gramicidin–S. Proc. Natl. Acad. Sci. 58, 2204–2211 (1967)
42. Yan, J.F., Vanderkooi, G., Scheraga, H.A.: Conformational analysis of macromolecules. V.
Helical structures of poly–L–aspartic acid and poly–L–glutamic acid, and related compounds.
J. Chem. Phys. 49, 2713–2726 (1968)
43. Yan, J.F., Momany, F.A., Scheraga, H.A.: Conformational analysis of macromolecules. VI.
Helical Structures of o–, m–, and p–chlorobenzyl esters of poly–L–aspartic acid. J. Am. Chem.
Soc. 92, 1109–1115 (1970)
44. Momany, F.A., Vanderkooi, G., Scheraga, H.A.: Determination of intermolecular potentials
from crystal data. I. General theory and application to crystalline benzene at several temper-
atures. Proc. Natl. Acad. Sci. U.S.A. 61, 429–436 (1968)
45. Momany, F.A., McGuire, R.F., Yan, J.F., Scheraga, H.A.: Energy parameters in polypeptides.
IV. Semiempirical molecular orbital calculations of conformational dependence of energy and
partial charge in di– and tripeptides. J. Phys. Chem. 75, 2286–2297 (1971)
46. Levitt, M., Lifson, S.: Refinement of protein confirmations using a macromolecular energy
minimization procedure. J. Mol. Biol. 46, 269–279 (1969)
47. Hagler, A.T., Huler, E., Lifson, S.: Energy functions for peptides and proteins. I. Derivation
of a consistent force field including the hydrogen bond from amide crystals. J. Am. Chem.
Soc. 96, 5319–5327 (1974)
48. Momany, F.A., McGuire, R.F., Burgess, A.W., Scheraga, H.A.: Energy parameters in polypep-
tides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen
bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids.
J. Phys. Chem. 79, 2361–2381 (1975)
49. Némethy, G., Pottle, M.S., Scheraga, H.A.: Energy parameters in polypeptides. 9. Updating
of geometrical parameters, nonbonded interactions, and hydrogen bond interactions for the
naturally occurring amino acids. J. Phys. Chem. 87, 1883–1887 (1983)
50. Sippl, M.J., Némethy, G., Scheraga, H.A.: Intermolecular potentials from crystal data. 6.
Determination of empirical potentials for O–H···O C hydrogen bonds from packing config-
urations. J. Phys. Chem. 88, 6231–6233 (1984)
Simulations of the Folding of Proteins: A Historical Perspective 19
51. Némethy, G., Gibson, K.D., Palmer, K.A., Yoon, C.N., Paterlini, G., Zagari, A., Rumsey, S.,
Scheraga, H.A.: Energy parameters in polypeptides. 10. Improved geometrical parameters
and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline—
containing peptides. J. Phys. Chem. 96, 6472–6484 (1992)
52. Arnautova, Y.A., Jagielska, A., Scheraga, H.A.: A new force field (ECEPP-05) for peptides,
proteins and organic molecules. J. Phys. Chem. B 110, 5025–5044 (2006)
53. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M.:
CHARMM: a program for macromolecular energy, minimization, and dynamics calculations.
J. Comput. Chem. 4, 187–217 (1983)
54. Cornell, W.D., Cieplak, P., Bayley, C.I., Gould, I.R., Merz Jr., K.M., Ferguson, D.M.,
Spellmeyer, D.C., Fox, T., Caldwell, J.W., Kollman, P.A.: A second generation force field
for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117,
5179–5197 (1995)
55. Scott, W.R.P., Huenenberger, P.H., Tironi, I.G., Mark, A.E., Billeter, S.R., Fennen, J., Torda,
A.E., Huber, T., Krueger, P., van Gusteren, W.F.: The GROMOS biomolecular simulation
program package. J. Phys. Chem. A 103, 3596–3607 (1999)
56. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L.: Comparison of
simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983)
57. Ooi, T., Oobatake, M., Némethy, G., Scheraga, H.A.: Accessible surface areas as a measure
of the thermodynamic parameters of hydration of peptides. Proc. Natl. Acad Sci. U.S.A. 84,
3086–3090 (1987). Erratum: ibid., 84, 6015 (1987)
58. Vila, J., Williams, R.L., Vasquez, M., Scheraga, H.A.: Empirical solvation models can be used
to differentiate native from near native conformations of bovine pancreatic trypsin inhibitor.
Proteins: Struct. Funct. Genet. 10, 199–218 (1991)
59. Nicholls, A., Honig, B.: A rapid finite difference algorithm, utilizing successive over-
relaxation to solve the Poisson-Boltzmann equation. J. Comp. Chem. 12, 435–445 (1991)
60. Nicholls, A., Sharp, K.A., Honig, B.: Protein folding and association: Insights from the inter-
facial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11,
281–296 (1991)
61. Vorobjev, Y.N., Vila, J.A., Scheraga, H.A.: FAMBE-pH: a fast and accurate method to compute
the total solvation free energies of proteins. J. Phys. Chem. B 112, 11122–11136 (2008)
62. Still, W.C., Tempczyk, A., Hawley, R.C., Henderickson, T.: Semianalytical treatment of sol-
vation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112, 6127–6129 (1990)
63. Bashford, D., Case, D.: Generalized Born models of macromolecular solvation effects. Annu.
Rev. Phys. Chem. 51, 129–152 (2000)
64. Ferrara, P., Apostolakis, J., Caflisch, A.: Evolution of a fast implicit solvent model for molec-
ular dynamics simulations. Proteins 46, 24–33 (2002)
65. Bursulaya, B., Brooks III, C.I.: Comparative study of folding free energy landscape of a
three-stranded-sheet protein with explicit and implicit solvent models. J. Phys. Chem. B 104,
12378–12383 (2002)
66. Im, W., Lee, M., Brooks, C.: Generalized Born model with a simple smoothing function. J.
Comp. Chem. 24, 1691–1702 (2003)
67. Scheraga, H.A., Pillardy, J., Liwo, A., Lee, J., Czaplewski, C., Ripoll, D.R., Wedemeyer, W.J.,
Arnautova, Y.A.: Evolution of physics-based methodology for exploring the conformational
energy landscape of proteins. J. Comput. Chem. 23, 28–34 (2002)
68. Alder, B.J., Wainwright, T.: Molecular dynamics by electronic computers. In: Prigogine, I.
(ed.) Proceedings of the International Symposium on Transport. Process in Statistical Mechan-
ics, pp. 97–131. Interscience, New York (1957)
69. McCammon, J.A., Gelin, B.R., Karplus, M.: Dynamics of folded proteins. Nature 267,
585–590 (1977)
70. Scheraga, H.A., Khalili, M., Liwo, A.: Protein folding dynamics: overview of molecular
simulation techniques. Annu. Rev. Phys. Chem. 58, 57–83 (2007)
71. Shirts, M., Pande, V.S.: Screen savers of the world unite! Science 290, 1903–1904 (2000)
20 H. A. Scheraga
72. Shaw, D.E., et al.: Structure and dynamics of an unfolded protein examined by molecular
dynamics simulation. J. Am. Chem. Soc. 336, 3787–3791 (2012)
73. Li, Z., Scheraga, H.A.: Monte Carlo–minimization approach to the multiple–minima problem
in protein folding. Proc. Natl. Acad. Sci. U.S.A. 84, 6611–6615 (1987)
74. Hansmann, U.H.E., Masuya, M., Okamoto, Y.: Characteristic temperatures of folding of a
small peptide. Proc. Natl. Acad. Sci. U.S.A. 94, 10652–10656 (1997)
75. Dygert, M., Go, N., Scheraga, H.A.: Use of a symmetry condition to compute the conformation
of gramicidin S. Macromolecules 8, 750–761 (1975)
76. Gō, N., Scheraga, H.A.: Ring closure in chain molecules with Cn, I or S2n symmetry. Macro-
molecules 6, 273–281 (1973)
77. Mirau, R.A., Bovey, F.A.: 2D and 3D NMR studies of polypeptide structure and function.
Abstracts, 199th ACS meeting. Polymer Division, Boston, 206 (1990)
78. Ripoll, D.R., Vila, J.A., Scheraga, H.A.: Folding of the villin headpiece subdomain from
random structures. Analysis of the charge distribution as a function of pH. J. Mol. Biol. 339,
915–925 (2004)
79. Vila, J.A., Ripoll, D.R., Scheraga, H.A.: Atomically detailed folding simulation of the B
domain of staphylococcal protein A from random structures. Proc. Natl. Acad. Sci. U.S.A.
100, 14812–14816 (2003)
80. Vila, J.A., Arnautova, Y.A., Martin, O.A., Scheraga, H.A.: Quantum-mechanics-derived 13 Cα
chemical shift server (Che Shift) for protein structure validation. Proc. Natl. Acad. Sci. U.S.A.
106, 16972–16977 (2009)
81. Vila, J.A., Scheraga, H.A.: Assessing the accuracy of protein structures by quantum mechan-
ical computations of 13 Cα chemical shifts. Acc. Chem. Res. 42, 1545–1553 (2009)
82. Miller, M.H., Scheraga, H.A.: Calculation of the structures of collagen models. Role
of interchain interactions in determining the triple–helical coiled–coil conformation. I.
Poly(glycyl–prolyl–prolyl). J. Polym. Sci.: Polym. Symp. 54, 171–200 (1976)
83. Miller, M.H., Némethy, G., Scheraga, H.A.: Calculation of the structures of collagen models.
Role of interchain interactions in determining the triple–helical coiled–coil conformation. 2.
Poly(glycyl–prolyl–hydroxyprolyl). Macromolecules 13, 470–478 (1980)
84. Miller, M.H., Némethy, G., Scheraga, H.A.: Calculation of the structures of collagen models.
Role of interchain interactions in determining the triple-helical-coiled coil conformation. 3.
Poly(glycyl-prolyl-alanyl). Macromolecules 13, 910–913 (1980)
85. Némethy, G., Miller, M.H., Scheraga, H.A.: Calculation of the structures of collagen models.
Role of interchain interactions in determining the triple–helical coiled–coil conformation. 4.
Poly(glycyl–alanyl–prolyl). Macromolecules 13, 914–919 (1980)
86. Pincus, M.R., Scheraga, H.A.: Conformational energy calculations of enzyme–substrate and
enzyme–inhibitor complexes of lysozyme. 2. Calculation of the structures of complexes with
a flexible enzyme. Macromolecules 12, 633–644 (1979)
87. Smith-Gill, S.J., Rupley, J.A., Pincus, M.R., Carty, R.P., Scheraga, H.A.: Experimental iden-
tification of a theoretically predicted “left–sided” binding mode for (GlcNAc)6 in the active
site of lysozyme. Biochemistry 23, 993–997 (1984)
88. Simon, I., Glasser, L., Scheraga, H.A., Manley, R.S.J.: Structure of cellulose. 2. Low–energy
crystalline arrangements. Macromolecules 21, 990–998 (1988)
89. Kolinski, A.: Protein modeling and structure prediction with a reduced representation. Acta
Biochim. Pol. 51, 349–371 (2004)
90. Voth, G.A.: Coarse-graining of Condensed Phase and Biomolecular Systems. CRC Press,
Boca Raton, FL (2009)
91. Levitt, M., Warshel, A.: Computer simulation of protein folding. Nature 253, 694–698 (1975)
92. Pincus, M.R., Scheraga, H.A.: An approximate treatment of long–range interactions in pro-
teins. J. Phys. Chem. 81, 1579–1583 (1977)
93. Sieradzan, A.K., Hansmann, U.H.E., Scheraga, H.A., Liwo, A.: Extension of UNRES force
field to treat polypeptide chains with D-amino-acid residues. J. Chem. Theory. Comput. 8,
4746–4757 (2006)
Simulations of the Folding of Proteins: A Historical Perspective 21
94. Liwo, A., Pincus, M.R., Wawak, R.J., Rackovsky, S., Scheraga, H.A.: Prediction of pro-
tein conformation on the basis of a search for compact structures; test on avian pancreatic
polypeptide. Protein Sci. 2, 1715–1731 (1993)
95. Liwo, A., Oldziej, S., Pincus, M.R., Wawak, R.J., Rackovsky, S., Scheraga, H.A.: A united-
residue force field for off-lattice protein-structure simulations. I. Functional forms and param-
eters of long-range side-chain interaction potentials from protein crystal data. J. Comput.
Chem. 18, 849–873 (1997)
96. Liwo, A., Pincus, M.R., Wawak, R.J., Rackovsky, S., Oldziej, S., Scheraga, H.A.: A united-
residue force field for off-lattice protein-structure simulations. II. Parameterization of short-
range interactions and determination of weights of energy terms by Z-score optimization. J.
Comput. Chem. 18, 874–887 (1997)
97. Liwo, A., Kazmierkiewicz, R., Czaplewski, C., Groth, M., Oldziej, S., Wawak, R.J., Rack-
ovsky, S., Pincus, M.R., Scheraga, H.A.: United-residue force field for off-lattice protein-
structure simulations. III. Origin of backbone hydrogen-bonding cooperativity in united-
residue potentials. J. Comput. Chem. 19, 259–276 (1998)
98. Liwo, A., Czaplewski, C., Pillardy, J., Scheraga, H.A.: Cumulant-based expressions for the
multibody terms for the correlation between local and electrostatic interactions in the united-
residue force field. J. Chem. Phys. 115, 2323–2347 (2001)
99. Liwo, A., Arlukowicz, P., Czaplewski, C., Ołdziej, S., Pillardy, J., Scheraga, H.A.: A method
for optimizing potential-energy functions by a hierarchical design of the potential-energy
landscape: Application to the UNRES force field. Proc. Natl. Acad. Sci. U.S.A. 99, 1937–1942
(2002)
100. Liwo, A., Ołdziej, S., Czaplewski, C., Kozłowska, U., Scheraga, H.A.: Parameterization of
backbone-electrostatic and multibody contributions to the UNRES force field for protein-
structure prediction from ab initio energy surfaces of model systems. J. Phys. Chem. B 108,
9421–9438 (2004)
101. Liwo, A., Arłukowicz, P., Ołdziej, S., Czaplewski, C., Makowski, M., Scheraga, H.A.: Opti-
mization of the UNRES force field by hierarchical design of the potential-energy landscape. 1.
Tests of the approach using simple lattice protein models. J. Phys. Chem. B 108, 16918–16933
(2004)
102. Ołdziej, S., Liwo, A., Czaplewski, C., Pillardy, J., Scheraga, H.A.: Optimization of the UNRES
force field by hierarchical design of the potential-energy landscape. 2. Off-lattice tests of the
method with single proteins. J. Phys. Chem. B 108, 16934–16949 (2004)
103. Ołdziej, S., Lagiewka, J., Liwo, A., Czaplewski, C., Chinchio, M., Nanias, M., Scheraga,
H.A.: Optimization of the UNRES force field by hierarchical design of the potential-energy
landscape. 3. Use of many proteins in optimization. J. Phys. Chem. B 108, 16950–16959
(2004)
104. Liwo, A., Khalili, M., Scheraga, H.A.: Ab initio simulations of protein-folding pathways by
molecular dynamics with the united-residue model of polypeptide chains. Proc. Natl. Acad.
Sci. U.S.A. 102, 2362–2367 (2005)
105. Khalili, M., Liwo, A., Rakowski, F., Grochowski, P., Scheraga, H.A.: Molecular dynamics
with the united-residue model of polypeptide chains. I. Lagrange equations of motion and
tests of numerical stability in the microcanonical mode. J. Phys. Chem. B 109, 13785–13797
(2005)
106. Khalili, M., Liwo, A., Jagielska, A., Scheraga, H.A.: Molecular dynamics with the united-
residue model of polypeptide chains. II. Langevin and Berendsen-bath dynamics and tests on
model α-helical systems. J. Phys. Chem. B 109, 13798–13810 (2005)
107. Liwo, A., Khalili, M., Czaplewski, C., Kalinowski, S., Ołdziej, S., Wachucik, K., Scheraga,
H.A.: Modification and optimization of the united-residue (UNRES) potential-energy function
for canonical simulations. I. Temperature dependence of the effective energy function and
tests of the optimization method with single training proteins. J.Phys. Chem. B 111, 260–285
(2007)
108. Kozlowska, U., Liwo, A., Scheraga, H.A.: Determination of virtual-bond-angle potentials
of mean force for coarse-grained simulations of protein structure and folding from ab initio
22 H. A. Scheraga
energy surfaces of terminally-blocked glycine, alanine, and proline. J. Phys.: Condens. Matter
19, 285203-1—285203-15 (2007)
109. Liwo, A., Czaplewski, C., Ołdziej, S., Rojas, A.V., Kazmierkiewicz, R., Makowski, M.,
Murarka, R.K., Scheraga, H.A.: Simulation of protein structure and dynamics with the coarse-
grained UNRES force field. In: Voth, G.A. (ed.) Coarse-Graining of Condensed Phase and
Biomolecular Systems, pp. 107–122. CRC Press, Boca Raton, FL (2008)
110. Ołdziej, S., Czaplewski, C., Liwo, A., Scheraga, H.A.: Towards temperature dependent coarse-
grained potential of side-chain interactions for protein folding simulations, BIBE. In: IEEE
International Conference on Bioinformatics and Bioengineering, pp 263–266 (2010)
111. Liwo, A., Ołdziej, S., Czaplewski, C., Kleinerman, D.S., Blood, P., Scheraga, H.A.: Imple-
mentation of molecular dynamics and its extensions with the coarse-grained UNRES force
field on massively parallel systems; towards millisecond-scale simulations of protein struc-
ture, dynamics, and thermodynamics. J. Chem. Theor. Comput. 6, 890–909 (2010)
112. Maisuradze, G.G., Senet, P., Czaplewski, C., Liwo, A., Scheraga, H.A.: Investigation of protein
folding by coarse-grained molecular dynamics with the UNRES force field. J. Phys. Chem.
A 114, 4471–4485 (2010)
113. Makowski, M., Liwo, A., Scheraga, H.A.: Simple physics-based analytical formulas for the
potentials of mean force of the interaction of amino-acid side chains in water. VI. Oppositely-
charged side chains. J Phys Chem 115, 6130–6137 (2011)
114. Sieradzan, A.K., Scheraga, H.A., Liwo, A.: Determination of effective potentials for the
stretching of Cα …Cα virtual bonds in polypeptide chains for coarse-grained simulations of
proteins from ab initio energy surfaces of N-methylacetamide and N-acetylpyrrolidine. J.
Chem. Theor. Comput. 8, 1334–1343 (2012)
115. Kolinski, A., Skolnick, J.: Discretized model of proteins: I. Monte Carlo study of cooperativity
in homopolypeptides. J. Chem. Phys. 97, 9412–9426 (1992)
116. He, Y., Xiao, Y., Liwo, A., Scheraga, H.A.: Exploring the parameter space of the coarse-
grained UNRES force field by random search: Selecting a transferable medium-resolution
force field. J. Comput. Chem. 30, 2127–2135 (2009)
117. Lee, J., Scheraga, H.A., Rackovsky, S.: New optimization method for conformational
energy calculations on polypeptides: conformational space annealing. J. Comput. Chem. 18,
1222–1232 (1997)
118. Kazmierkiewicz, R., Liwo, A., Scheraga, H.A.: Energy-based reconstruction of a protein
backbone from its α-carbon trace by a Monte-Carlo method. J. Comput. Chem. 23, 715–723
(2002)
119. Kazmierkiewicz, R., Liwo, A., Scheraga, H.A.: Addition of side chains to a known backbone
with defined side-chain centroids. Biophys. Chem. 100, 261–280 (2003). Erratum: Biophys.
Chem. 106, 91 (2003)
120. Elber, R., Ghosh, A., Cardenas, A.: Long time dynamics of complex systems. Acc. Chem.
Res. 35, 396–403 (2002)
121. Ghosh, A., Elber, R., Scheraga, H.A.: An atomically detailed study of the folding path-
ways of protein A with the stochastic difference equation. Proc. Natl. Acad. Sci. U.S.A. 99,
10394–10398 (2002)
122. Shen, H., Liwo, A., Scheraga, H.A.: An improved functional form for the temperature, scaling
factors of the components of the mesoscopic UNRES force field for simulations of protein
structure and dynamics. J. Phys. Chem. B 113, 8738–8744 (2009)
123. Kubo, R.: Generalized cumulant expansion method. J. Phys. Soc. Jpn. 17, 1100–1120 (1962)
124. Rojas, A.V., Liwo, A., Scheraga, H.A.: Molecular dynamics with the united-residue (UNRES)
force field. Ab initio folding simulations of multi-chain proteins. J. Phys. Chem. B 111,
293–309 (2007)
125. Liwo, A., Lee, J., Ripoll, D.R., Pillardy, J., Scheraga, H.A.: Protein structure prediction
by global optimization of a potential energy function. Proc. Natl. Acad. Sci. U.S.A. 96,
5482–5485 (1999)
126. Khalili, M., Liwo, A., Scheraga, H.A.: Kinetic studies of folding of the B-domain of staphylo-
coccal protein A with molecular dynamics and a united-residue (UNRES) model of polypep-
tide chains. J. Mol. Biol. 355, 536–547 (2006)
Simulations of the Folding of Proteins: A Historical Perspective 23
127. Swendsen, R.H., Wang, J.S.: Replica Monte Carlo simulations of spin-glasses. Phys. Rev.
Lett. 57, 2607–2609 (1986)
128. Sugita, Y., Okamoto, Y.: Replica-exchange molecular dynamics method for protein folding.
Chem. Phys. Lett. 314, 141–151 (1999)
129. Nanias, M., Chinchio, M., Ołdziej, S., Czaplewski, C., Scheraga, H.A.: Protein struc-
ture prediction with the UNRES force-field using Replica-Exchange Monte Carlo-with-
Minimization; Comparison with MCM, CSA and CFMC. J. Comput. Chem. 26, 1472–1486
(2005)
130. Rhee, Y.M., Pande, V.S.: Multiplexed-replica exchange molecular dynamics method for pro-
tein folding simulation. Biophys. J. 84, 775–786 (2003)
131. Czaplewski, C., Kalinowski, S., Liwo, A., Scheraga, H.A.: Application of multiplexed replica
exchange molecular dynamics to the UNRES force field: tests with α and α + β proteins. J.
Chem. Theor. Comput. 5, 627–640 (2009)
132. Rojas, A., Liwo, A., Browne, D., Scheraga, H.A.: Mechanism of fiber assembly; treatment
of Aβ-peptide aggregation with a coarse-grained united-residue force field. J. Mol. Biol. 404,
537–552 (2010)
133. He, Y., Liwo, A., Weinstein, H., Scheraga, H.A.: PDZ binding to the BAR domain of PICK1
is elucidated by coarse-grained molecular dynamics. J. Mol. Biol. 405, 298–314 (2011)
134. Golas, E., Maisuradze, G.G., Senet, P., Ołdziej, S., Czaplewski, C., Scheraga, H.A., Liwo,
A.: Simulation of the opening and closing of Hsp70 chaperones by coarse-grained molecular
dynamics. J. Chem. Theor. Comput. 8, 1750–1764 (2012)
135. Marmur, J., Doty, P.: Thermal renaturation of deoxyribonucleic acids. J. Mol. Biol. 3, 585–594
(1961)
136. Peyrard, M., Bishop, A.R.: Statistical mechanics of a nonlinear model for DNA denaturation.
Phys. Rev. Lett. 62, 2755–2758 (1989)
137. Olson, W.K.: Simulating DNA at low resolution. Curr. Opinion Struct. Biol. 6, 242–256 (1996)
138. Hyeon, C., Thirumalai, D.: Mechanical unfolding of RNA hairpins. Proc. Natl. Acad. Sci.
U.S.A. 102, 6789–6794 (2005)
139. Knotts 4th, T., Rathore, N., Schwartz, D.C., de Pablo, J.J.: A coarse grain model for DNA. J.
Chem. Phys. 126, 084901 (2007)
140. Voltz, K., Trylska, J., Tozzini, V., Kurkal-Siebert, V., Langowski, J., Smith, J.: Coarse-
grained force field for the nucleosome from self-consistent multiscaling. J. Comput. Chem.
29, 1429–1439 (2008)
141. Ouldridge, T.E., Louis, A.A., Doye, J.P.K.: DNA Nanotweezers studied with a coarse-grained
model of DNA. Phys. Rev. Lett. 104, 178101-1–178101-4 (2010)
142. Maciejczyk, M., Spasic, A., Liwo, A., Scheraga, H.A.: Coarse-grained model of nucleic acid
bases. J. Comput. Chem. 31, 1644–1655 (2010)
143. He, Y., Maciejczyk, M., Ołdziej, S., Scheraga, H.A., Liwo, A.: Mean-field interactions
between nucleic-acid-base dipoles drive formation of the double helix. Phys. Rev. Lett. 110,
098101 (2003)
144. Pollack, L.: Fashioning NAMD, a history of risk and reward: Klaus Schulten Reminisces.
In: Schlick, T. (ed.) Innovations in Biomolecular Modeling and Simulations, vol. 1. Royal
Society of Chemistry, Cambridge, UK (2012)
145. Tama, F., Valle, M., Frank, J., Brooks III, C.L.: Dynamic reorganization of the functionally
active ribosome explored by normal mode analysis and cryo-electron microscopy. Proc. Natl.
Acad. Sci. U.S.A. 100, 9319–9323 (2003)
146. Yang, L., Song, G., Jernigan, R.L.: How well can we understand large-scale protein motions
using normal modes of elastic network models? Biophys. J. 93, 920–929 (2007)
147. Senet, P., Maisuradze, G.G., Foulie, C., Delarue, P., Scheraga, H.A.: How main-chains of
proteins explore the free-energy landscape in native states. Proc. Natl. Acad. Sci. U.S.A. 105,
19708–19713 (2008)
148. Cote, Y., Senet, P., Delarue, P., Maisuradze, G.G., Scheraga, H.A.: Nonexponential decay of
internal rotational correlation functions of native proteins and self-similar structural fluctua-
tions. Proc. Natl. Acad. Sci. U.S.A. 107, 19844–19849 (2010)
24 H. A. Scheraga
149. Cote, Y., Senet, P., Delarue, P., Maisuradze, G.G., Scheraga, H.A.: Anomalous diffusion and
dynamical correlation between the side chains and the main chain of proteins in their native
state. Proc. Natl. Acad. Sci. 109, 10346–10351 (2012)
150. Schlick, T. (ed.): Innovations in Biomolecular Modeling and Simulations, vols. 1 and 2. Royal
Society of Chemistry, Cambridge, UK (2012)
Part II
Molecular Simulations: Methodology
Protein Structure Prediction Using Coarse-Grained Models
1 Introduction
Proteins are key components of all life processes. Thus, the development of rela-
tively cheap and automatic methods for determining amino acid sequences of proteins
raised hope for a breakthrough in many branches of science, including pharmacy and
biotechnology. However, the knowledge of sequence is insufficient for the majority of
Coarse graining has been used in biomolecular modeling since the very beginning of this
discipline. In their seminal work, Levitt and Warshel [87] started protein simulations
with a model in which each residue of a peptide chain was represented by its alpha
carbon (Cα) and a united atom substituting for its side chain. Since then, a huge variety
of models have been proposed that cover the whole range of complexity, from the
most simplistic Cα-only approaches to all-atom representations [3, 19, 22, 45, 60, 65,
68, 74, 113, 125, 126, 127]. Between these two extremes we can find
models with or without residue side chains. Each side chain may in turn be represented
by one or more interacting centers. A few different methods for representing the protein
backbone have also been proposed. Finally, discretization may be used to impose additional
limits on the space of conformations the model can adopt. A quick glance
at the review articles [20, 41, 58, 67, 140] suggests that most likely all possible choices
of CG representation have already been explored. This review certainly does not
describe all of these solutions. Instead, we introduce several important concepts of CG
modeling of proteins and other biomolecules and describe how they have evolved
over the past few decades.
Fig. 1 Protein structure prediction stages in CG modeling. The diagram presents a general pipeline
for multiscale modeling (CG merged with all-atom) and depicts the major differences between easy
and difficult modeling tasks. Easy or medium-difficulty cases require, if necessary, only limited CG
sampling of the conformational space, usually to fill small gaps and resolve minor uncertainties in
the available experimental or homology-inferred data. Extensive CG sampling is required for difficult
cases, when knowledge about the expected structure is limited
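The two-center representation mentioned above (one Cα plus one united side-chain center per residue) can be illustrated with a minimal sketch. The function name, the atom-name convention and the use of a plain geometric centroid of the side-chain atoms are assumptions made for this illustration; they are not taken from any of the cited models, which define their side-chain centers in their own ways.

```python
import numpy as np

# Backbone atom names assumed to follow the common PDB convention.
BACKBONE = {"N", "CA", "C", "O"}

def coarse_grain_residue(atoms):
    """Map one all-atom residue onto two centers: Ca and a side-chain centroid.

    atoms: dict mapping atom name -> (x, y, z) for heavy atoms only.
    Returns (ca, sc); sc is None when the residue has no side-chain
    heavy atoms (glycine).
    """
    ca = np.asarray(atoms["CA"], dtype=float)
    side = [np.asarray(xyz, dtype=float)
            for name, xyz in atoms.items() if name not in BACKBONE]
    sc = np.mean(side, axis=0) if side else None
    return ca, sc

# Toy alanine-like residue with made-up coordinates (in Å).
residue = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.5, 0.0, 0.0),
    "C": (2.0, 1.4, 0.0),
    "O": (1.3, 2.4, 0.0),
    "CB": (2.0, -0.8, 1.2),
}
ca, sc = coarse_grain_residue(residue)
print(ca, sc)
```

Dropping the side-chain center gives a Cα-only model, and splitting the side chain into several centroids gives a finer representation, which is the range of choices the text describes.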
One such inspiring idea was the lattice model [71]. Restricting atomic coordinates
to a grid became a very straightforward and simple way to discretize the
conformational space. The size of the search space was greatly reduced and many of its
local minima vanished. Atomic coordinates became integer values, which opened
many possibilities for the use of hash tables. Most importantly, the Cartesian space itself
could be stored in a three-dimensional array, which resulted in O(1) (constant) time
complexity of collision detection. Owing to these advantages, lattice models
were at least one or two orders of magnitude faster than their continuous-space
counterparts. Low-resolution grids, however, have a few serious drawbacks. First of
all, simple lattice (cubic lattice, face-centered lattice, etc.) representations of protein
structures (usually limited to Cα traces, or to just a few atom centers) were of
relatively low resolution, with average representation errors of 2–6 Å. Even
more risky aspects of low-resolution lattice models are related to lattice anisotropy.
Depending on its orientation with respect to the fixed lattice, a model chain
changed its local geometry and local resolution. Moreover, for essentially all types of
interactions the local energy may change with chain orientation. Consequently, the
models were highly degenerate, with energetic preferences that changed for various
orientations of protein fragments.
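The constant-time collision check described above can be sketched as a boolean occupancy array indexed directly by integer lattice coordinates. The class name, the method names and the periodic wrapping of indices are assumptions of this sketch and do not come from any specific lattice program.

```python
import numpy as np

class LatticeGrid:
    """Occupancy grid for beads living on a simple cubic lattice."""

    def __init__(self, size):
        self.size = size
        # One boolean per lattice site: True means a bead is already there.
        self.occupied = np.zeros((size, size, size), dtype=bool)

    def _wrap(self, site):
        # Periodic wrapping keeps indices inside the array (sketch assumption).
        return tuple(int(c) % self.size for c in site)

    def is_free(self, site):
        # A single array lookup: O(1), independent of the number of beads.
        return not self.occupied[self._wrap(site)]

    def place(self, site):
        idx = self._wrap(site)
        if self.occupied[idx]:
            return False
        self.occupied[idx] = True
        return True

    def remove(self, site):
        self.occupied[self._wrap(site)] = False

    def try_move(self, old, new):
        """Move a bead from old to new unless the target site is taken."""
        if not self.is_free(new):
            return False
        self.remove(old)
        self.place(new)
        return True

grid = LatticeGrid(16)
grid.place((0, 0, 0))
grid.place((1, 0, 0))
print(grid.try_move((0, 0, 0), (1, 0, 0)))  # target occupied, move rejected
print(grid.try_move((0, 0, 0), (0, 1, 0)))  # target free, move accepted
```

In a continuous-space model the same check typically requires distance calculations against neighboring atoms, which is one reason for the speed difference noted in the text.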
The first detailed analysis of these problems was published by Godzik and coworkers
[34]. Higher-resolution reduced protein models, for example chess-knight lattice
models, led to more accurate representations and smaller anisotropy effects. Probably
the best functionality of lattice models was achieved by combining higher resolution (more
lattice steps per residue) with allowed fluctuations of the Cα-Cα distances in the models.
This led to a large number of single-bead orientations, higher resolution (1–2 Å
with respect to experimental structures) and essentially negligible anisotropy. Obviously,
the higher resolution of such models led to a much larger number of allowed
structures, which caused a somewhat higher computational cost. With increasing computing
power, however, the higher-resolution lattice models could still deliver fast simulations
of protein folding, multibody interactions and related problems. The most important
advantage of high-resolution lattice CG protein models (with slightly fluctuating
distances along the protein chain, usually the Cα-Cα distance) is their computational
efficiency: they could be simulated much faster than continuous models.
Another very prolific concept in protein structure modeling is the use of frag-
ments, that is, short peptides extracted from known protein structures. This concept
was originally introduced by Jones and Thirup [50] as a crystallographic method
for rapid model building based on experimental electron density. The authors also
discussed potential applications of short protein fragments in purely theoretical mod-
eling approaches, which in practice was applied in the late 1990s by J.R. Gunn as
well as by Baker and coworkers [13]. The latter application soon became the famous
Rosetta program [113], one of the most successful methods in ab initio structure pre-
diction. Later, fragment-based sampling was applied to numerous protein modeling
approaches [145, 147]. It should be noted, however, that fragment-based sampling
introduces a very strong bias in the dynamics of the sampled chain which makes it
unsuitable for studying numerous research problems.
Multibody Force Fields
Interactions within a CG model cannot be directly learned from a physical system.
Therefore, they have to be established in the form of a mean field potential. Such a
potential can be derived either from statistics extracted from known protein structures
[124] or from Molecular Dynamics simulations [90]. Knowledge-based force field
models have been actively developed for the past few decades, which resulted in the
remarkable improvement of their performance. Among the most important elements
we note the proper choice of the reference state [151] and multibody terms. It has
recently been shown that two-body potentials are not capable of recognizing all native
folds against large datasets of decoy structures [138]. They also cannot properly
mimic the cooperativity of the protein folding process [21]. Multibody potentials
Protein Structure Prediction Using Coarse-Grained Models 31
remediate these problems to some extent and perform significantly better than two-
body terms.
TRP. For a better description of these entities, finer coarse graining must be defined
with more than two united atoms substituting a side chain [8, 11].
The choice of atoms used to represent a protein chain is strongly connected to
the set of the degrees of freedom to be sampled. In Cα-only, Cα + SG and similar
approaches, conformational search is done in the Cartesian space. In many cases,
however, the conformational space is smaller than 3N DOFs, because the positions of
some of these atoms depend on the others. In CABS [68], for example, each residue
comprises up to 4 atoms, but only the Cα atom is independent. All the remaining
atomic positions are unambiguously defined by the Cα trace. In contrast, in the
SICHO model [63, 72, 139], which has two interaction centers (Cα and SG), only the SG
positions are independent and the Cα coordinates are back-calculated from them. In
another example, the Rosetta model definition [113] is based on all backbone atoms and
SG, but the conformation of a peptide chain is defined by only three degrees of freedom
per residue: the dihedral angles ϕ, ψ and ω.
Another method used to increase the computational efficiency of a computational
model is the discretization of the conformational space. It has been realized since
the early days of protein simulations [106] that even a small set of distinct states
allowed for a residue can result in the reasonable accuracy of a projected structure.
Such a set of selected states can be easily defined when a conformation is described
by its internal coordinates. For models defined in the Cartesian space a lattice (grid)
is used to limit the search space. In practice a set of basis vectors is defined to connect
any two Cα atoms that follow each other in a protein chain. This implies that any
conformation of a chain of N residues can be written (up to a translation of the whole
chain) as N-1 integer indices that refer to particular vectors in the basis set. Other
atoms of the CG representation (such as SG) may or may not be restricted to the grid.
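The integer encoding described above can be illustrated with a minimal Python sketch.
The basis set used here (all integer vectors with squared length 1 or 2) and the function
names are hypothetical and were chosen only to keep the example small; real lattice
models use much larger basis sets.

```python
from itertools import product

def basis_vectors(min_sq, max_sq):
    """List all integer vectors whose squared length lies in [min_sq, max_sq]."""
    reach = int(max_sq ** 0.5) + 1
    return [v for v in product(range(-reach, reach + 1), repeat=3)
            if min_sq <= sum(c * c for c in v) <= max_sq]

def encode(trace, basis):
    """Turn a lattice trace of N points into N-1 indices into the basis set."""
    index_of = {v: i for i, v in enumerate(basis)}
    codes = []
    for p, q in zip(trace, trace[1:]):
        step = tuple(b - a for a, b in zip(p, q))
        codes.append(index_of[step])  # KeyError if the step is not in the basis
    return codes

def decode(origin, codes, basis):
    """Rebuild the lattice trace from its first point and the list of indices."""
    trace = [tuple(origin)]
    for code in codes:
        trace.append(tuple(a + d for a, d in zip(trace[-1], basis[code])))
    return trace

# Hypothetical toy basis and a four-point trace (assumptions for illustration).
basis = basis_vectors(1, 2)
trace = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 1)]
codes = encode(trace, basis)
```

Storing a conformation as small integers is what makes hashing and table look-ups
convenient; the Cartesian trace is recovered by `decode` whenever distances are needed.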
The already mentioned methods: SICHO, CABS and Rosetta [68, 60, 113] use only
three degrees of freedom per residue; however, they employ more than one center
to define the interactions of a particular residue. All these atoms, united atoms and
virtual points are used to calculate geometric properties, such as distances and planar
and dihedral angles. These properties underlie the definition of the energy function
of a system. The definition usually assumes a very complex mathematical form of
the function which we discuss in detail below. The mathematical formula must be
completed with a (possibly large) set of parameters, such as various constants, scaling
factors, etc. In the case of all-atom models used for biomolecular studies, the param-
eters can be derived from experimental data, such as small molecule measurements.
This is, however, not possible in the case of a CG model, simply because none of the
models reviewed here exists in the real world and many of their properties cannot be
measured. Therefore, the energy function for a CG model comprises at least partially
statistical potentials of mean force. The construction of such force fields has become
a research discipline on its own. Here we provide a very basic description of mean
field force fields and focus on differences between particular CG approaches.
The evaluation of energy of a particular conformation requires computation of rel-
evant geometrical properties (for example distances or angles). This enforces recal-
culation of the Cartesian representation of a biomolecule if a CG model is defined in
internal coordinates [107]. Lattice models, on the other hand, use hashing and store
some local geometrical properties, such as planar or dihedral angles, vector products
etc., in look-up tables. Moreover, the energy function may be conveniently stored in
an array and indexed by a distance bin or vector indices.
Hydrogen bonding is one of the indispensable terms of the force field. Numerous
formulations have been proposed in the literature; they are based on different atom
types and capture local geometrical properties of the main chain in different ways
[24, 35, 73, 102]. To better reproduce the local geometry of secondary structure
elements, correlations between neighboring hydrogen bonds may be modeled
explicitly by an additional potential [68, 88, 92].
Another very important energy component is the one corresponding to hard core
repulsion between atoms, often described as an excluded volume term. A steeply
rising function may be used to model this interaction, such as the repulsive 1/r^12
term (the so-called “12” term) of the Lennard-Jones potential. Relevant radii for
united atoms are computed as an
average over all the relevant conformations of a group that has been coarse grained
into a sphere. Hard core repulsion in low- to medium-resolution on-lattice models
may be evaluated instantly just by a single look-up in the 3D matrix that stores the
lattice space.
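A minimal Python sketch of such a look-up is given below. It assumes, for simplicity,
that each bead occupies exactly one lattice node (real models typically mark several
nodes per bead); the class and method names are hypothetical.

```python
class OccupancyGrid:
    """Boolean 3D array marking which lattice nodes are occupied."""

    def __init__(self, size):
        self.size = size
        self.cells = [[[False] * size for _ in range(size)] for _ in range(size)]

    def _check(self, node):
        if not all(0 <= c < self.size for c in node):
            raise IndexError(f"node {node} lies outside the grid")

    def is_occupied(self, node):
        """Constant-time collision test: a single array look-up."""
        self._check(node)
        x, y, z = node
        return self.cells[x][y][z]

    def place(self, node):
        if self.is_occupied(node):
            raise ValueError(f"node {node} is already occupied")
        x, y, z = node
        self.cells[x][y][z] = True

    def try_move(self, old, new):
        """Move a bead if the target node is free; report whether it moved."""
        if self.is_occupied(new):
            return False
        ox, oy, oz = old
        nx, ny, nz = new
        self.cells[ox][oy][oz] = False
        self.cells[nx][ny][nz] = True
        return True

grid = OccupancyGrid(8)
grid.place((1, 1, 1))
grid.place((2, 1, 1))
rejected = grid.try_move((1, 1, 1), (2, 1, 1))  # target occupied
accepted = grid.try_move((1, 1, 1), (1, 2, 1))  # target free
```

The cost of the test does not depend on the number of beads, in contrast to a pairwise
distance check in continuous space.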
The attractive pairwise potential is established by the Boltzmann inversion of
relevant statistics extracted from known protein structures. The potential may depend
solely on the distance between interacting partners; in other approaches it takes into
account the mutual orientation of the groups and their neighborhood [12, 68].
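The Boltzmann inversion mentioned above can be sketched in a few lines of Python. The
counts, the reference distribution and the pseudocount are made-up illustrative values,
and the choice of the reference state is an assumption of this sketch rather than the
procedure of any particular force field.

```python
import math

def boltzmann_inversion(observed, reference, kT=1.0, pseudocount=1e-6):
    """Convert distance-bin counts into energies: E = -kT * ln(P_obs / P_ref)."""
    total_obs = sum(observed)
    total_ref = sum(reference)
    energies = []
    for n_obs, n_ref in zip(observed, reference):
        p_obs = n_obs / total_obs + pseudocount  # pseudocount avoids log(0)
        p_ref = n_ref / total_ref + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

def energy_at(distance, energies, bin_width):
    """Look up a contact energy by its distance bin; zero beyond the table."""
    index = int(distance / bin_width)
    return energies[index] if index < len(energies) else 0.0

# Hypothetical counts for three distance bins (illustration only).
observed = [10, 40, 50]
reference = [30, 40, 30]
table = boltzmann_inversion(observed, reference)
```

Bins that are observed more often than expected receive negative (favorable) energies,
and the resulting table can be indexed directly by a distance bin during a simulation.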
Local backbone conformation and secondary structure formation are controlled
by mean force potentials encoding local correlations between degrees of freedom.
Typically, the potentials also depend on the amino acid sequence and encode
propensities of particular amino acid types to form a given secondary structure. The
actual formulation of these potentials depends on how the main chain is represented
in the model. In cases where all backbone atoms are available, Ramachandran-type
energy maps are utilized. Otherwise, local interactions depend on local distances,
for example between the ith and (i+2)nd Cα atoms, usually denoted as R13, as well as
on R14 and R15. Another choice is to define energy terms based on planar and dihedral
angles between successive Cα atoms.
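The local descriptors named above (R13-type distances, planar angles and dihedral
angles of a Cα trace) can be computed as in the following Python sketch. The helper
names are hypothetical and the sign convention of the dihedral angle is an assumption;
individual models may define it differently.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def local_distance(trace, i, k):
    """Distance between the ith and (i+k)th Cα atoms, e.g. k=2 gives R13."""
    return norm(sub(trace[i + k], trace[i]))

def planar_angle(a, b, c):
    """Angle (degrees) at b formed by three consecutive Cα atoms."""
    u, v = sub(a, b), sub(c, b)
    return math.degrees(math.acos(dot(u, v) / (norm(u) * norm(v))))

def dihedral_angle(a, b, c, d):
    """Torsion angle (degrees) defined by four consecutive Cα atoms."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    b2_unit = tuple(x / norm(b2) for x in b2)
    m1 = cross(n1, b2_unit)
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

# Four made-up points forming a right-angled zigzag (illustration only).
trace = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
r13 = local_distance(trace, 0, 2)
theta = planar_angle(*trace[:3])
phi = dihedral_angle(*trace)
```

In a lattice model these quantities take a finite number of values, so they can be
precomputed once and stored in look-up tables indexed by the basis vector indices.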
A CG force field is often completed by terms that mimic solvent-induced effects and
long range electrostatics. Examples of such terms include a centrosymmetric
(compacting) potential and various environmental terms.
Such an algorithm constructs a Markov chain over a number of Markov chain pro-
cesses. The exchange between the structures in different replicas facilitates relaxation
of structures that might otherwise be trapped in local energy minima. The density
of states of the sampled system can be recalculated by a histogram reweighting
technique [27, 28, 39, 80]. The Parallel Tempering algorithm can also be applied to
Molecular Dynamics simulations [131].
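The exchange step between replicas can be sketched as follows. The sketch uses the
standard Metropolis-type acceptance rule for swapping two replicas,
min(1, exp[(1/kT_i − 1/kT_j)(E_i − E_j)]); the function names, the reduced units
(k_B = 1) and the restriction to neighboring temperatures are assumptions made for
illustration.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Probability of exchanging conformations held at temperatures temp_i and temp_j."""
    delta = (1.0 / (k_b * temp_i) - 1.0 / (k_b * temp_j)) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swaps(energies, temperatures, rng):
    """Try to swap conformations at neighboring temperatures.

    energies[c] is the energy of conformation c; the returned list gives, for each
    temperature slot, the index of the conformation assigned to it after the swaps.
    """
    order = list(range(len(energies)))
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        p = swap_probability(energies[i], energies[j],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = j, i
    return order

rng = random.Random(0)
# The colder replica (T=1) holds the higher-energy conformation, so the swap is certain.
order = attempt_swaps([10.0, 5.0], [1.0, 2.0], rng)
```

A swap that moves the lower-energy conformation to the lower temperature is always
accepted; the opposite swap is accepted only with a Boltzmann-like probability.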
There are also many variants of Molecular Dynamics [51, 82, 101]. In its standard
formulation, the trajectory of a molecular system is calculated by numerically
integrating Newton's equations of motion, one time step at a time. The forces on the
system are computed as the negative gradient of the potential energy function (the
force field) which is dependent
3 Representative CG Methods
Above, we described all the major components of a coarse-grained model. Now let
us summarize a few well-established computational models with particular emphasis
on these elements. The models differ in the level of coarse graining and the number
of degrees of freedom utilized to define a polypeptide chain. For convenience, the
key features of the models are presented in Table 1.
The model originally proposed by Levitt and Warshel in 1975 [87] uses two interaction
centers per residue: the Cα atom, which is modeled explicitly, and the side
chain, represented by an SG sphere. Each residue is allowed only one degree of
freedom: the torsion angle defined by four successive Cα atoms. Interactions between
side chains are modeled by a van der Waals type potential. The radius of each united
atom representing a side chain is calculated as the average radius of gyration of the
particular group. Another important contribution to the potential energy is the side
chain-solvent interaction estimated by the experimental free energy of transfer from
water to ethanol. The force field is completed by local interaction expressed as a
Fourier expansion function of the torsion angle defined by four Cα atoms. Classi-
cal molecular dynamics is used to sample the conformational space. Simulations of
the bovine pancreatic trypsin inhibitor sometimes produced structures resembling
the native fold, with the best structures having root-mean-square deviation from the
native in the range of 6.5 Å. In his later works [86], Levitt introduced an additional
degree of freedom for each residue, namely the planar angle between three adjoining
Cα atoms. A virtual atom has also been added in the middle of a Cα-Cα vector for a
more accurate definition of hydrogen bonding interactions.
3.2 CABS
The coarse-grained representation of the CABS model [68] uses up to four interaction
centers per residue: Cα, Cβ, the center of mass of the side group and a virtual point
placed at the center of each peptide bond (see Fig. 2a). The Cα trace of the model
is restricted to an underlying cubic lattice with a spacing of 0.61 Å. In lattice units,
the distance between consecutive Cα atoms varies from $29^{1/2}$ to $49^{1/2}$. This
implies that the Cα-Cα distance is allowed to fluctuate between 3.29 and 4.27 Å. There
are 800 possible orientations (lattice vectors) of the virtual Cα-Cα bond. Therefore,
the model essentially avoids any lattice-related artifacts. Cβ atoms and side chains are located
off-lattice, and their positions are calculated for each residue using the coordinates of
three consecutive Cα atoms as a reference frame. For each amino acid, two distinct
conformations are defined which mimic the averaged side chain position found in
helical and expanded conformations. The rotamer type is uniquely defined by the
on-lattice Cα trace; hence, a protein chain comprising N residues has 3N independent
degrees of freedom.
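The number of allowed Cα-Cα vectors quoted above can be checked by direct enumeration.
The following Python sketch lists all integer vectors whose squared length lies between
29 and 49 lattice units; the function name is hypothetical, and the lattice spacing is
taken to be exactly 0.61 Å, which is an assumption (the quoted value may be rounded).

```python
from itertools import product
import math

SPACING = 0.61  # lattice spacing in Å, as quoted in the text (assumed exact here)

def lattice_vectors(min_sq=29, max_sq=49):
    """All integer vectors with squared length in [min_sq, max_sq] lattice units."""
    reach = math.isqrt(max_sq)
    return [v for v in product(range(-reach, reach + 1), repeat=3)
            if min_sq <= sum(c * c for c in v) <= max_sq]

vectors = lattice_vectors()
lengths = [SPACING * math.sqrt(sum(c * c for c in v)) for v in vectors]

print(len(vectors))                 # number of allowed virtual-bond vectors
print(min(lengths), max(lengths))   # shortest and longest Cα-Cα distance in Å
```

The enumeration yields 800 vectors. With the spacing taken as exactly 0.61 Å the
shortest bond comes out at about 3.28 Å, marginally below the quoted lower limit, which
presumably reflects rounding of the spacing.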
3.3 SICHO
The most distinctive feature of the Side Chain Only model [63, 72, 139] is the definition
of the polypeptide chain. Each residue is represented as a spherical united atom which
substitutes its side chain (see Fig. 2b). The united atoms are restricted to a cubic grid
with 1.45 Å spacing. The chain vectors representing virtual bonds between interaction
centers are of variable length, ranging from $9^{1/2}$ to $30^{1/2}$ lattice units. Unlike
in other protein models, Cα atom positions are not independent degrees of freedom.
Instead, they are uniquely defined in a local frame of three neighboring side
chains and are recalculated after any conformational change. The knowledge-based
force field is defined based on both Cα and side chain centers and includes a chain
stiffness potential, a secondary structure bias, short-range interactions, hydrogen-
bond interactions, and long-range interactions. Such deeply coarse-grained models
are computationally very efficient [130], and they can be effective in difficult tasks
of structure prediction and in studies of large-scale protein dynamics, provided that
the resolution of the model structures allows for atom-level reconstruction. Realistic
protein models of even lower resolution can be designed if the crude structure
representation is compensated by specific patterns of knowledge-based statistical
potentials [23].
3.4 Rosetta
Rosetta [113, 129] utilizes a library of short peptide fragments (typically 3 and 9
residues long) as a Monte Carlo move set. In practice a fragment is defined by three
internal coordinates (ϕ, ψ and ω backbone dihedral angles) per residue. Each time a
fragment is inserted, a number of consecutive DOFs (9 or 27, for 3mers and 9mers,
respectively) are affected in the simulated polypeptide chain. The fragments
themselves are extracted from known protein structures [40]. Such a sampling method
reduces the conformational space, changes the respective DOFs in a correlated man-
ner and introduces a strong bias toward protein-like geometries. Rosetta utilizes
two representations: a coarse-grained, termed “centroid” (shown in Fig. 2c) and an
all-atom one. In both representations the protein backbone is treated explicitly. In
the centroid mode, each side chain is represented by a united atom located at the
side-chain center of mass. In the high-resolution mode, atomic coordinates for all
side-chain atoms, including hydrogens, are utilized. Side chains are restricted to
discrete conformations as described by a backbone-dependent rotamer library. The
Rosetta energy function is different for the two representations and in both cases it
comprises numerous mean-field terms.
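The idea of a fragment insertion move can be sketched in Python as below. This is a
schematic illustration under stated assumptions, not Rosetta's implementation: the
chain is stored as a list of (ϕ, ψ, ω) tuples, the fragment library is a made-up list,
and the function names are hypothetical.

```python
import random

def insert_fragment(torsions, fragment, start):
    """Return a copy of the chain with a window of residues replaced by a fragment."""
    if start < 0 or start + len(fragment) > len(torsions):
        raise ValueError("fragment does not fit into the chain at this position")
    updated = list(torsions)
    updated[start:start + len(fragment)] = fragment
    return updated

def random_fragment_move(torsions, library, rng):
    """Pick a random fragment and a random insertion window (a trial Monte Carlo move)."""
    fragment = rng.choice(library)
    start = rng.randrange(len(torsions) - len(fragment) + 1)
    return insert_fragment(torsions, fragment, start)

# Hypothetical 10-residue chain in a helix-like state and a single extended 3mer.
chain = [(-60.0, -45.0, 180.0)] * 10
extended_3mer = [(-120.0, 130.0, 180.0)] * 3
moved = insert_fragment(chain, extended_3mer, 4)
trial = random_fragment_move(chain, [extended_3mer], random.Random(1))
```

All torsions inside the window change together, which is the correlated update
described above; a complete simulation would additionally score the trial chain and
accept or reject it with a Metropolis criterion.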
3.5 UNRES
In the UNited RESidue model [90] the protein backbone is reduced to a sequence
of Cα atoms and a united peptide group (p) connected by virtual bonds (Fig. 2d).
United side-chains are attached to the α-carbons (SG). In the most recent version of
UNRES, the positions of these atoms are defined by internal Cartesian coordinates
(vectors of the virtual bonds). Previously, planar and torsion angles were used as a set
of generalized coordinates [91]. UNRES employs a physics-based mean-field force
field for simulations of protein structure and dynamics. The energy function definition
and conformational space sampling methods have evolved over time. Initially
the effective energy function was described as a restricted free energy (RFE) function
or the potential of mean force (PMF) of polypeptide chains in water. Currently,
it is defined as an approximate cumulant expansion of the temperature-dependent
restricted free energy, whose calibration is based on protein-folding thermodynamic
data. UNRES is the only coarse-grained force field which explicitly depends on
temperature and can compute thermodynamic quantities of protein folding.
In UNRES the conformational space search was initially based on the global
optimization of the potential-energy function to find the lowest-energy conforma-
tion. It was performed by stochastic Monte Carlo-based algorithms, namely Monte
Carlo plus Energy Minimization (MCM) [89] and hybrid approaches, such as Con-
formational Space Annealing (CSA) [85] which turned out to be the most effective.
Later, UNRES was extended to mesoscopic Molecular Dynamics (MD) to study
pathways and kinetics of the protein folding process. This implementation of MD
reformulates the conformational sampling as a search for the most probable confor-
mational ensembles with the lowest free energy at temperatures below the folding
transition temperature. The UNRES extension of MD can also be used to simu-
late multichain proteins. To improve the conformational space search, UNRES can
use Replica Exchange Molecular Dynamics (REMD) and Replica Exchange Monte
Carlo (REMC) sampling.
The UNRES coarse-grained model has been successfully applied to the protein
structure prediction problem [76, 78, 105], to study folding trajectories [118] and to
investigate folding process thermodynamics [77, 152].
maintain large and up-to-date collections of fragments. The second group of methods
utilizes averaged knowledge about backbone geometry. The computations are per-
formed based on the statistics of backbone atom positions derived from representative
known protein structures.
Methods for side chain position prediction [46, 75, 115] are based on sampling
the conformational space by a rotamer library. This involves statistical clustering
of observed side chain conformations in known structures. Other algorithms use
conformer libraries which contain samples of side chains from known protein struc-
tures. In both approaches a scoring function is required to evaluate the quality of the
sampled conformations.
The reconstruction to an all-atom representation from a reduced CG representa-
tion of the protein is an important part of structure modeling pipelines. Such all-atom
models may be directly used for further refinement with molecular mechanics pro-
grams [36] and are essential for later structural studies. Most of the post-processing
applications, such as structure quality assessment, protein-protein interaction pre-
diction, protein function analysis or ligand docking, require an all-atom model of
the protein [121, 143]. There are many tools available for such model conversion [2,
10, 43, 52, 53, 108], but only a small number of them are commonly used. Below, we
describe selected servers and applications freely accessible for use online. These
methods typically complete all computations within seconds to minutes.
4.1 BBQ
4.2 SABBAC
4.3 SCWRL4
4.4 MaxSprout
This automatic database procedure [46] for generating the all-atom representation of
a protein requires the input Cα trace and amino acid sequence. The computations are
split into two basic steps: backbone reconstruction using the Cα trace and side-chain
coordinates prediction using the reconstructed backbone.
During backbone construction, a protein structure database is scanned for frag-
ments that locally fit the alpha carbon trace and candidates for a complete overlap-
ping cover of the chain are matched. The optimal continuous path is then found by a
4.5 PULCHRA
For very small proteins, CG methods of structure prediction may provide satisfactory
models. However, for the great majority of targets it is necessary to use additional
sources of information. The databases of known protein structures are the most easily
available among them—e.g., the Protein Data Bank (PDB) [114].
Since protein structure is much more strongly conserved in evolution than sequence
[47], the most straightforward approaches compare the query sequence with the
sequences of proteins of known structure (templates). However, the inability to
detect sequence similarity with any of the known structures does
not exclude the existence of a good template. The solution in such cases can be
so-called threading methods, which compare predicted structural features (for example
the secondary structure or burial) of the target and the template [36, 122, 128].
Regardless of which approach is chosen to detect homology, the aim is to create
an alignment which highlights the similarities between the query and the templates.
Obviously, the level of similarity affects the correctness of template selection and
the quality of the alignments. For easy cases (high similarity), classical approaches
such as PsiBlast [4] almost always provide sufficiently accurate alignment. Therefore,
it is relatively easy to build a good model for the query. However, even in those
cases CG methods can be useful for the local sampling of some regions, such as
loops, which are not defined by the alignment. The difficulty of the problem rapidly
increases with the decreasing level of similarity, not only due to the ambiguity of
the alignments, but also because of differences in the geometry of correctly aligned
regions or suboptimal template selection. One of the most effective approaches to
those problems is the incorporation of CG models. Below we present certain strategies
that incorporate the information obtained with comparative modeling into the CG-
based protocols.
In one of the most straightforward approaches, the query chain is allowed to move in
a tube formed by a chain of spheres surrounding the template structure [70]. In this
method, the query chain is confined within the tube by imposing energetic penalties
for any excursion outside. Thus, the disadvantage of this approach is the limited
degree of possible improvement of the initial model.
The answer to this limitation was the application of a more complex set of restraints
within GENECOMP [60], a method in which the energy function is constructed in
a way which allows two-residue shifts of the target chain along the template. This
feature enables changing the initial alignment, and thus correction of possible errors.
Additionally, the GENECOMP restraining scheme includes two types of restraints:
(i) based on the predicted contacts in the target and (ii) target distances predicted
from the fragment threading procedure.
In more recent studies, the pairwise distances observed in the templates are the
source of restraints for the CABS modeling tool [68]. For a set of templates provided
by comparative modeling procedures, distances between all pairs of Cα atoms are
calculated, and the minimum and maximum distances between equivalent pairs of
residues are taken as the limits of the restraint. The restraints are included
in the CABS energy function as trapezoid-shaped potential wells, where the gradient
of the lateral sides depends on the weight of the restraint. The spatial restraints
significantly reduce the conformational space, which decreases computation time and
increases the probability of obtaining a successful model (see Fig. 4a).
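The derivation of such restraints can be sketched in Python as follows. The restraint
limits follow the description above (minimum and maximum Cα-Cα distance over the
templates). The penalty function is only an assumed shape: zero between the limits,
rising linearly outside them with a slope given by the weight, and capped at a maximum
value. The actual functional form and parameters used in CABS are not given here, and
all names and coordinates are hypothetical.

```python
import math

def restraint_limits(templates, i, j):
    """Minimum and maximum distance between residues i and j over all templates."""
    distances = [math.dist(t[i], t[j]) for t in templates]
    return min(distances), max(distances)

def well_penalty(d, d_min, d_max, weight=1.0, cap=10.0):
    """Flat-bottomed well with linear sides (slope = weight), capped at `cap`."""
    if d < d_min:
        excess = d_min - d
    elif d > d_max:
        excess = d - d_max
    else:
        return 0.0
    return min(weight * excess, cap)

# Two made-up three-residue templates (Cα coordinates in Å, illustration only).
templates = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
]
d_min, d_max = restraint_limits(templates, 0, 2)
```

Any model distance between `d_min` and `d_max` is unpenalized, so the templates define
an allowed region rather than a single target structure.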
Fig. 4 Sample strategies of combining comparative modeling methods with CG models. a T0592
target from CASP9, templates (in gray) define conformational space sampled with CABS. The final
model (navy) is more similar (in terms of GDT_TS) to the native (green) than any of the templates.
b The idea of TRACER. Template scaffold is represented as spheres, query Cα trace as red lines.
Query residues within the gray sphere satisfy the three criteria of the query-template pseudo energy,
while those within the navy sphere satisfy the additional secondary structure identity criterion (see
the text for details)
A more sophisticated technique was originally used in the Modeller method [117].
In this approach, spatial restraints are defined in terms of a probability density
function (PDF). The PDF used for restraining a certain feature x (a distance or an
angle, for instance) can be written as P(x | A, B, …, C). This formula gives the
probability density of x when A, B, …, C are known. For instance, in Rosetta [134],
the restrained feature is the distance between pairs of Cα atoms (r), and the PDF is
a Gaussian defined as P(r | G, L, B, D), where G, L, B and D are predictor variables
(see Table 2).
A Gaussian is defined by two parameters: the mean and the standard deviation. The
latter was calculated using a non-redundant database of nearly 8,000 known protein
structures. The HHSearch algorithm [128] was employed to align all pairs of proteins.
The standard deviations of r were computed for 10,000 combinations of different G, L,
B and D values, based on differences in the distances between equivalent atoms in the
aligned structures, and stored in a four-dimensional table spanned by the values
of the predictor variables.
Such a table of standard deviations enables prediction of restraints for a query
sequence aligned with the template. For each pair of Cα atoms (apart from those
closer than 10 Å or separated by fewer than 10 residues along the query sequence) the
values of the four predictor variables are calculated. Then, pairwise distance Gaussian
Table 2 Predictor variables used for deriving restraints for the ROSETTA modeling tool
Variable  Feature                               Value
G         Global alignment quality              −log(E), where E is the HHsearch e-value
L         Residue-pair alignment quality        Blosum62 [44] score
B         Burial in the template structure      Number of Cβ's within 8 Å of the template residue Cβ
D         Average distance to an alignment gap  Distance, in number of residues, from the aligned
                                                pair to the nearest gap in the sequence alignment
L, B and D are averaged over the pairs of aligned residues, G is constant for the given alignment
restraints are assigned: the mean is given by a distance between the equivalent atoms
in the template structure, and the standard deviation is taken from the table according
to the calculated predictor variables.
It is also possible to combine predictions from multiple templates as a weighted
mixture of Gaussians. Such restraints can be combined with the Rosetta energy
function by adding a component equal to $\sum_{i,j} -\ln(P(d_{i,j}))$, where the
summation is done over pairs of residues, and $P(d_{i,j})$ is the probability of the
distance $d_{i,j}$ given by the calculated PDF.
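The restraint term described above can be written down directly. The following Python
sketch evaluates the sum of −ln P(d) over restrained residue pairs, with P given by a
weighted mixture of Gaussians (one component per template). The data layout, the
function names and the numerical values are assumptions made for illustration and do
not reproduce the Rosetta implementation.

```python
import math

def gaussian_pdf(d, mean, sigma):
    """Probability density of a normal distribution at distance d."""
    z = (d - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(d, components):
    """Weighted mixture of Gaussians; components is a list of (weight, mean, sigma)."""
    total_weight = sum(w for w, _, _ in components)
    return sum(w * gaussian_pdf(d, m, s) for w, m, s in components) / total_weight

def restraint_score(distances, restraints):
    """Sum of -ln P(d_ij) over all restrained pairs (i, j).

    distances:  {(i, j): current model distance}
    restraints: {(i, j): [(weight, mean, sigma), ...]}
    Note: for distances very far from every mean the density can underflow to zero.
    """
    return sum(-math.log(mixture_pdf(distances[pair], comps))
               for pair, comps in restraints.items())

# One hypothetical restraint: mean 12 Å (template distance), sigma 2 Å (from the table).
restraints = {(0, 20): [(1.0, 12.0, 2.0)]}
score_at_mean = restraint_score({(0, 20): 12.0}, restraints)
score_off_mean = restraint_score({(0, 20): 16.0}, restraints)
```

For a single Gaussian the term reduces to a harmonic penalty: moving two standard
deviations away from the mean raises the score by exactly 2.0 in these units.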
(i) amino acid similarity (a quarter of the negative value of the BLOSUM62 substitution
matrix [44]; cut-off: 4 Å)
(ii) similarity of hydrophobic/hydrophilic features (a quarter of the negative value of
the product of Kyte-Doolittle indexes [83]; cut-off: 4 Å)
(iii) similarity of the orientation and direction of the chains in the vicinity of the ith
and jth residues (−1 if the angle between the flanking Cα-Cα vectors is smaller than 90°;
cut-off: 4 Å)
(iv) identity of the secondary structures (helical or extended) of the fragments containing
the ith and jth residues (−1 if identical; cut-off: 2.5 Å)
As in the CABS model, the conformational space is sampled by the REMC
scheme. The conformational updates include those originally applied in CABS, namely
modifications of small fragments (2–4 pseudo-bonds of the Cα trace), and, additionally,
rearrangements of larger parts of the chain consisting of up to 22 residues. These
larger-scale modifications enable effective sampling of the scaffold, which
corresponds to changing the alignment between the query and the template.
TRACER significantly extends the application of comparative modeling methods,
especially to regions of very low or even undetectable sequence identity. However,
the major drawback of the current version of TRACER, in comparison to some other
methods described in this section, is its inability to use more than one template.
Table 3 Top groups in CASP9 in terms of the number of models with the highest GDT_TS score
submitted to CASP9 as first models
Group name (number)     Method            Number of models    Mean GDT_TS   Rank in CASP9
                                          with the highest    for all server/human targets
                                          GDT_TS
PRMLS (65)              Rosetta/Modeller  7                   54.10         12
LTB (400)               CABS              5                   51.86         28
BAKER (172)             Rosetta           5                   51.77         29
ZHANG_AB_INITIO (418)   QUARK             4                   52.95         18
[Fig. 5 plot: ΔGDT_TS (vertical axis, 0 to 14) versus difficulty category (FM, FM/TBM, TBM) for the BAKER, LTB and ZHANG_AB_INITIO groups]
Fig. 5 Differences for three difficulty categories between mean GDT_TS for a particular group
and the mean GDT_TS for models submitted to CASP9 by all groups
difficulty of the targets in various CASP editions. However, a general tendency can
be observed that after dramatic improvements in early editions, in the last ones the
progress is modest [79].
The latest CASP experiments confirm this relatively slow but steady progress in
theoretical structure prediction [26, 55, 56, 103]. Combinations of coarse-grained
modeling strategies with careful bioinformatics analysis of sequence similarities
and final selection/refinement prove to be the most efficient [54, 133, 146,
149].
The so-called loop closure problem has been a focus of research from the earliest days
of computational protein modeling [32, 33, 144]. The prediction of loop structure is
often the most difficult challenge in comparative modeling efforts [36]. The accuracy
of homology models is usually the lowest in loop regions. Since loop regions often
exhibit very low sequence conservation, they have to be modeled without a structural
template. In that case, simple homology modeling methods cannot be used. To illustrate
some of the applications of the CG approach to protein structure prediction,
we briefly review recent modeling efforts using the CABS CG model toward the
accurate prediction of protein loop conformations.
In the benchmark study of loop modeling methods [49] the performance of the
following tools was compared: MODELLER, ROSETTA, CABS and a combination
of MODELLER with CABS. MODELLER [25] is commonly considered a standard
comparative modeling package. It employs explicitly designed loop modeling strategies
relying on an optimization-based approach (conjugate gradients and molecular
dynamics with simulated annealing). ROSETTA and CABS, in turn, employ a
knowledge-based search of the discretized conformational space. These methods
were tested on a large set of loops of various lengths (4–25 residues). The tests
showed that classical modeling with MODELLER gives more accurate predictions
for short loops, while CG de novo modeling by CABS performs better for longer
loops. In the cases of long gaps in protein structures (~20 residues), loops were
predicted by CABS with medium or medium-low resolution (RMSD of 2–6 Å from the
native). Results of similar quality were obtained for the structure
prediction of three extracellular loops of 13 G-protein coupled receptors (GPCRs)
by a de novo CABS procedure [59, 61]. This modelling task was particularly chal-
lenging for the de novo blind prediction method, as all three extracellular loops were
fully flexible during the prediction procedure. Still, the best resulting conformations
showed RMSD values lower than 3 Å from the experimental structure (see Fig. 6).
Previous benchmark studies, aimed at the prediction of missing protein structure
fragments, also indicated that the CG models (an early version of CABS and two
other tools based on similar principles) performed relatively well in the range of large
fragments [11].
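The RMSD values quoted in this section measure the average deviation between equivalent
atoms of a model and the native structure. A minimal Python sketch is given below; it
assumes that the two coordinate sets have already been superimposed (the optimal
superposition step is omitted), and the function name and coordinates are hypothetical.

```python
import math

def rmsd(model, native):
    """Root-mean-square deviation between two equally long coordinate lists.

    Assumes the structures are already superimposed; no fitting is performed.
    """
    if len(model) != len(native):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = [sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in zip(model, native)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up loop of three Cα atoms and a copy shifted by the vector (3, 4, 0).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 3.0, y + 4.0, z) for x, y, z in native]
deviation = rmsd(shifted, native)
```

In loop modeling benchmarks the superposition is commonly performed on the fixed part
of the protein, so that the loop RMSD also reflects errors in loop orientation.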
Fig. 6 Structure prediction of GPCR loops using the de novo CABS method [59]. The picture
shows the best models for the second extracellular loop (EL2) in muscarinic acetylcholine receptor M2
(CHRM2), neurotensin receptor type 1 (NTSR1) and mu-type opioid receptor (OPMR1). The
predicted loops are shown in red (EL2) and green (EL1 and EL3), the reference loops (crystallographic
structure) and the adjacent intracellular receptor structures in silver. The resulting conformations
of the longest loop, EL2, exhibited the following RMSD values from the experimental conformation:
2.65 Å for CHRM2 (15 residues long), 2.99 Å for NTSR1 (21 residues) and 1.92 Å for OPMR1 (17
residues)
As shown by Jamroz and Kolinski [49], CG models can be effectively used for
the prediction of loop structures in combination with other techniques. Namely,
top-ranked models generated by MODELLER were used as multiple templates for
CABS modeling. With this hybrid procedure, the predicted models were
on average more accurate than those from either method alone.
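One way to picture how several template models can guide a CG simulation is to keep only the geometric features on which the templates agree. The sketch below is an illustration under stated assumptions, not the restraint scheme actually implemented in CABS or used in [49]: each template is a list of per-residue coordinates of equal length, a residue pair yields a restraint when the spread of its distance across templates is at most `max_spread`, and both the function name and the 1.0 Å default are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def consensus_restraints(
    templates: Sequence[Sequence[Coord]],
    max_spread: float = 1.0,  # hypothetical tolerance in Å
) -> Dict[Tuple[int, int], float]:
    """Mean inter-residue distances for pairs on which all templates agree."""
    if not templates:
        raise ValueError("at least one template is required")
    n = len(templates[0])
    if any(len(t) != n for t in templates):
        raise ValueError("all templates must have the same number of residues")
    restraints: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        for j in range(i + 2, n):  # skip chain neighbours
            distances = [math.dist(t[i], t[j]) for t in templates]
            if pstdev(distances) <= max_spread:
                restraints[(i, j)] = mean(distances)
    return restraints


# Toy example (hypothetical data): two templates that differ only in residue 3.
template_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
template_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
restraints = consensus_restraints([template_a, template_b])
print(restraints)
```

In this toy case only the pair (0, 2) survives, because every pair involving the last residue varies too much between the two templates. Regions where templates disagree are thus left free for the CG sampling to explore.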
Fig. 7 Results of docking nuclear receptor coactivator 1 (sequence:
HKLVQLLTTT) to peroxisome proliferator-activated receptor gamma (PDB code 2FVJ:A) without
using prior knowledge about the binding site. The docking was performed with the CABS-dock method
[81]. Panel a shows the 1000 lowest-energy models (light blue; best model RMSD to the native pose is
1.43 Å), while panel b shows the top-scored model (dark blue; RMSD to the native pose is 3.46 Å)
together with the experimental structure of the bound peptide (light blue) in the close-up frame (native
complex PDB code: 2FVJ). The protein receptor is shown in surface representation
One of the main purposes of this chapter was to demonstrate that the most inter-
esting CG models are based on quite complex sets of assumptions, such as protein
representation, force field, coordinate system and sampling scheme. Obviously,
the accuracy of the particular assumptions of a CG protein model defines the range of
applicability of the modeling procedure. It seems reasonable to expect that the future
development of CG models will focus on a more accurate reconstruction of real
physical effects. Increasing computational power should lead to a considerable decrease
in the assumed simplifications of the existing models, and, therefore, provide a more
accurate description of the observed physics of biomacromolecules.
Another promising direction of the development of CG models is a more effective
combination of existing CG methods with comparative modeling approaches [58,
116, 147]. Perhaps, the term “unification” would be more accurate as we believe
that the incorporation of comparative modeling methods should go further than mere
utilization of information provided by stand-alone comparative modeling tools. A
precursor of such an approach was shown in Sect. 5.3.
Finally, we expect that the development of integrative approaches, which combine
experimental data from various sources with different computational techniques,
including CG models, will be critical. The most recent (and spectacular) examples of
integrative structure determination include the use of cryo-electron microscopy
(cryo-EM) in combination with CG modeling techniques. One of the biggest
advantages of cryo-EM experiments is that, in contrast to the popular X-ray
crystallography, specimens can be observed in their native environment, which enables the
exploration of conformational states. The main problem with cryo-EM maps is their
low resolution, which can be addressed by applying CG computational techniques
to fit high-resolution protein structures into the maps [150]. Such integrative
approaches will probably become widespread in the near future.
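To make the idea of fitting a structure into a low-resolution map more concrete, the sketch below scores a candidate placement by the correlation between the experimental density and a density simulated from the model. It is a toy illustration under stated assumptions rather than the procedure of [150]: the model is a handful of CG beads, each bead contributes an isotropic Gaussian, the "experimental" map is itself simulated with a broader Gaussian, and all names, grid dimensions and widths are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def simulate_density(beads: Sequence[Coord], grid: Sequence[Coord], sigma: float) -> List[float]:
    """Sample a sum of isotropic Gaussians (one per CG bead) at the grid points."""
    return [
        sum(math.exp(-math.dist(g, b) ** 2 / (2.0 * sigma ** 2)) for b in beads)
        for g in grid
    ]


def cross_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sampled densities."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("densities must have the same length (at least 2)")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("a constant density has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Toy setup (hypothetical data): a 7 x 7 x 7 grid with 2 Å spacing, two beads.
grid = [(2.0 * i, 2.0 * j, 2.0 * k) for i in range(7) for j in range(7) for k in range(7)]
true_beads = [(4.0, 6.0, 6.0), (8.0, 6.0, 6.0)]
# Stand-in for a low-resolution experimental map: the true beads, heavily blurred.
target_map = simulate_density(true_beads, grid, sigma=4.0)

well_placed = true_beads
misplaced = [(x, y, z + 6.0) for x, y, z in true_beads]

cc_good = cross_correlation(simulate_density(well_placed, grid, sigma=2.0), target_map)
cc_bad = cross_correlation(simulate_density(misplaced, grid, sigma=2.0), target_map)
print(f"well placed: {cc_good:.2f}, misplaced: {cc_bad:.2f}")
```

A fitting procedure would search over placements or conformations for the highest score. The correctly placed beads correlate better with the blurred map than the shifted ones, even though the map alone does not resolve individual beads.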
References
1. Abagyan, R.A., Mazur, A.K.: New methodology for computer-aided modelling of biomolecular
structure and dynamics. 2. Local deformations and cycles. J. Biomol. Struct. Dyn. 6, 833–845
(1989)
2. Adcock, S.A.: Peptide backbone reconstruction using dead-end elimination and a knowledge-
based forcefield. J. Comput. Chem. 25, 16–27 (2004). https://doi.org/10.1002/jcc.10314
3. Altschul, M., Simpson, K.W., Dykes, N.L., Mauldin, E.A., Reubi, J.C., Cummings, J.F.:
Evaluation of somatostatin analogues for the detection and treatment of gastrinoma in a dog.
J. Small Anim. Pract. 38, 286–291 (1997)
4. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.:
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. 25, 3389–3402 (1997)
5. Anfinsen, C.B.: Principles that govern the folding of protein chains. Science 181, 223–230
(1973)
6. Anfinsen, C.B., Haber, E., Sela, M., White Jr., F.H.: The kinetics of formation of native
ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. USA
47, 1309–1314 (1961)
7. Berman, H., Henrick, K., Nakamura, H.: Announcing the worldwide protein data bank. Nat.
Struct. Biol. 10, 980 (2003). https://doi.org/10.1038/nsb1203-980
52 M. Blaszczyk et al.
8. Betancourt, M.: A reduced protein model with accurate native-structure identification ability.
Proteins 53, 889–907 (2003)
9. Blaszczyk, M., Kurcinski, M., Kouza, M., Wieteska, L., Debinski, A., Kolinski, A., Kmiecik,
S.: Modeling of protein-peptide interactions using the CABS-dock web server for binding
site search and flexible docking. Methods 93, 72–83 (2016). https://doi.org/10.1016/j.ymeth.
2015.07.004
10. Blundell, T., et al.: 18th Sir Hans Krebs lecture. Knowledge-based protein modelling and design.
Eur. J. Biochem. 172, 513–520 (1988)
11. Boniecki, M., Rotkiewicz, P., Skolnick, J., Kolinski, A.: Protein fragment reconstruction
using various modeling techniques. J. Comput. Aided Mol. Des. 17, 725–738 (2003)
12. Buchete, N.V., Straub, J.E., Thirumalai, D.: Orientation-dependent coarse-grained potentials
derived by statistical analysis of molecular structural databases. Polymer 45, 597–608 (2004)
13. Bystroff, C., Baker, D.: Prediction of local structure in proteins using a library of sequence-
structure motifs. J. Mol. Biol. 281, 565–577 (1998)
14. Camproux, A.C., Gautier, R., Tuffery, P.: A hidden Markov model derived structural alphabet
for proteins. J. Mol. Biol. 339, 591–605 (2004). https://doi.org/10.1016/j.jmb.2004.04.005
15. Ciemny, M.P., Kurcinski, M., Blaszczyk, M., Kolinski, A., Kmiecik, S.: Modeling EphB4-
EphrinB2 protein-protein interaction using flexible docking of a short linear motif. Biomed.
Eng. Online 16, 71 (2017). https://doi.org/10.1186/s12938-017-0362-7
16. Ciemny, M.P., Kurcinski, M., Kozak, K.J., Kolinski, A., Kmiecik, S.: Highly flexible protein-
peptide docking using CABS-Dock. Methods Mol. Biol. 1561, 69–94 (2017). https://doi.org/
10.1007/978-1-4939-6798-8_6
17. Ciemny, M.P., Debinski, A., Paczkowska, M., Kolinski, A., Kurcinski, M., Kmiecik, S.:
Protein-peptide molecular docking with large-scale conformational changes: the p53-MDM2
interaction. Sci. Rep. 6, 37532 (2016). https://doi.org/10.1038/srep37532
18. Ciemny, M., Kurcinski, M., Kamel, K., Kolinski, A., Alam, N., Schueler-Furman, O.,
Kmiecik, S.: Protein–peptide docking: opportunities and challenges. Drug Discov. Today
23, 1530–1537 (2018). https://doi.org/10.1016/j.drudis.2018.05.006
19. Covell, D.G.: Folding protein alpha-carbon chains into compact forms by Monte Carlo meth-
ods. Proteins 14, 409–420 (1992). https://doi.org/10.1002/prot.340140310
20. Czaplewski, C., Liwo, A., Makowski, M., Ołdziej, S., Scheraga, H.A.: Coarse-grained models
of proteins: theory and applications. In: Kolinski, A. (ed.) Multiscale approaches to protein
modeling, pp. 85–109. Springer, New York (2011)
21. Czaplewski, C., Rodziewicz-Motowidlo, S., Liwo, A., Ripoll, D.R., Wawak, R.J., Scheraga,
H.A.: Molecular simulation study of cooperativity in hydrophobic association. Protein Sci. 9,
1235–1245 (2000). https://doi.org/10.1110/ps.9.6.1235
22. Dashevskii, V.G.: [Lattice model for globular protein three-dimensional structure] Mol. Biol.
(Mosk) 14, 105–117 (1980)
23. Dawid, A.E., Gront, D., Kolinski, A.: SURPASS low-resolution coarse-grained protein model-
ing. J. Chem. Theor. Comput. 13, 5766–5779 (2017). https://doi.org/10.1021/acs.jctc.7b00642
24. De Sancho, D., Rey, A.: Evaluation of coarse grained models for hydrogen bonds in proteins.
J. Comput. Chem. 28 (2007)
25. Eswar, N., Eramian, D., Webb, B., Shen, M.Y., Sali, A.: Protein structure modeling with MOD-
ELLER. Methods Mol. Biol. 426, 145–159 (2008). https://doi.org/10.1007/978-1-60327-058-
8_8
26. Feig, M., Mirjalili, V.: Protein structure refinement via molecular-dynamics simulations: what
works and what does not? Proteins 84(Suppl 1), 282–292 (2016). https://doi.org/10.1002/prot.
24871
27. Ferrenberg, A., Landau, D.P., Swendsen, R.: Statistical errors in histogram reweighting. Phys.
Rev. E 51, 5092 (1995)
28. Ferrenberg, A., Swendsen, R.: Optimized Monte Carlo data analysis. Phys. Rev. Lett. 63,
1195–1198 (1989)
29. Fosgerau, K., Hoffmann, T.: Peptide therapeutics: current status and future directions. Drug
Discov. Today 20, 122–128 (2015). https://doi.org/10.1016/j.drudis.2014.10.003
30. Gautier, R., Camproux, A.C., Tuffery, P.: SCit: web tools for protein side chain conformation
analysis. Nucleic Acids Res. 32, W508–511 (2004). https://doi.org/10.1093/nar/gkh388
31. Geyer, C.J.: Markov chain Monte Carlo maximum likelihood. In: Computing Science and
Statistics: Proceedings of the 23rd Symposium on the Interface, pp. 156–163. Interface
Foundation, Fairfax Station (1991)
32. Go, N., Scheraga, H.: Ring closure and local conformational deformations of chain molecules.
Macromolecules 3, 178–187 (1970)
33. Go, N., Scheraga, H.A.: Ring-Closure in Chain Molecules with Cn, I, or S2n Symmetry.
Macromolecules 6, 273–281 (1973)
34. Godzik, A., Kolinski, A., Skolnick, J.: Lattice representations of globular proteins: how good
are they? J. Comput. Chem. 14, 1194–1202 (1993). https://doi.org/10.1002/jcc.540141009
35. Grishaev, A., Bax, A.: An empirical backbone–backbone hydrogen-bonding potential in
proteins and its applications to NMR structure refinement and validation. J. Am. Chem. Soc. 126,
7281–7292 (2004)
36. Gront, D., Kmiecik, S., Blaszczyk, M., Ekonomiuk, D., Koliński, A.: Optimization of protein
models. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2, 479–493 (2012).
https://doi.org/10.1002/wcms.1090
37. Gront, D., Kmiecik, S., Kolinski, A.: Backbone building from quadrilaterals: a fast and accu-
rate algorithm for protein backbone reconstruction from alpha carbon coordinates. J. Comput.
Chem. 28, 1593–1597 (2007). https://doi.org/10.1002/jcc.20624
38. Gront, D., Kolinski, A., Skolnick, J.: Comparison of three Monte Carlo conformational search
strategies for a proteinlike homopolymer model: folding thermodynamics and identification of
low-energy structures. J. Chem. Phys. 113, 5065–5071 (2000)
39. Gront, D., Kolinski, A., Skolnick, J.: A new combination of replica exchange Monte Carlo and
histogram analysis for protein folding and thermodynamics. J. Chem. Phys. 115, 1569–1574
(2001)
40. Gront, D., Kulp, D., Vernon, R., Strauss, C., Baker, D.: Generalized fragment picking in
Rosetta: design, protocols and applications. PLoS ONE 6, e23294 (2011)
41. Guardiani, C., Livi, R., Cecconi, F.: Coarse Grained Modeling and Approaches to Protein
Folding. Curr. Bioinform. 5, 217–240 (2010)
42. Hansmann, U.: Parallel tempering algorithm for conformational studies of biological
molecules. Chem. Phys. Lett. 281, 140–150 (1997)
43. Heath, A.P., Kavraki, L.E., Clementi, C.: From coarse-grain to all-atom: toward multiscale
analysis of protein landscapes. Proteins 68, 646–661 (2007). https://doi.org/10.1002/prot.
21371
44. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc.
Natl. Acad. Sci. USA 89, 10915–10919 (1992)
45. Hinds, D.A., Levitt, M.: A lattice model for protein structure prediction at low resolution.
Proc. Natl. Acad. Sci. USA 89, 2536–2540 (1992)
46. Holm, L., Sander, C.: Database algorithm for generating protein backbone and side-chain
co-ordinates from a C alpha trace: application to model building and detection of co-ordinate
errors. J. Mol. Biol. 218, 183–194 (1991)
47. Illergard, K., Ardell, D.H., Elofsson, A.: Structure is three to ten times more conserved than
sequence–a study of structural response in protein cores. Proteins 77, 499–508 (2009). https://
doi.org/10.1002/prot.22458
48. Irbäck, A., Mohanty, S.: PROFASI: a Monte Carlo simulation package for protein folding
and aggregation. J. Comput. Chem. 27, 1548–1555 (2006)
49. Jamroz, M., Kolinski, A.: Modeling of loops in proteins: a multi-method approach. BMC
Struct. Biol. 10, 5 (2010)
50. Jones, T.A., Thirup, S.: Using known substructures in protein model building and
crystallography. EMBO J. 5, 819–822 (1986)
51. Karplus, M., McCammon, J.A.: Molecular dynamics simulations of biomolecules. Nat. Struct.
Biol. 9, 646–652 (2002). https://doi.org/10.1038/nsb0902-646
52. Kazmierkiewicz, R., Liwo, A., Scheraga, H.A.: Energy-based reconstruction of a protein
backbone from its alpha-carbon trace by a Monte-Carlo method. J. Comput. Chem. 23, 715–723
(2002). https://doi.org/10.1002/jcc.10068
53. Kazmierkiewicz, R., Liwo, A., Scheraga, H.A.: Addition of side chains to a known
backbone with defined side-chain centroids. Biophys. Chem. 100, 261–280 (2003)
54. Kelley, L.A., Mezulis, S., Yates, C.M., Wass, M.N., Sternberg, M.J.: The Phyre2 web portal
for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
https://doi.org/10.1038/nprot.2015.053
55. Kim, H., Kihara, D.: Protein structure prediction using residue- and fragment-environment
potentials in CASP11. Proteins 84(Suppl 1), 105–117 (2016). https://doi.org/10.1002/prot.
24920
56. Kinch, L.N., Li, W., Monastyrskyy, B., Kryshtafovych, A., Grishin, N.V.: Evaluation of free
modeling targets in CASP11 and ROLL. Proteins 84(Suppl 1), 51–66 (2016). https://doi.org/
10.1002/prot.24973
57. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science
220, 671–680 (1983)
58. Kmiecik, S., Gront, D., Kolinski, M., Wieteska, L., Dawid, A.E., Kolinski, A.: Coarse-grained
protein models and their applications. Chem. Rev. 116, 7898–7936 (2016). https://doi.org/10.
1021/acs.chemrev.6b00163
59. Kmiecik, S., Jamroz, M., Kolinski, M.: Structure prediction of the second extracellular loop
in G-protein-coupled receptors. Biophys. J. 106, 2408–2416 (2014). https://doi.org/10.1016/
j.bpj.2014.04.022
60. Kolinski, A., Betancourt, M.R., Kihara, D., Rotkiewicz, P., Skolnick, J.: Generalized compar-
ative modeling (GENECOMP): a combination of sequence comparison, threading, and lattice
modeling for protein structure prediction and refinement. Proteins 44, 133–149 (2001)
61. Kolinski, M., Filipek, S.: Study of a structurally similar kappa opioid receptor agonist and
antagonist pair by molecular dynamics simulations. J. Mol. Model. 16, 1567–1576 (2010).
https://doi.org/10.1007/s00894-010-0678-8
62. Kolinski, A., Galazka, W., Skolnick, J.: Computer design of idealized beta-motifs. J. Chem.
Phys. 103, 10286–10297 (1995)
63. Kolinski, A., Ilkowski, B., Skolnick, J.: Dynamics and thermodynamics of beta-hairpin assem-
bly: insights from various simulation techniques. Biophys. J. 77, 2942–2952 (1999)
64. Kolinski, A., Milik, M., Rycombel, J., Skolnick, J.: A reduced model of short-range interac-
tions in polypeptide-chains. J. Chem. Phys. 103, 4312–4323 (1995)
65. Kolinski, A., Milik, M., Skolnick, J.: Static and dynamic properties of a new lattice model of
polypeptide-chains. J. Chem. Phys. 94, 3978–3985 (1991)
66. Kolinski, A., Skolnick, J.: Monte Carlo simulations of protein folding. I. Lattice model and
interaction scheme. Proteins 18, 338–352 (1994). https://doi.org/10.1002/prot.340180405
67. Kolinski, A., Skolnick, J.: Reduced models of proteins and their applications. Polymer 45,
511–524 (2004). https://doi.org/10.1016/j.polymer.2003.10.064
68. Kolinski, A.: Protein modeling and structure prediction with a reduced representation. Acta
Biochim. Pol. 51, 349–371 (2004)
69. Kolinski, A., Gront, D.: Comparative modeling without implicit sequence alignments.
Bioinformatics 23, 2522–2527 (2007). https://doi.org/10.1093/bioinformatics/btm380
70. Kolinski, A., Rotkiewicz, P., Ilkowski, B., Skolnick, J.: A method for the improvement
of threading-based protein models. Proteins 37, 592–610 (1999).
https://doi.org/10.1002/(sici)1097-0134(19991201)37:4%3c592::aid-prot10%3e3.0.co;2-2
71. Kolinski, A., Skolnick, J.: Lattice Models of Protein Folding, Dynamics and Thermodynamics.
Landes (1996)
72. Kolinski, A., Skolnick, J.: Assembly of protein structure from sparse experimental data: an
efficient Monte Carlo model. Proteins 32, 475–494 (1998).
https://doi.org/10.1002/(sici)1097-0134(19980901)32:4%3c475::aid-prot6%3e3.0.co;2-f
73. Kortemme, T., Morozov, A.V., Baker, D.: An orientation-dependent hydrogen bonding
potential improves prediction of specificity and structure for proteins and protein-protein complexes.
J. Mol. Biol. 326, 1239–1259 (2003)
74. Krigbaum, W.R., Lin, S.F.: Monte-Carlo simulation of protein folding using a lattice model.
Macromolecules 15, 1135–1145 (1982)
75. Krivov, G.G., Shapovalov, M.V., Dunbrack Jr., R.L.: Improved prediction of protein side-
chain conformations with SCWRL4. Proteins 77, 778–795 (2009). https://doi.org/10.1002/
prot.22488
76. Krupa, P., Mozolewska, M.A., Joo, K., Lee, J., Czaplewski, C., Liwo, A.: Prediction of protein
structure by template-based modeling combined with the UNRES force field. J. Chem. Inf.
Model. 55, 1271–1281 (2015). https://doi.org/10.1021/acs.jcim.5b00117
77. Krupa, P., Sieradzan, A.K., Mozolewska, M.A., Li, H., Liwo, A., Scheraga, H.A.: Dynamics
of disulfide-bond disruption and formation in the thermal unfolding of ribonuclease A. J.
Chem. Theor. Comput. 13, 5721–5730 (2017). https://doi.org/10.1021/acs.jctc.7b00724
78. Krupa, P., et al.: Performance of protein-structure predictions with the physics-based UNRES
force field in CASP11. Bioinformatics 32, 3270–3278 (2016).
https://doi.org/10.1093/bioinformatics/btw404
79. Kryshtafovych, A., Fidelis, K., Moult, J.: CASP9 results compared to those of previous CASP
experiments. Proteins 79(Suppl 10), 196–207 (2011). https://doi.org/10.1002/prot.23182
80. Kumar, S., Rosenberg, J., Bouzida, D., Swendsen, R., Kollman, P.: Multidimensional
free-energy calculations using the weighted histogram analysis method. J. Comput. Chem. 16,
1339–1350 (1995)
81. Kurcinski, M., Jamroz, M., Blaszczyk, M., Kolinski, A., Kmiecik, S.: CABS-dock web server
for the flexible docking of peptides to proteins without prior knowledge of the binding site.
Nucleic Acids Res. 43, W419–424 (2015). https://doi.org/10.1093/nar/gkv456
82. Kwak, W., Hansmann, U.H.: Efficient sampling of protein structures by model hopping. Phys.
Rev. Lett. 95, 138102 (2005). https://doi.org/10.1103/PhysRevLett.95.138102
83. Kyte, J., Doolittle, R.F.: A simple method for displaying the hydropathic character of a protein.
J. Mol. Biol. 157, 105–132 (1982)
84. Lee, H., Heo, L., Lee, M.S., Seok, C.: GalaxyPepDock: a protein-peptide docking tool based on
interaction similarity and energy optimization. Nucleic Acids Res. 43, W431–W435 (2015).
https://doi.org/10.1093/nar/gkv495
85. Lee, J., Scheraga, H.A., Rackovsky, S.: New optimization method for conformational
energy calculations on polypeptides: conformational space annealing. J. Comput. Chem. 18,
1222–1232 (1997)
86. Levitt, M.: A simplified representation of protein conformations for rapid simulation of protein
folding. J. Mol. Biol. 104, 59–107 (1976)
87. Levitt, M., Warshel, A.: Computer simulation of protein folding. Nature 253, 694–698 (1975)
88. Levy-Moonshine, A., Amir, E-a. D., Keasar, C.: Enhancement of beta-sheet assembly by
cooperative hydrogen bonds potential. Bioinformatics 25, 2639–2645 (2009)
89. Li, Z., Scheraga, H.A.: Monte Carlo-minimization approach to the multiple-minima problem
in protein folding. Proc. Natl. Acad. Sci. USA 84, 6611–6615 (1987)
90. Liwo, A., He, Y., Scheraga, H.A.: Coarse-grained force field: general folding theory. Phys.
Chem. Chem. Phys. 13, 16890–16901 (2011). https://doi.org/10.1039/c1cp20752k
91. Liwo, A., et al.: Simulation of protein structure and dynamics with the coarse-grained
UNRES force field. In: Coarse-Graining of Condensed Phase and Biomolecular Systems. CRC
Press (2008)
92. Liwo, A., Czaplewski, C., Pillardy, J., Scheraga, H.: Cumulant-based expressions for the
multibody terms for the correlation between local and electrostatic interactions in the
united-residue force field. J. Chem. Phys. 115, 2323–2347 (2001)
93. Liwo, A., Khalili, M., Scheraga, H.: Ab initio simulations of protein-folding pathways by
molecular dynamics with the united-residue model of polypeptide chains. Proc. Natl. Acad.
Sci. USA 102, 2362–2367 (2005)
94. Liwo, A., Pincus, M.R., Wawak, R.J., Rackovsky, S., Scheraga, H.A.: Prediction of protein
conformation on the basis of a search for compact structures: test on avian pancreatic
polypeptide. Protein Sci. 2, 1715–1731 (1993)
95. London, N., Raveh, B., Cohen, E., Fathi, G., Schueler-Furman, O.: Rosetta FlexPepDock
web server–high resolution modeling of peptide-protein interactions. Nucleic Acids Res. 39,
W249–W253 (2011). https://doi.org/10.1093/nar/gkr431
96. Maupetit, J., Gautier, R., Tuffery, P.: SABBAC: online structural alphabet-based protein
backbone reconstruction from alpha-carbon trace. Nucleic Acids Res. 34, W147–151 (2006).
https://doi.org/10.1093/nar/gkl289
97. Mazur, A.K., Dorofeev, V.E., Abagyan, R.A.: Derivation and testing of explicit equations of
motion for polymers described by internal coordinates. J. Comput. Phys. 92, 261–272 (1991)
98. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: Equation of state
calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)
99. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44, 335–341 (1949)
100. Milik, M., Kolinski, A., Skolnick, J.: Algorithm for rapid reconstruction of protein backbone
from alpha carbon coordinates. J. Comput. Chem. 18, 80–85 (1997)
101. Mitsutake, A., Sugita, Y., Okamoto, Y.: Generalized-ensemble algorithms for molecular
simulations of biopolymers. Biopolymers 60, 96–123 (2001).
https://doi.org/10.1002/1097-0282(2001)60:2%3c96::AID-BIP1007%3e3.0.CO;2-F
102. Morozov, A., Lin, S.: Accuracy and convergence of the Wang-Landau sampling algorithm.
Phys. Rev. E 76 (2007)
103. Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., Tramontano, A.: Critical assessment
of methods of protein structure prediction (CASP)-round XII. Proteins (2017). https://doi.
org/10.1002/prot.25415
104. Moult, J., Fidelis, K., Kryshtafovych, A., Tramontano, A.: Critical assessment of methods
of protein structure prediction (CASP)–round IX. Proteins 79(Suppl 10), 1–5 (2011). https://
doi.org/10.1002/prot.23200
105. Mozolewska, M.A., Krupa, P., Zaborowski, B., Liwo, A., Lee, J., Joo, K., Czaplewski, C.: Use
of restraints from consensus fragments of multiple server models to enhance protein-structure
prediction capability of the UNRES force field. J. Chem. Inf. Model. 56, 2263–2279 (2016).
https://doi.org/10.1021/acs.jcim.6b00189
106. Park, B.H., Levitt, M.: The complexity and accuracy of discrete state models of protein
structure. J. Mol. Biol. 249, 493–507 (1995)
107. Parsons, J., Holmes, B., Rojas, M., Tsai, J., Strauss, C.: Practical conversion from torsion
space to Cartesian space for in silico protein synthesis. J. Comput. Chem. 26, 1063–1068
(2005)
108. Payne, P.W.: Reconstruction of protein conformations from estimated positions of the C-alpha
coordinates. Protein Sci. 2, 315–324 (1993)
109. Peterson, L.X., et al.: Modeling the assembly order of multimeric heteroprotein
complexes. PLoS Comput. Biol. 14, e1005937 (2018).
https://doi.org/10.1371/journal.pcbi.1005937
110. Pruitt, K.D., Tatusova, T., Maglott, D.R.: NCBI reference sequences (RefSeq): a curated
non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35,
D61–65 (2007). https://doi.org/10.1093/nar/gkl842
111. Pundir, S., Martin, M.J., O’Donovan, C.: UniProt protein knowledgebase. Methods Mol. Biol.
1558, 41–55 (2017). https://doi.org/10.1007/978-1-4939-6783-4_2
112. Raveh, B., London, N., Zimmerman, L., Schueler-Furman, O.: Rosetta FlexPepDock ab-initio:
simultaneous folding, docking and refinement of peptides onto their receptors. PLoS ONE 6,
e18934 (2011). https://doi.org/10.1371/journal.pone.0018934
113. Rohl, C., Strauss, C., Misura, K., Baker, D.: Protein structure prediction using Rosetta. In:
Numerical Computer Methods, Part D, vol. 383, pp. 66–93. Elsevier (2004)
114. Rose, P.W., et al.: The RCSB protein data bank: integrative view of protein, gene and 3D
structural information. Nucleic Acids Res. 45, D271–D281 (2017). https://doi.org/10.1093/
nar/gkw1000
115. Rotkiewicz, P., Skolnick, J.: Fast procedure for reconstruction of full-atom protein models
from reduced representations. J. Comput. Chem. 29, 1460–1465 (2008). https://doi.org/10.
1002/jcc.20906
116. Sali, A., et al.: Outcome of the first wwPDB hybrid/integrative methods task force workshop.
Structure 23, 1156–1167 (2015). https://doi.org/10.1016/j.str.2015.05.013
117. Sali, A., Blundell, T.L.: Comparative protein modelling by satisfaction of spatial restraints. J.
Mol. Biol. 234, 779–815 (1993). https://doi.org/10.1006/jmbi.1993.1626
118. Scheraga, H.A., Khalili, M., Liwo, A.: Protein-folding dynamics: overview of molecular
simulation techniques. Annu. Rev. Phys. Chem. 58, 57–83 (2007). https://doi.org/10.1146/
annurev.physchem.58.032806.104614
119. Schindler, C.E., de Vries, S.J., Zacharias, M.: iATTRACT: simultaneous global and local
interface optimization for protein-protein docking refinement. Proteins 83, 248–258 (2015).
https://doi.org/10.1002/prot.24728
120. Schindler, C.E., de Vries, S.J., Zacharias, M.: Fully blind peptide-protein docking with
pepATTRACT. Structure 23, 1507–1515 (2015). https://doi.org/10.1016/j.str.2015.05.021
121. Shenoy, S.R., Jayaram, B.: Proteins: sequence to structure and function–current status. Curr.
Protein Pept. Sci. 11, 498–514 (2010)
122. Shi, J., Blundell, T.L., Mizuguchi, K.: FUGUE: sequence-structure homology recognition
using environment-specific substitution tables and structure-dependent gap penalties. J. Mol.
Biol. 310, 243–257 (2001). https://doi.org/10.1006/jmbi.2001.4762
123. Shin, W.H., Christoffer, C.W., Kihara, D.: In silico structure-based approaches to
discover protein-protein interaction-targeting drugs. Methods 131, 22–32 (2017).
https://doi.org/10.1016/j.ymeth.2017.08.006
124. Sippl, M.J.: Boltzmann’s principle, knowledge-based mean fields and protein folding. An
approach to the computational determination of protein structures. J. Comput. Aided Mol.
Des. 7, 473–501 (1993)
125. Skolnick, J., Kolinski, A.: Dynamic Monte Carlo simulations of globular protein
folding/unfolding pathways. I. Six-member, Greek key beta-barrel proteins. J. Mol. Biol. 212,
787–817 (1990)
126. Skolnick, J., Kolinski, A.: Simulations of the folding of a globular protein. Science 250,
1121–1125 (1990). https://doi.org/10.1126/science.250.4984.1121
127. Skolnick, J., Kolinski, A., Brooks III, C.L., Godzik, A., Rey, A.: A method for predicting
protein structure from sequence. Curr. Biol. 3, 414–423 (1993)
128. Soding, J.: Protein homology detection by HMM-HMM comparison. Bioinformatics 21,
951–960 (2005). https://doi.org/10.1093/bioinformatics/bti125
129. Stein, A., Kortemme, T.: Improvements to robotics-inspired conformational sampling in
Rosetta. PLoS ONE 8, e63090 (2013). https://doi.org/10.1371/journal.pone.0063090
130. Stumpff-Kane, A.W., Maksimiak, K., Lee, M.S., Feig, M.: Sampling of near-native pro-
tein conformations during protein structure refinement using a coarse-grained model, normal
modes, and molecular dynamics simulations. Proteins 70, 1345–1356 (2008). https://doi.org/
10.1002/prot.21674
131. Sugita, Y., Okamoto, Y.: Replica-exchange molecular dynamics method for protein folding.
Chem. Phys. Lett. 314, 141–151 (1999)
132. Swendsen, R., Wang, J.: Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett. 57,
2607–2609 (1986)
133. Tai, C.H., Bai, H., Taylor, T.J., Lee, B.: Assessment of template-free modeling in CASP10
and ROLL. Proteins 82(Suppl 2), 57–83 (2014). https://doi.org/10.1002/prot.24470
134. Thompson, J., Baker, D.: Incorporation of evolutionary information into Rosetta comparative
modeling. Proteins 79, 2380–2388 (2011). https://doi.org/10.1002/prot.23046
135. Trabuco, L.G., Lise, S., Petsalaki, E., Russell, R.B.: PepSite: prediction of peptide-binding
sites from protein surfaces. Nucleic Acids Res. 40, W423–W427 (2012). https://doi.org/10.
1093/nar/gks398
136. Trojanowski, S., Rutkowska, A., Kolinski, A.: TRACER. A new approach to comparative
modeling that combines threading with free-space conformational sampling. Acta Biochim.
Pol. 57, 125–133 (2010)
137. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
https://doi.org/10.1093/nar/gkw1099
138. Vendruscolo, M., Najmanovich, R., Domany, E.: Can a pairwise contact potential
stabilize native protein folds against decoys obtained by threading? Proteins 38, 134–148 (2000).
https://doi.org/10.1002/(sici)1097-0134(20000201)38:2%3c134::aid-prot3%3e3.0.co;2-a
139. Vinals, J., Kolinski, A., Skolnick, J.: Numerical study of the entropy loss of dimerization and
the folding thermodynamics of the GCN4 leucine zipper. Biophys. J. 83, 2801–2811 (2002).
https://doi.org/10.1016/s0006-3495(02)75289-2
140. Voth, G. (ed): Coarse-Graining of Condensed Phase and Biomolecular Systems. CRC Press
Taylor & Francis, Farmington, CT (2008)
141. Wabik, J., Kurcinski, M., Kolinski, A.: Coarse-grained modeling of peptide docking associated
with large conformation transitions of the binding protein: Troponin I fragment-Troponin C
system. Molecules 20, 10763–10780 (2015). https://doi.org/10.3390/molecules200610763
142. Wales, D.: Energy Landscapes: Applications to Clusters, Biomolecules and Glasses
(Cambridge Molecular Science). Cambridge University Press (2004)
143. Wang, T., Wu, M.B., Zhang, R.H., Chen, Z.J., Hua, C., Lin, J.P., Yang, L.R.: Advances in
computational structure-based drug design and application in drug discovery. Curr. Top Med.
Chem. 16, 901–916 (2016). doi: CTMC-EPUB-69847 [pii]
144. Wedemeyer, W.J., Scheraga, H.A.: Exact analytical loop closure in proteins using polynomial
equations. J. Comput. Chem. 20, 819–844 (1999)
145. Xu, D., Zhang, J., Roy, A., Zhang, Y.: Automated protein structure modeling in CASP9
by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based
structure refinement. Proteins 79(Suppl 10), 147–160 (2011). https://doi.org/10.1002/prot.
23111
146. Yan, C.H., et al.: Minimal residual disease- and graft-vs.-host disease-guided multiple consol-
idation chemotherapy and donor lymphocyte infusion prevent second acute leukemia relapse
after allotransplant. J. Hematol. Oncol. 9, 87 (2016). https://doi.org/10.1186/s13045-016-
0319-5
147. Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., Zhang, Y.: The I-TASSER Suite: protein
structure and function prediction. Nat. Methods 12(1), 7–8 (2015). https://doi.org/10.1038/
nmeth.3213
148. Zhang, J., He, Z., Wang, Q., Barz, B., Kosztin, I., Shang, Y., Xu, D.: Prediction of protein
tertiary structures using MUFOLD methods. Mol. Biol. 815, 3–13 (2012). https://doi.org/10.
1007/978-1-61779-424-7_1
Protein Structure Prediction Using Coarse-Grained Models 59
149. Zhang, Y.: Interplay of I-TASSER and QUARK for template-based and ab initio protein
structure prediction in CASP10. Proteins 82(Suppl 2), 175–187 (2014). https://doi.org/10.
1002/prot.24341
150. Zheng, W.: Accurate flexible fitting of high-resolution protein structures into cryo-electron
microscopy maps using coarse-grained pseudo-energy minimization. Biophys. J. 100,
478–488 (2011). doi: S0006-3495(10)05186-6 [pii]
151. Zhou, H., Zhou, Y.: Distance-scaled, finite ideal-gas reference state improves structure-derived
potentials of mean force for structure selection and stability prediction. Protein Sci. 11,
2714–2726 (2002). https://doi.org/10.1110/ps.0217002
152. Zhou, R., et al.: Folding kinetics of WW domains with the united residue force field for
bridging microscopic motions and experimental measurements. Proc. Natl. Acad. Sci. U.S.A.
111, 18243–18248 (2014). https://doi.org/10.1073/pnas.1420914111
Protein Dynamics Simulations Using
Coarse-Grained Models
1 Introduction
The steady increase in computational power constantly sets new limits in simulations
of biomolecular dynamics [164]. Nevertheless, the majority of biologically relevant
protein dynamic processes remain beyond the reach of atomistic Molecular Dynamics (MD),
the classical simulation tool. In such cases, introducing properly designed
simplifications that capture the relevant physical features may be the only option, or
at least an incomparably cheaper one than atomistic MD, for understanding
macromolecular processes [64].
A variety of purely theoretical models for analyzing the dynamic properties of
proteins have been proposed [109, 171]. Nevertheless, they have proved rather limited
in their predictive power, owing to the complicated nature of proteins and of the rules
governing their structure. Compared to purely analytical methods, the molecular
simulation approach is better suited to handling protein complexity. At present,
molecular simulations are a powerful, and the most widely used, theoretical approach
to understanding protein dynamics [64, 99, 117].
Fig. 1 Conceptual components of CG protein simulation models and their variants: a protein
representation, b interaction schemes (Go-like potentials are protein specific, i.e., native interactions
are favored to assure the lowest energy for the native conformation, and are used individually or in
combination with non-protein specific: physics- or knowledge based schemes), c sampling model.
This diagram applies to both continuous-space and discrete (lattice) models. For a detailed review of
these variants and coarse-graining levels, refer to [64]
One of the major future tasks of CG dynamics studies is the design of methods
for reliable and efficient transitions between simplified and atomic resolution
levels [132], as an element of multiscale methodologies. The idea of multiscale
modeling is to perform the computation efficiently at the CG scale and then pass the
result to a detailed all-atom simulation, or vice versa [68]. Obviously, the CG model
used in multiscale methods must produce physically realistic coarse-grained protein
structures. Even if this requirement is fulfilled, adding all-atom details to CG
structures to produce physically realistic all-atom counterparts is a non-trivial
problem [63]. Applications to CG protein folding trajectories have demonstrated that
reliable and efficient transitions between CG and atomic resolution are feasible
[46, 65]. Finally, one of the most promising future directions is the development of
approaches that minimize the difference between the simplified and atomic models [58].
64 S. Kmiecik et al.
Fig. 2 A simple chaperonin model used in protein folding studies with the CG CABS model [67].
The chaperonin cage was simulated as a sphere with a thick wall of variable hydrophobicity. In
the basic state the walls are inert for 9/10 of the simulation time. Periodically (see the simulation
timescale above in the figure) the walls become hydrophobic, attracting the encapsulated protein
chain with a strength typical for hydrophobic interactions within folded proteins (according to the
CABS force field)
The CABS simulation results showed that periodic distortion of the simulated
proteins by hydrophobic chaperonin interactions promotes rapid folding and leads to
a decrease in folding temperature. According to the observed mechanism of folding
promotion, the chaperonin prevents kinetically trapped conformations. This contrasts
with the hitherto accepted interpretation of the IAM model, which suggests not the
prevention of trapping but rather the unfolding of already trapped conformations.
Interestingly, analysis of the folding trajectories reveals a general
chaperonin-induced shift of the observed folding mechanism from nucleation–condensation
towards a more framework-like one. All these observations are in good agreement with
the experimental data on chaperonin-bound protein substrates, which generally indicate
an ensemble of compact and locally expanded states lacking stable tertiary interactions.
It is worth mentioning that theoretical studies of chaperonin-mediated folding may
have important conceptual applications in other fields [102], e.g. in the development
of structure-refinement software or in the construction of chaperonin-like molecules
designed for specific biotech and medical applications. We have to emphasize that
we are only at the beginning of understanding how chaperonins work. As
pointed out by Lucent et al. [102], most theoretical and experimental research so far
has focused on GroEL, a specific prokaryotic chaperonin. Since chaperonins exhibit
apparently different modes of action in prokaryotic and eukaryotic organisms, the
investigation of these differences may be essential for a complete understanding of
the underlying mechanisms and of protein folding itself. This challenging issue has
already been addressed by a very simple lattice model [50].
Fig. 3 a Force-extension profile obtained by stretching of the Ig8 titin fragment (adapted from Ref.
[124]). Each peak corresponds to the unfolding of a single domain with maximum resisting force to
stretching, Fmax. Smooth curves are fits to the wormlike chain model. b Conceptual plot of the free
energy landscape of protein unfolding without (red) and under (blue) an external force. An applied
force F lowers the unfolding barrier by F·xu, exponentially increasing the unfolding rate constant (ku)
but exponentially decreasing the folding rate constant (kf). xu is the distance between the native and
transition states and xf is the distance between the transition and denatured states. c Distance to the
transition state, xf, in two different force regimes for the titin protein (PDB ID 1tit). The crossover
from the low- to the middle-force regime occurs at fswitch ~5 pN. d Cartoon representation of the native
state conformation of the I27 domain (PDB code: 1tit) with eight β-strands labeled: A(4–8), A'(11–15),
B(18–25), C(32–36), D(47–52), E(55–61), F(69–75), G(78–88). The importance of the HBs between the
beta-strands marked in red is described in the text
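The wormlike chain (WLC) fits mentioned in the caption of Fig. 3a can be illustrated with the widely used Marko–Siggia interpolation formula. The sketch below is only a minimal illustration, not the fitting procedure of Ref. [124]; the function name, the assumed thermal energy (about 4.11 pN nm near room temperature) and the example contour and persistence lengths are assumptions introduced here.

```python
K_B_T = 4.11  # thermal energy near room temperature in pN*nm (assumed value)


def wlc_force(extension_nm, contour_length_nm, persistence_length_nm, kbt=K_B_T):
    """Marko-Siggia interpolation formula for the wormlike chain; returns force in pN."""
    x = extension_nm / contour_length_nm  # fractional extension, must stay below 1
    if not 0.0 <= x < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kbt / persistence_length_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


# Illustrative use: force needed to hold a chain at half of its contour length.
# The 30 nm contour length and 0.4 nm persistence length are example values only.
force_at_half = wlc_force(15.0, 30.0, 0.4)
```

In a fit to a sawtooth profile such as that in Fig. 3a, each rising branch would typically be described by its own contour length, with the increment between consecutive branches reflecting the length released by one unfolded domain.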
HBs between beta strands A and B (Fig. 3d) [101]. Mechanical unfolding of a number
of proteins has also been probed by all-atom simulations with implicit solvent
[115]. The major shortcoming of all-atom MD simulations is that the pulling speed is
about 6 orders of magnitude higher than that used in AFM experiments. It is unclear
whether in silico results obtained under such extreme conditions are meaningful for
interpreting experiments (strong forces may considerably distort the free energy
landscape, FEL), although recent studies claimed that unfolding pathways are not
sensitive to pulling forces and speeds [90, 97].
The time scale discrepancy (and the related discrepancy in the stretching forces
required to induce unfolding) between AFM experiments and simulations can be
reduced by using CG models. GPU computing now allows CG Go models to reach
experimental pulling speeds [176]. CG Go models have been successfully used by many
groups to study the mechanical properties of proteins [2, 9, 24, 170]. Despite their
simplicity, in many cases they correctly capture the unfolding pathways, FEL and
mechanical stability of proteins. For example, a complete description of the mechanical
unfolding pathways of single and multidomain Ubiquitin at the level of secondary
structure was obtained [95]. It was shown that the thermal and mechanical unfolding
pathways of the fibronectin type III and I27 domains are different [115]. This is
because thermal fluctuations act globally on the entire protein and unfold its least
stable part, whereas an applied force propagates the unwinding from the points at
which it is applied. Using a Go model, the mechanical unfolding pathways of the protein
DDFLN4 [94] and of two slipknotted proteins (PDB codes 1e2i and 1p6x) [150] were shown
to depend on the pulling speed.
CG Go models may be suitable for deciphering the FEL (Fig. 3b). Considering the FEL
as a function of the end-to-end distance, one can use the Bell approximation [7] to
estimate the distance between the native state (NS) and the transition state (TS), xu,
using either the dependence of the unfolding rates on the external force [7] or the
dependence of the unfolding force on the pulling speed [31]. The distance xu (Fig. 3b)
estimated by the C-alpha Go-like model [25] was in excellent agreement with
experimental results [15, 76]. Furthermore, Li showed that xu (Fig. 3b) is determined
by the secondary structure content and depends approximately linearly on the contact
order [83, 92]; thus helical proteins have larger distances from the native state to
the transition state than beta proteins. It should be noted that the phenomenological
Bell theory is based on the assumption that xu does not change under stretching.
Recently, applying Kramers theory [81] and assuming that the distance between the NS
and TS is force-dependent, Dudko et al. [29] went beyond the Bell assumption. With the
help of the proposed non-linear kinetic theory [29], one can estimate not only the
intrinsic rate coefficient, ku, and the distance between the NS and TS, xu, but also
the unfolding barrier, ΔG‡u (Fig. 3d).
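The difference between the Bell approximation and the non-linear theory discussed above can be made concrete with a short numerical sketch. It assumes the commonly quoted closed form of the Dudko–Hummer–Szabo rate expression, in which a shape parameter nu selects the model free-energy profile (nu = 1/2 for a cusp, nu = 2/3 for a linear-cubic profile, and nu = 1 recovers the Bell formula). The function names, the thermal energy value and the numerical parameters are illustrative assumptions, not values taken from the cited works.

```python
import math

K_B_T = 4.11  # thermal energy near room temperature in pN*nm (assumed value)


def bell_rate(force, k0, x_u, kbt=K_B_T):
    """Bell approximation: the barrier is lowered by force * x_u, with x_u held fixed."""
    return k0 * math.exp(force * x_u / kbt)


def dudko_rate(force, k0, x_u, barrier, nu=0.5, kbt=K_B_T):
    """Force-dependent rate with a force-dependent transition-state position.

    barrier is the zero-force unfolding barrier in pN*nm; nu = 1 reduces to Bell.
    """
    s = 1.0 - nu * force * x_u / barrier
    if s <= 0.0:
        raise ValueError("force is beyond the range in which the barrier exists")
    return k0 * s ** (1.0 / nu - 1.0) * math.exp((barrier / kbt) * (1.0 - s ** (1.0 / nu)))


def x_u_from_two_rates(f1, k1, f2, k2, kbt=K_B_T):
    """Bell estimate of x_u from unfolding rates k1 and k2 measured at forces f1 and f2."""
    return kbt * math.log(k2 / k1) / (f2 - f1)
```

In practice, rates (or rupture-force distributions) obtained from simulations at several forces would be fitted with such expressions; the Bell form yields only ku and xu, whereas the non-linear form additionally yields the barrier height.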
One of the most successful application areas of CG Go models has been the estimation
of the mechanical stability of proteins [13, 92, 144, 152]. It has been found that
helical proteins are less stable than beta proteins and that the unfolding force Fmax
may be expressed as a linear function of the contact order [119]. This is
understandable because beta proteins have a larger fraction of long-range
residue-residue contacts, leading to higher resistance to external perturbation [83].
Using Go models, Cieplak et al. computed Fmax for thousands of proteins [144, 152] and
found that the mechanical clamp (the resistance-determining region of a protein) of the
strongest proteins does not consist only of hydrogen-bonded β-strands sheared during
pulling. Structures tied by disulfide bonds were found to provide significantly larger
mechanical stability than shear-based mechanical clamps. Novel mechanical clamps
were identified and classified [143, 144]. Later on, the high resistance to stretching
of the top 13 proteins (cysteine slipknots) was confirmed by all-atom steered molecular
dynamics (SMD) simulations [116] and observed experimentally [163]. Recently, a CG
model was successfully applied even to proteins with non-trivial structures [150,
151], and the results were confirmed by experiment [45]. For a more detailed review of
protein mechanostability, see Chap. 10 of this book entitled “Mechanostability of
proteins and virus capsids.”
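Since both xu and Fmax are related above to the contact order, a minimal sketch of how a relative contact order can be computed from a CG structure may be helpful. The contact definition used here (C-alpha atoms closer than 8 Å and at least three residues apart along the chain) is an assumption made for illustration; the cited studies may use different contact criteria, and the function names are hypothetical.

```python
import math


def native_contacts(ca_coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j) whose C-alpha atoms lie within `cutoff` (assumed criterion)."""
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


def relative_contact_order(contacts, chain_length):
    """Mean sequence separation of contacting residues divided by the chain length."""
    if not contacts:
        return 0.0
    return sum(j - i for i, j in contacts) / (len(contacts) * chain_length)


# Toy six-bead "hairpin" with 3.8 A spacing, used only to exercise the functions.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy_contacts = native_contacts(hairpin)
toy_co = relative_contact_order(toy_contacts, len(hairpin))
```

For real proteins, the coordinates would come from the native structure, and the resulting value could then be compared across proteins, as in the correlations with Fmax mentioned above.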
The success of CG Go models is possibly associated with the fact that the pulling
starts from the native state and that these models are based on the topology of the
native state. However, in particular cases one has to be careful with predictions
emerging from these simple models. In the case of the DDFLN4 protein, the Go model did
not give the peak in the force-extension curve observed in the experiment. It was shown
that this peak is due to non-native interactions neglected in the Go model [77]. Thus,
in certain cases non-native interactions are important because non-native contacts
appear in an intermediate state during the unfolding process. To avoid possible
artifacts associated with neglecting non-native interactions, CG models with more
realistic potentials may be used. Using the CABS model [78] it was shown that
non-native interactions lead to an additional intermediate state along the mechanical
unfolding pathway, which was previously detected in AFM experiments [134] and
in explicit-solvent all-atom simulations, but not with the CG Go model. Another example
of such a case is the force-induced intermediate of Ubiquitin, which was missed in
CG Go-model simulations [95] but detected with the Lund force field [49].
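To make explicit why a Go model cannot stabilize non-native intermediates, the sketch below evaluates a simplified non-bonded energy of a C-alpha Go-like model. It assumes the frequently used 12-10 form for native contacts (minimum of depth epsilon at the native distance) and a purely repulsive term for all other pairs; bonded terms are omitted, and the cutoff, repulsion radius and function name are illustrative assumptions rather than the parameters of any model cited above.

```python
import math


def go_nonbonded_energy(coords, native_coords, epsilon=1.0, cutoff=8.0,
                        min_separation=3, sigma_rep=4.0):
    """Non-bonded energy of a C-alpha Go-like model (bonded terms omitted)."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            r = math.dist(coords[i], coords[j])
            r0 = math.dist(native_coords[i], native_coords[j])
            if r0 < cutoff:
                # Native contact: 12-10 well with minimum -epsilon at r = r0.
                ratio = r0 / r
                energy += epsilon * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)
            else:
                # Non-native pair: excluded-volume repulsion only, never attractive.
                energy += epsilon * (sigma_rep / r) ** 12
    return energy


# Toy six-bead native "hairpin" and a fully stretched conformation of the same chain.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
stretched = [(3.8 * i, 0.0, 0.0) for i in range(6)]
e_native = go_nonbonded_energy(native, native)
e_stretched = go_nonbonded_energy(stretched, native)
```

Because only pairs that are in contact in the native structure can lower the energy, any intermediate held together by non-native contacts is invisible to such a potential, which is consistent with the DDFLN4 and Ubiquitin examples discussed above.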
Recently, Steered Molecular Dynamics (SMD) simulations have become a powerful
tool to assess the strength of molecular interactions. The idea behind using
SMD simulations is that the mechanical stability, or rupture force (measured as a
peak in the force-extension profile), required to unbind a ligand from a receptor is
related to the strength of the interaction between them [8, 26, 38, 39, 74, 75, 96,
128]. Over the last 5 years, the SMD method has been implemented in many CG protein
simulation packages including CABS [78], UNRES [142], AWSEM [41] and many others.
Given the ability of simplified models to sample longer timescales than atomistic
models, the application of CG models is a promising direction for studies
of the mechanical stability of large biomolecular complexes.
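The notion of a rupture force as the peak of the force-extension profile can be illustrated with a deliberately simple, one-dimensional toy model. It is not an SMD implementation from any of the packages named above: a single coordinate is pulled out of a Morse-type well by a harmonic spring moving at constant velocity, with overdamped and noise-free (zero-temperature) dynamics. All parameter values and units (pN, nm, ns) are illustrative assumptions.

```python
import math


def pull_bond(depth=100.0, a=2.0, k_spring=10.0, velocity=0.1,
              gamma=1.0, dt=0.001, t_max=150.0):
    """Constant-velocity pulling of one coordinate x out of a Morse well.

    U(x) = depth * (1 - exp(-a * x))**2; returns (rupture_force, final_x).
    """
    x = 0.0
    rupture_force = 0.0
    for step in range(int(t_max / dt)):
        t = step * dt
        # Force reported by the moving spring (the "cantilever").
        spring_force = k_spring * (velocity * t - x)
        e = math.exp(-a * x)
        # Restoring force of the bond, -dU/dx.
        bond_force = -2.0 * depth * a * (e - e * e)
        # Overdamped Euler step without thermal noise.
        x += dt * (spring_force + bond_force) / gamma
        rupture_force = max(rupture_force, spring_force)
    return rupture_force, x


rupture_force, final_x = pull_bond()
```

In this noise-free limit and at slow pulling, the peak force should lie close to the maximal restoring force of the well, depth * a / 2. In real SMD simulations, thermal noise and the many degrees of freedom of the complex make the rupture force depend on the pulling speed and vary from run to run, so that averages over repeated trajectories are normally reported.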
SMD simulations have been used for a wide variety of applications in studies
of biological processes and various biomolecules [90]. Going forward, SMD techniques
can be used to study cell functions, where proteins are exposed to their native
(crowded) environment [167]. One recent application of SMD is to understand
the mechanism of virus binding to its host cell [141]. Another issue of great interest
is the application of SMD to studying the response of proteins to periodic forces
[154]. It is also worth mentioning some important problems for further studies. For
instance, it remains unclear whether the distance between the native and transition
states (the distance xu (Fig. 3c) obtained from the non-linear theory [29]) depends
linearly on the contact order (as was found in the linear Bell approximation).
Generally, the FEL is deciphered by projecting it onto a one-dimensional coordinate,
usually the end-to-end distance. However, such an approximate mapping is not always
valid [10]; thus this issue requires further investigation.
In addition to mechanical unfolding studies, CG models can be used to characterize
the refolding kinetics of proteins in the presence of an external force [80, 131].
Many proteins in the human body are subjected to a wide range of mechanical
forces and face challenges in reaching their native states. The question of how an
external force affects protein refolding remains to be clarified. Single-molecule
manipulation experiments have demonstrated that the refolding of a protein under a
small force can be probed by the force-clamp technique [32]. If the quench force is
smaller than the equilibrium critical force separating the folded and unfolded states,
the protein refolds into the native state. Typical time scales for protein folding in
the absence of an applied external force vary from microseconds to hours [82]. Note
that the refolding process under force can occur on timescales a few orders
of magnitude longer than those of conventional folding. This is because, in
the presence of an external force, f, the refolding times increase exponentially with f
[7]. Thus, only CG models can be effectively used to study the refolding process under
external load. Using a CG Go model, Kouza and coworkers [80] studied the impact of
an external force from 0 to 14 pN on the refolding pathways of several proteins.
It was found that there are two force regimes for the refolding of titin, with
different distances to the transition state, xf (Fig. 3b, c). In the first, low-force
regime, the refolding pathways were in close agreement with the thermal ones. However,
the simulated values of xf obtained in this force range did not agree with the
experimental ones. The results obtained for xf in the second force regime are in good
agreement with experiments (Fig. 3c) [80]. This implies that force-clamp experiments
are carried out in the second force regime (Fig. 3c), where the pathways are not the
same as the thermal ones. Only if the quench force is smaller than fswitch can the
thermal folding pathway be probed by force-clamp experiments. This result calls for
caution in interpreting the results of single-molecule manipulation experiments.
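The exponential dependence of the refolding time on the quench force suggests a simple way of extracting xf within one force regime: a linear fit of the logarithm of the refolding time against the force. The sketch below assumes the Bell-type relation tau(f) = tau0 * exp(f * xf / kBT) and uses synthetic data generated from that same relation; it does not reproduce the analysis or the numerical results of Ref. [80], and the function name and parameter values are hypothetical.

```python
import math

K_B_T = 4.11  # thermal energy near room temperature in pN*nm (assumed value)


def fit_xf(forces_pn, refolding_times, kbt=K_B_T):
    """Least-squares fit of ln(tau) = ln(tau0) + f * xf / kbt; returns (xf, tau0)."""
    n = len(forces_pn)
    logs = [math.log(t) for t in refolding_times]
    mean_f = sum(forces_pn) / n
    mean_log = sum(logs) / n
    sxx = sum((f - mean_f) ** 2 for f in forces_pn)
    sxy = sum((f - mean_f) * (l - mean_log) for f, l in zip(forces_pn, logs))
    slope = sxy / sxx
    tau0 = math.exp(mean_log - slope * mean_f)
    return slope * kbt, tau0


# Synthetic example: times generated with xf = 2.0 nm and tau0 = 3.0 (arbitrary units).
forces = [2.0, 4.0, 6.0, 8.0, 10.0]
times = [3.0 * math.exp(f * 2.0 / K_B_T) for f in forces]
xf_fit, tau0_fit = fit_xf(forces, times)
```

To detect two regimes of the kind described above, the same fit would be applied separately to the data below and above a trial switching force, and the two resulting values of xf compared.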
Fig. 4 Multiscale procedure for the description of binding between the Retinoid X Receptor (RXR)
and the peptide (TRAP220) cofactor using CABS CG dynamics [84]. The procedure starts from
the generation of input data for a receptor and a protein cofactor. In the next step, the receptor and
the cofactor are put together in many random configurations, subsequently subjected to CABS CG
simulation. Various types of data stored along the procedure are shown in bold frames, while the
applied computational methods in thin frames
myosin motor and an insight into its action. In this case, three levels of coarse-graining
were introduced: chains of secondary structure elements, domains and molecules.
The movement of each component was simulated by Brownian Dynamics. A more
detailed, physicochemical view of the myosin-actin complex was recently obtained
with a CG simulation model [114] in which each bead represented a single amino
acid. In this case the conclusions also concerned more general thermodynamic aspects of
protein-protein association.
Another popular and important protein-protein dynamics issue, in which diverse
levels of coarse-graining are applied, is protein aggregation. All-atom MD simulations
in explicit solvent can provide insights into the early stages of the aggregation
process of short peptides derived from full-length amyloidogenic proteins [6, 73,
79, 111, 158]. Larger complexes and longer timescales can be accessed using CG
models. In the simplest CG models, a single unit (cuboid [175] or tube [4]) represents
the whole peptide, while in the most detailed models each amino acid is represented by
a few pseudo-atoms [20, 103, 107, 110, 125, 168]. Many practical applications of
CG models have been outlined in recent reviews [64, 108]. Dramatic progress has
recently been achieved in the CG modeling of large polyprotein complexes (made
up of many copies of the same or different proteins) [130]. In their review, Saunders
and Voth present two general classes of CG methods: mapping methods, which transfer
information from one level to another only during parameterization, and bridging
methods, which connect different scales of representation during simulation.
The major challenge in modeling protein interaction dynamics seems to be the same
as that outlined in reviews of the performance of protein docking techniques
[22, 162, 174], namely the treatment of substantial conformational changes.
CG simulation models offer perhaps the most promising means of modeling
extensive backbone dynamics in the near future.
Membrane proteins play an important role in cell biology. They are responsible for
signaling, molecular transport across lipid bilayers, maintaining cell structural
stability and control of cell-cell interactions. Although 20 to 30% of all ORFs are
predicted to encode membrane proteins, membrane proteins account for less than 1% of
all known 3D protein structures [112]. Moreover, those proteins are embedded in different
types of lipid bilayers. The interaction with lipids is essential for both protein function
(e.g. can affect integral membrane protein activity [89]) and membrane properties
such as hydrophobic thickness or lipid composition [48]. The complex nature of
membrane-protein systems makes CG Molecular Dynamics (CG-MD) simulations
a valuable approach to the investigation of dynamics, structure-function relationship
and stability of membrane–protein systems [64]. One of the best performing, and
probably the most recognized, CG-MD approaches is based on the MARTINI force
field [104] that uses four-to-one atom mapping. Only four main types of interaction
sites are defined: polar (P), non-polar (N), apolar (C), and charged (Q). Each particle
Fig. 5 Final structures from self-assembly CG-MD simulations, starting from a protein surrounded
by randomly positioned water and lipid molecules [129]. The figure presents the results of four sim-
ulations: A—cytochrome bc1 complex, B—putative metal-chelating ABC transporter, C—quinol-
fumarate reductase and D—Mg2+ transporter. Water, ion and DPPC lipid tail particles are excluded
for clarity. The backbone trace of the protein is shown in blue. The particle colors are: phosphate in
DPPC lipid headgroups: red; glycerol linker in the lipid: yellow; choline in PC headgroups: blue.
Picture created based on materials available in the CG Database [129]
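The four-to-one mapping mentioned above for MARTINI-type CG-MD can be sketched as follows. This is only a schematic illustration: actual force-field mappings are defined per chemical group (for each residue or lipid type) and also assign a bead type, whereas the code below simply places one bead at the center of mass of every four consecutive heavy atoms. The function name and the example atoms are hypothetical.

```python
def coarse_grain(atoms, group_size=4):
    """Map consecutive groups of atoms onto beads at each group's center of mass.

    atoms: list of (mass, (x, y, z)) tuples for heavy atoms.
    Returns a list of (bead_mass, (x, y, z)) tuples.
    """
    beads = []
    for start in range(0, len(atoms), group_size):
        group = atoms[start:start + group_size]
        total_mass = sum(mass for mass, _ in group)
        center = tuple(
            sum(mass * pos[axis] for mass, pos in group) / total_mass
            for axis in range(3)
        )
        beads.append((total_mass, center))
    return beads


# Illustrative input: four carbon-like atoms followed by a two-atom remainder.
example_atoms = [
    (12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0)),
    (12.0, (0.0, 2.0, 0.0)), (12.0, (2.0, 2.0, 0.0)),
    (16.0, (10.0, 0.0, 0.0)), (12.0, (10.0, 0.0, 4.0)),
]
example_beads = coarse_grain(example_atoms)
```

The roughly fourfold reduction in the number of particles, together with the smoother effective interactions, is what makes self-assembly simulations such as those shown in Fig. 5 computationally accessible.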
Fig. 6 Membrane insertion and folding of 1A91 protein observed in CABS-membrane ab initio
simulations [120]. a example simulation snapshots illustrating the insertion and folding mechanism,
b evolution of the RMSD values (reflecting similarity to the experimental structure) vs simulation
time, c comparison of the highest accuracy model obtained in the simulations (RMSD 2.2 Å)
with the experimental structure (colored in green)
Over the last decades, the thermodynamically stable conformation of a protein was
usually treated as the state responsible for biological functions. Nevertheless, at the
end of the 20th century the research community realized that intrinsically disordered
proteins (IDP) or proteins with intrinsically disordered regions (IDR) are ubiquitous
in nature and they can retain their functionality [40, 106, 160, 161, 172]. Confor-
mational studies of these proteins are experimentally extremely challenging [30],
particularly due to their large structural heterogeneity and aggregation tendency.
With the boom in IDP studies, computer simulation models have emerged as useful
tools for the description of IDP conformational ensembles [17, 122, 123]. As
efficient exploration of the conformational space is the major advantage of CG
models, they can serve as the methods of choice for the broadest possible sampling
of conformational disorder.
Owing to their flexibility, disordered proteins have an increased tendency to form
protein-protein complexes. During binding, as compared to folded structures, they
can form a far larger number of interaction contacts. This theory is called the
“fly-casting mechanism” and it was illustrated by Shoemaker et al. [140], who
investigated the kinetics of IDP binding to the receptor using a free energy
functional based on a simplified scheme of amino acid contacts.
Nevertheless, CG simulations of pKID-KIX complexes [47] indicated that the
increased binding affinity can be caused not only by the greater capture radius of
IDPs. The kinetic analysis of this process was based on simulations using the CG Go
model with the continuum C-alpha chain representation and compared with available
experimental data for various ordered and disordered complexes. Interestingly, it was
found that the coupling of folding with binding of IDPs leads to a significant reduction
in the binding free-energy barrier. This work also discusses roles of other structural
factors important for this particular association.
Abeln and Frenkel analyzed other aspects of how intrinsically disordered regions
(IDRs) can influence the protein association process, using Monte Carlo (MC)
simulations on a cubic lattice with a C-alpha representation [1]. The simulation
results provided intriguing insights into the effect of IDRs on protein structure. The
authors indicated that proteins with hydrophobic binding motifs without neighboring
IDRs tend to aggregate and consequently form amyloids.
The ability to fold upon binding of some IDPs has been extensively studied using
CG simulation models [27, 159, 165, 166, 169]. A multiscale model was used to
generate the pathway of IDP folding induced by binding to its receptor [169]. The
method included a step of CG simulation with C-alpha representation and optimal
path calculation at an atomic level. The binding process was simulated as fully
flexible and the role of non-native interactions was stressed. In other studies [165,
166] the authors characterized an ensemble of transition states of p27Kip1 protein
binding to a rigid structure of a cyclin A—Cdk2 complex. In this case a knowledge-
based potential was utilized to investigate some aspects of the folding mechanism of
this protein. Intrinsically disordered proteins frequently serve as flexible linkers of
protein domains. CG modeling of such systems was reviewed by Zhou [177].
Similarly to protein structure prediction, IDP modeling approaches can be divided
into de novo methods (relying on the predictive power of the method) and those
utilizing sparse experimental data. The CG C-alpha model of Norgaard et al. [113] was
designed to simulate disordered proteins and was parametrized using data from nuclear
magnetic resonance spin-labeling experiments on the Δ131Δ fragment of staphylococcal
nuclease. Importantly, such an approach can also utilize data from
MD trajectories or other experiments.
Interestingly, 2D lattice models have been recently used to explain the worse
performance of sequence-based disorder prediction methods for smaller proteins (or
segments) than for larger ones. Such a simple simulation model enabled a novel
insight into the basic determinants of protein disorder: amino acid composition and
chain length [153].
As shown above, CG models, even very simplistic ones, have provided many important
insights into IDP and IDR dynamics. However, the potential of CG
modeling does not seem to be sufficiently exploited in the field [64], perhaps because
of the relatively recent interest in the area.
Acknowledgements We thank Dr. Joanna Sulkowska for critical reading of the section “Mechan-
ical Unfolding and Refolding of Proteins and their Complexes” of the manuscript. We acknowl-
edge partial support from: Foundation for Polish Science TEAM project (TEAM/2011-7/6) co-
financed by the European Regional Development Fund operated within the Innovative Economy
Operational Program; Polish National Science Center (NCN) on the basis of a decision DEC-
2011/01/D/NZ2/05314; Polish National Science Center (NCN) Grant No. NN301071140, Polish
Ministry of Science and Higher Education Grant No. IP2011024371, Polish National Science Center
(NCN) Grant (MAESTRO 2014/14/A/ST6/00088). M. Kouza acknowledges the Polish Ministry
of Science and Higher Education for financial support through “Mobilnosc Plus” Program No.
1287/MOB/IV/2015/0.
References
1. Abeln, S., Frenkel, D.: Disordered flanks prevent peptide aggregation. PLoS Comput. Biol.
4, e1000241 (2008). https://doi.org/10.1371/journal.pcbi.1000241
2. Arad-Haase, G., et al.: Mechanical unfolding of acylphosphatase studied by single-molecule
force spectroscopy and MD simulations. Biophys. J. 99, 238–247 (2010). https://doi.org/10.
1016/j.bpj.2010.04.004
3. Arkhipov, A., Freddolino, P.L., Schulten, K.: Stability and dynamics of virus capsids described
by coarse-grained modeling. Structure 14, 1767–1777 (2006). https://doi.org/10.1016/j.str.
2006.10.003
4. Auer, S., Meersman, F., Dobson, C.M., Vendruscolo, M.: A generic mechanism of emergence
of amyloid protofilaments from disordered oligomeric aggregates. PLoS Comput. Biol. 4,
e1000222 (2008). https://doi.org/10.1371/journal.pcbi.1000222
5. Baumketner, A., Jewett, A., Shea, J.E.: Effects of confinement in chaperonin assisted protein
folding: rate enhancement by decreasing the roughness of the folding energy landscape. J.
Mol. Biol. 332, 701–713 (2003). https://doi.org/10.1016/S0022-2836(03)00929-X
6. Baumketner, A., Shea, J.E.: The structure of the Alzheimer amyloid beta 10–35 peptide probed
through replica-exchange molecular dynamics simulations in explicit solvent. J. Mol. Biol.
366, 275–285 (2007)
7. Bell, G.I.: Models for the specific adhesion of cells to cells. Science 200, 618–627 (1978)
8. Bernetti, M., Cavalli, A., Mollica, L.: Protein-ligand (un)binding kinetics as a new paradigm
for drug discovery at the crossroad between experiments and modelling. Medchemcomm 8,
534–550 (2017). https://doi.org/10.1039/c6md00581k
9. Best, R.B., Hummer, G.: Protein folding kinetics under force from molecular simulation. J.
Am. Chem. Soc. 130, 3706–3707 (2008). https://doi.org/10.1021/ja0762691
10. Best, R.B., Paci, E., Hummer, G., Dudko, O.K.: Pulling direction as a reaction coordinate
for the mechanical unfolding of single molecules. J. Phys. Chem. B 112, 5968–5976 (2008).
https://doi.org/10.1021/Jp075955j
11. Betancourt, M.R., Thirumalai, D.: Exploring the kinetic requirements for enhancement of
protein folding rates in the GroEL cavity. J. Mol. Biol. 287, 627–644 (1999). https://doi.org/
10.1006/jmbi.1999.2591
12. Bindschadler, M.: Modeling actin dynamics. Wiley Interdisciplinary Rev. Syst. Biol. Med. 2,
481–488 (2010). https://doi.org/10.1002/wsbm.62
13. Brockwell, D.J., et al.: Pulling geometry defines the mechanical resistance of a beta-sheet
protein (vol 10, pg 731, 2003). Nat. Struct. Biol. 10, 872–872 (2003). https://doi.org/10.1038/
nsb1003-872b
14. Bustamante, C., Chemla, Y.R., Forde, N.R., Izhaky, D.: Mechanical processes in biochem-
istry. Annu. Rev. Biochem. 73, 705–748 (2004). https://doi.org/10.1146/annurev.biochem.72.
121801.161542
15. Caraglio, M., Imparato, A., Pelizzola, A.: Pathways of mechanical unfolding of FnIII(10): low
force intermediates. J. Chem. Phys. 133, 065101 (2010). https://doi.org/10.1063/1.3464476
16. Carrion-Vazquez, M., Li, H., Lu, H., Marszalek, P.E., Oberhauser, A.F., Fernandez, J.M.: The
mechanical stability of ubiquitin is linkage dependent. Nat. Struct. Biol. 10, 738–743 (2003).
https://doi.org/10.1038/nsb965
17. Chan, H.S., Zhang, Z., Wallin, S., Liu, Z.: Cooperativity, local-nonlocal coupling, and nonna-
tive interactions: principles of protein folding from coarse-grained models. Annu. Rev. Phys.
Chem. 62, 301–326 (2011). https://doi.org/10.1146/annurev-physchem-032210-103405
Protein Dynamics Simulations Using Coarse-Grained Models 79
18. Chang, S., Hu, J.P., Lin, P.Y., Jiao, X., Tian, X.H.: Substrate recognition and transport behavior
analyses of amino acid antiporter with coarse-grained models. Mol. BioSyst. 6, 2430–2438
(2010). https://doi.org/10.1039/c005266c
19. Chetwynd, A.P., Scott, K.A., Mokrab, Y., Sansom, M.S.: CGDB: a database of membrane
protein/lipid interactions by coarse-grained molecular dynamics simulations. Mol. Membr.
Biol. 25, 662–669 (2008). https://doi.org/10.1080/09687680802446534
20. Chiricotto, M., Tran, T.T., Nguyen, P.H., Melchionna, S., Sterpone, F., Derreumaux, P.:
Coarse-grained and all-atom simulations towards the early and late steps of amyloid fibril
formation. Isr. J. Chem. 57, 564–573 (2017)
21. Chu, J.W., Voth, G.A.: Coarse-grained modeling of the actin filament derived from atomistic-
scale simulations. Biophys. J. 90, 1572–1582 (2006). https://doi.org/10.1529/biophysj.105.
073924
22. Ciemny, M.P., Kurcinski, M., Kamel, K., Kolinski, A., Nawsad, A., Schueler-Furman, O.,
Kmiecik, S.: Protein–peptide docking: opportunities and challenges. Drug Discov. Today. 23,
1530–1537 (2018)
23. Ciemny, M.P., Debinski, A., Paczkowska, M., Kolinski, A., Kurcinski, M., Kmiecik, S.:
Protein-peptide molecular docking with large-scale conformational changes: the p53-MDM2
interaction. Sci. Rep. 6, 37532 (2016). https://doi.org/10.1038/srep37532
24. Cieplak, M., Hoang, T.X., Robbins, M.O.: Folding and stretching in a go-like model of titin.
Proteins 49, 114–124 (2002). https://doi.org/10.1002/prot.10087
25. Clementi, C., Nymeyer, H., Onuchic, J.N.: Topological and energetic factors: what determines
the structural details of the transition state ensemble and “en-route” intermediates for protein
folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937–953 (2000).
https://doi.org/10.1006/jmbi.2000.3693
26. Colizzi, F., Perozzo, R., Scapozza, L., Recanatini, M., Cavalli, A.: Single-molecule pulling
simulations can discern active from inactive enzyme inhibitors. J. Am. Chem. Soc. 132,
7361–7371 (2010). https://doi.org/10.1021/ja100259r
27. De Sancho, D., Best, R.B.: Modulation of an IDP binding mechanism and rates by helix
propensity and non-native interactions: association of HIF1alpha with CBP. Mol. BioSyst. 8,
256–267 (2012). https://doi.org/10.1039/c1mb05252g
28. Di Fenza, A., Rocchia, W., Tozzini, V.: Complexes of HIV-1 integrase with HAT proteins:
multiscale models, dynamics, and hypotheses on allosteric sites of inhibition. Proteins: Struct.
Funct. Bioinf. 76, 946–958 (2009). https://doi.org/10.1002/prot.22399
29. Dudko, O.K., Hummer, G., Szabo, A.: Intrinsic rates and activation free energies from single-
molecule pulling experiments. Phys. Rev. Lett. 96, 108101 (2006). https://doi.org/10.1103/
PhysRevLett.96.108101
30. Eliezer, D.: Biophysical characterization of intrinsically disordered proteins. Curr. Opin.
Struct. Biol. 19, 23–30 (2009). https://doi.org/10.1016/j.sbi.2008.12.004
31. Evans, E., Ritchie, K.: Dynamic strength of molecular adhesion bonds. Biophys. J. 72,
1541–1555 (1997). https://doi.org/10.1016/S0006-3495(97)78802-7
32. Fernandez, J.M., Li, H.B.: Force-clamp spectroscopy monitors the folding trajectory of a
single protein. Science 303, 1674–1678 (2004). https://doi.org/10.1126/science.1092497
33. Fletcher, D.A., Mullins, R.D.: Cell mechanics and the cytoskeleton. Nature 463, 485–492
(2010). https://doi.org/10.1038/nature08908
34. Florin, E.L., Moy, V.T., Gaub, H.E.: Adhesion forces between individual ligand-receptor pairs.
Science 264, 415–417 (1994). https://doi.org/10.1126/science.8153628
35. Fowler, S.B., et al.: Mechanical unfolding of a titin Ig domain: structure of unfolding
intermediate revealed by combining AFM, molecular dynamics simulations, NMR and
protein engineering. J. Mol. Biol. 322, 841–849 (2002). https://doi.org/10.1016/S0022-
2836(02)00805-7
36. Frembgen-Kesner, T., Elcock, A.H.: Absolute protein-protein association rate constants from
flexible, coarse-grained brownian dynamics simulations: the role of intermolecular hydrody-
namic interactions in Barnase-Barstar association. Biophys. J. 99, L75–L77 (2010). https://
doi.org/10.1016/j.bpj.2010.09.006
80 S. Kmiecik et al.
37. Granzier, H.L., Labeit, S.: The giant protein titin: a major player in myocardial mechan-
ics, signaling, and disease. Circ. Res. 94, 284–295 (2004). https://doi.org/10.1161/01.res.
0000117769.88862.f8
38. Grubmuller, H., Heymann, B., Tavan, P.: Ligand binding: Molecular mechanics calculation of
the streptavidin biotin rupture force. Science 271, 997–999 (1996). https://doi.org/10.1126/
science.271.5251.997
39. Gu, J.F., Li, H.X., Wang, X.C.: A self-adaptive steered molecular dynamics method based
on minimization of stretching force reveals the binding affinity of protein-ligand complexes.
Molecules 20, 19236–19251 (2015). https://doi.org/10.3390/molecules201019236
40. Habchi, J., Tompa, P., Longhi, S., Uversky, V.N.: Introducing protein intrinsic disorder. Chem.
Rev. 114, 6561–6588 (2014). https://doi.org/10.1021/cr400514h
41. Habibi, M., Rottler, J., Plotkin, S.S.: As simple as possible but not simpler: on the reliability
of protein coarse-grained models. Biophys. J. 112, 176a–176a (2017)
42. Hall, B.A., Chetwynd, A.P., Sansom, M.S.: Exploring peptide-membrane interactions with
coarse-grained MD simulations. Biophys. J. 100, 1940–1948 (2011). https://doi.org/10.1016/
j.bpj.2011.02.041
43. Hall, B.A., Sansom, M.S.P.: Coarse-grained MD simulations and protein–protein interactions:
the Cohesin–Dockerin system. J. Chem. Theory Comput. 5, 2465–2471 (2009). https://doi.
org/10.1021/ct900140w
44. Hanson, P.I., Whiteheart, S.W.: AAA+ proteins: have engine, will work. Nat. Rev. Mol. Cell
Biol. 6, 519–529 (2005). https://doi.org/10.1038/nrm1684
45. He, C., Genchev, G.Z., Lu, H., Li, H.: Mechanically untying a protein slipknot: multiple
pathways revealed by force spectroscopy and steered molecular dynamics simulations. J.
Am. Chem. Soc. 134, 10428–10435 (2012). https://doi.org/10.1021/ja3003205
46. Heath, A.P., Kavraki, L.E., Clementi, C.: From coarse-grain to all-atom: toward multiscale
analysis of protein landscapes. Proteins 68, 646–661 (2007). https://doi.org/10.1002/prot.
21371
47. Huang, Y., Liu, Z.: Kinetic advantage of intrinsically disordered proteins in coupled folding-
binding process: a critical assessment of the “Fly-Casting” mechanism. J. Mol. Biol. 393,
1143–1159 (2009). https://doi.org/10.1016/j.jmb.2009.09.010
48. Hunte, C.: Specific protein-lipid interactions in membrane proteins. Biochem. Soc. Trans. 33,
938–942 (2005). https://doi.org/10.1042/BST20050938
49. Irback, A., Mitternacht, S., Mohanty, S.: Dissecting the mechanical unfolding of ubiquitin.
Proc. Natl. Acad. Sci. U S A 102, 13427–13432 (2005). https://doi.org/10.1073/pnas.0501581102
50. Jacob, E., Horovitz, A., Unger, R.: Different mechanistic requirements for prokaryotic and
eukaryotic chaperonins: a lattice study. Bioinformatics 23, i240–i248 (2007). https://doi.org/
10.1093/bioinformatics/btm180
51. Jamroz, M., Kolinski, A., Kmiecik, S.: CABS-flex: server for fast simulation of protein struc-
ture fluctuations. Nucleic Acids Res. 41, W427–W431 (2013). https://doi.org/10.1093/nar/
gkt332
52. Jamroz, M., Kolinski, A., Kmiecik, S.: CABS-flex predictions of protein flexibility com-
pared with NMR ensembles. Bioinformatics 30, 2150–2154 (2014). https://doi.org/10.1093/
bioinformatics/btu184
53. Jamroz, M., Orozco, M., Kolinski, A., Kmiecik, S.: Consistent view of protein fluctuations
from all-atom molecular dynamics and coarse-grained dynamics with knowledge-based force-
field. J. Chem. Theor. Comput. 9, 119–125 (2013). https://doi.org/10.1021/ct300854w
54. Jewett, A.I., Baumketner, A., Shea, J.E.: Accelerated folding in the weak hydrophobic
environment of a chaperonin cavity: creation of an alternate fast folding pathway. Proc. Natl.
Acad. Sci. U S A 101, 13192–13197 (2004). https://doi.org/10.1073/pnas.0400720101
55. Jewett, A.I., Shea, J.E.: Reconciling theories of chaperonin accelerated folding with experi-
mental evidence. Cell. Mol. Life Sci. 67, 255–276 (2009)
56. Jung, J., Mori, T., Kobayashi, C., Matsunaga, Y., Yoda, T., Feig, M., Sugita, Y.: GENESIS: a
hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algo-
rithms for biomolecular and cellular simulations. Wiley Interdisciplinary Rev. Comput. Mol.
Sci. 5, 310–323 (2015). https://doi.org/10.1002/wcms.1220
57. Kalli, A.C., Hall, B.A., Campbell, I.D., Sansom, M.S.: A helix heterodimer in a lipid bilayer:
prediction of the structure of an integrin transmembrane domain via multiscale simulations.
Structure 19, 1477–1484 (2011). https://doi.org/10.1016/j.str.2011.07.014
58. Kamerlin, S.C., Vicatos, S., Dryga, A., Warshel, A.: Coarse-grained (multiscale) simulations
in studies of biophysical and chemical systems. Annu. Rev. Phys. Chem. 62, 41–64 (2011).
https://doi.org/10.1146/annurev-physchem-032210-103335
59. Kar, P., Gopal, S.M., Cheng, Y.M., Panahi, A., Feig, M.: Transferring the PRIMO coarse-
grained force field to the membrane environment: simulations of membrane proteins and
helix-helix association. J. Chem. Theor. Comput. 10, 3459–3472 (2014). https://doi.org/10.
1021/ct500443v
60. Kim, Y.C., Hummer, G.: Coarse-grained models for simulations of multiprotein complexes:
application to ubiquitin binding. J. Mol. Biol. 375, 1416–1433 (2008). https://doi.org/10.
1016/j.jmb.2007.11.063
61. Kim, B.L., Schafer, N.P., Wolynes, P.G.: Predictive energy landscapes for folding alpha-helical
transmembrane proteins. Proc. Natl. Acad. Sci. U S A 111, 11031–11036 (2014).
https://doi.org/10.1073/pnas.1410529111
62. Kim, Y.C., Tang, C., Clore, G.M., Hummer, G.: Replica exchange simulations of tran-
sient encounter complexes in protein-protein association. Proc. Natl. Acad. Sci. U S A 105,
12855–12860 (2008). https://doi.org/10.1073/pnas.0802460105
63. Kmiecik, S., Gront, D., Kolinski, A.: Towards the high-resolution protein structure prediction.
Fast refinement of reduced models with all-atom force field. BMC Struct. Biol. 7, 43 (2007).
https://doi.org/10.1186/1472-6807-7-43
64. Kmiecik, S., Gront, D., Kolinski, M., Wieteska, L., Dawid, A.E., Kolinski, A.: Coarse-grained
protein models and their applications. Chem. Rev. 116, 7898–7936 (2016). https://doi.org/10.
1021/acs.chemrev.6b00163
65. Kmiecik, S., Gront, D., Kouza, M., Kolinski, A.: From coarse-grained to atomic-level char-
acterization of protein dynamics: transition state for the folding of B domain of protein A. J.
Phys. Chem. B 116, 7026–7032 (2012). https://doi.org/10.1021/jp301720w
66. Kmiecik, S., Kolinski, A.: Folding pathway of the b1 domain of protein G explored by multi-
scale modeling. Biophys. J. 94, 726–736 (2008). https://doi.org/10.1529/biophysj.107.116095
67. Kmiecik, S., Kolinski, A.: Simulation of chaperonin effect on protein folding: a shift from
nucleation-condensation to framework mechanism. J. Am. Chem. Soc. 133, 10283–10289
(2011). https://doi.org/10.1021/ja203275f
68. Kmiecik, S., Jamroz, M., Kolinski, A.: Multiscale approach to protein folding dynamics. In:
Kolinski, A. (ed.) Multiscale Approaches to Protein Modeling, pp. 281–293. Springer, New
York (2011). https://doi.org/10.1007/978-1-4419-6889-0_12
69. Knepp, A.M., Periole, X., Marrink, S.J., Sakmar, T.P., Huber, T.: Rhodopsin forms a dimer
with cytoplasmic helix 8 contacts in native membranes. Biochemistry 51, 1819–1821 (2012).
https://doi.org/10.1021/bi3001598
70. Koga, N., Takada, S.: Folding-based molecular simulations reveal mechanisms of the rotary
motor F1-ATPase. Proc. Natl. Acad. Sci. U S A 103, 5367–5372 (2006). https://doi.org/10.
1073/pnas.0509642103
71. Kolinski, A., Skolnick, J.: Reduced models of proteins and their applications. Polymer 45,
511–524 (2004). https://doi.org/10.1016/j.polymer.2003.10.064
72. Kolinski, A.: Protein modeling and structure prediction with a reduced representation. Acta
Biochim. Pol. 51, 349–371 (2004). doi: 035001349
73. Kouza, M., Banerji, A., Kolinski, A., Buhimschi, I.A., Kloczkowski, A.: Oligomerization of
FVFLM peptides and their ability to inhibit beta amyloid peptides aggregation: consideration
as a possible model. Phys. Chem. Chem. Phys. 19, 2990–2999 (2017)
74. Kouza, M., Banerji, A., Kolinski, A., Buhimschi, I.A., Kloczkowski, A.: Role of resultant
dipole moment in mechanical dissociation of biological complexes. Molecules 23, 1995
(2018)
75. Kouza, M., Co, N.T., Li, M.S., Kmiecik, S., Kolinski, A., Kloczkowski, A., Buhimschi, I.A.:
Kinetics and mechanical stability of the fibril state control fibril formation time of polypeptide
chains: A computational study. J. Chem. Phys. 148, 215106 (2018)
76. Kouza, M., Hu, C.K., Li, M.S.: New force replica exchange method and protein folding
pathways probed by force-clamp technique. J. Chem. Phys. 128, 045103 (2008). https://doi.
org/10.1063/1.2822272
77. Kouza, M., Hu, C.K., Zung, H., Li, M.S.: Protein mechanical unfolding: importance of non-
native interactions. J. Chem. Phys. 131, 215103 (2009). https://doi.org/10.1063/1.3272275
78. Kouza, M., Jamroz, M., Gront, D., Kmiecik, S., Kolinski, A.: Mechanical unfolding of
DDFLN4 studied by coarse-grained knowledge-based CABS model. TASK Quarterly 18,
373–378 (2014)
79. Kouza, M., Co, N.T., Nguyen, P.H., Kolinski, A., Li, M.S.: Preformed template fluctuations
promote fibril formation: Insights from lattice and all-atom models. J. Chem. Phys. 142, 145104
(2015). https://doi.org/10.1063/1.4917073
80. Kouza, M., Lan, P.D., Gabovich, A.M., Kolinski, A., Li, M.S.: Switch from thermal to
force-driven pathways of protein refolding. J. Chem. Phys. 146, 135101 (2017).
https://doi.org/10.1063/1.4979201
81. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of chemical
reactions. Physica 7(7), 284–303 (1940). https://doi.org/10.1016/S0031-8914(40)90098-2
82. Kubelka, J., Hofrichter, J., Eaton, W.A.: The protein folding ‘speed limit’. Curr. Opin. Struct.
Biol. 14, 76–88 (2004)
83. Kumar, S., Li, M.S.: Biomolecules under mechanical force. Phys. Rep.-Rev. Sect. Phys. Lett.
486, 1–74 (2010). https://doi.org/10.1016/j.physrep.2009.11.001
84. Kurcinski, M., Kolinski, A.: Theoretical study of molecular mechanism of binding TRAP220
coactivator to Retinoid X Receptor alpha, activated by 9-cis retinoic acid. J. Steroid Biochem.
Mol. Biol. 121, 124–129 (2010). https://doi.org/10.1016/j.jsbmb.2010.03.086
85. Kurcinski, M., Kolinski, A., Kmiecik, S.: Mechanism of folding and binding of an intrinsi-
cally disordered protein as revealed by ab initio simulations. J. Chem. Theor. Comput. 10,
2224–2231 (2014). https://doi.org/10.1021/ct500287c
86. Kurcinski, M., Oleniecki, T., Ciemny, M.P., Kuriata, A., Kolinski, A., Kmiecik, S.: CABS-flex
standalone: a simulation environment for fast modeling of protein flexibility. Bioinformatics,
bty685 (2018)
87. Kuriata, A., Gierut, A.M., Oleniecki, T., Ciemny, M.P., Kolinski, A., Kurcinski, M., Kmiecik,
S.: CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures. Nucleic
Acids Res. 46, W338–W343 (2018)
88. Lau, T.L., Kim, C., Ginsberg, M.H., Ulmer, T.S.: The structure of the integrin alphaIIb-
beta3 transmembrane complex explains integrin transmembrane signalling. EMBO J. 28,
1351–1361 (2009). https://doi.org/10.1038/emboj.2009.63
89. Lee, A.G.: How lipids affect the activities of integral membrane proteins. BBA-Biomembr.
1666, 62–87 (2004). https://doi.org/10.1016/j.bbamem.2004.05.012
90. Lee, E.H., Hsin, J., Sotomayor, M., Comellas, G., Schulten, K.: Discovery through the compu-
tational microscope. Structure 17, 1295–1306 (2009). https://doi.org/10.1016/j.str.2009.09.
001
91. Levitt, M., Warshel, A.: Computer simulation of protein folding. Nature 253, 694–698 (1975).
https://doi.org/10.1038/253694a0
92. Li, M.S.: Secondary structure, mechanical stability, and location of transition state of proteins.
Biophys. J. 93, 2644–2654 (2007). https://doi.org/10.1529/biophysj.107.106138
93. Li, L., Huang, H.H., Badilla, C.L., Fernandez, J.M.: Mechanical unfolding intermediates
observed by single-molecule force spectroscopy in a fibronectin type III module. J. Mol.
Biol. 345, 817–826 (2005). https://doi.org/10.1016/j.jmb.2004.11.021
94. Li, M.S., Kouza, M.: Dependence of protein mechanical unfolding pathways on pulling speeds.
J. Chem. Phys. 130, 145102 (2009). https://doi.org/10.1063/1.3106761
95. Li, M.S., Kouza, M., Hu, C.K.: Refolding upon force quench and pathways of mechanical
and thermal unfolding of ubiquitin. Biophys. J. 92, 547–561 (2007). https://doi.org/10.1529/
biophysj.106.087684
96. Li, M.S., Mai, B.K.: Steered molecular dynamics-a promising tool for drug design. Curr.
Bioinform. 7, 342–351 (2012)
97. Lichter, S., Rafferty, B., Flohr, Z., Martini, A.: Protein high-force pulling simulations yield
low-force results. PLoS ONE 7, e34781 (2012). https://doi.org/10.1371/journal.pone.0034781
98. Liphardt, J., Onoa, B., Smith, S.B., Tinoco Jr., I., Bustamante, C.: Reversible unfolding of
single RNA molecules by mechanical force. Science 292, 733–737 (2001). https://doi.org/10.
1126/science.1058498
99. Liu, X., Shi, D., Zhou, S., Liu, H., Yao, X.: Molecular dynamics simulations and novel drug
discovery. Expert Opin. Drug Discov. 13, 23–37 (2018). https://doi.org/10.1080/17460441.
2018.1403419
100. Lu, H., Isralewitz, B., Krammer, A., Vogel, V., Schulten, K.: Unfolding of titin immunoglob-
ulin domains by steered molecular dynamics simulation. Biophys. J. 75, 662–671 (1998)
101. Lu, H., Schulten, K.: The key event in force-induced unfolding of Titin’s immunoglobulin
domains. Biophys. J. 79, 51–65 (2000). https://doi.org/10.1016/S0006-3495(00)76273-4
102. Lucent, D., England, J., Pande, V.: Inside the chaperonin toolbox: theoretical and computa-
tional models for chaperonin mechanism. Phys. Biol. 6, 015003 (2009). https://doi.org/10.
1088/1478-3975/6/1/015003
103. Malolepsza, E., Boniecki, M., Kolinski, A., Piela, L.: Theoretical model of prion propagation:
a misfolded protein induces misfolding. Proc. Natl. Acad. Sci. USA 102, 7835–7840 (2005)
104. Marrink, S.J., Tieleman, D.P.: Perspective on the Martini model. Chem. Soc. Rev. 42,
6801–6822 (2013). https://doi.org/10.1039/c3cs60093a
105. Marszalek, P.E., Lu, H., Li, H., Carrion-Vazquez, M., Oberhauser, A.F., Schulten, K., Fernan-
dez, J.M.: Mechanical unfolding intermediates in titin modules. Nature 402, 100–103 (1999).
https://doi.org/10.1038/47083
106. Mittag, T., Kay, L.E., Forman-Kay, J.D.: Protein dynamics and conformational disorder in
molecular recognition. J. Mol. Recognit. 23, 105–116 (2010). https://doi.org/10.1002/jmr.961
107. Morriss-Andrews, A., Shea, J.E.: Simulations of protein aggregation: insights from atomistic
and coarse-grained models. J. Phys. Chem. Lett. 5, 1899–1908 (2014). https://doi.org/10.
1021/jz5006847
108. Morriss-Andrews, A., Shea, J.E.: Computational studies of protein aggregation: methods and
applications. Annu. Rev. Phys. Chem. 66, 643–666 (2015). https://doi.org/10.1146/annurev-
physchem-040513-103738
109. Munoz, V., Henry, E.R., Hofrichter, J., Eaton, W.A.: A statistical mechanical model for beta-
hairpin kinetics. Proc. Natl. Acad. Sci. U S A 95, 5872–5879 (1998). https://doi.org/10.1073/
pnas.95.11.5872
110. Nasica-Labouze, J., et al.: Amyloid beta protein and Alzheimer’s Disease: when computer
simulations complement experimental studies. Chem. Rev. 115, 3518–3563 (2015)
111. Nguyen, P.H., Li, M.S., Stock, G., Straub, J.E., Thirumalai, D.: Monomer adds to preformed
structured oligomers of A beta-peptides by a two-stage dock-lock mechanism. Proc. Natl.
Acad. Sci. U.S.A. 104, 111–116 (2007). https://doi.org/10.1073/Pnas.0607440104
112. Nilsson, J., Persson, B., von Heijne, G.: Comparative analysis of amino acid distributions in
integral membrane proteins from 107 genomes. Proteins 60, 606–616 (2005). https://doi.org/
10.1002/prot.20583
113. Norgaard, A.B., Ferkinghoff-Borg, J., Lindorff-Larsen, K.: Experimental parameterization of
an energy function for the simulation of unfolded proteins. Biophys. J. 94, 182–192 (2008).
https://doi.org/10.1529/biophysj.107.108241
114. Okazaki, K.-I., Sato, T., Takano, M.: Temperature-enhanced association of proteins due to
electrostatic interaction: a coarse-grained simulation of Actin-Myosin binding. J. Am. Chem.
Soc. 134, 8918–8925 (2012). https://doi.org/10.1021/ja301447j
115. Paci, E., Karplus, M.: Unfolding proteins by external forces and temperature: the importance
of topology and energetics. Proc. Natl. Acad. Sci. U S A 97, 6521–6526 (2000). https://doi.
org/10.1073/pnas.100124597
116. Peplowski, L., Sikora, M., Nowak, W., Cieplak, M.: Molecular jamming–the cystine slipknot
mechanical clamp in all-atom simulations. J. Chem. Phys. 134, 085102 (2011). https://doi.
org/10.1063/1.3553801
117. Perilla, J.R., et al.: Molecular dynamics simulations of large macromolecular complexes. Curr.
Opin. Struct. Biol. 31, 64–74 (2015). https://doi.org/10.1016/j.sbi.2015.03.007
118. Periole, X., Knepp, A.M., Sakmar, T.P., Marrink, S.J., Huber, T.: Structural determinants of
the supramolecular organization of G protein-coupled receptors in bilayers. J. Am. Chem.
Soc. 134, 10959–10965 (2012). https://doi.org/10.1021/ja303286e
119. Plaxco, K.W., Simons, K.T., Baker, D.: Contact order, transition state placement and the
refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994 (1998). https://doi.org/
10.1006/jmbi.1998.1645
120. Pulawski, W., Jamroz, M., Kolinski, M., Kolinski, A., Kmiecik, S.: Coarse-Grained simula-
tions of membrane insertion and folding of small helical proteins using the CABS model. J.
Chem. Inf. Model. 56, 2207–2215 (2016). https://doi.org/10.1021/acs.jcim.6b00350
121. Rathore, N., Knotts, T.A.T., de Pablo, J.J.: Confinement effects on the thermodynamics of
protein folding: Monte Carlo simulations. Biophys. J. 90, 1767–1773 (2006). https://doi.org/
10.1529/biophysj.105.071076
122. Rauscher, S., Gapsys, V., Gajda, M.J., Zweckstetter, M., de Groot, B.L., Grubmuller, H.:
Structural ensembles of intrinsically disordered proteins depend strongly on force field: a
comparison to experiment. J. Chem. Theor. Comput. 11, 5513–5524 (2015). https://doi.org/
10.1021/acs.jctc.5b00736
123. Rauscher, S., Pomès, R.: Molecular simulations of protein disorder. Biochem. Cell Biol. 88,
269–290 (2010). https://doi.org/10.1139/o09-169
124. Rief, M., Gautel, M., Oesterhelt, F., Fernandez, J.M., Gaub, H.E.: Reversible unfolding of
individual titin immunoglobulin domains by AFM. Science 276, 1109–1112 (1997). https://
doi.org/10.1126/science.276.5315.1109
125. Rojas, A., Liwo, A., Browne, D., Scheraga, H.A.: Mechanism of fiber assembly: treatment of
a beta peptide aggregation with a coarse-grained united-residue force field. J. Mol. Biol. 404,
537–552 (2010)
126. Ruprecht, J.J., Mielke, T., Vogel, R., Villa, C., Schertler, G.F.: Electron crystallography reveals
the structure of metarhodopsin I. EMBO J. 23, 3609–3620 (2004). https://doi.org/10.1038/sj.
emboj.7600374
127. Russel, D., Lasker, K., Phillips, J., Schneidman-Duhovny, D., Velazquez-Muriel, J.A., Sali,
A.: The structural dynamics of macromolecular processes. Curr. Opin. Cell Biol. 21, 97–108
(2009). https://doi.org/10.1016/j.ceb.2009.01.022
128. Rydzewski, J., Nowak, W.: Ligand diffusion in proteins via enhanced sampling in molecular
dynamics. Phys. Life Rev. (2017). https://doi.org/10.1016/j.plrev.2017.03.003
129. Sansom, M.S., Scott, K.A., Bond, P.J.: Coarse-grained simulation: a high-throughput com-
putational approach to membrane proteins. Biochem. Soc. Trans. 36, 27–32 (2008). https://
doi.org/10.1042/BST0360027
130. Saunders, M.G., Voth, G.A.: Coarse-graining of multiprotein assemblies. Curr. Opin. Struct.
Biol. 22, 144–150 (2012). https://doi.org/10.1016/j.sbi.2012.01.003
131. Schafer, K., Oestereich, M., Gauss, J., Diezemann, G.: Force probe simulations using a hybrid
scheme with virtual sites. J. Chem. Phys. 147 (2017)
132. Scheraga, H.A., Khalili, M., Liwo, A.: Protein-folding dynamics: overview of molecular
simulation techniques. Annu. Rev. Phys. Chem. 58, 57–83 (2007). https://doi.org/10.1146/
annurev.physchem.58.032806.104614
133. Schlick, T., Collepardo-Guevara, R., Halvorsen, L.A., Jung, S., Xiao, X.: Biomolecular
modeling and simulation: a field coming of age. Q. Rev. Biophys. 44, 191–228 (2011).
https://doi.org/10.1017/S0033583510000284
134. Schwaiger, I., Kardinal, A., Schleicher, M., Noegel, A.A., Rief, M.: A mechanical unfolding
intermediate in an actin-crosslinking protein. Nat. Struct. Mol. Biol. 11, 81–85 (2004)
135. Schwaiger, I., Kardinal, A., Schleicher, M., Noegel, A.A., Rief, M.: A mechanical unfolding
intermediate in an actin-crosslinking protein. Nat. Struct. Mol. Biol. 11, 81–85 (2004). https://
doi.org/10.1038/nsmb705
136. Scott, K.A., Bond, P.J., Ivetac, A., Chetwynd, A.P., Khalid, S., Sansom, M.S.: Coarse-grained
MD simulations of membrane protein-bilayer self-assembly. Structure 16, 621–630 (2008).
https://doi.org/10.1016/j.str.2008.01.014
137. Sen, T.Z., Kloster, M., Jernigan, R.L., Kolinski, A., Bujnicki, J.M., Kloczkowski, A.: Pre-
dicting the complex structure and functional motions of the outer membrane transporter and
signal transducer FecA. Biophys. J. 94, 2482–2491 (2008). https://doi.org/10.1529/biophysj.
107.116046
138. Sengupta, D., Marrink, S.J.: Lipid-mediated interactions tune the association of glycophorin
A helix and its disruptive mutants in membranes. Phys. Chem. Chem. Phys. 12, 12987–12996
(2010). https://doi.org/10.1039/c0cp00101e
139. Serohijos, A.W., Chen, Y., Ding, F., Elston, T.C., Dokholyan, N.V.: A structural model reveals
energy transduction in dynein. Proc. Natl. Acad. Sci. U S A 103, 18540–18545 (2006). https://
doi.org/10.1073/pnas.0602867103
140. Shoemaker, B.A., Portman, J.J., Wolynes, P.G.: Speeding molecular recognition by using the
folding funnel: the fly-casting mechanism. Proc. Natl. Acad. Sci. 97, 8868–8873 (2000).
https://doi.org/10.1073/pnas.160259697
141. Sieben, C., et al.: Influenza virus binds its host cell using multiple dynamic interactions. Proc.
Natl. Acad. Sci. U S A 109, 13626–13631 (2012). https://doi.org/10.1073/pnas.1120265109
142. Sieradzan, A.K., Jakubowski, R.: Introduction of steered molecular dynamics into UNRES
coarse-grained simulations package. J. Comput. Chem. 38, 553–562 (2017)
143. Sikora, M., Cieplak, M.: Mechanical stability of multidomain proteins and novel mechanical
clamps. Proteins 79, 1786–1799 (2011). https://doi.org/10.1002/prot.23001
144. Sikora, M., Sulkowska, J.I., Witkowski, B.S., Cieplak, M.: BSDB: the biomolecule stretching
database. Nucleic Acids Res. 39, D443–D450 (2011). https://doi.org/10.1093/nar/gkq851
145. Simmons, R.M., Finer, J.T., Chu, S., Spudich, J.A.: Quantitative measurements of force and
displacement using an optical trap. Biophys. J. 70, 1813–1822 (1996). https://doi.org/10.1016/
S0006-3495(96)79746-1
146. Smith, S.O., et al.: Implications of threonine hydrogen bonding in the glycophorin A trans-
membrane helix dimer. Biophys. J. 82, 2476–2486 (2002). https://doi.org/10.1016/S0006-
3495(02)75590-2
147. Smith, S.B., Cui, Y., Bustamante, C.: Overstretching B-DNA: the elastic response of individual
double-stranded and single-stranded DNA molecules. Science 271, 795–799 (1996). https://
doi.org/10.1126/science.271.5250.795
148. Spijker, P., van Hoof, B., Debertrand, M., Markvoort, A.J., Vaidehi, N., Hilbers, P.A.: Coarse
grained molecular dynamics simulations of transmembrane protein-lipid systems. Int. J. Mol.
Sci. 11, 2393–2420 (2010). https://doi.org/10.3390/ijms11062393
149. Stossel, T.P., Condeelis, J., Cooley, L., Hartwig, J.H., Noegel, A., Schleicher, M., Shapiro,
S.S.: Filamins as integrators of cell mechanics and signalling. Nat. Rev. Mol. Cell Biol. 2,
138–145 (2001). https://doi.org/10.1038/35052082
150. Sulkowska, J.I., Sulkowski, P., Onuchic, J.N.: Jamming proteins with slipknots and their free
energy landscape. Phys. Rev. Lett. 103, 268103 (2009). https://doi.org/10.1103/PhysRevLett.
103.268103
151. Sulkowska, J.I., Sulkowski, P., Szymczak, P., Cieplak, M.: Untying knots in proteins. J. Am.
Chem. Soc. 132, 13954–13956 (2010). https://doi.org/10.1021/Ja102441z
152. Sulkowska, J.I., Cieplak, M.: Mechanical stretching of proteins - a theoretical survey of the
protein data bank. J. Phys.-Condens. Mat. 19, 283201 (2007).
https://doi.org/10.1088/0953-8984/19/28/283201
153. Szilagyi, A., Gyorffy, D., Zavodszky, P.: The twilight zone between protein order and disorder.
Biophys. J. 95, 1612–1626 (2008). https://doi.org/10.1529/biophysj.108.131151
154. Szymczak, P., Janovjak, H.: Periodic forces trigger a complex mechanical response in ubiq-
uitin. J. Mol. Biol. 390, 443–456 (2009). https://doi.org/10.1016/j.jmb.2009.04.071
155. Takada, S.: Coarse-grained molecular simulations of large biomolecules. Curr. Opin. Struct.
Biol. 22, 130–137 (2012). https://doi.org/10.1016/j.sbi.2012.01.010
156. Takagi, F., Koga, N., Takada, S.: How protein thermodynamics and folding mechanisms are
altered by the chaperonin cage: molecular simulations. Proc. Natl. Acad. Sci. U.S.A. 100,
11367–11372 (2003). https://doi.org/10.1073/pnas.1831920100
157. Taylor, W.R., Katsimitsoulia, Z.: A coarse-grained molecular model for actin-myosin simu-
lation. J. Mol. Graph. Model. 29, 266–279 (2010). https://doi.org/10.1016/j.jmgm.2010.06.
004
158. Thirumalai, D., Reddy, G., Straub, J.E.: Role of water in protein aggregation and amyloid
polymorphism. Acc. Chem. Res. 45, 83–92 (2012). https://doi.org/10.1021/ar2000869
159. Turjanski, A.G., Gutkind, J.S., Best, R.B., Hummer, G.: Binding-induced folding of a natively
unstructured transcription factor. PLoS Comput. Biol. 4, e1000060 (2008). https://doi.org/10.
1371/journal.pcbi.1000060
160. Uversky, V.N.: Introduction to intrinsically disordered proteins (IDPs). Chem. Rev. 114,
6557–6560 (2014). https://doi.org/10.1021/cr500288y
161. Uversky, V.N., Gillespie, J.R., Fink, A.L.: Why are “natively unfolded” proteins unstructured
under physiologic conditions? Proteins: Struct. Funct. Bioinf. 41, 415–427 (2000). https://
doi.org/10.1002/1097-0134(20001115)41:3%3c415:aid-prot130%3e3.0.co;2-7
162. Vajda, S., Kozakov, D.: Convergence and combination of methods in protein-protein docking.
Curr. Opin. Struct. Biol. 19, 164–170 (2009). https://doi.org/10.1016/j.sbi.2009.02.008
163. Valbuena, A., et al.: On the remarkable mechanostability of scaffoldins and the mechanical
clamp motif. Proc. Natl. Acad. Sci. U S A 106, 13791–13796 (2009). https://doi.org/10.1073/
pnas.0813093106
164. Vendruscolo, M., Dobson, C.M.: Protein dynamics: Moore’s law in molecular biology. Curr.
Biol. 21, R68–R70 (2011). https://doi.org/10.1016/j.cub.2010.11.062
165. Verkhivker, G.M.: Protein conformational transitions coupled to binding in molecular recogni-
tion of unstructured proteins: Deciphering the effect of intermolecular interactions on compu-
tational structure prediction of the p27Kip1 protein bound to the cyclin A–cyclin-dependent
kinase 2 complex. Proteins: Struct. Funct. Bioinf. 58, 706–716 (2005). https://doi.org/10.
1002/prot.20351
166. Verkhivker, G.M., Bouzida, D., Gehlhaar, D.K., Rejto, P.A., Freer, S.T., Rose, P.W.: Simulating
disorder–order transitions in molecular recognition of unstructured proteins: where folding
meets binding. Proc. Natl. Acad. Sci. 100, 5148–5153 (2003). https://doi.org/10.1073/pnas.
0531373100
167. Vogel, V., Sheetz, M.: Local force and geometry sensing regulate cell functions. Nat. Rev.
Mol. Cell Biol. 7, 265–275 (2006). https://doi.org/10.1038/nrm1890
168. Wang, Y.M., Latshaw, D.C., Hall, C.K.: Aggregation of A beta(17-36) in the presence of
naturally occurring phenolic inhibitors using coarse-grained simulations. J. Mol. Biol. 429,
3893–3908 (2017)
169. Wang, J., Wang, Y., Chu, X., Hagen, S.J., Han, W., Wang, E.: Multi-scaled explorations of
binding-induced folding of intrinsically disordered protein inhibitor IA3 to its target enzyme.
PLoS Comput. Biol. 7, e1001118 (2011). https://doi.org/10.1371/journal.pcbi.1001118
170. West, D.K., Olmsted, P.D., Paci, E.: Mechanical unfolding revisited through a simple but
realistic model. J. Chem. Phys. 124 (2006). https://doi.org/10.1063/1.2185100
171. Wolynes, P.G., Onuchic, J.N., Thirumalai, D.: Navigating the folding routes. Science 267,
1619–1620 (1995). https://doi.org/10.1126/science.7886447
172. Wright, P.E., Dyson, H.J.: Intrinsically unstructured proteins: re-assessing the protein
structure-function paradigm. J. Mol. Biol. 293, 321–331 (1999). https://doi.org/10.1006/jmbi.
1999.3110
173. Yao, X.Q., Kenzaki, H., Murakami, S., Takada, S.: Drug export and allosteric coupling in
a multidrug transporter revealed by molecular simulations. Nat. Commun. 1, 117 (2010).
https://doi.org/10.1038/ncomms1116
174. Zacharias, M.: Accounting for conformational changes during protein-protein docking. Curr.
Opin. Struct. Biol. 20, 180–186 (2010). https://doi.org/10.1016/j.sbi.2010.02.001
175. Zhang, J., Muthukumar, M.: Simulations of nucleation and elongation of amyloid fibrils. J.
Chem. Phys. 130, 035102 (2009). https://doi.org/10.1063/1.3050295
176. Zhmurov, A., Dima, R.I., Kholodov, Y., Barsegov, V.: Sop-GPU: accelerating biomolecular
simulations in the centisecond timescale using graphics processors. Proteins 78, 2984–2999
(2010). https://doi.org/10.1002/prot.22824
177. Zhou, H.-X.: Polymer models of protein stability, folding, and interactions. Biochemistry
43, 2141–2154 (2004). https://doi.org/10.1021/bi036269n
178. Zhou, H.X., Dill, K.A.: Stabilization of proteins in confined spaces. Biochemistry 40,
11289–11293 (2001). https://doi.org/10.3410/f.1002736.29765
179. Zhou, J., Thorpe, I.F., Izvekov, S., Voth, G.A.: Coarse-grained peptide modeling using a
systematic multiscale approach. Biophys. J. 92, 4289–4303 (2007). https://doi.org/10.1529/
biophysj.106.094425
Physics-Based Modeling of Side
Chain—Side Chain Interactions
in the UNRES Force Field
Mariusz Makowski
M. Makowski
Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
e-mail: [email protected]
1 Introduction
Proteins are crucial elements of the cell, and simulations of their structure and dynamics
are therefore of great significance in biochemistry, molecular biology, genetics, medical
sciences, and other sciences that investigate processes occurring in living systems [1–26].
An important problem in protein research is the prediction of spatial structure. This problem
is of crucial importance because correct folding is necessary for proper function. Knowledge
of protein structure is also necessary for drug design and for research on the interactions
of specific antibodies, inhibitors, enzymes, etc. Experimental methods such as X-ray
crystallography and NMR spectroscopy do not keep up with the need for solved protein
structures. As of June 13, 2012, 536,489 sequences had been deposited in the Swiss-Prot
database (http://us.expasy.org), and only 82,522 of them had been solved experimentally;
these structures are stored in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb). For
comparison, 4831 folded protein structures were deposited in the PDB in 2003, whereas
24,156 sequences were deposited in the Swiss-Prot database from February 2003 to
March 2004. Therefore, the ratio of the number of solved structures to that of known
sequences does not increase substantially. That is why methods for predicting the spatial
structures of proteins from their amino acid sequences have been developed intensively
over the last 20 years [1–26].
In order to predict protein structures, one can use knowledge-based (i.e., based on
structural databases) or physics-based methods. Knowledge-based methods are, for
the time being, more effective than physics-based methods, provided that the database
contains structures similar to the one under consideration. This group includes
the very popular comparative modeling methods [20–22, 24, 25, 27–33],
the threading methods [10, 20, 34, 35], and the fragment methods [36, 37]. The second
important group comprises physics-based methods which,
apart from predicting the three-dimensional structure, enable us to investigate the
mechanism of protein folding. In principle, physics-based methods should be able
to predict protein structures from their amino-acid sequences alone. In practice, however,
they also rely partially on data from structural databases [5, 11, 12, 38]. The
starting point for these methods is the thermodynamic hypothesis of Anfinsen [39],
according to which the native structure of a protein corresponds to the minimum of its
free energy. A series of CASP (Critical Assessment of Techniques for Protein
Structure Prediction) experiments has been organized every other year since
1994. These experiments evaluate methods that predict protein structures
from amino-acid sequence information alone. Essential information
regarding the CASP experiments can be found at http://predictioncenter.org.
Research on protein–protein interactions, on the interactions of proteins with
other molecules, and on folding and conformational changes, above all functionally
important motions, is very important. The results of such structural research often make
it possible to introduce new methods and provide new directions for the treatment of
diseases. In comparison with experimental research, theoretical calculations have some
advantages, such as lower cost, a higher level of safety, and shorter turnaround time.
Unfortunately, the effectiveness and accuracy of such methods for protein structure
prediction are not yet high. Methods based on structural databases are
efficient for the prediction of static structures, but the dynamics of the structure,
as well as the description and understanding of the nature of interactions, are to a
substantial extent beyond their scope. Therefore, methods based
on the physics of interactions (physics-based methods) are better for this purpose.
Moreover, physics-based methods are, in principle, independent of databases and of data
from other experimental measurements. Consequently, they enable us to examine structures of
more complicated proteins with degenerate native structures, e.g., the prion
proteins [40, 41], and also to simulate protein-folding pathways.
Because of the large number of atoms in biomolecules, the use of an all-atom
representation of a system is not practical in protein-structure prediction or ab initio
folding simulations, except for small proteins. Millisecond simulations of small proteins
in the all-atom representation are possible [42, 43]; however, they remain too expensive
because of the long computation times required. The use of united-residue models is,
therefore, more practical. That is why there
is a need for the development of coarse-grained physics-based interaction potentials
that can be applied in large-scale protein simulations.
The structure of a given protein is unambiguously determined by its amino acid
sequence. The interactions between the amino acid side chains in proteins are very
specific and they are encoded in the protein sequence. On the other hand, coarse-
grained potentials of interactions between side chains, which take all kinds of inter-
actions into consideration, including hydrophobic and electrostatic interactions, as
well as the formation of hydrogen bonds with participating polar groups, have not
yet been developed.
This chapter presents the results of research on a new potential that describes the
amino-acid side-chain interactions occurring in proteins. This new physics-based potential
can also be applied in other large-scale protein simulations. The motivation for this
research was to improve the UNRES (UNited RESidue) [38] force field. In the present UNRES
force field, the side-chain–side-chain interaction potentials are based on the Gay-Berne
functional form [44], which implies spheroidal symmetry of the potentials. It should be kept
in mind, though, that under physiological conditions some side chains possess a charged or a
polar group. It can, therefore, be assumed that such side chains consist of two parts,
namely a hydrophobic “tail” and a charged/polar “head.” Because of its spheroidal
symmetry, the Gay-Berne potential can be used only to describe uniform
interactions such as those between nonpolar side chains; however, even in this case
it does not describe such interactions completely, because it does not reproduce the
desolvation maximum. This inadequate functional form of the side-chain–side-chain
interaction potential $U_{SC_iSC_j}$ for amphipolar side chains is therefore probably one of
the most important reasons for the imperfection of the UNRES force field in predicting
protein structures. That is why a new model for the interactions of side chains is proposed.
Every side chain is represented by two centers, namely a nonpolar center, which is placed in
the middle of the side chain (represented by an ellipsoid of revolution), and a charged or
polar center placed in the headgroup of the side chain. The interaction energy is then
computed as a sum of components: the electrostatic interactions of the charged or polar
centers, the interactions between the charged or polar center and the nonpolar one, the
interactions between the nonpolar centers, and the energy terms that come from
the solvent-accessible molecular surface area. The search for new potentials that
describe side-chain interactions was conducted on the basis of molecular dynamics
simulations of appropriate model systems in water. The main goal of this research
was to develop a new model and to parameterize it. The new model is physics-based
and has been parameterized by fitting analytical formulas that describe the respective
free-energy surfaces to the potentials of mean force (PMF) of pairs of interacting
side-chain models in water, which were calculated by means of molecular dynamics
with the AMBER force field [45].
$$\cdots + w_{rot}\sum_{i} U_{rot}(\alpha_i,\beta_i) + \sum_{i=3}^{6} w_{corr,bb}^{(i)}\,U_{corr,bb}^{(i)} + w_{corr;sc}^{(2)}\,U_{corr;rot}^{(2)} + w_{corr;rot}^{(3)}\,U_{corr;rot}^{(3)} \tag{1}$$
particles, which interact [56]. The desolvation maximum has so far not been accounted for in
the UNRES force field. The Gaussian-differential model of hydrophobic association
has been extended to sites with spheroidal symmetry, and an approximate
expression for the free energy of hydrophobic association of spheroidal sites has been
developed [56].
The above considerations concern only the cavity potential, which is not
the total effective energy of interactions. Interacting sites can have charged or polar
groups. General expressions for the effective energy of all types of interactions in
these systems are given by Eqs. 2–7 (details can be found in [57–62]):
Depending on the kind of interacting pair (Eqs. 2–7), the potential function ($W$)
contains all or part of the following terms: Coulombic-energy terms ($E_{el}$),
solvent-polarization terms represented by the Generalized Born model ($E_{pol}^{GB}$),
solute-polarization terms ($E_{pol}$), van der Waals terms ($E_{vdW}$), and terms that account
for the energy of cavity creation ($F_{cav}$). The isotropic Lennard-Jones term ($E_{LJ}$)
expresses the van der Waals interaction energy between two amphiphilic headgroups,
$E_{cp}$ is the interaction energy between charged and polar sites, and $E_{pp}$ is the
potential of interactions between two polar headgroups. Both the Lennard-Jones potential
and the Kihara potential, which is a modified version of the Lennard-Jones potential,
were tested to express the energy of van der Waals interactions [57].
Further, the proposed expressions for the effective energy of hydrophobic
association were tested on selected model systems [57], with the aim of proposing new
analytical functional forms for the $U_{SC_iSC_j}$ potential in peptides and proteins.
Simple mathematical formulas have been proposed [57] for the description of all possible
interactions between pairs of the following molecules: a hydrophobic molecule (methane), a
positively charged molecule (ammonium cation), and a negatively charged molecule
(chloride anion). For comparison, two models were used to
estimate the free-energy term that comes from hydrophobic association.
One of them is a model of molecular-surface area with equations given by Rank and
Baker [57]; the other is a model based on the difference between two overlapping
Gaussian functions [56]. Both analytical expressions fit the PMF plots very well.
However, the function with the Kihara potential and the molecular-surface-area
component was rejected from further work for two reasons [57]. First,
the best fit to the PMF curves was obtained when the Kihara potential consisted
only of the repulsive term. Second, the expression for the molecular surface area
proposed by Rank and Baker can readily be expressed analytically only
for spherical sites. Even though the sample solutes considered in [57] were spherical,
real nonpolar amino acid side chains can only be approximated by spheroids. The
Gaussian-differential-based cavity potential reproduces the desolvation maximum
[56, 57] in the PMF curve very well. Moreover, the values of the fitted parameters [57]
have physical meaning. The same Gaussian-differential-based expression is
used to represent the cavity-creation free energy of pairs of charged solutes, as well as
of pairs of charged and nonpolar solutes [57].
The results of the research on the $U_{SC_iSC_j}$ potentials of pairs of identical [58] and
different [59] hydrophobic molecules, which model hydrophobic amino acid side
chains, are presented below. Each of these analytical potentials is the sum of a
van der Waals potential and an expression for the free energy of cavity creation [56].
The van der Waals energy was expressed by the Gay-Berne potential [44], and the free
energy of cavity creation by the Gaussian-differential-based term for a pair of
spheroidal particles [58, 59]. Using the model of the interaction of
two hydrophobic particles defined in Fig. 2, the potentials of mean force were determined for
the side-chain pairs studied by umbrella-sampling molecular dynamics simulations
in water, followed by post-processing with the weighted histogram analysis method
[59].
The determined PMFs are four-dimensional functions of the distance
between the geometric centers of the systems studied and of their orientation. The results
of the calculations described in [58, 59] show that the analytical functions fit the
PMF hypersurfaces determined by molecular dynamics simulations well, and that the fitted
parameters of the analytical potentials of side-chain–side-chain interactions have
physical meaning. Moreover, the contact free energies calculated from the PMF
curves correlate well with those determined from PDB data using the quasi-chemical
approximation [38], as shown in Eqs. 8 and 9.
Fig. 2 Definition of variables describing the location of two spheroidal sites (i and j) with respect
to each other. The vector $\hat{u}_{ij}^{(1)}$ is the unit vector of the long axis of site i, $\hat{u}_{ij}^{(2)}$ is the unit
vector of the long axis of site j, $\hat{r}_{ij}$ is the unit vector pointing from site i to site j, $\theta_{ij}^{(1)}$ is the
angle between $\hat{r}_{ij}$ and $\hat{u}_{ij}^{(1)}$, $\theta_{ij}^{(2)}$ is the angle between $\hat{r}_{ij}$ and
$\hat{u}_{ij}^{(2)}$, and $\phi_{ij}$ is the angle of counterclockwise rotation of $\hat{u}_{ij}^{(2)}$ about $\hat{r}_{ij}$ from
the plane defined by $\hat{u}_{ij}^{(1)}$ and $\hat{r}_{ij}$ when looking from the center of site j toward the
center of site i. This is also Fig. 1 in Ref. [58]
Fig. 3 Illustration of the new model for the interactions of charged and polar side chains. A side
chain of this type consists of a nonpolar site (represented by an ellipsoid of revolution) and a
polar/charged site (represented by a shaded sphere). The center of the polar/charged site of side
chain i is at the distance $d_i^{(1)}$ from the geometric center of that side chain ($SC_i$) (which is located
between the polar/charged and nonpolar center and represented by a small sphere in the figure),
and that of side chain j is at the distance $d_j^{(1)}$ from the side-chain center ($SC_j$), while the centers
of the nonpolar sites of side chains i and j are at distances $d_i^{(2)}$ and $d_j^{(2)}$, respectively, from their
geometric centers ($SC_i$ and $SC_j$, respectively). The vector $\hat{u}_{ij}^{(1)}$ is the unit vector of the long axis of
the nonpolar site of side chain i, $\hat{u}_{ij}^{(2)}$ is the unit vector of the long axis of the nonpolar site of side
chain j, r̂ij is the unit vector pointing from the geometric center of the nonpolar site of side chain i
to that of side chain j, r ij is the distance between these two centers, ri j is the distance between the
center of the charged/polar site of side chains i and j, rij is the distance between the center of the
charged site of side chain i and the center of the nonpolar site of side chain j, and r ji is the distance
between the center of the charged site of side chain j and the center of the nonpolar site of side
chain i. This is also Fig. 2 in Ref. [60]
charged and hydrophobic centers from the geometric center of the side chain were
defined.
In Eq. 10, the effective energy of interactions between two charged amino-acid
side chains is a sum of the following terms: the Gay-Berne potential ($E_{GBerne}$), which
accounts for the van der Waals interactions between nonpolar sites; the energy of Coulombic
interactions between charged sites ($E_{el}$); the energy of polarization, which comes from
interactions between charged and nonpolar sites of side chains ($E_{pol}$); the energy
of solvent polarization by charged sites, calculated with the generalized Born model
($E_{pol}^{GB}$); the free energy of cavity creation that corresponds to the charged parts of
the side-chain model ($\Delta F_{cav}^{iso}$); the free energy of cavity creation for the nonpolar parts
of side chains ($\Delta F_{cav}$); and the Lennard-Jones potential, which describes the van der
Waals interactions between the two charged parts of the side chains ($E_{LJ}$). It should
be noted that isotropic terms between the charged parts of side chains were included in the
$U_{SC_iSC_j}$ potential because, without them, it is not possible
to distinguish the “charged head–charged head” orientation from the “nonpolar
tail–nonpolar tail” orientation.
For the $E_{GBerne}$ energy term, the Gay-Berne-type potential expressed by Eq. 11 is used.
It should be noted that, previously, Eq. 11 was used to express the complete
side-chain–side-chain interaction potential in the UNRES force field.
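$$E_{GBerne} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}^{0}}{r_{ij}-\sigma_{ij}+\sigma_{ij}^{0}}\right)^{12} - \left(\frac{\sigma_{ij}^{0}}{r_{ij}-\sigma_{ij}+\sigma_{ij}^{0}}\right)^{6}\right] \tag{11}$$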
where $r_{ij}$ is the distance between the centers of the side chains, $\sigma_{ij}$ is the distance
corresponding to the zero value of $E_{GBerne}$ for an arbitrary orientation of the particles
($\sigma_{ij}^{0}$ is the distance corresponding to the zero value of $E_{GBerne}$ for the side-to-side
approach of the particles), and $\varepsilon_{ij}$ (which depends on the relative orientation of the
particles) is the van der Waals well depth. The dependence of $\varepsilon_{ij}$ and $\sigma_{ij}$ on the
orientation of the particles is given by Eqs. 12–14 and 15, respectively [60].
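As an illustration of Eq. 11, the following minimal Python sketch evaluates the shifted 12-6 form for a single pair. The function name and the numerical values in the usage note are hypothetical; the orientation-dependent quantities $\varepsilon_{ij}$ and $\sigma_{ij}$ are assumed to have been computed beforehand (Eqs. 12–15 are not implemented here).

```python
def gay_berne_energy(r_ij, eps_ij, sigma_ij, sigma0_ij):
    """Gay-Berne-type energy of Eq. 11 for one pair of sites.

    r_ij      -- distance between the side-chain centers
    eps_ij    -- orientation-dependent well depth (assumed precomputed)
    sigma_ij  -- orientation-dependent zero-energy distance (assumed precomputed)
    sigma0_ij -- zero-energy distance for the side-to-side approach
    """
    # Shifted reduced distance used in both the repulsive and attractive terms.
    s = sigma0_ij / (r_ij - sigma_ij + sigma0_ij)
    return 4.0 * eps_ij * (s**12 - s**6)
```

With this form the energy vanishes at `r_ij == sigma_ij` and reaches `-eps_ij` at `r_ij = sigma_ij - sigma0_ij + 2**(1/6) * sigma0_ij`, which is a quick way to check a set of fitted parameters.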
with
where $\hat{u}_{ij}^{(1)}$ and $\hat{u}_{ij}^{(2)}$ are unit vectors along the principal axes of the interacting sites
(identified in this work with the Cα-SC axes), $\hat{r}_{ij}$ is the unit vector pointing from the
center of site i to that of site j, $r_{ij}$ is the distance between the side-chain centers (Figs. 2
and 3), the parameters $\chi_{ij}^{(1)}$ and $\chi_{ij}^{(2)}$ are the anisotropies of the van der Waals distance,
$$E_{el} = 332\,\frac{q_i q_j}{r_{ij}} \tag{19}$$
where $q_i$ and $q_j$ are the charges of sites i and j, respectively, $r_{ij}$ is the distance
between the centers of the charged sites of side chains $SC_i$ and $SC_j$ (see Fig. 3), and
the coefficient 332 is introduced to express the energies in kcal/mol (with charges in units
of the elementary charge and distances in Å).
The component $E_{pol}^{GB}$ corresponds to the “bulk dielectric” solvent polarization. For
the “bulk dielectric” solvent-polarization part involving a pair of charged particles,
the expression from the Generalized Born model was adopted.
$$E_{pol}^{GB} = 332\,q_i q_j\left(\frac{1}{\varepsilon_{in}} - \frac{1}{\varepsilon_{out}}\right)\frac{1}{f_{GB}(r_{ij})} \tag{20}$$
where $\varepsilon_{in}$ is the effective dielectric constant of the “inside” of the interacting particles,
$\varepsilon_{out}$ is the effective dielectric constant of the solvent, and $f_{GB}$ is expressed by Eq. 21.
$$f_{GB}(r_{ij}) = \sqrt{r_{ij}^{2} + a_i a_j \exp\left(-\frac{r_{ij}^{2}}{4 a_i a_j}\right)} \tag{21}$$
where $\alpha_{ij}^{pol}$ and $\alpha_{ji}^{pol}$ are related to the polarizability of the nonpolar parts of side chain
i and side chain j, respectively. At large distances, this contribution to the polarization
energy varies as $1/r^{4}$. The rationale for expression 22 is that a nonpolar particle
replaces the solvent at distance r, consequently removing a part of the polarization
interaction with the solvent. The polarization interaction energy is proportional to
the square of the electric field; because the field of a point charge varies as $1/r^{2}$ by
Coulomb's law, its square varies as $1/r^{4}$.
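The Coulombic and generalized Born terms of Eqs. 19–21 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the parameters `a_i` and `a_j` are taken to be the effective Born-type radii appearing in Eq. 21 (an assumption, since they are not defined in the text), and the sign convention simply follows Eq. 20 as printed.

```python
import math

COULOMB_KCAL = 332.0  # conversion factor quoted in the text (kcal/mol)


def f_gb(r_ij, a_i, a_j):
    """Interpolating distance function of Eq. 21."""
    aa = a_i * a_j
    return math.sqrt(r_ij**2 + aa * math.exp(-r_ij**2 / (4.0 * aa)))


def coulomb_energy(q_i, q_j, r_ij):
    """Coulombic term of Eq. 19 between two charged sites."""
    return COULOMB_KCAL * q_i * q_j / r_ij


def gb_polarization_energy(q_i, q_j, r_ij, a_i, a_j, eps_in, eps_out):
    """Solvent-polarization term of Eq. 20 (sign as printed there)."""
    prefactor = 1.0 / eps_in - 1.0 / eps_out
    return COULOMB_KCAL * q_i * q_j * prefactor / f_gb(r_ij, a_i, a_j)
```

Two limits are useful for checking an implementation: `f_gb` tends to `sqrt(a_i * a_j)` as the distance goes to zero and to `r_ij` at large separations, so the term of Eq. 20 becomes Coulomb-like far from contact.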
The expression for the isotropic cavity-creation (or solvent-restructuring) term $\Delta F_{cav}^{iso}$
was derived previously [56] on the basis of the Gaussian overlap model and is expressed
by Eq. 23. This term makes it possible to differentiate between head-to-head and tail-to-tail
orientations of side chains in our analytical PMF curves. In the original formulation
of the Generalized Born model, this term is proportional to the molecular surface
area, hence the complete name Generalized Born Surface Area (GBSA); that form is,
however, more difficult to compute and less numerically stable than Eq. 23.
$$\Delta F_{cav}^{iso} = \frac{\alpha_{ij}^{iso(1)}\left[x^{1/2} + \alpha_{ij}^{iso(2)}\,x - \alpha_{ij}^{iso(3)}\right]}{1 + \alpha_{ij}^{iso(4)}\,x^{12}} \tag{23}$$
with
$$x = \frac{r_{ij}}{\sqrt{(\sigma_i^{iso})^{2} + (\sigma_j^{iso})^{2}}} \tag{24}$$
where $r_{ij}$ is the distance between the two charged parts (see Fig. 3) of particles i and j, and $\sigma_i^{iso}$
and $\sigma_j^{iso}$ can be identified with the minimum distance between the center of charge
of particle i or j, respectively. The parameters $\alpha_{ij}^{iso(1)}$, $\alpha_{ij}^{iso(2)}$, $\alpha_{ij}^{iso(3)}$,
$\alpha_{ij}^{iso(4)}$, $\sigma_i^{iso}$, and
$\sigma_j^{iso}$ are determined by least-squares fitting [60] of the analytical expression for the
free energy of two side chains interacting in water (Eq. 10) to the potentials of mean
force determined from MD simulations.
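A minimal Python sketch of the isotropic cavity term of Eqs. 23 and 24 is given below. The function and argument names are hypothetical, the square root in the denominator of Eq. 24 is assumed (it makes x dimensionless), and the parameter values used for checking are arbitrary rather than fitted.

```python
import math


def cavity_iso(r_ij, sigma_i, sigma_j, a1, a2, a3, a4):
    """Isotropic cavity-creation term (Eq. 23) with the reduced distance of Eq. 24.

    a1..a4 stand for the fitted parameters alpha_ij^iso(1..4);
    sigma_i and sigma_j stand for sigma_i^iso and sigma_j^iso.
    """
    # Reduced distance; the square root is an assumption about Eq. 24.
    x = r_ij / math.sqrt(sigma_i**2 + sigma_j**2)
    return a1 * (math.sqrt(x) + a2 * x - a3) / (1.0 + a4 * x**12)
```

Because of the twelfth power of x in the denominator, the term decays rapidly to zero at large separations, in line with the requirement that all energy components vanish at large distances.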
The expression for $\Delta F_{cav}$ of spheroidal particles was derived in Ref. [56] for the
side-chain–side-chain interaction potential in the UNRES force field and is given by
Eq. 25. This term accounts for the free-energy contribution due to the restructuring of water
molecules around a hydrophobic dimer.
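$$\Delta F_{cav} = \frac{\alpha_{ij}^{(1)}\left[(x\cdot\lambda)^{1/2} + \alpha_{ij}^{(2)}\,x\cdot\lambda - \alpha_{ij}^{(3)}\right]}{1 + \alpha_{ij}^{(4)}\,(x\cdot\lambda)^{12}} \tag{25}$$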
with
$$x = \frac{r_{ij}}{\sqrt{\sigma_i^{2} + \sigma_j^{2}}} \tag{26}$$
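$$\lambda = \left[1 - \frac{\chi_{ij}^{(1)}\omega_{ij}^{(1)2} + \chi_{ij}^{(2)}\omega_{ij}^{(2)2} - 2\chi_{ij}^{(1)}\chi_{ij}^{(2)}\omega_{ij}^{(1)}\omega_{ij}^{(2)}\omega_{ij}^{(12)}}{1 - \chi_{ij}^{(1)}\chi_{ij}^{(2)}\omega_{ij}^{(12)2}}\right]^{1/2} \tag{27}$$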
where $r_{ij}$ is the distance between the centers of the charged headgroups, $\sigma_{ij}$ is the
distance corresponding to the zero value of $E_{LJ}$, and $\varepsilon_{ij}$ is the van der Waals well
depth.
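The anisotropic cavity term of Eqs. 25–27 can be sketched in Python as follows. Several points are assumptions rather than statements of the chapter: the bracket in Eq. 27 is taken to the power 1/2 (exposed here as the hypothetical `power` argument), the square root in Eq. 26 is assumed, and the quantities ω are taken to be the usual Gay-Berne direction cosines (ω(1) = û(1)·r̂, ω(2) = û(2)·r̂, ω(12) = û(1)·û(2)), which are not defined in this section.

```python
import math


def anisotropy_factor(chi1, chi2, w1, w2, w12, power=0.5):
    """Orientation factor lambda of Eq. 27 (the exponent is an assumption)."""
    num = chi1 * w1**2 + chi2 * w2**2 - 2.0 * chi1 * chi2 * w1 * w2 * w12
    den = 1.0 - chi1 * chi2 * w12**2
    return (1.0 - num / den) ** power


def cavity_aniso(r_ij, sigma_i, sigma_j, lam, a1, a2, a3, a4):
    """Cavity-creation term of Eq. 25 for spheroidal sites.

    lam is the orientation factor from anisotropy_factor(); a1..a4 stand for
    the fitted parameters alpha_ij^(1..4).
    """
    # Orientation-scaled reduced distance x * lambda (Eq. 26 times Eq. 27).
    y = lam * r_ij / math.sqrt(sigma_i**2 + sigma_j**2)
    return a1 * (math.sqrt(y) + a2 * y - a3) / (1.0 + a4 * y**12)
```

When both anisotropies are zero, `anisotropy_factor` returns 1 and the expression reduces to the isotropic functional form of Eq. 23.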
The results of fitting of the analytical function to the PMF curves depending on
the distance and orientations of the propionate anion—propionate anion pair (which
models the interactions between two ionized aspartic-acid side chains) are shown in
Fig. 4.
It follows from Fig. 4 that the proposed analytical expression reproduces the PMF
curves very well. An analysis of these plots demonstrates that the energy expressions
with the fitted parameters make physical sense for like-charged models. In particular,
all energy components tend to zero at large distances [60].
Fig. 4 PMF curves for the propionate–propionate pair (model of the Asp–Asp side-chain pair)
determined by the weighted histogram analysis method from molecular dynamics calculations in
water. The curves are coloured according to side-chain orientations and the colour codes are the
same as in [60]. Thinner lines correspond to PMF curves from MD simulations, whereas thicker
lines correspond to the fitted analytical approximation of the PMF, which consists of the sum of the
Gay-Berne potential (Eq. 2 in [60]), which describes van der Waals interactions between side chains;
the electrostatic (Eq. 10 in [60]) and generalized Born potentials (Eq. 11 in [60]), which describe
electrostatic interactions between charged parts of the side chains; the polarization energy term
(Eq. 13 in [60]), which expresses interactions between charged and nonpolar parts of side chains;
the isotropic cavity potential (Eq. 14 in [60]) of charged parts; the cavity potential of hydrophobic
parts (Eq. 16 in [60]); and the isotropic Lennard-Jones potential (Eq. 19 in [60]). This is also Fig. 5a in
Ref. [60]
For pairs of oppositely charged side chains [61], the model had to be extended to
allow multiple locations of a charged head relative to the nonpolar center. The
rationale is that assuming a fixed distance of the center of the charged
site from the side-chain center does not reproduce the shape and structure of the contact
(salt-bridge) minima in the PMF surface corresponding to head-to-head orientations
of the side chains; consequently, it was assumed that the charged site can exist in two
states, which differ from each other in the distance of the charged center from the
side-chain center. The introduction of the two-state model also makes it possible to
differentiate the head-to-head and side-to-side orientations from each other. The two-state
model of two interacting side chains is shown in Fig. 5.
The respective analytical expression for the potential is given by Eq. 29, in which
the energy is a sum of the Gay-Berne potential, which describes van der Waals
interactions between nonpolar parts ($E_{GBerne}$), the cavity-creation term for nonpolar parts
of side chains ($\Delta F_{cav}$), and the free energy corresponding to the summation of the
Coulombic interactions ($E_q$), the interactions of head-group quadrupoles ($E_{qd}$), the
solute-polarization energy ($E_{pol}$), the solvent-polarization energy expressed by the
generalized Born model ($E_{pol}^{GB}$), the free energy of cavity creation due to headgroups
($\Delta F_{cav}^{iso}$), and the van der Waals interaction energy between the head groups
represented by the Lennard-Jones potential ($E_{LJ}$). For pairs including arginine, the spread
of the charge distribution must also be included by introducing a term corresponding
to averaged quadrupole-quadrupole interactions and, for all pairs, an explicit
Lennard-Jones term must be included between the charged centers.
$$\cdots + RT\ln\sum_{i=1}^{N} w^{(i)} \tag{29}$$
As was mentioned earlier, for pairs of oppositely-charged side chains [61], the
model had to be extended to introduce multiple locations of a charged head relative
to the nonpolar center. It was assumed that every charged side-chain head group can
exist in two states. These states differ in the distance of the charged center from the
center of the side chain. This two-state model of charged side chains is shown in
Fig. 5.
Fig. 5 Illustration of the new model for the interactions of charged and polar side chains. A side
chain of this type is assumed to consist of the charged (shaded) and nonpolar (ellipsoidal) parts. The
geometric centers of side chains i and j are denoted as SC i and SC j , respectively and represented by
small black circles located between the centers of the charged and nonpolar sites. The charged site
of each side chain can exist in two possible states; hence two shaded spheres are shown for each
charged site. The spheres corresponding to alternative positions of the charged sites (farther away
from the centers of side chain i and j, respectively) are bordered by dashed lines and are transparent to
indicate that each of them corresponds to the alternative state of a single site and does not represent
an additional site. The vector $\hat{u}_{ij}^{(1)}$ is the unit vector of the long axis of side chain i, $\hat{u}_{ij}^{(2)}$ is the unit
vector of the long axis of side chain j, $\hat{r}_{ij}$ is the unit vector pointing from the geometric center of
the nonpolar site of side chain i to that of side chain j, r ij is the distance between these two centers,
ri j is the distance between the charged/polar centers of the head groups of side chains i and j, rij
and r ji are the distances between the charged centers of particle i and the center of particle j, and
the charged center of particle j and the center of particle i, respectively (for the sake of clarity we show
only the distances that involve the polar/charged center in the first possible state), $d_i^{(1,1)}$, $d_i^{(1,2)}$ and
$d_j^{(1,1)}$, $d_j^{(1,2)}$ are the distances from the geometrical center of side chain i and j, respectively, to the
center of the charge of head group i and j, respectively, and $d_i^{(2)}$ and $d_j^{(2)}$ are the distances from
the geometrical center of side chain i and j, respectively, to the nonpolar center of particles i and j,
respectively. This is also Fig. 1 in Ref. [61]
In Eq. 29, the superscript (i) indicates the index of the microstate, $w^{(i)}$ is the
weight of this microstate (also treated as an adjustable parameter in fitting Eq. 29 to
the PMF), N is the number of microstates (N = 4), R is the universal gas constant,
and T is the absolute temperature (T = 298 K). Each of the microstates corresponds to
different distances between the centers of the charged headgroups and the side-chain
centers (see Fig. 5). Two possible states were assumed for the center of the charged
headgroup of each side-chain model of a pair (in one state the headgroup is closer to
and in the other farther from the side-chain center), which gives a total of four
microstates.
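The way a weighted set of microstates can be combined into a single free energy is sketched below in Python. This is an assumption-laden illustration, not the chapter's Eq. 29: it assumes the standard Boltzmann-weighted form $-RT\ln\sum_i w^{(i)}\exp(-E^{(i)}/RT) + RT\ln\sum_i w^{(i)}$, where $E^{(i)}$ is the total headgroup energy of microstate i; the function name and the numerical values in the usage note are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # universal gas constant in kcal/(mol K)


def multistate_free_energy(energies, weights, temperature=298.0):
    """Free energy of N weighted microstates (assumed Boltzmann-weighted form).

    energies -- energy of each microstate in kcal/mol (N = 4 in the text)
    weights  -- adjustable weight w(i) of each microstate
    """
    rt = R_KCAL * temperature
    # Shift by the lowest energy so that the exponentials cannot overflow.
    e_min = min(energies)
    boltz = sum(w * math.exp(-(e - e_min) / rt) for e, w in zip(energies, weights))
    return e_min - rt * math.log(boltz) + rt * math.log(sum(weights))
```

For example, `multistate_free_energy([-2.0, -2.0, -2.0, -2.0], [0.1, 0.2, 0.3, 0.4])` returns -2.0 up to rounding, because the normalization term cancels the weights when all microstates have the same energy; when one microstate is much lower in energy, the result lies between that minimum and the weighted mean.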
Except for $E_{qd}$, the terms in Eq. 29 have been defined in the previous subsection.
The average energy of interactions of two point quadrupoles ($E_{qd}$) i and j is expressed
by Eq. 30 (see the Appendix of Ref. [61] for more details and the derivation; Eqs.
A1–A9):
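$$E_{qd} = \frac{A}{\left[f_{GB}(r_{ij})\right]^{5}}\left[5 + 3\left(\cos^{2}\alpha_{ij} - 1\right) - \frac{75}{2}\left(\cos^{2}\theta_{ij}^{(1)} + \cos^{2}\theta_{ij}^{(2)}\right) + \frac{315}{2}\cos^{2}\theta_{ij}^{(1)}\cos^{2}\theta_{ij}^{(2)} - 45\cos\alpha_{ij}\cos\theta_{ij}^{(1)}\cos\theta_{ij}^{(2)}\right] \tag{30}$$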
The analytical approximations to the PMF curves are reasonable (Fig. 6). The
ability of the analytical function to reproduce salt bridges (red curves in Fig. 6;
charged headgroup–charged headgroup orientation) is a very important
feature of the new energy function [61]. Even though salt bridges do not occur very
often in proteins, they can be a factor that stabilizes protein structure at early folding
stages [63]. Although the new potential will certainly result in a somewhat longer
calculation time for oppositely charged side-chain pairs (due to multiple charge
states), this increase will not be substantial because, among the twenty amino acids that are
of biological importance, only four possess charged side chains.
2.5 Charged—Hydrophobic/Polar
and Polar—Hydrophobic/Polar Side Chains
where $\alpha_{ij}^{pol}$ and $\alpha_{ji}^{pol}$ are related to the polarizability of the nonpolar parts of side
chain i and side chain j, respectively.
The E cp interaction potential between charged and polar sites of Eq. 5 is given by
Eq. 32:
where $w_{dip}^{(1)}$ and $w_{dip}^{(2)}$ are the parameters determined by least-squares fitting of the
analytical expressions to the potentials of mean force, q is the net charge of the
charged headgroup, and $R_{ij}$ is the distance between the centers of the amphiphilic
headgroups.
The average energy of the interaction between two polar-group dipoles (E pp ) of
Eq. 7 is expressed by Eq. 33:
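$$
E_{pp} = \frac{w_{p1}}{R_{ij}^{3}}\left(\cos\omega_{ij}^{(12)} - 3\cos\theta_{ij}^{(1)}\cos\theta_{ij}^{(2)}\right) - \frac{w_{p2}}{R_{ij}^{6}}\left[4 + \left(\cos\omega_{ij}^{(12)} - 3\cos\theta_{ij}^{(1)}\cos\theta_{ij}^{(2)}\right)^{2} - 3\left(\cos^{2}\theta_{ij}^{(1)} + \cos^{2}\theta_{ij}^{(2)}\right)\right] \qquad (33)
$$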
where w p1 and w p2 are the parameters determined by least-squares fitting, and Rij is
the distance between the centers of the polar headgroups.
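As a minimal sketch, Eq. 33 can be evaluated directly once the distance and the three direction cosines are known. The function name, the argument names, and the unit parameter values used in the example are hypothetical. The fitted values of w_p1 and w_p2 are not reproduced here.

```python
def e_pp(r_ij, cos_omega, cos_theta1, cos_theta2, w_p1, w_p2):
    """Averaged dipole-dipole energy in the form of Eq. 33.

    E_pp = w_p1 / R^3 * g - w_p2 / R^6 * (4 + g^2 - 3 (cos^2 t1 + cos^2 t2)),
    with g = cos(omega) - 3 cos(t1) cos(t2).
    r_ij:       distance between the centers of the polar headgroups
    cos_omega:  cosine of the angle between the two dipole axes
    cos_theta1, cos_theta2: cosines of the angles between each dipole
                axis and the line connecting the two centers
    w_p1, w_p2: fitted parameters (placeholders in this sketch)
    """
    if r_ij <= 0.0:
        raise ValueError("r_ij must be positive")
    g = cos_omega - 3.0 * cos_theta1 * cos_theta2
    first_order = w_p1 / r_ij**3 * g
    second_order = w_p2 / r_ij**6 * (
        4.0 + g * g - 3.0 * (cos_theta1**2 + cos_theta2**2)
    )
    return first_order - second_order


# Hypothetical head-to-tail arrangement with unit parameters.
head_to_tail = e_pp(1.0, 1.0, 1.0, 1.0, w_p1=1.0, w_p2=1.0)
```

The first term falls off as R^-3 and changes sign with orientation, whereas the second term falls off as R^-6, so orientation matters most at short distances.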
A detailed discussion of the results is given in Ref. [63]. It was also
observed that the model reproduces all features of the interacting pairs well.
Based on these results [63], the UNRES force field with the new side chain–side chain
interaction potentials and model was then subjected to preliminary tests with two
small α-helical proteins, i.e., the N-terminal part of the B-domain of staphylococcal
protein A (PDB: 1BDD; a three-α-helix bundle) and the UPF0291 protein YnzC from
Bacillus subtilis (PDB: 2HEP; an α-helical hairpin). The results of these tests were
satisfactory [63]. However, it was observed that, to achieve better resolution, the
force field needs to be recalibrated with a larger number of training proteins.
Hydrophobic interactions play a very important role in chemical and biological struc-
tures. They are often responsible for the formation and stabilization of different kinds
of systems or biological structures in aqueous environments such as proteins, bio-
logical membranes, and macromolecular complexes. Entropy is the driving force
of hydrophobic interactions. Hydrophobic interactions are mediated by
the solvent: hydrophobic particles avoid water molecules, which leads to indirect
interactions between them. This avoidance causes specific packing of water
molecules in the vicinity of nonpolar particles. The water molecules that are in
close contact with the hydrophobic surface are ordered, and this ordering diminishes
the entropy of the system. The entropy loss is greater when the hydrophobic surface
is larger, and the tendency of the system to reduce this loss leads to hydrophobic
interactions. The result of hydrophobic association is a solvent-accessible surface
area that is smaller than the total for the separate hydrophobic particles. Because of
the specific character of hydrophobic interactions and our inability to determine
their structural details experimentally, one uses
mainly theoretical methods to study this phenomenon. The potential of mean force
expressed, e.g., as a function of the distance between the centres of hydrophobic par-
ticles is a good quantitative measure of the dependence of hydrophobic interactions
on the geometry of the system.
The influence of the size of the hydrophobic particles on the shape of the PMF
curves was studied in Ref. [60]. The authors performed
their calculations for five pairs of hydrophobic particles: methane, ethane, propane,
isobutane, and neopentane. For each of the studied systems, PMF curves were
determined both in water and in vacuo.
The solvent contributions to the potentials of mean force were calculated as the
difference of the PMFs determined in water and the respective PMFs determined in
vacuo [64]. Based on the analysis of the results [64], a conclusion can be drawn that
the depth of the contact minimum increases with increasing size of the interacting
nonpolar particles, both in water and in the gas phase. Additionally, the changes in
the height and location of the desolvation maximum (which comes from the solvent
contribution to the PMF) can be well described by the molecular surface area [64].
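The solvent contribution described above is a pointwise difference of two curves. A minimal sketch follows, assuming both PMFs are tabulated on the same distance grid. The function name and the numbers are invented for illustration and are not data from Ref. [64].

```python
def solvent_contribution(pmf_water, pmf_vacuo):
    """Return PMF(water) - PMF(vacuo) at every tabulated distance.

    Both lists must refer to the same distance grid; no interpolation
    is attempted in this sketch.
    """
    if len(pmf_water) != len(pmf_vacuo):
        raise ValueError("the two PMFs must share one distance grid")
    return [w - v for w, v in zip(pmf_water, pmf_vacuo)]


# Invented values (kcal/mol) on a common grid, for illustration only.
pmf_in_water = [-1.0, 0.5, -0.25]
pmf_in_vacuo = [-0.25, 0.0, 0.0]
delta = solvent_contribution(pmf_in_water, pmf_in_vacuo)
```

A negative entry means that the solvent favors association at that distance, and a positive entry means that it opposes it.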
An analysis of the density distribution of water [64] shows that density increases
in the first and second hydration spheres. The highest density of water is observed
in the contact region of the hydrophobic particles. However, the ordering of water
molecules in the first hydration sphere is weak. The average number of hydrogen
bonds is smaller in the first solvation sphere than in bulk water [64]. The observed
average number of hydrogen bonds close to the interacting hydrophobic particles is
smaller for neopentane than for methane. However, the hydrogen bonds that do form
are stronger than those in bulk water. This observation is in qualitative agreement
with the results of previous research [64]. The smaller number of hydrogen bonds in
the first hydration sphere can be explained by the fact that these water molecules are
in contact with nonpolar particles, so they have a smaller chance of forming hydrogen
bonds with each other than water molecules that are farther from the nonpolar
molecules. The traditional explanation of the hydrophobic effect, which emphasizes
the ordering of water molecules in the first hydration sphere of the interacting
nonpolar molecules and the resulting low entropy, is insufficient. Cavity formation
and the small size of water molecules are also important for explaining this phenomenon.
Later, the results of the research on hydrophobic association of the larger non-
polar molecules [65]: bicyclooctane, adamantane, and fullerene (C60) were com-
pared with those obtained for neopentane [64]. For the purpose of data analysis,
it was assumed that the average shape of the molecules is spherical. Additionally,
calculations for the sphere with van der Waals radius equal to the mean radius of
adamantane were also carried out. The shape of the determined PMF curves in water
[65] is characteristic of hydrophobic interactions. Each of these curves possesses a
contact minimum, a desolvation maximum, and a solvent-separated minimum [65].
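The three characteristic features of such a curve can be located on a tabulated PMF by scanning for strict local extrema. The sketch below is one simple way to do this, assuming a smooth (noise-free) curve sampled with increasing distance. The function name and the sample values are hypothetical.

```python
def classify_pmf_extrema(r, w):
    """Locate the contact minimum, desolvation maximum and
    solvent-separated minimum of a tabulated PMF w(r).

    Scans from short to long distance and takes the first local minimum,
    the first local maximum after it, and the next local minimum.
    Returns a dict mapping each feature found to a (distance, value) pair.
    """
    if len(r) != len(w):
        raise ValueError("r and w must have the same length")
    labels = ["contact minimum", "desolvation maximum",
              "solvent-separated minimum"]
    wanted = ["min", "max", "min"]
    found = {}
    stage = 0
    for k in range(1, len(w) - 1):
        if stage == len(wanted):
            break
        if w[k] < w[k - 1] and w[k] < w[k + 1]:
            kind = "min"
        elif w[k] > w[k - 1] and w[k] > w[k + 1]:
            kind = "max"
        else:
            continue
        if kind == wanted[stage]:
            found[labels[stage]] = (r[k], w[k])
            stage += 1
    return found


# Invented curve (distance in angstroms, PMF in kcal/mol), illustration only.
distances = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
values = [2.0, -0.8, -0.2, 0.3, 0.1, -0.3, -0.1, 0.0, 0.0]
features = classify_pmf_extrema(distances, values)
```

Curves obtained from simulations are usually noisy, so in practice some smoothing or a minimum-prominence criterion would be needed before a scan of this kind.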
Based on the results of simulations, it can be concluded that the relative contribu-
tion from nonbonded interactions between the hydrophobic molecules to the PMF
increases with the increase of molecular size. For smaller molecules [64], the minima
of the PMF curves determined in water had more negative free energies compared to
those determined in vacuo. Conversely, the minima of the PMF curves determined
in vacuo are deeper than those of the PMF determined in water for bicyclooctane,
adamantane, and fullerene [65]. Therefore, the solvent contribution to the PMF is
positive for the three large hydrophobic molecules [65], whereas for small
hydrophobic particles it is negative or, for isobutane and neopentane, tends to
zero [64]. To explain this difference, the
density of water molecules around neopentane (a smaller molecule) and adamantane
(a large molecule) was analyzed (Fig. 7).
Fig. 7 Normalized distribution functions of the water molecule density in the vicinity of the
adamantane dimer (Figures a–c) and neopentane (Figures d–f) at monomer-separation distances
h a 6.8 Å, b 8.8 Å, c 10.2 Å, d 5.8 Å, e 7.85 Å and f 9.2 Å, which correspond to the contact min-
ima (a and d), the desolvation barrier (b and e), and the solvent-separated minimum configurations
(c and f), respectively. The color scale is shown above the panels; and the bulk water density is
displayed in white. The solute is in grey, the space between the solute and the first hydration layer
is in violet, and the first hydration layer is in blue plus light blue (a–c) and green, red and yellow
(d–f). This is also Fig. 7 in Ref. [66]
This part of this chapter concerns research on the temperature dependence of the
effective potential for the interaction of amino acid side chains in water [66, 67]. Most
contemporary coarse-grained force fields used for protein structure prediction do not
depend on temperature; this is inconsistent with the physical sense of most coarse-
grained force-fields, which stem from potentials of mean force. The temperature
dependence was introduced for the first time in the UNRES force field in 2007
[68], via multibody-interaction terms. However, the $U_{SC_iSC_j}$ potential of the UNRES
force field remains temperature-independent. The main components of hydrophobic
interactions are the free energy of cavity creation and the change in entropy due to
the reorganization of water molecules; these contributions depend on temperature.
Consequently, the $U_{SC_iSC_j}$ potentials should depend on temperature.
In Ref. [66], the first attempt at deriving temperature-dependent $U_{SC_iSC_j}$
potentials was presented. Only a pair of interacting methane molecules in water, at
different temperatures and at constant volume or constant pressure, was considered. The
PMFs and the dimensionless PMFs (W/RT, where W is the potential of mean force, R
is the universal gas constant, and T is the absolute temperature) obtained at the
different simulation temperatures were plotted against the methane–methane distance
and analyzed. An important finding of this study is that
the dimensionless PMFs at constant volume nearly overlap. Therefore, to obtain
temperature-dependent potentials for the interactions of nonpolar side chains, one
has to multiply the dimensionless potential by RT, i.e., scale it with the absolute
temperature. The results of Ref. [66] are in accordance with literature data [69].
Moreover, the depth of the contact minimum (the first minimum, at the shortest
distance) in the PMF plots depends very strongly on temperature and
increases with temperature [66]. This means that the entropy of association is
positive, because S = −∂F/∂T (where S denotes the entropy, F the free energy, and
T the absolute temperature). The dimensionless PMFs [66] depend only weakly
on temperature at the contact minimum, which means that the energy of association
is small. A very strong temperature dependence of the desolvation maximum (the first
maximum on the PMF curve) is observed in the PMF
plots determined at constant pressure [66]. The height of the desolvation maximum
decreases with increasing temperature in both cases [66].
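The scaling argument can be made concrete with a small sketch. It assumes an idealized case in which the dimensionless PMF w = W/RT does not depend on temperature at all, so that W(T) = RT·w. The function names and the sample value of w are hypothetical.

```python
R_KCAL = 1.987204e-3  # universal gas constant in kcal/(mol K)


def pmf_at_temperature(w_dimensionless, temperature):
    """Rescale a temperature-independent dimensionless PMF: W = R*T*w."""
    return [R_KCAL * temperature * w for w in w_dimensionless]


def association_entropy(w_dimensionless, temperature, dt=1.0):
    """Estimate S = -dW/dT by a central finite difference in temperature."""
    upper = pmf_at_temperature(w_dimensionless, temperature + dt)
    lower = pmf_at_temperature(w_dimensionless, temperature - dt)
    return [-(u - l) / (2.0 * dt) for u, l in zip(upper, lower)]


# Hypothetical dimensionless depth of a contact minimum.
w_contact = [-1.5]
depth_cold = pmf_at_temperature(w_contact, 280.0)[0]
depth_warm = pmf_at_temperature(w_contact, 320.0)[0]
entropy = association_entropy(w_contact, 298.0)[0]
```

In this idealized model a negative w gives a minimum that deepens on heating and a positive entropy S = −R·w, while the energy W + TS vanishes identically. This matches the qualitative picture of a positive entropy and a small energy of association.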
One important conclusion from the work discussed above [65] is that assuming the
depth of the minimum of the side chain–side chain interaction component of
coarse-grained potentials to be independent of temperature neglects its actual
increase with increasing temperature. Neglecting this increase is equivalent to
assuming that the strength of the simulated interactions between nonpolar side chains
decreases with temperature at contact distance, which does not agree with the experi-
The last part of this chapter concerns preliminary research on the interaction between
O-phosphorylated and standard amino-acid side chains in water [70]. Phosphorylation
of hydroxylated amino-acid side chains such as those of serine (Ser), threonine (Thr),
and tyrosine (Tyr) by protein kinases can activate numerous enzymes and plays a
very important role in several cellular processes. It is known that more than one
third of proteins in eukaryotes are subject to phosphorylation. It can regulate, for
example, metabolic pathways, gene translation and transcription, membrane transport,
hormonal response, and many more. It should be noted that how phosphorylation
alters the structure and function of proteins is still not well understood, because of
the specific physicochemical properties of the phosphorylated
group, i.e., its −2 charge at physiological pH, which could perturb the local electrostatic
potential in proteins [70].
Similarly to the work described in the Chapter “Protein Structure Prediction
Using Coarse-Grained Models”, the distance- and orientation-dependent PMFs for the
interactions of pairs of side-chain models of phosphorylated and natural amino acids
in water were calculated with MD simulations and then discussed.
The positions and depths of the contact minima and the positions and heights of the
desolvation maxima, including their dependence on mutual orientation, depend on the
character of the interacting pairs [70]. The same effect was observed in our previous
work [56–62, 65, 66]. The coarse-grained model of the interactions of the
O-phosphorylated amino-acid side chains with natural amino-acid side chains and
the respective potentials are now being introduced into the UNRES force
field, which will make it possible to simulate proteins containing O-phosphorylated
amino-acid residues and to test the predictive power of the force field for them [70].
6 Summary
The results of the research on the development of the new side chain–side chain
interaction potentials to be used in the coarse-grained physics-based UNRES force
field for protein simulations strongly suggest that further work is needed.
The parameters of these potentials were determined by fitting analytical functions
to the PMF curves obtained from MD calculations. The new $U_{SC_iSC_j}$ potentials
have been implemented in UNRES and were tested on two small α-helical proteins. Based
on these preliminary tests, it was observed that further tests are needed
and that, presumably, an additional recalibration of the parameters should be carried
out to significantly improve the predictive power of UNRES. Replacement of the old
side chain–side chain interaction potentials with the new ones eliminated the last
knowledge-based component of UNRES.
Acknowledgements This research was conducted by using the resources of (a) our 818-processor
Beowulf cluster at the Baker Laboratory of Chemistry and Chemical Biology, Cornell University,
(b) the National Science Foundation Terascale Computing System at the Pittsburgh Supercomputer
Center, (c) the 45-processor Beowulf cluster at the Faculty of Chemistry, University of Gdańsk, and (d) the
Informatics Center of the Metropolitan Academic Network (IC MAN) in Gdańsk. This work was
supported by grants from the U.S. National Institutes of Health (GM-14312), the U.S. National Sci-
ence Foundation (MCB05-41633), the Polish Ministry of Science and Education (N N204 152836),
and the Polish National Science Centre (UMO-2013/10/E/ST4/00755).
References
1. Lee, J., Scheraga, H.A., Rackovsky, S.: Conformational analysis of the 20-residue membrane-
bound portion of melittin by conformational space annealing. Biopolymers 46, 103–115 (1998)
2. Lee, J., Liwo, A., Scheraga, H.A.: Energy-based de novo protein folding by conformational
space annealing and an off-lattice united-residue force field: application to the 10-55 fragment
of staphylococcal protein A and to apo calbindin D9K. Proc. Natl. Acad. Sci. U.S.A. 96,
2025–2030 (1999)
3. Pillardy, J., Czaplewski, C., Wedemeyer, W.J., Scheraga, H.A.: Conformation-Family Monte
Carlo (CFMC): an efficient computational method for identifying the low-energy states of a
macromolecule. Helv. Chim. Acta 83, 2214–2230 (2000)
4. Levitt, M.: Simplified representation of protein conformations for rapid simulation of protein
folding. J. Mol. Biol. 104, 59–107 (1976)
5. Crippen, G.M., Ponnuswamy, P.K.: Determination of an empirical energy function for protein
conformational-analysis by energy embedding. J. Comput. Chem. 8, 972–981 (1987)
6. Scheraga, H.A.: Calculations of stable conformations of polypeptides, proteins, and protein
complexes. Chem. Scr. 29A, 3–13 (1989)
7. Dill, K.A.: Dominant forces in protein folding. Biochemistry 29, 7133–7155 (1990)
8. Scheraga, H.A.: Some approaches to the multiple-minima problem structures. Int. J. Quant.
Chem. 42, 1529–1536 (1992)
9. Scheraga, H.A.: Predicting three-dimensional Structures of Oligopeptides. In: Lipkowitz, K.,
Boyd, D.B. (eds.) Reviews of Computational Chemistry, vol. 3, pp. 73–142. VCH Publ, New
York (1992)
10. Seetharamulu, P., Crippen, G.M.: A potential function for protein folding. J. Math. Chem. 6,
91–110 (1991)
11. Godzik, A., Koliński, A., Skolnick, J.: De-novo and inverse folding predictions of protein-
structure and dynamics. J. Comput. Aided Mol. Des. 7, 397–438 (1993)
12. Koliński, A., Godzik, A., Skolnick, J.: A general-method for the prediction of the 3-dimensional
structure and folding pathway of globular-proteins—application to designed helical proteins.
J. Chem. Phys. 98, 7420–7433 (1993)
13. Sippl, M.J.: Boltzmann principle knowledge-based mean fields and protein-folding—an
approach to the computational determination of protein structures. J. Comput. Aided Mol.
Des. 7, 473–501 (1993)
14. Koliński, A., Skolnick, J.: Monte-Carlo simulations of protein-folding. Lattice model and
interaction scheme. Proteins 18, 338–352 (1994)
40. Jansen, K., Schafer, O., Birkmann, E., Post, K., Serban, H., Prusiner, S.B., Riesner, D.: Struc-
tural intermediates in the putative pathway from the cellular prion protein to the pathogenic
form. Biol. Chem. 382, 683–691 (2001)
41. Morillas, M., Vanik, D.L., Surewicz, W.K.: On the mechanism of alpha-helix to beta-sheet
transition in the recombinant prion protein. Biochemistry 40, 6982–6987 (2001)
42. Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., Eastwood, M.P., Bank,
J.A., Jumper, J.M., Salmon, J.K., Shan, Y., Wriggers, W.: Atomic-level characterization of
the structural dynamics of proteins. Science 330, 341–346 (2010)
43. Lindorff-Larsen, K., Piana, S., Dror, R.O., Shaw, D.E.: How fast-folding proteins fold. Science
334, 517–520 (2011)
44. Gay, J.G., Berne, B.J.: Modification of the overlap potential to mimic a linear site-site potential.
J. Chem. Phys. 74, 3316–3319 (1981)
45. Pearlman, D.A., Case, D.A., Caldwell, J.W., Ross, W.S., Cheatham III, T.E., DeBolt, S., Fer-
guson, D., Seibel, G., Kollman, P.A.: AMBER, a package of computer-programs for applying
molecular mechanics, normal-mode analysis, molecular-dynamics and free-energy calcula-
tions to simulate the structural and energetic properties of molecules. Comput. Phys. Commun.
91, 1–41 (1995)
46. Liwo, A., Lee, J., Ripoll, D.R., Pillardy, J., Scheraga, H.A.: Protein structure prediction by
global optimization of a potential energy function. Proc. Natl. Acad. Sci. U.S.A. 96, 5482–5485
(1999)
47. Lee, J., Liwo, A., Ripoll, D.R., Pillardy, J., Scheraga, H.A.: Calculation of protein conformation
by global optimization of a potential energy function. Proteins Struct. Funct. Genet. 37(Suppl.
3), 204–208 (1999)
48. Lee, J., Liwo, A., Ripoll, D.R., Pillardy, J., Saunders, J.A., Gibson, K.D., Scheraga, H.A.:
Hierarchical energy-based approach to protein-structure prediction: blind-test evaluation with
CASP3 targets. Int. J. Quantum Chem. 71, 90–117 (2000)
49. Liwo, A., Pincus, M.R., Wawak, R.J., Rackovsky, S., Scheraga, H.A.: Calculation of protein
backbone geometry from beta-carbon coordinates based on peptide-group dipole alignment.
Protein Sci. 2, 1697–1714 (1993)
50. Ołdziej, S., Kozłowska, U., Liwo, A., Scheraga, H.A.: Determination of the potentials of mean
force for rotation about C-alpha-C-alpha virtual bonds in polypeptides from the ab initio energy
surfaces of terminally blocked glycine, alanine, and proline. J. Phys. Chem. A 107, 8035–8046
(2003)
51. Liwo, A., Ołdziej, S., Czaplewski, C., Kozłowska, U., Scheraga, H.A.: Parametrization of
backbone-electrostatic and multibody contributions to the UNRES force field for protein-
structure prediction from ab initio energy surfaces of model systems. J. Phys. Chem. B 108,
9421–9438 (2004)
52. Czaplewski, C., Liwo, A., Ołdziej, S., Scheraga, H.A.: Improved conformational space anneal-
ing method to treat beta-structure with the UNRES force-field and to enhance scalability of
parallel implementation. Polymer 45, 677–686 (2004)
53. Liwo, A., Arłukowicz, P., Czaplewski, C., Ołdziej, S., Pillardy, J., Scheraga, H.A.: A method
for optimizing potential-energy functions by a hierarchical design of the potential-energy land-
scape: application to the UNRES force field. Proc. Natl. Acad. Sci. U.S.A. 99, 1937–1942
(2002)
54. Liwo, A., Arłukowicz, P., Ołdziej, S., Czaplewski, C., Makowski, M., Scheraga, H.A.: Opti-
mization of the UNRES force field by hierarchical design of the potential-energy landscape. 1.
Tests of the approach using simple lattice protein models. J. Phys. Chem. B 108, 16918–16933
(2004)
55. Ołdziej, S., Liwo, A., Czaplewski, C., Pillardy, J., Scheraga, H.A.: Optimization of the UNRES
force field by hierarchical design of the potential-energy landscape. 2. Off-lattice tests of the
method with single proteins. J. Phys. Chem. B 108, 16934–16949 (2004)
56. Makowski, M., Liwo, A., Scheraga, H.A.: Simple physics-based analytical formulas for the
potentials of mean force for the interaction of amino-acid side chains in water. 1. Approximate
expression for the free energy of hydrophobic association based on a Gaussian-overlap model.
J. Phys. Chem. B 111, 2910–2916 (2007). Erratum: J. Phys. Chem. B 114, 1226 (2010)
57. Makowski, M., Liwo, A., Maksimiak, K., Makowska, J., Scheraga, H.A.: Simple physics-based
analytical formulas for the potentials of mean force for the interaction of amino-acid side chains
in water. 2. Tests with simple spherical systems. J. Phys. Chem. B 111, 2917–2924 (2007)
58. Makowski, M., Sobolewski, E., Czaplewski, C., Liwo, A., Ołdziej, S., No, J.H., Scheraga, H.A.:
Simple physics-based analytical formulas for the potentials of mean force for the interaction of
amino-acid side chains in water. 3. Calculation and parameterization of the potentials of mean
force of pairs of identical hydrophobic side chains. J. Phys. Chem. B 111, 2925–2931 (2007)
59. Makowski, M., Sobolewski, E., Czaplewski, C., Ołdziej, S., Liwo, A., Scheraga, H.A.: Simple
physics-based analytical formulas for the potentials of mean force for the interaction of amino-
acid side chains in water. IV. Pairs of different hydrophobic side chains. J. Phys. Chem. B 112,
11385–11395 (2008)
60. Makowski, M., Liwo, A., Sobolewski, E., Scheraga, H.A.: Simple physics-based analytical
formulas for the potentials of mean force of the interaction of amino-acid side chains in water.
V. Like-charged side chains. J. Phys. Chem. B 115, 6119–6129 (2011)
61. Makowski, M., Liwo, A., Scheraga, H.A.: Simple physics-based analytical formulas for the
potentials of mean force of the interaction of amino-acid side chains in water. VI. Oppositely-
charged side chains. J. Phys. Chem. B 115, 6130–6137 (2011)
62. Makowski, M., Liwo, A., Scheraga, H.A.: Simple physics-based analytical formulas for the
potentials of mean force of the interaction of amino-acid side chains in water. VII. Charged—hy-
drophobic/polar and polar—hydrophobic/polar side chains. J. Phys. Chem. B 121, 379–390
(2017)
63. Lewandowska, A., Ołdziej, S., Liwo, A., Scheraga, H.A.: Beta-hairpin-forming peptides; mod-
els of early stages of protein folding. Biophys. Chem. 151, 1–9 (2010)
64. Sobolewski, E., Makowski, M., Czaplewski, C., Liwo, A., Ołdziej, S., Scheraga, H.A.: Potential
of mean force of hydrophobic association: dependence on solute size. J. Phys. Chem. B 111,
10765–10774 (2007)
65. Makowski, M., Czaplewski, C., Liwo, A., Scheraga, H.A.: Potential of mean force of large
hydrophobic particles: towards nanoscale limit. J. Phys. Chem. B 114, 993–1003 (2010)
66. Sobolewski, E., Makowski, M., Ołdziej, S., Czaplewski, C., Liwo, A., Scheraga, H.A.:
Towards temperature-dependent coarse-grained potentials of side-chain interactions. I. Molecular
dynamics study of a pair of methane molecules in water at various temperatures. Protein Des.
Eng. Sel. (PEDS) 22, 547–552 (2009)
67. Sobolewski, E., Ołdziej, S., Wiśniewska, M., Liwo, A., Makowski, M.: Toward temperature-
dependent coarse-grained potentials of side-chain interactions for protein folding simulations.
II. Molecular dynamics study of pairs of different types of interactions in water at various
temperatures. J. Phys. Chem. B 116, 6844–6853 (2012)
68. Liwo, A., Khalili, M., Czaplewski, C., Kalinowski, S., Ołdziej, S., Wachucik, K., Scheraga,
H.A.: Modification and optimization of the united-residue (UNRES) potential energy function
for canonical simulations. I. Temperature dependence of the effective energy function and tests
of the optimization method with single training proteins. J. Phys. Chem. B 111, 260–285 (2007)
69. Paschek, D.: Temperature dependence of the hydrophobic hydration and interaction of simple
solutes: an examination of five popular water models. J. Chem. Phys. 120, 6674–6690 (2004)
70. Wiśniewska, M., Sobolewski, E., Ołdziej, S., Liwo, A., Scheraga, H.A., Makowski, M.: The-
oretical studies of interactions between O-phosphorylated and standard amino-acid side-chain
models in water. J. Phys. Chem. B 119, 8526–8534 (2015)
Modeling Nucleic Acids
at the Residue–Level Resolution
Abstract Coarse–grained models and force fields have become useful in the studies
of the dynamics and physicochemical properties of nucleic acids. Reduced
representations of DNA or RNA lower the computational cost by a few orders of
magnitude in comparison with full–atomistic simulations. In this chapter we describe
a few selected coarse–grained models of nucleic acids in which one nucleotide is
represented as either one, two or three beads. We present the examples of the models
designed to investigate the internal dynamics and temperature-dependent denatura-
tion of nucleic acids, as well as created to predict the tertiary structure of RNA or used
for large ribonucleoprotein complexes. We describe how the purpose of the model
affects the design of the potential energy function and the choice of the simulation
method. We also address the limitations of these models.
1 Introduction
Genomes of many species, including human, have already been mapped [50, 140]
and are publicly available [34]. Their analyses give critical information on the cell
components. However, in numerous cases, looking solely at the nucleotide sequence
is not enough to explain how the processes in the cell are controlled. This happens
because these sequences give rise to three-dimensional molecules, immersed in the
environment of the cell, which undergo thermal fluctuations and “precisely” interact.
Therefore, the knowledge of the sequence, even though crucial, is only the first
step to analyze the spatial and temporal pattern of biomolecular interactions. To
understand these interactions one needs to capture both the structural properties
and time-dependent dynamics of single molecules and macromolecular complexes.
F. Leonarski
Faculty of Chemistry, Centre of New Technologies, University of Warsaw, Warsaw, Poland
J. Trylska (B)
Centre of New Technologies, University of Warsaw, Warsaw, Poland
Below, we give a few examples where the dynamics is indispensable for biological
function.
Cells use multiple strategies to pack and protect long strands of deoxyribonu-
cleic acid (DNA) to provide for both DNA compaction and DNA accessibility for
transcription, replication and repair. For example, in bacteria DNA is supercoiled,
i.e., a torsional stress is applied to a circular DNA duplex (plasmid) [141, 145]. The
changes in supercoiling result in a bacterial response to hostile conditions such as
starvation or thermal shock [90, 91]. The unwinding of supercoiled DNA is also the
first step of transcription and replication [20]. In eukaryotes, proteins are used to
help pack DNA in the nucleus to the form of chromatin. The simplest building block
is the nucleosome, which is composed of histone proteins around which a DNA duplex
about 140 base pairs long is wrapped [114]. Multiple successive nucleosomes
are separated by DNA linkers and resemble “beads on a string” under an electron
microscope. Such organization allows the cell to control access to nucleosomal DNA,
which is possible only when DNA unwraps from the histone core. Therefore,
understanding the dynamics of this mechanism is crucial for controlling gene expression
or designing ways to deliver genetic material into cells.
Another important aspect of the stability of DNA is related to the flexibility and
dynamics of its double–helical structure. Topological stress, temperature or force–
pulling might break bonds between the complementary bases and destroy the helix.
DNA denaturation is easily tracked by UV–monitored changes of absorbance upon
raising the temperature. This process depends on the sequence and length of DNA,
and solution conditions such as ionic strength and pH [87]. Even though in living
cells a complete denaturation is not desirable, the local opening of a double–helix
is important for gene regulation. A small “bubble” of denatured DNA forms in the
areas where the transcription is initiated and/or regulated [20].
Even though ribonucleic acid (RNA) differs from DNA by just one hydroxyl
group in the sugar ring, this difference has important implications for the RNA
architecture leading to a plethora of RNA structures with diverse roles. Messenger
RNAs (mRNAs) serve as templates to transfer genetic information from DNA to
ribosomes. The RNAs that do not carry genetic information form a large group of
non–coding RNAs [83]. Transfer RNAs supply the ribosome with amino acids.
The ribosome itself contains ribosomal RNA which serves not only as a structural
skeleton but also as a catalytic center. There is also a myriad of regulatory RNAs
such as micro RNAs, small interfering RNAs, and small nucleolar RNAs.
Functional differences between DNA and RNA arise from the structural ones.
DNA predominantly forms an ordered double–helical structure, with adenine–thymine
(A–T) and guanine–cytosine (G–C) complementary canonical base pairs.
RNA is predominantly single–stranded with nucleotides bound by both complemen-
tary and non–complementary hydrogen bonds. Complementary ones, in the Watson–
Crick sense, represent the secondary structure. They are formed first, in microsec-
ond to millisecond time scales [2]. Bonds formed according to other schemes are
responsible for the RNA 3D folds and the entire tertiary structure [12, 66]. The
tertiary structure formation can take as long as seconds. The network of interactions in RNA
leads to double helical regions, intertwined with loops and junctions. Evolutionary
conservation analyses show a strong link between the tertiary structure of RNA and
its function, whereas the secondary structure and sequence are less conserved [15].
The functionality of RNAs is related to their flexibility and ability to change
folds [10]. Some RNAs adopt multiple functional conformations in response to external
conditions. Examples are riboswitches [137], which respond to ligand or metal
ion concentrations, and RNA thermometers [93], which respond to temperature shifts.
These are mRNA fragments that typically include the Shine–Dalgarno sequence [6]
responsible for binding of mRNA to the ribosome and initiation of translation.
The Shine–Dalgarno sequence either forms a hairpin loop which is not exposed to
interact with the ribosome or it switches the fold and the sequence becomes accessi-
ble to the ribosome. The accessibility of Shine–Dalgarno sequence depends on the
environment and can be moderated by external conditions.
Full understanding of the above processes requires knowledge of how the
structure of nucleic acids changes and fluctuates in time and how these dynamics are
related to function. Methods that gain information solely from sequence are
of great value; e.g., the thermodynamic nearest–neighbor model has been successful in
predicting denaturation temperatures of various DNA or RNA duplexes [54, 71, 139].
Also, the secondary structure can in most cases be reliably predicted from the
sequence alone [82]. However, sequence–based methods fail for more complicated tasks
such as prediction of RNA 3D structure [117] and, more importantly, dynamics. The
dynamics, which are typically simulated starting from a 3D model, help in understanding
the functional roles of various nucleic acid architectures.
Also, in comparison with the number of available sequences of functional nucleic
acids, the experimentally determined 3D structural data lag behind. As of December
2017, 3738 structures containing RNA molecules had been deposited in the Protein Data
Bank [8]. In 2016 only 311 new RNA structures were resolved. The corresponding numbers
for proteins are 132768 deposited structures and 10270 resolved in 2016. Efficient ways
to predict the RNA 3D structure will help fill the gap left by the still low number of
RNA structures in the crystallographic database. Dynamical data for RNA are even
sparser, partly because dynamics are difficult to monitor experimentally at the atomic
level and on fast time scales. Modeling methods that add the fourth dimension, time
dependence, are therefore valuable for understanding the complexity of interactions in
the cell.
Multiple techniques have been used to characterize the dynamical processes occurring
in nucleic acid molecules. The three main conformational sampling techniques
are molecular dynamics simulations, normal mode analysis and Monte Carlo algorithms
[61]. All typically require, as a starting point, a set of initial coordinates
describing the 3D structure of the molecule. Monte Carlo (MC) algorithms are
probabilistic methods that explore the conformational space of molecules
stochastically. In an MC simulation (e.g. [57, 129]) small modifications of the
molecule's coordinates are introduced at random and are either accepted or rejected
based on the potential energy of the system. If the modification lowers the potential
energy, it is always accepted. Otherwise, its acceptance is probabilistic and more likely
if the absolute value of the energy change is small. A wide range of possible
conformations can be probed using this method. Conversely, if one is not interested in a
120 F. Leonarski and J. Trylska
larger simulation time step. Therefore, such a coarse–graining (CG) procedure should
be appropriate for simulating the nucleic acid dynamical problems introduced above,
which occur on time scales from nanoseconds to seconds.
In this chapter we present the CG FFs for nucleic acids that use between one and
three beads per nucleotide. We believe that such models give a reasonable balance
between the quality of the results and the time efficiency of the calculations. However,
there are models that use a higher number of beads and represent the structural details
of bases (e.g., [11, 21, 23, 74, 75, 106, 108, 119, 148]). At the other end of the
spectrum there are coarser models in which the building blocks are helices and
single–stranded loops rather than single nucleotides [3, 4, 14, 51, 98]. Here, we describe
only the models that use spherical beads, although some authors implement interaction
centers as ellipsoids [92] or disks [86]. We also do not cover two models that
are historically important: a one–bead model published in the 1970s by Olson [95–97],
which was the first attempt at coarse–grained DNA modeling, and a three–bead per
nucleotide model by Vorobjev [144] from 1990, since the latter model was not used
in actual simulations despite its strong theoretical background. Also, the models
that we review here belong to the class of off–lattice models, for which the bead
coordinates in the simulations are not limited to a certain set of positions such as
nodes of a cubic grid. Our selection of the models is arbitrary and far from complete
because our aim was to give informative examples of how the CG models for nucleic
acids are constructed and to which biological problems they can be applied.
\[ V(r) = -k_B T \ln \frac{d(r)}{d_0(r)} , \tag{1} \]
The basic criterion that we apply to divide the CG models into classes is the number
of beads used to represent one nucleotide. As stated in the introduction we will cover
only a small spectrum of possible representations—one to three beads per nucleotide.
However, even in this bead range one observes differences between the design and
applicability of the coarser– and finer–grained models.
There are also other measures to compare FFs apart from the number of interact-
ing centers. These are the mapping (where the centers of beads are positioned), the
definition of potential energy function, and the range of applicability (or transfer-
ability). Unfortunately, for the CG methods this applicability range is usually very
narrow. To provide reliable and predictive results CG methods have to be fine–tuned
for a particular process and/or group of molecules. Overall, CG FFs lack the general
transferability of all-atom ones. For example, in the presented set of FFs there is not
even one that can be, out of the box, applied to both RNA and DNA systems. Apart
from the target molecule, we have selected the following main classes of problems
that the FF can be applied to:
• Long timescale dynamics—a model provides reliable information about the time
evolution of a molecular structure on at least nanosecond scale.
• Tertiary structure prediction—a model finds a 3D structure or a set of 3D structures
that are closest to the native state. The focus is only on the final structure, not on
the way it is achieved (in contrast to the folding simulations).
• Temperature denaturation—a model correctly predicts the effects of the tempera-
ture increase on nucleic acid stability.
• Supercoiling—a model predicts the effects associated with supercoiling (e.g.,
unwinding mechanics).
• Large molecule mechanics—a model is designed to simulate the dynamics of large
molecular complexes (>1000 residues) such as the ribosome or nucleosome.
• Interaction with non–nucleic acid molecules—a model is able to predict interac-
tions with ligands, proteins or nanomaterials (ions and solvent are not included in
this category).
The FF applicability results from its implementation details such as the definition
of the potential energy function with respect to the chosen degrees of freedom and
connectivity. For the residue–resolution CG FFs the potential energy function, Vtotal,
is usually expressed in the following general way:
\[ V_{\mathrm{total}} = V_{\mathrm{intrastrand}} + V_{\mathrm{interstrand}} + V_{\mathrm{nb}} . \tag{2} \]
The intrastrand term covers the interactions of beads connected by covalent bonds
which extend up to the third neighbor. This term is composed of a pseudo–bond
(Vbond), pseudo–angle (Vangle), and pseudo–dihedral (Vdihedral) parts (see Fig. 1):
\[ V_{\mathrm{intrastrand}} = V_{\mathrm{bond}} + V_{\mathrm{angle}} + V_{\mathrm{dihedral}} . \tag{3} \]
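Typically, these bonds are not allowed to break in a simulation, so they are represented
with harmonic potentials (see Fig. 1a, b, c):
\[ V_{\mathrm{bond}}(r) = k_r (r - r_0)^2 , \tag{4} \]
\[ V_{\mathrm{angle}}(\theta) = k_\theta (\theta - \theta_0)^2 , \tag{5} \]
\[ V_{\mathrm{dihedral}}(\varphi) = k_\varphi (\varphi - \varphi_0)^2 , \tag{6} \]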
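[Figure 1, panels a–c: (a) V(r) [kcal/mol] versus r [Å], with r0 marked; (b) V(θ) [kcal/mol] versus θ [deg], with θ0 marked; (c) V(φ) [kcal/mol] versus φ [deg] over the range −π + φ0 to π + φ0, with the energy levels Kφ and 2Kφ marked]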
Fig. 1 Intrastrand potentials used in the presented CG FFs. a The pseudo–bond harmonic (solid
line, see (4)), cubic (long–dashed line) and quartic (short–dashed line, see (33)) potential. b The
pseudo–angle potential (see (5)). c The pseudo–dihedral potential implemented using a cosine
function (long–dashed line, see (7)) or harmonic potential (solid line, see (6))
where kr, kθ and kφ are the force constants (see footnote 1), r0 is the equilibrium
distance, and θ0 and φ0 are the equilibrium angles. The drawback of the above Vdihedral
is that it is not periodic, so to account for full rotation of the pseudo–dihedral angle
a formula with a cosine is used (see also Fig. 1c):
\[ V_{\mathrm{dihedral}}(\varphi) = k_\varphi \left[ 1 - \cos(\varphi - \varphi_0) \right] , \tag{7} \]
with the same definition of kφ and φ0. Beads positioned in the same strand can also
form complementary bonds, which is especially important for RNA, which is usually
composed of only one folded strand. However, as these residues are usually separated
by more than three bases, for the purpose of CG FFs such bonds are not considered
to be “intrastrand” and are accounted for in the interstrand part.
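The difference between a harmonic and a periodic pseudo–dihedral term can be illustrated with a short Python sketch. It is only a minimal example: the cosine form kφ[1 − cos(φ − φ0)] is assumed here (it reproduces the 0 to 2Kφ range shown in Fig. 1c), and the function names and all parameter values are hypothetical, not taken from any of the FFs discussed.

```python
import math

def v_bond(r, k_r, r0):
    """Harmonic pseudo-bond energy k_r (r - r0)^2, without the 1/2 factor."""
    return k_r * (r - r0) ** 2

def v_angle(theta, k_theta, theta0):
    """Harmonic pseudo-angle energy; angles in radians."""
    return k_theta * (theta - theta0) ** 2

def v_dihedral_harmonic(phi, k_phi, phi0):
    """Harmonic pseudo-dihedral energy; not periodic in phi."""
    return k_phi * (phi - phi0) ** 2

def v_dihedral_cosine(phi, k_phi, phi0):
    """Periodic pseudo-dihedral energy (assumed form), between 0 and 2 k_phi."""
    return k_phi * (1.0 - math.cos(phi - phi0))

# Hypothetical parameters, for illustration only.
K_PHI = 1.0                  # kcal/mol
PHI0 = math.radians(170.0)   # equilibrium pseudo-dihedral angle

# The same geometry expressed with two angle values that differ by 2*pi.
phi_a = math.radians(-175.0)
phi_b = phi_a + 2.0 * math.pi

harmonic_a = v_dihedral_harmonic(phi_a, K_PHI, PHI0)
harmonic_b = v_dihedral_harmonic(phi_b, K_PHI, PHI0)
cosine_a = v_dihedral_cosine(phi_a, K_PHI, PHI0)
cosine_b = v_dihedral_cosine(phi_b, K_PHI, PHI0)

print(f"harmonic: {harmonic_a:.3f} vs {harmonic_b:.3f} kcal/mol")
print(f"cosine:   {cosine_a:.3f} vs {cosine_b:.3f} kcal/mol")
```

The harmonic term assigns very different energies to the two equivalent angle values unless the angle difference is first wrapped into (−π, π], whereas the cosine term gives the same energy for both.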
The interstrand term describes the interaction of complementary strands. This
term models hydrogen bonds, which in nature can be broken by raising the temperature
or by adding denaturing agents or enzymes. Breakable bonds are usually implemented
using the Lennard–Jones potential (see Fig. 2a):
\[ V_{LJ}(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] \tag{8} \]
Equations (8) and (9) are two forms of the same equation: ε describes the depth of
the potential energy well, σ is the distance at which the potential energy is equal to
zero, and req is the distance at which the potential energy has a minimum. For the Morse
potential of Eq. 10, V0 is likewise the depth of the energy well and α describes the
width of the potential well. The Lennard–Jones potential might be modified (for example
softened) by changing the powers in the equation. However, not all FF models permit
such modifications because they require a more complex potential energy formulation.
It is not always necessary to allow for interstrand bond breaking because a particular
CG model may be designed only for non–denaturing conditions. In such a case a
simple harmonic potential (4) may suffice. The CG models also differ in the way the
interstrand bond network is set. Simpler models have a predefined network, which
is based on the secondary structure prediction, and the pairing is not altered during
a simulation, so even after denaturation the molecule will always return to the same
conformational setting as in native conditions. This is beneficial for RNA structure
prediction, when we are interested in the folds that correspond to only one particular
secondary structure. In more elaborate CG FF models interstrand bonds
can be formed dynamically when two complementary bases are close and their
topology permits bonding.
1 In this chapter we ignore the 1/2 factor because the harmonic potentials in CG FFs are
presented in different ways (either with or without the 1/2 factor). Including this factor
affects only the numerical value of a force constant but does not change the general form
of the potential.
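The relation between the two Lennard–Jones parameterizations, and the softening obtained by changing the powers, can be checked numerically. The sketch below is a minimal Python illustration under stated assumptions: the req form ε[(req/r)^12 − 2(req/r)^6] is derived here from (8) using req = 2^(1/6) σ, the generalized n–m form is one common way of changing the powers while keeping the well depth ε at req, and all function names and parameter values are hypothetical.

```python
import math

def lj_sigma(r, eps, sigma):
    """Lennard-Jones energy written with sigma, the distance at which V = 0."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)

def lj_req(r, eps, r_eq):
    """The same potential written with r_eq, the position of the minimum."""
    x6 = (r_eq / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)

def lj_general(r, eps, r_eq, n=12, m=6):
    """Generalized n-m form with minimum -eps at r_eq (requires n > m)."""
    x = r_eq / r
    return eps / (n - m) * (m * x ** n - n * x ** m)

# Hypothetical parameters, for illustration only.
EPS = 1.0                            # kcal/mol
SIGMA = 5.0                          # Å
R_EQ = 2.0 ** (1.0 / 6.0) * SIGMA    # position of the 12-6 minimum

for r in (4.5, 5.0, 5.6, 7.0, 10.0):
    print(f"r = {r:5.2f} Å  sigma form: {lj_sigma(r, EPS, SIGMA):9.4f}"
          f"  r_eq form: {lj_req(r, EPS, R_EQ):9.4f}")
```

With the same well depth and minimum position, lower powers (for example 8–4) give a less steep repulsive wall than the 12–6 form, which is one sense in which the potential is softened.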
The last category of terms are the nonbonded ones, Vnb. They account for the
interactions of residues that are not connected explicitly by intrastrand and
interstrand terms. Their basic function is to introduce a short–range repulsion to avoid
overlapping of non–interacting beads; however, they also account for long–range
electrostatic interactions and for solvent or other environmental conditions. The
implementation of these terms varies among FFs depending on their applications. Some
FFs use Lennard–Jones or Morse terms as in (8) or (10), which describe both the
repulsion at short distances and the attraction at longer distances. However, for highly
charged molecules, such as nucleic acids, one could also use the Coulomb electrostatic
potential to describe a repulsive–only interaction, with or without shielding (see Fig. 2c):
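\[ V_{\mathrm{Coulomb}}(r) = \frac{q_i q_j}{4\pi \varepsilon_0 \varepsilon_w r} , \tag{11} \]
\[ V_{\mathrm{ShCoulomb}}(r) = \frac{q_i q_j}{4\pi \varepsilon_0 \varepsilon_w r} \exp(-r/k_D) , \tag{12} \]
where qi and qj are the charges of the interacting beads, ε0 is the vacuum permittivity
and εw the solvent permittivity. The Debye length,
\[ k_D = \left( \frac{\varepsilon_0 \varepsilon_w k_B T}{2 N_A e^2 I} \right)^{0.5} , \tag{13} \]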
[Figure 2, panels a–f: each panel plots V(r) [kcal/mol] versus r [Å]; axis markings include ε, σ, r0, V0, V1–V3, kD1, kD2 and r1–r4. Panel e is titled “Morse with barrier potential” and panel f “Restraint potential”.]
Fig. 2 Interstrand and nonbonded potentials used in the presented CG FFs. a Lennard–Jones
potential. b Morse potential with α = 1.0 (solid line) and α = 2.0 (long–dashed line). c Coulomb
potential without screening, kD = ∞ (solid line), and Coulomb potential with two example Debye
lengths kD1 < kD2 (short– and long–dashed line, respectively). d Discrete potential taken from
the model of Ding et al. [29, 30]. e Morse potential with a barrier used in Trovato et al. [135]:
Morse potential (solid line), switch function (short–dashed line), final potential (long–dashed line).
f Restraint potential from the model of Malhotra et al. [22, 78, 79]
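To give a feeling for the magnitudes involved in (11)–(13), the following Python sketch evaluates the Debye length and the screened Coulomb energy in SI units. It is a minimal illustration under stated assumptions: I is taken to be the ionic strength, converted from mol/L to mol/m^3; the relative permittivity of water (78.5) and the temperature (298.15 K) are assumed values; and the function names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19   # elementary charge, C

def debye_length(ionic_strength_molar, temperature=298.15, eps_w=78.5):
    """Debye length in metres, as in (13); ionic strength given in mol/L."""
    ionic_si = ionic_strength_molar * 1000.0   # mol/L -> mol/m^3
    return math.sqrt(EPS0 * eps_w * K_B * temperature
                     / (2.0 * N_A * E_CHARGE ** 2 * ionic_si))

def screened_coulomb(r, q_i, q_j, k_d, eps_w=78.5):
    """Screened Coulomb energy in joules, as in (12).

    r and k_d are in metres, charges in units of the elementary charge.
    Passing k_d = math.inf recovers the unscreened potential (11).
    """
    prefactor = q_i * q_j * E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * eps_w * r)
    return prefactor * math.exp(-r / k_d)

k_d = debye_length(0.1)   # assumed 0.1 M monovalent salt
print(f"Debye length at 0.1 M: {k_d * 1e10:.2f} Å")

r = 1.0e-9                # two unit charges separated by 1 nm
ratio = screened_coulomb(r, -1, -1, k_d) / screened_coulomb(r, -1, -1, math.inf)
print(f"screened / unscreened energy at 1 nm: {ratio:.3f}")
```

Because kD scales as I^(−1/2), a fourfold increase in ionic strength halves the screening length.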
The first example that we describe of a three–bead per nucleotide model is the one
of Knotts et al. [56] designed for DNA. In this model the beads that mimic the sugar
and phosphate are placed at the centers of mass of these groups. The adenine and
guanine base beads are placed in the position of their N1 atoms and the thymine
and cytosine beads in the position of their N3 atoms (see Fig. 5a). The authors argue
that representing the DNA backbone with two beads is necessary to properly model
the deformation of grooves which are important for protein–DNA interactions. The
choice of a three–bead representation also helps in later transformation from a CG
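representation to a full–atomistic one. The intrastrand part of the potential energy
function contains one additional term, Vstack, in comparison with Eq. 3:
\[ V_{\mathrm{intrastrand}} = V_{\mathrm{bond}} + V_{\mathrm{angle}} + V_{\mathrm{dihedral}} + V_{\mathrm{stack}} . \tag{14} \]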
Fig. 4 Left: RNA hairpin loop (PDB:1ATO [58]); Right: yeast phenylalanine tRNA
(PDB:6TNA [131]): a full–atomistic representation, b three–bead per nucleotide representation
as in the work of Ding et al. [30], c one–bead per nucleotide representation as in the work of
Jonikas et al. [53]. For the RNA hairpin loop (left) we show the bead placement with
non–breakable bonds and for tRNA (right) we show only the bead placement
Fig. 5 Guanine–cytosine nucleotide pair shown in different CG representations: a three–bead
model as in Knotts et al. [56] (a similar model is described in the work of Ding et al. [30];
however, there the base bead is placed in the center of the six–membered ring of the base),
b two–bead model with pseudo–atoms placed on the backbone and base as in Drukker et al. [32],
c one bead centered on the phosphorus atom as in Trovato et al. [135] and Trylska et al. [136, 143],
d one bead placed in the nucleotide geometric center as in Savelyev et al. [122],
e one bead centered on the C3′ atom as in Jonikas et al. [53], f one bead placed on the
phosphorus atom and a special “dummy” bead in the middle of a complementary pair as in
Malhotra et al. [22, 78, 79]
Table 1 Comparison of features of CG FFs presented in this chapter

Model                               Number of beads/nt
Knotts et al. [25, 37, 56, 109]     3
Ding et al. [30]                    3
Hyeon et al. [48]                   3
Ouldridge et al. [100, 102, 104]    3
Drukker et al. [32]                 2
Trovato et al. [135]                1
Savelyev et al. [122, 123]          1
Jonikas et al. [53]                 1
Trylska et al. [136, 143]           1
Malhotra et al. [22, 78, 79]        1.5

Further rows of the table (per–model entries not reproduced here): Nucleic acid (DNA, RNA);
Applicability (long timescale dynamics, tertiary structure prediction,
temperature–dependent denaturation, force–pulling denaturation, supercoiling).
The pseudo–bond Vbond and pseudo–angle Vangle potentials are implemented using
harmonic potentials (see (4) and (5) and Fig. 1a, b). The pseudo–dihedral potential
Vdihedral is implemented using a cosine potential (see (7) and Fig. 1c). The Vstack term
is modeled with the Lennard–Jones potential (see (8) and Fig. 2a).
The first three terms in (14) are standard but Vstack is an additional Go–type
potential introduced to account for the stacking interactions [46]. This interaction is
modeled only between the base beads that belong to one strand and in the reference
(“native”) structure are positioned within a 9 Å cut–off distance. Therefore, this
potential accounts for both the i:i+1 and i:i+2 interaction.
In the interstrand term the complementary base pairs are connected using a
Lennard–Jones–like potential (see Fig. 2a), but with 12–10 powers instead of the
12–6 powers in (8):
\[ V_{\mathrm{interstrand}}(r_{ij}) = \sum_{i,j} 4 \varepsilon_{bp_{ij}} \left[ 5 \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - 6 \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{10} \right] , \tag{15} \]
where the summation is over all G–C and A–T base pairs that are not already
considered in Vstack.
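The role of the coefficients 5 and 6 in the 12–10 form can be seen from a short numerical check. The Python sketch below evaluates a single pair term of (15); the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def v_pair_12_10(r, eps_bp, sigma):
    """Single base-pair term of the 12-10 potential in (15)."""
    x = sigma / r
    return 4.0 * eps_bp * (5.0 * x ** 12 - 6.0 * x ** 10)

# Hypothetical parameters, for illustration only.
EPS_BP = 1.0   # kcal/mol
SIGMA = 6.0    # Å

v_min = v_pair_12_10(SIGMA, EPS_BP, SIGMA)
for r in (0.98 * SIGMA, SIGMA, 1.02 * SIGMA, 1.5 * SIGMA, 3.0 * SIGMA):
    print(f"r = {r:6.2f} Å  V = {v_pair_12_10(r, EPS_BP, SIGMA):9.5f} kcal/mol")
```

With these coefficients the minimum lies exactly at r = σ, where the energy equals −4ε per pair. In the 12–6 form (8), by contrast, σ marks the zero crossing and the minimum lies at 2^(1/6) σ.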
The nonbonded potential in the original paper [56] is composed of an excluded
volume term Vex, implemented using the Lennard–Jones potential (see (8) and Fig. 2a),
and a shielded electrostatic term VShCoulomb (see (12) and Fig. 2c):
\[ V_{\mathrm{nb}} = V_{\mathrm{ex}} + V_{\mathrm{ShCoulomb}} , \tag{16} \]
where the Vex term is only calculated when the rij distance between beads is smaller
than a predefined cut–off. The VShCoulomb term describes the electrostatic repulsion
between phosphate beads only (with the charges qi = qj = −1).
This model was parameterized in an iterative way. The first guess of parameters
was taken from the geometry of an ideal B–DNA helix. Second, a 14 base–pair DNA
duplex was simulated with the CG model using replica-exchange MD [133]. Eight
replicas (or system copies) were simulated in parallel and assigned temperatures in
the range 260–400 K. Temperatures were swapped between two replicas with a prob-
ability related to their potential energy difference. Each replica was equilibrated and
10 ns production runs were performed. The advantage of replica-exchange MD over
constant-temperature MD was that it allowed the authors to determine the melting
curves of the duplex and provided distance distributions at eight different
temperatures. Also, the effect of the parameters on the potential of mean force with
varying temperature was analyzed using the weighted histogram analysis method [59], and
the parameters were improved for the next iteration step.
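The temperature swap at the heart of replica–exchange MD can be sketched in a few lines of Python. This is a minimal illustration, not the protocol of Knotts et al.: it uses the standard Metropolis–type exchange criterion, with acceptance probability min(1, exp[(βi − βj)(Ei − Ej)]), and it assumes a geometric temperature ladder because the spacing of the eight temperatures is not given in the text. All function names and energies are hypothetical.

```python
import math
import random

K_B = 0.0019872041   # Boltzmann constant, kcal/(mol K)

def temperature_ladder(t_min, t_max, n_replicas):
    """Geometrically spaced temperatures (assumed spacing)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]

def swap_probability(e_i, t_i, e_j, t_j):
    """Probability of exchanging the temperatures of replicas i and j.

    e_i and e_j are potential energies in kcal/mol, t_i and t_j the current
    temperatures in K.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(e_i, t_i, e_j, t_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)

temperatures = temperature_ladder(260.0, 400.0, 8)
print([round(t, 1) for t in temperatures])

# Hypothetical energies of two neighbouring replicas.
print(swap_probability(-100.0, temperatures[0], -110.0, temperatures[1]))
print(swap_probability(-110.0, temperatures[0], -100.0, temperatures[1]))
```

A swap is always accepted when the colder replica currently holds the higher–energy configuration; otherwise the probability decays exponentially with the product of the energy difference and the difference in inverse temperature.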
Next, to validate the model, the obtained FF parameter set was evaluated by per-
forming CG replica–exchange MD simulations and comparing them with the DNA
thermal denaturation experiments. In the simulation the melting and the formation of
the denaturation bubble were observed in accord with the reference data for varying
salt concentrations. Knotts et al. [56] show that with their FF they were able to predict
the melting temperatures of three DNA duplexes with an error lower than 5%. To
validate the mechanical properties of the model, conventional CG MD simulations were
performed at 300 K. The persistence length of four different fragments of λ-phage
plasmids (one of them was 1489 base pairs and 0.5 µm long) was calculated. Their model
overestimated the persistence length by 2.3 but the authors claim that this is much
less than in other CG models. Based on their parameterization Knotts et al. suggest
that the dihedral force constant (kφ), the potential energy well depths for base–pairing
(εbpij), stacking and excluded volume (Eex) are the most important parameters to
tune.
The presented model was further improved. Sambriski et al. [121] added entropic
effects to the potential energy to allow for rehybridization of the DNA strands, as
the original model of Knotts et al. [56] was unable to model strands’ renaturation.
DeMille et al. [26] added explicit solvation with water as well as monovalent ions.
This modification provides a good cylindrical distribution of ions around DNA but
it over–estimates the DNA melting temperatures. Next, Freeman et al. [37] added to
the model terms for the interactions of DNA with both mono– and di–valent ions.
This model is one of the most comprehensive CG FFs among those presented in
this chapter. It can be used to estimate both DNA melting curves and DNA mechanical
properties. The subsequent modifications of this model add a better treatment
of solvation and electrostatics. Nevertheless, there is still room for improvement,
especially to correct the large errors in the calculated persistence lengths.
The model by Ding et al. [30] was designed to predict the tertiary structure of RNA
but may be also used to study the mechanism of RNA folding. This model is based
on discrete MD previously successfully applied to protein folding [18, 29]. In this
method, the interaction between beads is described using pairwise, discontinuous
functions (see Fig. 2d):
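\[ V_{\mathrm{bond}}(r) = \begin{cases} \infty & r < r_1 \\ V_1 & r_1 < r < r_2 \\ V_2 & r_2 < r < r_3 \\ \dots & \\ \infty & r > r_{\max} \end{cases} , \tag{17} \]
\[ V_{\mathrm{nb}}(r) = \begin{cases} \infty & r < r_1 \\ V_1 & r_1 < r < r_2 \\ V_2 & r_2 < r < r_3 \\ \dots & \\ 0 & r > r_{\max} \end{cases} . \tag{18} \]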
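Step potentials of the form (17) and (18) are straightforward to evaluate with a sorted list of boundaries. The Python sketch below is a minimal illustration with hypothetical boundary positions and energies. One assumption is needed because (17) and (18) use strict inequalities: a distance exactly equal to a boundary is assigned here to the interval above it.

```python
import math
from bisect import bisect_right

def step_potential(r, boundaries, energies, beyond):
    """Piecewise-constant pair potential.

    boundaries = [r1, r2, ..., r_max] in increasing order;
    energies   = [V1, V2, ...], one value per interval between boundaries;
    beyond     = value for r >= r_max (math.inf for bonds, 0.0 for nonbonded).
    Distances below r1 are forbidden (hard core).
    """
    if len(energies) != len(boundaries) - 1:
        raise ValueError("need exactly one energy per interval")
    if r < boundaries[0]:
        return math.inf
    if r >= boundaries[-1]:
        return beyond
    return energies[bisect_right(boundaries, r) - 1]

# Hypothetical step positions (Å) and energies (kcal/mol).
BOUNDARIES = [3.0, 4.0, 6.0]
ENERGIES = [-1.0, -0.3]

for r in (2.5, 3.5, 5.0, 7.0):
    v_nb = step_potential(r, BOUNDARIES, ENERGIES, beyond=0.0)
    v_bond = step_potential(r, BOUNDARIES, ENERGIES, beyond=math.inf)
    print(f"r = {r:.1f} Å  nonbonded: {v_nb}  bonded: {v_bond}")
```

Because the energy is constant between the steps, no forces act on the beads there. Velocities change only when a pair distance crosses a boundary, which is what allows discrete MD to advance from event to event rather than in fixed time steps.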
there is an additional term which penalizes the bases with too many contacts in the
defined cut–off region.
The model of Ding et al. [30] was parameterized based on the thermodynamic
data from the nearest–neighbor model by Mathews et al. [81] and on distributions
calculated from known 3D RNA structures. It was next evaluated on 153 known
RNA structures of the lengths between 10 and 100 nucleotides. Their sequences
were used to create linear RNA molecules, which were simulated with the discrete
MD method [29] and their folding was analyzed. The so–called Q-values, defined as
the fraction of native base pairs present in a given RNA conformation, were assessed.
The average Q-value for all the tested structures was 94%, which is 3% higher
than for Mfold [153], a secondary structure prediction program (especially in the case
of pseudo–knots). 84% of the RNA structures had a root mean square deviation from
the final reference structure lower than 4 Å, which is a good score. RNA folding
with this potential can be performed using the iFoldRNA web server [126]. The
performance of the model was assessed in the RNA Puzzle competition [89], in which
the participants are provided with the sequence and secondary structure of an RNA
whose crystal structure has been solved but not yet released. According to the published
ranking, the pipeline involving the Ding et al. model provided the best solution for
one of the puzzles, i.e. the ydaO riboswitch structure (puzzle 12) [113].
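The Q-value defined above reduces to a set comparison once the base pairs are listed. The following Python sketch is a minimal illustration: the function name and the example pair lists are hypothetical, and how base pairs are detected in a conformation is outside its scope.

```python
def q_value(native_pairs, model_pairs):
    """Fraction of native base pairs that are present in a model conformation.

    Each pair is a tuple of two residue indices; the order within a pair
    is ignored.
    """
    native = {frozenset(pair) for pair in native_pairs}
    model = {frozenset(pair) for pair in model_pairs}
    if not native:
        raise ValueError("the native structure has no base pairs")
    return len(native & model) / len(native)

# Hypothetical hairpin: four native pairs, three of which are recovered.
native = [(1, 10), (2, 9), (3, 8), (4, 7)]
model = [(10, 1), (2, 9), (3, 8), (5, 6)]
print(q_value(native, model))
```

Non–native pairs in the model, such as (5, 6) in this example, do not lower the Q-value, so the measure reports how much of the native pairing is recovered rather than how many spurious pairs are formed.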
where ΔG i (T ) are the Turner’s parameters of the nearest–neighbor model [81]. For
is an orientation term, including both i : j and i + 1 : j−1 distances and sugar and
base bead angles involving i, i + 1, j, j−1th nucleotides (according to the i:j notation
shown in Fig. 3).
The nonbonded term is described using the Lennard–Jones potential (see (8) and
Fig. 2a), with separate formulas, $V_{nb}^{native}$ for the interactions of beads forming native
contacts (closer than 7 Å in the reference structure) and $V_{nb}^{non\text{-}native}$ for the
interactions of non–native beads, and a Debye–Hückel potential $V_{PP}$ for the repulsion of
phosphorus beads (see (12) and Fig. 2c):
\[ V_{nb} = V_{nb}^{native} + V_{nb}^{non\text{-}native} + V_{PP} . \tag{20} \]
The purpose of this three-bead model designed by Ouldridge and coworkers was to
simulate the dynamics of DNA nanodevices [100, 102, 104]. The interactions in such
DNA nanostructures are based on selective binding of complementary nucleotide
pairs. DNA strands can be designed to form two dimensional lattices [80], poly-
hedra [40, 125] or other regular structures [116]. There are also DNA structures
in which the complementary hydrogen bonds are dynamically formed and broken.
Overall, one can design a set of interacting DNA strands with a particular purpose
in mind. A cycle based on single– to double–stranded DNA and reverse transitions
may be used to create DNA tweezers [150] or DNA walkers that perform a direc-
tional movement on a DNA track [5, 42, 99]. To simulate such devices a CG model
needs to correctly predict the complementary bond breaking and forming events.
To satisfy this crucial requirement Ouldridge and coworkers chose a top–
down methodology. In contrast to other models presented in this chapter, which are
designed by mapping the full–atomistic structure onto a CG set of positions, this model
was designed to fit DNA hybridization and thermodynamic data. It
might appear strange that the model ignores such basic features as the different sizes
of the DNA grooves. Its performance, however, is measured by the correspondence with
hybridization enthalpies and entropies. As long as there is agreement between the
thermodynamic predictions and the 3D model, the model is considered acceptable
for the particular task it was designed for.
In this FF a nucleotide is modeled as three collinear beads (see Fig. 6). A single
bead mimics the position of the backbone and two beads represent a base: the
first one is responsible for stacking and the second one for hydrogen–bonding
and excluded volume interactions (see footnote 2). The distances between the backbone
bead and base sites and between two consecutive backbone sites were chosen to be
consistent with the geometry of the B–DNA helix. Since these three beads are always
collinear and their distances are kept constant, based on the number of degrees of
freedom we classify this model as a two–bead one. The top–down methodology
precludes direct transformation of a full–atomistic structure into the CG representation.
However, such a relationship is unnecessary because the model was not designed to
reproduce the results from more detailed methods. The (re)mapping is not required
for applying this CG model as long as one is interested solely in the dynamics of
DNA hybridization. Here, fidelity to the 3D structure is replaced by
adherence to the 2D hydrogen bond topology. Such a bonding network may be created
by the user or taken from the cadnano program [31], which facilitates the design of DNA
origami.
The presented CG FF is consistent with the general form given in (2). The potential
can be used in both Langevin MD and Virtual Move MC simulations [146]
(a variant of MC simulation by Whitelam et al. that models system dynamics in time).
For efficient simulation with the latter method all interactions have to be pairwise,
so the authors included in the model only interactions between two nucleotides (treated as
rigid bodies),
The intrastrand interactions are modeled using three terms:
where the Vbond term, responsible for the interaction of two backbone beads, uses a
finitely extensible nonlinear elastic spring:
\[ V_{\mathrm{bond}} = -\frac{\varepsilon}{2} \ln \left( 1 - \frac{(r - r_0)^2}{\Delta^2} \right) , \tag{22} \]
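The behaviour of the spring in (22) near and far from equilibrium can be checked with a short Python sketch. The function name and the parameter values are hypothetical and serve only as an illustration.

```python
import math

def v_fene(r, eps, r0, delta):
    """Finitely extensible nonlinear elastic (FENE) spring energy, as in (22).

    The energy is zero at r0 and diverges when |r - r0| reaches delta.
    """
    x2 = ((r - r0) / delta) ** 2
    if x2 >= 1.0:
        return math.inf
    return -0.5 * eps * math.log(1.0 - x2)

# Hypothetical parameters in reduced units, for illustration only.
EPS = 2.0
R0 = 0.75
DELTA = 0.25

for dr in (0.0, 0.05, 0.15, 0.24, 0.25):
    harmonic = 0.5 * EPS * (dr / DELTA) ** 2
    print(f"dr = {dr:.2f}  FENE: {v_fene(R0 + dr, EPS, R0, DELTA):8.4f}"
          f"  harmonic limit: {harmonic:8.4f}")
```

For small displacements the FENE energy approaches the harmonic expression (ε/2)(r − r0)²/Δ², while the divergence at |r − r0| = Δ keeps consecutive backbone beads from separating beyond a maximum distance.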
2 There is an earlier version of the model [104] in a four collinear beads variant, with separate beads
[Figure labels: backbone–base vector; base normal; distances δr_stack, δr_HB, δr_backbone, δr_base−back, δr_back−base and δr_base; angles θ1–θ8; length markers 0.74 units and 0.80 units]
Fig. 7 Topology of interactions presented in the Ouldridge et al. model. The upper part presents the
stacking and non–bonded interactions. Middle left, middle right, and bottom left pictures show the
angles that modulate the hydrogen bonding and stacking terms. The bottom right figure shows the
topology of the excluded volume terms (Figure was taken from Ref. [100] and used with permission)
The two-bead per nucleotide model by Drukker et al. [32, 33] was designed to
describe thermal denaturation of DNA. The model was applied in nanomaterial sci-
ence and used to model DNA translocation in nanopores [110] and carbon nan-
otubes [152]. The CG beads are placed in the geometrical center of a backbone
group (sugar and phosphate) and a base (see Fig. 5b).
This CG FF uses the standard intrastrand potential scheme (3), where the pseudo–
bond Vbond, pseudo–angle Vangle and pseudo–dihedral Vdihedral potentials are harmonic
(see (4), (5), (6) and Fig. 1). In addition, i:i+2 bonds between the backbone beads
are added to the intrastrand potential to stabilize the backbone
helical conformation, since Vangle and Vdihedral alone were insufficient.
In the interstrand potential, this model accounts for the chemical details of hydro-
gen bonding. The A–T pair is connected by two and the C–G pair by three bonds.
Each base can serve as a donor and an acceptor of a hydrogen bond. A and T are both
an acceptor and a donor of one bond. G is a donor of two bonds and an acceptor of
one. C is a donor of one and an acceptor of two. To ensure that interstrand interactions
are considered only between correctly oriented beads, an angle $\theta_{ij}^{HB}$ is introduced.
This is the angle between the donor backbone, donor base and acceptor base beads.
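\[ V_{H2}(r) = \frac{1}{4} \left( \tanh[\lambda (r - r_2)] - 1 \right) , \tag{25} \]
\[ f(\theta_{ij}^{HB}) = \begin{cases} \frac{1}{2} \left( \cos(\gamma \theta_{ij}^{HB}) + 1 \right) & \theta_{\min} < \theta_{ij}^{HB} < \theta_{\max} \\ 0 & \text{otherwise} \end{cases} . \tag{26} \]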
There are three parts of the potential: VMorse is a Morse potential (see (10) and Fig. 2b)
that stabilizes a bond between two complementary residues. VH2 mimics the solvent
effects, which stabilize the denatured state, and is a switch function (see Fig. 2e
for an example of a switch function), with the λ parameter controlling the steepness
at the switching distance r2. The function $f(\theta_{ij}^{HB})$ describes the effect of
$\theta_{ij}^{HB}$ on the total potential (it is nonzero only if $\theta_{ij}^{HB}$ is in the range
θmin–θmax). The interstrand potential can be applied between any two complementary
bases, so it does not depend on the input secondary structure. The nonbonded potential
is implemented using the Lennard–Jones potential (see (8) and Fig. 2a).
This FF was used in 75 ns-long MD simulations and correctly predicted the
melting temperatures of 10 base-pair DNA duplexes containing either A–T or C–G
pairs [32]. For the A–T and G–C duplexes, the calculated melting temperature error
was on average 6.5 and 18.5 K, respectively. The model was also shown to give correct
melting temperatures of these duplexes containing single mismatches. Introducing a
single G–G mismatch to the G–C duplex decreased the melting temperature by 21 K.
Such a decrease is consistent with the predictions of the thermodynamic models, but
no quantitative conclusions can be drawn because the thermodynamic models
give temperature shifts from 12 to 38 K.
This two–bead FF is useful in simulations in which the complementary bonds
need to be broken, such as in DNA melting. With fewer interaction sites it is
more efficient than three–bead models [56]. Unlike one–bead models [135], this CG FF
does not depend on a provided secondary structure. A two–bead model is the
minimal one that can include base–base orientation terms, as in (24), and this
is a necessary condition for determining the presence of a hydrogen bond.
Trovato and Tozzini [135] designed a one–bead model for MD simulations of linear
and circular DNA duplexes and parameterized it to account also for temperature
effects. The nucleotide bead is placed in the position of the phosphorus atom (see
Figs. 3 and 5c). This model was also modified for RNA helices using an automatic
parameterization method based on an evolutionary algorithm [63, 64].
The sum of standard terms as in (3) forms the intrastrand potential. The pseudo–
bond Vbond , pseudo–angle Vangle , and pseudo–dihedral Vdihedral potentials have the
harmonic functional form (see (4), (5), (6) and Fig. 1).
The interstrand potential is added based on the information about the secondary
structure. This potential has a specific topology (see Fig. 8). For a complementary pair
i:j the following pseudo–bonds are created: i:j, i:j+1, i+1:j+1. The term is composed
of a Morse with a barrier function (for the graph of the potential function see Fig. 2e):
Fig. 8 One–bead representation of DNA. The interstrand interaction topology for a single com-
plementary pair is shown according to the model of Trovato and Tozzini [135]
$$V_{i:j}(r) = V_0^{i:j}\left(\left[1 - \exp\left(-\alpha_{i:j}\,(r - r_0^{i:j})\right)\right]^2 - c_{i:j}\right)\mathrm{sw}_{i:j}(r), \tag{28}$$
$$\mathrm{sw}_{i:j}(r) = \frac{1}{2}\,V_1^{i:j}\left[1 - \tanh\left(\lambda_{i:j}\,(r - r_1^{i:j})\right)\right], \tag{29}$$
where $V_0^{i:j}$, $r_0^{i:j}$ and $\alpha_{i:j}$ control the shape of the original Morse potential, $c_{i:j}$ affects
the energy difference between the energy minimum and unbound state, $\lambda_{i:j}$ controls
the slope of the switch function, $V_1^{i:j}$ controls the switch function energy difference,
and $r_1^{i:j}$ the position of the switch. Equations (28) and (29) are identical for i:j+1 and
i+1:j+1. Even though the formula seems complicated, it is advantageous: the Morse
function (28) enables accounting for the breaking of hydrogen bonds and the switch
function (29) adds a barrier for long–range electrostatic repulsion.
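As an illustration, (28) and (29) can be evaluated with a few lines of Python. This is a minimal sketch and not the implementation of [135]; the function names and all numerical values are hypothetical placeholders chosen only to show how the Morse term and the switch function combine.

```python
import numpy as np

def switch(r, v1, lam, r1):
    """Switch function (29): close to v1 for r << r1 and close to 0 for r >> r1."""
    return 0.5 * v1 * (1.0 - np.tanh(lam * (r - r1)))

def interstrand_energy(r, v0, alpha, r0, c, v1, lam, r1):
    """Shifted Morse term multiplied by the switch function, as written in (28)."""
    morse = (1.0 - np.exp(-alpha * (r - r0))) ** 2 - c
    return v0 * morse * switch(r, v1, lam, r1)

# Placeholder parameters (arbitrary units), not taken from the original model.
r = np.linspace(8.0, 30.0, 5)
print(interstrand_energy(r, v0=1.0, alpha=1.0, r0=10.0, c=1.0, v1=1.0, lam=2.0, r1=15.0))
```

With these placeholder values the energy is close to −V0·c·V1 at r = r0 and decays to zero once r is well beyond r1, which is the behavior the switch function is meant to enforce.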
A similar formula is used for the nonbonded potential:
$$\mathrm{sw}_q^{nb}(r) = \frac{1}{2}\,V_q^{nb}\left[1 - \tanh\left(\lambda_q^{nb}\,(r - r_q^{nb})\right)\right], \tag{31}$$
where the Anb parameter controls the addition of a second switch function and thus
affects the slope of the “unbound” side of the barrier. Other labels are consistent with
(28) and (29), but since two switch functions are used in (30), superscripts in (31)
denote the first and the second switch. The authors found that this formula provides
for the stabilization of DNA grooves. Since both interstrand and nonbonded potential
formulas are computationally expensive, the energy (and force) can be precomputed
for a range of distances. Next, their value at a given bead distance, which is between
two precomputed points, is interpolated. This procedure saves a lot of time in contrast
to calculating exponential and hyperbolic tangents in each simulation step for each
pair of beads connected by interstrand or nonbonded interactions.
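The precompute-and-interpolate procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the grid spacing, the use of linear interpolation (`numpy.interp`), and the numerical derivative for the force are choices made here, and the helper names `build_table` and `lookup` are hypothetical.

```python
import numpy as np

def build_table(potential, r_min, r_max, n_points):
    """Precompute energy and force on a uniform grid of distances."""
    r_grid = np.linspace(r_min, r_max, n_points)
    energy = potential(r_grid)
    # Force is minus the derivative of the energy; here it is estimated numerically.
    force = -np.gradient(energy, r_grid)
    return r_grid, energy, force

def lookup(r, r_grid, values):
    """Linearly interpolate a tabulated quantity at distance(s) r."""
    return np.interp(r, r_grid, values)

# Example with a stand-in "expensive" function containing exp and tanh.
expensive = lambda r: (1.0 - np.exp(-(r - 10.0))) ** 2 * 0.5 * (1.0 - np.tanh(r - 15.0))
r_grid, energy, force = build_table(expensive, 8.0, 30.0, 2201)
print(lookup(12.345, r_grid, energy), expensive(12.345))
```

During a simulation only `lookup` is called, so the cost per pair is one table search and one linear interpolation. Note that `numpy.interp` returns the boundary value for distances outside the tabulated range, so the table should cover every distance that can occur.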
First, the potential was parameterized based on the potential of mean force, cal-
culated from the experimentally derived 3D structures containing DNA helices. Sec-
ond, the potential was tuned to match the experimental melting temperatures. The
authors validated their CG FF by performing MD simulations of 92 base-pair DNA
nano–circles with different twist angles. The effects of the initial twist angle on the
nano–circle topology were in agreement with full–atomistic simulations [44, 60].
Next, the authors showed the results of CG MD simulations of a DNA plasmid composed
of 861 base pairs (approx. 0.3 µm circumference length) on a microsecond time
scale. These MD simulations show that modifying the torsional stress affects the
stability of the plasmid and allows a denaturation “bubble” to form [135].
The potential was further extended by us to RNA molecules [63–65]. For an RNA
helix we have shown that, if thermal melting of helices is not of interest, the potential
performs equally well with a harmonic potential for Vi: j (r ), Vi: j+1 (r ) and Vi+1: j+1 (r ),
while the nonbonded Vnb (r ) can be simply substituted with Coulomb electrostatics.
While such simpler potentials are less precise in describing the physics of RNA, they
This model of Savelyev et al. [122, 123] is more a parameterization method than a
model to study the dynamics of DNA. The authors present a renormalization group
optimization method developed by Swendsen [132] and further improved by Lyubartsev
and Laaksonen [72] to find the best parameters of a DNA one–bead FF. For
the renormalization group method, categorized as a local optimization method, the
potential energy function V has to be a linear combination of terms, $V = \sum_{i}^{N} k_i V_i$,
with a set of linear combination parameters $k_i$. In addition, a set of observables $S_j$
that characterize a CG FF has to be defined. These observables must depend on the
selected $k_i$ parameters in the potential energy expansion. The aim of the optimization
is to find a set of $k_i$ that results in $S_j$ values that best resemble the reference data. The
observables used by Savelyev et al. were distance distributions, with reference values
taken from full–atomistic simulations. In the parameterization procedure one creates
a set of $k_i$ parameters and calculates the “susceptibility” of the observables to
each parameter. This susceptibility is expressed as the partial derivative of
an $S_j$ observable with respect to a $k_i$ parameter. Next, these derivatives are used to calculate
the corrections to the parameter set. This method allows for an objective and effective
parameterization; however, it is only applicable to parameters that enter the potential
linearly. This means that if the methodology were applied to a harmonic potential $k_i (r - r_0)^2$, it
could find an optimal value of the $k_i$ force constant but not the equilibrium distance
$r_0$.
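One iteration of such a susceptibility-based correction can be sketched in Python. This is an illustration under assumptions, not the code of [72] or [122]: it assumes the canonical-ensemble identity $\partial\langle S_j\rangle/\partial k_i = -(\langle S_j V_i\rangle - \langle S_j\rangle\langle V_i\rangle)/k_B T$, which holds when V is linear in the $k_i$, and it solves the resulting linear system in the least-squares sense. The function names are hypothetical.

```python
import numpy as np

def susceptibility(v_terms, observables, kT):
    """Estimate dS_j/dk_i from trajectory samples.

    v_terms:     array (n_frames, n_params), values of each V_i per frame
    observables: array (n_frames, n_obs), values of each S_j per frame
    Returns an (n_obs, n_params) matrix of -cov(S_j, V_i) / kT.
    """
    dv = v_terms - v_terms.mean(axis=0)
    ds = observables - observables.mean(axis=0)
    cov = ds.T @ dv / len(v_terms)
    return -cov / kT

def parameter_correction(jacobian, s_current, s_reference):
    """Solve jacobian @ delta_k = s_reference - s_current by least squares."""
    delta_k, *_ = np.linalg.lstsq(jacobian, s_reference - s_current, rcond=None)
    return delta_k

# Toy check with an exactly linear response S = J k (hypothetical numbers).
J = np.array([[2.0, 0.0], [1.0, 3.0]])
k = np.array([1.0, 1.0])
k_target = np.array([2.0, -1.0])
print(k + parameter_correction(J, J @ k, J @ k_target))  # recovers k_target
```

In practice the response is only locally linear, so the correction is applied iteratively, with a new CG simulation after each update to re-estimate the observables and their derivatives.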
To show the applicability of the renormalization group optimization, Savelyev et
al. [122] validated it on a one-bead CG FF of DNA. In the model the pseudo–atoms
are placed in the geometrical center of a nucleotide (see Fig. 5d). The FF uses only
pseudo–bond and pseudo–angle terms omitting the pseudo–dihedral term. These
terms are a sum of the harmonic, cubic and quartic terms to include the anharmonicity
of bonds (see Fig. 1a):
Vintrastrand = Vbond + Vangle , (32)
$$V_{\mathrm{bond}}(r) = k_{r2}(r - r_0)^2 + k_{r3}(r - r_0)^3 + k_{r4}(r - r_0)^4, \tag{33}$$
The interstrand terms are implemented using the so-called “fan” interactions.
The name originates from their topology (see Fig. 9) because they explicitly connect
a nucleotide bead with eleven beads on the opposite strand. Fan interactions are
Fig. 9 A DNA helix in a one–bead representation with the beads placed in the geometrical center
of each base. The cartoon representation in the background shows the positions of the phosphate
backbone (ribbon) and bases. The bonds between the beads represent the “fan” interactions, as
defined in the Savelyev et al. [122] model. These interactions connect the nucleotide corresponding
to bead i with eleven nucleotide beads from j−5 to j+5 on the complementary strand
thus i:j−5 to i:j+5 interactions in the previously introduced notation (see Fig. 3).
These interactions are implemented in the same way as Vbond interactions, i.e., as a
combination of harmonic, cubic and quartic terms (see Fig. 1a)
$$V_{\mathrm{fan}} = \sum_{-6<m<6}\left(k_2 (r_{i:j+m} - r_0)^2 + k_3 (r_{i:j+m} - r_0)^3 + k_4 (r_{i:j+m} - r_0)^4\right). \tag{35}$$
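The fan topology and the polynomial form of (35) can be sketched as below. The snippet is a hedged illustration: the helper names are hypothetical, the clipping of partners at the strand ends is an assumption made here, and in a real FF each of the eleven interactions would likely carry its own $r_0$ and force constants (arrays can be passed instead of scalars for that purpose).

```python
import numpy as np

def fan_partners(j, n_strand, half_width=5):
    """Indices j-5 .. j+5 on the complementary strand, clipped to the strand ends."""
    return [j + m for m in range(-half_width, half_width + 1) if 0 <= j + m < n_strand]

def anharmonic(r, r0, k2, k3, k4):
    """Harmonic + cubic + quartic term used for bonds and fan interactions."""
    d = r - r0
    return k2 * d**2 + k3 * d**3 + k4 * d**4

def fan_energy(pos_i, strand_positions, j, r0, k2, k3, k4):
    """Sum of (35) over all fan partners of bead i, whose complement is bead j."""
    strand_positions = np.asarray(strand_positions, dtype=float)
    partners = fan_partners(j, len(strand_positions))
    r = np.linalg.norm(strand_positions[partners] - np.asarray(pos_i, dtype=float), axis=1)
    return float(np.sum(anharmonic(r, r0, k2, k3, k4)))

print(fan_partners(5, 20))  # eleven partners: 0 .. 10
```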
For both the intra– and interstrand potentials, the CG equilibrium distances and
angles, as well as the starting values of force constants, are found by matching with
the reference distance distributions. The reference distributions are obtained from a
60 ns full–atomistic MD simulation of a 16 base-pair DNA duplex performed with
the AMBER parmbsc0 FF [107]. The CG force constants are optimized based on
the difference between the distance distributions from 20 ns CG MD simulations (preceded by
5 ns of heating and 10 ns of equilibration) and from the reference full–atomistic MD
simulation.
In their first paper Savelyev et al. [122] model the nonbonded interactions using
the following equation to match the potential of mean force:
In the next publication [123] of this group, another expression for this potential was
found, which better describes the interaction of DNA beads with ions:
$$V_{el}(r) = \frac{A}{r^{12}} + \sum_{k=1}^{3} B_k \exp\left(-C_k (r - R_k)\right) + \frac{q_i q_j}{4\pi \varepsilon_0 \varepsilon_w r}, \tag{37}$$
The Nucleic Acid Simulation Tool (NAST) by Jonikas et al. [53] was designed to
generate candidate tertiary structures of RNAs in order to solve the RNA structure
prediction problem. This is the only FF presented in this chapter that is not intended to
analyze the internal motions of nucleic acids. MD is only used as a means of sampling
the conformational space. The generated trajectories do not serve to understand the
RNA folding process. Only the final 3D RNA structure is of value. Also, NAST was
not designed to perform the tertiary structure prediction from scratch, like the model
by Ding et al. [30]. In advance, one has to provide the RNA secondary structure (from
secondary structure prediction software [82]) and information about a few tertiary
contacts (from chemical or spectroscopic methods [38, 88, 120, 138]). Even though
NAST imposes these a priori requirements, it is useful. For example, for RNAs which
are difficult to crystallize but have been preliminarily studied with chemical methods,
NAST can quickly perform a wide search of possible conformations and generate
multiple candidate structures for further studies. The quality of these structures is
assessed by comparing them with pairwise distance distributions from small angle
X–ray scattering experiments, solvent accessibility data, and the NAST energy tool.
The RNA structures that score best can be later tested with full–atomistic models.
In this model RNA is represented using a single bead, centered on the C3’ atom
of the sugar group (see Figs. 4c and 5e). The total potential energy of RNA is a sum
of four terms:
Vtotal = Vintrastrand + Vinterstrand + Vtertiary + Vnb (38)
where Vintrastrand , Vinterstrand and Vnb are consistent with (2) and Vtertiary is a restraint
for tertiary RNA contacts.
The intrastrand potential is a sum of three terms previously shown in (3) with the
pseudo–bond Vbond and pseudo–angle Vangle potentials using the harmonic functions
(see (4), (5) and Fig. 1a, b) but the Vangle term uses two different force constants, kθ ,
for single– and double–stranded regions. The pseudo–dihedral potential Vdihedral is
implemented using a cosine potential (see (7) and Fig. 1c).
The interstrand potential is composed in a similar manner as the intrastrand one,
of bonded (Vbond , (4)), angle (Vangle , (5)) and dihedral (Vdihedral , (7)) terms. The
pseudo–bonds connect only complementary base pairs, the pseudo–angle is between
a complementary bond and the next neighbor on the first strand, and pseudo–dihedrals
are the following: j−1:j:i:i+1 and j+1:i−1:i:i+1 (for details of the i, j notation see
Fig. 3).
The nonbonded interactions, Vnb , are implemented using a repulsive–only potential
$$V_{\mathrm{rep}}(r) = 4V_0\left(\frac{\sigma}{r}\right)^{12}. \tag{39}$$
The user-supplied tertiary interactions (Vtertiary ) are implemented as pseudo–bonds
with the assumption that the nucleotides are close to each other in the final structure
(see (4)). If a particular tertiary contact is uncertain, the authors recommend using a
much smaller force constant for that contact.
NAST is a knowledge–based model. It was parameterized by fitting the presented
potential functions to the Boltzmann inversion of distance distributions from three
high resolution ribosome structures. The authors test 3D structure predictions for
tRNA and a P4–P6 medium–sized RNA with the root mean square deviations from
the crystal structures equal to 8 and 16.3 Å, respectively. Another measure, the GDT–TS
score (the average percentage of residues that are within 1, 2, 4 and 8 Å of their
reference position), was equal to 0.2 and 0.06. These root mean square deviations are
larger than the ones obtained by Ding et al. [30], which were on average
less than 4 Å. This difference is attributed to the smaller number of details that can
be captured in a one–bead NAST FF in comparison with the three–bead FF of Ding
et al. [30].
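The two quality measures quoted above can be computed from bead coordinates as in the following sketch. It assumes that the model has already been optimally superimposed on the reference structure; a full GDT–TS calculation additionally searches over superpositions, which is omitted here. The score is returned as a fraction between 0 and 1, and the function names are hypothetical.

```python
import numpy as np

def rmsd(model, reference):
    """Root mean square deviation between two pre-superimposed bead sets."""
    diff = np.asarray(model, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of beads within 1, 2, 4 and 8 A of their reference position."""
    diff = np.asarray(model, dtype=float) - np.asarray(reference, dtype=float)
    dist = np.linalg.norm(diff, axis=1)
    return float(np.mean([(dist <= c).mean() for c in cutoffs]))

# Hypothetical four-bead example with displacements of 0.5, 1.5, 3 and 10 A.
reference = np.zeros((4, 3))
model = np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
print(rmsd(model, reference), gdt_ts(model, reference))
```

The example shows why the two measures complement each other: a single bead displaced by 10 Å dominates the root mean square deviation, whereas the GDT–TS score counts it only as one bead outside all cutoffs.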
NAST relies on the provided secondary structure and tertiary contacts. The output
3D prediction is the one that best reflects the applied constraints. In a more
complex model of Ding et al. [30], in which the mutual base orientation is present,
the tertiary interactions can be predicted. However, the predictions of the NAST
one–bead model are useful because they still give a lower deviation from the crys-
tal structures than a random structure of the same sequence and a similar radius of
gyration. This model could be useful e.g., to rebuild missing loops.
NAST also provides a supplementary tool, C2A, to rebuild a full–atomistic struc-
ture from a CG model [52]. The remapping is performed by finding short fragments
of a matching sequence in the 3D structure of the ribosome and then building and
optimizing the user’s RNA structure. However, the remapping is only as good as the given CG
model. Two measures were used to assess the quality of remapping: the root mean square
deviation and the interaction network fidelity (INF) score, which measures the number
of correctly found base pairs and stacking interactions [105]. If a tRNA [131] PDB structure
is reduced to a CG representation and then used as a template for C2A rebuilding,
a root mean square deviation of 2.81 Å and an INF score of 69% are achieved.
However, if the best NAST-predicted tRNA structure is used, the root mean square
deviation becomes 8.30 Å and the INF score drops to 46% (35% if only base pairing
is taken into account in the INF score).
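A sketch of an INF-type score is given below. It assumes the commonly used definition of INF as the geometric mean of the positive predictive value and the sensitivity of the predicted interaction set; this definition is an assumption here and should be checked against [105]. Interactions are represented as hypothetical tuples of residue indices.

```python
from math import sqrt

def inf_score(predicted, reference):
    """Geometric mean of precision (PPV) and sensitivity for two interaction sets."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    ppv = true_positives / len(predicted)          # TP / (TP + FP)
    sensitivity = true_positives / len(reference)  # TP / (TP + FN)
    return sqrt(ppv * sensitivity)

# Hypothetical base pairs: two of three predictions are correct, four exist in total.
reference_pairs = {(1, 10), (2, 9), (3, 8), (4, 7)}
predicted_pairs = {(1, 10), (2, 9), (5, 6)}
print(inf_score(predicted_pairs, reference_pairs))
```

Restricting both sets to base pairs, or to stacking interactions only, gives the partial scores analogous to the base-pairing-only value quoted in the text.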
Contrary to the previous NAST model, this one was designed for large RNAs and
ribonucleoprotein particles, namely the 30S ribosomal subunit, which contains an
RNA chain over 1500 nucleotides long (16S RNA). The model is based on previous ones
for supercoiled DNA [134] and ribosomal RNA [22, 78, 79] and can be classified as
either a one bead or a one and a half bead per residue model. The residues are represented
as beads centered on a P atom (for nucleic acids) or Cα atom (for proteins). In order
to achieve the correct helical conformation, additional space–filling dummy beads
(“X-atoms”) are placed in the geometrical center of complementary base pairs (apart
from the last base pair in the helix, see Figs. 5f and 10).
The intrastrand potential is a sum of standard terms shown in (3) with Vbond ,
Vangle , and Vdihedral calculated with harmonic functions ((4), (5), (6), and Fig. 1a, b, c).
The pseudo–bond and pseudo–angle force constants are higher in helical regions than
in non–helical ones.
The interstrand potential is also composed of pseudo–bonded, pseudo–angle and
pseudo–dihedral terms with the same definitions of potentials as in (4), (5) and
(6). Here, the pseudo–bonds connect the nucleotide beads with the corresponding
X-atoms and complementary bases (i:j+1 and i:j−1 configurations). The pseudo–
angle interactions are present in the P–X–P configuration along a complementary
bond. There is also a dihedral angle connecting i−1:i:j:j+1 (see Fig. 10). Proteins are
modeled as simple elastic networks where the harmonic pseudo–bonds are created
between all protein beads that are closer than 8 Å to each other.
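The elastic network used for the proteins can be sketched as follows. The snippet assumes a harmonic term of the form k (r − r0)^2 with a single force constant for all pseudo–bonds and takes the equilibrium lengths from the input structure; the function names and the value of k are hypothetical.

```python
import numpy as np

def elastic_network_bonds(coords, cutoff=8.0):
    """List (i, j, r0) for every bead pair closer than the cutoff (in Angstrom)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=1)  # each pair once, i < j
    mask = dist[i, j] < cutoff
    return [(int(a), int(b), float(dist[a, b])) for a, b in zip(i[mask], j[mask])]

def network_energy(coords, bonds, k):
    """Sum of harmonic pseudo-bond energies k * (r - r0)**2."""
    coords = np.asarray(coords, dtype=float)
    return sum(k * (np.linalg.norm(coords[a] - coords[b]) - r0) ** 2 for a, b, r0 in bonds)

# Three hypothetical C-alpha beads on a line; the 0-2 pair (12 A) is beyond the cutoff.
start = np.array([[0.0, 0, 0], [5.0, 0, 0], [12.0, 0, 0]])
bonds = elastic_network_bonds(start)
print(bonds, network_energy(start, bonds, k=3.0))
```

By construction the energy is zero in the starting structure, which is why such a network keeps the protein close to its reference conformation.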
The nonbonded potential consists of restraints on the helix–helix and protein–
RNA distances (Vrest ) and a volume exclusion term (Vexcl ) among P-, X- and C-atoms
Fig. 10 Topology of interstrand bonds in the CG model of Malhotra et al. [22, 78, 79] showing the
placement of beads on the phosphorus P atoms. The central nucleotide pair is represented with two
pseudo–atoms: in the position of the P atom and the dummy X atom in the middle. Two neighboring
base pairs are presented only using P atoms. There are 6 pseudo–bonds associated with this i:j pair
(two P–X in the middle and four P–P bonds) and a pseudo–angle P–X–P
$$V_{\mathrm{restr}}(r) = \begin{cases} k_2 (r - r_2)^2 & r < r_2 \\ 0 & r_2 < r < r_3 \\ k_3 (r - r_3)^2 & r_3 < r < r_4 \\ k_3 b + \dfrac{1}{r - r_3} & r > r_4 \end{cases} \tag{41}$$
freedom of movement between the r2 and r3 distances (which are independently set
for each type of atom pairs), however, the movement is penalized if going outside
of this range (see Fig. 2f). Therefore, this restraint term generates a bias toward a
starting structure. The space exclusion term, Vexcl , prohibits two nucleotide beads
from getting closer to each other than d0 .
This is also a knowledge–based potential. The crucial parameters for the model,
i.e., the parameters of the protein–RNA distance restraints, are taken from the high-resolution
Thermus thermophilus 30S ribosomal subunit structure. Other parameters are
taken from lower-resolution ribosome models and/or older models [78]. The force
constants k2 and k3 are optimized to maintain the crystal structure of the 30S subunit
at room temperature, while allowing for flexibility of the free 16S ribosomal RNA.
This model was designed and applied to study the assembly of proteins to 16S
RNA of the small ribosomal subunit. Stagg et al. [130] explored one of the assembly
paths using the MC simulated annealing technique. The starting model of 16S RNA
contained only the information on its secondary structure. The restraints of (41)
guided the ribosomal proteins from the initial random positions to their appropriate
binding sites on 16S RNA. The authors examined the changes in the fluctuations of
16S RNA upon binding of proteins and predicted the contributions of each protein to
the organization of its binding site. Cui et al. [22] also used this model to investigate
the assembly of ribosomal proteins but applied MD simulations and additionally
studied the flexibility of 16S RNA as the proteins were added in various orders. The
experimental assembly paths were reproduced even with such a simple CG model.
The strength of this potential is adjusted by the function $A_{P,C\alpha}(r_0^{ij}) = a \exp(-r_0^{ij}/b)$.
The constants a and b are based on the interacting bead types (different for P and
Cα). For local short-range interactions (within a predefined cut–off of 12 Å for Cα
and 20 Å for P pairs), the $r_0^{ij}$ equilibrium values are taken from the starting structure.
For all the other long-range nonbonded interactions beyond the short-range cut–off
(but within a certain limit), $r_0^{ij}$ assumes three different values for P–P, Cα–Cα and
Cα–P pairs and does not depend on the starting conformation. Therefore, the model is
only locally biased toward the starting structure even though breaking of short-range
nonbonded contacts is also possible.
Overall, the model is an extension of an elastic network model but since the
nonbonded interactions are represented with the Morse potential it allows for larger
fluctuations from the initial conformation than the harmonic potential. The model
was parameterized based on the Boltzmann inversion procedure with the distribution
functions taken from a single ribosome structure so it is not immediately transferable
to other systems. This CG FF was used to perform half a microsecond MD simulations
of the ribosome and determine global collective motions of the ribosome fragments,
as well as their correlations. The movement of the distant ribosomal stalks, positioned
at the opposite sides of the tRNA path, appeared to be coupled with the ratchet-like
motion of the subunits.
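The Boltzmann inversion step mentioned above can be illustrated with a short sketch. It assumes the usual form V(r) = −kT ln[P(r)/P_ref(r)], with the reference state defaulting to a uniform distribution; the choice of reference state, the bin width, and the function names are assumptions made here rather than details of the original parameterization.

```python
import numpy as np

def distance_density(samples, bins, r_range):
    """Normalized histogram of observed distances; returns bin centers and density."""
    hist, edges = np.histogram(samples, bins=bins, range=r_range, density=True)
    return 0.5 * (edges[:-1] + edges[1:]), hist

def boltzmann_inversion(density, kT, reference=None):
    """Effective potential -kT ln(P / P_ref), shifted so that its minimum is zero."""
    density = np.asarray(density, dtype=float)
    ref = np.ones_like(density) if reference is None else np.asarray(reference, dtype=float)
    ok = (density > 0) & (ref > 0)
    if not ok.any():
        raise ValueError("no populated bins to invert")
    potential = np.full(density.shape, np.inf)  # empty bins get infinite energy
    potential[ok] = -kT * np.log(density[ok] / ref[ok])
    return potential - potential[ok].min()

# A Gaussian distance distribution inverts to a harmonic well (hypothetical numbers).
r = np.linspace(4.0, 8.0, 81)
kT, r0, sigma = 0.6, 6.0, 0.5
v = boltzmann_inversion(np.exp(-(r - r0) ** 2 / (2 * sigma ** 2)), kT)
print(np.polyfit(r, v, 2)[0], kT / (2 * sigma ** 2))  # fitted vs. expected curvature
```

Fitting the inverted potential with the functional form of the FF term (here a parabola) yields the equilibrium distance and the force constant for that term.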
where V1−n terms are implemented using a harmonic potential (see (4) and Fig. 1a).
For the α–helical regions of the proteins, all terms in (44) are included. However,
in unstructured regions or loops only V1−2 and V1−3 are included, whereas V1−4 and
V1−5 are modeled as nonbonded interactions. For DNA beads, V1−5 is not required.
The model was parameterized with the Boltzmann inversion procedure based
on short 50 ns full-atomistic MD simulations of the nucleosome [143]. Next, it was
applied to perform multiple 10 microsecond scale MD simulations of the nucleosome
complex [142]. In these simulations a biologically relevant partial unwrapping of the
DNA from the nucleosome core was observed. Further remapping to an all-atom model
provided better insight into the interactions that are formed by histone tails after
the DNA detachment from the nucleosome core. One of the histone tails (H3) was
seen to stabilize the nucleosome in the open state by interacting with the nucleosome
core. The removal of this H3 tail in the simulations precluded the formation of such
a long-lived detachment of the DNA terminal segment from the nucleosome protein
core. This suggests an active role of this tail not only in the detachment of the DNA
end from the nucleosome core but also in preventing the nucleosomal DNA from
rewrapping.
15 Conclusions
Residue resolution FFs may be applied to solve various kinds of problems in the
nucleic acid field, ranging from RNA structure prediction to global motions of large
ribonucleoprotein complexes. We have described a limited set of CG FFs, with the
number of beads ranging from one to three per nucleotide. Even in this bead range the
design and applicability of the FFs differ. In one–bead models the interaction network
is based on an externally supplied secondary structure or native contacts from a
reference structure. Adding a second bead allows for the secondary structure to be
dynamically modified because the orientation of an interstrand bond with regard to
the backbone can be measured. Overall, increasing the number of beads corresponds
to removing the bias from the system. On the other hand, if one accepts the limitations
of one–bead models, problems on much larger spatial and temporal scales may be
investigated. For example, the Jonikas et al. [53] one-bead model was easily applied
to a 158–nucleotide structure but the three-bead Ding et al. [30] model only to RNA
chains shorter than 100 nucleotides. Also, the CG FFs used for large macromolecular
complexes, such as the nucleosome or ribosome, are one–bead FFs.
There are two other crucial things to consider when choosing one- to three-bead
models. First, with one–bead models it is problematic to achieve a correct helical
twist. Creating bonds only between complementary pairs, which is easily applied in
two- or three-bead models, is not sufficient to keep the helicity in one-bead models.
The remedy is to create dummy atoms in the middle of a helix (as in the model
of Cui et al. [22]), provide multiple pseudo–bonds per single complementary pair
[122, 135] or use multi–body terms – angle and dihedral over the interstrand bonds
[53]. Such tricks were not required in the model of Trylska et al. [136] because, to
stabilize the helical structure, the equilibrium distances were taken from the native
structure. Adding the terms that ensure the correct helicity may give reasonable
dynamics but requires more computational time. The second thing to consider is that
none of the one–bead models applies interaction terms that are nucleotide-specific.3
Even if such interactions were implemented, they would be inefficient since there
is no information about the relative orientation of bases. The two- and three-bead
models easily incorporate the base specificity.
3 Some of the one–bead models, e.g., Trovato et al. [135], assign a mass consistent with the base type
in MD simulations but it has a limited effect on the interactions.
There are also residue-resolution nucleic acid models with more than three beads
per nucleotide, so one may ask if it is worth going beyond the FFs presented in this
chapter. The four- or more bead per nucleotide models include more details such
as base dipole moments [75] or non–canonical hydrogen bonding schemes [106].
Niewieczerzał et al. [94] compared three CG models with different numbers of beads
per nucleotide: two, three [56], and four/five (depending on the nucleotide type). All
three models were applied to a problem of mechanical stretching and twisting of the
DNA duplexes. The authors showed that the number of beads does not affect the
mechanical properties of DNA at low and moderate temperatures, but may become
an issue at room temperature.
When comparing the three-bead CG models we also have to consider their appli-
cability to other tasks than the ones they were designed for. Typically, their target is
narrow and CG FFs are not transferable to other problems or systems. For example,
the Hyeon et al. [48] potential was created to answer a specific question about a par-
ticular RNA hairpin. A desirable CG FF would be the one that could be easily applied
to different sets of problems, i.e. a FF with a clear parameterization procedure and
universal formulation of the potential energy function. A good example is the model
of Knotts et al. [56] since this model can be easily implemented and modified. This
task would be more difficult with the model of Ding et al. [30]. Despite promising
results for the RNA structure prediction, its applicability and possibilities for mod-
ifications are limited because its formulation using a non–standard engine, discrete
MD, makes this potential much harder to re-implement. There are multiple codes
available that provide classical MD or MC procedures, but to use the model of Ding
et al. [30] one would have to rely on the authors’ code or on one’s own in-house code. The
model of Hyeon et al. [48] was tuned for a particular molecule; however, the authors
show the parameterization, so it should be possible to re-implement the model for a
different task. Another good example of an extendable model is the one designed
by Trylska et al. [136]. It was originally created and parameterized for a particular
complex—the ribosome. However, there are other studies that applied this model for
a large system involving long chains of DNA, not RNA—the nucleosome [142, 143].
The model is also implemented in the freely available software RedMD [41]
(http://bionano.cent.uw.edu.pl/Software).
The transferability of the present CG models is insufficient and new models will
certainly be needed for particular applications. However, future efforts must
also address methodological problems. To mention just two such problems:
the definition of the reference state in the Boltzmann inversion procedure and gener-
alization of simulation results obtained for isolated, small systems to larger volumes.
In the first problem we go back to (1), where a function d0 (r ) has been introduced as
a reference state. The FF parameters depend on this function and its choice is often
arbitrary. The second problem, mentioned by Ouldridge et al. [100, 103], refers to
the fact that typically CG simulations are performed in small volumes with only a
single set of interacting molecules. The process of single DNA duplex formation
may give different melting temperatures than when using many duplexes in a larger
volume. The solutions to extrapolate the results of a small-size simulation to a larger
one have been proposed [100, 103].
There is still room for improvement in the field of low–resolution nucleic acid
models. For example, creating an unbiased CG model of the ribosome is still an
open problem and it would provide better insight into the mechanics of this system
in comparison with the model based on the concept of native contacts. There is
also a need to create more formal protocols for the parameterization of CG FFs
and assessment of the quality of parameters. Unfortunately, most authors are vague
about the parameterization details. In some parameterizations there is no account
of how well the chosen potential was fitted to experimental data (for example, by
means of the $R^2$ regression parameter). The correctness of the model is proven only
by simulations of selected test cases but more details on the parameterization would
give better confidence in these models. Another issue is that most authors do not give
hard evidence why a certain potential energy functional term was used. Test cases
that would justify the use of a particular potential form would be of great value.
A good remedy for the parameterization problems might be the use of automated
procedures to derive the parameters, like the one mentioned by Savelyev et al. [122]
using renormalization group approach or developed by us [63–65] implementing the
evolutionary algorithm and particle swarm optimization.
Acknowledgements The authors acknowledge support from the Interdisciplinary Centre for
Mathematical and Computational Modelling, University of Warsaw (G31-4, GA65-16, GA65-
17, GB65-28 to JT), National Science Centre, Poland (2011/03/N/NZ2/02482 to FL, DEC-
2014/12/W/ST5/00589 Symfonia to JT, 2016/23/B/NZ1/03198 Opus to JT).
References
1. Adams, P.L., Stahley, M.R., Kosek, A.B., Wang, J., Strobel, S.A.: Crystal structure of a self-
splicing group I intron with both exons. Nature 430, 45–50 (2004)
2. Al-Hashimi, H.M., Walter, N.G.: RNA dynamics: it is about time. Curr. Opin. Struct. Biol. 18,
321–329 (2008)
3. Allison, S.A., McCammon, J.A.: Multistep Brownian dynamics: application to short wormlike
chains. Biopolymers 23, 363–375 (1984)
4. Arya, G., Zhang, Q., Schlick, T.: Flexible histone tails in a new mesoscopic oligonucleosome
model. Biophys. J. 91, 133–150 (2006)
5. Bath, J., Green, S.J., Allen, K.E., Turberfield, A.J.: Mechanism for a directional, processive,
and reversible DNA motor. Small 5, 1513–1516 (2009)
6. Berg, J.M., Tymoczko, J.L., Stryer, L.: Biochemistry, 7th edn. W. H. Freeman (2010)
7. Berman, H.M., Olson, W.K., Beveridge, D.L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh,
S.H., Srinivasan, A.R., Schneider, B.: The nucleic acid database. A comprehensive relational
database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759 (1992)
8. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.F. Jr., Brice, M.D., Rodgers, J.R.,
Kennard, O., Shimanouchi, T., Tasumi, M.: The protein data bank: A computer-based archival
file for macromolecular structures. Arch. Biochem. Biophys. 185, 584–591 (1978)
9. Biyun, S., Cho, S.S., Thirumalai, D.: Folding of human telomerase RNA pseudoknot using
ion-jump and temperature-quench simulations. J. Am. Chem. Soc. 133, 20634–20643 (2011)
10. Bloomfield, V.A., Crothers, D.M., Tinoco, I.J.: Nucleic acids : structures, properties and func-
tions, 1st edn. University Science Books (2000)
11. Boniecki, M.J., Lach, G., Dawson, W.K., Tomala, K., Lukasz, P., Soltysinski, T., Rother, K.M.,
Bujnicki, J.M.: SimRNA: a coarse-grained method for RNA folding simulations and 3D struc-
ture prediction. Nucleic Acids Res. 44, e63 (2016)
12. Brion, P., Westhof, E.: Hierarchy and dynamics of RNA folding. Annu. Rev. Biophys. Biomol.
Struct. 26, 113–137 (1997)
13. Brooks, B.R., Brooks III, C., MacKerell Jr., A., Nilsson, L., Petrella, R., Roux, B., Won, Y.,
Archontis, G., Bartels, C., Boresch, S., Caflisch, A., Caves, L., Cui, Q., Dinner, A., Feig, M.,
Fischer, S., Gao, J., Hodoscek, M., Im, W., Kuczera, K., Lazaridis, T., Ma, J., Ovchinnikov, V.,
Paci, E., Pastor, R., Post, C., Pu, J., Schaefer, M., Tidor, B., Venable, R.M., Woodcock, H.L.,
Wu, X., Yang, W., York, D., Karplus, M.: CHARMM: the biomolecular simulation program.
J. Comput. Chem. 30, 1545–1614 (2009)
14. Bruant, N., Flatters, D., Lavery, R., Genest, D.: From atomic to mesoscopic descriptions of the
internal dynamics of DNA. Biophys. J. 77, 2366–2376 (1999)
15. Capriotti, E., Renom, M.M.: Quantifying the relationship between sequence and three-
dimensional structure conservation in RNA. BMC Bioinformatics 11, 322 (2010)
16. Case, D.A., Cheatham, T.E., Darden, T., Gohlke, H., Luo, R., Merz, K.M., Onufriev, A., Sim-
merling, C., Wang, B., Woods, R.J.: The Amber biomolecular simulation programs. J. Comput.
Chem. 26, 1668–1688 (2005)
17. Cheatham, T.E., Young, M.A.: Molecular dynamics simulation of nucleic acids: successes,
limitations, and promise. Biopolymers 56, 232–256 (2000)
18. Chen, Y., Ding, F., Nie, H., Serohijos, A.W., Sharma, S., Wilcox, K., Yin, S., Dokholyan, N.V.:
Protein folding: then and now. Arch. Biochem. Biophys. 469, 4–19 (2008)
19. Cho, S.S., Pincus, D.L., Thirumalai, D.: Assembly mechanisms of RNA pseudoknots are deter-
mined by the stabilities of constituent secondary structures. Proc. Natl. Acad. Sci. USA 106,
17349–17354 (2009)
20. Choi, C.H., Kalosakas, G., Rasmussen, K.O., Hiromura, M., Bishop, A.R., Usheva, A.: DNA
dynamically directs its own transcription initiation. Nucleic Acids Res. 32, 1584–90 (2004)
21. Cieplak, M., Sułkowska, J.I.: Structure-based models of biomolecules: stretching of proteins,
dynamics of knots, hydrodynamic effects, and indentation of virus capsids. In: A. Koliński (ed.)
Multiscale approaches to protein modeling: structure prediction, dynamics, thermodynamics
and macromolecular assemblies., chap. 8, pp. 179–208. Springer (2010)
22. Cui, Q., Tan, R.K.Z., Harvey, S.C., Case, D.A.: Low-Resolution Molecular Dynamics Simu-
lations of the 30S Ribosomal Subunit. Multiscale Model. Simul. 5, 1248–1263 (2006)
23. Dans, P.D., Zeida, A., Machado, M.R., Pantano, S.: A Coarse Grained Model for Atomic-
Detailed DNA Simulations with Explicit Electrostatics. J. Chem. Theory Comp. 6, 1711–1725
(2010)
24. Dauter, Z., Wlodawer, A., Minor, W., Jaskolski, M., Rupp, B.: Avoidable errors in deposited
macromolecular structures. IUCrJ 1, 179–193 (2014)
25. DeMille, R.C., Cheatham, T.E., Molinero, V.: A coarse-grained model of DNA with explicit
solvation by water and ions. J. Phys. Chem. B 115, 132–142 (2011)
26. DeMille, R.C., Molinero, V.: Coarse-grained ions without charges: reproducing the solvation
structure of NaCl in water using short-ranged potentials. J. Chem. Phys. 131, 034107 (2009)
27. Denesyuk, N., Thirumalai, D.: Coarse-grained model for predicting RNA folding thermodynamics.
J. Phys. Chem. B 117, 4901–4911 (2013)
28. Denesyuk, N., Thirumalai, D.: How do metal ions direct ribozyme folding? Nat. Chem. 7,
793–801 (2015)
29. Ding, D., Dokholyan, N.V.: Simple but predictive protein models. Trends Biotechnol. 23, 450–
455 (2005)
30. Ding, F., Sharma, S., Chalasani, P., Demidov, V.V., Broude, N.E., Dokholyan, N.V.: Ab initio
RNA folding by discrete molecular dynamics: from structure prediction to folding mechanisms.
RNA 14, 1164–1173 (2008)
31. Douglas, S.M., Marblestone, A.H., Teerapittayanon, S., Vazquez, A., Church, G.M., Shih,
W.M.: Rapid prototyping of 3D DNA-origami shapes with caDNAno. Nucleic Acids Res. 37,
5001–5006 (2009)
32. Drukker, K., Schatz, G.C.: A Model for Simulating Dynamics of DNA Denaturation. J. Phys.
Chem. B 104, 6108–6111 (2000)
33. Drukker, K., Wu, G., Schatz, G.C.: Model simulations of DNA denaturation dynamics. J. Chem.
Phys. 114, 579 (2001)
34. Flicek, P., et al.: Ensembl 2011. Nucleic Acids Res. 39, D800–6 (2011)
35. Forrey, C., Muthukumar, M.: Langevin dynamics simulations of genome packing in bacterio-
phage. Biophys. J. 91, 25–41 (2006)
36. Freddolino, P.L., Liu, F., Gruebele, M., Schulten, K.: Ten-microsecond molecular dynamics
simulation of a fast-folding WW domain. Biophys. J. 94, L75–7 (2008)
37. Freeman, G.S., Hinckley, D.M., De Pablo, J.J.: A coarse-grain three-site-per-nucleotide model
for DNA with explicit ions. J. Chem. Phys. 135, 165104 (2011)
38. Galas, D.J., Schmitz, A.: DNAse footprinting: a simple method for the detection of protein-
DNA binding specificity. Nucleic Acids Res. 5, 3157–3170 (1978)
39. Go, N.: Theoretical studies of protein folding. Annu. Rev. Biophys. Bioeng. 12, 183–210 (1983)
40. Goodman, R.P., Schaap, I.A.T., Tardin, C.F., Erben, C.M., Berry, R.M., Schmidt, C.F., Turber-
field, A.J.: Rapid chiral assembly of rigid DNA building blocks for molecular nanofabrication.
Science 310, 1661–1665 (2005)
41. Górecki, A., Szypowski, M., Długosz, M., Trylska, J.: RedMD – Reduced Molecular Dynamics
Package. J. Comput. Chem. 30, 2364–2373 (2009)
42. Green, S.J., Bath, J., Turberfield, A.J.: Coordinated chemomechanical cycles: A mechanism
for autonomous molecular motion. Phys. Rev. Lett. 101, 238101 (2008)
43. Guvench, O., Brooks, C.L.: Efficient approximate all-atom solvent accessible surface area
method parameterized for folded and denatured protein conformations. J. Comput. Chem. 25,
1005–1014 (2004)
44. Harris, S.A., Laughton, C.A., Liverpool, T.B.: Mapping the phase diagram of the writhe of
DNA nanocircles using atomistic molecular dynamics simulations. Nucleic Acids Res. 36,
21–29 (2008)
45. He, Y., Maciejczyk, M., Oldziej, S., Scheraga, H.A., Liwo, A.: Mean-field interactions between
nucleic-acid-base dipoles can drive the formation of the double helix. Phys. Rev. Lett. 110,
098101 (2013)
46. Hoang, T.X., Cieplak, M.: Molecular dynamics of folding of secondary structures in Go-type
models of proteins. J. Chem. Phys. 112, 6851 (2000)
47. Hülsmann, M., Köddermann, T., Vrabec, J., Reith, D.: GROW: A gradient-based optimization
workflow for the automated development of molecular models. Comput. Phys. Commun. 181,
499–513 (2010)
48. Hyeon, C., Thirumalai, D.: Mechanical unfolding of RNA hairpins. Proc. Natl. Acad. Sci. USA
102, 6789–6794 (2005)
49. Hyeon, C., Thirumalai, D.: Capturing the essence of folding and functions of biomolecules
using coarse-grained models. Nat. Comm. 2, 487 (2011)
50. International Human Genome Sequencing Consortium: Initial sequencing and analysis of the
human genome. Nature 409, 860–921 (2001)
51. Jian, H., Schlick, T., Vologodskii, A.: Internal motion of supercoiled DNA: brownian dynamics
simulations of site juxtaposition. J. Mol. Biol. 284, 287–296 (1998)
52. Jonikas, M.A., Radmer, R.J., Altman, R.B.: Knowledge-based instantiation of full atomic detail
into coarse-grain RNA 3D structural models. Bioinformatics 25, 3259–3266 (2009)
53. Jonikas, M.A., Radmer, R.J., Laederach, A., Das, R., Pearlman, S., Herschlag, D., Altman,
R.B.: Coarse-grained modeling of large RNA molecules with knowledge-based potentials and
structural filters. RNA 15, 189–199 (2009)
54. Kibbe, W.A.: OligoCalc: an online oligonucleotide properties calculator. Nucleic Acids Res.
35, W43–W46 (2007)
55. Klimov, D.K., Thirumalai, D.: Native topology determines force-induced unfolding pathways
in globular proteins. Proc. Natl. Acad. Sci. USA 97, 7254–7259 (2000)
56. Knotts, T.A., Rathore, N., Schwartz, D.C., De Pablo, J.J.: A coarse grain model for DNA. J.
Chem. Phys. 126, 084901 (2007)
57. Koliński, A., Skolnick, J.: Monte Carlo simulations of protein folding. I. Lattice model and
interaction scheme. Proteins 18, 338–352 (1994)
58. Kolk, M.H., Heus, H.A., Hilbers, C.W.: The structure of the isolated, central hairpin of the HDV
antigenomic ribozyme: novel structural features and similarity of the loop in the ribozyme and
free in solution. EMBO J. 16, 3685–92 (1997)
59. Kumar, S., Bouzida, D., Swendsen, R.H., Kollman, P.A., Rosenberg, J.M.: The weighted histogram
analysis method for free-energy calculations on biomolecules. I. the method. J. Comput. Chem.
13, 1011–1021 (1992)
60. Lankas, F., Lavery, R., Maddocks, J.H.: Kinking occurs during molecular dynamics simulations
of small DNA minicircles. Structure 14, 1527–1534 (2006)
61. Leach, A.: Molecular Modelling: Principles and Applications (2nd Edition). Prentice Hall
(2001)
62. Leonarski, F., D’Ascenzo, L., Auffinger, P.: Mg2+ ions: do they bind to nucleobase nitrogens?
Nucleic Acids Res. 45, 987–1004 (2017)
63. Leonarski, F., Trovato, F., Tozzini, V., Leś, A., Trylska, J.: Evolutionary algorithm in the
optimization of a coarse-grained force field. J. Chem. Theory Comput. 9, 4874–4889 (2013)
64. Leonarski, F., Trovato, F., Tozzini, V., Trylska, J.: Genetic algorithm optimization of force field
parameters: application to a coarse-grained model of RNA. In: Proceedings of the 9th European
conference on Evolutionary computation, machine learning and data mining in bioinformatics,
EvoBIO’11, pp. 147–152. Springer-Verlag, Berlin, Heidelberg (2011)
65. Leonarski, F., Trylska, J.: RedMDStream: Parameterization and simulation toolbox for coarse-
grained molecular dynamics models. Biophys. J. 108, 1843–1847 (2015)
66. Leontis, N.B., Westhof, E.: Analysis of RNA motifs. Curr. Opin. Struct. Biol. 13, 300–308
(2003)
67. Liphardt, J., Dumont, S., Smith, S.B., Tinoco, I., Bustamante, C.: Equilibrium information
from nonequilibrium measurements in an experimental test of Jarzynski’s equality. Science
296, 1832–1835 (2002)
68. Liphardt, J., Onoa, B., Smith, S.B., Tinoco, I., Bustamante, C.: Reversible unfolding of single
RNA molecules by mechanical force. Science 292, 733–737 (2001)
69. Liwo, A., Czaplewski, C., Oldziej, S., Rojas, A., Kazmierkiewicz, R., Makowski, M., Murarka,
R., Scheraga, H.: Simulation of protein structure and dynamics with the coarse-grained unres
force field. In: G. Voth (ed.) Coarse-Graining of Condensed Phase and Biomolecular Systems.,
chap. 8, pp. 107–122. Taylor & Francis (2008)
70. Liwo, A., He, Y., Scheraga, H.A.: Coarse-grained force field: general folding theory. Phys.
Chem. Chem. Phys. 13(16), 890–901 (2011)
71. Lu, Z.J., Turner, D.H., Mathews, D.H.: A set of nearest neighbor parameters for predicting
the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 34, 4912–4924
(2006)
72. Lyubartsev, A.P., Laaksonen, A.: Calculation of effective interaction potentials from radial
distribution functions: A reverse Monte Carlo approach. Phys. Rev. E 52, 3730–3737 (1995)
73. Ma, J.: Usefulness and limitations of normal mode analysis in modeling dynamics of biomolec-
ular complexes. Structure 13, 373–380 (2005)
74. Maciejczyk, M., Rudnicki, W.R., Lesyng, B.: A mezoscopic model of nucleic acids. Part 2. An
effective potential energy function for DNA. J. Biomol. Struct. Dyn. 17, 1109–1115 (2000)
75. Maciejczyk, M., Spasic, A., Liwo, A., Scheraga, H.A.: Coarse-grained model of nucleic acid
bases. J. Comp. Chem. 31, 1644–1655 (2010)
76. Maciejczyk, M., Spasic, A., Liwo, A., Scheraga, H.A.: DNA duplex formation with a coarse-
grained model. J. Chem. Theory Comput. 10, 5020–5035 (2014)
77. MacKerell, A.D., Banavali, N., Foloppe, N.: Development and current status of the CHARMM
force field for nucleic acids. Biopolymers 56, 257–265 (2000)
78. Malhotra, A., Harvey, S.C.: A quantitative model of the Escherichia coli 16 S RNA in the 30
S ribosomal subunit. J. Mol. Biol. 240, 308–340 (1994)
79. Malhotra, A., Tan, R.K., Harvey, S.C.: Modeling large RNAs and ribonucleoprotein particles
using molecular mechanics techniques. Biophys. J. 66, 1777–1795 (1994)
80. Malo, J., Mitchell, J.C., Venien-Bryan, C., Harris, J.R., Wille, H., Sherratt, D.J., Turberfield,
A.J.: Engineering a 2D protein DNA crystal. Angew. Chem. Int. Ed. 44, 3057–3061 (2005)
81. Mathews, D.H., Sabina, J., Zuker, M., Turner, D.H.: Expanded sequence dependence of ther-
modynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288,
911–940 (1999)
82. Mathews, D.H., Turner, D.H.: Prediction of RNA secondary structure by free energy mini-
mization. Curr. Opin. Struct. Biol. 16, 270–278 (2006)
83. Mattick, J.S., Makunin, I.V.: Non-coding RNA. Human Mol. Gen. 15 Spec No, R17–29 (2006)
84. Mazur, A.K.: Evaluation of elastic properties of atomistic DNA models. Biophys. J. 91, 4507–
4518 (2006)
85. McCammon, J.A., Gelin, B.R., Karplus, M.: Dynamics of folded proteins. Nature 267, 585–590
(1977)
86. Mergell, B., Ejtehadi, M.R., Everaers, R.: Modeling DNA structure, elasticity, and deformations
at the base-pair level. Phys Rev E Stat Nonlin Soft Matter Phys 68, 15 (2003)
87. Mergny, J.L., Lacroix, L.: Analysis of thermal melting curves. Oligonucleotides 13, 515–537
(2003)
88. Merino, E.J., Wilkinson, K.A., Coughlan, J.L., Weeks, K.M.: RNA structure analysis at single
nucleotide resolution by selective 2’-hydroxyl acylation and primer extension (SHAPE). J.
Am. Chem. Soc. 127, 4223–4231 (2005)
89. Miao, Z., Adamiak, R.W., Antczak, M., Batey, R.T., Becka, A.J., Biesiada, M., Boniecki, M.J.,
Bujnicki, J.M., Chen, S.J., Cheng, C.Y., Chou, F.C., Ferre-D’Amare, A.R., Das, R., Dawson,
W.K., Ding, F., Dokholyan, N.V., Dunin-Horkawicz, S., Geniesse, C., Kappel, K., Kladwang,
W., Krokhotin, A., Lach, G.E., Major, F., Mann, T.H., Magnus, M., Pachulska-Wieczorek,
K., Patel, D.J., Piccirilli, J.A., Popenda, M., Purzycka, K.J., Ren, A., Rice, G.M., Santalucia,
J., Sarzynska, J., Szachniuk, M., Tandon, A., Trausch, J.J., Tian, S., Wang, J., Weeks, K.M.,
Williams, B., Xiao, Y., Xu, X., Zhang, D., Zok, T., Westhof, E.: RNA-Puzzles round III: 3D
RNA structure prediction of five riboswitches and one ribozyme. RNA 23, 655–672 (2017)
90. Mizushima, T., Kataoka, K., Ogata, Y., Inoue, R.i., Sekimizu, K.: Increase in negative supercoiling
of plasmid DNA in Escherichia coli exposed to cold shock. Mol. Microbiol. 23, 381–386
(1997)
91. Mizushima, T., Natori, S., Sekimizu, K.: Relaxation of supercoiled DNA associated with induc-
tion of heat shock proteins in Escherichia coli. Mol. Gen. Genet. 238, 1–5 (1993)
92. Morriss-Andrews, A., Rottler, J., Plotkin, S.S.: A systematically coarse-grained model for DNA
and its predictions for persistence length, stacking, twist, and chirality. J. Chem. Phys. 132, 30
(2010)
93. Narberhaus, F., Waldminghaus, T., Chowdhury, S.: RNA thermometers. FEMS Microbiol. Rev.
30, 3–16 (2006)
94. Niewieczerzał, S., Cieplak, M.: Stretching and twisting of the DNA duplexes in coarse-grained
dynamical models. J. Phys. Condens. Matter 21, 474221 (2009)
95. Olson, W.K.: Configurational statistics of polynucleotide chains. a single virtual bond treatment.
Macromolecules 8, 272–275 (1975)
96. Olson, W.K.: Flexible dna double helix.1. average dimensions and distribution functions.
Biopolymers 18, 1213–1233 (1979)
97. Olson, W.K., Manning, G.S.: A configurational interpretation of the axial phosphate spacing
in polynucleotide helices and random coils. Biopolymers 15, 859–878 (1976)
98. Olson, W.K., Zhurkin, V.B.: Modeling DNA deformations. Curr. Opin. Struct. Biol. 10, 286–
297 (2000)
99. Omabegho, T., Sha, R., Seeman, N.C.: A bipedal DNA brownian motor with coordinated legs.
Science 324, 67–71 (2009)
100. Ouldridge, T. (ed.): Coarse-Grained Modelling of DNA and DNA Self-Assembly. Springer,
Berlin Heidelberg, Oxford, UK (2012)
101. Ouldridge, T.E., Johnston, I.G., Louis, A.A., Doye, J.P.K.: The self-assembly of DNA Holliday
junctions studied with a minimal model. J. Chem. Phys. 130, 065101 (2009)
102. Ouldridge, T.E., Louis, A.A., Doye, J.P.K.: DNA nanotweezers studied with a coarse-grained
model of DNA. Phys. Rev. Lett. 104, 4 (2009)
103. Ouldridge, T.E., Louis, A.A., Doye, J.P.K.: Extracting bulk properties of self-assembling
systems from small simulations. J. Phys. Condens. Matter 22, 104102 (2010)
104. Ouldridge, T.E., Louis, A.A., Doye, J.P.K.: Structural, mechanical, and thermodynamic properties
of a coarse-grained DNA model. J. Chem. Phys. 134, 085101 (2010)
105. Parisien, M., Cruz, J.A., Westhof, E., Major, F.: New metrics for comparing and assessing
discrepancies between RNA 3D structures and models. RNA 15, 1875–1885 (2009)
106. Pasquali, S., Derreumaux, P.: HiRE-RNA: a high resolution coarse-grained energy model for
RNA. J. Phys. Chem. B 114, 11957–11966 (2010)
107. Pérez, A., Marchán, I., Svozil, D., Sponer, J., Cheatham, T.E., Laughton, C.A., Orozco,
M.: Refinement of the AMBER force field for nucleic acids: improving the description of
alpha/gamma conformers. Biophys. J. 92, 3817–3829 (2007)
108. Poulain, P., Saladin, A., Hartmann, B., Prévost, C.: Insights on protein-DNA recognition by
coarse grain modelling. J. Comp. Chem. 29, 2582–2592 (2008)
109. Prytkova, T.R., Eryazici, I., Stepp, B., Nguyen, S.B., Schatz, G.C.: DNA melting in small-
molecule-DNA-hybrid dimer structures: experimental characterization and coarse-grained
molecular dynamics simulations. J. Phys. Chem. B 114, 2627–2634 (2010)
110. Ramachandran, A., Guo, Q., Iqbal, S.M., Liu, Y.: Coarse-grained molecular dynamics simula-
tion of DNA translocation in chemically modified nanopores. J. Phys. Chem. B 115, 6138–6148
(2011)
111. Reith, D.: CG-OPT: A software package for automatic force field design. Comput. Phys.
Commun. 148, 299–313 (2002)
112. Reith, D., Pütz, M., Müller-Plathe, F.: Deriving effective mesoscale potentials from atomistic
simulations. J. Comput. Chem. 24, 1624–1636 (2003)
113. Ren, A., Patel, D.J.: c-di-AMP binds the ydaO riboswitch in two pseudo-symmetry-related
pockets. Nat. Chem. Biol. 10, 780–786 (2014)
114. Richmond, T.J., Davey, C.A.: The structure of DNA in the nucleosome core. Nature 423,
145–150 (2003)
115. Romano, F., Hudson, A., Doye, J.P.K., Ouldridge, T.E., Louis, A.A.: The effect of topology
on the structure and free energy landscape of DNA kissing complexes. J. Chem. Phys. 136,
215102 (2012)
116. Rothemund, P.: Folding DNA to create nanoscale shapes and patterns. Nature 440, 297–302
(2006)
117. Rother, K., Rother, M., Boniecki, M., Puton, T., Bujnicki, J.M.: RNA and protein 3D structure
modeling: similarities and differences. J. Mol. Model. pp. 2325–2336 (2011)
118. Rüdisser, S., Tinoco, I.: Solution structure of Cobalt(III)hexammine complexed to the GAAA
tetraloop, and metal-ion binding to G.A mismatches. J. Mol. Biol. 295, 1211–1223 (2000)
119. Rudnicki, W.R., Bakalarski, G., Lesyng, B.: A mezoscopic model of nucleic acids. Part 1.
Lagrangian and quaternion molecular dynamics. J. Biomol. Struct. Dyn. 17, 1097–1108 (2000)
120. Russell, R., Millett, I.S., Doniach, S., Herschlag, D.: Small angle X-ray scattering reveals a
compact intermediate in RNA folding. Nat. Struct. Biol. 7, 367–370 (2000)
121. Sambriski, E.J., Schwartz, D.C., De Pablo, J.J.: A mesoscale model of DNA and its renatura-
tion. Biophys. J. 96, 1675–1690 (2009)
122. Savelyev, A., Papoian, G.A.: Molecular Renormalization Group Coarse-Graining of Polymer
Chains: Application to Double-Stranded DNA. Biophys. J. 96, 4044–4052 (2009)
123. Savelyev, A., Papoian, G.A.: Chemically accurate coarse graining of double-stranded DNA.
Proc. Natl. Acad. Sci. USA 107, 20340–20345 (2010)
124. Schlick, T.: Molecular Modeling and Simulation: An Interdisciplinary Guide (Interdisciplinary
Applied Mathematics), 2nd edn. Springer (2010)
125. Seeman, N.C.: DNA in a material world. Nature 421, 427–431 (2003)
126. Sharma, S., Ding, F., Dokholyan, N.V.: iFoldRNA: three-dimensional RNA structure predic-
tion and folding. Bioinformatics 24, 1951–1952 (2008)
127. Shaw, D.E., Dror, R.O., Salmon, J.K., et al.: Millisecond-scale molecular dynamics simula-
tions on anton. In: Proceedings of the Conference on High Performance Computing Network-
ing, Storage and Analysis, SC ’09, pp. 39:1–39:11. ACM, New York, NY, USA (2009)
128. Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., et al.: Atomic-level
characterization of the structural dynamics of proteins. Science 330, 341–346 (2010)
129. Skolnick, J., Koliński, A.: Simulations of the folding of a globular protein. Science 250,
1121–1125 (1990)
130. Stagg, S.M., Mears, J.A., Harvey, S.C.: A Structural Model for the Assembly of the 30S
Subunit of the Ribosome. J. Mol. Biol. 328, 49–61 (2003)
131. Sussman, J.L., Holbrook, S.R., Warrant, R.W., Church, G.M., Kim, S.H.: Crystal structure of
yeast phenylalanine transfer RNA. I. Crystallographic refinement. J. Mol. Biol. 123, 607–30
(1978)
132. Swendsen, R.H.: Monte Carlo renormalization group. Phys. Rev. Lett. 42, 859–861 (1979)
133. Swendsen, R.H., Wang, J.S.: Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett.
57, 2607–2609 (1986)
134. Tan, R.K.Z., Harvey, S.C.: Molecular Mechanics Model of Supercoiled DNA. J. Mol. Biol.
205, 573–591 (1989)
135. Trovato, F., Tozzini, V.: Supercoiling and local denaturation of plasmids with a minimalist
DNA model. J. Phys. Chem. B 112, 13197–13200 (2008)
136. Trylska, J., Tozzini, V., McCammon, J.A.: Exploring global motions and correlations in the
ribosome. Biophys. J. 89, 1455–1463 (2005)
137. Tucker, B.J., Breaker, R.R.: Riboswitches as versatile gene control elements. Curr. Opin.
Struct. Biol. 15, 342–8 (2005)
138. Tullius, T.D.: DNA footprinting with hydroxyl radical. Nature 332, 663–664 (1988)
139. Turner, D.H., Mathews, D.H.: NNDB: the nearest neighbor parameter database for predicting
stability of nucleic acid secondary structure. Nucleic Acids Res. 38, D280–282 (2010)
140. Venter, J.C., et al.: The sequence of the human genome. Science 291, 1304–51 (2001)
141. Vinograd, J., Lebowitz, J., Radloff, R., Watson, R., Laipis, P.: The twisted circular form of
polyoma viral DNA. Proc. Natl. Acad. Sci. USA 53, 1104–1111 (1965)
142. Voltz, K., Trylska, J., Calimet, N., Smith, J.C., Langowski, J.: Unwrapping of nucleosomal
DNA ends: a multiscale molecular dynamics study. Biophys. J. 102, 849–858 (2012)
143. Voltz, K., Trylska, J., Tozzini, V., Kurkal-Siebert, V., Langowski, J., Smith, J.: Coarse-grained
force field for the nucleosome from self-consistent multiscaling. J. Comput. Chem. 29, 1429–
1439 (2008)
144. Vorobjev, Y.N.: Block-units method for conformational calculations of large nucleic acid
chains. i. block-units approximation of atomic structure and conformational energy of polynu-
cleotides. Biopolymers 29, 1503–1518 (1990)
145. Wang, J., Peck, L., Becherer, K.: DNA Supercoiling and Its Effects on DNA Structure and
Function. Cold Spring Harbor Symposia on Quantitative Biology 47, 85–91 (1983)
146. Whitelam, S., Feng, E.H., Hagan, M.F., Geissler, P.L.: The role of collective motion in exam-
ples of coarsening and self-assembly. Soft Matter 5, 1251–1262 (2009)
147. Wimberly, B.T., Brodersen, D.E., Clemons, W.M., Morgan-Warren, R.J., Carter, A.P., Vonrhein,
C., Hartsch, T., Ramakrishnan, V.: Structure of the 30S ribosomal subunit. Nature 407,
327–339 (2000)
148. Xia, Z., Gardner, D.P., Gutell, R.R., Ren, P.: Coarse-grained model for simulation of RNA
three-dimensional structures. J. Phys. Chem. B 114, 13497–13506 (2010)
149. Yu, I., Mori, T., Ando, T., Harada, R., Jung, J., Sugita, Y., Feig, M.: Biomolecular interactions
modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm.
eLife 5, e19274 (2016)
150. Yurke, B., Turberfield, A.J., Mills Jr, A.P., Simmel, F.C., Neumann, J.L.: A DNA-fuelled
molecular machine made of DNA. Nature pp. 605–608 (2000)
151. Zheng, H., Chordia, M.D., Cooper, D.R., Chruszcz, M., Mueller, P., Sheldrick, G.M., Minor,
W.: Validation of metal-binding sites in macromolecular structures with the CheckMyMetal
web server. Nat. Protoc. 9, 156–170 (2014)
152. Zou, J., Liang, W., Zhang, S.: Coarse-grained molecular dynamics modeling of DNA-carbon
nanotube complexes. Int. J. Numer. Meth. Eng. 0600661, 968–985 (2010)
153. Zuker, M.: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic
Acids Res. 31, 3406–3415 (2003)
Modeling of Electrostatic Effects
in Macromolecules
Yury N. Vorobjev
Abstract Electrostatic energy and forces are among the most important factors defining
macromolecular interactions and self-organization in aqueous solution. The distinctive
property of electrostatic forces is their long-range character. Accurate modeling of
long-range electrostatic interactions, and of the related energy of a macromolecule in
an aqueous solvent at a given temperature, salt concentration and hydrogen ion
concentration, is therefore a long-standing problem. One of the most advanced treatments
of macromolecular electrostatics, a single-molecule approach with an implicit solvent
electrostatic model for macromolecular simulations in a water proton bath, is considered
here. The fundamental quantity that implicit electrostatic models approximate is the
solute potential of mean force, which is obtained by averaging over the solvent degrees
of freedom. Implicit solvent models offer practical ways to calculate free energies of
macromolecular conformations, taking into account equilibrium interactions with the water
solvent and the proton bath, whereas the explicit solvent approach is unable to do so
because of the large number of solvent degrees of freedom and the long-range nature of
the electrostatic interactions. The most advanced realizations of implicit continuum
electrostatic models by different research groups are discussed, their accuracy is
examined, and some applications of implicit solvent electrostatic models to
macromolecular modeling are highlighted, such as protein free energy calculations,
protein folding, ionization equilibria and pKa's of ionizable groups, and constant-pH
molecular dynamics.
1 Introduction
Computer simulations with explicit solvent molecules represent one of the most
detailed approaches to modeling the structure and energy of biomolecules [21]. However,
an accurate description of the aqueous environment for realistic simulations, e.g. with
Y. N. Vorobjev (B)
Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian
Academy of Science, Lavrentiev Ave. 8, Novosibirsk 630090, Russia
e-mail: [email protected]
$$E_{el} = \frac{1}{2} \sum_{i \neq j}^{N} \frac{q_i q_j}{r_{ij}} \qquad (1)$$
Commonly used modern molecular mechanical force fields employ monopole atomic
charges and ignore internal molecular polarization effects [22, 25]. The simplicity
of the charge distribution adopted for macromolecular modeling can be explained in
part by the computational efficiency it gives to simulations.
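The pairwise sum of Eq. (1) over monopole atomic charges can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular force field: the function name `coulomb_energy`, the conversion factor of about 332.06 kcal Å/(mol e²) and the two-charge test system are assumptions introduced here.

```python
import numpy as np

# Approximate conversion factor so that charges in units of e and distances
# in Å give energies in kcal/mol (an assumption, not taken from the text).
COULOMB_KCAL = 332.06


def coulomb_energy(charges, coords):
    """Eq. (1): E_el = 1/2 * sum over i != j of q_i * q_j / r_ij."""
    q = np.asarray(charges, dtype=float)
    x = np.asarray(coords, dtype=float)
    # Matrix of pairwise distances r_ij.
    diff = x[:, None, :] - x[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    # Drop the i == j self terms by placing them at infinite distance.
    np.fill_diagonal(r, np.inf)
    qq = np.outer(q, q)
    # The factor 1/2 corrects for counting every pair twice.
    return 0.5 * COULOMB_KCAL * np.sum(qq / r)


# Hypothetical example: two opposite unit charges 3 Å apart.
energy = coulomb_energy([1.0, -1.0], [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
print(f"E_el = {energy:.2f} kcal/mol")
```

The full distance matrix costs O(N²) memory, which is acceptable for a sketch but is one reason why production codes use cutoffs or lattice summation instead.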
where $G_{cav}(x)$ is the free energy of creating the molecular cavity in water (stage
1), $G_{vdw}(x)$ is the free energy of van der Waals interactions between the solute and
the water solvent (stage 2), $G_{pol}(x, q^0)$ is the free energy of polarization of the
water solvent by the protein with gas-phase partial charges on all atoms (stage 3), and
$G_{inz}(x, \mathrm{pH})$ is the free energy of equilibrium titration of the protein for
a given pH and conformation $x$. Titration changes the gas-phase partial atomic charges
$q^0$ of the neutral ionization microstate $z^0 = (z^0_1, \ldots, z^0_\zeta)$, where
$\zeta$ is the total number of titratable protons (or groups), to new values $q_{inz}$
for the equilibrium ionization state $\langle z \rangle$, which is coupled to the
conformation $x$ and the pH value. The thermodynamic process defines the free energy
$G_t(x, \mathrm{pH})$ of transport of a single protein molecule into water at a given pH
in an instantaneous microscopic conformation $x$:
It should be noted that the transport of a neutral protein molecule from the gas phase
into water solvent at a given pH is not accompanied by the transfer of a net charge. The
protein molecule becomes charged in the water proton bath through equilibrium proton
binding (or release), i.e. by equilibrium redistribution of protons between the solvent
and the solute in a given conformation x. The total free energy of the protein for a
given conformation x in the solvent at a given pH is equal to
$$G(x, \mathrm{pH}) = U_m(x, q^0) + W(x, q^0) + G_{inz}(x, \mathrm{pH}) \qquad (4)$$
Fig. 1 Thermodynamic process of transport of a protein from the gas phase into a water
proton bath; $q^0$ denotes the atomic charges in the gas phase with all ionizable groups
neutral; (stage 1) creation of a solute-sized cavity in water; (stage 2) insertion of the
zero-charged protein (with all atoms having zero partial charge) into the cavity in
water; (stage 3) charging of the protein to the gas-phase partial atomic charges
$q^0 = (q^0_1, \ldots, q^0_N)$; and (stage 4) equilibrium titration of the protein at a
given pH value
The solvation free energy $W(x)$ can be written in the framework of the free energy
perturbation method
$$W(x) = \int_0^1 d\lambda \, \frac{\int U_{ms}(x, y) \exp\{-\beta[\lambda U_{ms}(x, y) + U_{ss}(y)]\} \, dy}{\int \exp\{-\beta[\lambda U_{ms}(x, y) + U_{ss}(y)]\} \, dy} \qquad (8)$$
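Eq. (8) is a coupling-parameter integral: for each λ the solute-solvent energy is averaged over the solvent with Boltzmann weights, and the average is then integrated over λ from 0 to 1. The sketch below evaluates it by quadrature for a toy model that is an assumption made purely for illustration: a single solvent coordinate y with $U_{ss}(y) = k y^2/2$ and $U_{ms}(x, y) = -c\,y$ at fixed x, for which the exact result is $-c^2/(2k)$.

```python
import numpy as np


def solvation_free_energy(c, k, beta=1.0, n_lambda=21):
    """Toy evaluation of Eq. (8) with one solvent coordinate y.

    Assumed model (not from the text): U_ss(y) = k*y**2/2 and
    U_ms(x, y) = -c*y, with the solute conformation x held fixed.
    """
    y = np.linspace(-20.0, 20.0, 4001)       # quadrature grid for y
    u_ss = 0.5 * k * y**2
    u_ms = -c * y
    lambdas = np.linspace(0.0, 1.0, n_lambda)
    mean_u_ms = []
    for lam in lambdas:
        # Boltzmann weight of the solvent at coupling strength lambda.
        weight = np.exp(-beta * (lam * u_ms + u_ss))
        # Ratio of the two integrals in Eq. (8); the grid spacing cancels.
        mean_u_ms.append(np.sum(u_ms * weight) / np.sum(weight))
    mean_u_ms = np.array(mean_u_ms)
    # Outer integral over lambda from 0 to 1 (trapezoidal rule).
    d_lam = lambdas[1] - lambdas[0]
    return d_lam * (mean_u_ms.sum() - 0.5 * (mean_u_ms[0] + mean_u_ms[-1]))


w_numeric = solvation_free_energy(c=2.0, k=4.0)
w_exact = -(2.0**2) / (2.0 * 4.0)   # analytic -c^2/(2k) for this toy model
print(w_numeric, w_exact)
```

In this harmonic model the averaged solute-solvent energy is exactly linear in λ, so the λ integral reduces to one half of its value at full coupling.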
Although all intra- and intermolecular interactions are electrostatic in nature at the
quantum mechanical level, molecular mechanical force fields treat them as a sum of
electrostatic and non-electrostatic terms [22, 25]. The total free energy of solvation
of a macromolecule therefore consists of two parts: the free energy of non-electrostatic
interactions, the first two terms of Eq. (2), which is largely independent of the atomic
charges, and the free energy of electrostatic interactions, which is a function of the
atomic charges $q$ and vanishes for the zero charge distribution $q = 0$. For
completeness we consider both the non-electrostatic and the electrostatic parts of the
free energy of solvation.
The sum of the free energy of solvent cavity formation and the solute-solvent van der
Waals interactions is the free energy of nonpolar solvation $G_{np}$:
$$G_{np} = G_{cav} + G_{vdw} \qquad (10)$$
Nonpolar solvation has a complex physical nature, and the associated energy has a
smaller magnitude than its electrostatic counterpart; nevertheless, hydrophobic
association is one of the principal interactions that determine biomolecular structures
[127]. Nonpolar solvation includes two terms, the free energy of solvent cavity
formation $G_{cav}$ and the solute-solvent van der Waals free energy $G_{vdw}$. These
two terms depend differently on the structure and conformation of the interacting
chemical groups [26, 27, 147].
Experimental data [15, 16, 50, 62], microscopic simulations on small systems [56,
57, 157, 158] and scaled particle theory [111, 112] show that the cavity free energy
changes linearly with the surface area $S$ of the solvent excluded cavity,
$$G_{cav} = \gamma_{micro} S \qquad (11)$$
where the cavity surface is defined as the smooth molecular surface (MS) enclosing the
molecular solvent excluded volume (SEV) [30, 146] or, in some applications, as the
solvent accessible surface (SAS) [29, 117]. The SAS is traced by the center of a water
probe molecule, modeled as a rigid sphere of radius $R_w = 1.4$ Å, as it rolls over the
external van der Waals (VDW) surface of the protein atoms, each represented by a sphere
of atomic van der Waals radius $R_{vdw,i}$. A common approximation is that the atomic
van der Waals radii are independent of the atomic charges. The proportionality factor
$\gamma_{micro}$ is a microscopic surface tension. Its optimum value depends on the type
of surface chosen, the MS or the SAS. Simulations with an explicit water model show that
the free energy of creating a small uncharged gas bubble in an aqueous solution is
proportional to the macroscopic surface of the cavity, with an interfacial surface
tension $\gamma_{macro}$ similar to the experimental gas-solvent surface tension [62].
The value of the microscopic surface free energy $\gamma_{micro}$ used to compute
$G_{cav}$ is smaller because, on a molecular scale, the microscopic surface of an
interface is much more irregular and larger than the corresponding macroscopic surface
by an average factor of ~1.5 [147, 148]. Correspondingly, the microscopic surface free
energy should be smaller than the macroscopic surface tension of water by the same
factor. With the experimental $\gamma_{macro}$ equal to 102 cal/(mol Å²), this gives a
value of 67 cal/(mol Å²) for $\gamma_{micro}$, in good agreement with the estimate of
70 cal/(mol Å²) that has been found to optimize the correlation between experimental
protein stability data and protein-protein binding constants of mutant proteins
[63, 104].
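The scaling argument above is simple arithmetic and can be checked directly. In the sketch below, the function names and the surface area of 5000 Å² are hypothetical choices for illustration. Dividing 102 by exactly 1.5 gives 68 cal/(mol Å²), close to the quoted 67, which is consistent with the factor being only approximately 1.5.

```python
GAMMA_MACRO = 102.0      # cal/(mol Å^2), experimental value quoted in the text
ROUGHNESS_FACTOR = 1.5   # approximate ratio of microscopic to macroscopic area


def gamma_micro(gamma_macro=GAMMA_MACRO, factor=ROUGHNESS_FACTOR):
    """Microscopic surface tension obtained by scaling the macroscopic one."""
    return gamma_macro / factor


def cavity_free_energy(surface_area, gamma=None):
    """G_cav = gamma_micro * S, returned in kcal/mol for S in Å^2."""
    if gamma is None:
        gamma = gamma_micro()
    return gamma * surface_area / 1000.0   # cal -> kcal


# Hypothetical molecular surface area of a small protein.
area = 5000.0
print(f"gamma_micro = {gamma_micro():.1f} cal/(mol Å^2)")
print(f"G_cav       = {cavity_free_energy(area):.1f} kcal/mol")
```

Because $G_{cav}$ grows linearly with the area, the choice between MS and SAS changes the result by the ratio of the two areas unless $\gamma_{micro}$ is re-fitted for the chosen surface.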
The first hydration shell contributes up to 85% of the energy $G_{vdw}$ because of the
short-range nature of van der Waals interactions with the solvent. The energy $G_{vdw}$
can therefore be approximated by an expression linear in the molecular surface area $S$,
$$G_{vdw} = \gamma_{vdw} S \qquad (12)$$
The average proportionality factor $\gamma_{vdw} = -30\ (\pm 17)$ cal/(mol Å²) has been
found from MD simulations of the solute-solvent van der Waals energy for a set of
medium-size proteins in explicit SPC water [147]. The implicit solvent PMF of non-polar
interactions between two methane molecules in water, as a function of their separation r
[148], agrees with the PMF calculated by microscopic Monte Carlo and molecular dynamics
simulations, which shows the self-consistency of the cavity term and the solute-solvent
van der Waals energy defined by Eqs. (11)–(12). A recent computational study [131]
showed that the MS area in Eqs. (11)–(12) provides a reasonable description of the
hydrophobic association of hydrocarbons and reproduces the desolvation maximum of the
rigorous PMF calculated by free energy simulation in explicit water solvent. The total
non-polar hydration free energy of Eq. (10) is nevertheless still often modeled with the
SAS area [33, 41, 67, 80, 110, 154], which does not reproduce the desolvation maximum of
the PMF of hydrophobic association.
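To get a feeling for the magnitudes involved, the two surface-proportional terms can be combined for a given molecular surface area. The sketch below uses the coefficients quoted in the text; the function name and the area of 5000 Å² are hypothetical, and propagating the ±17 spread of $\gamma_{vdw}$ linearly is a simplifying assumption.

```python
GAMMA_CAV = 67.0       # cal/(mol Å^2), microscopic surface tension from the text
GAMMA_VDW = -30.0      # cal/(mol Å^2), mean value from the text
GAMMA_VDW_ERR = 17.0   # cal/(mol Å^2), quoted spread of gamma_vdw


def nonpolar_free_energy(ms_area):
    """Return (G_cav, G_vdw, G_np, spread) in kcal/mol for an MS area in Å^2."""
    g_cav = GAMMA_CAV * ms_area / 1000.0      # cavity term, positive
    g_vdw = GAMMA_VDW * ms_area / 1000.0      # van der Waals term, negative
    spread = GAMMA_VDW_ERR * ms_area / 1000.0
    return g_cav, g_vdw, g_cav + g_vdw, spread


# Hypothetical molecular surface area.
g_cav, g_vdw, g_np, spread = nonpolar_free_energy(5000.0)
print(f"G_cav = {g_cav:.0f}, G_vdw = {g_vdw:.0f} (+/- {spread:.0f}), "
      f"G_np = {g_np:.0f} kcal/mol")
```

The two terms have opposite signs and partly cancel, so the spread of $\gamma_{vdw}$ is a sizeable fraction of the resulting $G_{np}$.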
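The cavity formation free energy term $G_{cav}$ is represented as a sum over partial
atomic SAS areas $s_i$ with atom-dependent scaling factors $\gamma_i$ [42–44, 85],
$$G_{cav} = \sum_i \gamma_i s_i \qquad (13)$$
and the solute-solvent van der Waals term is represented as a sum of atomic contributions,
$$G_{vdw} = \sum_i \frac{\alpha_i a_i}{(B_i + R_w)^3} \qquad (14)$$
where
$$a_i = -\frac{16}{3} \pi \rho_w \varepsilon_{iw} \sigma_{iw}^6 \qquad (15)$$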
Here $\rho_w = 0.033428$ Å⁻³, $\sigma_{iw}$ and $\varepsilon_{iw}$ are the OPLS force
field parameters [65] of the van der Waals potential between atom i and the water oxygen,
$B_i$ is the Born radius of atom i in the molecule in the given conformation, and
$R_w = 1.4$ Å is the radius of a water molecule. The values of the parameters $\alpha_i$
(on average ~1) have been set so as to reproduce as closely as possible the
solute-solvent van der Waals energies of individual atoms for a large set of proteins
and small molecules, obtained from explicit solvent simulations with TIP4P3 [45, 65, 85].
The description of nonpolar hydration via Eqs. (13)–(14) with atomic scaling factors
$\alpha_i$ and $\gamma_i$ empirically accounts for the dependence of the atomic van der
Waals radii $R_i$, which define the SAS, on the atomic charges.
The atomic charges q for protein conformation x induce in the solvent a polarization
charge density, <ρpol (r)> which produces the reaction field electrostatic potential,
V pol (xi ) at the protein’s atoms i,
$$ \langle V_{pol}(x_i) \rangle = \int \frac{\langle \rho_{pol}(r) \rangle}{|r - x_i|}\, dr \tag{16} $$
The polarization free energy is the work done in a charging process in which the
charges of the protein are gradually “turned on” by a factor λ
$$ G_{pol} = \int_0^1 d\lambda \sum_i q_i \langle V_{pol}(x_i) \rangle_\lambda \tag{17} $$
With the linear response approximation for solvent polarization, V pol and ρ pol
both are proportional to λ, and this gives
$$ G_{pol} = \frac{1}{2} \sum_i q_i \int \frac{\langle \rho_{pol}(r) \rangle}{|r - x_i|}\, dr \tag{18} $$
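The factor 1/2 in Eq. (18) follows in one step from Eq. (17). As a short worked derivation, assume only that the reaction potential at charging parameter λ is λ times its value for the fully charged solute, which is the linear response statement above. The subscript 1 for the fully charged state is notation introduced here for the illustration.

```latex
\begin{align*}
G_{pol} &= \int_0^1 d\lambda \sum_i q_i \langle V_{pol}(x_i) \rangle_\lambda
         = \sum_i q_i \langle V_{pol}(x_i) \rangle_1 \int_0^1 \lambda \, d\lambda \\
        &= \frac{1}{2} \sum_i q_i \langle V_{pol}(x_i) \rangle_1
         = \frac{1}{2} \sum_i q_i \int \frac{\langle \rho_{pol}(r) \rangle}{|r - x_i|} \, dr .
\end{align*}
```

The last equality uses Eq. (16) for the fully charged solute.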
In a simulation with explicit solvent, ρ pol is identical with the distribution of the
average charges of the solvent atoms, and a common approach is to use Eq. (17) to
compute Gpol with thermodynamic perturbation method [76]. The validity of the lin-
ear response approximation Eq. (18) for the solvent reaction potential of an aqueous
solvent has been tested by direct simulations of its dependence on λ in molecular
dynamics free energy simulations [3, 56, 64, 84, 118, 121, 148]. In a majority of sim-
ulations of charged and polar molecules a nearly linear response has been observed
for a moderately charged solute.
The validity of linear response approximation suggests that the calculation of the
average induced polarization charge density, <ρ pol (r)> can be done in the framework
of macroscopic electrostatics i.e., with an implicit continuum solvent description. The
average electrostatic potential Φ(r) contains contributions from the fixed charges q
of the protein and the induced polarization charges in the solvent, according to the
Poisson equation,
$$ \nabla^2 \Phi(r) = -4\pi \sum_i q_i \delta(r - x_i) - 4\pi \langle \rho_{pol}(r) \rangle \tag{19} $$
and with use of standard relations connecting the average induced charge density
<ρ pol (r)> with the average polarization, and the polarization with the electric field
E(r) [61, 79], one obtains Poisson equation with a position-dependent dielectric
constant D(r)
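$$ \nabla \cdot \left[ D(r) \nabla \Phi(r) \right] = -4\pi \sum_i q_i \delta(r - x_i) \tag{20} $$
$$ \langle \rho_{pol}(r) \rangle = -\nabla \cdot \left[ \frac{D(r) - 1}{4\pi} E(r) \right] \tag{21} $$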
where θ(r) is a sharp switching function equal to zero inside the solvent excluded
volume. The exact choice of where to locate the solute-solvent dielectric boundary is
empirical and can compensate for deviations of the actual dependence of the dielectric
constant from the assumed step function near the protein surface. An optimal set of
atomic radii defining dielectric interface MS has been calculated by fitting the implicit
model polarization free energy to a set of experimental data [130] and data obtained
by calculations with explicit solvent for a training set of small molecules and for the
20 standard amino acids [100, 101, 150]. The obtained sets of atomic radii allow one
to reproduce polarization free energies of the 20 standard amino acids within errors
of 1–2% from free energy simulation by thermodynamic perturbation method with
explicit water.
The dielectric surface interface defining the border between solvent and molecular
interior of Eq. (22) is a smooth molecular surface confining the molecular solvent
excluded volume [30–32, 146]. It is shown that the smooth MS is a good approx-
imation of the dielectric surface border between the high dielectric polar solvent
and low-dielectric interior of solute molecule in continuum dielectric method based
on a numerical solution of the Poisson equation Eq. (20) [145, 146]. Calculation of
molecular properties on the MS and integration of a function over the MS require a
numerical representation of the MS as a manifold S(si, ni, Δsi) of boundary elements
(BEs), where si, ni and Δsi are the coordinates, the outward normal vector and the area
of a small surface element. Because of its complexity, the formally defined Connolly MS of
a protein may contain hundreds of unphysical regions with singularities (discontinuities)
in the direction of the normal vector. Singularities called cusps and holes appear
when the probe can almost, but not quite, pass through a group of two or three atoms
of the protein [32, 146, 164]. It has been shown [145–147] that an accurate solution
of the Poisson equation via the boundary element method needs an MS with smoothed
singularities. None of the programs MSROLL [32], MSEED [109], MS [142] and MSMS
[123] was specifically designed for boundary element applications, and the dot MS they
provide was found by Vorobjev and Hermans [146] to be of poor quality for use
with the BE method. Connolly’s method of MS calculation [30–32] has been
revised, and a new method generating the Smooth Invariant Molecular Surface (SIMS)
[146] has been developed. The SIMS method (i) produces a near-homogeneous dot
distribution, (ii) is invariant to molecular rotation and translation, and (iii) recognizes
all types of singularities of the MS and smooths them with a specified minimal radius
of curvature. An optimal practical choice of the radius of the smoothing sphere is
~0.4 Å. The SIMS method generates a dot MS of good numerical quality, which
can be used in a variety of implicit continuum models for calculating solvation free
energy and for molecular electrostatics with the Poisson equation. The influence of the
choice and composition of boundary elements on the convergence of the numerical
solution of the Poisson equation has been investigated in detail using
Connolly’s MSROLL [32] and SIMS programs to generate BEs on the solute-solvent
dielectric surface [70]. It has been found that the SIMS program generates BEs
of better quality and achieves convergence with a number of surface elements
smaller than that of the MSROLL program by a factor of ~1.5–2.0, in a test on a set of
35 medium size proteins. A complete description of the SIMS method can be found
elsewhere [146]. The CPU time of the SIMS method scales linearly with the number of atoms
in the molecule [147]. The SIMS program is available from the authors on request.
The finite difference (FD) method solves Poisson (or Poisson-Boltzmann) equation
in differential form Eq. (20) using multigrid volume elements in a rectangular box
which includes the solute and a volume of solvent around it [51–54, 93, 120, 129,
130]. The alternative is a boundary element (BE) method which is used for numerical
solution of an integral equation over the dielectric boundary, to which the original
Poisson Eq. (19) can be analytically converted [18]. The BE method finds a solution
in terms of the induced solvent polarization charge density, or the electrostatic potential,
on boundary elements tessellating the solute-solvent dielectric surface [18, 68, 88, 89,
144, 145, 147, 150, 163]. The boundary element method is invariant to
rotation and translation of the solute molecule. The BE method exhibits a higher
degree of consistency in comparison with numerical results of multigrid BE and FD
methods [18, 145]. Improved methods of solving the Poisson equation for inhomo-
where f = (1/2π)(DI − D0)/(DI + D0), n(t) is the outward normal vector to the
molecular surface at point t, and Ei(t) is the electrostatic field generated by charge i at the
surface point t. The induced charge density σ(t) approximates the average solvent
induced charge density in Eq. (16). The solvent polarization free energy $G^{FM}_{pol}$ of the
FAMBE method can be found with Eq. (18), replacing the volume integral and volume
charge density with a surface integral and the surface charge density σ(s)
$$ G^{FM}_{pol}(x) = \frac{1}{2} \sum_i q_i \int_S \frac{\sigma(s)}{|s - x_i|}\, ds = \frac{1}{2} \sum_i q_i \int_S \frac{\sigma_i(s)\, ds}{|x_i - s|} + \frac{1}{2} \sum_{i \neq j} q_i \int_S \frac{\sigma_j(s)\, ds}{|x_i - s|} = \sum_i g^{FM}_i(x) + \frac{1}{2} \sum_{i \neq j} w^{FM}_{ij}(x) \tag{24} $$
where $g^{FM}_i(x)$ is the energy of solvent polarization by atom i, i.e. the energy of
self-polarization, and $w^{FM}_{ij}$ is the pair PMF of interaction of atoms i, j due to the
solvent polarization. FAMBE is an efficient method to calculate the set of partial
atomic polarization densities σi(s), the polarization energy and the atomic forces for a given
protein conformation x. To calculate the induced surface polarization charge density
σ(t), the FAMBE method splits the σ(t) given by Eq. (23) into a sum of
terms σi(t), each of which represents the induced polarization charge density
generated by a single group of charges qi; this is possible because the term Ei(t) is linear
in the charges qi. The FAMBE method thus splits Eq. (24) into a set of independent minor
BE equations, one for the induced polarization charge density generated by each single charge
(or small compact group of charges)
$$ \sigma_i(t) = f \int_S \frac{\sigma_i(s)\,(t - s) \cdot n(t)}{|t - s|^3}\, ds + \frac{f}{D_I}\, n(t) \cdot E_i(t), \quad i = 1, 2, \ldots \tag{25} $$
the total surface charge, σ (t) is the sum of the components σ i (t). The reason for such
decomposition is that the integral equation, Eq. (25), for each component σ i (t), can
be converted into a discrete linear equation of low dimensionality of a matrix Mi
over the set i of adaptive multi sized boundary elements
$$ \sigma_i = M_i \sigma_i + E_i \tag{26} $$
For each charge, qi the size of the boundary elements steadily increases with
distance R from the source of the molecular electrostatic field. Thereby the MS is
tessellated by the unique set of multisized BE’s, so that, for any given single charge
qi, the dimensions of the vector of surface charge densities σi and of the matrix Mi are
significantly lower than the total number of surface elements that would be used if
the surface were tessellated by the finest uniform boundary elements in Eq. (25).
The number of multisized boundary elements N MBE , i.e. the matrix Mi size for any
single charge qi , which tessellates an MS with area AS scales as
where, nloc and Aloc are an average number of boundary elements and size for the
local area with finest tessellation. Each minor matrix Eq. (26) is solved by the pre-
conditioned bi-conjugate gradient method [113]. A few iterations (5 or 6) are needed
to find a solution of linear Eq. (26) with a relative accuracy of 10−4 –10−5 . The
computational complexity of the FAMBE method scales as
where nloc is the average number of boundary elements, AS is the MS area and Aloc is
the size of the local area with the finest tessellation, N z is the number of charges (or
charged groups) in the solute molecule. Test calculations for several proteins show
that the CPU time of the FAMBE method scales approximately linearly with the
number of atoms of the molecule. The FAMBE method [150] shows a high degree
of internal self-consistency, accuracy and speed of calculation in comparison with
one of the latest realizations of the BE method by other authors [88, 89]. The free energy
of solvent polarization calculated with the FAMBE method includes the dependence on
salt effects implicitly [150]. Its good numerical quality and high speed recommend
the FAMBE method as a good tool for post-processing of molecular dynamics trajectories
for free energy estimation via Eq. (9), with important applications to systems
undergoing large conformational changes. The FAMBE program is available from
the authors on request.
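Eq. (26) has the fixed-point form σ = Mσ + E. The following minimal Python sketch illustrates that form on a made-up 3×3 system using plain fixed-point iteration. This is a deliberately simpler scheme than the preconditioned bi-conjugate gradient solver [113] used by FAMBE, and the matrix, the right-hand side and the function name `solve_fixed_point` are illustrative assumptions, not FAMBE data or code.

```python
def solve_fixed_point(M, E, rel_tol=1e-5, max_iter=200):
    """Iterate sigma <- M*sigma + E until the relative change is below rel_tol.

    Converges when the spectral radius of M is below 1 (assumed here).
    Returns the solution vector and the number of iterations used.
    """
    n = len(E)
    sigma = list(E)  # start from the direct-field term
    for iteration in range(1, max_iter + 1):
        new = [sum(M[r][c] * sigma[c] for c in range(n)) + E[r] for r in range(n)]
        change = max(abs(a - b) for a, b in zip(new, sigma))
        scale = max(abs(a) for a in new) or 1.0
        sigma = new
        if change / scale < rel_tol:
            return sigma, iteration
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical small, weakly coupled system (not taken from any protein).
M = [[0.00, 0.10, 0.05],
     [0.10, 0.00, 0.10],
     [0.05, 0.10, 0.00]]
E = [1.0, -0.5, 0.25]
sigma, n_iter = solve_fixed_point(M, E)
```

For a weakly coupled matrix such as this one the iteration settles within a handful of steps, which is the qualitative behaviour described above for the minor equations.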
A solution of the Poisson equation by the fastest available methods for a medium size
protein takes 10–30 s of CPU time on a single processor unit. However, this CPU time
is too large to use the Poisson equation for calculation of solvation energy and atomic
forces on the fly in the MD method. Therefore faster simplified approaches such as
the generalized Born (GB) method have received attention [12, 135]. The GB model
defines the free energy of solvent polarization by protein charges analytically
$$ G_{pol} = -\frac{1}{2} \left( \frac{1}{D_I} - \frac{1}{D_0} \right) \sum_{i,j} \frac{q_i q_j}{f_{GB}(r_{ij})} \tag{29} $$
where fGB(r) is a function that interpolates between the effective Born radius Bij
of atoms i, j when the distance rij between the atoms is short, and rij itself at large
distances rij [135]
$$ f_{GB}(r_{ij}) = \left[ r_{ij}^2 + B_i B_j \exp\left( -r_{ij}^2 / 4 B_i B_j \right) \right]^{1/2} \tag{30} $$
where Bi, Bj are the effective Born radii of atoms i and j. The basic idea of the GB
approach can be viewed as an interpolation formula between the analytical solutions
for a single sphere and for widely separated spheres. The total energy of solvent
polarization of the GB method is a sum of atomic self-polarization energies $g^{GB}_i$
and the energies of polarization interactions $w^{GB}_{ij}$ of pairs of atoms i, j, similar to
Eq. (24)
$$ G^{GB}_{pol}(r) = \frac{1}{2} \left( \frac{1}{D_0} - \frac{1}{D_I} \right) \sum_i \frac{q_i^2}{B_i} + \frac{1}{2} \sum_{i \neq j} \frac{q_i q_j}{f_{GB}(r_{ij}, B_i, B_j)} \left( \frac{1}{D_0} - \frac{1}{D_I} \right) = \sum_i g^{GB}_i + \frac{1}{2} \sum_{i \neq j} w^{GB}_{ij} \tag{31} $$
$$ g^{GB}_i = -\frac{q_i^2}{2 B_i} \left( \frac{1}{D_I} - \frac{1}{D_0} \right) \tag{32} $$
Comparing Eqs. (31) and (24) one obtains a formal way to define the Poisson-ideal
(or FAMBE-ideal) effective Born radius Bi of atom i of the protein in a particular
conformation
$$ B_i = -\frac{q_i^2}{2 g^{FM}_i} \left( \frac{1}{D_I} - \frac{1}{D_0} \right) \tag{33} $$
The salt effect correction is included in the GB model by the simple substitution [12]
$$ \left( \frac{1}{D_I} - \frac{1}{D_0} \right) \rightarrow \left( \frac{1}{D_I} - \frac{\exp(-\kappa f(r_{ij}))}{D_0} \right) \tag{34} $$
where κ is the Debye-Hückel screening parameter. The goal of the GB model can be
thought of as finding a relatively simple analytical interpolation formula which,
for real molecular conformations, reproduces as closely as possible the results
of the Poisson equation. The GB model using the Poisson-ideal Born atomic radii
Bi provides an accurate approximation of the Poisson polarization free energy of
proteins [38, 105], with errors within ~1–3%. Calculating the set of Poisson-ideal
Born radii on the basis of Eq. (33), i.e. by solving the Poisson equation, is impractical
[12]; therefore a rapid and still reasonable approximation of the effective Born
radii to their Poisson-ideal values is needed. If accurate effective Born radii can
be computed for each atom of the molecule at low CPU cost, then the computational
advantage of the analytical GB model relative to the numerical FD or BE solution
becomes obvious.
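Equations (29), (30) and (33) are simple enough to evaluate directly. The following Python sketch is one possible illustration, not the author's code: the function names, the conversion factor 332.06 kcal·Å/(mol·e²) and the default dielectric constants DI = 1, D0 = 80 are assumptions made here for the example, and salt screening, Eq. (34), is left out.

```python
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2); unit conversion assumed for this sketch


def f_gb(r, b_i, b_j):
    """Interpolation function of Eq. (30)."""
    return math.sqrt(r * r + b_i * b_j * math.exp(-r * r / (4.0 * b_i * b_j)))


def gb_polarization_energy(coords, charges, born_radii, d_in=1.0, d_out=80.0):
    """Eq. (29): double sum over all i, j (including i == j), in kcal/mol."""
    prefactor = -0.5 * COULOMB * (1.0 / d_in - 1.0 / d_out)
    total = 0.0
    for xi, qi, bi in zip(coords, charges, born_radii):
        for xj, qj, bj in zip(coords, charges, born_radii):
            total += qi * qj / f_gb(math.dist(xi, xj), bi, bj)
    return prefactor * total


def ideal_born_radius(q, g_self, d_in=1.0, d_out=80.0):
    """Eq. (33): Born radius that reproduces a given self-polarization energy."""
    return -COULOMB * q * q / (2.0 * g_self) * (1.0 / d_in - 1.0 / d_out)


# Single ion of charge +1 and Born radius 2 A: Eq. (29) reduces to Eq. (32),
# and Eq. (33) recovers the radius from the self-energy.
g_ion = gb_polarization_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
b_back = ideal_born_radius(1.0, g_ion)
```

In the single-ion case the i = j term of Eq. (29) is the only contribution, because fGB(0) = Bi, so the round trip through Eq. (33) returns the input Born radius.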
The original GB method [135] estimates the effective Born radii Bi by an expression
using the Coulomb field approximation (CFA) for the electrostatic field in the solvent and
protein volume. The CFA self-polarization free energy $G^{CFA}_i$ of a charge qi is
$$ G^{CFA}_i = \frac{q_i^2}{2 \times 4\pi} \left( \frac{1}{D_0} - \frac{1}{D_I} \right) \int_{r > SEV} \frac{dV}{|r - r_i|^4} \tag{35} $$
where SEV is the solvent excluded volume. The effective Born radius in the CFA
approximation is defined as
$$ B_i^{-1} = R_{i,vdW}^{-1} - \frac{1}{4\pi} \int_{r > R_{vdw,i}}^{SEV} \frac{dV}{|r - r_i|^4} \tag{36} $$
where Rvdw,i is the van der Waals radius of atom i. The CFA is exact for
a charge located at the center of a spherical volume of excluded solvent. A further
approximation is the evaluation of the volume integral of the CFA energy density, Eq. (36),
by numerical integration [135] over the volume of the van der Waals spheres of the
solute atoms instead of the SEV volume, i.e.
$$ B_i^{-1} = R_{i,vdW}^{-1} - \frac{1}{4\pi} \int_{r > R_{vdw,i}}^{VDW} \frac{dV}{|r - r_i|^4} \tag{37} $$
Closed-form analytical expressions for the volume integral of Eq. (37) over a set of
overlapping spheres have been derived in the pair-wise approximation [49, 124]. The
GB model with the HTC [49] Born radii formula, Eq. (37), has been developed for small
molecules, where it was found to reproduce solvation energies and individual charge-charge
interactions quite well [33, 49] if reduced values of the atomic van der Waals
radii, R*i,vdw = Ri,vdw − 0.09 Å, are used. For macromolecules, the HTC approximation
tends to underestimate the values of the Born radii of buried atoms [105] because the
integration procedure for Eq. (37) treats small vacuum-filled crevices between the
VDW spheres of protein atoms as being filled with water. The HTC formula assigns
the Born radii for medium size proteins to a quite narrow interval, ~1.5–4.0 Å, while
the range of values for the Poisson-ideal Born radii is much larger, ~1.5–12 Å.
Bi (AGBNP)/Bi (SEV) ~ 1.4–3.0 for buried atoms with Born radii Bi (SEV) > 5 Å.
The AGBNP2 model is implemented in the MD package and shows a reasonable per-
formance on a large set of test proteins [45].
A simple and quite accurate expression to compute the effective Born radii was
proposed in the study [33], the R6 Born radii method,
$$ B_i^{-1} = \left( R_{i,vdW}^{-3} - \frac{3}{4\pi} \int_{r > R_{vdw,i}}^{SEV} \frac{dV}{|r - r_i|^6} \right)^{1/3} \tag{38} $$
The R6 radii formula is exact for any location of a charged atom within a perfectly
spherical solute in the limit D0/DI ≫ 1 [1, 99]. It has been shown that R6 Born radii
computed by accurate numerical integration over the exact MS or SEV [99] are in
very close agreement with the Poisson-ideal Born radii. The study of [1] suggests a new
analytical method (AR6) to compute the effective Born radii as empirical function
based on R6 integral of Eq. (38) with pairwise VDW approximation of the SEV
molecular volume and several molecular volume correction terms to approximate
more exactly the true solvent excluded volume in a vicinity of the atom in question.
The AR6 effective Born radii are defined by empirical function with several param-
eters which were optimized by parametrization. The RMSD between the inverse
effective AR6 and the Poisson-ideal Born radii for medium size protein lysozyme is
about 0.064. The Born radii of buried atoms with Born radii Bi > 3.3 Å are estimated
by the AR6 model with errors more than 20% and the error is increased up to 50%
for deeply buried atom with Born radii Bi > 6 Å. For the small drug-like molecules
the AR6 model with cavity term, of Eq. (13), and van der Waals solvation term, of
Eq. (14), reproduces the experimental solvation free energies with good accuracy,
the RMSD error is equal to 1.73 kcal/mol.
An accurate and fast version of the MSR6 method [1] for calculation of the
volume integral of Eq. (38) has been developed recently [153]. The atomic Born radius
Bi (MSR6) of the atom at position ri is defined by the integral over the protein MS [98]
$$ B_i^{-1} = \left( \frac{1}{4\pi} \int_S \frac{(s - r_i) \cdot n(s)}{|s - r_i|^6}\, ds \right)^{1/3} \tag{39} $$
where n(s) is the normal vector to the MS at the point s. The MSR6 formula, Eq. (39),
follows from Eq. (38). It has been shown that when the MSR6 atomic Born radii
are computed by accurate numerical integration over the exact MS [98] they are
in very close agreement with the Poisson-ideal Born radii. Calculation of the surface
integral in Eq. (39) with the uniform tessellation of the protein MS by surface elements
used by Aguilar et al. [1] is a procedure of numerical complexity O(N^(5/3)) for
a protein with N atoms. The fast method for calculation of the surface integral in
Eq. (39) is based on the FAMBE adaptive tessellation of the protein MS by
multi-sized boundary elements. The FAMBE adaptive tessellation reduces the numerical
complexity of the calculation of atomic Born radii to the order of O(N log N), because
the number of multi-sized surface elements scales as O(log N) [150]. Furthermore,
the MSR6 approximation of Eq. (39) has been empirically corrected, so that the
corrected approximation, MSR6c,
where Bi (MSR6) is the Born radius (in Å) defined by Eq. (39) over the protein MS
calculated by the SIMS method [146] with a solvent probe radius of 2.0 Å. This value
of the solvent probe radius was found to be optimal for approximating the dielectric
surface interface so as to reproduce the explicit water solvent polarization free energy [1].
Figure 2 shows that the correlation between the two sets of radii, Bi (MSR6c) and
FAMBE-ideal Bi (FAMBE), is very high, R² = 0.9989. The corrected MSR6 method
gives atomic Born radii, which agree with the FAMBE-ideal atomic Born radii with
average error of 2.5%, i.e. practically with numerical accuracy of solution of the
Poisson equation due to the finite size of boundary elements or 3D-grid [145]. Cal-
culation of almost FAMBE-ideal atomic Born radii Bi (MSR6c) is approximately
100 times faster, than calculation of FAMBE-ideal atomic Born radii by the FAMBE
method, i.e. solving Eq. (23).
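The surface integral of Eq. (39) can be checked on the one geometry where it has a closed form. The Python sketch below, an illustration under stated assumptions rather than the MSR6 implementation, evaluates Eq. (39) for a spherical solute of radius A with the charge displaced by d from the center. For this geometry the integral can be done analytically and gives B = (A² − d²)/A, which the code uses as the reference. The function names, the midpoint quadrature and the test values A = 10 Å, d = 6 Å are choices made here.

```python
def r6_born_radius_in_sphere(sphere_radius, offset, n_steps=20000):
    """Born radius from the surface integral of Eq. (39) for a spherical solute.

    The charge sits on the z axis at distance `offset` from the sphere center.
    By axial symmetry the surface integral reduces to a 1D integral over
    u = cos(theta), evaluated here with the midpoint rule.
    """
    a, d = sphere_radius, offset
    du = 2.0 / n_steps
    integral = 0.0
    for k in range(n_steps):
        u = -1.0 + (k + 0.5) * du
        dist2 = a * a + d * d - 2.0 * a * d * u      # |s - r_i|^2
        integral += (a - d * u) / dist2 ** 3 * du    # (s - r_i).n(s) / |s - r_i|^6
    inv_b_cubed = 0.5 * a * a * integral             # (1/4pi) * 2pi * a^2 * integral
    return inv_b_cubed ** (-1.0 / 3.0)


def closed_form_sphere_born_radius(sphere_radius, offset):
    """Analytic value of Eq. (39) for a sphere: (A^2 - d^2) / A."""
    return (sphere_radius ** 2 - offset ** 2) / sphere_radius


numeric = r6_born_radius_in_sphere(10.0, 6.0)
reference = closed_form_sphere_born_radius(10.0, 6.0)
```

The Born radius shrinks from A for a centered charge toward zero as the charge approaches the surface, which is the behaviour a buried-versus-exposed atom is expected to show.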
Fig. 2 Comparison of FAMBE-ideal atomic Born radii B(FAMBE) with atomic Born radii
B(MSR6c)—red open circles and B(MSR6c)—blue open squares, for several conformations of pro-
teins BPTI, HEWL and RnaseA. The B(MSR6) radii are calculated using Eq. (39); the B(MSR6c)
radii are calculated using Eq. (40). The diagonal lines correspond to exact equality between
B(MSR6c) and B(FAMBE)
4 Protein Ionization
Transport of a protein molecule from the gas phase into a water proton bath is accompanied
by (de)protonation and ionization of titratable residues. The work required for the
equilibrium ionization is the free energy of ionization Ginz, Eq. (6), i.e. the
implicit titration potential of mean force (IT-PMF) for the protein in the water proton
bath. A rigorous statistical mechanical formulation of the IT-PMF has been considered
by Baptista et al. [7] in terms which eliminate the explicit reference to a variable
number of protons. The IT-PMF free energy G0inz (x, pH) of protein ionization (from
the neutral gas phase state) at a given pH in the water-proton bath is defined as
$$ G^0_{inz}(x, pH) = -kT \ln \sum_{n,z} \exp\left[ (n(z)\mu - G^0(x, z))/kT \right] \tag{41} $$
where n(z) is the total number of bound protons for the ionization microstate z and μ is
the chemical potential of protons, μ = −(kT·ln10)·pH. A canonical MD simulation of a protein
with the free energy described by Eq. (41) at constant temperature is a constant pH MD
(CpHMD) simulation of the titratable system in the implicit titration potential of mean
force. To perform such a simulation the free energy Ginz (x, pH) should be expressed
in terms of quantities that can be computed on the fly. The first implementation of the
implicit titration potential Ginz (x, pH) for the CpHMD method, developed by Baptista
et al. [7], was based on the mean field approximation for the ionization degrees and the
Tanford-Kirkwood spherical model [138] for the protein.
An accurate implementation of the IT-PMF is provided by the method FAMBEpH
[150, 153] which generalizes FAMBE method [145] for calculating the free energies
of solvent polarization Gpol (x) and protein ionization Ginz (x, pH). The MSR6c
method Eqs. (39)–(40) is used for a fast evaluation of the Born atomic radii. The
GB method with MSR6c Born radii allows one to calculate solvent polarization and
protein ionization free energies and perform analytical calculation of all electrostatic
atomic forces for MD simulation. The FAMBEpH and GB MSR6c methods provide
(i) the solvation free energies of the ionizable residues in water, (ii)
a realistic estimate of the average ionization degrees and their pair correlations, and
(iii) the free energy of ionization and the respective atomic forces due to the IT-PMF.
The IT-PMF gives an instant equilibrium response of the proton bath at a given pH;
therefore CpHMD with the IT-PMF can be more effective than the commonly
used explicit stochastic titration method, which considers a vast number of randomly
generated ionization microstates [90, 97, 159].
The ionization free energy, Ginz (x, pH), can be calculated by the thermodynamic
integration method as a titration process from zero hydrogen-ion concentration to a given
value of pH via the Tanford-Schellman integral [126, 137]
$$ \frac{\partial G_{inz}(x, pH)}{\partial pH} = kT (\ln 10) \sum_{i=1}^{\xi} \theta_i \langle z_i(x, pH) \rangle \tag{43} $$
where <zi (x, pH)> is the average ionization degree of site i in the protein in conformation
x; the parameter θi is equal to 1 or −1 if the ionizing group is a base or an acid,
respectively. Integrating over pH one obtains a practically tractable expression [150,
162] to calculate the free energy of ionization
where the functions zi (x, pH) and zi,mod (x, pH) are the average ionization degrees
of site i in the protein in conformation x and in the isolated model compound,
respectively. The energy ΔGinz is the free energy of ionization of the protein relative
to the total free energy of ionization of all titratable residues of the respective
model compounds, i.e. isolated amino acids
$$ \Delta G_{inz}(x, pH) = G_{inz}(x, pH) - G_{inz,mod}(x, pH) \tag{45} $$
For site i in protein conformation x at a given pH, the average ionization
degrees <zi (x, pH)> are calculated by a Monte Carlo random walk in the space of
ionization microstates z
$$ \langle z_i(x, pH) \rangle = \frac{1}{Z_{inz}} \sum_{z}^{2^{\xi}} \delta(z_i) \exp\left[ (n(z)\mu - G^0(x, z, pH))/kT \right] \tag{46} $$
where δ(zi) is the occupation (0, 1) of the ionization microstate zi and Zinz is the partition
function over all ionization microstates. It is shown [150] that a direct calculation of
the free energy from the partition function of Eq. (41) and calculation by the integral,
Eq. (44), give numerical values in good agreement for the protein BPTI.
The total energy Ginz (x, pH) of Eq. (41) can be presented relative to any
reference ionization microstate zr. Assuming that Grinz (x, pH) is the free energy
of ionization of the protein at a given pH with respect to the reference ionization
microstate zr, from Eq. (41) one obtains
$$ G^r_{inz}(x, pH) + G(x, z_r, pH) = G^0_{inz}(x, pH) + G(x, z_0, pH) \tag{47} $$
It follows from Eqs. (41), (47) that the energy Grinz (x, pH) has a minimal absolute
value if the reference ionization microstate zr is equal to the most probable ionization
microstate zp with minimal energy G(x, zp, pH). Thereby the most probable ionization
microstate zp is the optimal one-state approximation of the equilibrium ensemble of
ionization states.
Finally, the total free energy G(x, pH) of a protein in the water-proton bath can be
presented relative to the most probable ionization microstate zp
$$ G(x, pH) = U^p_{mol}(x) + G^p_{cav}(x) + G^p_{pol}(x) + G^p_{inz}(x, pH) \tag{48} $$
The first three terms of this equation describe the physically real protein structure
in the ionization microstate zp. The IT-PMF $G^p_{inz}(x, pH)$ has a minimal amplitude
for the optimal ionization microstate and describes the correction due to the deviation of
the microstate zp from the equilibrium ensemble of ionization microstates.
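$$ \begin{array}{ccc} \sum_i S_i(0) + n(z)\mathrm{H}^+ & \xrightarrow{\;G_S(z)\;} & \sum_i S_i(z_i) \\ \downarrow G(0) & & \downarrow G(z) \\ P(X, 0) + n(z)\mathrm{H}^+ & \xrightarrow{\;G_P(X,z)\;} & P(X, z) \end{array} \tag{49} $$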
where P(X, z) is the protein in the macroscopic conformation X and fixed ionization
state z, the Si (zi ) is the model compound site i in the state zi and GS (z) is the
free energy of protonation (deprotonation) of model compounds with n(z) protons,
(n(z) may be positive or negative), from the initial Si (0) fully ionized state; GP (X,
z) is the free energy of protonation reaction of the protein from its fully ionized state
P(0); G(0), G(z) are the free energy difference between model compounds and
protein in the fully ionized and in the protonated states, respectively
$$ G(z) = G_P(X, z) - \sum_i G_{S_i}(z_i) \tag{50} $$
where G(X, z) (GSi (zi )) are the free energy of protein (the model compound) in the
fixed ionization state z (zi ), respectively. The fundamental assumption behind the use
of model compounds is that the quantum contribution for the (de)protonation of site
PSi in the protein is the same as in its corresponding model compound Si , so that only
classical contributions (from molecular mechanical model) need to be considered in
Eq. (41). The free energy of molecule in the solvent at the fixed ionization state (i.e.
fixed atomic charges) at particular macroscopic conformation is given by expression
[147]
The model compounds in solution contribute independently to the energy GS (z),
thus
$$ G_S(z) = \ln(10)\, kT \sum_{i, z_i}^{\xi} \theta_i \left( pH - pK_{S_i}(z_i) \right) \tag{53} $$
where pKSi (zi) is the pKa value of the deprotonation (protonation) reaction involving
the neutral tautomeric form Si (zi), related to its macroscopic experimental pKa [9,
92]
where fi (zi) is the fraction of the tautomer zi among all neutral tautomers of the model
compound Si and pKSi is the macroscopic pKa of the model compound [103]. The
modern practice [92, 102] is to consider the thermodynamic cycle (49) under
the following approximations: (1) the protein is frozen in a particular conformational
microstate x, (2) the protein is considered as a set of ξ + 1 nonoverlapping fragments,
the protonatable amino acids plus the remaining nonprotonatable background (B), (3) the
total protein free energy of Eq. (51) is approximated by the molecular-mechanical or
electrostatic energy of the protein in solution. The electrostatic energy is calculated
with the linear Poisson-Boltzmann equation in the continuum dielectric model,
where $U^{coul}_m$ is the molecular electrostatic energy in vacuum and Gpol is the solvent
polarization free energy, Eq. (24). The linearity of the Poisson-Boltzmann equation
implies that the superposition principle holds for these fragments, giving for the energy
UP of the protein
$$ U_P(x, z) = U_P^{BB}(x) + \sum_i^{\xi} U_P^{iB}(x, z_i) + \sum_i^{\xi} U_P^{ii}(x, z_i) + \sum_{i > j}^{\xi} U_P^{ij}(x, z_i, z_j) \tag{55} $$
where $U_P^{\alpha\beta}$ denotes the energy of interaction between fragments α and β. Finally,
the free energy GP of microstate z of the protein protonation reaction is
$$ \begin{aligned} G_P(x, z) = {} & \ln(10)\, kT \sum_i^{\xi} \sum_{z_i} \delta(z, i)\, \theta_i \left( pH - pK_{S_i} - \log f_i(z_i) \right) \\ & + \sum_i^{\xi} \sum_{z_i} \delta(z, i) \left( U_P^{iB}(x, z_i) + U_P^{ii}(x, z_i) - U_S^{ii}(x, z_i) \right) \\ & + \sum_{i > j}^{\xi} \sum_{z_i, z_j} \delta(z, i)\, \delta(z, j)\, U_P^{ij}(x, z_i, z_j) \end{aligned} \tag{56} $$
where δ(z, i) = 0, 1 is the occupation number of the state i in the ionization
microstate z; θi = −1, 1, 0 if the state i is an acid, a base or a neutral tautomer,
respectively. The first sum of Eq. (56) is the model compound energy of protonation
corrected for the entropy factor, Eq. (54), due to the neutral tautomer fraction fi (zi);
the second sum is the effect of the protein environment on ionizable site i in the state zi;
the third sum is the energy of interaction of ionizable sites i, j in the isomeric states
zi, zj. A similar expression for the free energy of an ionization microstate is considered
by Song et al. [132] in the MCCE2 method, which considers both neutral state
tautomerism and side chain rotamers.
The probability p(x, z) of finding the protein in conformation x in the ionization state z is
defined by the Boltzmann factor
$$ p(x, z) = \frac{\exp\left[ -G_P(x, z)/kT \right]}{Z_{inz}} \tag{57} $$
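For a handful of sites the Boltzmann average behind Eq. (57) can be computed by exhaustive enumeration instead of a Monte Carlo walk. The following Python sketch is a simplified illustration and not the FAMBEpH method: it keeps only the first and third sums of Eq. (56), ignores tautomers and the protein-environment term, expresses all energies in units of kT, and uses hypothetical pKa values and pair energies.

```python
import itertools
import math

LN10 = math.log(10.0)


def microstate_energy(z, theta, pk, w, ph):
    """Energy of microstate z in units of kT: first and third sums of Eq. (56).

    z[i] is 1 if site i is ionized and 0 if neutral; theta[i] is -1 for an acid
    and +1 for a base; w[i][j] is the pair interaction energy in kT.
    Tautomers and the protein-environment term (second sum) are omitted.
    """
    n = len(z)
    energy = LN10 * sum(z[i] * theta[i] * (ph - pk[i]) for i in range(n))
    energy += sum(z[i] * z[j] * w[i][j] for i in range(n) for j in range(i))
    return energy


def average_ionization(theta, pk, w, ph):
    """Boltzmann-weighted occupations, Eq. (57), by exhaustive enumeration."""
    n = len(pk)
    states = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(-microstate_energy(z, theta, pk, w, ph)) for z in states]
    z_inz = sum(weights)  # partition function over all 2**n microstates
    return [sum(wt * z[i] for wt, z in zip(weights, states)) / z_inz for i in range(n)]


# Hypothetical example: two acids with model pKa 4.0 that repel when both ionized.
theta = [-1, -1]
pk = [4.0, 4.0]
no_coupling = [[0.0, 0.0], [0.0, 0.0]]
repulsive = [[0.0, 2.0], [2.0, 0.0]]  # 2 kT penalty, illustrative value
free = average_ionization(theta, pk, no_coupling, ph=4.0)
coupled = average_ionization(theta, pk, repulsive, ph=4.0)
```

Without coupling each site follows the Henderson-Hasselbalch curve, so it is half ionized at pH = pKa. The repulsive pair term lowers both occupations, which is the kind of pair correlation the averages <δi δj> capture. Enumeration costs 2^ξ terms and is only practical for small ξ, which is why sampling is used for proteins.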
$$ \sum_{i=1}^{\xi} \left( \langle \delta_i \rangle - \delta_i^p \right) \frac{\partial g_i(x)}{\partial r_k} + \frac{1}{2} \sum_{ij}^{\xi} \left( \langle \delta_i \delta_j \rangle - \delta_i^p \delta_j^p \right) \frac{\partial \Delta w_{ij}(x)}{\partial r_k} \tag{58} $$
where gi (x) is the electrostatic energy of ionization of the titratable group i, Δwij (x)
is the energy of pair interaction of titratable groups i, j, <δi> is the average occupation
of the state i and $\delta_i^p$ is the occupation of the state i in the most probable (optimal)
ionization microstate; <δi δj> is the pair correlation of occupations of titratable groups
i and j. These quantities are calculated by the FAMBEpH method [150]. An efficient
calculation of the gradients of gi (x) and Δwij (x) with respect to the atomic coordinates
is done in the framework of the GB method with the Born radii defined by the MSR6c method,
Eqs. (39), (40).
The CpHMD-IT method is implemented as a sequential algorithm [153], which
consists of the following 5 steps: (1) for a given protein conformation x0 at the time
t0, the optimal ionization microstate zp, the average occupation degrees <δi>, the pair
correlation matrix <δi·δj> and the PMF $G^p_{inz}(x, pH)$ are calculated using the FAMBEpH
method [150]; (2) initialization of the molecular topology of the protein molecule
in the optimal ionization microstate zp; (3) assignment of a velocity to each newly
bound proton equal to the velocity of the respective heavy atom; (4) MD
simulation of the protein molecule in the fixed ionization microstate zp in the force
field defined by Eq. (48) during the time τzfix ~ 2–4 ps; (5) return to step (1).
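The five steps above amount to a simple outer loop around an MD engine. The Python sketch below shows only that control flow. Every callable passed in (`titrate`, `build_topology`, `assign_proton_velocities`, `run_md`) is a hypothetical stand-in introduced for this illustration and does not correspond to the actual interfaces of FAMBEpH or BISON.

```python
def run_cphmd_it(x0, total_time_ps, tau_zfix_ps, titrate, build_topology,
                 assign_proton_velocities, run_md):
    """Control flow of the five-step cycle; all callables are hypothetical stand-ins.

    titrate(x) -> (z_p, averages)      step 1: optimal microstate and averages
    build_topology(x, z_p) -> system   step 2: topology for microstate z_p
    assign_proton_velocities(system)   step 3: new protons copy heavy-atom velocity
    run_md(system, averages, t) -> x   step 4: MD at fixed z_p for time t
    """
    x, elapsed, history = x0, 0.0, []
    while elapsed < total_time_ps:
        z_p, averages = titrate(x)                        # step 1
        system = build_topology(x, z_p)                   # step 2
        assign_proton_velocities(system)                  # step 3
        x = run_md(system, averages, tau_zfix_ps)         # step 4
        elapsed += tau_zfix_ps
        history.append(z_p)                               # step 5: loop back to step 1
    return x, history


# Toy stand-ins so the loop can be exercised without any MD engine.
final_x, visited = run_cphmd_it(
    x0=0,
    total_time_ps=10.0,
    tau_zfix_ps=2.0,
    titrate=lambda x: (("z", x % 2), None),
    build_topology=lambda x, z_p: {"x": x, "z": z_p},
    assign_proton_velocities=lambda system: None,
    run_md=lambda system, averages, t: system["x"] + 1,
)
```

Passing the four stages in as functions keeps the titration cadence (τzfix) separate from the details of the titration and dynamics codes, so either can be swapped without touching the loop.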
The CpHMD-IT simulations were carried out at a constant temperature of 300 K
using the in-house MD program BISON [151]. The optimal ionization microstate zp,
the average ionization degrees <δi>, the pair correlation matrix <δi·δj> and the PMF
Ginz(x, pH) are calculated using the FAMBEpH method [150] with a salt concentration of
0.15 M and the dielectric constants D0 = 80 and DI = 16. The large value of DI, which
is used for the calculation of the ionization equilibrium for a fixed protein conformation x,
accounts for reorganization due to nonstructural responses (e.g., charge redistribution
due to ionization) not captured by the current method [8]. The AMBER99 force field
[155] was used for calculations of intramolecular energy and forces. A consistent set
of atomic charges for protein residues in neutral and ionized states was computed
by the RESP method [5]. Intramolecular electrostatic and solvent polarization energies
and all electrostatic atomic forces of Eq. (48) were calculated by the GB method with
salt effects using the almost FAMBE-ideal atomic Born radii Bi (MSR6c) with the
dielectric constants D0 = 80, DI = 1 and a salt concentration of 0.15 M. The optimal
update time-step for the atomic Born radii is τB = 0.02–0.04 ps, which allows one to
generate a stable CpHMD-IT trajectory with an RMSD of about 2 Å from the
crystal structure [153] for a set of test proteins BPTI, HEWL and RNase A (Fig. 3).
Fig. 3 Comparison of PMFs W(FAMBE) of the FAMBE method with PMFs W(MSR6c) of the
Generalized Born model with almost-ideal atomic Born radii B(MSR6c) for pairs of atoms from
several conformations of proteins BPTI, HEWL and RNase A. The diagonal solid line corresponds
to exact equality between values of two PMFs
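To make the role of the Born radii, the two dielectric constants and the salt screening concrete, the sketch below evaluates a GB solvent polarization energy. It assumes the widely used Still-type interpolation function with Debye-Hückel screening, which is not necessarily identical to Eq. (48); the Born radii are taken as given inputs (the MSR6c procedure is not reproduced), and the function name and arguments are illustrative.

```python
import math
from typing import Sequence, Tuple

COULOMB = 332.0637  # Coulomb constant in kcal*Angstrom/(mol*e^2)


def gb_polarization_energy(coords: Sequence[Tuple[float, float, float]],
                           charges: Sequence[float],
                           born_radii: Sequence[float],
                           d_in: float = 1.0,
                           d_out: float = 80.0,
                           kappa: float = 0.0) -> float:
    """GB solvent polarization energy in kcal/mol (Still-type form).

    kappa is the inverse Debye length in 1/Angstrom; for a 0.15 M 1:1
    salt it is roughly 0.13 (an assumed round value, not from the text).
    """
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # the i == j terms are the Born self-energies
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            bb = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + bb * math.exp(-r2 / (4.0 * bb)))
            screening = 1.0 / d_in - math.exp(-kappa * f_gb) / d_out
            total += charges[i] * charges[j] * screening / f_gb
    return -0.5 * COULOMB * total
```

For a single charge the double sum reduces to the Born formula, which is a convenient sanity check; a nonzero `kappa` makes the energy slightly more negative because the salt increases the effective screening by the solvent.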
The implicit solvent models have several advantages over the explicit molecular water
representation in MD simulation [106, 122, 148]: (i) the implicit models describe an
instantaneous solvent dielectric response, which eliminates the need for the lengthy
equilibration of water that is necessary in explicit water simulations; (ii) the absence
of solvent reorganization energy barriers and of the dynamical viscosity associated with
an explicit water environment allows the solute molecule to explore the
available conformational phase space more quickly; (iii) the implicit dielectric continuum
model corresponds to solvation in an infinite volume of solvent, avoiding possible artifacts
of electrostatic interactions between solute replicas in the periodic systems typically used
with explicit solvent models [58]; (iv) the implicit titration method describes an instantaneous
response of the proton bath and eliminates the need to sample a vast number of ionization
microstates to model the equilibrium ionization state; (v) estimating the free energies of
solvated structures is much more straightforward than with explicit water
models; (vi) the computational cost of implicit models is considerably
smaller than the cost of simulations representing water explicitly. Therefore,
realistic implicit models of electrostatic effects find wide application
in biomolecular simulations. A reliable implicit solvent model should be carefully
optimized in conjunction with a particular force field to reproduce the experimental
solvation energies for a representative set of small molecules, the potential of mean
force of interactions between pairs of protein side chains in explicit solvent and the
secondary structure equilibrium for peptides [27, 28, 45].
The growing gap between the number of known protein sequences and the number
of structures solved by the X-ray or the NMR methods increases the interest in the
Fig. 4 The total electrostatic energy of protein decoys versus the decoys' RMSD
from native structures
structure relaxation is superior, with a success rate of ~89%, compared with other less
realistic solvation models. This result confirms the importance of a reliable model for
the electrostatic energy of a protein in water solvent. Because of their long-range nature,
electrostatic interactions depend, to a much larger extent than the short-range van der
Waals interactions, on the optimality of the global distribution
of charged and neutral residues over the protein volume and on the shape of the protein
molecular surface.
An accurate prediction of pKa is crucial for reliable modeling of virtually all biological
processes. The current methods of pKa prediction have reached an average
accuracy (RMSD from experimental data) of less than 1 pH unit, as reported in
benchmarking papers [11, 34, 69, 83, 128, 134, 140, 141]. However, the reported
benchmark databases consist predominantly of pKa values of surface-exposed ionizable
groups, while an analysis of failures showed that the most problematic are the
predictions of the pKa values of buried amino acids. The first pKa-cooperative meeting [2, 102]
indicated that none of the existing methods can predict the pKa values of buried amino
acids with the same level of accuracy, i.e. ~1 pK unit [160]. Ionization of surface
amino acids affects protein stability negligibly because of water screening. Ionization of
a buried group could in principle reduce protein stability significantly, by more than
ten kcal/mol. Such an energy change is comparable with the typical folding free
energy and could cause partial unfolding or significant structural changes. Therefore,
any attempt to predict the pKa value of such groups using a static 3D structure is
potentially wrong. For accurate pKa predictions, the methods have to be able to model
the induced structural rearrangement, or protein structure reorganization, and the dielectric
response.
The most successful modern practical methods for calculating the pKa values of ionizable
groups in proteins are based on the continuum electrostatic model described in the
previous sections and take into account neutral-state tautomers and conformational
sampling [2]. The conformational sampling can be performed in two different ways.
The first is uncorrelated sampling from a set of predefined conformational
states, which are not correlated with the ionization microstates. The second is
conformational sampling by molecular dynamics at constant pH, with
conformational states that are correlated on the fly with the ionization microstates.
and calculated pKas of 0.53 is quite low. The improved, hybrid version of MCCE2
uses intensive generation of side-chain and main-chain conformations by MD
simulation of the protein with ionizable buried residues [160] in the all-neutral and
all-ionized states to extend the conformational sampling. The hybrid MCCE2 method shows
some minor improvement over the original MCCE2 method.
One of the major factors affecting the modeling of protein protonation is the
coupling between ionization and conformational states, which is explicitly addressed
by the constant-pH molecular dynamics (CpHMD) methods [7, 24, 73, 153, 159]. The CpHMD
methods inherit the accuracy problems of the underlying atom-atom force field
and of the parameters of the PB or GB methods used to compute the protonation free
energies with a continuum dielectric model. The constant-pH MD methods can be classified
into two categories: (i) methods of explicit titration [8, 10, 36, 90, 92, 97, 159], which
consider physical discrete ionization microstates z, and (ii) methods of implicit
continuous titration [72, 74, 82], which work with continuous average ionization degrees
<z> of titratable groups. Progress in the molecular simulation of pH-dependent
biological processes and in the prediction of the pKa values of protein residues was reviewed
recently [75, 156]. Methods of explicit titration perform a random walk in the
discrete space of ionization microstates using the Monte Carlo (MC) method. For a given
protein conformation x, a Markov chain of ionization microstates zα is generated
by the Metropolis method on the basis of the free energy difference ΔG(x, z1, z2,
pH) between two ionization microstates z1 and z2. Then, a general MD method is
applied to sample the conformational space x of the protein in the accepted ionization
microstate. Thus, by periodic repetition of the MC sampling of ionization
states z and the MD sampling of conformational states x, a distribution of states
(x, z) corresponding to the grand canonical ensemble of ionization-conformational
microstates is generated [10]. Methods of such explicit stochastic titration differ from
one another in several details, such as (i) the method used to calculate the energy
difference ΔG(x, z1, z2, pH) between two ionization microstates z1 and z2, (ii) the MC
method used to sample ionization microstates, and (iii) the MD program and/or protocol of
the MD simulation at a given ionization microstate z. The GROMACS MD package
[17] has been used for MD with explicit water at constant temperature and pressure
to study ionization-conformation coupling in decalysine [90], cytochrome c3 [91]
and lysozyme [92]. The continuum electrostatic model was used for the MC sampling
of ionization microstates. Methods that employ an explicit solvent model for the MD
simulation and the CEP model for calculating protonation state energies are computationally
expensive, and MC trial moves are attempted relatively infrequently, which leads to
long convergence times for systems with multiple titration sites. In [36], the GB implicit
solvent model was employed in both the MC and MD steps with the CHARMM MD package.
The McCammon group [159] used the GB solvent model for both the MC step
and the MD simulations with the AMBER8 package [25]. Predictions of the pKa values of
titratable residues were obtained from a set of 5 ns MD simulations at 300 K with about
5 × 10^5 MC trials, each attempting to change the ionization state of one randomly chosen
residue, repeated every 10 fs. This hybrid MD/MC constant-pH simulation scheme is limited
by the frequent (~10 fs) periodic abrupt switches in the protonation state, which introduce
a discontinuity in the energy and atomic forces and may result in conformational and
energetic instabilities during the MD sampling of conformational states.
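The Metropolis step of explicit titration can be illustrated on the simplest possible case, a single titratable site. In this sketch the free energy difference between the two ionization states is reduced to ln(10)·(pKa − pH) in units of kT, where the effective `pka_eff` is a hypothetical parameter standing in for the conformation-dependent continuum electrostatics term; the interleaved MD sampling is omitted.

```python
import math
import random

LN10 = math.log(10.0)


def deprotonation_free_energy_kt(ph: float, pka_eff: float) -> float:
    """Free energy (in kT) of protonated -> deprotonated for one site."""
    return LN10 * (pka_eff - ph)


def sample_protonation(ph: float, pka_eff: float,
                       n_trials: int, seed: int = 0) -> float:
    """Metropolis walk over the two ionization states of a single site.

    Returns the fraction of trials spent in the protonated state.
    """
    rng = random.Random(seed)
    protonated = True
    count = 0
    for _ in range(n_trials):
        # free energy change of flipping the current state
        dg = deprotonation_free_energy_kt(ph, pka_eff)
        if not protonated:
            dg = -dg
        # Metropolis criterion: accept downhill moves, else exp(-dg)
        if dg <= 0.0 or rng.random() < math.exp(-dg):
            protonated = not protonated
        count += protonated
    return count / n_trials
```

For long runs the sampled protonated fraction should approach 1/(1 + 10^(pH − pKa)), the value expected for an isolated site; with several coupled sites the same acceptance rule applies, but the free energy difference then depends on the states of all other sites.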
The recent works [6, 75, 82] rely on the explicit λ-titration method, which uses the
λ-dynamics method [78] to simulate proton binding and release by a set of titratable sites.
The replica exchange (REX) protocol [74] enhances the sampling of protonation
and conformational states. After all REX-CpHMD cycles are completed for a wide pH
range, the titration coordinates are converted into probabilities of the protonated
(unprotonated) state of each site. The calculated pKa values of residues are obtained by
fitting the probability of (de)protonation versus pH to the Henderson-Hasselbalch
equation. The REX-CpHMD method with an improved GBSW solvent model and
salt screening, implemented in the CHARMM molecular modeling package, was used for
titration simulations of 10 proteins. The experimental pKa values of residues of these proteins
were reproduced with an RMSD of 0.6–1.2 pK units, with maximum errors of 1.0–4.2 pK units
for buried residues. Recently [6], the REX-CpHMD method was used to predict
extreme pKa shifts in staphylococcal nuclease mutants. The highly perturbed experimental
pKa values were predicted with an average unsigned error of 1.5 pK units,
while the maximum error is still ~4 pK units for buried residues.
Fig. 5 Dependence of the average ionization free energy Ginz (pH) of the protein HEWL on
pH. The solid line shows calculated values; filled black bars show the standard deviations
(fluctuations) of the free energy Ginz (x, pH) over the ensemble of protein structures x at a
given pH, calculated over 2 ns trajectories; open circles and the dotted line show the
experimental free energy of ionization Gexp (pH) computed from the experimental titration curve
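The Henderson-Hasselbalch fit used to extract a pKa from simulated titration data can be sketched as follows. This is a simplified illustration, not the procedure of the cited works: it assumes the generalized (Hill) form S = 1/(1 + 10^(n(pKa − pH))) for the unprotonated fraction S and fits it by linear least squares on log10(S/(1 − S)) rather than by a nonlinear fit; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_henderson_hasselbalch(ph_values: Sequence[float],
                              unprot_fractions: Sequence[float]
                              ) -> Tuple[float, float]:
    """Fit log10(S/(1-S)) = n*(pH - pKa); return (pKa, n).

    n = 1 corresponds to the plain Henderson-Hasselbalch equation.
    Points with S equal to 0 or 1 are skipped (their logit is infinite).
    """
    xs, ys = [], []
    for ph, s in zip(ph_values, unprot_fractions):
        if 0.0 < s < 1.0:
            xs.append(ph)
            ys.append(math.log10(s / (1.0 - s)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < S < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                   # slope = Hill coefficient
    pka = mean_x - mean_y / n       # pH at which the fitted line crosses 0
    return pka, n
```

The linearized fit weights points far from the pKa more heavily than a direct fit of S would, so for noisy simulation data a nonlinear least-squares fit restricted to the transition region may be preferable.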
The recently developed CpHMD method with the implicit titration potential of mean
force [153], described in Sect. 4.4, was tested on three proteins: BPTI, HEWL and
RNase A. The developed implicit model of the water-proton bath provides an efficient
way to study the thermodynamics of biomolecular systems as a function of pH (Fig. 5).
The theoretical framework of the current electrostatic model is based on three
approaches: (i) a continuum dielectric model of the protein with a low uniform dielectric
constant, DI, in the interior protein volume and the bulk solvent dielectric constant, D0,
in the outside volume; (ii) the linear Poisson-Boltzmann equation; and (iii) an empirical
atom-atom force field for CpHMD simulations.
The assumption of a uniform dielectric constant in the protein volume has
limited accuracy, because the protein environment, local flexibility and dielectric
response are not uniform throughout the protein volume [19]. Moreover, the dielectric
response can be modulated by small internal cavities [96], presumably filled
with water molecules. The pKa values of protein surface residues tend to be
very similar to the pKa values of isolated amino acids in water and are governed
by the negligible desolvation at the highly flexible protein-water interface. They are
predicted optimally by a model with a high value of the protein dielectric constant, DI =
16–20 [9, 36, 150]. The pKa shifts of the buried ionizable groups in staphylococcal
nuclease (SNase) are always in the direction that promotes the neutral form of the
ionizable groups. This suggests that the pKa values are primarily determined by the
desolvation of the buried groups. The desolvation of the buried groups appears to
be poorly counterbalanced by compensating factors that stabilize the charged states of
residues [102]. The apparent dielectric constant varies through the protein volume
in the range from 20 to 8 for surface and buried residues, respectively, as shown by
estimates of the required desolvation penalty using the GB model and the experimental
pKa values of buried lysine residues in SNase mutants [60]. An MD simulation of the
dielectric properties of solvated proteins showed that the dielectric response differs
between the surface and hydrophobic core regions of the protein [19],
with an average protein dielectric constant of ~14–15.
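The link between a desolvation penalty, a pKa shift and an apparent dielectric constant can be illustrated with the simplest Born model of a single spherical ion. This is a rough sketch under strong assumptions, not the GB-based estimate of [60]: the ion radius is a free parameter, only the desolvation term is considered, and the function names are hypothetical.

```python
import math

COULOMB = 332.0637   # Coulomb constant in kcal*Angstrom/(mol*e^2)
R_KCAL = 0.0019872   # gas constant in kcal/(mol*K)


def born_desolvation_penalty(radius: float, d_app: float,
                             d_water: float = 80.0,
                             charge: float = 1.0) -> float:
    """Free energy (kcal/mol) of moving a Born ion from water into a
    medium with apparent dielectric constant d_app."""
    return 0.5 * COULOMB * charge ** 2 / radius * (1.0 / d_app - 1.0 / d_water)


def pka_shift_magnitude(delta_g: float, temperature: float = 300.0) -> float:
    """Magnitude of the pKa shift equivalent to delta_g (kcal/mol)."""
    return delta_g / (R_KCAL * temperature * math.log(10.0))


def apparent_dielectric(shift: float, radius: float,
                        d_water: float = 80.0, charge: float = 1.0,
                        temperature: float = 300.0) -> float:
    """Invert the Born penalty: the dielectric constant that would
    produce the given pKa shift magnitude by desolvation alone."""
    delta_g = shift * R_KCAL * temperature * math.log(10.0)
    inverse_d = 1.0 / d_water + delta_g * radius / (0.5 * COULOMB * charge ** 2)
    return 1.0 / inverse_d
```

With an assumed ion radius of 2 Å, an apparent dielectric constant of 8 corresponds to a penalty of roughly 9 kcal/mol, while a value of 20 gives a much smaller penalty, which is consistent with the qualitative picture of weakly desolvated surface groups and strongly desolvated buried groups.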
The linear Poisson-Boltzmann equation has limited accuracy in accounting for ion-ion
correlation and salt effects for proteins with a highly charged surface, e.g. when the
pH is far from the isoelectric point. The counter-ion condensation effect becomes
significant under such conditions [150] and certainly cannot be ignored. The atomic
radii that define the solute-solvent dielectric interface depend on the
atomic charges [55].
Acknowledgements This work was supported by grant #12-04-00135a from the Russian Fund of
Basic Research, by grant #130-2012 from the Siberian Branch of the Russian Academy of Sciences
and by the exchange visitor program P-1-00043 of Cornell University.
References
1. Aguilar, B., Shadrach, R., Onufriev, A.V.: Reducing the secondary structure bias in the gen-
eralized Born model via R6 effective Radii. J. Chem. Theory Comput. 6, 3613–3630 (2010)
2. Alexov, E., Mehler, E.L., Baker, N., Baptista, A.M., et al.: Progress in the prediction of pKa
values in proteins. Proteins 79, 3260–3275 (2011)
3. Aqvist, J., Hansson, T.: On the validity of electrostatic linear response in polar solvent. J.
Phys. Chem. 100, 9512–9521 (1996)
4. Arnautova, E.Y., Jagielska, A., Scheraga, H.A.: A new force field ECEPP05 for peptides,
proteins and organic molecules. J Phys. Chem. B 110, 5025–5044 (2006)
5. Arnautova, E.Y., Vorobjev, Y.N., Vila, J.A., Scheraga, H.A.: Identifying native-like protein
structures with scoring functions based on all-atom ECEPP force fields, implicit solvent
models and structure relaxation. Proteins 77, 38–51 (2009)
6. Arthur, E.J., Yesselman, J.D., Brooks III, C.L.: Predicting extreme pKa shifts in staphylococcal
nuclease mutants with constant pH molecular dynamics. Proteins 79, 3276–3286 (2011)
7. Baptista, A.M., Martel, P.J., Petersen, S.B.: Simulation of protein conformational freedom as a
function of pH: constant-pH molecular dynamics using implicit titration. Proteins 27, 523–544
(1997)
8. Baptista, A.M., Martel, P.J., Soares, C.M.: Simulation of electron-proton coupling with a Monte
Carlo method: application to cytochrome c(3) using continuum electrostatics. Biophys. J. 76,
2978–2998 (1999)
9. Baptista, A.M., Soares, C.M.: Some theoretical and computational aspects of inclusion of proton
tautomerism in the protonation equilibrium of proteins. J. Phys. Chem. B 105, 293–309 (2001)
10. Baptista, A.M., Teixeira, V.H., Soares, C.M.: Constant-pH molecular dynamics using stochastic
titration. J. Chem. Phys. 117, 4184–4200 (2002)
11. Bashford, D., Gerwert, K.: Electrostatic calculations of the pKa values of ionizable groups in
bacteriorhodopsin. J. Mol. Biol. 224, 473–486 (1992)
12. Bashford, D., Case, D.A.: Generalized Born models of macromolecular solvation effects.
Annu. Rev. Phys. Chem. 51, 129–152 (2000)
13. Beglov, D., Roux, B.: An integral equation to describe the solvation of polar molecules in
liquid water. J. Chem. Phys. 104, 8678–8689 (1996)
14. Beglov, D., Roux, B.: Solvation of complex molecules in a polar liquid: an integral equation
theory. J. Phys. Chem. 101, 7821–7826 (1997)
15. Ben-Naim, A., Marcus, Y.: Solvation thermodynamics of nonionic solutes. J. Chem. Phys.
81, 2016–2027 (1984)
16. Ben-Naim, A.: Solvent effects on protein association and protein folding. Biopolymers 29,
567–596 (1990)
17. Berendsen, H.J.C., Van der Spoel, D., Van Drunen, R.: GROMACS: a message passing parallel
molecular dynamics implementation. Comput. Phys. Commun. 91, 43–56 (1995)
18. Bharadwaj, R., Windemuth, A., Sridharan, S., Honig, B., Nicholls, A.: The fast multipole
boundary element method for molecular electrostatics: an optimal approach for large systems.
J. Comput. Chem. 16, 898–913 (1995)
19. Boresch, S., Ringhofer, S., Hochtl, P., Steinhauser, O.: Toward better description and under-
standing of biomolecular solvation. Biophys. Chem. 78, 43–68 (1999)
20. Bradley, P., Misura, K.M., Baker, D.: Towards high-resolution de novo structure prediction
for small proteins. Science 309, 1868–1871 (2005)
21. Brooks III, C.L., Karplus, M., Pettitt, B.M.: Proteins: a theoretical perspective of dynamics,
structure and thermodynamics. In: Prigogine, I., Rice, S.A. (eds.) Advances in Chemical
Physics, vol. LXXI. Wiley, New York (1988)
22. Brooks, B.R., Brooks III, C.L., Mackerell, A.D., Nilsson, L., Petrella, R.J., Roux, B., Won, Y.,
Archontis, G., Bartels, C., Boresch, S., Caflisch, A., Caves, L., Cui, Q., Dinner, A.R., Feig, M.,
Fischer, S., Gao, J., Hodoscek, M., Im, W., Kuczera, K., Lazaridis, T., Ma, J., Ovchinnikov, V.,
Paci, E., Pastor, R.W., Post, C.B., Pu, J.Z., Schaefer, M., Tidor, B., Venable, R.M., Woodcock,
H.L., Wu, X., Yang, W., York, D.M., Karplus, M.: CHARMM: the biomolecular simulation
program. J. Comput. Chem. 30, 1545–1615 (2009)
23. Bogusz, S., Cheatham III, T.E., Brooks, B.R.: Removal of pressure and free energy artifacts
in charged periodic systems via net charge corrections to the Ewald potential. J. Chem. Phys.
108, 7070–7084 (1998)
24. Bürgi, R., Kollman, P.A., Van Gunsteren, W.F.: Simulating proteins at constant pH: an approach
combining molecular dynamics and Monte Carlo simulations. Proteins 47, 469–480 (2002)
25. Case, D.A., Darden, T., Cheatham III, T.E., Simmerling, C., Wang, J., Merz, K.M., Wang, B.,
Pearlman, D.A., Duke, R.E., Crowley, M., Brozell, S., Luo, R., Tsui, V., Gohlke, H., Mongan,
J., Hornak, V., Caldwell, J.W., Ross, W.S., Kollman, P.A.: Amber8. University of California,
San Francisco (2004)
26. Chen, J., Brooks, C.: Critical importance of length-scale dependence in implicit modeling of
hydrophobic interactions. J. Am. Chem. Soc. 129, 2444–2445 (2007)
27. Chen, J., Brooks, C.: Implicit modeling of nonpolar solvation for simulating protein folding
and conformational transitions. Phys. Chem. Chem. Phys. 10, 471–481 (2008)
28. Chen, J.: Effective approximation of molecular volume using atom-centered dielectric func-
tions in generalized Born models. J. Chem. Theory Comput. 6, 2790–2803 (2010)
29. Chothia, C.H.: Hydrophobic bonding and accessible area in proteins. Nature 248, 338–339
(1974)
30. Connolly, M.L.: Analytical molecular surface calculation. J. Appl. Crystallogr. 16, 548–558
(1983)
31. Connolly, M.L.: Solvent-accessible surfaces of proteins and nucleic acids. Science 221,
709–713 (1983)
32. Connolly, M.L.: Computation of molecular volume. J. Am. Chem. Soc. 107, 1118–1124
(1985). http://www.netsci.org/Science/Compchem/feature14e.html
33. Curutchet, C., Cramer, C.J., Truhlar, D.G., Ruiz-Lopez, M.F., Rinaldi, D., Orozco, M., Luque,
F.J.: Electrostatic component of solvation: comparison of SCRF continuum models. J. Com-
put. Chem. 24, 284–297 (2003)
34. Davies, M.N., Toseland, C.P., Moss, D.S., Flower, D.R.: Benchmarking pKa prediction. BMC
Biochem. 7, 18–30 (2006)
35. Douglas, C.C.: Multigrid methods in science and engineering. Comput. Sci. Eng. 3, 55–68
(1996)
36. Dlugosz, M., Antosiewicz, J.M.: Constant pH molecular dynamics simulations: test case of
succinic acid. Chem. Phys. 302, 161–170 (2004)
37. Dominy, B.N., Brooks, C.L.: Identifying native-like protein structures using physics-based
potentials. J. Comput. Chem. 23, 147–160 (2002)
38. Feig, M., Onufriev, A., Lee, M., Im, W.: Performance comparison of Generalized Born and
Poisson methods in the calculation of electrostatic solvation energies for protein structures.
J. Comput. Chem. 25, 265–284 (2004)
39. Fischer, D.: 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins
51, 434–444 (2003)
40. Felts, A.K., Gallicchio, E., Wallqvist, A., Levy, R.M.: Distinguishing native conformations
of proteins from decoys with an effective free energy estimator based on the OPLS all-atom
force field and the surface generalized Born solvent model. Proteins 48, 404–422 (2002)
41. Fogolari, F., Esposito, G., Viglino, P., Molinari, H.: Molecular mechanics and dynamics of
biomolecules using a solvent continuum model. J. Comput. Chem. 22, 1830–1842 (2001)
42. Gallicchio, E., Kubo, M.M., Levy, R.M.: Enthalpy-entropy and cavity decomposition of alkane
hydration free energies: numerical results and implications for theories of hydrophobic sol-
vation. J. Phys. Chem. B. 104, 6271–6285 (2000)
43. Gallicchio, E., Zhang, L.Y., Levy, R.M.: The SGB/NP hydration free energy model based on
the surface generalized Born solvent reaction field and novel nonpolar hydration free energy
estimators. J. Comput. Chem. 23, 517–529 (2002)
44. Gallicchio, E., Levy, R.: AGBNP: an analytic implicit solvent model suitable for molecular
dynamics simulations and high-resolution modeling. J. Comput. Chem. 25, 479–499 (2004)
45. Gallicchio, E., Paris, K., Levy, R.: The AGBNP2 implicit solvation model. J. Chem. Theory
Comput. 5, 2544–2564 (2009)
46. Goel, N.S., Gang, F., Ko, Z.: Electrostatic field in inhomogeneous dielectric media. Indirect
boundary element method. J. Comput. Phys. 118, 172–179 (1995)
47. Grant, J.A., Pickup, B.T.: A Gaussian description of molecular shape. J. Phys. Chem. 99,
3503–3510 (1995)
48. Gribenko, A.V., Patel, M.M., Liu, J., McCallum, S.A., Makhatadze, G.I.: Rational stabilization
of enzymes by computational redesign of surface charge-charge interactions. Proc. Natl. Acad.
Sci. U.S.A. 106, 2601–2606 (2009)
49. Hawkins, G.D., Cramer, C.J., Truhlar, D.G.: Parametrized models of aqueous free energies
of solvation based pairwise solute descreening of solute atomic charges from a dielectric
medium. J. Phys. Chem. 100, 19824–19836 (1996)
50. Hermann, R.B.: Theory of hydrophobic bonding. II. The correlation of hydrocarbon solubility
in water with solvent cavity surface area. J. Phys. Chem. 76, 2754–2759 (1972)
51. Holst, M., Kozack, R.E., Saied, F., Subramaniam, S.: Treatment of electrostatic effects in
proteins: multigrid-based Newton iterative method for solution of the full nonlinear Poisson-
Boltzmann equation. Proteins 18, 231–245 (1994)
52. Holst, M., Saied, F.: Numerical solution of the nonlinear Poisson-Boltzmann equation: devel-
oping more robust and efficient methods. J. Comput. Chem. 16, 337–364 (1995)
53. Holst, M., Baker, N., Wang, M.: Adaptive multilevel finite element solution of the Pois-
son–Boltzmann equation I. Algorithms and examples. J. Comput. Chem. 21, 1319–1342
(2000)
54. Honig, B., Sharp, K., Yang, A.S.: Macroscopic models of aqueous solutions: biological and
chemical applications. J. Phys. Chem. 97, 1101–1109 (1993)
55. Hou, G., Zhu, X., Cui, Q.: An implicit solvent model for SCC-DFTB with charge-dependent
radii. J. Chem. Theory Comput. 6, 2303–2314 (2010)
56. Hummer, G., Pratt, L.R., Garcia, A.E.: Hydration free energy of water. J. Phys. Chem. 99,
14188–14194 (1995)
57. Hummer, G., Pratt, L.R., Garcia, A.E.: Free energy of ionic hydration. J. Phys. Chem. 100,
1206–1215 (1996)
58. Hünenberger, P.H., McCammon, J.A.: Effect of artificial periodicity in simulations of
biomolecules under Ewald boundary conditions: a continuum electrostatic study. Biophys.
Chem. 78, 69–88 (1999)
59. Im, W., Lee, M.S., Brooks III, C.L.: Generalized Born model with a simple smoothing func-
tion. J. Comput. Chem. 24, 1691–1702 (2003)
60. Isom, D.G., Castaneda, C.A., Cannon, B.R., Garcia-Moreno, B.E.: Large shifts in pKa values
of lysine residues buried inside a protein. PNAS 108, 5260–5265 (2011)
61. Jackson, J.D.: Classical electrodynamics. Wiley, New York (1975)
62. Jackson, R.M., Sternberg, J.E.: Application of scaled particle theory to model the hydrophobic
effect: implications for molecular association and protein stability. Protein Eng. 7, 371–383
(1994)
63. Jackson, R.M., Sternberg, J.E.: A continuum model for protein-protein interactions: applica-
tions to the docking problem. J. Mol. Biol. 250, 258–275 (1995)
64. Jayaram, B., Fine, R., Sharp, K., Honig, B.: Free energy calculations of ion hydration: an
analysis of the Born model in terms of microscopic simulations. J. Phys. Chem. 93, 4320–4327
(1989)
65. Jorgensen, W.L., Madura, J.D.: Temperature and size dependence for Monte Carlo simulations
of TIP4P water. Mol. Phys. 56, 1381–1392 (1985)
66. Jorgensen, W.L., Maxwell, D.S., Tirado-Rives, J.J.: Development and testing of the OPLS
all-atom force field on conformational energetics and properties of organic liquids. J. Am.
Chem. Soc. 118, 11225–11236 (1996)
67. Jorgensen, W., Tirado-Rives, J.: Free energies of hydration from a generalized Born model
and an all-atom force field. J. Phys. Chem. B 108, 16264–16270 (2004)
68. Juffer, A.H., Botta, E.F.F., Bert, A.M., van Keulen, B.A.M., van der Ploeg, A., Berendsen,
H.J.C.: The electric potential of a macromolecule in a solvent: a fundamental approach. J.
Comput. Phys. 97, 144–171 (1991)
69. Juffer, A.H., Eisenhaber, F., Hubbard, S.J., Walter, D., Argos, P.: Comparison of atomic
solvation parametric sets: applicability and limitations in protein folding and binding. Protein
Sci. 4, 2499–2509 (1995)
70. Kar, P., Wei, Y., Hansmann, U.E., Höfinger, S.: Systematic study of the boundary composition
in Poisson Boltzmann calculations. J. Comput. Chem. 28, 2538–2544 (2007)
71. Karplus, M., McCammon, A.: Molecular dynamics simulations of biomolecules. Nat. Struct.
Biol. 9, 646–652 (2002)
72. Khandogin, J., Brooks III, C.L.: Constant pH molecular dynamics with proton tautomerism.
Biophys. J. 89, 141–157 (2005)
73. Khandogin, J., Chen, J., Brooks III, C.L.: Exploring atomistic details of pH-dependent peptide
folding. PNAS 103, 18546–18550 (2006)
74. Khandogin, J., Brooks III, C.L.: Toward the accurate first-principles prediction of ionization
equilibria in proteins. Biochemistry 45, 9363–9373 (2006)
75. Khandogin, J., Brooks III, C.L.: Molecular simulations of pH-mediated biological processes.
Annu. Rep. Comput. Chem. 3, 3–12 (2007)
76. Kollman, P.: Free energy calculations: applications to chemical and biochemical phenomena.
Chem. Rev. 93, 2395–2417 (1993)
77. Kollman, P., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M., Lee, T., Duan, Y.,
Wang, L., Donini, O., Cieplak, P., Srinivasan, J., Case, D., Cheatham III, T.E.: Calculating
structures and free energies of complex molecules: combining molecular mechanics and
continuum models. Acc. Chem. Res. 33, 889–897 (2000)
78. Kong, X., Brooks III, C.L.: λ-dynamics: a new approach to free energy calculations. J. Chem.
Phys. 105, 2414–2423 (1996)
79. Landau, L.D., Lifshitz, E.M.: Electrodynamics of Continuous Media. V. 8. Course of theo-
retical physics. Translated from the Russian. Pergamon Press, Oxford (1988)
80. Lee, M.R., Duan, Y., Kollman, P.A.: Use of MM-PB/SA in estimating the free energies of
proteins: application to native, intermediates, and unfolded villin headpiece. Proteins 39,
309–316 (2000)
81. Lee, M.S., Feig, M., Salsbury Jr., F.R., Brooks III, C.L.: New analytic approximation to the
standard molecular volume definition and its application to generalized Born calculations. J.
Comput. Chem. 24, 1348–1356 (2003)
82. Lee, M.S., Salsbury Jr., F.R., Brooks III, C.L.: Constant-pH molecular dynamics using con-
tinuous titration coordinates. Proteins 56, 738–752 (2004)
83. Lee, M.S., Olson, M.A.: Protein folding simulations combining self-guided Langevin dynam-
ics and temperature-based replica exchange. J. Chem. Theory Comput. 6, 2477–2487 (2010)
84. Levy, R.M., Belhadj, M., Kitchen, D.B.: Gaussian fluctuation formula for electrostatic free
energy changes in solution. J. Chem. Phys. 95, 3627–3633 (1991)
85. Levy, R.M., Zhang, L.Y., Gallicchio, E., Felts, A.: On the nonpolar hydration free energy of
proteins: surface area and continuum solvent models for the solute-solvent interaction energy.
J. Am. Chem. Soc. 125, 9523–9530 (2003)
86. Loladze, V.V., Makhatadze, G.I.: Energetics of charge-charge interactions between residues
adjacent in sequence. Proteins 79, 3494–3499 (2011)
87. Lounnas, V., Pettitt, B.M., Phillips Jr., B.M.: A global model of protein-water interface.
Biophys. J. 66, 601–614 (1994)
88. Lu, B., Cheng, X.L., Hang, J.F., McCammon, A.: Order N algorithm for computation
of electrostatic interactions in biomolecular systems. Proc. Natl. Acad. Sci. U.S.A. 103,
19314–19319 (2006)
89. Lu, B., McCammon, A.: Improved boundary element method for Poisson-Boltzmann
electrostatic potential and force calculations. J. Chem. Theory Comput. 3, 1134–1142 (2007)
90. Machuqueiro, M., Baptista, A.M.: Constant-pH molecular dynamics with ionic strength
effects: Protonation–Conformation coupling in decalysine. J. Phys. Chem. 110, 2927–2933
(2006)
91. Machuqueiro, M., Baptista, A.M.: Molecular dynamics at constant pH and reduction potential:
application to cytochrome c3. J. Am. Chem. Soc. 131, 12586–12594 (2009)
92. Machuqueiro, M., Baptista, A.M.: Is the prediction of pKa values by the constant-pH molec-
ular dynamics being hindered by inherited problems? Proteins 79, 3437–3447 (2011)
93. Madura, J.D., Davis, M.E., Gilson, M.K., Wade, R.C., Luty, B.A., McCammon, J.A.: Bio-
logical application of electrostatic calculations and Brownian dynamics simulations. Rev.
Comput. Chem. 5, 229–267 (1994)
94. McDowell, S.C., Špackova, N., Šponer, J., Walter, N.G.: Molecular dynamics simulations of
RNA: an in silico single molecule approach. Biopolymers 85, 169–184 (2007)
95. McKenney, A., Greengard, L.: A fast Poisson solver for complex geometries. J. Comput.
Phys. 118, 348–355 (1995)
96. Meyer, T., Kieseritzky, G., Knapp, E.W.: Electrostatic pKa computations in protein: role of
internal cavities. Proteins 79, 3320–3332 (2011). https://doi.org/10.1002/prot.23092
97. Mongan, J., Case, D.A., McCammon, J.A.: Constant pH molecular dynamics in generalized
Born implicit solvent. J. Comput. Chem. 25, 2038–2064 (2004)
98. Mongan, J., Simmerling, C., McCammon, J., Case, D., Onufriev, A.: A generalized Born
model with a simple, robust molecular volume correction. J. Chem. Theory Comput. 3,
156–159 (2007)
99. Mongan, J., Svrcek-Seiler, W.A., Onufriev, A.: Analysis of integral expressions for effective
Born radii. J Chem. Phys. 127, 18510–18521 (2007)
100. Nina, M., Beglov, D., Roux, B.: Atomic radii for continuum electrostatic calculations based
on molecular dynamics free energy simulations. J. Phys. Chem. 101, 5239–5248 (1997)
101. Nina, M., Im, W., Roux, B.: Optimized atomic radii for protein continuum electrostatic solvation
forces. Biophys. Chem. 78, 89–96 (1999)
102. Nielsen, J.E., Gunner, M.R., Garcia-Moreno, B.E.: The pKa Cooperative: a collaborative
effort to advance structure-based calculation of pKa values and electrostatic effects in proteins.
Proteins 79, 3249–3259 (2011)
103. Nozaki, Y., Tanford, C.: Examination of titration behavior. Methods Enzymol. 11, 715–734
(1967)
104. Novotny, J., Brucooleri, R.E., Davis, M., Sharp, K.A.: Empirical free energy calculations: a
blind test and further improvements of the method. J. Mol. Biol. 268, 401–411 (1997)
105. Onufriev, A., Case, D., Bashford, D.: Effective Born radii the generalized Born approximation:
the importance of being perfect. J. Comput. Chem. 23, 1297–1304 (2002)
106. Onufriev, A., Bashford, D., Case, D.: Eploring protein native states and large scale confor-
mational changes with modified generalized Born model. Proteins 55, 383–394 (2004)
107. Onufriev, A.: Implicit solvent models in molecular dynamics simulations: a brief overview.
Annu. Rep. Comp. Chem. 4, 125–137 (2008)
108. Park, B.H., Levitt, M.: Decoys of globular proteins. J. Mol. Biol. 258, 367–392 (1996)
109. Perrot, G.B., Cheng, B., Gibson, K.D., Vila, J., Palmer, K.A., Nayeem, A., Maigret, B.,
Scheraga, H.A.: MSEED: a program for rapid analytical determination of accessible surface
areas and their derivatives. J. Comput. Chem. 13, 1–11 (1992)
110. Pellegrini, E., Field, M.J.: A generalized-born solvation model for macromolecular hybrid-
potential calculations. J. Phys. Chem. A 106, 1316–1326 (2002)
111. Pierotti, R.A.: A scaled particle theory of aqueous and non-aqueous solutions. Chem. Rev.
76, 717–726 (1976)
112. Postma, J.P.M., Berendsen, H.J.C., Haak, J.R.: Thermodynamics of cavity formation in water.
Faraday Symp. Chem. Soc. 17, 55–67 (1982)
113. Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical recipes in C. Cam-
bridge University Press, Cambridge (1988)
114. Radmer, R.J., Kollman, P.A.: Free energy calculation methods: a theoretical and empirical
comparison of numerical errors and a new method for qualitative estimates of free energy
changes. J. Comput. Chem. 18, 902–919 (1997)
115. Rashin, A.A.: Hydration phenomena, classical electrostatics, and the boundary element
method. J. Phys. Chem. 94, 1725–1733 (1990)
116. Rashin, A.A., Young, L., Topol, I.A.: Quantitative evaluation of hydration thermodynamics
with continuum model. Biophys. Chem. 51, 359–374 (1994)
117. Richards, F.M.: Areas, volume, packing and protein structures. Annu. Rev. Biophys. Biophys.
Chem. 19, 301–332 (1977)
118. Rick, S.W., Berne, B.J.: The aqueous solvation of water: a comparison of continuum methods
with molecular dynamics. J. Am. Chem. Soc. 116, 3949–3954 (1994)
119. Ripoll, D.R., Vorobjev, Y.N., Liwo, A., Vila, J.A., Scheraga, H.A.: Coupling between folding
and ionization equilibria: effect of pH on the conformational preferences of polypeptides. J.
Mol. Biol. 264, 770–783 (1996)
120. Rocchia, W., Sridharan, S., Nicholls, A., Alexov, E., Chiabrera, A., Honig, B.: Rapid grid-
based construction of the molecular surface and the use of induced surface charge to calculate
reaction field energies: applications to the molecular systems and geometric objects. J. Com-
put. Chem. 23, 128–137 (2002)
121. Roux, B., Yu, H.A., Karplus, M.: Molecular basis for the Born model of ion solvation. J. Phys.
Chem. 94, 4683–4688 (1990)
122. Roux, B., Simonson, T.: Implicit solvent models. Biopys. Chem. 78, 1–20 (1999)
123. Sanner, M.F., Olson, A.J., Spehner, J.C.: Reduced surface: an efficient way to compute molec-
ular surfaces. Biopolymers 38, 305–320 (1996)
124. Schaefer, M., Froemmel, C.: A precise analytical method for calculating the electrostatic
energy of macromolecules in aqueous solution. J. Mol. Biol. 216, 1045–1066 (1990)
125. Sharp, K.A., Honig, B.: Electrostatic interactions in macromolecules: theory and applications.
Annu. Rev. Biophys. Chem. 19, 301–332 (1990)
126. Schellman, J.A.: Macromolecular binding. Biopolymers 14, 999–1018 (1975)
127. Scheraga, H.A.: Theory of hydrophobic interactions. J. Biomol. Struct. Dyn. 16, 447–460
(1998)
128. Simmerling, C., Strockbine, B., Roitberg, A.E.: All-atom structure prediction and folding
simulations of a stable protein. J. Am. Chem. Soc. 124, 11258–11259 (2002)
Modeling of Electrostatic Effects in Macromolecules 201
129. Simonson, T., Brünger, A.: Solvation free energies estimated from macroscopic continuum
theory: an accuracy assessment. J. Phys. Chem. 98, 4683–4694 (1994)
130. Sitkoff, D., Sharp, K.A., Honig, B.: Accurate calculation of hydration free energies using
macroscopic solvent models. J. Phys. Chem. 98, 1978–1988 (1994)
131. Sobolevski, E., Makowski, M., Czaplewski, C., Liwo, A., Oldziej, S., Scheraga, H.A.: Poten-
tial of mean force of hydrophobic association: dependence on solute Size. J. Phys. Chem. B
111, 10765–10774 (2007)
132. Song, W., Mao, J., Gunner, M.R.: MCCE2: Improved protein pKa calculations with extensive
side chain rotamer sampling. J. Comput. Chem. 30, 2231–2247 (2011)
133. Srinivasan, J., Cheatham, T.E., Cieplak, P., Kollman, P.A., Case, D.A.: Continuum solvent
studies of stability of DNA, RNA and phosphoramide DNA helicases. J. Am. Chem. Soc.
120, 9401–9409 (1998)
134. Stanton, C., Houk, K.: Benchmarking pKa prediction methods for residues in proteins. J.
Chem. Theory Comput. 3, 951–966 (2008)
135. Still, W.C., Tempczyk, A., Hawley, R.C., Hendricson, T.: Semianalytical treatment of solvation
for molecular mechanics and dynamics. J. Am. Chem. Soc. 112, 6127–6129 (1990)
136. Strickler, S.S., Gribenko, A.V., Keiffer, T.R., Tomlinson, J., Reihle, T., Loladze, V.V.,
Makhatadze, G.I.: Protein stability and surface electrostatics: a charged relationship. Biochm-
istry 45, 2761–2766 (2006)
137. Tanford, C.: Protein denaturation: part C. Theoretical models for denaturation. Adv. Protein
Chem. 24, 1–95 (1970)
138. Tanford, C., Roxby, R.: The interpretation of protein titration curves. Application to lysozyme.
Biochemistry 11, 2192–2198 (1972)
139. Tanokura, M.: 1 H-NMR study of the tautomerism of the imidazole ring of histidine residues:
1. Microscopic pK values and molar ratios of tautomers in histidine containing peptides.
Biochim. Biophys. Acta 742, 576–585 (1983)
140. Teixeira, V.H., Cunha, C.A., Machuqueiro, M., Oliveira, A.S.V., Victor, B.L., Soares, C.M.,
Baptista, A.A.: On the use of different dielectric constants for computing individual and
pairwise terms in Poisson-Bolzman studies of protein ionization equilibrium. J. Phys Chem
B 109, 14691–14706 (2005)
141. Tomasi, J., Persico, M.: Molecular interactions in solution: overview of methods based on
continuum distribution of the solvent. Chem. Rev. 94, 2027–2094 (1994)
142. Varshney, A., Brooks, F.P., Wright, W.V.: Computing smooth molecular surface. IEEE Com-
put. Graph. Appl. 14, 19–25 (1994)
143. Vila, j, Ripoll, D.R., Arnautova, Y.A., Vorobjev, Y.N., Scheraga, H.A.: Coupling between
conformation and proton binding in proteins. Proteins 61, 56–68 (2005)
144. Vorobjev, Y.N., Grant, J.A., Scheraga, H.A.: A combined iterative and boundary element
approach for solution of the nonlinear Poisson-Boltzmann equation. J. Am. Chem. Soc. 114,
3189–3196 (1992)
145. Vorobjev, Y.N., Scheraga, H.A.: A fast adaptive multigrid boundary element method for
macromolecular electrostatics in a solvent. J. Comput. Chem. 18, 569–583 (1997)
146. Vorobjev, Y.N., Hermans, J.: SIMS, computation of a smooth invariant molecular surface.
Biophys. J. 73, 722–732 (1997)
147. Vorobjev, Y.N., Almagro, J.C., Hermans, J.: Discrimination between native and intentionally
misfolded conformation of proteins: ES/IS, new method for calculating conformational free
energy that uses both dynamic s simulations with an explicit solvent and implicit solvent
continuum model. Proteins 32, 399–413 (1998)
148. Vorobjev, Y.N., Hermans, J.: ES/IS: estimation of conformational free energy by combining
dynamics simulations with explicit solvent with an implicit solvent continuum model. Biopys.
Chem. 78, 195–205 (1999)
149. Vorobjev, Y.N., Hermans, J.: Free energies of protein decoys provide insight into determinant
of protein stability. Protein Sci. 10, 2498–2506 (2001)
150. Vorobjev, Y.N., Vila, J., Scheraga, H.A.: FAMBE-pH: a fast and accurate method to compute
the total solvation free energies of proteins. J. Phys. Chem. B 112, 11122–11136 (2008)
202 Y. N. Vorobjev
151. Vorobjev, Y.N.: Blind docking method combining search of low-resolution binding sites with
ligand pose refinement by molecular dynamics-based global optimization. J. Comput. Chem.
31, 1080–1092 (2010)
152. Vorobjev, Y.N.: Advances in implicit models of water solvent to compute conformational free
energy and molecular dynamics of proteins at constant pH. Adv. Protein Chem. Struct. Biol.
85, 282–322 (2011)
153. Vorobjev, Y.N.: Potential of mean force of water-proton bath and molecular dynamic simula-
tion of proteins at constant pH. J. Comput. Chem. 33, 832–842 (2012)
154. Wagoner, J., Baker, N.: Assessing implicit models for nonpolar mean solvation forces: the
importance of dispersion and volume terms. Proc. Nat. Acad. Sci. U.S.A. 103, 8331–8336
(2006)
155. Wang, J., Cieplak, P., Kollman, P.A.: How well does a restrained electrostatic potential (RESP)
model perform in calculating conformational energies of organic and biological molecules?
J. Comput. Chem. 21, 1049–1074 (2000)
156. Wallace, J.A., Shen, J.K.: Predicting pKa values with continuous constant pH molecular
dynamics. Methods Enzymol. 466, 455–475 (2009)
157. Wallqvist, W., Berne, B.J.: Molecular dynamics study of the dependence of water solvation
free energy on solute curvature and surface area. J. Phys. Chem. 99, 2885–2892 (1995)
158. Wallqvist, W., Berne, B.J.: Computer simulation of hydrophobic hydration forces on stacked
plates at short range. J. Phys. Chem. 99, 2893–2899 (1995)
159. Williams, S.L., Oliveira, C.A.F., McCammon, J.A.: Coupling constant pH molecular dynamics
with accelerated molecular dynamics. J. Chem. Theory. Comput. 6, 560–568 (2010)
160. Wihtam, S., Talley, K., Wang, L., Zhang, Z., Sarkar, S., Gao, D., Yang, W., Alexov, E.:
Developing of hybrid approaches to predict pKa values of ionizable groups. Proteins 79,
3389–3399 (2011)
161. Wroblewska, L., Skolnick, J.: Can a physics-based, all-atom potential find a protein’s native
structure among misfolded structures? I. Large scale AMBER benchmarking. J. Comput.
Chem. 28, 2059–2066 (2007)
162. Yang, S.A., Honig, B.: On the pH dependence of protein stability. J. Mol. Biol. 231, 459–474
(1993)
163. Yoon, B.J., Lenhoff, A.M.: A boundary element method for molecular electrostatics with
electrolyte effects. J. Comput. Chem. 11, 1080–1086 (1990)
164. Zauhar, R.J.: SMATR: a solvent-accessible triangulated surface generator for molecular graph-
ics and boundary element applications. J. Comput. Aided Mol. Des. 9, 149–159 (1995)
165. Zauhar, R.J., Varnek, A.A.: Fast and space-efficient boundary element method for computing
electrostatics and hydration effects in large molecules. J. Comput. Chem. 17, 864–877 (1996)
166. Zhang, Y., Skolnick, J.: Automated structure prediction of weakly gomologous proteins on a
genomic scale. Proc. Natl. Acad. Sci. U.S.A. 101, 7594–7599 (2003)
167. Zhou, Z., Payne, P., Vasquez, M., Kuhn, N., Levitt, M.: Finite-difference solution of the
Poisson-Boltzmann equation: complete elimination of self-energy. J. Comput. Chem. 17,
1344–1353 (1996)
168. Zhou, Y.C., Feig, M., Wei, G.W.: Higly accurate biomolecular electrostatics in continuum
dielectric environments. J. Comput. Chem. 29, 87–97 (2008)
Optimizations of Protein Force Fields
Abstract In this chapter we review our work on force fields for molecular simulations of protein systems. We first discuss the functional forms of the force fields and present some extensions of the conventional ones. We then present various methods for force-field parameter optimization. Finally, we give some examples of applying these parameter optimization methods and compare the results with those from existing force fields.
Y. Sakae
Department of Theoretical and Computational Molecular Science, Institute for Molecular Science, Okazaki, Aichi 444-8585, Japan

Y. Sakae · Y. Okamoto (corresponding author)
Department of Physics, Graduate School of Science, Nagoya University, Nagoya, Aichi 464-8602, Japan

Y. Okamoto
Structural Biology Research Center, Graduate School of Science, Nagoya University, Nagoya, Aichi 464-8602, Japan

Y. Okamoto
Center for Computational Science, Graduate School of Engineering, Nagoya University, Nagoya, Aichi 464-8603, Japan

Y. Okamoto
Information Technology Center, Nagoya University, Nagoya, Aichi 464-8601, Japan

© Springer Nature Switzerland AG 2019
A. Liwo (ed.), Computational Methods to Study the Structure and Dynamics of Biomolecules and Biomolecular Processes, Springer Series on Bio- and Neurosystems 8, https://doi.org/10.1007/978-3-319-95843-9_7

1 Introduction
Computer simulations of protein folding into native structures can be achieved when both of the following requirements are met: (1) the potential energy functions (or force fields) for the protein systems are sufficiently accurate and (2) sufficiently powerful conformational sampling methods are available. Professor Harold A. Scheraga has been one of the most important pioneers in studies of both of the above requirements [1, 2]. With the development of the generalized-ensemble algorithms (for reviews, see, e.g., Refs. [3–6]) and related methods, Requirement (2) seems to be almost fulfilled. In this chapter, we therefore concentrate our attention on Requirement (1).
There are several well-known all-atom (or united-atom) force fields, such as AMBER [7–11], CHARMM [12–14], OPLS [15, 16], GROMOS [17, 18], GROMACS [19, 20], and ECEPP [21, 22]. Generally, the force-field parameters are determined from experimental results for small molecules and from quantum chemistry calculations of small peptides such as alanine dipeptide. However, simulations using different force-field parameters give different results. We have performed detailed comparisons of three versions of AMBER (ff94 [7], ff96 [8], and ff99 [9]), CHARMM [12], OPLS-AA/L [16], and GROMOS [17] by generalized-ensemble simulations of two small peptides in explicit solvent [23, 24]. These force fields showed clearly different behaviors, especially with respect to secondary-structure-forming tendencies. Folding simulations of the two peptides with an implicit solvent model showed similar results [25–27]. For instance, the ff94 [7] and ff96 [8] versions of AMBER yield very different secondary-structure-forming tendencies, although these force fields differ only in the main-chain torsion-energy terms. Many researchers have thus studied the main-chain torsion-energy terms and their force-field parameters. For example, newer force-field parameters for the main-chain torsion-energy terms for the φ and ψ angles have been developed, e.g., AMBER ff99SB [10], AMBER ff03 [11], CHARMM22/CMAP [13, 14], and OPLS-AA/L [16]. Force-field optimization methods thus mainly concentrate on the torsion-energy terms. These modifications of the torsion energy are usually based on quantum chemistry calculations [13, 14, 28–31] or NMR experimental results [32, 33].
We have proposed a new main-chain torsion-energy term, which is represented
by a double Fourier series in two variables, the main-chain dihedral angles φ and
ψ [34, 35]. This expression gives a natural representation of the torsion energy in
the Ramachandran space [36] in the sense that any two-dimensional energy surface
periodic in both φ and ψ can be expanded by the double Fourier series. We can
then easily control secondary-structure-forming tendencies by modifying the main-
chain torsion-energy surface. We have presented preliminary results for AMBER
ff94 and AMBER ff96 [34, 35].
Moreover, we have introduced several optimization methods for force-field parameters [25–27, 38, 39]. These methods are based on the minimization of score functions by simulations in the force-field parameter space, where the score functions are derived from the protein coordinate data in the Protein Data Bank (PDB). Our methods differ from most previous knowledge-based optimization methods in two main respects: we use only the PDB data, without introducing decoys as in, e.g., the Z-score method [37], and we estimate our score functions from a larger number of larger proteins rather than from one or a few small peptides such as alanine dipeptide. One of the score functions is the sum of the squares of the forces acting on each atom in the proteins with the structures from the PDB [25–27]. Other score functions are taken from the root-mean-square deviations between the original PDB structures and the corresponding minimized structures [38, 39].
We have also proposed a new type of main-chain torsion-energy term for protein systems, which can have amino-acid-dependent force-field parameters [40]. As an example, we applied this formulation to the AMBER ff03 force field and determined new amino-acid-dependent main-chain torsion-energy parameters for ψ (N–Cα–C–N) and ψ′ (Cβ–Cα–C–N) by using our optimization method in Refs. [25–27].
In this chapter, we review our works on protein force fields. In Sect. 2 the details of
the new main-chain torsion-energy terms and the methods for refinements of force-
field parameters are given. In Sect. 3 examples of the applications of these methods
are presented. Section 4 is devoted to conclusions.
2 Methods
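The all-atom force fields for protein systems such as AMBER, CHARMM, OPLS, and ECEPP use essentially the same functional forms for the potential energy except for minor differences. The commonly used total conformational potential energy $E_{\rm conf}$ is given by

$$E_{\rm conf} = E_{\rm BL} + E_{\rm BA} + E_{\rm torsion} + E_{\rm nonbond} , \tag{1}$$

where

$$E_{\rm BL} = \sum_{{\rm bond\ length}\ \ell} K_{\ell} \left(\ell - \ell_{\rm eq}\right)^2 , \tag{2}$$

$$E_{\rm BA} = \sum_{{\rm bond\ angle}\ \theta} K_{\theta} \left(\theta - \theta_{\rm eq}\right)^2 , \tag{3}$$

$$E_{\rm torsion} = \sum_{{\rm dihedral\ angle}\ \Phi} \sum_{n} \frac{V_n}{2} \left[1 + \cos\left(n\Phi - \gamma_n\right)\right] , \tag{4}$$

$$E_{\rm nonbond} = \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{332\, q_i q_j}{\varepsilon\, r_{ij}} \right] . \tag{5}$$

Here, $E_{\rm BL}$, $E_{\rm BA}$, and $E_{\rm torsion}$ represent the bond-stretching term, the bond-bending term, and the torsion-energy term, respectively, and $E_{\rm nonbond}$ represents the Lennard-Jones and Coulomb terms between pairs of atoms, i and j, separated by the distance $r_{ij}$ (in Å). The parameters $A_{ij}$ and $B_{ij}$ in Eq. (5) are the coefficients for the Lennard-Jones term, $q_i$ (in units of electronic charges) is the partial charge of the i-th atom, and ε is the dielectric constant, where we usually set ε = 1 (the value in vacuum). The factor 332 in the electrostatic term is a constant to express energy in units of kcal/mol. Hence, we have five classes of force-field parameters, namely, those in the bond-stretching term ($K_{\ell}$ and $\ell_{\rm eq}$), those in the bond-bending term ($K_{\theta}$ and $\theta_{\rm eq}$), those in the torsion term ($V_n$ and $\gamma_n$), those in the Lennard-Jones term ($A_{ij}$ and $B_{ij}$), and those in the electrostatic term ($q_i$).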
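As a minimal sketch of how the torsion and non-bonded terms in Eqs. (4) and (5) are evaluated, the following Python functions compute the contribution of a single dihedral angle and of a single atom pair. The function names and the numerical inputs are hypothetical and chosen only for illustration; they are not taken from any force field.

```python
import math

def torsion_energy(phi, terms):
    """Eq. (4) for one dihedral angle: sum_n (V_n/2) [1 + cos(n*phi - gamma_n)].

    phi is in radians; terms is a list of (n, half_v, gamma) tuples,
    where half_v = V_n/2 in kcal/mol and gamma = gamma_n in radians.
    """
    return sum(half_v * (1.0 + math.cos(n * phi - gamma))
               for n, half_v, gamma in terms)

def nonbond_pair_energy(r, a_ij, b_ij, q_i, q_j, eps=1.0):
    """Eq. (5) for one atom pair: A/r^12 - B/r^6 + 332 q_i q_j / (eps r)."""
    lennard_jones = a_ij / r**12 - b_ij / r**6
    coulomb = 332.0 * q_i * q_j / (eps * r)
    return lennard_jones + coulomb

# Illustrative inputs only (not real force-field parameters)
e_tor = torsion_energy(0.0, [(1, 2.0, 0.0)])             # single n = 1 term
e_coul = nonbond_pair_energy(1.0, 0.0, 0.0, 1.0, 1.0)    # two unit charges at 1 Å
e_lj = nonbond_pair_energy(1.0, 1.0, 2.0, 0.0, 0.0)      # pure Lennard-Jones pair
```

The total energy is obtained by summing such contributions over all dihedral angles and over all pairs i < j.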
Equation (1) represents a standard set of the potential energy terms. As mentioned
above, there are minor differences in the energy functions among different force
fields. For instance, the Urey-Bradley term is used in CHARMM and OPLS, but
not in AMBER. In our parameter refinement methods, we try to optimize a certain
set of parameters in the existing force fields without changing the functional forms.
Therefore, if the original force field has non-standard terms, then the optimized one
also has them.
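Separating the contributions E(φ, ψ) of the backbone dihedral angles φ and ψ from the rest of the torsion terms $E_{\rm rest}$, we can write the torsion-energy term in Eq. (4) as

$$E_{\rm torsion} = E(\phi, \psi) + E_{\rm rest} , \tag{6}$$

where we have

$$E(\phi, \psi) = \sum_{m} \frac{V_m}{2} \left[1 + \cos\left(m\phi - \gamma_m\right)\right] + \sum_{n} \frac{V_n}{2} \left[1 + \cos\left(n\psi - \gamma_n\right)\right] . \tag{7}$$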
For example, the coefficients for six force fields, namely, AMBER parm94, AMBER parm96, AMBER parm99, CHARMM27, OPLS-AA, and OPLS-AA/L, are summarized in Table 1, and we can explicitly write E(φ, ψ) in Eq. (7) as follows:

$$E_{\rm parm94}(\phi, \psi) = 2.7 - 0.2 \cos 2\phi - 0.75 \cos \psi - 1.35 \cos 2\psi - 0.4 \cos 4\psi , \tag{8}$$

$$E_{\rm parm96}(\phi, \psi) = 2.3 + 0.85 \cos \phi - 0.3 \cos 2\phi + 0.85 \cos \psi - 0.3 \cos 2\psi , \tag{9}$$

$$E_{\rm parm99}(\phi, \psi) = 5.35 + 0.8 \cos \phi - 0.85 \cos 2\phi - 1.7 \cos \psi - 2.0 \cos 2\psi , \tag{10}$$

$$E_{\rm CHARMM}(\phi, \psi) = 0.8 - 0.2 \cos \phi + 0.6 \cos \psi , \tag{11}$$

$$E_{\rm OPLS\text{-}AA}(\phi, \psi) = 1.158 - 1.1825 \cos \phi - 0.456 \cos 2\phi - 0.425 \cos 3\phi + 0.908 \cos \psi - 0.611 \cos 2\psi + 0.7905 \cos 3\psi , \tag{12}$$

$$E_{\rm OPLS\text{-}AA/L}(\phi, \psi) = -1.392 - 0.298 \cos \phi - 0.1395 \cos 2\phi - 2.4565 \cos 3\phi + 0.3715 \cos \psi - 1.254 \cos 2\psi - 0.4025 \cos 3\psi . \tag{13}$$

Table 1 Torsion-energy parameters for the backbone dihedral angles φ and ψ for AMBER parm94, AMBER parm96, AMBER parm99, CHARMM27, OPLS-AA, and OPLS-AA/L in Eq. (7). V_m/2 and V_n/2 are in kcal/mol; γ_m and γ_n are in radians

              φ                             ψ
Force field   m   V_m/2      γ_m            n   V_n/2      γ_n
parm94        2    0.2       π              1    0.75      π
                                            2    1.35      π
                                            4    0.4       π
parm96        1    0.85      0              1    0.85      0
              2    0.3       π              2    0.3       π
parm99        1    0.8       0              1    1.7       π
              2    0.85      π              2    2.0       π
charmm        1    0.2       π              1    0.6       0
opls-aa       1   −1.1825    0              1    0.908     0
              2    0.456     π              2    0.611     π
              3   −0.425     0              3    0.7905    0
opls-aal      1   −0.298     0              1    0.3715    0
              2    0.1395    π              2    1.254     π
              3   −2.4565    0              3   −0.4025    0
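The explicit cosine expansions of E(φ, ψ) follow mechanically from the entries of Table 1, because every phase is either 0 or π, so that cos(nx − γ) = ± cos nx. The following Python sketch performs this expansion; the dictionary layout and the function names are our own illustrative choices, and only the numerical values come from Table 1.

```python
import math

PI = math.pi
# (n, V_n/2 in kcal/mol, gamma_n in radians), transcribed from Table 1
TABLE1 = {
    "parm94": {"phi": [(2, 0.2, PI)],
               "psi": [(1, 0.75, PI), (2, 1.35, PI), (4, 0.4, PI)]},
    "parm96": {"phi": [(1, 0.85, 0.0), (2, 0.3, PI)],
               "psi": [(1, 0.85, 0.0), (2, 0.3, PI)]},
    "parm99": {"phi": [(1, 0.8, 0.0), (2, 0.85, PI)],
               "psi": [(1, 1.7, PI), (2, 2.0, PI)]},
    "charmm": {"phi": [(1, 0.2, PI)],
               "psi": [(1, 0.6, 0.0)]},
    "opls-aa": {"phi": [(1, -1.1825, 0.0), (2, 0.456, PI), (3, -0.425, 0.0)],
                "psi": [(1, 0.908, 0.0), (2, 0.611, PI), (3, 0.7905, 0.0)]},
    "opls-aal": {"phi": [(1, -0.298, 0.0), (2, 0.1395, PI), (3, -2.4565, 0.0)],
                 "psi": [(1, 0.3715, 0.0), (2, 1.254, PI), (3, -0.4025, 0.0)]},
}

def expand(terms):
    """Rewrite sum (V/2)[1 + cos(n x - gamma)] as const + sum_n coeff_n cos(n x).

    Valid only when every gamma is 0 or pi, as is the case in Table 1.
    """
    const = 0.0
    coeffs = {}
    for n, half_v, gamma in terms:
        sign = round(math.cos(gamma))      # +1 for gamma = 0, -1 for gamma = pi
        const += half_v
        coeffs[n] = coeffs.get(n, 0.0) + sign * half_v
    return const, coeffs

def expanded_form(name):
    """Return (constant, cos(m phi) coefficients, cos(n psi) coefficients)."""
    c_phi, phi_coeffs = expand(TABLE1[name]["phi"])
    c_psi, psi_coeffs = expand(TABLE1[name]["psi"])
    return c_phi + c_psi, phi_coeffs, psi_coeffs

const94, phi94, psi94 = expanded_form("parm94")
```

For AMBER parm94, for instance, the constant term and the cosine coefficients returned by this sketch agree with Eq. (8).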
The backbone torsion-energy term E(φ, ψ) in Eq. (7) is a sum of two one-dimensional Fourier series: one for φ and the other for ψ. The two variables φ and ψ are decoupled, and no correlation between φ and ψ can be incorporated. On the other hand, any periodic function of φ and ψ with period 2π can be expanded in a double Fourier series. As a simple generalization of E(φ, ψ), we therefore proposed to express this backbone torsion energy by the following double Fourier series [34, 35]:

$$\mathcal{E}(\phi, \psi) = a + \sum_{m=1}^{\infty} \left(b_m \cos m\phi + c_m \sin m\phi\right) + \sum_{n=1}^{\infty} \left(d_n \cos n\psi + e_n \sin n\psi\right) + \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} \left(f_{mn} \cos m\phi \cos n\psi + g_{mn} \cos m\phi \sin n\psi + h_{mn} \sin m\phi \cos n\psi + i_{mn} \sin m\phi \sin n\psi\right) . \tag{14}$$

The Fourier coefficients c are given by

$$c = \frac{1}{\alpha} \int_{-\pi}^{\pi} d\phi \int_{-\pi}^{\pi} d\psi \; \mathcal{E}(\phi, \psi)\, x(\phi, \psi) , \tag{15}$$

where α are the normalization constants and x(φ, ψ) are the basis functions for the Fourier series. Table 2 summarizes these coefficients and functions. Here, φ and ψ are given in radians, and φ̃ and ψ̃ are in degrees (φ = (π/180) φ̃, ψ = (π/180) ψ̃). Hereafter, angular quantities without tilde and with tilde are in radians and in degrees, respectively.
Finally, $\mathcal{E}(\phi, \psi)$ in Eq. (14) and $E_{\rm rest}$ in Eq. (6) define our torsion-energy term in Eq. (1) [instead of Eq. (4)]:

$$E_{\rm torsion} = \mathcal{E}(\phi, \psi) + E_{\rm rest} . \tag{16}$$
The double Fourier series in Eq. (14) is particularly useful, because it describes the backbone torsion-energy surface in the Ramachandran space. The Fourier series can express the torsion-energy surface $\mathcal{E}(\phi, \psi)$ that was obtained by any method including quantum chemistry calculations [13, 14, 16, 28–31].
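To illustrate how the Fourier coefficients of a given energy surface can be obtained in practice, the sketch below projects a surface onto the basis functions of Table 2 by numerical quadrature on an equally spaced grid. It assumes that each coefficient is the usual orthogonal projection, c = (1/α) ∫∫ 𝓔 x dφ dψ over one period, with the normalization constants α of Table 2; the function names and the grid size are hypothetical choices. The AMBER parm94 surface of Eq. (8) serves as a test case, because its coefficients are known.

```python
import math

def fourier_coefficient(surface, basis, alpha, n_grid=64):
    """Approximate (1/alpha) * integral of surface*basis over [-pi, pi)^2.

    An equally spaced rectangle rule is used; for a trigonometric polynomial
    whose highest frequency is below n_grid it is exact up to rounding.
    """
    h = 2.0 * math.pi / n_grid
    total = 0.0
    for i in range(n_grid):
        phi = -math.pi + i * h
        for j in range(n_grid):
            psi = -math.pi + j * h
            total += surface(phi, psi) * basis(phi, psi)
    return total * h * h / alpha

def e_parm94(phi, psi):
    """Backbone torsion-energy surface of AMBER parm94, Eq. (8)."""
    return (2.7 - 0.2 * math.cos(2 * phi) - 0.75 * math.cos(psi)
            - 1.35 * math.cos(2 * psi) - 0.4 * math.cos(4 * psi))

# Normalization constants alpha taken from Table 2
a = fourier_coefficient(e_parm94, lambda p, s: 1.0, 4 * math.pi**2)
b2 = fourier_coefficient(e_parm94, lambda p, s: math.cos(2 * p), 2 * math.pi**2)
d1 = fourier_coefficient(e_parm94, lambda p, s: math.cos(s), 2 * math.pi**2)
f11 = fourier_coefficient(e_parm94, lambda p, s: math.cos(p) * math.cos(s),
                          math.pi**2)
```

Because parm94 has no φ–ψ coupling, the cross coefficient f11 vanishes, whereas a, b2, and d1 reproduce the constant and the cos 2φ and cos ψ coefficients of Eq. (8). The same routine can be applied to a surface tabulated from any other source.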
Moreover, one can refine the existing backbone torsion-energy term and control the secondary-structure-forming tendencies of the force fields. For example, α-helix is obtained for (φ̃, ψ̃) ≈ (−57°, −47°), 3₁₀-helix for (φ̃, ψ̃) ≈ (−49°, −26°), π-helix for (φ̃, ψ̃) ≈ (−57°, −70°), parallel β-sheet for (φ̃, ψ̃) ≈ (−119°, 113°), antiparallel β-sheet for (φ̃, ψ̃) ≈ (−139°, 135°), and so on [36]. Hence, if the existing force field gives, say, too little α-helix-forming tendency compared to experimental results, one can lower the backbone torsion-energy surface near (φ̃, ψ̃) = (−57°, −47°) in order to enhance α-helix formation.

Table 2 Fourier coefficients c, normalization constants α, and the basis functions x(φ, ψ) for the double Fourier series of the backbone torsion energy 𝓔(φ, ψ) in Eqs. (14) and (15)

c       α       x(φ, ψ)
a       4π²     1
b_m     2π²     cos mφ
c_m     2π²     sin mφ
d_n     2π²     cos nψ
e_n     2π²     sin nψ
f_mn    π²      cos mφ cos nψ
g_mn    π²      cos mφ sin nψ
h_mn    π²      sin mφ cos nψ
i_mn    π²      sin mφ sin nψ
We can thus write

$$\mathcal{E}(\phi, \psi) = E(\phi, \psi) - f(\phi, \psi) , \tag{17}$$

where E(φ, ψ) is the existing backbone torsion-energy term that we want to refine and f(φ, ψ) is a function that has peaks around the regions where specific secondary structures are to be enhanced. There are many possible choices for f(φ, ψ). For instance, one can use the following function when one wants to lower the torsion-energy surface in a single region near (φ, ψ) = (φ₀, ψ₀):

$$f(\phi, \psi) = \begin{cases} A \exp\left( \dfrac{B}{(\phi - \phi_0)^2 + (\psi - \psi_0)^2 - r_0^{\,2}} \right) , & \text{for } (\phi - \phi_0)^2 + (\psi - \psi_0)^2 < r_0^{\,2} , \\[2ex] 0 , & \text{otherwise} , \end{cases} \tag{18}$$

where A, B, and r₀ are constants that we adjust for refinement. In this case, the energy surface is lowered by f(φ, ψ) in a circular region of radius r₀, which is centered at (φ, ψ) = (φ₀, ψ₀). Note that we should also impose periodic boundary conditions on f(φ, ψ).
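A minimal Python sketch of the function in Eq. (18) is given below. Periodic boundary conditions are imposed here by wrapping the angle differences into [−π, π), which is one possible choice and assumes r₀ < π. The names wrap, bump, amp, and width (standing for A and B) and the numerical values of A, B, and r₀ are hypothetical and serve only as an illustration.

```python
import math

def wrap(delta):
    """Map an angle difference (radians) into the interval [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi

def bump(phi, psi, phi0, psi0, amp, width, r0):
    """f(phi, psi) of Eq. (18), with amp = A and width = B, made 2*pi-periodic."""
    d2 = wrap(phi - phi0) ** 2 + wrap(psi - psi0) ** 2
    if d2 < r0 ** 2:
        # For width > 0 the exponent is negative and tends to -infinity at the
        # edge of the disc, so f goes smoothly to zero there.
        return amp * math.exp(width / (d2 - r0 ** 2))
    return 0.0

# Illustrative settings: centre on the alpha-helix region, radius of 30 degrees
phi0, psi0 = math.radians(-57.0), math.radians(-47.0)
amp, width, r0 = 1.0, 0.5, math.radians(30.0)
center_value = bump(phi0, psi0, phi0, psi0, amp, width, r0)
```

At the centre of the disc the function takes its maximum value A exp(−B/r₀²), and it vanishes identically outside the disc.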
We then express $\mathcal{E}(\phi, \psi)$ in Eq. (17) in terms of the double Fourier series in Eq. (14), where the Fourier coefficients are obtained from Eq. (15). Hence, we can fine-tune the backbone torsion-energy term by the above procedure so that it yields correct secondary-structure-forming tendencies.
A remark about the computation time is now in order. It may appear that the introduction of the double Fourier series greatly increases the computation time, because the number of terms is much larger. However, because most of the computation time for the force-field evaluations is spent in the calculations of distances between pairs of atoms in the system, the increase in computation time due to the double Fourier series is essentially negligible compared to these main computational efforts.
The torsion-energy term is written as

$$E_{\rm torsion} = \sum_{\Phi} \sum_{n} \frac{V_n(\Phi)}{2} \left[1 + \cos\left(n\Phi - \gamma_n(\Phi)\right)\right] , \tag{19}$$

where the first summation is taken over all dihedral angles Φ (both in the main chain and in the side chains), n is the number of waves, $\gamma_n$ is the phase, and $V_n$ is the Fourier coefficient. Namely, the energy term $E_{\rm torsion}$ has $\gamma_n(\Phi)$ and $V_n(\Phi)$ as force-field parameters.
We can decompose this term as

$$E_{\rm torsion} = E_{\rm torsion}^{\rm (MC)} + E_{\rm torsion}^{\rm (SC)} , \tag{20}$$

where $E_{\rm torsion}^{\rm (MC)}$ and $E_{\rm torsion}^{\rm (SC)}$ are the torsion-energy terms for dihedral angles around main-chain bonds and around side-chain bonds, respectively. Examples of the dihedral angles in $E_{\rm torsion}^{\rm (MC)}$ are φ (C–N–Cα–C), ψ (N–Cα–C–N), φ′ (Cβ–Cα–N–C), ψ′ (Cβ–Cα–C–N), and ω (Cα–C–N–Cα). The force-field parameters in $E_{\rm torsion}^{\rm (SC)}$ can readily depend on amino-acid residues. However, those in $E_{\rm torsion}^{\rm (MC)}$ are usually taken to be independent of amino-acid residues, and common parameter values are used for all the amino-acid residues (except for proline). This is because the amino-acid dependence of the force field is believed to be taken care of by the very existence of side chains. In Table 3, we list examples of the parameter values for ψ (N–Cα–C–N) and ψ′ (Cβ–Cα–C–N) in the original AMBER force fields.
Table 3 Torsion-energy parameters (V_n and γ_n) for the main-chain dihedral angles ψ and ψ′ in Eq. (19) for the original AMBER ff94, ff96, ff99, ff99SB, and ff03 force fields. The values are common among the amino-acid residues for each force field. Only the parameters for non-zero V_n are listed

              ψ (N–Cα–C–N)             ψ′ (Cβ–Cα–C–N)
Force field   n   V_n/2     γ_n        n   V_n/2     γ_n
ff94          1   0.75      π          2   0.07      0
              2   1.35      π          4   0.10      0
              4   0.40      π
ff96          1   0.85      0          2   0.07      0
              2   0.30      π          4   0.10      0
ff99          1   1.70      π          2   0.07      0
              2   2.00      π          4   0.10      0
ff99SB        1   0.45      π          1   0.20      0
              2   1.58      π          2   0.20      0
              3   0.55      π          3   0.40      0
ff03          1   0.6839    π          1   0.7784    π
              2   1.4537    π          2   0.0657    π
              3   0.4615    π          3   0.0560    0

However, this amino-acid independence of the main-chain torsion-energy terms is not an absolute requirement, because we are representing the entire force field by a rather small number of classical-mechanical terms. In order to reproduce the exact quantum-mechanical contributions, one can introduce amino-acid dependence in any force-field term, including the main-chain torsion-energy terms. Hence, we can generalize $E_{\rm torsion}^{\rm (MC)}$ in Eq. (20) from the expression in Eq. (19) to the following amino-acid-dependent form:

$$E_{\rm torsion}^{\rm (MC)} = \sum_{k=1}^{20} \sum_{\Phi_{\rm MC}^{(k)}} \sum_{n} \frac{V_n\!\left(\Phi_{\rm MC}^{(k)}\right)}{2} \left[1 + \cos\left(n\Phi_{\rm MC}^{(k)} - \gamma_n\!\left(\Phi_{\rm MC}^{(k)}\right)\right)\right] , \tag{21}$$

where k (= 1, 2, . . . , 20) is the label for the 20 kinds of amino-acid residues and $\Phi_{\rm MC}^{(k)}$ are dihedral angles around the main-chain bonds in the k-th amino-acid residue.
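The amino-acid-dependent form can be sketched in Python as a parameter lookup keyed by residue type, with a fall-back to the common parameters. In the sketch below only the ψ (N–Cα–C–N) contribution is evaluated. The common values are those listed for AMBER ff03 in Table 3, whereas the residue-specific entry for alanine is invented purely for illustration and is not an optimized parameter set; the names used are likewise hypothetical.

```python
import math

# Common psi (N-CA-C-N) terms of AMBER ff03 from Table 3: (n, V_n/2, gamma_n)
FF03_PSI = [(1, 0.6839, math.pi), (2, 1.4537, math.pi), (3, 0.4615, math.pi)]

# Hypothetical residue-specific override (values invented for illustration only)
RESIDUE_PSI = {
    "ALA": [(1, 0.70, math.pi), (2, 1.40, math.pi), (3, 0.45, math.pi)],
}

def psi_terms(residue):
    """Return residue-specific psi parameters, falling back to the common set."""
    return RESIDUE_PSI.get(residue, FF03_PSI)

def main_chain_psi_energy(residues, psi_angles):
    """psi contribution to the amino-acid-dependent main-chain torsion energy."""
    total = 0.0
    for residue, psi in zip(residues, psi_angles):
        for n, half_v, gamma in psi_terms(residue):
            total += half_v * (1.0 + math.cos(n * psi - gamma))
    return total

e_gly = main_chain_psi_energy(["GLY"], [math.pi])   # uses the common ff03 terms
e_ala = main_chain_psi_energy(["ALA"], [math.pi])   # uses the hypothetical override
```

With such a table, optimizing amino-acid-dependent parameters amounts to adjusting the entries of each residue type separately.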
2.3.1 Use of Force Acting on Each Atom with the PDB Coordinates
[25–27, 41]
In the previous section, we presented functional forms of the force fields. Given a
fixed set of force-field functions, we try to optimize a certain set of parameters in the
force fields without changing the functional forms.
Our optimization method for these force-field parameters is now described [25].
We first retrieve N native structures (one structure per protein) from PDB. We try to
choose proteins from different folds (such as all α-helix, all β-sheet, α/β, etc.) and
different homology classes as much as possible. If the force-field parameters are of
ideal values, then all the chosen native structures are stable without any force acting
on each atom in the molecules on the average. Hence, we expect
$$F = 0 , \tag{22}$$

where

$$F = \sum_{m=1}^{N} \frac{1}{N_m} \sum_{i_m=1}^{N_m} \left| \vec{f}_{i_m} \right|^2 , \tag{23}$$

and

$$\vec{f}_{i_m} = - \frac{\partial E_{\rm tot}^{\{m\}}}{\partial \vec{x}_{i_m}} . \tag{24}$$

Here, $N_m$ is the total number of atoms in molecule m, $E_{\rm tot}^{\{m\}}$ is the total potential energy for molecule m, $\vec{x}_{i_m}$ is the Cartesian coordinate vector of atom $i_m$ in molecule m, and $\vec{f}_{i_m}$ is the force acting on that atom. In reality, F ≠ 0, and because F ≥ 0, we can optimize the force-field parameters by minimizing F with respect to these parameters. In practice, we perform a simulation in the force-field parameter space for this minimization.
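The score function F can be sketched in Python as follows. The forces are obtained here by central finite differences of a user-supplied energy function, which is only a stand-in for the analytical forces of a real force field, and the toy system (two atoms joined by one harmonic bond) and all names and numerical values are hypothetical.

```python
import math

def numerical_forces(energy, coords, h=1e-5):
    """Forces f_i = -dE/dx_i by central differences; coords is a list of [x, y, z]."""
    forces = []
    for i in range(len(coords)):
        f = []
        for k in range(3):
            plus = [list(c) for c in coords]
            minus = [list(c) for c in coords]
            plus[i][k] += h
            minus[i][k] -= h
            f.append(-(energy(plus) - energy(minus)) / (2.0 * h))
        forces.append(tuple(f))
    return forces

def force_score(forces_per_molecule):
    """F = sum over molecules of (1/N_m) * sum over atoms of |f_i|^2."""
    total = 0.0
    for forces in forces_per_molecule:
        squared = sum(fx * fx + fy * fy + fz * fz for fx, fy, fz in forces)
        total += squared / len(forces)
    return total

def bond_energy(coords, k_l, l_eq):
    """Toy energy: one harmonic bond K_l (l - l_eq)^2 between atoms 0 and 1."""
    length = math.dist(coords[0], coords[1])
    return k_l * (length - l_eq) ** 2

# "Native" structure with bond length 1.1; compare two values of the parameter l_eq
native = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]
f_bad = force_score([numerical_forces(lambda c: bond_energy(c, 100.0, 1.0), native)])
f_good = force_score([numerical_forces(lambda c: bond_energy(c, 100.0, 1.1), native)])
```

In this toy case F is large when the equilibrium bond length disagrees with the reference structure and vanishes when the two agree, which is the behavior that the parameter optimization exploits.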
Proteins are usually in aqueous solution, and hence we also have to incorporate some kind of solvent effects. Because the force-field parameter optimization is expected to improve as the total number of proteins (N) increases, we want to minimize the computational effort spent on the solvent effects. Here, we employ the generalized-Born/surface area (GB/SA) terms for the solvent contributions [42, 43]. Hence, we use in Eq. (24) (we suppress the label m for each molecule)

$$E_{\rm tot} = E_{\rm conf} + E_{\rm solv} , \tag{25}$$

where

$$E_{\rm solv} = E_{\rm GB} + E_{\rm SA} , \tag{26}$$

$$E_{\rm GB} = -166 \left(1 - \frac{1}{\varepsilon_s}\right) \sum_{i,j} \frac{q_i q_j}{\sqrt{r_{ij}^2 + \alpha_{ij}^2 e^{-D_{ij}}}} , \tag{27}$$

$$E_{\rm SA} = \sum_{k} \sigma_k A_k . \tag{28}$$

Namely, in the GB/SA model, the total solvation free energy in Eq. (26) is given by the sum of a solute-solvent electrostatic polarization term, a solvent-solvent cavity term, and a solute-solvent van der Waals term. The solute-solvent electrostatic polarization term can be calculated by the generalized Born equation in Eq. (27), where $\alpha_{ij} = \sqrt{\alpha_i \alpha_j}$, $\alpha_i$ is the so-called Born radius of atom i, $D_{ij} = r_{ij}^2 / (2\alpha_{ij})^2$, and $\varepsilon_s$ is the dielectric constant of bulk water (we take $\varepsilon_s$ = 78.3). The solvent-solvent cavity term and the solute-solvent van der Waals term can be approximated together by the term in Eq. (28), which is proportional to the solvent-accessible surface area. Here, $A_k$ is the total solvent-accessible surface area of atoms of type k and $\sigma_k$ is an empirically determined proportionality constant [42, 43].
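A minimal Python sketch of the two solvation terms is given below. It assumes that the Born radii and the solvent-accessible surface areas are already available as inputs (their computation is not shown), and it interprets the double sum in the generalized Born term as running over all ordered pairs (i, j), including the self terms i = j, which is consistent with the prefactor 166 = 332/2. The function names and the test values are hypothetical.

```python
import math

def gb_energy(coords, charges, born_radii, eps_s=78.3):
    """Generalized Born polarization energy (kcal/mol) for point charges.

    The sum runs over all ordered pairs (i, j), including i == j.
    """
    total = 0.0
    n_atoms = len(coords)
    for i in range(n_atoms):
        for j in range(n_atoms):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            alpha2 = born_radii[i] * born_radii[j]      # alpha_ij^2 = alpha_i alpha_j
            d_ij = r2 / (4.0 * alpha2)                  # r_ij^2 / (2 alpha_ij)^2
            total += charges[i] * charges[j] / math.sqrt(r2 + alpha2 * math.exp(-d_ij))
    return -166.0 * (1.0 - 1.0 / eps_s) * total

def sa_energy(areas, sigmas):
    """Surface-area term: sum over atom types k of sigma_k * A_k."""
    return sum(sigma * area for sigma, area in zip(sigmas, areas))

# Single ion of unit charge with a Born radius of 2 Å (illustrative values)
e_ion = gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
born_formula = -166.0 * (1.0 - 1.0 / 78.3) / 2.0
```

For a single ion the expression reduces to the Born formula, −166 (1 − 1/ε_s) q²/α, which provides a simple consistency check.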
The flowchart of our method for the optimization of force-field parameters is
shown in Fig. 1.
In Step 1 of the flowchart we try to obtain as many structures as possible from
PDB. The number is limited by the computer power that we have available in our
laboratory. We want to choose proteins with different sizes (numbers of amino acids),
different folds, and different homology classes as much as possible. We also want to
use only those with high experimental resolutions. Note that only atomic coordinates
of proteins are extracted from PDB (and coordinates from other molecules such as
crystal water are neglected).
If we use data from X-ray experiments, hydrogen atoms are missing, and thus in Step 2 we have to add hydrogen coordinates. Many protein simulation software packages provide routines that add hydrogen atoms to the PDB coordinates, and one can use one of them.
[Figure 1: flowchart. Box text recoverable from the original: Step 4, "Optimize the first set of force-field parameters by minimizing F in Eq. (23) (calculated from the refined structures obtained in 3.) with respect to this first set of parameters"; Step 6, "Optimize the second set of force-field parameters by minimizing F in Eq. (23) (calculated from the refined structures obtained in 5.) with respect to this second set of parameters"; followed by a decision box "Convergent?" with branches "Yes" and "No".]
Fig. 1 The flowchart of our method for the optimization of force-field parameters

We now have N protein coordinates ready, but usually such “raw data” result in very high total potential energy, and strong forces will be acting on some of the atoms in the molecules. This is because the hydrogen coordinates that we added as above are not based on experimental results and have rather large uncertainties. The coordinates of heavy atoms from PDB also have experimental errors. We take the position that we leave the coordinates of heavy atoms as they are in PDB as much as possible, and adjust the hydrogen coordinates to reduce this mismatch. This is why we want to include as many PDB data as possible with high experimental resolutions (so that the effects of experimental errors in PDB may be minimal). We thus minimize the total potential energy $E_{\rm tot} = E_{\rm conf} + E_{\rm solv} + E_{\rm constr}$ with respect to the coordinates for each protein conformation, where $E_{\rm constr}$ is the constraint energy term that is imposed on the heavy atoms in PDB (it is referred to as the “predefined constraints” in Steps 3 and 5 in Fig. 1):

$$E_{\rm constr} = \sum_{\rm heavy\ atom} K_x \left(\vec{x} - \vec{x}_0\right)^2 . \tag{29}$$
Here, K x is the force constant of the restriction, and x0 are the original coordinate
vectors of heavy atoms in PDB. Because we are searching for the nearest local-
minimum states, usual minimization routines such as the conjugate-gradient method
and Newton-Raphson method can be employed here. As one can see from Eq. (29),
the coordinates of hydrogen atoms will be mainly adjusted, but unnaturally displaced
heavy-atom coordinates will also be modified.
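The harmonic restraint of Eq. (29) can be illustrated with a short sketch. The function name `constraint_energy`, the boolean heavy-atom mask, and the toy coordinates are assumptions made for illustration only; they are not taken from any simulation package used in this chapter.

```python
import numpy as np

def constraint_energy(coords, ref_coords, is_heavy, k_x):
    """Eq. (29): sum over heavy atoms of K_x * |x - x0|^2 (hydrogens are unrestrained)."""
    coords = np.asarray(coords, dtype=float)
    ref_coords = np.asarray(ref_coords, dtype=float)
    mask = np.asarray(is_heavy, dtype=bool)
    disp = coords[mask] - ref_coords[mask]      # displacements of heavy atoms only
    return k_x * float(np.sum(disp * disp))

# Hypothetical two-atom example: one heavy atom moved by 0.1 A, one hydrogen moved by 0.5 A.
x0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
x = np.array([[0.1, 0.0, 0.0], [1.5, 0.0, 0.0]])
heavy = [True, False]
energy = constraint_energy(x, x0, heavy, k_x=100.0)   # only the heavy atom contributes
```

Because the hydrogen atom is excluded from the sum, it can move freely during the refinement, whereas any heavy-atom displacement is penalized quadratically.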
Given N sets of “ideal” reference coordinates in Step 3 of the flowchart, we now
optimize the first set of force-field parameters in Step 4. In Eq. (1) we have five classes
of force-field parameters as mentioned above. Namely, the force-field parameters are
those in the bond-stretching term ($K_\ell$ and $\ell_{\rm eq}$), those in the bond-bending term ($K_\theta$
and $\theta_{\rm eq}$), those in the torsion term ($V_n$ and $\gamma_n$), those in the Lennard-Jones term ($A_{ij}$
and $B_{ij}$), and those in the electrostatic term ($q_i$). Because they are of very different
nature, we believe that it is better to optimize these classes of force-field parameters
separately (as in Steps 4, 6, and so on in Fig. 1). Note also that if we optimize all
the parameters simultaneously, the null result (with all the parameter values equal to
zero) is a solution to Eq. (22). This is the main reason why we optimize each class
of parameters separately.
For each set of force-field parameters, the optimization is carried out by minimiz-
ing F in Eq. (23) with respect to these parameters. Here, E tot in Eq. (24) is given by
Eq. (25). For this purpose usual minimization routines such as the conjugate-gradient
method are not adequate, because we need a global optimization. One should employ
more powerful methods such as simulated annealing [44] and generalized-ensemble
algorithms [4]. We perform this minimization simulation in the above parameter
space to obtain the parameter values that give the global minimum of F.
These processes are repeated until the optimized force-field parameters converge.
We can, in principle, optimize all the force-field parameters following the flowchart
in Fig. 1. In the examples given below, however, we just optimize two classes of the
force-field parameters for simplicity; namely, the partial charges and the backbone
torsion-energy parameters. For the optimization of the partial charges (qi ), we impose
a condition that the total charge of each amino acid remains constant, which is the
usual assumption adopted by the force fields of Eq. (1) based on classical mechanics.
As for the main chain torsion-energy parameters, we use the following functional
form for each backbone dihedral angle φ and ψ [see Eq. (4)]:
$$E_{\Phi=\phi,\psi} = \frac{V_a}{2}\left[1 + \cos(n_a \Phi - \gamma_a)\right] + \frac{V_b}{2}\left[1 + \cos(n_b \Phi - \gamma_b)\right] + \frac{V_c}{2}\left[1 + \cos(n_c \Phi - \gamma_c)\right] . \qquad (30)$$
We optimize only the parameters ($V_a$, $V_b$, and $V_c$) and fix the number of waves ($n_a$,
$n_b$, and $n_c$) and the phases ($\gamma_a$, $\gamma_b$, and $\gamma_c$) as in the original force field. This torsion-
energy parameter optimization strongly depends on the values of the force constant
$K_x$ of the constraint energy in Eq. (29). The larger the values of $K_x$ are, the larger
those of $V_a$, $V_b$, and $V_c$ tend to be. In order to minimize this dependence, we impose
the constraint that the total area enclosed by the curve of $|E_\Phi|$ (from Φ = −180° to
180°) remains less than or equal to the original value during the optimization.
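As a minimal sketch of Eq. (30) and of the area constraint just described, the functions below evaluate the three-term torsion energy and compare the area under $|E_\Phi|$ for a trial and an original parameter set. The function names, the rectangle-rule integration, and the grid size are assumptions chosen for illustration; the chapter does not specify how the area was computed.

```python
import numpy as np

def torsion_energy(phi_deg, v, n, gamma_deg):
    """Eq. (30): sum_k (V_k / 2) * [1 + cos(n_k * Phi - gamma_k)], angles in degrees."""
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    total = np.zeros_like(phi)
    for v_k, n_k, g_k in zip(v, n, gamma_deg):
        total += 0.5 * v_k * (1.0 + np.cos(n_k * phi - np.radians(g_k)))
    return total

def enclosed_area(v, n, gamma_deg, num_points=3600):
    """Area under |E_Phi| from -180 to 180 degrees (rectangle rule on a periodic grid)."""
    grid = np.linspace(-180.0, 180.0, num_points, endpoint=False)
    step = 360.0 / num_points
    return float(np.sum(np.abs(torsion_energy(grid, v, n, gamma_deg))) * step)

def satisfies_area_constraint(trial_v, original_v, n, gamma_deg):
    """True if the trial amplitudes do not enlarge the area relative to the original ones."""
    return enclosed_area(trial_v, n, gamma_deg) <= enclosed_area(original_v, n, gamma_deg) + 1e-9

# Hypothetical single-term example: E = 1 + cos(Phi), whose area over one period is 360.
area = enclosed_area([2.0], [1], [0.0])
```

In an optimization loop, a trial update of ($V_a$, $V_b$, $V_c$) would simply be rejected whenever `satisfies_area_constraint` returns `False`.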
We believe that these two classes of parameters have the most uncertainty among
all the force-field parameters. This is because partial charges are usually obtained
by quantum chemistry calculations of an isolated amino acid in vacuum separately,
which is a very different condition from that in amino acids of proteins in aqueous
solution, and because the torsion-energy term is the most problematic (for instance,
the parm94, parm96, and parm99 versions of AMBER differ mainly in backbone
torsion-energy parameters).
Moreover, when we perform the optimizations of force-field parameters by using
F in Eq. (23), we can neglect unnaturally large forces acting on atoms in order to
remove the errors of PDB structures. Namely, we can exclude the term for $f_{i_m}$ in
Eq. (23) that satisfies
$$f_{i_m} > f_{\rm cut} . \qquad (31)$$
Here, n is the total number of backbone dihedral angles (φ and ψ angles) in all
molecules, $\Phi_i^{\rm native}$ is the i-th backbone dihedral angle of the native structures and $\Phi_i^{\rm min}$
is the corresponding i-th backbone dihedral angle of the minimized structures using
the trial force-field parameters. The optimal value of $f_{\rm cut}$ is chosen so that ΦRMSD
is the minimal value with $f_{\rm cut} \le f_{\rm cut}^{\rm max}$, where $f_{\rm cut}^{\rm max}$ is obtained in an appropriate way
(see an example below).
We now describe our second method for optimizing the force-field parameters. We
use N proteins again from PDB. If the force-field parameters are of ideal values,
we expect that all the chosen native structures minimized by the ideal force field do
not change after minimizations. Namely, we believe that force-field parameters are
where
$$\mathrm{CRMSD} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{RMSD}_i . \qquad (34)$$
Here, $\mathrm{RMSD}_i$ is the root-mean-square deviation of coordinates between the native
structure of protein i and the corresponding minimized structure using the trial force-
field parameters. In reality, CRMSD ≠ 0, and because CRMSD ≥ 0, we expect that
we can optimize the force-field parameters by minimizing CRMSD with respect to
these force-field parameters. In practice, we perform a simulation in the force-field
parameter space for this minimization. Namely, in the previous method we minimize
F in Eq. (23), and in the present method we minimize CRMSD in Eq. (34) instead.
We now describe our third method for optimizing the force-field parameters. We
first select N proteins from PDB as in the previous two methods. If the force-field
parameters are of ideal values, we expect that all the chosen native structures min-
imized by the ideal force field do not change. Namely, we believe that force-field
parameters are better, if they have lower deviations obtained from minimizations of
protein structures. Hence, we expect
ΦRMSD = 0, (35)
where
$$\Phi\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\Phi_i^{\rm native} - \Phi_i^{\rm min}\right)^2} . \qquad (36)$$
Here, n is the total number of backbone dihedral angles (φ and ψ angles) in all
molecules, $\Phi_i^{\rm native}$ is the i-th backbone dihedral angle of the native structures and $\Phi_i^{\rm min}$
is the corresponding i-th backbone dihedral angle of the minimized structures using
the trial force-field parameters. In reality, ΦRMSD ≠ 0, and because ΦRMSD ≥ 0, we
expect that we can optimize the force-field parameters by minimizing ΦRMSD with
respect to these force-field parameters. In practice, we perform a simulation in the
force-field parameter space for this minimization.
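The dihedral-angle deviation of Eq. (36) can be sketched as follows, assuming the usual root-mean-square form suggested by its name. The function name `phi_rmsd` and the optional `periodic` flag are illustrative assumptions; in particular, wrapping angle differences into [−180°, 180°) is not stated in the text and is offered only as an option.

```python
import numpy as np

def phi_rmsd(native_deg, minimized_deg, periodic=False):
    """Root-mean-square deviation between native and minimized backbone dihedrals (degrees)."""
    native = np.asarray(native_deg, dtype=float)
    minimized = np.asarray(minimized_deg, dtype=float)
    diff = native - minimized
    if periodic:
        # Assumption: treat angles as periodic and map differences into [-180, 180).
        diff = (diff + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical values: two dihedral angles, one of which shifts by 4 degrees on minimization.
deviation = phi_rmsd([0.0, 10.0], [0.0, 14.0])
```

With `periodic=True`, a pair such as 179° and −179° counts as a 2° difference rather than 358°, which matters for angles close to the ±180° boundary.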
However, our first aim is to determine the balance of secondary-structure-forming
tendencies such as helix structure and β-sheet structure. Moreover, it is difficult
to perform the minimization of ΦRMSD in a wider force-field parameter space until
ΦRMSD is close to 0 because of the computational cost. Therefore, we only focus on
secondary-structure regions of helix structure and β-sheet structure in the amino-acid
sequence. Namely, we only consider the backbone dihedral angles of residues in the
native structures which are identified by the DSSP program [45] as constituting
one of α-helix, $3_{10}$-helix, π-helix, and β-sheet structures. We calculate two kinds
of ΦRMSD for secondary structures, namely, $\Phi\mathrm{RMSD}_{\rm helix}$ and $\Phi\mathrm{RMSD}_{\beta}$. Here,
$\Phi\mathrm{RMSD}_{\rm helix}$ stands for ΦRMSD of backbone dihedral angles of residues which
have helix structures in the native structures, and $\Phi\mathrm{RMSD}_{\beta}$ means that of only
β-sheet structures in the native structures. Using these two ΦRMSDs, we want to
optimize the torsion-energy parameters, which will have better balance of secondary-
structure-forming tendencies. We propose the following combination:
We now describe our fourth method for optimizing the force-field parameters. In this
method, we prepare M protein structures, which are some experimentally determined
conformations. For these proteins, we perform MD simulations, which start from the
experimental conformations, by using a trial force field. We try to perform MD
simulations with varied values of force-field parameters. After that, we estimate the
“S” value defined by the following function from the trajectories of the M proteins
obtained from the trial MD simulations:
$$S = \sum_{i=1}^{M} \left( \frac{n_i^{S \to U}}{N_i^{S}} + \frac{n_i^{U \to S}}{N_i^{U}} \right) . \qquad (38)$$
Here, $n_i^{S \to U}$ is the number of amino acids in protein i whose structures
in PDB (initial conformation) had some secondary structure (such as α-helix, $3_{10}$-
helix, π-helix, and β structures) but transformed into unstructured, coil structures
without any secondary structures after a short MD simulation. Likewise, $n_i^{U \to S}$
is the number of amino acids in protein i whose structures in PDB had coil
structures but transformed to have some secondary structure after an MD simulation.
$N_i^{S}$ is the total number of amino acids in protein i which have some secondary
structure in PDB, and $N_i^{U}$ is the total number of amino acids in protein i which have
coil structures in PDB.
When we calculate the S values for the conformations obtained from MD sim-
ulations by using trial force-field parameters, the parameter set, which yields the
minimum S value, is considered to give the optimized force field.
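A minimal sketch of the S value in Eq. (38) is given below. It assumes that a secondary-structure assignment (for example from DSSP) has already been reduced to one boolean per residue, `True` for structured and `False` for coil; the function name `s_value` and the choice to skip a term whose denominator is zero are assumptions that the text does not address.

```python
def s_value(initial_ss, final_ss):
    """Eq. (38): sum over proteins of n(S->U)/N(S) + n(U->S)/N(U).

    initial_ss, final_ss: one list of per-residue booleans per protein
    (True = some secondary structure, False = coil).
    """
    total = 0.0
    for init, final in zip(initial_ss, final_ss):
        n_s = sum(init)                    # residues structured in the PDB conformation
        n_u = len(init) - n_s              # residues in coil in the PDB conformation
        n_s_to_u = sum(1 for a, b in zip(init, final) if a and not b)
        n_u_to_s = sum(1 for a, b in zip(init, final) if not a and b)
        if n_s:                            # assumption: skip terms with empty denominators
            total += n_s_to_u / n_s
        if n_u:
            total += n_u_to_s / n_u
    return total

# Hypothetical six-residue protein: 2 of 4 structured residues unfold, 1 of 2 coil residues folds.
pdb_state = [[True, True, True, True, False, False]]
md_state = [[True, True, False, False, True, False]]
score = s_value(pdb_state, md_state)
```

A force field that preserves every residue's initial assignment gives S = 0, so smaller values indicate a better balance between structured and coil states.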
This function has 29 Fourier-coefficient parameters. We will see below that this
number of Fourier terms is sufficient for most of our purposes.
We first check how well the truncated Fourier series in Eq. (39) can reproduce the
six original backbone torsion-energy terms in Eqs. (8)–(13). Because these functions
are already the sum of one-dimensional Fourier series and subsets of the double
Fourier series in Eq. (14), the Fourier coefficients in Eq. (15) can be analytically
calculated and agree with those in Eqs. (8)–(13) except for the last one (that for
cos 4ψ) in Eq. (8). This term is missing in Eq. (39). These cases thus give us a good
test of the numerical integrations in Eq. (15). The numerical integrations were evaluated
as follows. We divided the Ramachandran space ($-180^\circ < \tilde{\phi} < 180^\circ$, $-180^\circ < \tilde{\psi} < 180^\circ$)
into unit square cells of side length $\tilde{\varepsilon}$ (in degrees). Hence, there are $(360/\tilde{\varepsilon})^2$
unit cells altogether. The double integral on the right-hand side of Eq. (15) was
approximated by the sum of
$$E\!\left(\tfrac{\pi}{180}\tilde{\phi}, \tfrac{\pi}{180}\tilde{\psi}\right) x\!\left(\tfrac{\pi}{180}\tilde{\phi}, \tfrac{\pi}{180}\tilde{\psi}\right) \times (\tilde{\varepsilon})^2 ,$$
where each $E\!\left(\tfrac{\pi}{180}\tilde{\phi}, \tfrac{\pi}{180}\tilde{\psi}\right) x\!\left(\tfrac{\pi}{180}\tilde{\phi}, \tfrac{\pi}{180}\tilde{\psi}\right)$ was evaluated at one of the four corners of each
unit cell. We tried two values of $\tilde{\varepsilon}$ (1° and 10°). Both cases gave almost complete
agreement of Fourier coefficients with the results of the analytical integrations (see,
for example, Table 4).
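The corner-based summation described above can be sketched as follows. Because Eq. (15) is not reproduced here, the normalization constants are the standard ones for a double Fourier series on a 2π × 2π domain and should be read as an assumption; the toy surface and all names (`fourier_coefficient`, `toy_surface`) are hypothetical.

```python
import numpy as np

def fourier_coefficient(energy, basis, norm, bin_deg=10.0):
    """Approximate (1/norm) * double integral of energy*basis over the Ramachandran square.

    The integrand is evaluated at the lower-left corner of each square cell of
    side bin_deg (degrees); angles are passed to the functions in radians.
    """
    corners = np.radians(np.arange(-180.0, 180.0, bin_deg))
    phi, psi = np.meshgrid(corners, corners, indexing="ij")
    cell_area = np.radians(bin_deg) ** 2
    return float(np.sum(energy(phi, psi) * basis(phi, psi)) * cell_area / norm)

def toy_surface(phi, psi):
    """Hypothetical torsion-energy surface with known Fourier coefficients."""
    return 2.0 - 0.5 * np.cos(2 * phi) + 0.3 * np.sin(psi) + 0.1 * np.cos(phi) * np.sin(psi)

results = {}
for bin_deg in (1.0, 10.0):
    results[bin_deg] = (
        fourier_coefficient(toy_surface, lambda p, q: np.ones_like(p), 4 * np.pi**2, bin_deg),
        fourier_coefficient(toy_surface, lambda p, q: np.cos(2 * p), 2 * np.pi**2, bin_deg),
        fourier_coefficient(toy_surface, lambda p, q: np.cos(p) * np.sin(q), np.pi**2, bin_deg),
    )
```

For a surface that is itself a low-order trigonometric polynomial, the uniform-grid sum recovers the coefficients essentially exactly for both bin sizes, which is consistent with the near-complete agreement reported for the unmodified force fields.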
In Fig. 2 we compare the six original backbone torsion-energy surfaces with those
of the corresponding double Fourier series in Eq. (39). Hereafter, the primed labels
for figures such as (a′) indicate that the results are those of the double Fourier series.
As can be seen from Fig. 2, the backbone torsion-energy surfaces are in complete
agreement for all force fields except for AMBER parm94, for which we see a small
difference between Fig. 2a and a′. As discussed above, this slight
difference for AMBER parm94 reflects the fact that the cos 4ψ term in Eq. (8) is
missing in the truncated double Fourier series in Eq. (39).
Table 4 Fourier coefficients in Eq. (39) obtained from the numerical evaluations of the integrals in
Eq. (15). “org94” stands for the original AMBER parm94 force field. “mod94(α)” and “mod94(β)”
stand for AMBER parm94 force fields that were modified to enhance α-helix structures and β-sheet
structures, respectively, by Eqs. (17) and (18). The bin size ε̃ is the length of the sides of each unit
square cell for the numerical integration in Eq. (15)

              Bin size ε̃ = 1°                      Bin size ε̃ = 10°
Coefficient   org94      mod94(α)   mod94(β)    org94      mod94(α)   mod94(β)
a              2.700000   2.308359   1.916719    2.700000   2.308370   1.916742
b1             0.000000  −0.330937   0.781150    0.000000  −0.331053   0.781041
c1             0.000000   0.509599   0.930938    0.000000   0.509517   0.930809
b2            −0.200000  −0.101549  −0.115937   −0.200000  −0.101513  −0.115970
c2             0.000000   0.221123  −0.476745    0.000000   0.221100  −0.476558
b3             0.000000  −0.018073   0.031693    0.000000  −0.018084   0.031714
c3             0.000000  −0.002862  −0.018298    0.000000  −0.003036  −0.018310
d1            −0.750000  −1.164401  −0.052959   −0.750000  −1.164500  −0.052874
e1             0.000000   0.444390  −0.995478    0.000000   0.444289  −0.995599
d2            −1.350000  −1.333115  −1.184428   −1.350000  −1.333073  −1.184340
e2             0.000000   0.241460   0.454905    0.000000   0.241451   0.455147
d3             0.000000  −0.014220   0.035349    0.000000  −0.014143   0.035324
e3             0.000000  −0.011515   0.009472    0.000000  −0.011671   0.009465
f11            0.000000  −0.342789  −0.680493    0.000000  −0.343087  −0.680497
g11            0.000000   0.367596   0.971845    0.000000   0.367697   0.971851
h11            0.000000   0.527849  −0.810980    0.000000   0.527949  −0.810985
i11            0.000000  −0.566049   1.158199    0.000000  −0.565751   1.158206
f21            0.000000   0.090016  −0.064642    0.000000   0.090168  −0.064636
g21            0.000000  −0.096530   0.092318    0.000000  −0.096472   0.092309
h21            0.000000   0.202178   0.366601    0.000000   0.202421   0.366565
i21            0.000000  −0.216810  −0.523561    0.000000  −0.216596  −0.523509
f12            0.000000   0.012329  −0.142682    0.000000   0.012385  −0.142712
g12            0.000000   0.176308  −0.392017    0.000000   0.176622  −0.392098
h12            0.000000  −0.018984  −0.170042    0.000000  −0.019013  −0.170077
i12            0.000000  −0.271490  −0.467187    0.000000  −0.271321  −0.467284
f22            0.000000  −0.000586  −0.002453   −0.000001  −0.000585  −0.002451
g22            0.000000  −0.008378  −0.006738    0.000000  −0.008397  −0.006733
h22            0.000000  −0.001316   0.013909    0.000000  −0.001317   0.013897
i22            0.000000  −0.018817   0.038215    0.000000  −0.018867   0.038183
Fig. 2 Backbone-torsion-energy surfaces of six force fields. The backbone dihedral angles φ̃ and
ψ̃ are in degrees. a, b, c, d, e, and f are those of the original AMBER parm94, the original AMBER
parm96, the original AMBER parm99, the original CHARMM 27, the original OPLS-AA, and the
original OPLS-AA/L, respectively. a′–f′ are those of a–f, respectively, that were expressed by the
truncated double Fourier series in Eq. (39). The contour lines are drawn every 0.5 kcal/mol
We used $(\tilde{\phi}_0, \tilde{\psi}_0) = (-57^\circ, -47^\circ)$ and $(\tilde{\phi}_0, \tilde{\psi}_0) = (-130^\circ, 125^\circ)$ in order to
enhance α-helix-forming tendency and β-sheet-forming tendency, respectively. The
central values $f(\tilde{\phi}_0, \tilde{\psi}_0)$ that we used were 3.0 and 6.0 kcal/mol for enhancing α-
helix and β-sheet, respectively, in the case of AMBER parm94, AMBER parm99,
CHARMM27, and OPLS-AA/L. They were both 3.0 kcal/mol in the case of AMBER
parm96 and OPLS-AA.
We remark that the large value of $f(\tilde{\phi}_0, \tilde{\psi}_0)$, 6.0 kcal/mol, that was necessary to
enhance β-sheet in the case of AMBER parm94, AMBER parm99, CHARMM27,
and OPLS-AA/L reflects the fact that their original force fields favor α-helix.
In Fig. 3a1–f1 we compare the six backbone torsion-energy surfaces modified
according to Eq. (17), which reduced the torsion energy in the α-helix region, with
those of the corresponding double Fourier series in Eq. (39). In Fig. 3a1–f1, α-helix
is enhanced from the original AMBER parm94 (a1), AMBER parm96 (b1), AMBER
parm99 (c1), CHARMM27 (d1), OPLS-AA (e1), and OPLS-AA/L (f1). In Fig. 4a1–
f1 we show the case of the β-sheet region, and β-sheet is enhanced from the original
AMBER parm94 (a1), AMBER parm96 (b1), AMBER parm99 (c1), CHARMM27
(d1), OPLS-AA (e1), and OPLS-AA/L (f1).
These modified backbone torsion-energy functions were expanded by the trun-
cated double Fourier series in Eq. (39) by evaluating the corresponding Fourier
coefficients from Eq. (15). For the numerical integration we again tried two values of the
bin size ε̃ (1° and 10°). The obtained Fourier coefficients are summarized in Table 4,
for example, in the case of AMBER parm94. For comparison, the Fourier coeffi-
cients of the original AMBER force fields (before modifications) are also listed. We
see that the two choices of the bin size ε̃ gave essentially the same results (agreeing
to about 3 digits).

Fig. 3 Backbone-torsion-energy surfaces of six force fields that were modified by Eqs. (17),
(18) and (39). a1–f1 are those of the AMBER parm94, AMBER parm96, AMBER parm99,
CHARMM 27, OPLS-AA, and OPLS-AA/L force fields, respectively, that were modified to enhance
α-helix structures. a1′–f1′ are those of the AMBER parm94, AMBER parm96, AMBER
parm99, CHARMM 27, OPLS-AA, and OPLS-AA/L force fields, respectively, that were expanded
by the truncated double Fourier series in Eq. (39)
In Figs. 3a1′–f1′ and 4a1′–f1′ we show the backbone torsion-energy surfaces rep-
resented by the truncated double Fourier series. Comparing these with the original
ones in Figs. 3a1–f1 and 4a1–f1, we find that the overall features of the energy sur-
faces are well reproduced by the Fourier series. If more accuracy is desired, we can
simply increase the number of Fourier terms in the expansion. As we will see below,
the present accuracy of the Fourier series was sufficient for the purpose of controlling
the secondary-structure-forming tendencies towards α-helix or β-sheet.
We examined the effects of the above modifications of the backbone torsion-
energy terms in AMBER parm94, AMBER parm96, AMBER parm99, CHARMM27,
OPLS-AA, and OPLS-AA/L (towards specific secondary structures) by performing
the folding simulations of two peptides, namely, C-peptide of ribonuclease A and the
C-terminal fragment of the B1 domain of streptococcal protein G, which is sometimes
referred to as G-peptide [47]. The C-peptide has 13 residues and its amino-acid
sequence is Lys-Glu-Thr-Ala-Ala-Ala-Lys-Phe-Glu-Arg-Gln-His-Met. This peptide
has been extensively studied by experiments and is known to form an α-helix struc-
ture [48, 49], as shown in Fig. 5a. Because the charges at peptide termini are known to
affect helix stability [48, 49], we blocked the termini by a neutral COCH3– group and
a neutral –NH2 group. The G-peptide has 16 residues and its amino-acid sequence is
Gly-Glu-Trp-Thr-Tyr-Asp-Asp-Ala-Thr-Lys-Thr-Phe-Thr-Val-Thr-Glu. The termini
were kept in the usual zwitterionic states, following the experimental conditions [47,
50, 51]. This peptide is known from experiments to form a β-hairpin structure [47, 50,
51], as shown in Fig. 5b.

Fig. 4 Backbone-torsion-energy surfaces of six force fields that were modified by Eqs. (17),
(18) and (39). a1–f1 are those of the AMBER parm94, AMBER parm96, AMBER parm99,
CHARMM 27, OPLS-AA, and OPLS-AA/L force fields, respectively, that were modified to enhance
β-sheet structures. a1′–f1′ are those of the AMBER parm94, AMBER parm96, AMBER
parm99, CHARMM 27, OPLS-AA, and OPLS-AA/L force fields, respectively, that were expanded
by the truncated double Fourier series in Eq. (39)
Simulated annealing [44] MD simulations were performed for both peptides from
fully extended initial conformations, where the 12 versions of the truncated double
Fourier series (which were described in Table 4 and in Figs. 3a1′–f1′ and 4a1′–f1′)
were used for the backbone torsion-energy terms of AMBER parm94, AMBER
parm96, AMBER parm99, CHARMM27, OPLS-AA, and OPLS-AA/L force fields.
For comparison, the simulations with the original force fields were also performed.
The unit time step was set to 1.0 fs. Each simulation was carried out for 1 ns (hence,
it consisted of 1,000,000 MD steps). The temperature during MD simulations was
controlled by Berendsen’s method [53]. For each run the temperature was decreased
exponentially from 2000 to 250 K. We modified and used the program package
TINKER version 4.1 [54] for all the simulations. As for solvent effects, we used
the GB/SA model [42, 43] included in the TINKER program package. For both
peptides, these folding simulations were repeated 60 times with different sets of
randomly generated initial velocities.
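An exponential cooling schedule of the kind described above can be written in a few lines. The specific functional form below, a geometric interpolation between the initial and final temperatures, is an assumption; the text states only that the temperature was decreased exponentially from 2000 to 250 K over 1,000,000 steps, and the function name is hypothetical.

```python
def annealing_temperature(step, n_steps, t_start, t_end):
    """Exponential schedule: T(step) = t_start * (t_end / t_start) ** (step / n_steps)."""
    return t_start * (t_end / t_start) ** (step / n_steps)

# Target temperatures (K) at the start, midpoint, and end of a 1,000,000-step run.
n_steps = 1_000_000
t_first = annealing_temperature(0, n_steps, 2000.0, 250.0)
t_mid = annealing_temperature(n_steps // 2, n_steps, 2000.0, 250.0)
t_last = annealing_temperature(n_steps, n_steps, 2000.0, 250.0)
```

In an MD code, the value returned at each step would serve as the target temperature passed to the thermostat, so that the system spends most of the run at low temperatures where the final conformation is selected.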
In Fig. 6, we show seven (out of 60) lowest-energy final conformations of C-
peptide and G-peptide obtained by the simulated annealing MD simulations, for
example, in the case of AMBER parm94.
In Fig. 6, we see that all conformations of the original AMBER parm94 (except
for conformations 2 and 4 of G-peptide) and all conformations of its force field
modified towards α-helix are α-helix structures (conformations 2 and 4 are $3_{10}$-
helix structures). The results show that the original AMBER parm94 favors α-helix
structures, and moreover, its force field modified towards α-helix favors α-helix
structures more than the original force field in the sense that the obtained helices are
more extended (and almost entirely helical). On the other hand, AMBER parm94
modified towards β-sheet favors β structures strongly. The results for other force
fields were similar.
Therefore, regardless of the secondary-structure-forming tendencies of the orig-
inal force fields, our modifications of the backbone torsion-energy term succeeded
in enhancing the desired secondary structures.
Fig. 6 Seven lowest-energy final conformations of C-peptide (a–a″) and G-peptide (b–b″) obtained
from six sets of 60 simulated annealing MD runs. a and b are the results of the original AMBER
parm94. a′ and b′ are the results of AMBER parm94 with the truncated double Fourier series
that was modified to enhance α-helix structures. a″ and b″ are the results of AMBER
parm94 with the truncated double Fourier series that was modified to enhance
β-sheet structures. The conformations are shown in order of increasing energy for each case.
The figures were created with DS Visualizer v1.5 [55]
of folds given by SCOP (version 1.73 in November 2007) [65]. Namely, we used
29 all α, 18 all β, 16 α/β, and 37 (α + β) proteins (see Table 5 and Fig. 7). We
then refined these selected 100 structures. We added hydrogen atoms to the PDB
coordinates by using the AMBER11 program package [57]. We thus minimized the
total potential energy $E_{\rm total} = E_{\rm conf} + E_{\rm solv} + E_{\rm constr}$ with respect to the coordinates
for each protein conformation, where $E_{\rm constr}$ is the harmonic constraint energy term
($E_{\rm constr} = \sum_{\rm heavy\ atom} K_x (\vec{x} - \vec{x}_0)^2$), and $E_{\rm solv}$ is the solvation energy term. Here, $K_x$
is the force constant of the restriction and $\vec{x}_0$ are the original coordinate vectors of
heavy atoms in PDB. As one can see from $E_{\rm constr}$, the coordinates of hydrogen atoms
Fig. 7 Structures of 100 proteins in Table 5 which were used in the optimization of force-field
parameters
Table 6 Optimized $V_1/2$ parameters for the main-chain dihedral angles ψ and ψ′ for the 19 amino-
acid residues (except for proline) in Eq. (21). The rest of the parameters are taken to be the same
as in the original ff03 force field. The original amino-acid-independent values are also listed for
reference
Residue ψ (N–Cα–C–N) ψ′ (Cβ–Cα–C–N)
original ff03 0.6839 0.7784
Ala 0.122 0.150
Arg 0.409 0.200
Asn −0.074 −0.162
Asp −0.137 0.182
Cys 0.361 0.089
Gln 0.144 −0.024
Glu 0.180 0.152
Gly 0.258 –
His 0.020 0.237
Ile 0.643 0.194
Leu 0.382 0.257
Lys 0.222 0.042
Met 0.141 0.346
Phe −0.010 0.553
Ser −0.248 0.475
Thr 0.512 0.328
Trp 0.027 0.477
Tyr 0.082 0.652
Val 0.142 0.590
Fig. 8 α-helicity (a-1) and β-strandness (a-2) of C-peptide and α-helicity (b-1) and β-strandness
(b-2) of G-peptide as functions of the residue number at 300 K. These values were obtained from
the REMD simulations. Normal and dotted curves stand for the optimized and the original AMBER
ff03 force fields, respectively
the DSSP program [45], which is based on the formations of the intra-main-chain
hydrogen bonds. As is shown in Fig. 8, for the original AMBER ff03 force field,
the α-helicity is clearly higher than the β-strandness not only in C-peptide but also
in G-peptide. Namely, the original AMBER ff03 force field clearly favors α-helix
and does not favor β-structure. On the other hand, for the optimized force field,
in the case of C-peptide, the α-helicity is higher than the β-strandness, and in the
case of G-peptide, the β-strandness is higher than the α-helicity. We conclude that
these results obtained from the optimized force field are in better agreement with the
experimental results in comparison with the original force field. In Fig. 9, $3_{10}$-helicity
and π-helicity of the two peptides obtained from the REMD simulations are shown. For
$3_{10}$-helicity, there is no large difference between the two force fields in C-peptide, and in
the case of G-peptide, the value for the optimized force field is slightly lower than
that for the original force field. π-helicity is almost zero for both
the original and the optimized force fields in the two peptides.
In Fig. 10, α-helicity and β-strandness as functions of temperature for the two
peptides obtained from the REMD simulations are shown. For α-helicity, the values
of both force fields decrease gradually from low temperature to high temperature
in the case of C-peptide. On the other hand, in the case of G-peptide, there are
small peaks at around 300 and 358 K for the original and optimized force fields,
respectively. For β-strandness, in the case of C-peptide, it is almost zero for both
force fields. In the case of G-peptide, for the optimized force field, there is clearly a
peak around 300 K.

Fig. 9 $3_{10}$-helicity (a-1) and π-helicity (a-2) of C-peptide and $3_{10}$-helicity (b-1) and π-helicity
(b-2) of G-peptide as functions of the residue number at 300 K. These values were obtained from
the REMD simulations. Normal and dotted curves stand for the optimized and the original AMBER
ff03 force fields, respectively
Fig. 10 α-helicity (a-1) and β-strandness (a-2) of C-peptide and α-helicity (b-1) and β-strandness
(b-2) of G-peptide as functions of temperature. These values were obtained from the REMD sim-
ulations. Normal and dotted curves stand for the optimized and the original AMBER ff03 force
fields, respectively

We now present the results of our force-field optimizations. In Step 1 of the flowchart
in Fig. 1, we chose 100 PDB files (N = 100) from X-ray experiments with resolution
1.8 Å or better and with fewer than 200 residues (the average number of residues is
120.4) from PISCES [62]. Their PDB codes are 2LIS, 1EP0, 1TIF, 1EB6, 1C1L,
1CCW, 2PTH, 1I6W, 1DBF, 1KPF, 1LRI, 1AAP, 1C75, 1CC8, 1FK5, 1KQR, 1K1E,
1CZP, 1GP0, 1KOI, 1IQZ, 3EBX, 1I40, 1EJG, 1AMM, 1I07, 1GK8, 1GVP, 1M4I,
1EYV, 1E29, 1I2T, 1VCC, 1FM0, 1EXR, 1GUT, 1H4X, 1GBS, 1B0B, 119L, 1IFC,
1DLW, 1EAJ, 1GGZ, 1JR8, 1RB9, 1VAP, 1JZG, 1M55, 1EN2, 1C9O, 2ERL, 1EMV,
1F41, 1EW6, 2TNF, 1IFR, 1JSE, 1KAF, 1HZT, 1HQK, 1FXL, 1BKR, 1ID0, 1LQV,
1G2R, 1KR7, 1QTN, 1D4O, 1EAZ, 2CY3, 1UGI, 1IJV, 3VUB, 1BZP, 1JYR, 1DZK,
1QFT, 1UTG, 2CPG, 1I6W, 1C7K, 1I8O, 1LO7, 1LNI, 1EQO, 1NDD, 1HD2, 3PYP,
1FD3, 1DK8, 1WHI, 1FAZ, 4FGF, 2MHR, 1JB3, 2MCM, 1IGD, 1C5E, and 1JIG.
In Step 2 of the flowchart, we used the routine in the TINKER package to add
hydrogen atoms to the PDB coordinates. The force fields that we optimized are the
AMBER parm94 version [7], parm96 version [8], parm99 version [9], CHARMM
version 22 [12], and OPLS-AA [15]. We have optimized only two sets of parameters.
The first set is the partial-charge parameters [qi in Eqs. (5) and (27)]. In order to
simplify the constraint-imposing processes on the total charge, we did not optimize
the charge of one of the hydrogen atoms (HN) in proline when it is located at the
N-terminus. In the original X-ray data, hydrogen coordinates are missing, and in the
case of neutral histidine whether Nδ and Nε are protonated or not is non-trivial to
determine. Because we want to deal with as many PDB data as possible, we treated
all the histidine residues as positively charged histidine for simplicity. Among the five
force fields, AMBER has the largest number of remaining partial-charge parameters
(602). We thus optimized these 602 parameters for all the five force fields. The second
set of parameters that we optimized is the backbone torsion-energy parameters [Va ,
Vb , and Vc in Eq. (30)] and there are six such parameters (three each for φ and ψ).
As explained in detail above, the coordinates of the 100 protein molecules have
been prepared (Steps 1 and 2 of the flowchart in Fig. 1). The coordinate refinement
in Step 3 of the flowchart was then carried out with the constraint in Eq. (29) on the
heavy atoms. As for the force constant $K_x$ in Eq. (29), we have some freedom in
the choice of its value. Our choice is: $K_x$ should be of the same order as $K_\ell$ in the
bond-stretching term in Eq. (2). The force constant $K_\ell$ in AMBER varies from 1662
to 656 kcal/mol/Å², and that in CHARMM varies from 1732 to 650 kcal/mol/Å².
Hence, in our first trial we set $K_x$ = 100 kcal/mol/Å².
In Step 4 of the flowchart, we performed the optimization of the 602 partial-charge
parameters by MC simulated annealing. Namely, we minimized F in Eq. (23) by MC
simulated annealing simulations of these parameters (the parameters were updated
and the updates were accepted or rejected according to the Metropolis criterion). For
this we introduced an effective “temperature” for the parameter space. The simulation
run consisted of 50,000 MC sweeps with the temperature decreased exponentially
from 20 to 0.01. The simulation was repeated 10 times with different initial random
numbers. We found that F decreased quickly in the beginning until about 5000 MC
sweeps and then it decreased very slowly for all force fields; the total number of MC
sweeps (50,000) seemed sufficient. The optimized partial charges were taken from
those that resulted in the lowest F value.
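The Monte Carlo simulated annealing in parameter space described above can be sketched as follows. This is a generic Metropolis scheme with an exponentially decreasing effective temperature, not the authors' implementation: the objective is a hypothetical quadratic stand-in for F in Eq. (23), the step size and sweep count are arbitrary, and the total-charge constraint imposed on each amino acid is not enforced here.

```python
import math
import random

def simulated_annealing(objective, params, n_sweeps, t_start, t_end, step_size, seed=0):
    """Metropolis simulated annealing over a list of parameters; returns (best, f_best)."""
    rng = random.Random(seed)
    current = list(params)
    f_current = objective(current)
    best, f_best = list(current), f_current
    for sweep in range(n_sweeps):
        # Effective "temperature" decreased exponentially from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (sweep / max(n_sweeps - 1, 1))
        for i in range(len(current)):      # one sweep = one trial update per parameter
            old = current[i]
            current[i] = old + rng.uniform(-step_size, step_size)
            f_trial = objective(current)
            delta = f_trial - f_current
            if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
                f_current = f_trial        # accept (Metropolis criterion)
                if f_current < f_best:
                    best, f_best = list(current), f_current
            else:
                current[i] = old           # reject and restore the old value
    return best, f_best

# Hypothetical objective with a known minimum at (1.0, -2.0, 0.5).
target = [1.0, -2.0, 0.5]
def toy_objective(p):
    return sum((a - b) ** 2 for a, b in zip(p, target))

best_params, best_f = simulated_annealing(toy_objective, [0.0, 0.0, 0.0],
                                          n_sweeps=2000, t_start=20.0, t_end=0.01,
                                          step_size=0.1)
```

Repeating the run with different `seed` values and keeping the parameter set with the lowest objective mirrors the procedure of taking the optimized charges from the run that gave the lowest F.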
In Tables 7, 8 and 9, three examples (glycine, alanine, and glutamic acid) of the
obtained partial charges together with the original force-field values are listed. We
see from these tables that the values of the partial charges have not changed much.
Although the sign of the partial charges remains the same for those with large magni-
tude, charges with small magnitude sometimes change their signs (see, for example,
CA of glycine and CG of glutamic acid).
In Step 5 of the flowchart, the original coordinates obtained in Step 2 were again
refined with the constraints in Eq. (29), but this time the optimized parameters from
Step 4 were used. This time we used the value $K_x$ = 500 kcal/mol/Å². For all force
fields, the average RMSD of the 100 proteins was 0.012 Å, and the coordinates of
heavy atoms had changed little.
In Step 6 of the flowchart, we carried out the optimization of the six torsion-
energy parameters (Va , Vb , and Vc in Eq. (30) for both φ and ψ) by minimizing F
in Eq. (23) with MC simulated annealing simulations in this parameter space. The
simulation run consisted of 10,000 MC sweeps with the temperature decreasing from
1000 to 1.0. The simulation was repeated six times with different random numbers.
We stopped after six trials because the convergence was very good. The optimized
torsion-energy parameters were taken from those that resulted in the lowest F value.
The obtained torsion-energy parameters are listed in Tables 10 and 11.
Table 7 Partial-charge parameters of glycine. AMB, CHA, and OPLS respectively stand for the
original AMBER, CHARMM version 22, and OPLS-AA force fields. Opt(94), Opt(96), Opt(99),
Opt(CH), and Opt(OP) are the optimized AMBER parm94, AMBER parm96, AMBER parm99,
CHARMM version 22, and OPLS-AA, respectively
Atom AMB Opt(94) Opt(96) Opt(99) CHA Opt(CH) OPLS Opt(OP)
N −0.4157 −0.3471 −0.3614 −0.3506 −0.4700 −0.4381 −0.5000 −0.5153
CA −0.0252 0.0175 0.0148 0.0166 −0.0200 0.0185 0.0800 0.0909
C 0.5973 0.5526 0.5698 0.5577 0.5100 0.5309 0.5000 0.6459
HN 0.2719 0.2492 0.2509 0.2480 0.3100 0.3004 0.3000 0.2615
O −0.5679 −0.5980 −0.5977 −0.5983 −0.5100 −0.5491 −0.5000 −0.5546
HA 0.0698 0.0629 0.0618 0.0633 0.0900 0.0687 0.0600 0.0358
Total 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
In the present work, we stopped our process in Step 6 of the flowchart and did not
iterate the optimizations.
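The annealing procedure of Step 6 can be illustrated with a minimal sketch. The defaults follow the numbers quoted in the text (10,000 sweeps, temperature from 1000 to 1.0, six independent runs), but the geometric cooling schedule, the uniform trial moves, the step size, and the stand-in objective `toy_f` are assumptions made only for illustration; the actual F of Eq. (23) requires the protein structures and force-field code.

```python
import math
import random


def anneal(objective, start, sweeps=10_000, t_start=1000.0, t_end=1.0,
           step=0.05, seed=0):
    """Minimize objective(params) by Metropolis MC with a falling temperature."""
    rng = random.Random(seed)
    current = list(start)
    f_current = objective(current)
    best, f_best = list(current), f_current
    # Geometric cooling factor (assumed schedule) from t_start to t_end.
    cooling = (t_end / t_start) ** (1.0 / max(sweeps - 1, 1))
    temperature = t_start
    for _ in range(sweeps):
        # One sweep: propose an update for every parameter in turn.
        for k in range(len(current)):
            trial = list(current)
            trial[k] += rng.uniform(-step, step)
            f_trial = objective(trial)
            delta = f_trial - f_current
            if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
                current, f_current = trial, f_trial
                if f_current < f_best:
                    best, f_best = list(current), f_current
        temperature *= cooling
    return best, f_best


def best_of_runs(objective, start, n_runs=6, **kwargs):
    """Repeat the annealing with different seeds and keep the lowest F."""
    runs = [anneal(objective, start, seed=s, **kwargs) for s in range(n_runs)]
    return min(runs, key=lambda run: run[1])


def toy_f(params):
    """Hypothetical stand-in for F in Eq. (23); minimum at (1.0, -0.5, 0.25)."""
    target = (1.0, -0.5, 0.25)
    return sum(1000.0 * (p - t) ** 2 for p, t in zip(params, target))
```

Calling `best_of_runs(toy_f, [0.0, 0.0, 0.0])` returns the parameter set with the lowest objective value among the six runs, mirroring the selection rule described above.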
In order to examine how much the torsion-energy terms have changed after the
optimizations, we depict them in Fig. 11 (we remark that the error of a factor of 2 in
the ordinate of Fig. 5e1 in Ref. [26] is corrected here). Although the behaviors of the
original force fields are quite different, those of the optimized force fields are rather
similar. For example, the optimized torsion-energy curves for the φ angle have two
maxima around φ ∼ −60° and +60° and a local minimum at φ = 0°, while those
for the ψ angle have two peaks around ψ ∼ −100° and +100° and a local minimum at
ψ = 0° (the exceptions are those for CHARMM version 22 and OPLS-AA, which
give the global maximum and a local maximum, respectively, at ψ = 0°). These
results suggest that our optimizations of the torsion-energy term tend to
converge towards a common function. A remark is in order. The optimized
CHARMM is the most distinct from the other optimized parameter sets in the
sense that it gives the global maximum at ψ = 0°, whereas for the other cases the
global maximum lies around ψ ∼ −100° and +100°.
Table 10 Torsion parameters of φ angle. Parm94, Parm96, Parm99, CHARMM, and OPLS are
AMBER parm94, AMBER parm96, AMBER parm99, CHARMM version 22, and OPLS-AA force
fields, respectively. “Optimized” stands for the corresponding optimized force field
Force field Va na γa Vb nb γb Vc nc γc
Parm94 0.200 2 180.0 – – – – – –
Optimized 0.191 1 0.0 0.146 2 180.0 −0.223 3 0.0
Parm96 0.850 1 0.0 0.300 2 180.0 – – –
Optimized 1.182 1 0.0 0.359 2 180.0 −0.410 3 0.0
Parm99 0.800 1 0.0 0.850 2 180.0 – – –
Optimized 1.380 1 0.0 0.599 2 180.0 −0.330 3 0.0
CHARMM 0.200 1 180.0 – – – – – –
Optimized −0.047 1 180.0 0.240 2 180.0 −0.015 3 0.0
OPLS −2.365 1 0.0 0.912 2 180.0 −0.850 3 0.0
Optimized 0.502 1 0.0 1.811 2 180.0 −0.567 3 0.0
Fig. 11 Backbone torsion-energy curves (energy in kcal/mol) as functions of φ (in degrees) and
ψ (in degrees), each over the range −180° to 180°. The force fields are AMBER parm94 (a),
AMBER parm96 (b), AMBER parm99 (c), CHARMM version 22 (d), and OPLS-AA (e). The results
for the original force fields are represented by dotted curves, and those for the optimized force
fields by solid curves
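The shape of the φ curves can be checked directly from the entries of Table 10. The sketch below assumes that each tabulated triple (V, n, γ) enters the energy as V[1 + cos(nφ − γ)]; Eq. (30) is not reproduced in this part of the text, so this form is an inference, chosen because it reproduces the AMBER parm96 coefficients of Eq. (42) and Table 12. The function names are illustrative.

```python
import math

# (V in kcal/mol, n, gamma in degrees) for the phi angle, from Table 10.
ORIG_PARM96_PHI = [(0.850, 1, 0.0), (0.300, 2, 180.0)]
OPT_PARM96_PHI = [(1.182, 1, 0.0), (0.359, 2, 180.0), (-0.410, 3, 0.0)]


def torsion_energy(angle_deg, terms):
    """Sum of V * [1 + cos(n * angle - gamma)] over the terms (assumed form)."""
    angle = math.radians(angle_deg)
    return sum(v * (1.0 + math.cos(n * angle - math.radians(gamma)))
               for v, n, gamma in terms)


def peak_position(terms, lo=0, hi=180):
    """Angle (integer degrees) of the highest energy on a 1-degree grid."""
    return max(range(lo, hi + 1), key=lambda a: torsion_energy(a, terms))


if __name__ == "__main__":
    for angle in (-180, -120, -60, 0, 60, 120, 180):
        print(angle, round(torsion_energy(angle, OPT_PARM96_PHI), 3))
```

Under this assumed form the optimized parm96 curve is symmetric in φ, has its maximum on the positive side in the neighborhood of +60°, and has a lower value at φ = 0°, in line with the description of Fig. 11.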
In Fig. 12 the potential-energy surfaces of the alanine dipeptide (ACE-ALA-NME)
are shown for the 10 force-field parameter sets: the original AMBER parm94, AMBER
parm96, AMBER parm99, CHARMM version 22, OPLS-AA, and the corresponding
optimized parameters. According to the ab initio quantum mechanical calculations,
there exist three local-minimum states in the energy surface [7]. They are conformers
C7eq, C5, and C7ax, which correspond to (φ, ψ) ∼ (−80°, +80°), (−160°, +160°),
and (+75°, −60°), respectively (C7eq is the global-minimum state). We remark that
these are the results of quantum chemistry calculations in vacuum, and so it is not
clear how well they represent the dipeptide in aqueous solution.
The results of all five original force fields in Fig. 12a1–e1 seem to satisfy the above
conditions. Namely, there are three local-minimum states at the locations of C7eq, C5,
and C7ax, and the global-minimum state is C7eq. As for the results of the optimized
force fields in Fig. 12a2–e2, those for CHARMM version 22 and OPLS-AA also
satisfy the above conditions. Those of the optimized AMBER force fields are less
consistent with the quantum mechanical calculations: C7eq is no longer the global-
minimum state, although it remains a local-minimum state. In particular, the optimized
AMBER parm99 seems to be in the greatest disagreement in the sense that the C7eq
state has almost disappeared.
We now present another example of the refinement of our backbone torsion energy
in Eq. (14). We consider the following truncated Fourier series:
This function has 13 Fourier-coefficient parameters. We will see below that this
number of Fourier terms is sufficient for most of our purposes [34, 35], but that
for some cases more Fourier terms are preferable.
We optimized the force-field parameters of this double Fourier series by using our
optimization method. First, we chose 100 PDB files from PDB-REPRDB [56].
We added hydrogen atoms to the PDB coordinates by using the TINKER program
package [54].
Fig. 12 … AMBER parm99 (c1), CHARMM version 22 (d1), and OPLS-AA (e1), and the
corresponding optimized parameters (a2)–(e2). The contour maps were evaluated every 10° of φ
and ψ angles and plotted every 1 kcal/mol, after minimizing the total potential energy in vacuum
with the backbone structures fixed. The bluer the color is, the lower the potential energy surface
is. As the potential-energy value increases, the color changes from blue to green, to yellow, and
to red
In our optimization method, F in Eq. (23) was minimized by Monte Carlo (MC)
simulations in the space of the 13 backbone-torsion-energy parameters, with 3000
MC steps per run. The initial values of the 13 parameters were all set to
zero. For each f_cut value, we performed the MC optimization 10 times
with different seeds for the random numbers, and the parameter set with the minimum
F value was selected from the 10 resulting parameter sets. The overall parameter
distributions were essentially the same for the 10 runs.
The maximum f_cut value was taken to be f_cut^max = 9.0, which was selected from the
peak point in the distribution of the forces acting on each atom in the 100 protein
structures in Fig. 13. For the parameter sets obtained in this way, ΦRMSD was
calculated by using Eq. (32). Here, if the difference between Φ_i^native and Φ_i^min of a
backbone dihedral angle in a protein was more than 20°, the value was ignored,
because about 90% of the differences between Φ_i^native and Φ_i^min are less
than 20°. Namely, we wanted to consider the majority of the differences
of backbone dihedral angles. In Fig. 14, the distribution of the backbone dihedral angles
in the 100 protein structures is shown. After these calculations of ΦRMSD, we selected
f_cut = 8.5, which gave the minimum value of ΦRMSD among the f_cut values examined.
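The cutoff rule for ΦRMSD can be made concrete with a short sketch. Eq. (32) is not reproduced in this part of the text, so the code assumes a plain root-mean-square over the retained dihedral differences, with the differences wrapped into the periodic range; the function names and the example angles are hypothetical.

```python
import math


def wrapped_difference(a_deg, b_deg):
    """Smallest signed difference a - b between two angles, in (-180, 180]."""
    diff = (a_deg - b_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff


def dihedral_rmsd(native_deg, minimized_deg, cutoff_deg=20.0):
    """RMS deviation over angle pairs whose deviation is within the cutoff.

    Returns (rmsd, fraction_kept); pairs that differ by more than
    cutoff_deg are ignored, as described in the text.
    """
    diffs = [wrapped_difference(n, m) for n, m in zip(native_deg, minimized_deg)]
    kept = [d for d in diffs if abs(d) <= cutoff_deg]
    if not kept:
        raise ValueError("no dihedral difference is within the cutoff")
    rmsd = math.sqrt(sum(d * d for d in kept) / len(kept))
    return rmsd, len(kept) / len(diffs)


if __name__ == "__main__":
    native = [-60.0, 179.0, 120.0, -57.0]      # hypothetical native angles
    minimized = [-70.0, -179.0, -50.0, -47.0]  # hypothetical minimized angles
    print(dihedral_rmsd(native, minimized))
```

The second return value gives the fraction of angles retained, which corresponds to the roughly 90% figure quoted above when the 20° cutoff is applied to the 100 proteins.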
In Table 12, the optimized double Fourier-coefficient parameters and the correspond-
ing original AMBER ff94 and ff96 force-field parameters are listed. Here, the original
AMBER ff94 contains a Fourier term with wave number four, which is not included
in this truncated series. Therefore, the coefficient set listed for the original AMBER
ff94 is not complete. In Fig. 15, these backbone-torsion-energy surfaces in the
Ramachandran space are illustrated.
In order to test the validity of the force-field parameters obtained by our opti-
mization methods, we performed folding simulations using two peptides, namely,
C-peptide and G-peptide.
Table 12 Fourier coefficients in Eq. (39) obtained from the numerical evaluations of the integrals
in Eq. (15). “org94” and “org96” stand for the original AMBER ff94 and the original AMBER
ff96, respectively, and “optimized” stands for the optimized force field obtained by our optimization
method. Here, the original AMBER ff94 has a Fourier term with wave number four.
Therefore, this coefficient set of the original AMBER ff94 is not complete
Coefficient   org94    org96    Optimized
a             2.700    2.300    0.000
b1            0.000    0.850    0.835
b2           −0.200   −0.300   −0.088
c1            0.000    0.000   −0.327
c2            0.000    0.000    0.100
d1           −0.750    0.850    0.287
d2           −1.350   −0.300    0.019
e1            0.000    0.000   −0.160
e2            0.000    0.000   −0.054
f11           0.000    0.000   −0.427
g11           0.000    0.000    0.247
h11           0.000    0.000    0.114
i11           0.000    0.000    0.603
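The coefficients of Table 12 can be evaluated on the Ramachandran plane with the sketch below. The assignment of terms is an assumption: b and c are taken as the cosine and sine coefficients in φ, d and e as those in ψ, and f11, g11, h11, i11 as the cos·cos, cos·sin, sin·cos, and sin·sin cross terms. Only the cosine assignments can be checked from this part of the text: with them, the org96 column reproduces the AMBER parm96 torsion energy of Eq. (42). The function names are illustrative.

```python
import math

# Columns of Table 12 (kcal/mol); coefficients not listed here are zero.
ORG96 = {"a": 2.300, "b1": 0.850, "b2": -0.300, "d1": 0.850, "d2": -0.300}
OPTIMIZED = {
    "a": 0.000, "b1": 0.835, "b2": -0.088, "c1": -0.327, "c2": 0.100,
    "d1": 0.287, "d2": 0.019, "e1": -0.160, "e2": -0.054,
    "f11": -0.427, "g11": 0.247, "h11": 0.114, "i11": 0.603,
}


def double_fourier(phi_deg, psi_deg, coeff):
    """Truncated double Fourier series in phi and psi (assumed term layout)."""
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)

    def c(key):
        return coeff.get(key, 0.0)  # missing coefficients count as zero

    energy = c("a")
    for m in (1, 2):
        energy += c(f"b{m}") * math.cos(m * phi) + c(f"c{m}") * math.sin(m * phi)
        energy += c(f"d{m}") * math.cos(m * psi) + c(f"e{m}") * math.sin(m * psi)
    energy += c("f11") * math.cos(phi) * math.cos(psi)
    energy += c("g11") * math.cos(phi) * math.sin(psi)
    energy += c("h11") * math.sin(phi) * math.cos(psi)
    energy += c("i11") * math.sin(phi) * math.sin(psi)
    return energy


def parm96_torsion(phi_deg, psi_deg, v1=1.7, v2=0.6):
    """AMBER parm96 backbone torsion energy with V1 = 1.7 and V2 = 0.6."""
    total = 0.0
    for angle in (math.radians(phi_deg), math.radians(psi_deg)):
        total += 0.5 * v1 * (1.0 + math.cos(angle))
        total += 0.5 * v2 * (1.0 - math.cos(2.0 * angle))
    return total
```

Evaluating `double_fourier` on a 10° grid with the `OPTIMIZED` column gives a surface that can be compared with Fig. 15a, subject to the assumed sine and cross-term layout.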
For the folding simulations, we used REMD [60]. We used the TINKER program
package [54] modified by us for the folding simulations. The unit time step was
set to 1.0 fs. Each simulation was carried out for 5.0 ns (hence, it consisted of
5,000,000 MD steps) with 32 replicas. The temperature during MD simulations
was controlled by the Nosé–Hoover method [63]. The replica temperatures were
distributed exponentially from 700 to 250 K. As for solvent effects, we used the
GB/SA model [42, 43] included in the TINKER program package [54].
Fig. 15 The backbone-torsion-energy surfaces of the optimized force field (a), the original AMBER
ff94 (b), and the original AMBER ff96 (c)
We checked the secondary-structure formations by the DSSP program [45]. In
Fig. 16, the helicity and strandness of C-peptide which were obtained with the opti-
mized force field, the original AMBER ff94, and the original AMBER ff96 are
shown. In comparison with the original AMBER ff94, the helicity of the optimized
force field decreases, and in comparison with the original AMBER ff96, that of the
optimized force field increases. As for the strandness, that of the original AMBER ff94
is almost zero, and both the optimized force field and the original AMBER ff96 have
low strandness.
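Per-residue helicity and strandness of this kind can be obtained from DSSP assignments with a few lines of code. The sketch below assumes that each trajectory frame has already been reduced to a string of one-letter DSSP codes, one per residue, and that only code `H` (α-helix) counts towards helicity and only code `E` (extended strand) towards strandness; which codes the authors counted is not stated here, so this choice is an assumption.

```python
def residue_percentages(dssp_frames, codes):
    """Percentage of frames in which each residue carries one of `codes`."""
    if not dssp_frames:
        raise ValueError("at least one frame is required")
    n_res = len(dssp_frames[0])
    if any(len(frame) != n_res for frame in dssp_frames):
        raise ValueError("all frames must have the same number of residues")
    counts = [0] * n_res
    for frame in dssp_frames:
        for i, code in enumerate(frame):
            if code in codes:
                counts[i] += 1
    return [100.0 * c / len(dssp_frames) for c in counts]


def helicity(dssp_frames):
    """Per-residue alpha-helix content in percent (DSSP code H)."""
    return residue_percentages(dssp_frames, codes=("H",))


def strandness(dssp_frames):
    """Per-residue beta-strand content in percent (DSSP code E)."""
    return residue_percentages(dssp_frames, codes=("E",))


if __name__ == "__main__":
    frames = ["-HHE", "-HEE", "--EE", "-HH-"]  # hypothetical 4-residue frames
    print(helicity(frames))
    print(strandness(frames))
```

For the REMD data discussed here, the frames would be those sampled by the replica at 300 K.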
In Fig. 17, the helicity and strandness of G-peptide which were obtained with the
optimized force field, the original AMBER ff94, and the original AMBER ff96 are
shown. The helicity of the original AMBER ff94 is clearly high, as in the
case of C-peptide. On the other hand, the helicity of both the optimized force field
and the original AMBER ff96 is lower than that of the original
AMBER ff94. However, in comparison with the original AMBER ff96, the optimized
force field slightly favors the helix structure in the region around amino-acid residues
6–8. In the experimental results, there is a turn region around residues 7–10 in G-
peptide, and the backbone-torsion angles of the turn conformation are similar to those
of the helix structure. Therefore, we consider that this tendency is not in disagreement
with the experimental results. For the strandness, that of the original AMBER ff94 is also
almost zero as in the case of C-peptide, and both the optimized force field and the
original AMBER ff96 have higher values of the strandness than those of the helicity.
In Fig. 17b, the strandness decreases in the region around residues 7–8 in agreement
with the experiments.
Fig. 16 Helicity (a) and strandness (b) of C-peptide as functions of the residue number. These
values are obtained from the REMD [60] simulations at 300 K. Normal, dashed, and dotted lines
stand for the optimized force field, the original AMBER ff94, and the original AMBER ff96,
respectively. There is only one secondary structural element (an α-helix in residues 4–12) in the
native structure (PDB ID: 1A5P). See Fig. 5a
Fig. 17 Helicity (a) and strandness (b) of G-peptide as functions of the residue number. These
values are obtained from the REMD [60] simulations at 300 K. Normal, dashed, and dotted lines
stand for the optimized force field, the original AMBER ff94, and the original AMBER ff96,
respectively. There is only one secondary structural element (a β-hairpin; β-strands are in residues
2–6 and residues 11–15) in the native structure (PDB ID: 1PGA). See Fig. 5b
These secondary-structure-forming tendencies of the optimized force field for
the two peptides agree better with experimental implications than those of the
original AMBER ff94 and ff96 force fields. Therefore, our improvement methods
succeeded in enhancing the accuracy of the AMBER force field.
2. C–N–Cα–C (φ), N–Cα–C–N (ψ), C–N–Cα–Cβ and N–C–Cα–Cβ by Methods 1
and 2
to 1.0 fs. Each simulation was carried out for 10 ns (hence, it consisted of 10,000,000
MD steps) with 16 replicas. The temperature during MD simulations was controlled
by the Nosé–Hoover method [63]. The temperature was distributed exponentially: 700,
662, 625, 591, 558, 528, 499, 471, 446, 421, 398, 376, 355, 336, 317, and 300 K. As for
solvent effects, we used the GB/SA model [42, 43] included in the TINKER program
package [54]. These folding simulations were repeated 10 times with different sets
of randomly generated initial velocities.
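An exponential temperature distribution of this kind corresponds to a constant ratio between neighboring replica temperatures. The following sketch generates such a ladder; the function name is illustrative, and the rounding convention of the listed values is not stated in the text, so the generated temperatures are compared with the list only to within 1 K.

```python
def temperature_ladder(t_max, t_min, n_replicas):
    """Geometrically spaced temperatures from t_max down to t_min."""
    if n_replicas < 2:
        raise ValueError("at least two replicas are required")
    ratio = (t_min / t_max) ** (1.0 / (n_replicas - 1))
    return [t_max * ratio ** k for k in range(n_replicas)]


# Temperatures (K) listed in the text for the 16-replica simulations.
LISTED = [700, 662, 625, 591, 558, 528, 499, 471, 446, 421,
          398, 376, 355, 336, 317, 300]

if __name__ == "__main__":
    for computed, listed in zip(temperature_ladder(700.0, 300.0, 16), LISTED):
        print(f"{computed:7.2f}  {listed}")
```

A constant temperature ratio is a common choice in REMD because, for a system with roughly constant heat capacity, it tends to give similar exchange acceptance between all neighboring replica pairs.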
In Fig. 19, the helicity and strandness of C-peptide which were obtained with the
original OPLS-UA and its optimized force field are shown. These values are the
averages of the 10 REMD simulations at 300 K. In comparison with the helicity
of the original OPLS-UA, the helicity of the optimized force field increases for
residues 6–12. The strandness is almost zero for both the
original and the optimized OPLS-UA force fields.
In Fig. 20, the helicity and strandness of G-peptide with the original OPLS-UA
and its optimized force field are shown. In comparison with the original OPLS-UA,
the helicity of the optimized force field decreases for residues 8–15,
and the strandness of the optimized force field clearly increases in the two regions
of residues 2–6 and 9–15. In the experimental results, there is a turn region around
residues 7–10 and there are five intra-backbone hydrogen-bond pairs, namely, between
residue pairs 2–15, 3–14, 4–13, 5–12, and 6–11 in G-peptide. In Fig. 20b, the strandness
decreases in the region around residues 7–8 in agreement with the experiments.
Fig. 19 Helicity (a) and strandness (b) of C-peptide as functions of the residue number. These
values are the average of the 10 independent REMD [60] simulations at 300 K. Normal and dotted
lines stand for the optimized and original OPLS-UA force fields, respectively
Fig. 20 Helicity (a) and strandness (b) of G-peptide as functions of the residue number. These
values are the average of the 10 independent REMD [60] simulations at 300 K. Normal and dotted
lines stand for the optimized and original OPLS-UA force fields, respectively
These results show that the optimized force field favors helix structures more than
the original OPLS-UA in the case of C-peptide and favors β structures more than the
original OPLS-UA in the case of G-peptide. We see that these secondary-structure-
forming tendencies of the optimized force field are better than those of the original
OPLS-UA.
In Figs. 21 and 22, we show the 20 lowest-energy conformations of C-peptide
and G-peptide, respectively, obtained by the REMD simulations with the original and
optimized OPLS-UA force fields. In Fig. 21a, five conformations (Nos.
11, 13, 16, 18, and 19) have α-helix structures for the original OPLS-UA in the case
of C-peptide. In Fig. 21b, 18 conformations (all conformations except for Nos. 2 and
12) have α-helix structures for the optimized OPLS-UA in the case of C-peptide.
From these results, we can see that the optimized OPLS-UA force field favors the α-
helix structure more than the original OPLS-UA force field in the case of C-peptide.
In Fig. 22a, 11 conformations have α-helix structures for the original OPLS-UA in
the case of G-peptide. In Fig. 22b, seven conformations have α-helix structures, and
eight conformations have β-hairpin structures for the optimized OPLS-UA in the
case of G-peptide. In Fig. 22b, two conformations (Nos. 3 and 16) out of the eight
β-hairpin conformations have the correct hydrogen-bond formations that are inferred
from the experiments. Namely, conformation No. 3 has three native-like hydrogen
bonds between residue pairs 3–14, 4–13, and 5–12, and conformation No. 16 has
two native-like hydrogen bonds between residue pairs 3–14 and 4–13. These results
for G-peptide show that the optimized OPLS-UA force field favors the α-helix
structure less and clearly favors the β-hairpin structure more than the original OPLS-UA
force field.
Fig. 21 Twenty lowest-energy conformations of C-peptide obtained from 10 sets of REMD [60]
simulation runs. a and b are the results of the original and optimized OPLS-UA force field, respec-
tively. The conformations are ordered in the increasing order of energy for each case. The figures
were created with DS Visualizer v1.5 [52]
Fig. 22 Twenty lowest-energy conformations of G-peptide obtained from 10 sets of REMD [60]
simulation runs. a and b are the results of the original and optimized OPLS-UA force field, respec-
tively. The conformations are ordered in the increasing order of energy for each case. The figures
were created with DS Visualizer v1.5 [52]
These secondary-structure-forming tendencies of the optimized OPLS-UA force
field for the two peptides agree better with experimental implications than those
of the original OPLS-UA force field. Therefore, our optimization methods succeeded
in enhancing the accuracy of the OPLS-UA force field.
We now present the results of the applications of our optimization method of force-
field parameters in Sect. 2.3.3.
First, we chose 100 PDB files from PDB-REPRDB [56]. We selected the number
of each fold (all α, all β, α/β, and α + β) in 100 proteins based on the number of
folds given by SCOP (version 1.73, November 2007) [65]. Namely, we used 29 all
α, 18 all β, 16 α/β, and 37 α + β proteins (the list is slightly different from that
in Table 5).
The force field that we optimized is the AMBER parm96 version [8]. The
backbone-torsion-energy term $E_{\mathrm{torsion}}(\Phi, \Psi)$ for this force field is given by

$$
E_{\mathrm{torsion}}(\Phi, \Psi) = \frac{V_1^{\phi}}{2}\,[1 + \cos\phi] + \frac{V_2^{\phi}}{2}\,[1 - \cos 2\phi] + \frac{V_1^{\psi}}{2}\,[1 + \cos\psi] + \frac{V_2^{\psi}}{2}\,[1 - \cos 2\psi], \tag{42}
$$

where we have $V_1^{\phi} = 1.7$, $V_2^{\phi} = 0.6$, $V_1^{\psi} = 1.7$, and $V_2^{\psi} = 0.6$. Here, we have
optimized only two parameters in the backbone-torsion-energy term, namely, $V_1^{\psi}$ and
$V_2^{\psi}$ for the ψ angle. As described above, AMBER parm94 and AMBER parm96 have
quite different secondary-structure-forming tendencies, although these force fields
differ only in the backbone torsion-energy terms for rotations of the φ and ψ angles.
Moreover, we can easily imagine that the force-field parameters $V_1^{\psi}$ and $V_2^{\psi}$ for the ψ angle
are important for the secondary-structure-forming tendencies, because the energy
surface in the Ramachandran space is quite sensitive to this energy term in the helix
and β-sheet regions. Namely, if the torsion-energy term for the ψ angle changes, the
stabilities of the helix-structure region and the β-sheet region in the Ramachandran space
change. Therefore, we considered some trial force-field parameters for $V_1^{\psi}$ and $V_2^{\psi}$,
which are given by the following equations:
Here, i is any real number. When i is 5, the force-field parameters V_1^trial and V_2^trial of
the ψ angle are equal to those of the original AMBER parm96. From our experience, if
i has a small value (i < 5), the force field favors helix structure, and if i has a large
value (i > 5), the force field favors β-sheet structure (see also Figs. 23 and 24). We
calculated the ΦRMSD2ndry values in Eq. (37) for several trial force-field parameter
sets obtained by changing i in Eqs. (43) and (44).
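A relation between i and the trial parameters can be illustrated with a small sketch. Eqs. (43) and (44) are not reproduced in this part of the text, so the code assumes a linear scaling through the origin, V_1^trial = 1.7 i/5 and V_2^trial = 0.6 i/5. This assumption reproduces both pairs of values quoted in the text: the original parm96 parameters at i = 5, and V_1^trial = 1.598, V_2^trial = 0.564 at i = 4.7. The constant and function names are illustrative.

```python
V1_PSI_ORIGINAL = 1.7  # AMBER parm96 value of V1 for psi quoted in the text
V2_PSI_ORIGINAL = 0.6  # AMBER parm96 value of V2 for psi quoted in the text
I_ORIGINAL = 5.0       # value of i that reproduces the original parameters


def trial_parameters(i):
    """Return (V1_trial, V2_trial) for a given i under the assumed linear scaling."""
    scale = i / I_ORIGINAL
    return V1_PSI_ORIGINAL * scale, V2_PSI_ORIGINAL * scale


if __name__ == "__main__":
    for i in (3.0, 4.7, 5.0, 7.0):  # para3, optimized, original, para7
        v1, v2 = trial_parameters(i)
        print(f"i = {i:3.1f}: V1_trial = {v1:.3f}, V2_trial = {v2:.3f}")
```

Under this assumption a single number i moves both ψ parameters together, which is why the scan over i in Fig. 25 is one-dimensional.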
We performed the energy minimization with the TINKER program package [54];
it was terminated when the root-mean-square (RMS) potential-energy gradient
fell below 0.1 kcal/mol/Å. For solvent effects, we used the GB/SA solvent model
in TINKER.
Fig. 23 Helicity (a) and strandness (b) of C-peptide, both in %, as functions of the residue number.
These values are the averages of the 10 independent REMD [60] simulations at 300 K. Optimized,
original, para3, and para7 stand for the optimized AMBER parm96 (i = 4.7), original AMBER
parm96 (i = 5.0), trial force field para3 (i = 3.0), and trial force field para7 (i = 7.0), respectively
Fig. 24 Helicity (a) and strandness (b) of G-peptide, both in %, as functions of the residue number.
These values are the averages of the 10 REMD [60] simulations at 300 K. Optimized, original,
para3, and para7 stand for the optimized AMBER parm96 (i = 4.7), original AMBER parm96
(i = 5.0), trial force field para3 (i = 3.0), and trial force field para7 (i = 7.0), respectively
The results of ΦRMSDhelix and ΦRMSDβ are shown in Fig. 25a, b, respectively.
In these calculations, if the difference of a backbone-dihedral angle between
Φ_i^native and Φ_i^min in Eq. (36) was more than 30°, it was ignored, assuming that
the uncertainties in those angles are too large. We see that ΦRMSDhelix decreases
gradually with a decrease in i. If i decreases, the torsion energy of the helix structure
region in the Ramachandran space also decreases. On the other hand, ΦRMSDβ
decreases gradually with an increase in i. If i increases, the torsion energy of the β
structure region in the Ramachandran space decreases. Hence, this result is reason-
able. However, ΦRMSDβ reaches the global minimum when i is 6.5. If i is larger
than 6.5, ΦRMSDβ increases gradually. This result implies that ΦRMSDβ does
not correspond to the parameters V_1^trial and V_2^trial completely.
For ΦRMSDhelix and ΦRMSDβ in Fig. 25a, b, we can see the difference clearly.
The noteworthy point obtained from these results is that ΦRMSD can distinguish
between helix structure and β structure.
Fig. 25 Distributions of ΦRMSDhelix (a), ΦRMSDβ (b), and ΦRMSD2ndry (c) obtained from the
minimization of 100 proteins using the trial force-field parameters V_1^trial and V_2^trial as functions of
the number i (plotted for i from −20 to 20)
We combined ΦRMSDhelix and ΦRMSDβ by Eq. (37). Here, in order to have
roughly equal contributions from both terms, we can set the value of the scaling
factor λ to be, for example, the ratio of the coefficients of variation:

$$
\lambda = \frac{\sigma_{\beta} / \mu_{\beta}}{\sigma_{\mathrm{helix}} / \mu_{\mathrm{helix}}}. \tag{45}
$$

Here, μhelix and μβ are the averages and σhelix and σβ are the corresponding standard
deviations of ΦRMSDhelix and ΦRMSDβ. For the calculations, we have chosen a
small number of i values in the range i_min ≤ i ≤ i_max. For i_min = 0 and i_max = 10, we
obtained λ = 6.857, and this fixed value was used for all the calculations in the
present work.
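The scaling factor of Eq. (45) and the subsequent selection of i can be sketched as follows. The combination rule λ·ΦRMSDhelix + ΦRMSDβ is an assumption about the form of Eq. (37), which is not reproduced in this part of the text; it was chosen because it is consistent with the magnitudes in Fig. 25. The function names are illustrative, and any input data would have to come from the scan over i.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


def scaling_factor(rmsd_helix, rmsd_beta):
    """lambda = (sigma_beta / mu_beta) / (sigma_helix / mu_helix), as in Eq. (45)."""
    return coefficient_of_variation(rmsd_beta) / coefficient_of_variation(rmsd_helix)


def best_i(i_values, rmsd_helix, rmsd_beta, lam):
    """i that minimizes lam * RMSD_helix + RMSD_beta (assumed form of Eq. (37))."""
    combined = [lam * h + b for h, b in zip(rmsd_helix, rmsd_beta)]
    return i_values[combined.index(min(combined))]
```

Because the same number of i values enters both coefficients of variation, the ratio does not depend on whether the population or the sample standard deviation is used. The ratio is also unchanged if either ΦRMSD series is multiplied by a constant, which is what makes it a reasonable way to balance two terms of different spread.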
In Fig. 25c, the combined result is shown. The smallest ΦRMSD2ndry is obtained
at i = 4.7; namely, the obtained force-field parameters are V_1^trial = 1.598 and
V_2^trial = 0.564. These values are slightly smaller than those of the original AMBER
parm96, which corresponds to i = 5. We can therefore expect the newly obtained force-
field parameters to favor helix structure slightly more and β-sheet structure slightly less
than the original AMBER parm96.
In order to test the validity of the force-field parameters obtained by our opti-
mization method, we performed the folding simulations using two peptides, namely,
C-peptide and G-peptide.
For the folding simulations, we used REMD [60]. We used the TINKER program
package [54] modified by us for the folding simulations. The unit time step was set to
1.0 fs. Each simulation was carried out for 2 ns (hence, it consisted of 2,000,000 MD
steps) with 16 replicas and repeated 10 times. The temperature during MD simula-
tions was controlled by Berendsen’s method [53]. The temperature was distributed
exponentially: 700, 662, 625, 591, 558, 528, 499, 471, 446, 421, 398, 376, 355, 336,
317, and 300 K. As for solvent effects, we used the GB/SA model [42, 43] included
in the TINKER program package [54]. These folding simulations were performed
with different sets of randomly generated initial velocities.
In Fig. 23, the helicity and strandness of C-peptide which were obtained with the
original AMBER parm96 and its optimized force field are shown. These values are
the averages of the 10 REMD simulations at 300 K. The helicity of the optimized
force field is similar to that of the original AMBER parm96. However, the
helicity of Thr3, Ala4, and Ala5 slightly increases with the optimized force field. In
comparison with the original AMBER parm96, the strandness of the optimized force
field decreases except for that at Ala6, Lys7, and Phe8.
In Fig. 24, the helicity and strandness of G-peptide with the original AMBER parm96
and its optimized force field are shown. In comparison with the original AMBER
parm96, the helicity of the optimized force field slightly increases and the strandness
of the optimized force field slightly decreases. For the trial force fields para3 and para7,
the secondary-structure-forming tendencies are similar to those in the case of C-peptide.
These results clearly show that the optimized force field favors helix structures
and does not favor β structures in comparison with the original AMBER parm96.
We can see that these secondary-structure-forming tendencies of the optimized force
field are better than those of the original AMBER parm96, because it is known that
AMBER parm96 favors the β structure slightly too much [23–27].
We also performed the folding simulations with two extreme cases of the trial
force fields, namely, para3 (i = 3.0) and para7 (i = 7.0) (see Figs. 23 and 24) for
comparison. The trial force field para3 strongly favors helix structure and clearly
does not favor β structure. On the other hand, the trial force field para7 has the
opposite tendency to para3. According to the results of ΦRMSDhelix and
We present the results of the application of our optimization method in Sect. 2.3.4 to
the AMBER ff99SB force field. First, we chose 31 PDB files (M = 31) from
PDB-REPRDB [56] with a resolution of 2.0 Å or better, an amino-acid sequence
similarity of 30.0% or lower, and from 40 to 111 residues (the average number of
residues is 86.7). The PDB IDs of these 31 proteins are 1LDD, 1HBK, 1Y02, 1I2T, 1U84,
2ERL, 1TQG, 1O82, 1V54, 1XAK, 1GMU, 1O5U, 1NLQ, 1WHO, 1CQY, 1H75,
1GMX, 1IIB, 1VC1, 1AY7, 1KAF, 1KPF, 1BM8, 1MK0, 1EW4, 1OSD, 1VCC,
1OPD, 1CYO, 1CTF, and 1N9L. We added hydrogen atoms to the PDB coordi-
nates by using the AMBER11 program package. After adding the hydrogen atoms,
we performed short potential-energy minimizations while restraining the heavy
atoms. We used the obtained conformations as the initial structures (experimental
structures). We performed MD simulations for these proteins. Each simulation was
carried out for 40.0 ps by using Langevin dynamics at 300 K (hence, it consisted of
20,000 MD steps; the unit time step was set to 2.0 fs, and the bonds involving hydrogen
were constrained by the SHAKE algorithm [61]). A nonbonded cutoff of 20
Å was used. As for solvent effects, we used the GB/SA model [58] included in
the AMBER program package (igb = 5). These simulations were performed with
different sets of randomly generated initial velocities of atoms in the 31 proteins. For
the whole procedure, we used the AMBER11 program package [57]. As trial force-field
parameters, we used the V1 parameters of the ψ (N–Cα–C–N) and ψ′ (Cβ–Cα–C–N)
angles in the torsion-energy term in Eq. (4). We performed the simulations by using 14
and 15 values of the V1 parameters of ψ and ψ′, respectively, and the simulations
with each set of parameter values were performed five times by changing the initial
velocities of atoms in the 31 proteins. Namely, we calculated n_i^{S→U} and n_i^{U→S} in
Eq. (38) as the averages of n_i^{S→U} and n_i^{U→S} over 10 trajectories from 20.0 to
40.0 ps of the five simulations. These results are shown in Fig. 26. We determined the
optimized force-field parameters in the order of ψ′ and ψ, by searching for the minimum
value of S in Fig. 26. The V1 parameter for ψ changed from 0.45 to 0.31, and the V1
parameter for ψ′ changed from 0.20 to −1.60.
In order to test the validity of the force-field parameters obtained by our opti-
mization method, we performed the folding simulations using two peptides, namely,
C-peptide and G-peptide.
For the test simulations, we used REMD [60] with the AMBER11 program
package [57]. The unit time step was set to 2.0 fs, and the bonds involving hydrogen
were constrained by the SHAKE algorithm [61]. Each simulation was carried out for 30.0
ns (hence, it consisted of 15,000,000 MD steps) with 32 replicas by using Langevin
dynamics. The replica exchange was tried every 3,000 steps. The temperature was
distributed exponentially: 600, 585, 571, 557, 544, 530, 517, 505, 492, 480, 469, 457,
446, 435, 425, 414, 404, 394, 385, 375, 366, 357, 348, 340, 332, 324, 316, 308, 300,
293, 286, and 279 K. As for solvent effects, we used the GB/SA model [58] included
in the AMBER program package (igb = 5). These simulations were performed with
different sets of randomly generated initial velocities.
Fig. 26 S values [defined in Eq. (38)] obtained from MD simulations of 31 proteins with the force
fields which have different V1 parameter values for the ψ′ (Cβ–Cα–C–N) (a) and ψ (N–Cα–C–N) (b)
angles
In Fig. 27, α helicity and strandness of two peptides obtained from the test sim-
ulations are shown. For the original AMBER ff99SB force field, the α helicity is
clearly larger than the strandness in not only C-peptide but also G-peptide. Namely,
the original AMBER ff99SB force field clearly favors α-helix structure and does not
favor β structure. On the other hand, for the optimized force field, in the case of
C-peptide, the α helicity is larger than the strandness, and in the case of G-peptide,
the strandness is larger than the α helicity. We can see that the results obtained
with the optimized force field are in better agreement with the experimental results
than those obtained with the original force field.
4 Conclusions
In this chapter we reviewed our works on force fields for molecular simulations
of protein systems. We first discussed the functional forms of the force fields and
presented some extensions of the conventional ones. Because the main-chain torsion-
energy terms are the most problematic among the force-field terms in the existing
force fields, we mainly considered the main-chain torsion-energy terms. We have
generalized them into the double Fourier series in φ and ψ. We have also introduced
the amino-acid dependence on these terms.
Given the functional forms, we then presented four methods for force-field
parameter optimizations. Our methods use the coordinates from PDB, which were
Fig. 27 α helicity (a-1) and strandness (a-2) of C-peptide and α helicity (b-1) and strandness
(b-2) of G-peptide as functions of the residue number. These values are obtained from REMD [60]
simulations at 300 K. Normal and dotted lines stand for the optimized and original AMBER ff99SB
force field, respectively
Our optimization methods for the force-field parameters are quite general and
they can be readily applied to any new energy terms whenever they are introduced
in the future.
Acknowledgements The computations were performed on the computers at the Research Cen-
ter for Computational Science, Institute for Molecular Science, Information Technology Center,
Nagoya University, and Center for Computational Sciences, University of Tsukuba. This work was
supported, in part, by the Grants-in-Aid for the Academic Frontier Project, “Intelligent Information
Science”, for Scientific Research on Innovative Areas (“Fluctuations and Biological Functions”),
and for the Next Generation Super Computing Project, Nanoscience Program and
Computational Materials Science Initiative from the Ministry of Education, Culture, Sports, Science and
Technology (MEXT), Japan.
References
1. Liwo, A., Czaplewski, C., Ołdziej, S., Scheraga, H.A.: Curr. Opin. Struct. Biol. 18, 134
(2008)
2. Scheraga, H.A.: Ann. Rev. Biophys. 40, 1 (2011)
3. Hansmann, U.H.E., Okamoto, Y.: Curr. Opin. Struct. Biol. 9, 177 (1999)
4. Mitsutake, A., Sugita, Y., Okamoto, Y.: Biopolymers 60, 96 (2001)
5. Okamoto, Y.: J. Mol. Graphics Model. 22, 425 (2004)
6. Mitsutake, A., Mori, Y., Okamoto, Y.: Biomolecular Simulations: Methods and Protocols. In:
Monticelli, L., Salonen, E. (eds.), pp. 153–195. Humana Press, New York (2012)
7. Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz Jr., K.M., Ferguson, D.M.,
Spellmeyer, D.C., Fox, T., Caldwell, J.W., Kollman, P.A.: J. Am. Chem. Soc. 117, 5179 (1995)
8. Kollman, P.A., Dixon, R., Cornell, W., Fox, T., Chipot, C., Pohorille, A.: Computer Simulations
of Biological Systems In: van Gunsteren, W.F., Weiner, P.K., Wilkinson, A.J., vol. 3, pp. 83–96,
Kluwer/ESCOM, Dordrecht (1997)
9. Wang, J., Cieplak, P., Kollman, P.A.: J. Comput. Chem. 21, 1049 (2000)
10. Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., Simmerling, C.: Proteins 65, 712
(2006)
11. Duan, Y., Wu, C., Chowdhury, S., Lee, M.C., Xiong, G., Zhang, W., Yang, R., Cieplak, P., Luo,
R., Lee, T.: J. Comput. Chem. 24, 1999 (2003)
12. MacKerell, Jr., A.D., Bashford, D., Bellott, M., Dunbrack Jr., R.L., Evanseck, J.D., Field, M.J.,
Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau,
F.T.K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D.T., Prodhom, B., Reiher III., W.E., Roux,
B., Schlenkrich, M., Smith, J.C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J.,
Yin, D., Karplus, M.: J. Phys. Chem. B 102, 3586 (1998)
13. MacKerell Jr., A., Feig, M., Brooks III, C.: J. Comput. Chem. 25, 1400 (2004)
14. MacKerell Jr., A., Feig, M., Brooks III, C.: J. Am. Chem. Soc. 126, 698 (2004)
15. Jorgensen, W.L., Maxwell, D.S., Tirado-Rives, J.: J. Am. Chem. Soc. 118, 11225 (1996)
16. Kaminski, G.A., Friesner, R.A., Tirado-Rives, J., Jorgensen, W.L.: J. Phys. Chem. B 105, 6474
(2001)
17. Gunsteren, W.F., Billeter, S.R., Eising, A.A., Hünenberger, P.H., Krüger, P., Mark, A.E., Scott,
W.R.P., Tironi, I.G.: Vdf Hochschulverlag AG an der ETH Zürich, Zürich, (1996)
18. Oostenbrink, C., Villa, A., Mark, A.E., van Gunsteren, W.F.: J. Comput. Chem. 25, 1656 (2004)
19. Berendsen, H.J.C., van der Spoel, D., van Drunen, R.: Comput. Phys. Commun. 91, 43 (1995)
20. Lindahl, E., Hess, B., van der Spoel, D.: J. Mol. Model. 7, 306 (2001)
21. Némethy, G., Gibson, K.D., Palmer, K.A., Yoon, C.N., Paterlini, G., Zagari, A., Rumsey, S.,
Scheraga, H.A.: J. Phys. Chem. 96, 6472 (1992)
22. Arnautova, Y.A., Jagielska, A., Scheraga, H.A.: J. Phys. Chem. B 110, 5025 (2006)
23. Yoda, T., Sugita, Y., Okamoto, Y.: Chem. Phys. Lett. 386, 460 (2004)
24. Yoda, T., Sugita, Y., Okamoto, Y.: Chem. Phys. 307, 269 (2004)
25. Sakae, Y., Okamoto, Y.: Chem. Phys. Lett. 382, 626 (2003)
26. Sakae, Y., Okamoto, Y.: J. Theor. Comput. Chem. 3, 339 (2004)
27. Sakae, Y., Okamoto, Y.: J. Theor. Comput. Chem. 3, 359 (2004)
28. Simmerling, C., Strockbine, B., Roitberg, A.E.: J. Am. Chem. Soc. 124, 11258 (2002)
29. Duan, Y., Wu, C., Chowdhury, S., Lee, M.C., Xiong, G., Zhang, W., Yang, R., Cieplak, P., Luo,
R., Lee, T., Caldwell, J., Wang, J., Kollman, P.: J. Comput. Chem. 24, 1999 (2003)
30. Iwaoka, M., Tomoda, S.: J. Comput. Chem. 24, 1192 (2003)
31. Kamiya, N., Watanabe, Y., Ono, S., Higo, J.: Chem. Phys. Lett. 401, 312 (2005)
32. Best, R.B., Hummer, G.: J. Phys. Chem. B 113, 9004 (2009)
33. Mittal, J., Best, R.B.: Biophys. J. 99, L26 (2010)
34. Sakae, Y., Okamoto, Y.: J. Phys. Soc. Jpn. 75, 054802 (9 pages) (2006)
35. Sakae, Y., Okamoto, Y.: Mol. Sim. 36, 138 (2010)
36. Ramachandran, G.N., Sasisekharan, V.: Adv. Protein Chem. 23, 283 (1968)
37. Tanaka, S., Scheraga, H.A.: Macromolecules 9, 945 (1976)
38. Sakae, Y., Okamoto, Y.: Mol. Sim. 36, 159 (2010)
39. Sakae, Y., Okamoto, Y.: Mol. Sim. 36, 1148 (2010)
40. Sakae, Y., Okamoto, Y.: e-print: arXiv:1206.3909 [cond-mat.stat-mech]; submitted for publi-
cation
41. Sakae, Y., Okamoto, Y.: Mol. Sim. (In press)
42. Still, W.C., Tempczyk, A., Hawley, R.C., Hendrickson, T.: J. Am. Chem. Soc. 112, 6127 (1990)
43. Qiu, D., Shenkin, P.S., Hollinger, F.P., Still, W.C.: J. Phys. Chem. A 101, 3005 (1997)
44. Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.P.: Science 220, 671 (1983)
45. Kabsch, W., Sander, C.: Biopolymers 22, 2577 (1983)
46. Sakae, Y., Okamoto, Y. (In preparation)
47. Honda, S., Kobayashi, N., Munekata, E.: J. Mol. Biol. 295, 269 (2000)
48. Shoemaker, K.R., Kim, P.S., Brems, D.N., Marqusee, S., York, E.J., Chaiken, I.M., Stewart,
J.M., Baldwin, R.L.: Proc. Natl. Acad. Sci. U.S.A. 82, 2349 (1985)
49. Osterhout Jr., J.J., Baldwin, R.L., York, E.J., Stewart, J.M., Dyson, H.J., Wright, P.E.: Bio-
chemistry 28, 7059 (1989)
50. Blanco, F.J., Rivas, G., Serrano, L.: Nature Struct. Biol. 1, 584 (1994)
51. Kobayashi, N., Honda, S., Yoshii, H., Uedaira, H., Munekata, E.: FEBS Lett. 366, 99 (1995)
52. Accelrys discovery studio visualizer. Software available at http://www.accelrys.com/
53. Berendsen, H.J.C., Postma, J.P.M., van Gunsteren, W.F., DiNola, A., Haak, J.R.: J. Chem. Phys.
81, 3684 (1984)
54. Tinker program package. Software available at http://dasher.wustl.edu/tinker/
55. URL http://www.accelrys.com/
56. Noguchi, T., Onizuka, K., Akiyama, Y., Saito, M.: In: Proceeding of the Fifth International
Conference on Intelligent Systems for Molecular Biology, AAAI press, Menlo Park, CA (1997)
57. Case, D.A., Cheatham, T., Darden, T., Gohlke, H., Luo, R., Merz Jr., K.M., Onufriev, A.,
Simmerling, C., Wang, B., Woods, R.: J. Comput. Chem. 26, 1668 (2005)
58. Onufriev, A., Bashford, D., Case, D.A.: Proteins 55, 383 (2004)
59. Weiser, J., Shenkin, P.S., Still, W.C.: J. Comput. Chem. 20, 217 (1999)
60. Sugita, Y., Okamoto, Y.: Chem. Phys. Lett. 314, 141 (1999)
61. Ryckaert, J.P., Ciccotti, G., Berendsen, H.J.C.: J. Comput. Phys. 23, 327 (1977)
62. Wang, G., Dunbrack Jr., R.L.: Bioinformatics 19, 1589 (2003)
63. Hoover, W.G.: Phys. Rev. A 31, 1695 (1985)
64. Jorgensen, W.L., Tirado-Rives, J.: J. Am. Chem. Soc. 110, 1657 (1988)
65. Levitt, M., Chothia, C.: Nature 261, 552 (1976)
Enhanced Sampling for Biomolecular
Simulations
1 Introduction
W. Berhanu · U. H. E. Hansmann
Department of Chemistry and Biochemistry, University of Oklahoma,
Norman 73019-5251, USA
e-mail: [email protected]
U. H. E. Hansmann
e-mail: [email protected]
P. Jiang (B)
Tiandao, Education, Shanghai, People’s Republic of China
e-mail: [email protected]
studied with such a brute-force approach is limited. This is because the complex
form of the forces leads to a rough energy landscape with a vast number of local
minima acting as traps; as a result, the computational requirements for sampling
the energy landscape increase exponentially with the size of the system [2].
In principle one can think of two approaches to overcome these numerical
difficulties. One is to utilize simplified or coarse-grained models since they lead by
design to an energy landscape with a reduced number of valleys. However, while such
models allow a much faster evaluation of the energy, the problem of poor sampling and
slow convergence will likely reappear for sufficiently large proteins, as roughness
is an intrinsic characteristic of protein energy landscapes. The other approach to
obtain sufficient sampling of the conformational space is the use of enhanced
sampling techniques that can quickly find local minima but avoid trapping. Such methods
“flatten” the energy landscape by reducing barriers. While they change the
dynamics and therefore often do not allow one to study directly the kinetics of protein
folding, association, or aggregation, this is a small price to pay for faster and more
accurate calculation of thermal averages and free energy landscapes.
This chapter is organized as follows: we start with a short review of a number of
advanced simulation techniques before discussing shortcomings and open problems.
Recent applications demonstrate what can be done when using these approaches on
high-performance computing systems. We finish this short review with a summary
and outlook.
Results from the simulation at the highest temperature, $T_1$, are used to construct an
estimator of the probability density function $\rho(x_1, \ldots, x_n; T_1)$
that biases the simulation at $T_2$. In turn, this simulation provides a bias for the one at
$T_3$, and the procedure is continued iteratively down to $T_f$. Here, one uses the approximation
$$\rho(x_1, \ldots, x_n; T_r) = \prod_{i=1}^{n} \rho_i^{1}(x_i; T_r), \tag{2}$$
$\rho(x_1, \ldots, x_n; T_{r-1})$
The idea behind all generalized-ensemble techniques can be seen most easily for
the global optimization method energy landscape paving (ELP) [8] which relies on
low-temperature Monte Carlo simulations with an effective energy:
Here, T is a (low) temperature and f(H(q, t)) is a function of the histogram H(q, t)
in a pre-chosen “order parameter” or “reaction coordinate” q. The weight of a local
minimum state decreases the longer the system stays in that state, until the
local minimum is no longer favored, after which the system will again explore higher
energies. We have evaluated the efficiency of ELP in simulations of the 20-residue
trp-cage protein, whose structure we could “predict” within a root-mean-square deviation
(rmsd) of 1 Å [9]. Energy landscape paving also allows zero-temperature
simulations [9]. For T → 0 only moves with ΔẼ ≤ 0 will be accepted.
If one chooses Ẽ = E + cH(E, t), the acceptance criterion is given by:
where E is the “physical” energy. Hence, energy landscape paving can overcome
any energy barrier even at T = 0. The waiting time for such a move is proportional
to the height of the barrier that needs to be crossed. The factor c sets the time scale,
and in this sense the T = 0 form of ELP is parameter-free.
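The zero-temperature rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tilted double-well `toy_energy`, the function name `elp_zero_temperature`, and the choice of the position x (rather than a binned energy) as the order parameter q of the histogram are all assumptions made for illustration.

```python
import random


def toy_energy(x):
    """Tilted double well on integer sites (hypothetical test landscape).

    Local minimum at x = -10 (E = 0.5), global minimum at x = +10 (E = -0.5).
    """
    u = x / 10.0
    return 2.0 * (u * u - 1.0) ** 2 - 0.5 * u


def elp_zero_temperature(energy, x0, n_steps, c=0.1, seed=0):
    """T = 0 energy landscape paving with effective energy E + c * H(q, t).

    The histogram H counts visits to each site x (assumed order parameter).
    A trial move is accepted only if the effective energy does not increase.
    """
    rng = random.Random(seed)
    visits = {}                       # histogram H(q, t) with q = x
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        visits[x] = visits.get(x, 0) + 1      # "pave" the current state
        x_new = x + rng.choice((-1, 1))
        e_new = energy(x_new)
        eff_old = e + c * visits[x]
        eff_new = e_new + c * visits.get(x_new, 0)
        if eff_new <= eff_old:                # T -> 0 acceptance rule
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = elp_zero_temperature(toy_energy, x0=-10, n_steps=5000)
print(best_x, best_e)
```

A plain zero-temperature descent started at x = -10 would never leave the local minimum. Here the growing histogram term raises the effective energy of visited sites until the barrier can be crossed, and a smaller c makes that waiting time longer.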
However, the weight factor is time dependent, and therefore ELP violates detailed
balance. Hence, the method cannot be used to calculate thermodynamic averages.
Detailed balance is fulfilled only for f(H(q, t)) = f(H(q)), in which case ELP
reduces to one of the generalized-ensemble methods [10] generating a random walk
through order parameter space (energy, for instance), control parameter space
(temperature), or model space (i.e., different energy functions).
these techniques to protein simulations can be found in Ref. [14] where a Monte
Carlo technique was used. Later, it was also adapted to molecular dynamics [15].
In multicanonical simulations configurations with energy E are assigned a weight
w(E) such that the distribution of energies
where n(E) is the spectral density. Since all energies appear with equal probability,
a free random walk in the energy space is enforced and the simulation can overcome
any entrapment in one of the many local minima. For a wide range of temperatures
it is now possible to obtain a canonical distribution by re-weighting techniques [16]:
$$P_B(T, E) \propto P_{mu}(E)\, w_{mu}^{-1}(E)\, e^{-\beta E}, \tag{8}$$
since a large range of energies is sampled. This allows one to calculate the expectation
value of any physical quantity O at temperature T by
$$\langle O \rangle_T = \frac{\int dE\, O(E)\, P_B(T, E)}{\int dE\, P_B(T, E)}. \tag{9}$$
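For a finite set of sampled configurations, Eqs. (8) and (9) reduce to a weighted average in which each sample carries the factor exp(−βE)/w_mu(E). The sketch below is only an illustration under that reading; the function name `reweight_average` and the two-level test system are hypothetical.

```python
import math


def reweight_average(energies, observables, log_weights, beta):
    """Canonical average <O>_T from samples generated with weights w_mu(E).

    Each sample k contributes exp(-beta * E_k) / w_mu(E_k), cf. Eq. (8);
    the ratio of sums is the sample analogue of Eq. (9).
    log_weights holds ln w_mu(E_k) for every sample.
    """
    log_factors = [-beta * e - lw for e, lw in zip(energies, log_weights)]
    shift = max(log_factors)                  # avoids overflow in exp
    factors = [math.exp(lf - shift) for lf in log_factors]
    norm = sum(factors)
    return sum(o * f for o, f in zip(observables, factors)) / norm


# Two-level toy system: E = 0 (degeneracy 1) and E = 1 (degeneracy 3).
# A flat multicanonical run visits both energies equally often, with
# weights w_mu(E) proportional to 1/n(E).
beta = 0.7
avg_energy = reweight_average(
    energies=[0.0, 1.0],
    observables=[0.0, 1.0],                   # the observable is E itself
    log_weights=[0.0, -math.log(3.0)],
    beta=beta,
)
print(avg_energy)
```

Working with logarithms of the weights and subtracting the largest exponent is a design choice that keeps the sums finite when βE spans a wide range.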
The drawback of multicanonical sampling is that the weights $w_{mu}(E) \propto n^{-1}(E)$
are not a priori known and one needs their estimates for a numerical simulation.
Calculation of the weights is usually done by an iterative procedure [14, 17, 18]. One
example is the so-called Wang-Landau sampling [19], where the transition probability
between two conformations with energy $E_1$ and $E_2$ is given by the ratio of the
(time-dependent) estimators n(E) of the density of states
$$p(E_1 \to E_2) = \min\left(\frac{n(E_1)}{n(E_2)}, 1\right). \tag{10}$$
where, initially, n(E) = 1 and $f = f_0 = e^1$. Once the desired energy range is
covered, the factor f is refined,
$$f_1 = \sqrt{f}, \quad f_{n+1} = \sqrt{f_n}, \tag{12}$$
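A minimal Wang-Landau sketch for a toy system may make the procedure concrete. It assumes the usual update in which the estimator at the current energy is multiplied by f after every step, and it replaces the histogram-flatness test by a fixed number of steps per stage; both choices, the toy model (independent two-state units with E equal to the number of excited units) and the name `wang_landau` are assumptions for illustration.

```python
import math
import random


def wang_landau(n_spins=8, n_stages=14, steps_per_stage=20000, seed=1):
    """Estimate ln n(E) for n_spins independent two-state units.

    E is the number of excited units, so the exact density of states is
    the binomial coefficient C(n_spins, E).
    """
    rng = random.Random(seed)
    spins = [0] * n_spins
    energy = 0
    log_n = [0.0] * (n_spins + 1)     # ln n(E), initially n(E) = 1
    log_f = 1.0                       # f_0 = e
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            i = rng.randrange(n_spins)
            e_new = energy + 1 - 2 * spins[i]
            # Eq. (10): accept with probability min(n(E_old) / n(E_new), 1)
            delta = log_n[energy] - log_n[e_new]
            if delta >= 0.0 or rng.random() < math.exp(delta):
                spins[i] = 1 - spins[i]
                energy = e_new
            log_n[energy] += log_f    # assumed update n(E) -> f * n(E)
        log_f *= 0.5                  # Eq. (12): f -> sqrt(f)
    return [value - log_n[0] for value in log_n]


estimate = wang_landau()
exact = [math.log(math.comb(8, k)) for k in range(9)]
print(max(abs(a - b) for a, b in zip(estimate, exact)))
```

In production runs the refinement of f is normally triggered by a flatness criterion on the energy histogram rather than by a fixed stage length.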
Here $E_0$ is an estimator for the ground-state energy and $n_F$ is the number of degrees of
freedom of the system. The weight reduces in the low-energy region to the canonical
Boltzmann weight exp(−βE). This is because $E - E_0 \to 0$ for $T \to 0$ ($\beta \to \infty$),
leading to $\beta(E - E_0)/n_F \ll 1$. On the other hand, high-energy regions are no longer
exponentially suppressed but only according to a power law, which enhances
excursions to high-energy regions.
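The two limits can be checked explicitly. The short derivation below assumes that the weight has the form $w(E) = \left(1 + \beta(E - E_0)/n_F\right)^{-n_F}$; this form is an assumption chosen to match the description (Boltzmann-like at low energies, power-law at high energies), not a formula quoted from this passage.

```latex
% Assumed weight and its logarithm
w(E) = \left(1 + \frac{\beta (E - E_0)}{n_F}\right)^{-n_F},
\qquad
\ln w(E) = -n_F \ln\!\left(1 + \frac{\beta (E - E_0)}{n_F}\right).

% Low-energy region: \beta (E - E_0)/n_F \ll 1, so \ln(1 + x) \approx x
\ln w(E) \approx -\beta (E - E_0)
\quad\Longrightarrow\quad
w(E) \approx e^{\beta E_0}\, e^{-\beta E} \propto e^{-\beta E}.

% High-energy region: \beta (E - E_0)/n_F \gg 1
w(E) \approx \left(\frac{\beta (E - E_0)}{n_F}\right)^{-n_F},
\quad \text{a power law in } E \text{ rather than an exponential}.
```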
In stochastic tunneling [23], conformations are weighted by w(E) = exp( f (E)/
k B T ). Here, f (E) is a non-linear transformation of the potential energy onto the
interval [0, 1] and T is a low temperature. The energy in the stochastic tunnel-
ing technique is transformed dynamically dependent on the simulation history. The
transformation is designed so that the system is automatically cooled down near the
local minima, and heated up at the high energy region allowing efficient tunneling
through the barriers [23]. Such a transformation can be realized by
where E 0 is again an estimate of the ground state and n F is the number of degrees
of freedom of the system. Note that the location of all minima is preserved. The
efficiency of this algorithm for protein-folding simulations was demonstrated in
Ref. [24]. As a broad range of energies is sampled, one can use again re-weighting
techniques [16] to calculate thermodynamic quantities over a large range of tempera-
tures. In contrast to other generalized-ensemble techniques, the weights are explicitly
given. One needs only to find an estimator for the ground-state energy E 0 which is
easier than the determination of weights for other generalized ensembles.
Here, the function g(T ) is chosen so that the probability distribution of temperature
is given by
$$P_{ST}(T) = \int dE\, n(E)\, e^{-E/k_B T - g(T)} = \text{const}. \tag{16}$$
Physical quantities have to be sampled for each temperature point separately and
expectation values at intermediate temperatures are calculated by re-weighting tech-
niques [16].
As with the previously discussed generalized-ensemble methods, the weight
$w_{ST}(T, E)$ is not a priori known, since it requires knowledge of the parameters
g(T), and their estimator has to be calculated. It can again be obtained by an iterative
procedure. In the simplest version the improved estimator $g^{(i)}(T)$ for the i-th
iteration is calculated from the histogram of the temperature distribution $H_{ST}^{(i-1)}(T)$ of
the preceding simulation as follows:
$$g^{(i)}(T) = g^{(i-1)}(T) + \log H_{ST}^{(i-1)}(T). \tag{17}$$
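The effect of the update in Eq. (17) can be seen in a small sketch in which the temperature histogram is computed exactly from Eq. (16) instead of being sampled. The discrete level scheme, the temperature set, the unit choice k_B = 1 and the function names `update_g` and `expected_histogram` are assumptions for illustration only.

```python
import math


def update_g(g, histogram):
    """One iteration of Eq. (17): g_new(T) = g_old(T) + log H_old(T)."""
    return [gi + math.log(h) for gi, h in zip(g, histogram)]


def expected_histogram(g, temperatures, levels):
    """Normalized P_ST(T) from Eq. (16) for discrete levels (E, n(E)); k_B = 1."""
    raw = [
        sum(n * math.exp(-e / t) for e, n in levels) * math.exp(-gi)
        for gi, t in zip(g, temperatures)
    ]
    total = sum(raw)
    return [r / total for r in raw]


temperatures = [0.5, 1.0, 2.0]
levels = [(0.0, 1), (1.0, 3), (2.0, 3)]       # (energy, degeneracy)
g0 = [0.0, 0.0, 0.0]
h0 = expected_histogram(g0, temperatures, levels)
g1 = update_g(g0, h0)
h1 = expected_histogram(g1, temperatures, levels)
print(h0, h1)
```

With an exact histogram a single update already yields a flat temperature distribution. In a real simulation the histogram is noisy, temperatures that were never visited give log 0, and several iterations are needed.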
In this procedure one uses that the histogram of the i-th iteration is given by
It is easy to see that the factor g(T) drops out once one considers more than one
copy of the system. This is the idea behind the replica exchange method (or parallel
tempering) [27], which was first applied to protein science in Ref. [28]. Assuming
we have N non-interacting replicas of the molecule, each at a different temperature
$T_i$, standard MC or MD moves are performed in parallel and independently at these
N temperatures. At certain time points, conformational exchanges are attempted between
neighboring temperatures $T_i$ and $T_{i+1}$, and the exchange moves are accepted or
rejected with probability
The result of the exchange of conformations is the faster convergence of the Markov
chain than in regular canonical simulations since the resulting random walk in tem-
peratures allows the configurations to move out of local minima and to cross energy
barriers. Hence, the temperature distribution should be chosen such that any relevant
energy barrier can be crossed at the highest temperature.
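A minimal sketch of the exchange step is given below. It assumes the standard Metropolis form min(1, exp[(β_i − β_j)(E_i − E_j)]) for the swap probability; the function names `exchange_probability` and `attempt_exchanges` and the alternating pairing via `offset` are hypothetical choices.

```python
import math
import random


def exchange_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis probability for swapping the configurations held at two
    inverse temperatures (assumed standard form)."""
    exponent = (beta_i - beta_j) * (e_i - e_j)
    if exponent >= 0.0:
        return 1.0
    return math.exp(exponent)


def attempt_exchanges(betas, energies, configs, rng, offset=0):
    """Attempt swaps between neighboring temperatures, for the pairs
    (offset, offset + 1), (offset + 2, offset + 3), and so on."""
    for i in range(offset, len(betas) - 1, 2):
        p = exchange_probability(betas[i], betas[i + 1],
                                 energies[i], energies[i + 1])
        if rng.random() < p:
            configs[i], configs[i + 1] = configs[i + 1], configs[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return configs, energies


rng = random.Random(0)
# The cold replica (beta = 1.0) holds the higher energy, so the swap is
# always accepted.
print(attempt_exchanges([1.0, 0.5], [2.0, 1.0], ["a", "b"], rng))
```

Alternating `offset` between 0 and 1 on successive exchange attempts lets every neighboring pair be tried without using any replica twice in the same attempt.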
There is no clear consensus on the optimal frequency of exchange attempts. One
opinion is that exchanges should be performed often, but no more often than the
potential energy autocorrelation time [29, 30]. The other argument is that exchange
moves should be attempted every few steps [31, 32]. It has been also suggested to
use multiplexed layers of replicas (n layers, each with M temperatures). In this mul-
tiplexed replica exchange method, replicas are exchanged both within and between
layers [33]. This offers a way of using more computing units on massively parallel
computers without the need of adding more temperatures.
Expectation values of a physical quantity A are calculated as usual according to:
$$\langle A \rangle_{T_i} = \frac{1}{MES} \sum_{k}^{MES} A(C_i(k)), \tag{21}$$
where MES is the number of measurements taken for the i-th temperature. Values
for intermediate temperatures are calculated using reweighting techniques [16]. Note
that parallel tempering does not require Boltzmann weights. The method can be
combined easily with generalized-ensemble techniques [28]. Obviously, the method
is also not restricted to temperature but can be used with any control parameter, for
instance, pH [34] or pressure.
Finally, one can enhance sampling of low energy configurations also by performing
a random walk through an ensemble of systems with altered energy functions. In
that way, information is exchanged between varying stages of coarse graining or
different local environments. This is the idea behind “model hopping” [35], “hamilton
exchange method” [36] and related approaches [37]. Consider, for instance, that the
energy function can be separated into two terms: E = E A + a E B . As in parallel
tempering, “model hopping” considers N non-interacting copies of the molecule,
but adjacent copies are now exchanged with probability
$$\begin{aligned} w(C^{old} \to C^{new}) = \min\Big(1, \exp\big\{-\beta \big[&E_A(C_j) + a_i E_B(C_j) + E_A(C_i) + a_j E_B(C_i) \\ &- E_A(C_i) - a_i E_B(C_i) - E_A(C_j) - a_j E_B(C_j)\big]\big\}\Big). \end{aligned} \tag{22, 23}$$
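In the exchange probability above the $E_A$ terms cancel, so only the difference in $E_B$ weighted by $a_i - a_j$ remains. The sketch below evaluates both the full and the reduced form; the function names and the numerical values are hypothetical.

```python
import math


def model_hopping_probability(beta, a_i, a_j, ea_i, eb_i, ea_j, eb_j):
    """Full form: total energy after the swap minus total energy before."""
    after = (ea_j + a_i * eb_j) + (ea_i + a_j * eb_i)
    before = (ea_i + a_i * eb_i) + (ea_j + a_j * eb_j)
    return min(1.0, math.exp(-beta * (after - before)))


def model_hopping_probability_reduced(beta, a_i, a_j, eb_i, eb_j):
    """Same probability after cancelling the E_A contributions."""
    return min(1.0, math.exp(-beta * (a_i - a_j) * (eb_j - eb_i)))


full = model_hopping_probability(1.3, 0.2, 0.8, -5.0, 3.0, -4.0, 2.5)
reduced = model_hopping_probability_reduced(1.3, 0.2, 0.8, 3.0, 2.5)
print(full, reduced)
```

The reduced form shows that only the scaled term $E_B$ has to be evaluated for the exchange test, which is convenient when $E_A$ is the expensive part of the energy function.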
While there has been much progress in advancing the generalized-ensemble approach,
folding simulations are still limited in their scope. Aggregation, oligomer assembly
and intra-oligomer conformational rearrangements are examples of processes that
need faster algorithms: sampling poses a formidable challenge even for relatively
simple systems such as polyglutamine repeats [39, 40]. The importance
and severity of the problem motivate our search for further methodological advances.
$$f_{up}(i) = \frac{n_{up}(i)}{n_{up}(i) + n_{dn}(i)} \tag{24}$$
where Tmax is the highest temperature, Tmin is the lowest temperature. Both quantities
have to be chosen in advance [42].
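The fraction in Eq. (24) can be accumulated directly from replica labels. The sketch below assumes the common convention that a replica is labelled 'up' after its most recent visit to the lowest temperature and 'dn' after its most recent visit to the highest one; this convention, the data layout and the name `flow_fraction` are assumptions.

```python
def flow_fraction(label_history):
    """f_up(i) of Eq. (24) from a history of replica labels.

    label_history is a list of snapshots. Each snapshot lists, ordered by
    temperature index i, the label of the replica currently at T_i:
    'up', 'dn', or None if it has visited neither end of the ladder yet.
    """
    n_temps = len(label_history[0])
    n_up = [0] * n_temps
    n_dn = [0] * n_temps
    for labels in label_history:
        for i, label in enumerate(labels):
            if label == "up":
                n_up[i] += 1
            elif label == "dn":
                n_dn[i] += 1
    return [
        u / (u + d) if (u + d) > 0 else float("nan")
        for u, d in zip(n_up, n_dn)
    ]


history = [
    ["up", "up", "dn"],
    ["up", "dn", "dn"],
    ["up", None, "dn"],
]
print(flow_fraction(history))
```

With this convention f_up equals 1 at the lowest temperature and 0 at the highest, and its profile in between indicates where the flow of replicas is held up.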
If the relaxation at a particular temperature is slower than hopping in tempera-
ture, the state space partitions into disjoint free energy basins forming a tree-like
hierarchical network. Because of this broken ergodicity an optimized temperature
distribution needs to be found iteratively [43],
$$\int_{T_1}^{T_j^k} \eta^{(opt)}(T)\, dT = j/N, \tag{28}$$
where 1 < j < N and k marks the iteration. The two terminal temperatures $T_1$ and
$T_N$ are kept fixed, and
$$\eta^{(opt)}(T) = C \sqrt{\frac{1}{\Delta T} \frac{df}{dT}}, \tag{29}$$
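One iteration of this feedback scheme can be sketched as follows. The code treats the density of Eq. (29) as piecewise constant on each temperature interval, uses the magnitude of the finite difference of f in place of df/dT, and places the new temperatures at the targets (j − 1)/(N − 1) so that both fixed end points are reproduced exactly. These discretization choices and the name `optimized_temperatures` are assumptions, not the authors' procedure.

```python
import math


def optimized_temperatures(temps, f):
    """Return a new temperature set from the measured flow fraction f(T_i).

    On the interval [T_i, T_{i+1}] the density is taken as
    eta ~ sqrt(|delta f| / delta T) / sqrt(delta T), so its integral over
    the interval is proportional to sqrt(|delta f|).
    """
    n = len(temps)
    widths = [temps[i + 1] - temps[i] for i in range(n - 1)]
    mass = [math.sqrt(abs(f[i + 1] - f[i])) for i in range(n - 1)]
    total = sum(mass)
    cumulative = [0.0]
    for m in mass:
        cumulative.append(cumulative[-1] + m / total)
    new_temps = [temps[0]]            # lowest temperature stays fixed
    k = 0
    for j in range(1, n - 1):
        target = j / (n - 1)
        while cumulative[k + 1] < target:
            k += 1
        frac = (target - cumulative[k]) / (cumulative[k + 1] - cumulative[k])
        new_temps.append(temps[k] + frac * widths[k])
    new_temps.append(temps[-1])       # highest temperature stays fixed
    return new_temps


# A steep drop of f between T = 2 and T = 3 marks a bottleneck, so the
# interior temperatures are moved into that interval.
print(optimized_temperatures([1.0, 2.0, 3.0, 4.0], [1.0, 0.9, 0.1, 0.0]))
```

For a flow fraction that already decreases linearly on an evenly spaced ladder, the function returns the input temperatures unchanged.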
This will again lead to a linear flow distribution, but the acceptance probabilities are
no longer constant. One can also show that in the case of broken ergodicity
weight optimization of flow through order parameter space (for instance, energy)
leads to a distribution that is no longer flat [41, 43].
A direct measurement of the flow distribution is computationally costly as indi-
vidual replicas have to cross the full ladder of nodes many times. Such “tunneling”
events are especially rare in early stages of the control parameter optimization when
round trip times are largest. For this reason, we have proposed to estimate the flow
distribution from measurements of mean first passage times of replicas crossing only
part of the ladder. In our simulations, this procedure led to temperature sets that are
more stable upon iteration than those from flows measured directly [44].
Traditionally, the temperature replica exchange method is implemented with
synchronous exchanges, which is a major limiting factor for its efficiency.
Synchronizing the attempted exchange moves wastes computation time, because the
periodic synchronization causes the overall simulation to run at the speed of the
slowest processor, and the centralized coordination step does not scale to many
processors. In asynchronous
replica exchange, one attempts to escape this problem through performing replica
exchange moves for pairs of replicas independently from the other replicas, thereby
removing the need for processor synchronization found in conventional synchronous
implementations [45]. Because it does not involve a centralized synchronization step,
the algorithm is scalable to an arbitrary number of processors and it is not limited
by the slowest processor. The method is suitable for integration in dynamical simu-
lation environments, such as computational grids, in which processors dynamically
join and leave the calculation [45].
$$E(x, v) = E_{pot}(x) + E_{kin}(v) \quad \text{with} \quad E_{kin}(v) = \frac{1}{2} \sum_i m_i v_i^2 \tag{31}$$
is the sum of the potential energy E pot , which depends only on the coordinates x,
and the kinetic energy E kin that is solely a function of the velocities v. Scaling all
velocities by a factor r changes the kinetic energy by:
In standard replica exchange molecular dynamics this relation is used by scaling the
velocities after a successful exchange with a factor [46]
that depends on the temperatures $T_1$ and $T_2$ of the two replicas that are exchanged. The
rescaling of the velocities leads to $v_{(1,2)}^{new} = v_{(2,1)}^{old}$, and therefore $\Delta E_{kin} = 0$. Hence,
the probability for an exchange is given only by the difference of potential energies
of the two replicas
$$w(1 \leftrightarrow 2) = \exp(\Delta\beta\, \Delta E_{pot}). \tag{34}$$
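Because the kinetic energy is quadratic in the velocities, scaling all velocities by a factor r multiplies it by r². The sketch below assumes the commonly used factor r = sqrt(T_new/T_old) for a configuration that moves from temperature T_old to T_new; that factor and the helper names are assumptions for illustration.

```python
import math


def kinetic_energy(masses, velocities):
    """E_kin = 1/2 * sum_i m_i * |v_i|^2, cf. Eq. (31)."""
    return 0.5 * sum(
        m * sum(c * c for c in v) for m, v in zip(masses, velocities)
    )


def rescale_after_exchange(velocities, t_old, t_new):
    """Scale all velocities by r = sqrt(t_new / t_old) (assumed factor)."""
    r = math.sqrt(t_new / t_old)
    return [[r * c for c in v] for v in velocities]


masses = [1.0, 2.0]
velocities = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
scaled = rescale_after_exchange(velocities, t_old=300.0, t_new=330.0)
print(kinetic_energy(masses, scaled) / kinetic_energy(masses, velocities))
```

The kinetic energy then changes by the ratio of the two temperatures, so the rescaled velocities are consistent with the temperature of the replica that receives the configuration.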
Microcanonical replica exchange simulations call for a different scaling [47, 48].
By definition of the ensemble, one has to assure that ΔE = 0. Assuming $E_1 < E_2$,
and scaling parameters $r_1$ and $r_2$ given by
$$r_{(1,2)} = \sqrt{\frac{E_{(2,1)} - E_{pot}(x_{1,2})}{E_{(1,2)} - E_{pot}(x_{1,2})}} = \sqrt{\frac{E_{kin}(v_{(2,1)}) \pm \Delta E_{pot}}{E_{kin}(v_{1,2})}}, \tag{35}$$
and
Such rejection-free moves are possible for E pot (x2 ) < E 1 , a restriction that does
not violate detailed balance. Molecular dynamics time evolution between exchange
moves ensures ergodicity. Hence, the sampling will lead for sufficiently long simu-
lation times to the correct distribution:
$$P(E_{pot}; E) \propto \Omega_{pot}(E_{pot})\, E_{kin}^{n_f/2}, \tag{38}$$
The “true” potential energy E pot can be approximated by a quantity Q = Ppp + Pis ,
leading to:
E pot = Q + H. (41)
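where $E_{kin}^{(1)}$ and $\hat{E}_{kin}^{(1)}$ are the kinetic energies at temperature $T_1$ before and after an
exchange move, respectively. Rescaling the velocities according to
$$v^{(2)} \leftrightarrow \hat{v}^{(1)} = v^{(2)} \sqrt{\frac{E_{kin}^{(1)} - \Delta H}{E_{kin}^{(2)}}} \quad \text{and} \quad v^{(1)} \leftrightarrow \hat{v}^{(2)} = v^{(1)} \sqrt{\frac{E_{kin}^{(2)} + \Delta H}{E_{kin}^{(1)}}} \tag{43}$$
leads to
$$\hat{E}_{kin}^{(1)} = E_{kin}^{(1)} - \Delta H \quad \text{and} \quad \hat{E}_{kin}^{(2)} = E_{kin}^{(2)} + \Delta H. \tag{44}$$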
Exchange moves are now accepted with a probability of the same form as in Okur
et al. [51]:
However, the velocity rescaling improves on that method by relating the solvation
energies as measured with the explicit solvent and the one calculated with the implicit
solvent. We have shown for the 20-residue Trp-cage protein that the number of
replicas in explicit solvent replica exchange molecular dynamics can be reduced from
40 to 10 replicas [53]. As the contribution of solvent-solvent interactions increases
faster than the protein-protein and protein-solvent terms, one can expect a more
dramatic improvement for larger proteins. This makes it worthwhile to evaluate and
improve velocity rescaling as a way to advance explicit solvent simulations and
other applications of replica exchange.
2.2.3 Replica-Exchange-with-Tunneling
3. After the exchange, the two replicas evolve again by microcanonical molecular
dynamics. While the total energies $E_1$ and $E_2$ on the two replicas do not change,
the exchange between potential and kinetic energy will lead to final states B̂ on
replica 1 and Â on replica 2 that have potential energies comparable to the
corresponding configurations before the exchange move, and velocity distributions
as one would expect for the given temperatures at each replica.
4. The final configurations on each replica are now either accepted or rejected
according to the following Metropolis criterion
$$\exp\left\{-\beta_1 \left(E_{pot}(\hat{q}_B) - E_{pot}(q_A)\right) - \beta_2 \left(E_{pot}(\hat{q}_A) - E_{pot}(q_B)\right)\right\} \quad \text{with} \quad \beta = 1/k_B T. \tag{47}$$
Hence, the Metropolis-Hastings criterion for accepting the RET move is in general
given by:
$$w(C^{old} \to C^{new}) = \min\left(1, \left[\frac{E_1 - E_{pot}(\hat{q}_B)}{E_1 - E_{pot}(q_A)}\right]^{3N/2} \times \left[\frac{E_2 - E_{pot}(\hat{q}_A)}{E_2 - E_{pot}(q_B)}\right]^{3N/2}\right) \tag{50}$$
This equation is cumbersome to evaluate. However, as both functions on the right side
of Eq. 48 grow strongly with their arguments, the distribution of potential energies
P(E Pot , E) is for large N a sharply peaked function, and a saddle-point expansion
will lead to
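$$P(E_{pot}, E) \propto \Omega(E_{pot}) \exp\left\{-\beta_E E_{pot} - \frac{3N}{2} \left(\frac{E_{pot} - \hat{E}_{pot}}{E - \hat{E}_{pot}}\right)^2 + O\left[\left(\frac{E_{pot} - \hat{E}_{pot}}{E - \hat{E}_{pot}}\right)^3\right]\right\}, \tag{51}$$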
with the inverse microcanonical temperature β E = 1/k B TE = d ln Ω(E)/d E and
Ê pot the most probable potential energy. Hence, for sufficiently large N and long
enough trajectories, the RET acceptance criterion of Eq. 50 reduces to Eq. 47 which
can be evaluated more easily [54].
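The reduction of Eq. 50 to Eq. 47 for large N can be checked numerically. In the sketch below the total energies are set up by assuming equipartition, i.e. a kinetic energy of (3N/2) k_B T on each replica; this assumption, the function names and all numerical values are illustrative only.

```python
import math


def ret_acceptance_canonical(beta1, beta2, epot_a, epot_b,
                             epot_a_hat, epot_b_hat):
    """Eq. (47): Metropolis factor built from potential energies only."""
    arg = -beta1 * (epot_b_hat - epot_a) - beta2 * (epot_a_hat - epot_b)
    return min(1.0, math.exp(arg))


def ret_acceptance_microcanonical(e1, e2, n_atoms, epot_a, epot_b,
                                  epot_a_hat, epot_b_hat):
    """Eq. (50): ratio of kinetic-energy factors to the power 3N/2."""
    power = 1.5 * n_atoms
    log_ratio = power * (
        math.log((e1 - epot_b_hat) / (e1 - epot_a))
        + math.log((e2 - epot_a_hat) / (e2 - epot_b))
    )
    return min(1.0, math.exp(log_ratio))


n_atoms = 10**6
kt1, kt2 = 1.0, 1.2                       # k_B T on replicas 1 and 2
epot_a, epot_b = 0.0, 5.0                 # before the exchange
epot_a_hat, epot_b_hat = 4.5, 1.0         # after the tunneling move
e1 = epot_a + 1.5 * n_atoms * kt1         # assumed equipartition
e2 = epot_b + 1.5 * n_atoms * kt2

w47 = ret_acceptance_canonical(1.0 / kt1, 1.0 / kt2,
                               epot_a, epot_b, epot_a_hat, epot_b_hat)
w50 = ret_acceptance_microcanonical(e1, e2, n_atoms,
                                    epot_a, epot_b, epot_a_hat, epot_b_hat)
print(w47, w50)
```

For small systems, or for potential-energy changes that are not small compared with the kinetic energy, the two expressions differ and Eq. 50 is the one to use.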
We have shown in Ref. [54] through simulations of the trp-cage protein, an often
used toy model for evaluating new sampling techniques, that the RET move indeed
increases the flow of replicas through temperature by allowing the system to “tunnel”
through unfavorable “transition states” generated by the exchange move. Both
regular replica exchange molecular dynamics (REMD) and RET lead to the same
thermodynamic averages; but depending on the number of replicas we could achieve a twelve
times larger sampling efficiency for RET than seen in regular REMD.
Thermalization is especially faster for RET when too large a spacing in temperature leads
to very low acceptance rates in regular REMD. As described above, this is a persistent
problem in replica-exchange molecular dynamics of proteins in an explicit solvent,
where the large number of water molecules leads to the need for a very small spacing
in temperature (and therefore a large number of replicas).
configurations before exchange are $\phi_a$ and $\{\phi_b, x_b\}$, the trial configurations are simply
$\phi_b$ and $\{\phi_a, x_b\}$. That is, only the coarse-grained parts of the potential energies are
subjected to exchange. Subscripts H and L denote high resolution and low resolution,
respectively, and the corresponding potential energies are $U_H$ and $U_L$. Then
the probability of the configurations a and b before exchange is the product of the
probability of having configuration a, $\pi_L = \exp(-\beta_L U_L(\phi_a))/Z_L$, and that of having b,
$\pi_H = \exp(-\beta_H U_H(\phi_b, x_b))/Z_H$. Similarly, the probability after exchange is the
product of $\pi_L = \exp(-\beta_L U_L(\phi_b))/Z_L$ and $\pi_H = \exp(-\beta_H U_H(\phi_a, x_b))/Z_H$, where $Z_H$
and $Z_L$ are partition functions. In sum, the exchange criterion can be written as
Eq. 52. The criterion satisfies detailed balance and therefore ensures the
canonical distribution at any resolution.
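Taking the ratio of the two products defined above gives the exchange criterion explicitly. The following is a worked form under the assumption of a standard Metropolis acceptance rule; it is meant as a check of the reasoning rather than a quotation of Eq. 52.

```latex
% Ratio of probabilities after and before the exchange; Z_L and Z_H cancel
\frac{\pi_L(\phi_b)\,\pi_H(\phi_a, x_b)}{\pi_L(\phi_a)\,\pi_H(\phi_b, x_b)}
  = \exp\!\Big\{ -\beta_L \big[ U_L(\phi_b) - U_L(\phi_a) \big]
                 -\beta_H \big[ U_H(\phi_a, x_b) - U_H(\phi_b, x_b) \big] \Big\}

% Assumed Metropolis acceptance probability for the exchange
w(a \leftrightarrow b)
  = \min\!\left( 1,\;
      \frac{\pi_L(\phi_b)\,\pi_H(\phi_a, x_b)}{\pi_L(\phi_a)\,\pi_H(\phi_b, x_b)}
    \right)
```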
A practical problem of the resolution exchange method is that when the system
studied is larger than dipeptides, the trial exchanges are easily rejected.
Lyman et al. found that the rejection rate depends on both the number and the type of
the degrees of freedom of the coordinates x. They employed an incremental
coarse-graining scheme that coarse-grains one residue at a time [56]. In between the finest
and the most coarse-grained replica, hybrid models are used that are partially atomic
and otherwise united. With this scheme the acceptance rate of exchanges becomes reasonably
high (from 0.09% to >2%). To tackle the same issue, Liu et al. used
configurational-bias Monte Carlo (CBMC) to reconstruct the nascent degrees of freedom [57]. The
position of the next interacting site is constructed using a look-ahead algorithm. A
set of trial positions is generated and each is assigned a weight $w_i = \exp(-\beta U_i)$.
The coordinates are selected based on the Rosenbluth factor, $w_i / \sum_i w_i$, and the
process is iterated until the last site is generated.
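The selection step of such a configurational-bias move can be sketched as follows. The code picks one of several trial positions with probability w_i / Σ_j w_j and also returns the sum of weights, which is the quantity usually accumulated into the overall Rosenbluth weight of the regrown chain. The function name `select_trial` and the two-trial example are hypothetical, and the generation of trial positions is left out.

```python
import math
import random


def select_trial(trial_energies, beta, rng):
    """Choose a trial index with probability w_i / sum_j w_j,
    where w_i = exp(-beta * U_i).

    Returns the chosen index and the sum of the weights.
    """
    u_min = min(trial_energies)
    # Shift by the lowest energy so that the exponentials stay bounded
    weights = [math.exp(-beta * (u - u_min)) for u in trial_energies]
    total = sum(weights)
    threshold = rng.random() * total
    running = 0.0
    chosen = len(weights) - 1
    for index, w in enumerate(weights):
        running += w
        if threshold < running:
            chosen = index
            break
    return chosen, total * math.exp(-beta * u_min)


rng = random.Random(42)
counts = [0, 0]
for _ in range(30000):
    index, weight_sum = select_trial([0.0, math.log(2.0)], 1.0, rng)
    counts[index] += 1
print(counts, weight_sum)
```

With trial energies 0 and ln 2 at β = 1 the weights are 1 and 1/2, so the first trial should be chosen in roughly two thirds of the attempts.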
We have proposed to overcome the problem of vanishing acceptance in resolu-
tion exchange simulations by utilizing our new “Replica-Exchange-with-Tunneling”
approach. For this purpose, we describe our system by a potential energy made of
three terms:
$$E_{pot} = E_{FG} + E_{CG} + \lambda E_\lambda. \tag{53}$$
The first term is the energy $E_{FG}$ of the system described by an all-atom model.
The second term $E_{CG}$ describes our system by a suitable coarse-grained model. The
fine-grained and coarse-grained models are coupled by a model-specific penalty term
$E_\lambda$, proposed in Ref. [58], that measures their similarity. The strength with which the two
models are coupled is set by a parameter λ that differs for each replica.
With the above set-up one can now build a ladder of replicas, starting with one
where λ = 0 and the fine-grained and coarse-grained models are independent, followed
by replicas with increasing values of λ, i.e., growing coupling between the two models.
While the energy of a replica is given by the joint expression of Eq. 53, replicas are
exchanged with a probability that depends only on the coupling term $E_\lambda$, i.e., Eq. 52
simplifies to the familiar looking expression:
$$w(A \to B) = \min\left(1, e^{\beta (\Delta\lambda)(\Delta E_\lambda)}\right), \tag{54}$$
(56)
where $\Delta E_{phys}^{(1)} = E_{phys}(\hat{q}_B) - E_{phys}(q_A)$ and $\Delta E_{phys}^{(2)} = E_{phys}(\hat{q}_A) - E_{phys}(q_B)$;
$\Delta E_{Go}^{(i)}$ and $\Delta E_\lambda^{(i)}$ are defined accordingly. First examples showing the usefulness
of this approach can be found in Refs. [59, 60].
3 Recent Applications
mutations can accumulate without changing structure and function of a protein until
a single mutation finally switches the fold. In the case of GA and GB this process
can be studied systematically by comparing the free-energy landscapes of the various
mutants. We have probed this assumption first with all-atom Go-model simulations
of both the GA and GB wild types and the GA98 and GB98 mutants [69], but recently
extended these investigations by using all-atom RET simulations [59]. Unlike
previous physics-based all-atom simulations, which failed to reproduce these differences, we
find very different landscapes for the two proteins, consistent with the experiments. This is
all the more astonishing as our simulations approximate the protein-solvent interaction
by an implicit solvent model. This suggests that the previous difficulties in
simulating these two proteins reported in recent papers are due not so much to insufficient
accuracy of the force fields (as was claimed) as to incomplete sampling.
In another application of replica exchange with tunneling (RET) we could simulate
the formation of, and interconversion between, fibril-like and barrel-like assemblies of the
amyloid-forming cylindrin peptide [60]. This success was possible because the RET
move leads to a faster walk between replicas where the system is biased toward fibril
assemblies and those where it is biased toward barrel-like aggregates. The net effect is
a more effective sampling of independent configurations at the replica where λ = 0,
i.e., where the physical model is not biased. We further increased the efficiency of our
approach by including information from all replicas. Hence, while at replicas with
λ ≠ 0 the physical model is biased toward either fibril or barrel structure, this bias is
accounted for and corrected through re-weighting to the λ = 0 replica. Both effects
allowed a detailed exploration of the free energy landscape of cylindrin assemblies,
which let us propose a mechanism for the formation and interconversion of the various
assemblies. Its main element is that the transition between the two polymorphs does
not involve unfolding of the chains but only their dissociation and re-association.
Crucial for the formation of the barrel-like oligomer is the K3-D7 salt bridge,
which guides the association of the peptides into this form instead of the energetically
more favorable fibril.
4 Conclusion
Progress in the development of algorithms over the last three decades has extended the
size of peptides and proteins that are accessible in all-atom simulations, and has also
made it possible to pinpoint the remaining difficulties. The most important open problem
in present generalized-ensemble techniques is that they require careful tuning of
parameters. Unfortunately, there are no simple and universal rules for this tuning
toward optimal sampling. As the described techniques can only reduce the sampling
difficulties from an exponential scaling to a power law, it is necessary to have software
that is highly adapted to massively parallel computers and modern architectures such
as GPUs and cell processors. Further advancements in hardware and algorithms
may overcome the remaining sampling problems and establish the use of computer
Enhanced Sampling for Biomolecular Simulations 277
Acknowledgements This article is an updated version of a review published in the first edition
of this book, adding new algorithmic developments and applications. We thank Nathan Bernhardt,
Yanjie Wei, Huilin Zang, Wei Wang, Wenhui Xi and Fatih Yasar for their contributions to work now
also reviewed here. Support by the National Science Foundation (research grants CHE-998174,
0313618, 0809002, 1266256) and the National Institutes of Health (GM62838) is acknowledged.
References
1. Lindorff-Larsen, K., Piana, S., Dror, R.O., Shaw, D.E.: How fast-folding proteins fold. Science
334, 517–520 (2011)
2. Chen, Y., Ding, F., Nie, H., Serohijos, A.W., Sharma, S., Wilcox, K.C., Yin, S., Dokholyan,
N.V.: Protein folding: then and now. Arch. Biochem. Biophys. 469, 4–19 (2007)
3. Daggett, V., Fersht, A.: Is there a unifying mechanism for protein folding? Trends Biochem.
Sci. 28, 18–25 (2003)
4. Daggett, V.: Molecular dynamics simulations of the protein unfolding/folding reaction. Acc.
Chem. Res. 35, 422–429 (2002)
5. Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B195,
216–221 (1987)
6. Brass, A., Pendleton, B.J., Chen, Y., Robson, B.: Hybrid Monte Carlo simulation theory and
initial comparison with molecular dynamics. Biopolymers 33, 1307–1315 (1993)
7. Berg, B.A.: Metropolis importance sampling for rugged dynamical variables. Phys. Rev. Lett.
90, 180601 (2003)
8. Hansmann, U.H.E., Wille, L.: Global optimization by energy landscape paving. Phys. Rev.
Lett. 88, 068105 (2002)
9. Schug, A., Wenzel, W., Hansmann, U.H.E.: Energy landscape paving simulations of the trp-
cage protein. J. Chem. Phys. 122, 194711 (2005)
10. Hansmann, U.H.E., Okamoto, Y.: The generalized-ensemble approach for protein folding sim-
ulations. In: Stauffer, D. (ed.) Annual Reviews in Computational Physics, pp. 129–157. World
Scientific, Singapore (1998)
11. Kumar, S., Payne, P., Vásquez, M.: Method for free-energy calculations using iterative tech-
niques. J. Comp. Chem. 17, 1269–1275 (1996)
12. Torrie, G.M., Valleau, J.P.: Nonphysical sampling distributions in Monte Carlo free-energy
estimation: umbrella sampling. J. Comp. Phys. 23, 187–199 (1977)
13. Berg, B.A., Neuhaus, T.: Multicanonical algorithms for first order phase transitions. Phys. Lett.
B 267, 249–253 (1991)
14. Hansmann, U.H.E., Okamoto, Y.: Prediction of peptide conformation by multicanonical algo-
rithm: a new approach to the multiple-minima problem. J. Comp. Chem. 14, 1333–1338 (1993)
15. Hansmann, U.H.E., Okamoto, Y., Eisenmenger, F.: Molecular dynamics, Langevin and hybrid
Monte Carlo simulations in a multicanonical ensemble. Chem. Phys. Lett. 259, 321–330 (1996)
16. Ferrenberg, A.M., Swendsen, R.H.: New Monte Carlo technique for studying phase transitions.
Phys. Rev. Lett. 61, 2635–2638 (1988). Optimized Monte Carlo data analysis. Phys. Rev. Lett.
63, 1195–1198 (1989)
17. Berg, B.A.: Markov chain Monte Carlo simulations and their statistical analysis. World Scien-
tific, Singapore (2004)
18. Hansmann, U.H.E., Okamoto, Y.: Comparative study of multicanonical and simulated anneal-
ing algorithms in the protein folding problem. Physica A 212, 415–437 (1994)
19. Wang, F.G., Landau, D.P.: Efficient, multiple-range random walk algorithm to calculate the
density of states. Phys. Rev. Lett. 86, 2050–2053 (2001)
278 W. Berhanu et al.
20. Hansmann, U.H.E., Okamoto, Y.: Finite-size scaling of helix-coil transitions in poly-alanine
studied by multicanonical simulations. J. Chem. Phys. 110, 1267–1276 (1999)
21. Hansmann, U.H.E., Okamoto, Y.: New Monte Carlo algorithms for protein folding. Curr. Opin.
Struct. Biol. 9, 177–184 (1999)
22. Curado, E.M.F., Tsallis, C.: Possible generalization of Boltzmann-Gibbs statistics. J. Phys. A:
Math. Gen. 27, 3663 (1994)
23. Wenzel, W., Hamacher, K.: Stochastic tunneling approach for global minimization of complex
potential energy landscapes. Phys. Rev. Lett. 82, 3003 (1999)
24. Hansmann, U.H.E.: Protein folding simulations in a deformed energy landscape. Eur. Phys. J.
B 12, 607–612 (1999)
25. Laio, A., Parrinello, M.: Escaping free-energy minima. Proc. Natl. Acad. Sci. USA 99, 12562–
12566 (2002)
26. Lyubartsev, A.P., Martinovski, A.A., Shevkunov, S.V., Vorontsov-Velyaminov, P.N.: New
approach to Monte Carlo calculations of the free energy: method of expanded ensembles.
J. Chem. Phys. 96, 1776–1783 (1992). Marinari, E., Parisi, G.: Simulated tempering: a new
Monte Carlo Scheme. Europhys. Lett. 19, 451–458 (1992)
27. Hukushima, K., Nemoto, K.: Exchange Monte Carlo method and applications to spin glass
simulations. J. Phys. Soc. (Japan) 65, 1604–1608 (1996); Geyer, C.J., Thompson, E.A.: Annealing
Markov chain Monte Carlo with applications to ancestral inference. J. Am. Stat. Assn. 90,
909–920 (1995)
28. Hansmann, U.H.E.: Parallel tempering algorithm for conformational studies of biological
molecules. Chem. Phys. Lett. 281, 140–150 (1997)
29. Periole, X., Mark, A.E.: Convergence and sampling efficiency of replica-exchange molecular
dynamic simulations of peptide folding in explicit solvent. J. Chem. Phys. 126, 014903 (2007)
30. Abraham, M.J., Gready, J.E.: Ensuring mixing efficiency of replica-exchange molecular
dynamics simulations. J. Chem. Theor. Comput. 4, 1119–1128 (2008)
31. Sindhikara, D.J., Emerson, D.J., Roitberg, A.E.: Exchange often and properly in replica
exchange molecular dynamics. J. Chem. Theor. Comput. 6, 2804–2808 (2010)
32. Sindhikara, D.J., Emerson, D.J., Roitberg, A.E.: Exchange frequency in replica exchange
molecular dynamics. J. Chem. Phys. 128, 10 (2008)
33. Rhee, Y.M., Pande, V.S.: Multiplexed-replica exchange molecular dynamics method for protein
folding simulation. Biophys. J. 84, 755–786 (2003)
34. Wallace, J.A., Shen, J.K.: Continuous constant pH molecular dynamics in explicit solvent with
pH-based replica exchange. J. Chem. Theor. Comput. 7, 2617–2629 (2011)
35. Kwak, W., Hansmann, U.H.E.: Efficient sampling of protein structures by model hopping.
Phys. Rev. Lett. 95, 138102 (2005)
36. Fukunishi, H., Watanabe, O., Takada, S.: On the Hamiltonian replica exchange method for
efficient sampling of biomolecular systems: application to protein structure prediction. J.
Chem. Phys. 116, 9058–9067 (2002)
37. Sugita, Y., Kitao, A., Okamoto, Y.: Multidimensional replica-exchange method for free-energy
calculations. J. Chem. Phys. 113, 6042–6051 (2000)
38. Gront, D., Kolinski, A., Hansmann, U.H.E.: Exploring protein energy landscape with hierar-
chical clustering. Int. J. Quant. Chem. 105, 826 (2005)
39. Williamson, T.E., Vitalis, A., Crick, S.L., Pappu, R.V.: Modulation of polyglutamine confor-
mations and dimer formation by the N-terminus of huntingtin. J. Mol. Biol. 396, 1295–1309
(2010)
40. Vitalis, A., Pappu, R.V.: Assessing the contribution of heterogeneous distributions of oligomers
to aggregation mechanisms of polyglutamine peptides. Biophys. Chem. 159, 14–33 (2011)
41. Nadler, W., Hansmann, U.H.E.: Generalized ensemble and tempering simulations: a unified
view. Phys. Rev. E 75, 026109 (2007)
42. Nadler, W., Hansmann, U.H.E.: Optimized explicit-solvent replica-exchange molecular dynam-
ics from scratch. J. Phys. Chem. B 112, 10386 (2008)
43. Trebst, S., Troyer, M., Hansmann, U.H.E.: Optimized parallel tempering simulations of pro-
teins. J. Chem. Phys. 124, 174903 (2006)
44. Nadler, W., Meinke, J.H., Hansmann, U.H.E.: Folding proteins by first-passage-times optimized
replica exchange. Phys. Rev. E 78, 061905 (2008)
45. Gallicchio, E., Levy, R.M., Parashar, M.: Asynchronous replica exchange for molecular sim-
ulations. J. Comput. Chem. 29, 788–794 (2008)
46. Sugita, Y., Okamoto, Y.: Replica-exchange molecular dynamics method for protein folding.
Chem. Phys. Lett. 314, 141–151 (1999)
47. Nadler, W., Hansmann, U.H.E.: Optimizing replica exchange moves for molecular dynamics.
Phys. Rev. E 76, 057102 (2007)
48. Kar, P., Nadler, W., Hansmann, U.H.E.: Microcanonical replica exchange molecular dynamics
simulation of proteins. Phys. Rev. E 80, 056703 (2009)
49. Kim, B., Hagen, M., Liu, P., Friesner, R.A., Berne, B.J.: Serial replica exchange. J. Phys. Chem.
B. 111, 1416–1423 (2007)
50. Lee, M., Olson, M.: Comparison of two adaptive temperature-based replica exchange methods
applied to a sharp phase transition of protein unfolding-folding. J. Chem. Phys. 134, 244111
(2011)
51. Okur, A., Wickstrom, L., Layten, M., Geney, R., Song, K., Hornak, V., Simmerling, C.:
Improved efficiency of replica exchange simulations through use of a hybrid explicit/implicit
solvation model. J. Chem. Theor. Comput. 2, 420–433 (2006)
52. Huang, X., Hagen, M., Kim, B., Friesner, R.A., Zhou, R., Berne, B.J.: Replica exchange with
solute tempering: efficiency in large scale systems. J. Phys. Chem. B 111, 5405–5410 (2007)
53. Wang, J., Zhu, W., Li, G., Hansmann, U.H.E.: Velocity-scaling for replica exchange simulations
of proteins in explicit solvent. J. Chem. Phys. 135, 084115 (2011)
54. Yaşar, F., Bernhardt, N.A., Hansmann, U.H.E.: Replica-exchange-with-tunneling for fast explo-
ration of protein landscapes. J. Chem. Phys. 143, 224102 (2015)
55. Lyman, E., Ytreberg, F.M., Zuckerman, D.M.: Resolution exchange simulation. Phys. Rev.
Lett. 96, 028105 (2006)
56. Lyman, E., Zuckerman, D.M.: Resolution exchange simulation with incremental coarsening.
J. Chem. Theor. Comput. 2, 656–666 (2006)
57. Liu, P., Shi, Q., Lyman, E., Voth, G.A.: Reconstructing atomistic detail for coarse-grained
models with resolution exchange. J. Chem. Phys. 129, 114103 (2008)
58. Moritsugu, K., Terada, T., Kidera, A.: Scalable free energy calculation of proteins via multiscale
essential sampling. J. Chem. Phys. 133, 224105 (2010)
59. Bernhardt, N.A., Xi, W., Wang, W., Hansmann, U.H.E.: Simulating protein fold switching
by replica-exchange-with-tunneling. J. Chem. Theor. Comput. 12, 5656–5666 (2016); 13, 393
(2017)
60. Zhang, H., Xi, W., Hansmann, U.H.E., Wei, Y.: Fibril-barrel transitions in cylindrin amyloids.
J. Chem. Theor. Comput. 13, 3936–3944 (2017)
61. Mohanty, S., Meinke, J.H., Zimmermann, O., Hansmann, U.H.E.: Simulation of top7-CFr: a
transient helix extension guides folding. Proc. Natl. Acad. Sci. U.S.A. 105, 8004–8007 (2008)
62. Mohanty, S., Hansmann, U.H.E.: Caching of a chameleon segment facilitates folding of a
protein with end-to-end β -sheet. J. Phys. Chem. B 112, 15134 (2008)
63. Kuhlman, B., Dantas, G., Ireton, G.C., Varani, G., Stoddard, B.L., Baker, D.: Design of a novel
globular protein fold with atomic level accuracy. Science 302, 1364–1368 (2003)
64. Dantas, G., Watters, A.L., Lunde, B.M., Eletr, Z.M., Isern, N.G., Roseman, T., Lipfert, J.,
Doniach, S., Tompa, M., Kuhlman, B., Stoddard, B.L., Varani, G., Baker, D.: Mis-translation
of a computationally designed protein yields an exceptionally stable homodimer: implications
for protein engineering and evolution. J. Mol. Biol. 362, 1004–1024 (2006)
65. Gaye, M.L., Hardwick, C., Kouza, M., Hansmann, U.H.E.: Chameleonicity and folding of the
C-fragment of TOP7. Europhys. Lett. 97, 68003 (2012)
66. Kouza, M., Gowtham, S., Seel, M., Hansmann, U.H.E.: A numerical investigation into possible
mechanisms by that the A629P mutant of ATP7A causes Menkes Disease. Phys. Chem. Chem.
Phys. 12, 11390–11397 (2010)
67. Jiang, P., Hansmann, U.H.E.: Modeling structural flexibility of proteins with Go-models. J.
Chem. Theor. Comput. 8, 2127–2133 (2012)
68. Alexander, P., He, Y., Chen, Y., Orban, J., Bryan, P.: A minimal sequence code for switching
protein structure and function. Proc. Natl. Acad. Sci. U.S.A. 106, 21149–21154 (2009)
69. Kouza, M., Hansmann, U.H.E.: Folding simulations of the A and B domains of protein G. J.
Phys. Chem. B. 116, 6645–6653 (2012)
Determination of Kinetics
and Thermodynamics of Biomolecular
Processes with Trajectory Fragments
Alfredo E. Cardenas
Abstract Trajectory fragments algorithms are a set of methods that partition the
relevant trajectory space between reactants and products into smaller regions of
phase space. Many short trajectories are launched to evaluate transition probabili-
ties between these regions. Each of the methods processes this short-trajectory data
with different kinetic models and as a result long-time kinetic and thermodynamic
information for the overall molecular event can be extracted. This chapter focuses
on Milestoning, providing detailed analysis of the approximations involved in the
algorithm and its computational implementation. Two other trajectory fragments
methods (Partial Path Transition Interface Sampling and Markov State Models) are
briefly discussed as well. Finally, two recent applications of trajectory fragments
methods are described.
1 Introduction
A. E. Cardenas (B)
Institute for Computational Engineering and Sciences, University of Texas,
Austin, TX 78712, USA
e-mail: [email protected]
[22]. The overall computational gain from multi-time stepping algorithms is modest
(about a factor of two).
Expensive special-purpose machines for MD such as Anton focus only on reducing the
factors proportional to N [23, 24]. While this hardware is strikingly successful in
producing a few millisecond-long trajectories, the problem of kinetics at biophysical time scales
(milliseconds) remains prohibitively costly because an ensemble
of trajectories is required. Furthermore, it is desirable to make calculations of long-time
dynamics available in a single-researcher laboratory setting.
Most of the success in speeding up the calculations has come from reducing the
N-factor contribution. Therefore, the most significant remaining barrier for routine
calculations of kinetic and thermodynamic properties of molecular systems is the L
factor—the trajectory length.
To recapitulate, let us not forget why these trajectories are computed, how they
are used, and whether there are ways of avoiding the expensive straightforward calculations
discussed so far. In the case of thermodynamic calculations, configurations can be
generated by MD simulations to average the values of observables. Averaging using
straightforward trajectories is correct for ergodic systems, but correct does not mean
efficient. Enhanced sampling techniques have been used for a long time in statisti-
cal mechanics calculations of thermodynamic variables. For example the method of
umbrella sampling [25] is widely used to probe and estimate probabilities of infre-
quent events in phase space. Straightforward MD trajectories should not be used to
compute thermodynamic properties that can be estimated much faster with enhanced
sampling techniques.
We can make similar arguments for the evaluation of kinetics. While straightforward
calculations of an ensemble of trajectories from A to B provide the exact
answer, they are not the only way of obtaining the correct result. The cost of calculating
kinetics is even higher than that of equilibrium simulations because many
trajectories are needed. Alternative approaches can provide the desired statistics and overcome
the time scale barrier, i.e., a large value of L. The main topic of the present chapter is
the reduction of the trajectory lengths: breaking trajectories into fragments, running
these fragments in different processes, and still computing observables of long-time
dynamics.
Fig. 1 Five milestones are used to separate the relevant trajectory space between states A (reactant)
and B (product). Trajectory fragments are short segments of trajectories connecting neighboring
milestones. For example, trajectories started from milestone 2 (the three blue trajectories) are run
until they hit milestones 1 or 3. Before they hit any of the two neighboring milestones we say that
the system “belongs” to milestone 2. A reaction coordinate connecting A and B is shown in orange
are marked as passage events and generate trajectory fragments. The lengths of the
trajectory fragments are much shorter than the expected length of an exact first pas-
sage trajectory connecting A to B (with number of integration steps L). What are the
reasons for this efficiency gain?
Consider first a diffusive process. Diffusive motion is the typical dynamics found
in biomolecular motions beyond tens of picoseconds. Let the reactant and product
be separated by a distance R. The time scale for free diffusion along one dimension
is roughly $t \sim R^2$. If we consider M−1 cells between the end interfaces, then the time
scale for diffusion between a pair of divisions is of order $(R/M)^2$. In order to
complete a trajectory we need to select M pieces of the fragments, and hence the time
scale using fragments is $t_M \sim M \cdot (R/M)^2 = R^2/M$. The analysis suggests a speedup
by a factor of M with respect to a straightforward trajectory.
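The quadratic scaling and the resulting factor-of-M estimate can be checked on a lattice model. The following is a minimal sketch rather than part of the original analysis: it assumes an unbiased nearest-neighbor random walk on integer sites, uses the mean exit time from an interval as a stand-in for the diffusive time scale, and the function names are hypothetical.

```python
import numpy as np

def mean_exit_steps(half_width):
    """Mean number of steps for an unbiased lattice walk started at the
    center of [-half_width, half_width] to first reach either end."""
    n = 2 * half_width - 1                      # number of interior sites
    # Mean exit times obey T(x) = 1 + 0.5*T(x-1) + 0.5*T(x+1), T(ends) = 0
    A = np.eye(n) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
    T = np.linalg.solve(A, np.ones(n))
    return T[half_width - 1]                    # value at the central site

def fragment_cost(distance, n_cells):
    """Toy estimate M * (R/M)**2 for stitching together M short fragments."""
    return n_cells * mean_exit_steps(distance // n_cells)

R, M = 100, 10                                  # assumed toy values
direct = mean_exit_steps(R)                     # grows as R**2
fragments = fragment_cost(R, M)                 # grows as R**2 / M
print(direct, fragments, direct / fragments)
```

In this toy model the ratio of the two costs equals M, in line with the estimate above; real systems with drift or barriers will deviate from it.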
What is the origin of this saving? Diffusive trajectories are going back and forth
many times. In contrast, the fragments are computed without explicitly simulating
back and forth transitions. In the Milestoning picture we first generate a bank of
transitional trajectory fragments, say from cell i to cell j and from cell j to i. We found
by experience that adequate sampling of trajectory fragments to estimate transition
probabilities can be achieved using hundreds or thousands of trajectories for most
molecular systems [26, 28, 29]. The sampling is intended to estimate the transition
probability between the interfaces and not necessarily to provide a comprehensive
picture of the dynamics within the cell. For example, a transition probability of
10 percent can be estimated quite accurately using 100 trajectory fragments per
transition event in Milestoning. Milestoning is designed to provide uniform sampling
of events as the reaction progresses or returns. If the trajectory goes back there is no
need to re-compute trajectory fragments since we re-sample from the prepared pool.
We will obtain similar statistics whether we are at a minimum or at the top of the free
energy barrier. That is in contrast to straightforward MD simulations, in which we
usually get far more statistics near the minima, an inefficient use of our limited
computational resources. This brings us to another advantage of the trajectory fragments:
overcoming a barrier is more efficient compared to a complete calculation of a
trajectory moving from one side of the barrier to the other side. Consider climbing a
barrier of height V. In the canonical ensemble the time to reach the top of the barrier
is proportional to $\exp(\beta V)$, where $\beta = 1/(k_B T)$ is the inverse temperature. Imagine that the
barrier is broken into M cells. Each Milestoning transition climbs up with an intermediate
time proportional to $\exp(\beta V/M)$. This time is exponentially shorter than $\exp(\beta V)$.
Adding up M milestones has a small impact on the overall time, $t_M \sim M \exp(\beta V/M)$
in this case, keeping the rate significantly faster than that of a single trajectory. In practice
the speedup easily exceeds a factor of millions for these activated processes. For
example, in the simulation of the recovery stroke in myosin [30] the actual accumu-
lated length of all the trajectory fragments was of the order of 100 ns. The predicted
mean first passage time of the process (fraction of a millisecond) was within a factor
of 10 from the experimental result [31] and is a million times longer than the simu-
lated time. Hence the use of trajectory fragments dramatically reduces the collective
length of the computed trajectories and increases the computational efficiency.
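To get a feeling for the size of this effect, the two time-scale estimates can be compared numerically. This is only an illustrative sketch: prefactors are ignored, the barrier height of 20 in units of the thermal energy and the choice of 10 milestones are assumed values, and the function names are hypothetical.

```python
import math

def direct_time(beta_v):
    """Time scale exp(beta*V) for crossing the full barrier in one trajectory."""
    return math.exp(beta_v)

def milestoning_time(beta_v, n_milestones):
    """Time scale M * exp(beta*V / M) when the barrier is split into M cells."""
    return n_milestones * math.exp(beta_v / n_milestones)

beta_v, n_milestones = 20.0, 10                 # assumed toy values
speedup = direct_time(beta_v) / milestoning_time(beta_v, n_milestones)
print(f"estimated speedup: {speedup:.2e}")
```

With these assumed numbers the estimated speedup is several million, the same order of magnitude as the gain quoted above for activated processes.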
It is important to point out that adequate use of short trajectories to evaluate
thermodynamic and kinetic properties depends on a thorough sampling of the conformational
space of the system, such that the calculations do not miss important
regions of the conformational space. Also, the interfaces used to partition the space
should be close enough that short trajectories can correctly sample their transition
times, but far enough apart to eliminate any bias from the initial conformations [32].
A few long trajectories or other sampling techniques can be used to explore the space
before any trajectory fragment technique is used.
In the following we will describe three different trajectory fragment methodolo-
gies: Milestoning, Partial Path Transition Interface Sampling (PPTIS) and Markov
State Models (MSM). Other trajectory fragment techniques, such as transition interface
sampling (TIS) [3], forward flux sampling (FFS) [33, 34], weighted ensemble [35],
and boxed molecular dynamics [36], have been developed and applied successfully
to the study of rare processes. Because these algorithms share many similarities,
we describe only three of them in this review. We will
provide more theoretical and algorithmic details for Milestoning. Recent reviews
provide additional descriptions of other methods [37–41].
2.1 Milestoning
In the following we will introduce the basic objects and definitions of Milestoning,
provide some details of its implementation, and describe the equations to determine
kinetic and thermodynamic properties.
The points X that satisfy the equality and therefore define the interface are closer to
the final state j and hence the sense of directionality. The parameter determines the
extent of asymmetry between the two end states. The term is added to minimize
the possibility of rapid termination of trajectories between milestones that crossed
each other.
As we further discuss later, the physical assumption of Milestoning theory is of
memory loss between hypersurfaces. The coarse variables of the individual trajecto-
ries or trajectory fragments, in accord with a statistical mechanics view of dynamics,
suffer numerous collision events with other degrees of freedom and their motion is
overall diffusive. After a typical time period the coarse variables decorrelate and it
is no longer possible to trace them back to their point of origin. A formal statement of
this approximation and the profound simplifications it suggests for the calculations
of kinetics and thermodynamics are given in Sect. 2.1.3. In the next subsection we
continue to describe the algorithm of fragment generation that uses this assumption
and how these fragments can be used to compute the relevant transition kernel.
Fig. 2 Trajectory fragments computed in Milestoning. Three milestones (i, j, k) are represented as
vertical lines. In (a), backward trajectories are launched starting from configurations in milestone j.
The configurations (and velocities) belong to the first hitting point distribution of j (black points)
if the backward trajectory hits a neighboring milestone (i or k) without re-crossing j (solid lines).
If they re-cross j (dashed lines), the originating points (in grey) are not saved for the next step.
In (b), forward trajectories are launched from the first hitting points discovered in (a). The forward
trajectories are shown as solid lines. The backward trajectories [from (a)] are shown as dotted lines.
Notice that the forward trajectories are allowed to re-cross the originating milestone j
that it crosses another milestone, before re-crossing the milestone it started from. If
it re-crosses the milestone of initiation, then it is not a first hitting point. This phase
space point is removed from the sample set.
In summary the generation of the trajectory fragments uses the following steps:
1. Generate a canonical sample of configurations at a milestone. This is achieved
either with constant temperature MD while restraining the system to the hypersurface
[26], similar to what is done in umbrella sampling [25], or with constrained
dynamics implemented with Lagrange multipliers [28]. The selected configurations
are distributed in the interface with weights of $\exp(-\beta U(X))$, where
$\beta = 1/(k_B T)$ is the inverse temperature and $U(X)$ the potential energy.
2. Examine if the phase space points sampled in step 1 are first hitting points. Since
the sampling in step 1 is of configurations only, sample first atomic velocities
from the Maxwell distribution conditioned on the overall velocity directed back-
ward from the hypersurface. Each point is integrated backward in time using
Newtonian mechanics (constant energy) until it hits and terminates on a mile-
stone (Fig. 2a). The use of the NVE ensemble is important for the calculations
of dynamics. Other ensembles provide only phenomenological parameterization
of time dependent properties. If the terminating milestone is different from the
interface we started from, accept this initial configuration and velocity as a first
hitting point. If not, reject the point and try with another phase point from step 1.
3. Integrate the first hitting points from step 2 forward in time. The trajectory frag-
ment is terminated when it hits for the first time a milestone different from the
milestone it was initiated on. Note the important difference between the back-
ward and the forward integrations. During the forward integration we do not
terminate trajectories that re-cross the initial milestone. We continue the forward
trajectories until they find a new milestone to terminate on (Fig. 2b). All the
forward trajectories count, and the removal of some of the sampled phase points
at the interface occurs only in step 2.
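The three steps above can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the implementation used in the cited work: the two-dimensional potential, the milestone positions, the time step, and all function names are hypothetical; the milestones are simply the lines x = const; step 1 is done by exact sampling, which is possible only because the toy potential is harmonic along the milestone; the conditioning of the velocity direction mentioned in step 2 is omitted; and backward integration is carried out by reversing the velocities, which is equivalent for Newtonian dynamics.

```python
import numpy as np

KT, DT, SLOPE = 1.0, 0.01, 0.5        # toy units with unit mass (assumed values)
X_I, X_J, X_K = -1.0, 0.0, 1.0        # milestones i, j, k are the lines x = const

def force(pos):
    """Force for the hypothetical toy potential U(x, y) = SLOPE*x + 0.5*y**2."""
    return np.array([-SLOPE, -pos[1]])

def run_until_milestone(pos, vel, stop_on_recross, max_steps=100_000):
    """Velocity Verlet (constant energy) starting on milestone j.
    Returns (label, elapsed_time); label is 'i', 'k', 'recross' or None."""
    pos, vel = pos.copy(), vel.copy()
    f, side = force(pos), 0.0
    for step in range(1, max_steps + 1):
        vel += 0.5 * DT * f
        pos += DT * vel
        f = force(pos)
        vel += 0.5 * DT * f
        if pos[0] <= X_I:
            return "i", step * DT
        if pos[0] >= X_K:
            return "k", step * DT
        new_side = np.sign(pos[0] - X_J)
        if side == 0.0:
            side = new_side                    # side of j taken on the first step
        elif stop_on_recross and new_side != side:
            return "recross", step * DT
    return None, max_steps * DT                # step cap reached

def sample_fragments(n_points, rng):
    fragments = []
    for _ in range(n_points):
        # Step 1: canonical configuration on milestone j (exact for this toy U)
        pos = np.array([X_J, rng.normal(0.0, np.sqrt(KT))])
        # Step 2: Maxwell velocities, then backward integration (reversed velocities)
        vel = rng.normal(0.0, np.sqrt(KT), size=2)
        label, _ = run_until_milestone(pos, -vel, stop_on_recross=True)
        if label not in ("i", "k"):
            continue                           # not a first hitting point: reject
        # Step 3: forward fragment; re-crossing milestone j is allowed here
        end, lifetime = run_until_milestone(pos, vel, stop_on_recross=False)
        if end is not None:
            fragments.append((end, lifetime))
    return fragments

fragments = sample_fragments(100, np.random.default_rng(0))
print(len(fragments), "fragments from accepted first hitting points")
```

Each entry of `fragments` records the terminating milestone and the lifetime of one forward trajectory, which is the raw data needed for the kernel estimates discussed next.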
What do we do with the sampled fragments? The Milestoning theory is built
around a kernel or a transition operator, which we denote by $K_{\alpha\beta}(t)$. It is the probability
density that a trajectory fragment initiated at interface α will hit interface β at
time t. This probability density is normalized: $\sum_{\beta \in \bar{\alpha}} \int_0^\infty K_{\alpha\beta}(t)\,dt = 1$. The normalization
states that at infinite time the trajectory must terminate on one of the nearby
milestones β. The symbol $\bar{\alpha}$ denotes the milestones that can be reached from α without
crossing other milestones along the way.
How do we use the trajectory fragments to estimate the value of the kernel (or
time moments of it)? We compute the kernel (or moments of it) by binning. For
example, let the number of first hitting point trajectories initiated at hypersurface
α be $n_\alpha$. Let the number of trajectories that hit a neighboring milestone β between
time t and time t + Δt be $n_{\alpha\beta}(t)$. The kernel element $K_{\alpha\beta}(t)$ is therefore estimated
as $n_{\alpha\beta}(t)/(n_\alpha \Delta t)$. We will be mostly interested in the moments of the kernel. For
example, the probability that a trajectory fragment will make it from α to β (at any
time) is the zeroth moment (in time) of $K_{\alpha\beta}(t)$, $p_{\alpha\beta} = \int_0^\infty K_{\alpha\beta}(t)\,dt \approx \sum_i n_{\alpha\beta}(t_i)/n_\alpha$.
Computing the moments is more stable statistically since less sampling is required
to compute them compared to accurate estimates of many bins of K.
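The binning procedure can be written down in a few lines. The sketch below is an illustration only: the function name, the data layout (one terminating-milestone label and one lifetime per fragment started at milestone α), and the five example fragments are all assumptions made for this example.

```python
import numpy as np

def estimate_kernel(end_labels, lifetimes, bin_width):
    """Histogram estimate of K_ab(t), its zeroth moments p_ab, and the mean
    lifetime, from fragments that all started on the same milestone."""
    end_labels = np.asarray(end_labels)
    lifetimes = np.asarray(lifetimes, dtype=float)
    n_alpha = lifetimes.size                          # fragments started at alpha
    n_bins = int(lifetimes.max() // bin_width) + 1
    edges = bin_width * np.arange(n_bins + 1)
    kernel, prob = {}, {}
    for beta in np.unique(end_labels):
        counts, _ = np.histogram(lifetimes[end_labels == beta], bins=edges)
        kernel[str(beta)] = counts / (n_alpha * bin_width)   # n_ab(t) / (n_a * dt)
        prob[str(beta)] = counts.sum() / n_alpha             # zeroth moment p_ab
    # First moment summed over all terminating milestones: the mean lifetime
    return edges, kernel, prob, lifetimes.mean()

# Hypothetical data: five fragments started at one milestone
end_labels = ["i", "k", "k", "i", "k"]
lifetimes = [0.5, 1.2, 0.7, 2.1, 1.6]
edges, kernel, prob, mean_lifetime = estimate_kernel(end_labels, lifetimes, bin_width=1.0)
print(prob, mean_lifetime)
```

The moments `prob` and `mean_lifetime` use every fragment, whereas each histogram bin of `kernel` contains only a few counts, which is why the moments are the statistically more stable quantities.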
Assuming that we have computed the ensemble of trajectory fragments, and then
estimated the kernel $K_{\alpha\beta}(t)$, how do we proceed to obtain kinetics and thermodynamics?
At the core of the Milestoning theory one finds an equation for the flux
through milestones. A flux is defined as the number of trajectory fragments that
pass through a milestone at time t. We write a general and exact equation for the flux
(irrespective of the dynamics used to generate the trajectory fragments):
$$q_\alpha(t, X_\alpha) = p_\alpha(0, X_\alpha)\,\delta(t^+) + \sum_{\beta \in \bar{\alpha}} \int_0^t \int q_\beta(t', X_\beta)\, K_{\beta\alpha}(t - t', X_\beta, X_\alpha)\, dt'\, dX_\beta, \tag{3}$$
where the indices α, β are used to denote milestones, and $p_\alpha(t, X_\alpha)$ is the probability
that the last milestone that was crossed at time t is α. The coordinate vectors $X_\alpha$ and
$X_\beta$ are at the interfaces, and $q_\alpha(t, X_\alpha)$ is the flux through the milestone point $X_\alpha$ at
time t. Equation 3 is difficult to solve as it is. The flux is a function of the position in
the hypersurface, which means a function of N−k dimensions of all degrees of freedom
(where N is the number of degrees of freedom, and k the number of coarse variables).
The kernel itself depends on position vectors in two milestones. This exact equation
is therefore not useful for simulation of large molecular systems with a number of
coarse variables that could easily exceed hundreds. To make progress we use the
memory loss assumption mentioned in the previous section. In the kernel language it
means that a trajectory fragment depends only on the label of the milestone it started
from, but is independent of the exact location within the milestone. Hence
$$K_{\alpha\beta}(t, X_\alpha, X_\beta) \sim K_{\alpha\beta}(t, X_\beta) \tag{4}$$
The approximation in Eq. (4) is what makes Milestoning different from (and computationally
more efficient than) other trajectory fragment techniques. For example, FFS
continues a trajectory from the current interface using a prior trajectory that hit the
interface before; it therefore produces an exact path. Milestoning uses independent
fragments to estimate the kernel. With the approximation of Eq. (4) at hand we
define
$$K_{\alpha\beta}(t) = \int K_{\alpha\beta}(t, X_\beta)\, dX_\beta, \quad q_\alpha(t) = \int q_\alpha(t, X_\alpha)\, dX_\alpha, \quad p_\alpha(t) = \int p_\alpha(t, X_\alpha)\, dX_\alpha \tag{5}$$
Integrating Eq. (3) with respect to $X_\alpha$ (and also integrating over $X_\beta$ on the right-hand
side of the equation) we obtain the basic formula of the Milestoning theory [6]
$$q_\alpha(t) = p_\alpha(0)\,\delta(t^+) + \sum_{\beta \in \bar{\alpha}} \int_0^t q_\beta(t')\, K_{\beta\alpha}(t - t')\, dt' \tag{6}$$
Equation (6) can be solved analytically using Laplace transforms to provide the
stationary distribution, $p_\alpha(t \to \infty)$, and the mean first passage time τ (and higher
moments of it), as was shown in a number of publications [28, 29, 44]. In the absence
of external forces and (or) fluxes in and out of the system, $p_\alpha(t \to \infty)$ is the equilibrium
distribution. The overall mean first passage time, τ, is computed for a system with an
absorbing boundary at the product state. Every trajectory that makes it to the product
state is terminated. The final expressions for the stationary flux and distribution are
$$\mathbf{q}_{stat}(\mathbf{I} - \mathbf{K}) = 0, \qquad p_{\alpha,stat} = q_{\alpha,stat} \cdot \langle t_\alpha \rangle \tag{7}$$
$$\langle t_\alpha \rangle = \sum_{\beta \in \bar{\alpha}} \int_0^\infty t \cdot K_{\alpha\beta}(t)\, dt \tag{8}$$
From the first line of Eq. (7) we realize that q is an eigenvector of the matrix (I-K)
with an eigenvalue of zero—a straightforward problem in linear algebra.
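The calculation of the Mean First Passage Time (MFPT) follows another analytical
expression
$$\tau = \mathbf{p} \cdot (\mathbf{I} - \mathbf{K})^{-1} \langle \mathbf{t} \rangle \tag{9}$$
where p is the vector of initial conditions, $(\mathbf{p})_\alpha = p_\alpha(0)$, and $\langle \mathbf{t} \rangle$ is a vector with
components $(\langle \mathbf{t} \rangle)_\alpha \equiv \langle t_\alpha \rangle$. Higher moments of the first passage time can be computed
as well using moments of the kernel [28, 44].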
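Both linear-algebra problems are small enough to solve directly. The sketch below is a hedged illustration on a hypothetical three-milestone chain: K is taken to be the matrix of time-integrated kernel elements (the transition probabilities between milestones), the mean lifetimes are set to one, the product is the last milestone, and for the MFPT the product row of K and its lifetime are set to zero to represent the absorbing boundary. The function names and the numerical values are assumptions of this example.

```python
import numpy as np

def stationary_flux(K):
    """Solve q (I - K) = 0 together with sum(q) = 1 by least squares."""
    n = K.shape[0]
    A = np.vstack([(np.eye(n) - K).T, np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

def mean_first_passage_time(K, lifetimes, p0, product):
    """tau = p0 . (I - K)^(-1) <t> with an absorbing product milestone."""
    n = K.shape[0]
    K_abs, t = K.copy(), lifetimes.copy()
    K_abs[product, :] = 0.0        # trajectories reaching the product are terminated
    t[product] = 0.0               # no time is accumulated after absorption
    return p0 @ np.linalg.solve(np.eye(n) - K_abs, t)

# Hypothetical chain of milestones 0 - 1 - 2 with reflecting end milestones
K = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
lifetimes = np.array([1.0, 1.0, 1.0])          # mean lifetimes <t_alpha>
p0 = np.array([1.0, 0.0, 0.0])                 # start at milestone 0

q = stationary_flux(K)
p_stat = q * lifetimes / np.sum(q * lifetimes) # stationary distribution, normalized
tau = mean_first_passage_time(K, lifetimes, p0, product=2)
print(q, p_stat, tau)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is the usual choice for numerical stability.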
Partial Path Transition Interface Sampling (PPTIS) [2] is a method similar to Milestoning
in the sense that it only computes trajectory fragments between interfaces. The
fundamental principles used in both methods are similar but the practical implemen-
tation to extract the required probabilities is slightly different.
PPTIS is a variation of the Transition Interface Sampling (TIS) method [3]. In
TIS, paths are computed from the interfaces until they reach state A or B. Such
paths become very long on diffusive barriers, so TIS is not particularly useful in
that case. This limitation led to the development of PPTIS, which uses shorter paths
between neighboring interfaces, similar to Milestoning. The theoretical framework of
PPTIS starts with the conditional crossing probability depending on the location of
any four interfaces i, j, l, m:
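\[
P\!\left({}^{l}_{m}\,\middle|\,{}^{i}_{j}\right) \equiv \frac{\left\langle \phi_{ij}(x)\, h^{f}_{lm}(x) \right\rangle}{\left\langle \phi_{ij}(x) \right\rangle} \tag{10}
\]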
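The flux relation
\[
\left\langle \phi_{ki} \right\rangle = \left\langle \phi_{ji} \right\rangle\, P\!\left({}^{k}_{i}\,\middle|\,{}^{j}_{i}\right) \tag{11}
\]
states that the flux at k coming from i is the product of the flux at j < k coming
from i times the conditional probability of reaching k before i when the system
crosses j coming directly from i. Applying these flux relations twice between
neighboring interfaces, the following probabilistic relation among four interfaces
i < j < k < l is obtained:
\[
P\!\left({}^{l}_{i}\,\middle|\,{}^{j}_{i}\right) = P\!\left({}^{l}_{i}\,\middle|\,{}^{k}_{i}\right) P\!\left({}^{k}_{i}\,\middle|\,{}^{j}_{i}\right) \tag{12}
\]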
These last two equations are exact because they keep track of the starting interface,
in this case interface i.
Before writing down the expressions for the rates, PPTIS introduces two additional
crossing probabilities. The single interface crossing probabilities are defined as:
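\[
\begin{aligned}
p_i^{\pm} &\equiv P\!\left({}^{i+1}_{i-1}\,\middle|\,{}^{i}_{i-1}\right), &
p_i^{\mp} &\equiv P\!\left({}^{i-1}_{i+1}\,\middle|\,{}^{i}_{i+1}\right), \\
p_i^{=} &\equiv P\!\left({}^{i-1}_{i+1}\,\middle|\,{}^{i}_{i-1}\right), &
p_i^{\ddagger} &\equiv P\!\left({}^{i+1}_{i-1}\,\middle|\,{}^{i}_{i+1}\right),
\end{aligned} \tag{13}
\]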
Here 0 is the interface closest to the initial state A, and $P_j^{\pm}$ denote the
long-distance crossing probabilities. For example, $P_3^{+}$ is the probability
that a trajectory crosses interface 3 while coming from state A directly. From
these definitions the corresponding rate constants for a one-dimensional reaction
coordinate can be written as:
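\[
k_{AB} = \frac{\left\langle \phi_{1,0} \right\rangle}{\left\langle h_{\mathcal{A}} \right\rangle}\, P_n^{+}, \qquad
k_{BA} = \frac{\left\langle \phi_{n-1,n} \right\rangle}{\left\langle h_{\mathcal{B}} \right\rangle}\, P_n^{-} \tag{15}
\]
with
\[
P_j^{+} \approx \frac{p_{j-1}^{\pm}\, P_{j-1}^{+}}{p_{j-1}^{\pm} + p_{j-1}^{=}\, P_{j-1}^{-}}, \qquad
P_j^{-} \approx \frac{p_{j-1}^{\mp}\, P_{j-1}^{-}}{p_{j-1}^{\pm} + p_{j-1}^{=}\, P_{j-1}^{-}} \tag{16}
\]
where n is the interface closest to the product state B, and $\mathcal{A}$ and $\mathcal{B}$ represent overall
states. For example, overall state $\mathcal{A}$ consists of stable state A and all phase space points
coming directly from state A in the past. A similar definition applies to $\mathcal{B}$.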
The recursive expressions for the long-distance crossing probabilities are approximate.
The approximation in PPTIS is that trajectories lose their memory over a
distance shorter than the separation between interfaces. This Markovian assumption
is basically the same one used in Milestoning.
Starting with the initial condition $P_1^{+} = P_1^{-} = 1$, one can iteratively solve
for $P_j^{+}$ and $P_j^{-}$ for $j = 2, \ldots, n$.
The evaluation of the single interface local probabilities $p_i^{\pm}$, $p_i^{=}$, $p_i^{\mp}$, and $p_i^{\ddagger}$
entails the generation of all possible paths starting from interfaces i − 1 and i + 1
that cross interface i at least once.
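The recursion of Eq. (16) is short enough to sketch directly. The function below is a minimal illustration: its name and the convention that element k of each input list holds the local probability at interface k + 1 are assumptions made here. The toy input is an unbiased random walk between equally spaced interfaces, for which every local crossing probability is 1/2.

```python
def long_distance_crossing(p_pm, p_eq, p_mp):
    """Iterate Eq. (16) from P_1^+ = P_1^- = 1 up to interface n.

    p_pm[k], p_eq[k], p_mp[k] are the local probabilities p^(+-), p^(=), p^(-+)
    at interface k + 1, for interfaces 1 .. n-1 (hypothetical storage convention).
    """
    P_plus, P_minus = 1.0, 1.0
    for pm, eq, mp in zip(p_pm, p_eq, p_mp):
        denom = pm + eq * P_minus
        # Both updates use the values from the previous interface.
        P_plus, P_minus = pm * P_plus / denom, mp * P_minus / denom
    return P_plus, P_minus

# Unbiased random walk with n = 4: local probabilities at interfaces 1, 2, 3.
P_plus_n, P_minus_n = long_distance_crossing([0.5] * 3, [0.5] * 3, [0.5] * 3)
```

For this symmetric walk the recursion gives $P_j^{+} = 1/j$, the familiar gambler's-ruin result, which is a convenient sanity check before applying the same loop to probabilities measured in simulations.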
PPTIS was developed for transitions that can be described by a single
reaction coordinate. For those cases, PPTIS is similar to Milestoning. However,
Milestoning is a more general formulation that enables computations of kinetics
without the need to know the reaction coordinate a priori. In the DiM implementation
of Milestoning, anchors labeled with multiple coarse variables are used to partition
the relevant phase space for the process under study.
In the last 10 years the use of Markov State Models (MSM) has become quite popular
for analyzing large sets of simulation data [7, 38, 45–51]. These techniques usually start
from a few long MD simulations or many short trajectories in which a molecular system
undergoes conformational transitions. Typical applications are protein
folding and conformational changes associated with ligand binding. Very often the
amount of data generated from these simulations is too large, and analysis tools are
required to extract the relevant structural and dynamical information. This
reduction of the original high-dimensional molecular simulation data often entails
partitioning the relevant conformational space of the system into discrete states.
Kinetic information can be obtained by extracting transition probabilities between
these discrete states. MSM assume that the transitions between states are Markovian,
i.e., the jumps between these states are memoryless. Specifically, assume that
x(t) describes the positions and momenta of a long trajectory for a molecular system
of interest. This trajectory can be discretized into a set of states $\{S_1, \ldots, S_n\}$. The
time evolution of the system between the states can be described by the transition
matrix $\mathbf{T}(\tau) \in \mathbb{R}^{n \times n}$, where $T_{ij}(\tau)$ is the steady-state probability to find the system
in state j at time t + τ given it was in state i at time t. The transition matrix elements
can be computed by evaluating correlation functions:
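\[
T_{ij}(\tau) = \frac{c^{\mathrm{corr}}_{ij}(\tau)}{\pi_i} \tag{17}
\]
where $\pi_i$ is the stationary probability of state i. If the number of transitions from
state i to state j over a lag time τ is counted in the long trajectory x(t) and stored
in a count matrix $c_{ij}(\tau)$, then the correlation functions are easy to obtain because
$c^{\mathrm{corr}}_{ij}(\tau) \propto c_{ij}(\tau)$.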
Let us denote by $\mathbf{p}(t) \in \mathbb{R}^{n}$ the population of the system at the different states
$\{S_1, \ldots, S_n\}$ at time t. After a time τ, the state populations change according to:
\[
p_j(t + \tau) = \sum_{i=1}^{n} p_i(t)\, T_{ij}(\tau), \tag{18}
\]
or in matrix notation as
\[
\mathbf{p}^{T}(t + \tau) \approx \mathbf{p}^{T}(t)\, \mathbf{T}(\tau) \tag{19}
\]
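For a trajectory of finite length the transition matrix is estimated from the counts as
\[
\hat{T}_{ij}(\tau) = \frac{c_{ij}(\tau)}{c_i} \tag{20}
\]
where $c_i = \sum_k c_{ik}$ is the total number of times the trajectory is in state i. For a very
long trajectory this approximate transition matrix will converge to the exact result:
\[
\hat{T}_{ij}(\tau) \to T_{ij}(\tau) \tag{21}
\]
Due to the limited statistics the approximate transition matrix does not satisfy the
detailed balance condition; in general we have $\pi_i \hat{T}_{ij} \neq \pi_j \hat{T}_{ji}$ [52]. This can be
partially corrected using a maximum likelihood estimator that enforces the detailed
balance equations.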
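The counting estimate described above can be sketched in a few lines. The discretized trajectory, the function names, and the lag of one frame are hypothetical. The last step symmetrizes the count matrix, which is only a simple stand-in that satisfies detailed balance by construction; it is not the maximum likelihood estimator mentioned in the text.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag):
    """c[i, j] = number of times state i at time t is followed by state j at t + lag."""
    dtraj = np.asarray(dtraj)
    c = np.zeros((n_states, n_states))
    np.add.at(c, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return c

def row_normalize(c):
    """T_ij = c_ij / c_i with c_i = sum_k c_ik (every state must have been visited)."""
    return c / c.sum(axis=1, keepdims=True)

dtraj = [0, 0, 1, 1, 0, 1, 2, 2, 1, 0]      # hypothetical discretized trajectory
c = count_matrix(dtraj, n_states=3, lag=1)
T_hat = row_normalize(c)

# Propagate populations by one lag time, as in Eq. (18).
p0 = np.array([1.0, 0.0, 0.0])
p1 = p0 @ T_hat

# Simple reversible stand-in (an assumption, not the maximum likelihood estimator).
c_sym = 0.5 * (c + c.T)
T_rev = row_normalize(c_sym)
pi_rev = c_sym.sum(axis=1) / c_sym.sum()
flux = pi_rev[:, None] * T_rev               # pi_i T_ij, symmetric by construction
```

With so few counts the plain estimate `T_hat` violates detailed balance, while `flux` is symmetric because it equals the symmetrized counts divided by their total.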
The number of states used in MSM varies depending on the complexity of the
system. For example, for protein folding simulations the number of partitions can
easily reach tens of thousands [39]. Conventional structural clustering techniques
such as k-means or k-centers are often used initially to create states that group
structures from the available simulation data. A kinetic clustering is done later by
constructing the corresponding transition matrix and lumping together states that
interconvert faster than a chosen lag time (typically less than 10 ns). In practice, this
is done by computing and analyzing the eigenvalues and eigenvectors of the current
transition matrix to identify states that are kinetically similar.
It often happens that the initial simulation data are not enough to sample
the relevant state transitions adequately. In that case, adaptive techniques [53] can be
used to sample efficiently, with additional short simulations, the transitions that
contribute most to the uncertainties in the transition probability matrix.
Once the MSM is constructed it should be validated for self-consistency with
respect to the input data used in its construction. Several approaches have been
suggested in the literature [48]. Once this validation is passed the model can be used
to make kinetic predictions that could be compared to experiment.
Markov State Models provide only approximate kinetics, mostly for two
reasons. First, in practical applications MSM can provide only approximate transition
probabilities due to limited sampling. This limitation is present in any
trajectory fragment algorithm. Second, by discretizing the dynamical
process x(t) (which is Markovian in the most commonly used algorithms of molecular
dynamics) into a set of states, the exact location of the system is lost, and the jump
process between states is no longer Markovian. For example, when the system is in a
region of state i close to state j, it has a larger probability to jump to j than a system
that is close to the center of state i. The state space discretization therefore introduces
a systematic error in the prediction of long-time kinetics:
\[
\mathbf{p}^{T}(t + k\tau) \approx \mathbf{p}^{T}(t)\, \mathbf{T}^{k}(\tau) \tag{22}
\]
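One way to probe how well Eq. (22) holds is to compare the k-th power of the transition matrix estimated at lag τ with the matrix estimated directly at lag kτ. The sketch below does this for a synthetic two-state chain that is Markovian by construction, so the two matrices should agree within statistical noise; for discretized molecular dynamics data a larger discrepancy would signal the memory effects described above. The true matrix, trajectory length, and random seed are hypothetical choices.

```python
import numpy as np

def estimate_T(dtraj, n_states, lag):
    """Row-normalized count matrix at the given lag (in trajectory frames)."""
    c = np.zeros((n_states, n_states))
    np.add.at(c, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return c / c.sum(axis=1, keepdims=True)

# Synthetic, truly Markovian two-state trajectory (hypothetical parameters).
rng = np.random.default_rng(0)
T_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
n_steps = 100_000
u = rng.random(n_steps)
dtraj = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    s = dtraj[t - 1]
    dtraj[t] = 1 if u[t] < T_true[s, 1] else 0   # jump to (or stay in) state 1

k = 3
T_tau = estimate_T(dtraj, 2, lag=1)
T_model = np.linalg.matrix_power(T_tau, k)       # prediction of Eq. (22)
T_direct = estimate_T(dtraj, 2, lag=k)           # direct estimate at lag k*tau
ck_error = np.abs(T_model - T_direct).max()
```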
Table 1 Mean first passage time for blocked tryptophan permeation through a DOPC lipid membrane

Method                 Average (h)   Individual layers (h)
Milestoning            3.8           7.5, 0.05
Solubility-diffusion   0.23          0.41, 0.05
Experiment             8

The second column shows the average permeation time for the two lipid layers and the third column
the permeation time computed for the individual layers
The last 10 years have brought new algorithmic advances such as trajectory fragments
that are starting to bridge the gap between the short-time limits of molecular dynamics
Fig. 4 Folding of FiP35. On the left, a folding flux network showing the top 12 folding pathways
obtained with transition path theory. Arrow widths are proportional to flux and node sizes are
proportional to state populations. The conformations closest to the native state are depicted at the
bottom. On the right, examples of conformations as folding progresses from $P_{\mathrm{fold}} = 0.1$ to
$P_{\mathrm{fold}} = 0.9$ are shown. Reproduced with permission from Ref. [56]
master equations between the states and pathways fluxes are obtained by using kinetic
approaches such as transition path theory. Applications of these trajectory fragment
methods have shown their efficiency and accuracy in the determination of rates
and provided richer insights into the mechanisms of biomolecular processes and
interpretation of experimental data.
Despite those advances and impressive applications, these methods are used by a
limited number of groups in the theoretical biophysics community. One reason is
that the theory can be rather intimidating at first and its algorithmic implementation
is involved, with many steps to follow. Another reason is that the hardware
needed to perform the required calculations (hundreds to thousands of computers) is
not available to many groups. The second reason is more difficult to tackle, but
to alleviate the first problem a more automated procedure could be designed
to provide assistance in setting up the calculations, given a few input parameters
and error tolerance levels. For MSM some tools have been designed to address this
automation [47, 59], but not for Milestoning. Algorithmic challenges still remain
in the design of general procedures and in the choice of simulation parameters
that will provide accurate results in most general cases.
References
1. Truhlar, D.G., Garrett, B.C., Klippenstein, S.J.: Current status of transition-state theory. J. Phys.
Chem. 100(31), 12771–12800 (1996)
2. Moroni, D., Bolhuis, P.G., van Erp, T.S.: Rate constants for diffusive processes by partial path
sampling. J. Chem. Phys. 120(9), 4055–4065 (2004). https://doi.org/10.1063/1.1644537
3. van Erp, T.S., Moroni, D., Bolhuis, P.G.: A novel path sampling method for the calculation of
rate constants. J. Chem. Phys. 118(17), 7762–7774 (2003)
4. Bolhuis, P.G., Chandler, D., Dellago, C., Geissler, P.L.: Transition path sampling: throwing
ropes over rough mountain passes, in the dark. Ann. Rev. Phys. Chem. 53, 291–318 (2002).
https://doi.org/10.1146/annurev.physchem.53.082301.113146
5. Allen, R.J., Warren, P.B., ten Wolde, P.R.: Sampling rare switching events in biochemical
networks. Phys. Rev. Lett. 94(1), 018104 (2005). https://doi.org/10.1103/PhysRevLett.94.018104
6. Faradjian, A.K., Elber, R.: Computing time scales from reaction coordinates by milestoning.
J. Chem. Phys. 120(23), 10880–10889 (2004)
7. Chodera, J.D., Swope, W.C., Pitera, J.W., Dill, K.A.: Long-time protein folding dynamics from
short-time molecular dynamics simulations. Multiscale Model. Simul. 5(4), 1214–1226 (2006)
8. Landau, L.D., Lifshitz, E.M.: Mechanics, vol. 1. Course of Theoretical Physics. Pergamon,
Oxford (1976)
9. Machlup, S., Onsager, L.: Fluctuations and irreversible processes. II. Systems with kinetic energy.
Phys. Rev. 91, 1512–1515 (1953)
10. Onsager, L., Machlup, S.: Fluctuations and irreversible processes. Phys. Rev. 91, 1505–1512
(1953)
11. Olender, R., Elber, R.: Calculation of classical trajectories with a very large time step: formalism
and numerical examples. J. Chem. Phys. 105(20), 9299–9315 (1996)
12. Elber, R., Ghosh, A., Cardenas, A.: Long time dynamics of complex systems. Acc. Chem. Res.
35(6), 396–403 (2002)
13. Elber, R., Cardenas, A., Ghosh, A., Stern, H.A.: Bridging the gap between long time trajectories
and reaction pathways. In: Prigogine, I., Rice, S.A. (eds.) Advances in Chemical Physics, vol.
126, pp. 93–129. Wiley & Sons Inc, NJ (2003)
14. Faccioli, P., Sega, M., Pederiva, F., Orland, H.: Dominant pathways in protein folding. Phys.
Rev. Lett. 97(10), 108101 (2006). https://doi.org/10.1103/PhysRevLett.97.108101
15. Cardenas, A.E., Elber, R.: Kinetics of cytochrome C folding: atomically detailed simulations.
Proteins Struct. Funct. Bioinf. 51(2), 245–257 (2003)
16. Cardenas, A.E., Elber, R.: Atomically detailed simulations of helix formation with the
stochastic difference equation. Biophys. J. 85(5), 2919–2939 (2003)
17. Bai, D., Elber, R.: Calculation of point-to-point short-time and rare trajectories with boundary
value formulation. J. Chem. Theory Comput. 2(3), 484–494 (2006)
18. Elber, R., Meller, J., Olender, R.: Stochastic path approach to compute atomically detailed
trajectories: application to the folding of C peptide. J. Phys. Chem. B 103(6), 899–911 (1999)
19. Siva, K., Elber, R.: Ion permeation through the gramicidin channel: atomically detailed
modeling by the Stochastic Difference Equation. Proteins Struct. Funct. Bioinf. 50(1), 63–80 (2003)
20. Ghosh, A., Elber, R., Scheraga, H.A.: An atomically detailed study of the folding pathways
of protein A with the stochastic difference equation. Proc. Natl. Acad. Sci. U. S. A. 99(16),
10394–10398 (2002)
21. Tuckerman, M., Berne, B.J., Martyna, G.J.: Reversible multiple time scale molecular-dynamics.
J. Chem. Phys. 97(3), 1990–2001 (1992)
22. Morrone, J.A., Zhou, R.H., Berne, B.J.: Molecular dynamics with multiple time scales: how
to avoid pitfalls. J. Chem. Theory Comput. 6(6), 1798–1804 (2010).
https://doi.org/10.1021/ct100054k
23. Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., Eastwood, M.P., Bank,
J.A., Jumper, J.M., Salmon, J.K., Shan, Y.B., Wriggers, W.: Atomic-level characterization of
the structural dynamics of proteins. Science 330(6002), 341–346 (2010).
https://doi.org/10.1126/science.1187409
24. Shaw, D.E., Deneroff, M.M., Dror, R.O., Kuskin, J.S., Larson, R.H., Salmon, J.K., Young, C.,
Batson, B., Bowers, K.J., Chao, J.C., Eastwood, M.P., Gagliardo, J., Grossman, J.P., Ho, C.R.,
Ierardi, D.J., Kolossvary, I., Klepeis, J.L., Layman, T., McLeavey, C., Moraes, M.A., Mueller,
R., Priest, E.C., Shan, Y.B., Spengler, J., Theobald, M., Towles, B., Wang, S.C.: Anton, a
special-purpose machine for molecular dynamics simulation. Commun. ACM 51(7), 91–97
(2008). https://doi.org/10.1145/1364782.1364802
25. Valleau, J.: Monte Carlo: changing the rules for fun and profit. In: Berne, B.J., Ciccotti, G.,
Coker, D.F. (eds.) Classical and quantum dynamics in condensed phase simulations. World
Scientific, Singapore (1998)
26. Majek, P., Elber, R.: Milestoning without a reaction coordinate. J. Chem. Theory Comput. 6(6),
1805–1817 (2010). https://doi.org/10.1021/ct100114j
27. Vanden-Eijnden, E., Venturoli, M.: Markovian milestoning with Voronoi tessellations. J. Chem.
Phys. 130(19), 194101 (2009). https://doi.org/10.1063/1.3129843
28. West, A.M.A., Elber, R., Shalloway, D.: Extending molecular dynamics time scales with
milestoning: Example of complex kinetics in a solvated peptide. J. Chem. Phys. 126(14), 145104
(2007)
29. Kirmizialtin, S., Elber, R.: Revisiting and computing reaction coordinates with directional
milestoning. J. Phys. Chem. A 115(23), 6137–6148 (2011)
30. Elber, R., West, A.: Atomically detailed simulation of the recovery stroke in myosin by
Milestoning. Proc. Natl. Acad. Sci. U. S. A. 107, 5001–5005 (2010)
31. Malnasi-Csizmadia, A., Toth, J., Pearson, D.S., Hetenyi, C., Nyitray, L., Geeves, M.A.,
Bagshaw, C.R., Kovacs, M.: Selective perturbation of the myosin recovery stroke by point
mutations at the base of the lever arm affects ATP hydrolysis and phosphate release. J. Biol.
Chem. 282(24), 17658–17664 (2007)
32. Monticelli, L., Sorin, E.J., Tieleman, D.P., Pande, V.S., Colombo, G.: Molecular simulation
of multistate peptide dynamics: a comparison between microsecond timescale sampling and
multiple shorter trajectories. J. Comput. Chem. 29, 1740–1752 (2008)
33. Allen, R.J., Frenkel, D., ten Wolde, P.R.: Forward flux sampling-type schemes for simulating
rare events: Efficiency analysis. J. Chem. Phys. 124(19), 194111 (2006).
https://doi.org/10.1063/1.2198827
34. Allen, R.J., Valeriani, C., ten Wolde, P.R.: Forward flux sampling for rare event simulations.
J. Phys.: Condens. Matter. 21(46), 463102 (2009).
https://doi.org/10.1088/0953-8984/21/46/463102
35. Zhang, B.W., Jasnow, D., Zuckerman, D.M.: The “weighted ensemble” path sampling method
is statistically exact for a broad class of stochastic processes and binning procedures. J. Chem.
Phys. 132(5), 054107 (2010). https://doi.org/10.1063/1.3306345
36. Glowacki, D.R., Paci, E., Shalashilin, D.V.: Boxed molecular dynamics: a simple and general
technique for accelerating rare event kinetics and mapping free energy in large molecular
systems. J. Phys. Chem. B 113(52), 16603–16611 (2009)
37. Van Erp, T.S.: Dynamical rare event simulation techniques for equilibrium and nonequilibrium
systems. In: Nicolis, G., Maes, D. (eds.) Kinetics and Thermodynamics of Multistep Nucleation
and Self-Assembly in Nanoscale Materials: Advances in Chemical Physics, vol. 151. Wiley &
Sons Inc, Hoboken (2012)
38. Prinz, J.-H., Keller, B., Noe, F.: Probing molecular kinetics with Markov models: metastable
states, transition pathways and spectroscopic observables. Phys. Chem. Chem. Phys. 13,
16912–16927 (2011)
39. Pande, V.S., Beauchamp, K., Bowman, G.R.: Everything you wanted to know about Markov
State Models but were afraid to ask. Methods 52, 99–105 (2010)
40. Bolhuis, P.G., Dellago, C.: Trajectory-based rare event simulations. In: Lipkowitz, K.B. (ed.)
Reviews in Computational Chemistry, vol. 27. John Wiley & Sons Inc, Hoboken (2010)
41. Cardenas, A.E., Elber, R.: Enhancing the capacity of molecular dynamics simulations with
trajectory fragments. In: Schlick, T. (ed.) Innovations in Biomolecular Modeling and Simulations,
vol. 1. RSC Biomolecular Sciences. The Royal Society of Chemistry, Cambridge (2012)
42. Elber, R.: A milestoning study of the kinetics of an allosteric transition: atomically detailed
simulations of deoxy Scapharca hemoglobin. Biophys. J. 92(9), L85–L87 (2007)
43. Kuczera, K., Jas, G.S., Elber, R.: Kinetics of helix unfolding: molecular dynamics
simulations with milestoning. J. Phys. Chem. A 113(26), 7461–7473 (2009).
https://doi.org/10.1021/jp900407w
44. Shalloway, D., Faradjian, A.K.: Efficient computation of the first passage time distribution
of the generalized master equation by steady-state relaxation. J. Chem. Phys. 124(5), 054112
(2006)
45. Noe, F., Schutte, C., Vanden-Eijnden, E., Reich, L., Weikl, T.R.: Constructing the equilibrium
ensemble of folding pathways from short off-equilibrium simulations. Proc. Natl. Acad. Sci.
U. S. A. 106(45), 19011–19016 (2009). https://doi.org/10.1073/pnas.0905466106
46. Swope, W.C., Pitera, J.W.: Describing protein folding kinetics by molecular dynamics
simulations. 1. Theory. J. Phys. Chem. B 108(21), 6571–6581 (2004)
47. Chodera, J.D., Singhal, N., Pande, V.S., Dill, K.A., Swope, W.C.: Automatic discovery of
metastable states for the construction of Markov models of macromolecular conformational
dynamics. J. Chem. Phys. 126(15), 155101 (2007)
48. Prinz, J.-H., Wu, H., Sarich, M., Keller, B., Senne, M., Held, M., Chodera, J.D., Schutte,
C., Noe, F.: Markov models of molecular kinetics: generation and validation. J. Chem. Phys.
134(17), 174105 (2011)
49. Noe, F., Horenko, I., Schutte, C., Smith, J.C.: Hierarchical analysis of conformational dynamics
in biomolecules: transition networks of metastable states. J. Chem. Phys. 126(15), 155102
(2007)
50. Buch, I., Giorgino, T., De Fabritiis, G.: Complete reconstruction of an enzyme-inhibitor
binding process by molecular dynamics simulations. Proc. Natl. Acad. Sci. U. S. A. 108(25),
10184–10189 (2011)
51. Voelz, V.A., Bowman, G.R., Beauchamp, K., Pande, V.S.: Molecular simulation of ab initio
protein folding for a millisecond folder NTL9(1-39). J. Am. Chem. Soc. 132(5), 1526–1528
(2010)
52. Scalco, R., Caflisch, A.: Equilibrium distribution from distributed computing (Simulations of
protein Folding). J. Phys. Chem. B 115(19), 6358–6365 (2011)
53. Singhal, N., Pande, V.S.: Error analysis and efficient sampling in Markovian state models for
molecular dynamics. J. Chem. Phys. 123(20), 204909 (2005)
54. Schutte, C., Noe, F., Lu, J.F., Sarich, M., Vanden-Eijnden, E.: Markov state models based on
milestoning. J. Chem. Phys. 134(20), 204105 (2011). https://doi.org/10.1063/1.3590108
55. Cardenas, A.E., Jas, G.S., DeLeon, K.Y., Hegefeld, W.A., Kuczera, K., Elber, R.: Unassisted
transport of N-Acetyl-L-tryptophanamide through membrane: experiment and simulation of
kinetics. J. Phys. Chem. B 116, 2739–2750 (2012)
56. Lane, T.J., Bowman, G.R., Beauchamp, K., Voelz, V.A., Pande, V.S.: Markov State Model
reveals folding and functional dynamics in ultra-long MD trajectories. J. Am. Chem. Soc. 133,
18413–18419 (2011)
57. Berezhkovskii, A., Hummer, G., Szabo, A.: Reactive flux and folding pathways in network
models of coarse-grained protein dynamics. J. Chem. Phys. 130(20), 205102 (2009).
https://doi.org/10.1063/1.3139063
58. Metzner, P., Schutte, C., Vanden Eijnden, E.: Transition path theory for Markov jump processes.
Multiscale Model. Simul. 7, 1192–1219 (2009)
59. Bowman, G.R., Beauchamp, K., Boxer, G., Pande, V.S.: Progress and challenges in the
automated construction of Markov state models for full protein systems. J. Chem. Phys. 131(12),
124101 (2009)
Part III
Molecular Simulations: Applications
Mechanostability of Virus Capsids
and Their Proteins in Structure-Based
Coarse-Grained Models
Marek Cieplak
1 Introduction
M. Cieplak (B)
Institute of Physics, Polish Academy of Sciences,
Aleja Lotników 32/46, 02-668 Warsaw, Poland
however, that these details are not relevant at the orders-of-magnitude longer
timescales that are probed in the simplified models.
In Sect. 2, we describe the version of the model used here. In Sect. 3 the results
of surveys of mechanostability of thousands of proteins are outlined and types of
mechanical clamps are identified. In the following sections we shall discuss folding
and stretching taking place in proteins that are found in virus capsids and manip-
ulation of complexes involving such proteins. We shall focus here on the role of
single-point mutations.
When discussing specific examples, we shall focus on the cowpea chlorotic mottle
virus (CCMV) as it has been studied experimentally the most. Its capsid is composed
of 60 complexes, known as capsomers, which comprise three sequentially identical
proteins, or chains, known also as subunits. The chains will be denoted as 1CWP:A,
1CWP:B, and 1CWP:C, where 1CWP is the Protein Data Bank (PDB) code of the
complex structure. This complex is shown in Fig. 1. Even though the complex is,
generally, a part of the CCMV capsid, it is likely to exist as a physical entity during the
self-assembly stage of the virus. Here, however, it will serve a didactic purpose as
we shall discuss how to analyse the mechanostability of multimeric systems, where there
is a much greater variety of possible ways of stretching than in monomeric
systems. In the last section, we shall discuss several spherical virus capsids and
demonstrate the existence of a large variety of responses to nanoindentation.
This chapter presents a review of the merits of using the coarse-grained structure-based
model and it also shows two new results: (1) stretching and nanoindentation
are sensitive to single-point mutations in the sequences, and (2) the strength of the elastic
response of a capsid to indentation is not related to the mechanostability of its constitutive
proteins as assessed through stretching.
Fig. 1 The three chains of capsomer 1CWP that forms the structural unit of the CCMV capsid.
The three pairs of termini are indicated
There are many possible variants of structure-based models as there are many ways
to realize Go’s idea [26] to describe conformational changes of a protein in terms of
its native interactions. The first implementations have been set on a lattice [27–29].
However, dynamics are better defined in a continuum space where Newton's equations
apply and forces derive from potentials. We have considered 62 specific molecular
dynamics realizations [30, 31], some of them proposed previously by other authors
[32, 33], and compared them to the experimental data on stretching. We have also
checked their folding properties. Only some of the realizations led to good folding
and were consistent with the stretching data. We have identified four optimal choices.
One of them is the simplest version that does not distinguish between the chemical
identities of the amino acids. The results discussed in this Chapter have been obtained
using this very model.
In this simplest model one assigns the same depth, ε, to the potential wells asso-
ciated with a pair of amino acids that form a native contact. (Relevant attractive
non-native contacts can also be built in, if information about them is available [34,
35]). The contact interactions effectively correspond to hydrogen bonds and ionic
bridges. The disulfide bridges between cysteines are covalent in nature and are represented
by harmonic potentials. The contact map is determined by checking for
atomic overlaps [22, 36, 37]. If an overlap exists, the two amino acids, which are represented
merely by their Cα atoms (adding the Cβ atoms does lead to improvement in the
workings of the model [24, 30]), interact through a potential well. Otherwise there is a soft
core repulsion between the Cα atoms. Alternative schemes to derive contact maps
are discussed in Ref. [38]. The specific choice of the well potential has turned out
to be of minor importance compared to the proper definition of the contact map,
and we usually work with the Lennard-Jones potential. The length parameter in this
potential is chosen so that the location of its minimum agrees with the experimentally
determined distance between the Cα atoms, which is measured in water. Another way in which
the solvent enters the description is through the velocity dependent (over) damping
term and Langevin noise which is controlled by the temperature, T . Still another is
through the characteristic time scale, τ , which is of order 1 ns instead of 1 ps usually
characterizing all-atom models [39, 40]. The reason is that the motion of the model
Cα atoms in the implicit solvent is diffusive instead of ballistic.
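The choice of the length parameter can be made concrete with a short sketch. It assumes the common 12-6 form of the Lennard-Jones potential, $V(r) = 4\varepsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]$, whose minimum lies at $2^{1/6}\sigma$ with depth ε; the text only states that a Lennard-Jones potential is used, so the exact functional form, the function name, and the 6 Å native distance below are illustrative assumptions.

```python
import numpy as np

def native_contact_lj(r, r_native, eps=1.0):
    """12-6 Lennard-Jones well whose minimum (depth eps) sits at the native distance."""
    sigma = r_native / 2.0 ** (1.0 / 6.0)    # places the minimum at r = r_native
    x = (sigma / np.asarray(r, dtype=float)) ** 6
    return 4.0 * eps * (x * x - x)

r_native = 6.0                                # hypothetical native C-alpha distance (Angstrom)
v_min = native_contact_lj(r_native, r_native)
v_left = native_contact_lj(5.9, r_native)
v_right = native_contact_lj(6.1, r_native)
```

Because every native contact uses the same ε, only `r_native` changes from contact to contact in this picture.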
The model has good predictive properties. For instance, our simulations [24] have
predicted large mechanostability of two cellulosome-related cohesin proteins c7A
(the PDB structure code 1aoh) and c1C (the structure code 1g1k) that got confirmed
experimentally [41]. In the case of c7A, the calculated value of $F_{\max}$ is 470 pN and
the measured value is 480 pN. Comparisons of this sort are based on calibration of the energy
Fig. 2 The F − d pattern for γ D-crystallin. Displacements at which two distinct mechanical
clamps are operational are indicated. The two lines correspond to two different trajectories
Fig. 3 Two kinds of mechanical clamps in γ D-crystallin. The left panel shows the tensile clamp
and the right panel—the shearing clamp
A very different kind of a mechanical clamp has been discovered in the survey of
2008 [42]. It is topological in nature and we have dubbed it the cystine slipknot (see
Fig. 4). It can arise in proteins containing the cystine knot motif [48–51] in the native
state. The motif involves three disulfide bonds. Two of them connect two segments
of the backbone in a way that forms an effective ring made of, typically, eight amino
acids. The third bond connects other segments of the backbone across the ring. On
pulling, this third bond may drag one of these segments across the ring and thus form
a slipknot conformation. The related movement generates an isolated force peak with
high values of $F_{\max}$, in the range of 1000 pN. In fact, the 13 strongest proteins in the set
of 17,134 are those endowed with the cystine slipknot mechanism. The
workings of this mechanism have been elucidated in all-atom simulations [52] but
experimental verification is still missing.
The 2008 survey [42] has been applied to single chains. If several chains are
listed under the same PDB code, the first one was considered.
Thus if a structure code corresponds to a proteinic complex then the value of Fmax
applies to one of its components. In most cases, this yields a reasonable estimate of
mechanostability of the whole complex. Many cystine knot proteins, however, form
dimers which are linked covalently and analyzing their mechanostability requires
more care. For instance, in the case of the placenta growth factor-1 with the struc-
ture coded as 1FZV the monomers are linked by two disulfide bonds. Each of the
monomers contains the cystine knot motif and between zero and two force peaks
related to the formation of the cystine slipknot may arise on stretching, depending
on which termini are used in the process [53, 54]. If the termini in one monomer are
denoted as N and C, and in the other as N′ and C′, then one can implement distinct
stretching patterns using pairs N-N′, N-C, N-C′, and C-C′. For instance, if the N and
C termini are involved in stretching, only one slipknot forms, as illustrated in Fig. 4.
Once the slipknot arises in the ring which is shown in the lower part of the figure,
the disulfide bond intersecting the upper ring gets aligned in a way that blocks the
second dragging movement in the upper ring [54]. The values of Fmax in the dimeric
cases may be either smaller or bigger compared to Fmax obtained for the single chain,
depending on the protein and the way of pulling. However, whenever a force peak
arises, it comes with a high value of Fmax .
Recently, we have discovered a related version of the cystine slipknot mechanical
clamp: the cystine plug [53]. We have found it in only one protein (human transforming
growth factor-β2 with the PDB code 1TFG). It involves dragging of an
N-terminal ring of 10 residues through the ring of the cystine knot. The N-terminal
ring is closed by still another disulfide bond. The corresponding $F_{\max}$ could be of
order 1500 pN.
moment, it is not clear whether the folding behavior of 1CWP:A also characterises
other capsidic proteins.
Figure 6 represents what we refer to as a scenario diagram. It shows the time order
in which various native contacts are established for the first time, on average, as
determined at $T = 0.3\,\varepsilon/k_B$. Note that the folding time is defined through all native
contacts being established simultaneously so the scenario diagrams are focused on
the early stages of collapse to the globular form. The contacts are labeled by their
sequential distance | j − i|. This labeling system does not identify a contact uniquely,
as several contacts may be between pairs of sites separated by the same sequential
distance. However, it indicates the role of this distance in the folding process. There
is a fairly monotonic average dependence, meaning that residues which are close
by sequentially tend to be established earlier than those which are sequentially far
apart. This tendency has been encapsulated by introduction of the relative contact
order parameter, CO, [56, 57] which is argued to correlate well with the experimental
folding times. However, we observe many deviations from the average dependence in
our model and, in particular, the longest ranged contact (between sites 49 and 179) is
first established around 3500 τ whereas the last contact (between sites 56 and 172) is
first established around 4800 τ . In other words, closing the formation of the globular
structure need not involve regions which are most distant sequentially, even though
the initial stages are dominated by formation of the short range contacts. There are
many examples of such deviations in our simulations and some of them are discussed
fully in Ref. [34]. Even though our model is based on the native geometry, we do not
observe t f old to depend on the geometrically conceived parameter CO [22].
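For orientation, the relative contact order of Refs. [56, 57] is the mean sequential separation |j − i| of the native contacts divided by the chain length. The following minimal Python sketch computes it under that definition. The function name and the toy contact map are hypothetical and are not taken from the simulations discussed here.

```python
def relative_contact_order(contacts, n_residues):
    """Mean sequential separation |j - i| of the native contacts,
    divided by the number of residues in the chain."""
    if not contacts:
        raise ValueError("the contact map is empty")
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (len(contacts) * n_residues)


# Hypothetical toy contact map for a 10-residue chain (illustration only).
toy_contacts = [(1, 4), (2, 6), (3, 10)]
toy_co = relative_contact_order(toy_contacts, n_residues=10)
print(f"relative contact order of the toy map: {toy_co:.3f}")
```

A map dominated by short-range contacts gives a small CO, whereas many long-range contacts raise it.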
316 M. Cieplak
We now consider stretching of chains A and B (chain C behaves very much like
chain B). The F − d plots at the pulling speed of 0.005 Å/τ are shown in the top
panels of Figs. 7 and 8. The corresponding scenario diagram for unfolding of chain
A at T = 0.3 ε/kB is shown in Fig. 9. The diagram indicates the pulling distances (which
are proportional to the duration of pulling) at which a given contact breaks down
for good (initially, the distance between the residues involved may cross the
cutoff distance several times due to thermal fluctuations). The unfolding diagram is,
to some extent, the reverse of the folding diagram in the sense that long-ranged
contacts tend to be unraveled in the initial stages and short-ranged contacts in
the later stages of the process. However, there is an important difference: there is
a significantly more pronounced discretization as a function of time (or pulling
distance) as various groups of contacts get ruptured around common values of d.
These aggregations of rupture events correspond to the emergence of force peaks.
Some of the rupture events are significant dynamically and some are just necessary
byproducts of the significant events. Therefore, a given group of contacts that are
torn around a value of d involves contacts of various sequential ranges.
The significant rupture events define the corresponding mechanical clamps. One
can test the level of significance of a group of contacts by removing them from the
contact map and by checking the effect of this action on the height of the force
peak [21, 24]: a substantial decrease indicates a major contribution of these contacts
to mechanostability. The groups of such important contacts are indicated in Fig. 9.
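The notion of a contact breaking "for good" can be made concrete with a short sketch. Assuming that, for one contact, the inter-residue distance has been recorded at a sequence of pulling distances d, the code below returns the value of d after which the distance never drops back below the cutoff. The function name, the cutoff and the sample trajectory are hypothetical.

```python
def final_rupture_distance(pull_distances, contact_distances, cutoff):
    """Pulling distance d at which a contact breaks for good.

    A contact counts as broken when the inter-residue distance exceeds
    the cutoff. Temporary ruptures that heal later are ignored.
    Returns None if the contact is intact at the last sample.
    """
    if len(pull_distances) != len(contact_distances):
        raise ValueError("both sequences must have the same length")
    rupture = None
    for d, r in zip(pull_distances, contact_distances):
        if r > cutoff:
            if rupture is None:
                rupture = d      # start of a broken stretch
        else:
            rupture = None       # contact re-formed, forget earlier rupture
    return rupture


# Hypothetical trajectory: the contact opens at d = 10, heals at d = 20,
# and is torn permanently from d = 30 onward.
d_values = [0, 10, 20, 30, 40, 50]
r_values = [5.0, 8.0, 6.0, 9.0, 12.0, 15.0]
print(final_rupture_distance(d_values, r_values, cutoff=7.5))
```

Collecting these values for all native contacts and plotting them against |j − i| would give a diagram of the kind described above.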
Mechanostability of Virus Capsids and Their Proteins … 317
The first force peak is due to shearing between two antiparallel strands A (residues
50–60) and G (residues 166–178). The second force peak is due to shear between
antiparallel strands B (67–70) and F (154–160) as well as between C (88–99) and G.
The third force peak is due to shear between antiparallel strands C and E (136–139)
as well as between antiparallel strands D (105–111) and F. The final smaller peak is
due to shear between antiparallel strands D and E. The second force peak is the largest
and the corresponding Fmax is equal to 1.75 ± 0.1 ε/Å. A similar F − d pattern is
observed for chains B and C with Fmax = 1.6 ± 0.1 ε/Å. The values of Fmax are listed
in Table 1.
It is interesting to consider how sensitive the F − d patterns are to single
point mutations. A structure coded 1ZA7 corresponds to a K42R mutation (at site
42, lysine is replaced by arginine, both positively charged) on 1CWP. In chain A, the
mutation is implemented on the first residue (as counted from the N terminus) for
which the structure is available. The known structure for chain B starts at residue
27. The bottom panels in Figs. 7 and 8 show the F − d patterns for chains A and
B in 1ZA7, respectively. The patterns for the mutant chains look similar to those for
the wild type chains. However, the force peaks are taller. The values of Fmax are
2.0 ± 0.1 ε/Å and 1.9 ± 0.1 ε/Å for chains A and B, respectively, a shift of about
0.3 ε/Å compared to the wild type chains. The differences grow bigger on decreasing
the temperature. In particular, we show the F − d curves at T = 0, i.e. when all
thermal fluctuations are ignored. The curves are different not only in terms of the
peak heights but also in terms of the details in the patterns. We have observed a similar
sensitivity to mutations for the T4 lysozymes [58]. The wild type of the lysozyme
has the structure denoted by 102L and the mutant by 1B6I. In the mutant, threonine
and lysine in locations 21 and 124 are both replaced by cysteines. The experimental
studies on stretching of this mutant are described in Ref. [59]. The sensitivity of
the F − d patterns to mutations decreases with a growing T as thermal fluctuations
become increasingly important compared to the terms in the potentials.
Table 1 Characteristics of selected T1, T3, and T3p virus capsids that are discussed in this chapter.
The first column shows the acronym used, the second the PDB structure code, the third the
common name together with the symmetry type, and the fourth the number, N, of Cα atoms
describing the model capsid. R̄ denotes the average radius of the capsid. The next three columns
give results obtained through the molecular dynamics simulations at kB T = 0.3 ε. k is the spring
constant and Fc is the characteristic force associated with the capsid. The last column gives the
values of Fmax obtained for individual chains in the corresponding capsomer
Acronym  PDB   Name and symmetry                    N       R̄ [Å]   k [ε/Å²]  Fc [ε/Å]  Fmax,i [ε/Å]
MVM      1MVM  Parvovirus minute virus of mice, T1  32,940  110.54  0.217     8.7       2.2
FPV      1C8E  Feline panleukopenia virus, T1       32,040  109.69  0.280     13        2.7
SPMV     1STM  Satellite panicum mosaic virus, T1   8,460   69.55   0.174     11        –
CCMV     1CWP  Cowpea chlorotic mottle virus, T3    28,620  119.56  0.050     5.5       1.75, 1.6, 1.6
1ZA7     1ZA7  K42R mutant of CCMV, T3              28,860  118.41  0.050     6.7       2.0, 1.9, 1.9
NV       1IHM  Norwalk virus, T3                    89,700  159.62  0.190     12        1.9, 1.8, 1.6
CPMV     1NY7  Cowpea mosaic virus, T3p             33,480  124.29  0.500     15        –
HRV      1AYN  Human rhinovirus 16, T3p             48,240  131.60  0.443     32        1.5, 2.1, 1.6
We now consider the three-protein capsomeric complex shown in Fig. 1. The complex
is connected through interchain contacts. Even though the complex also forms
contacts with neighboring capsomers in the CCMV capsid, it is instructive to consider
stretching by various combinations of pairs of the six termini. The termini
will be denoted by N and C for the first chain, N′ and C′ for the second chain, and
N″ and C″ for the third chain. The F − d curves corresponding to the various ways
of pulling are shown in Fig. 10 and the values of Fmax are summarized in Fig. 11.
The modes of pulling can be divided into “diagonal” and “off-diagonal”. The
former refer to a situation in which pulling is implemented by attaching to the termini
of a single chain. The latter refer to pulling by termini that belong to different chains. The
diagonal F − d curves look qualitatively similar to those of the isolated chains.
However, the force peaks are higher due to additional stabilization provided by other
chains in the complex. For chains A and B the increase in Fmax is just by 0.1 ε/Å,
but for chain C – by 0.4 ε/Å so that Fmax is equal to 2 ε/Å. The off-diagonal stability
is weaker: the corresponding values of Fmax vary between 0.45 and 1 ε/Å.
For other complexes, the off-diagonal values of Fmax may be larger than the
diagonal ones. This happens, for instance, in some dimers containing the cystine
knots [54] and in the 3D domain-swapped amyloid-prone cystatin C [60]. We have
predicted [44] that this dimer should be able to withstand mechanical stress of about
7 ε/Å or 770 pN if stretched using termini N and N′ (the N termini of the two chains)
compared to 4.4 ε/Å when using termini N and C. These values are listed in Fig. 12.
This system would thus provide one of the strongest known shear-based mechanical
clamps. The reason for this behavior is that the two cystatin chains are intertwined
in a way in which two long β-strands of one chain are parallel to two long β-strands
of another chain. This arrangement generates many inter-chain contacts which require
a big force to be sheared if pulled by N and N′. For the N-C pulling, shearing involves
a smaller number of contacts between intrachain strands.
We have found [53] a behavior similar to that of the cystatin in a bacterial dimeric
protein with the PDB code of 2B1Y. When pulled along the C-C direction, Fmax is
close to 9 ε/Å, but along N-C, merely 1.5 ε/Å. This protein would then exhibit an
even stronger mechanostability than cystatin provided stretching is performed along
the C-C direction.
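The pair of values quoted for cystatin C (7 ε/Å or 770 pN) implies a conversion of roughly 110 pN per ε/Å. The sketch below applies this factor to the other forces mentioned in the last two paragraphs. The factor is an assumption inferred from that single pair and should be treated as approximate. The names used in the code are illustrative.

```python
# Assumed conversion factor, inferred from "7 eps/A or 770 pN" in the text.
# It is approximate and depends on how the energy scale eps is calibrated.
PN_PER_EPS_PER_ANGSTROM = 110.0


def to_piconewtons(force_eps_per_angstrom):
    """Convert a force from the model unit eps/Angstrom to piconewtons."""
    return force_eps_per_angstrom * PN_PER_EPS_PER_ANGSTROM


# Values of Fmax quoted in the text, in eps/Angstrom.
quoted_forces = {
    "cystatin C dimer, pulled by the two N termini": 7.0,
    "cystatin C dimer, pulled by termini N and C": 4.4,
    "2B1Y, pulled along C-C": 9.0,
    "2B1Y, pulled along N-C": 1.5,
}

for label, force in quoted_forces.items():
    print(f"{label}: {force} eps/A is roughly {to_piconewtons(force):.0f} pN")
```

Under this assumption, the C-C pulling of 2B1Y would correspond to a force close to 1000 pN.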
Virus capsids are proteinic shells that protect strands, often quite short, of nucleic
acids. The volumes of these shells can be estimated by a novel algorithm presented in
Ref. [61]. One class of capsids is quasispherical and has icosahedral symmetry. Their
structures have been explained in terms of the Caspar and Klug sphere triangulation
theory [62]. Symmetries of possible structures are enumerated by the triangulation
number Tk (the subscript in the symbol is meant to distinguish this number from
the symbol used for temperature). In simple cases, Tk coincides with the number,
n, of chains in a capsomer where n = 1, 2, 3, etc. If this happens then the number of
proteins in the whole capsid is equal to 60Tk. If Tk is 1, then the 60 proteins form
12 pentameric units. If Tk is larger than 1 then the 12 pentamers are embedded in a
matrix of 10(Tk − 1) hexamers. The shorthand notation for such capsids here is T1,
T2, T3, etc. Some capsids are called Tk-pseudo capsids when the number of chains
in a capsomer is larger than Tk but the additional chains act as physical extensions of
the nominal number of chains or if the chains are not identical sequentially. CPMV
(cowpea mosaic virus) is an example of a T3p capsid in which a protein is shared by
two capsomers.
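The counting rules stated above (60Tk proteins, 12 pentamers, and 10(Tk − 1) hexamers) can be checked with a few lines of Python. This is only a bookkeeping sketch for the simple case in which Tk equals the number of chains in a capsomer. The function name is hypothetical.

```python
def capsid_composition(t_number):
    """Protein, pentamer and hexamer counts for a simple icosahedral
    capsid with triangulation number t_number."""
    if t_number < 1:
        raise ValueError("the triangulation number must be at least 1")
    return {
        "proteins": 60 * t_number,
        "pentamers": 12,
        "hexamers": 10 * (t_number - 1),
    }


for t in (1, 3, 4):
    counts = capsid_composition(t)
    # Consistency check: 5 proteins per pentamer and 6 per hexamer
    # must add up to the total number of proteins.
    assert 5 * counts["pentamers"] + 6 * counts["hexamers"] == counts["proteins"]
    print(f"T{t}: {counts}")
```

For a T3 capsid such as CCMV this gives 180 proteins arranged in 12 pentamers and 20 hexamers.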
The mechanostability of capsids has been studied through nanoindentation [63].
The method has been applied to fewer than 10 capsids, including CCMV [64, 65]
and MVM (parvovirus minute virus of mice) [66, 67]. The latter is a T1 capsid. We
have applied the coarse grained model described here to 35 empty capsids [68, 69]
for which the full native structure is known and deposited in the VIPERdb database
[70]. The nanoindentation has been implemented by placing a capsid between two
flat repulsive planes and by reducing their separation, s, at a constant rate of 0.005
Å/τ which is equal to the pulling speed used in our theoretical stretching studies.
(Introducing curvature to the squeezing objects, such as the tip of the AFM, yields
similar results [69].) Fig. 13 shows two trajectories corresponding to T = 0.3 ε/k B
for CCMV and two trajectories for its mutant 1ZA7. Both structures have the same
initial elasticity as defined by the slope of the F(s) curve at the largest values of s.
However, their yield point forces, Fc , at which the F(s) curves dip down are distinct:
they differ by 1.7 ε/Å as summarised in Table 1. At the yield point, the quasispherical
structure collapses into a pancake-like object. The collapse is irreversible within short
time scales and retraction of the planes does not retrace the curve [68]. A schematic
representation of a squeezed conformation of CCMV just past the yield point is shown
in Fig. 14 where it is compared to a similar representation of the native state. The
squeezing process is seen to affect primarily the regions near the indenting planes,
as discussed further in Ref. [68]. The retracing on retraction does take place in the
initial elastic regime. The retracing is approximate due to the presence of thermal
fluctuations. The mutation is seen to affect only the later stages of nanoindentation,
but its effect should be observable experimentally.
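The two quantities read off an F(s) curve, the initial elasticity and the yield force Fc, can be extracted as sketched below. The sketch assumes a noise-free curve sampled at decreasing separations. A real trajectory contains thermal fluctuations and would need smoothing before the peak search. The function names and the synthetic data are hypothetical and do not reproduce any curve from Fig. 13.

```python
def initial_spring_constant(separations, forces, n_points):
    """Least-squares slope of F(s) over the n_points samples with the
    largest s, returned as a positive spring constant."""
    pairs = sorted(zip(separations, forces), reverse=True)[:n_points]
    if len(pairs) < 2:
        raise ValueError("at least two points are needed for a fit")
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_f = sum(f for _, f in pairs) / n
    num = sum((s - mean_s) * (f - mean_f) for s, f in pairs)
    den = sum((s - mean_s) ** 2 for s, _ in pairs)
    return -num / den  # F grows as s shrinks, so the raw slope is negative


def yield_force(separations, forces):
    """Force at the first local maximum met as s decreases, i.e. the
    point at which the curve dips down. Returns None if there is none."""
    pairs = sorted(zip(separations, forces), reverse=True)
    for k in range(1, len(pairs) - 1):
        if pairs[k][1] > pairs[k - 1][1] and pairs[k][1] > pairs[k + 1][1]:
            return pairs[k][1]
    return None


# Synthetic illustration only: linear rise followed by a dip.
s_values = [240, 230, 220, 210, 200, 190, 180]
f_values = [0.0, 0.5, 1.0, 1.5, 2.0, 1.2, 1.4]
print(initial_spring_constant(s_values, f_values, n_points=4))
print(yield_force(s_values, f_values))
```

Comparing the two outputs for a wild type and a mutant capsid would reproduce the kind of comparison made above: equal initial slopes but different yield forces.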
The behavior of the F(s) curve is consistent with the experimental value of Fc,
but the effective spring constant is smaller by a factor of 3 [68] because of an
“emptier” representation of the structure: a residue is represented just by its Cα
atom. It is also consistent with the continuum shell-like model [71, 72]. However,
the strain field in the molecular model is different [68]. In particular, the molecular
model predicts no bulging out of the capsid at the “equator”, i.e. half-way between
the squeezing planes.
Fig. 14 A coarse grained representation of the CCMV capsid in the native state (left panel) and
when the separation between the squeezing planes is equal to 164 Å (right panel). The planes are
not shown but they are placed one above and another below the capsid. The figure shows two
panels taken from Fig. 11 in Ref. [68] which also shows four additional stages in the indentation process
We now consider how proteins combine into virus capsids. This problem, so far, has
been studied by using models involving some rigid objects, typically full capsomers,
with some creatively invented directional couplings that could bind them [73–80].
None of these models considers the capsid as being made explicitly of proteins –
proteins that keep changing their shape and are endowed with intra- and inter-protein
interactions. Currently, only the all-atom models take the protein perspective into
account, but they have never been used in the context of aggregation. The structure-
based model of proteins we have described here is probably the simplest system that
allows for studies of the capsid disassembly and reassembly at the molecular level
and by the methods of molecular dynamics instead of Monte Carlo usually associated
with the rigid objects.
We have initiated this program of research for single capsids of SPMV and CCMV
[81]. We have considered two cases: the empty capsids and with the molecules of
RNA inside. In our approach, a capsid is dissociated by an application of a high
temperature for a variable period and then encouraged to reassemble by restoring
the room temperature. The reassembly of the capsid proceeds to a varying extent,
depending on the nature of the dissociated state, but is rarely complete because there
is misfolding and, in addition, some proteins depart too far unless the process takes
place in a confined space.
Figure 16 illustrates the reassembly process in an open space for two starting
denatured states of the empty CCMV. A fuller discussion of the process, for various
starting conformations, can be found in Ref. [81]. Further studies should allow for
a number of capsids (not just one). In addition, the space should be constrained so
that one is able to observe more completely assembled structures.
In this chapter, we have explained the workings of the coarse-grained model of
proteins based on the knowledge of their native structures. The model may provide
a first description of a system of interest that allows for identification of its most
important features. The model may then serve as a scaffolding for more elaborate
approaches. We have focused on proteins that are parts of virus capsids and showed
that mutations in these proteins would yield different patterns of the stretching curves.
The values of Fmax of the capsidic proteins are seen not to be correlated with the
strength of resistance to nanoindentation of the capsids.
The structure-based model can be empirically generalized to consider the behavior
of proteins under the conditions of the solvent flow [40] or at the air-water and oil-
water interfaces [82, 83]. The former requires adding a flow-related term to the
drag force. Inclusion of the hydrodynamic interactions requires adding the diffusion
tensor to the equations of motion as done in Ref. [55] that shows that the interactions
accelerate folding. Studying proteins at the interfaces involves adding interface-
related forces that couple to the hydropathy indices of residues. These forces deform
the proteins and pin them to the interface. One application of this approach is to
explain stabilization of the foam in beer [83]: the barley protein LTP1 and its isoform
LTP1b, that contains a ligand, provide a coating of the bubbles.
Fig. 16 Examples of the empty CCMV capsid assembly after thermal denaturation at temperature
0.94 ε/kB. The top-left structure resulted from denaturation lasting for 2000 τ. 69% of the
inter-protein contacts are disrupted in this structure. The bottom-left structure was obtained through
denaturation lasting for 4000 τ which disrupted 89% of the inter-protein contacts. The corresponding
structures on the right are obtained by a subsequent evolution of 8000 τ at the room temperature.
In the state shown in the upper-right panel, 3% of the inter-protein contacts are disrupted; in the
lower-right panel, 29%
References
1. Neuman, K.C., Nagy, A.: Single-molecule force spectroscopy: optical tweezers, magnetic
tweezers and atomic force microscopy. Nat. Methods 5, 491–505 (2008)
2. Weiss, S.: Fluorescence spectroscopy of single biomolecules. Science 283, 1676–1683 (1999)
3. Schuler, B., Lipman, E.A., Eaton, W.A.: Probing the free-energy surface for protein folding
with single-molecule fluorescence spectroscopy. Nature 419, 743–747 (2002)
4. Yang, H., Luo, G.B., Karnchanaphanurach, P., Louie, T.M., Rech, I., Cova, S., Xun, L.Y., Xie,
X.S.: Protein conformational dynamics probed by single-molecule electron transfer. Science
302, 262–266 (2003)
5. Borgia, M.B., Borgia, A., Best, R.B., Steward, A., Nettels, D., Wunderlich, B., Schuler, B.,
Clarke, J.: Single-molecule fluorescence reveals sequence-specific misfolding in multidomain
proteins. Nature 474, 662–665 (2011)
6. Carrion-Vazquez, M., Oberhauser, A.F., Fowler, S.B., Marszalek, P.E., Broedel, P.E.:
Mechanical and chemical unfolding of a single protein: a comparison. Proc. Natl. Acad. Sci. USA 96,
3694–3699 (1999)
7. Fernandez, J.M., Li, H.B.: Force-clamp spectroscopy monitors the folding trajectory of a single
protein. Science 303, 1674–1678 (2004)
8. Cecconi, C., Shank, E.A., Bustamante, C., Marqusee, S.: Direct observation of the three-state
folding of a single protein molecule. Science 309, 2057–2060 (2005)
9. Carrion-Vazquez, M., Cieplak, M., Oberhauser, A.F.: Protein mechanics at the single-molecule
level. In: Meyers R.A. (ed.) Encyclopedia of Complexity and Systems Science, pp. 7026–7050.
Springer, New York (2009)
10. Crampton, N., Brockwell, D.J.: Unravelling the design principles for single protein mechanical
strength. Curr. Opin. Struct. Biol. 20, 508–517 (2010)
11. Del Rio, A., Perez-Jimenez, R., Liu, R.C., Roca-Cusachs, P., Fernandez, J.M., Sheetz, M.P.:
Stretching single talin rod molecules activates vinculin binding. Science 323, 638–641 (2009)
12. Vogel, V.: Mechanotransduction involving multimodular proteins: converting force into bio-
chemical signals. Annu. Rev. Biophys. Biomol. Struct. 35, 459–488 (2006)
13. Hervas, R., Oroz, J., Galera-Prat, A., Goni, O., Valbuena, A., Vera, A.M., Gomez-Sicilia, A.,
Losada-Urzaiz, F., Uversky, V.N., Menendez, M., Laurents, D.V., Bruix, M., Carrion-Vazquez,
M.: Common features at the start of the neurodegeneration cascade. PLoS Biol. 10, e1001335
(2012)
14. Rief, M., Gautel, M., Oesterhelt, F., Fernandez, J.M., Gaub, H.E.: Reversible unfolding of
individual titin immunoglobulin domains by AFM. Science 276, 1109–1112 (1997)
15. Improta, S., Politou, A.S., Pastore, A.: Immunoglobulin-like modules from titin I-band:
extensible components of muscle elasticity. Structure 4, 323–337 (1996)
16. Marszalek, P.E., Lu, H., Li, H.B., Carrion-Vazquez, M., Oberhauser, A.F., Schulten, K., Fernan-
dez, J.M.: Mechanical unfolding intermediates in titin modules. Nature 402, 100–103 (1999)
17. Lu, H., Schulten, K.: Steered molecular dynamics simulation of conformational changes of
immunoglobulin domain I27 interprete atomic force microscopy observations. Chem. Phys.
247, 141–153 (1999)
18. Paci, E., Karplus, M.: Unfolding proteins by external forces and temperature: the importance
of topology and energetics. Proc. Natl. Acad. Sci. USA 97, 6521–6526 (2000)
19. Bockelmann, U., Essevaz-Roulet, B., Heslot, F.: Molecular stick-slip motion revealed by open-
ing DNA with piconewton forces. Phys. Rev. Lett. 79, 4489–4492 (1997)
20. Hoang, T.X., Cieplak, M.: Molecular dynamics of folding of secondary structures in Go-like
models of proteins. J. Chem. Phys. 112, 6851–6862 (2000)
21. Cieplak, M., Hoang, T.X., Robbins, M.O.: Folding and stretching in a Go-like model of titin.
Proteins: Struct. Funct. Genet. 49, 114–124 (2002)
22. Cieplak, M., Hoang, T.X.: Universality classes in folding times of proteins. Biophys. J. 84,
475–488 (2003)
23. Cieplak, M., Hoang, T.X., Robbins, M.O.: Thermal effects in stretching of Go-like models of
titin and secondary structures. Proteins: Struct. Funct. Bioinf. 56, 285–297 (2004)
24. Sułkowska, J.I., Cieplak, M.: Mechanical stretching of proteins—a theoretical survey of the
Protein Data Bank. J. Phys.: Cond. Mat. 19, 283201 (2007)
25. Yang, L.J., Tan, C.H., Hsieh, M.J., Wang, J.M., Duan, Y., Cieplak, P., Caldwell, J., Kollman,
P.A., Luo, R.: New-generation amber united-atom force field. J. Phys. Chem. B 110, 13166–
13176 (2006)
26. Go, N.: Theoretical studies of protein folding. Annu. Rev. Biophys. Bioeng. 12, 183–210 (1983)
27. Abe, H., Go, N.: Noninteracting local-structure model of folding and unfolding transition in
globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20, 1013–
1031 (1981)
28. Sali, A., Shakhnovich, E., Karplus, M.: How does a protein fold. Nature 369, 248–251 (1994)
29. Shrivastava, I., Vishveshwara, S., Cieplak, M., Maritan, A., Banavar, J.R.: Lattice model for
rapidly folding protein-like heteropolymers. Proc. Natl. Acad. Sci. USA 92, 9206–9209 (1995)
30. Sułkowska, J.I., Cieplak, M.: Selection of optimal variants of Go-like models of proteins
through studies of stretching. Biophys. J. 95, 3174–3191 (2008)
31. Cieplak, M., Sułkowska, J.I.: Structure-based models of biomolecules: stretching of proteins,
dynamics of knots, hydrodynamic effects, and indentation of virus capsids. In: Koliński, A.
(ed.) Chapter 8 in Multiscale Approaches to Protein Modeling: Structure Prediction, Dynamics,
Thermodynamics and Macromolecular Assemblies, pp. 179–208. Springer, New York (2010)
32. Clementi, C., Nymeyer, H., Onuchic, J.N.: Topological and energetic factors: what determines
the structural details of the transition state ensemble and "en-route" intermediates for protein
folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937–953 (2000)
33. Karanicolas, J., Brooks III, C.L.: The origins of asymmetry in the folding transition states of
protein L and protein G. Protein Sci. 11, 2351–2361 (2002)
34. Cieplak, M.: Cooperativity and contact order in protein folding. Phys. Rev. E 69, 031907 (2004)
35. Wallin, S., Zeldovich, K.B., Shakhnovich, E.I.: Folding mechanics of a knotted protein. J. Mol.
Biol. 368, 884–893 (2007)
36. Tsai, J., Taylor, R., Chothia, C., Gerstein, M.: The packing density in proteins: Standard radii
and volumes. J. Mol. Biol. 290, 253–266 (1999)
37. Settanni, G., Hoang, T.X., Micheletti, C., Maritan, A.: Folding pathways of prion and doppel.
Biophys. J. 83, 3533–3541 (2002)
38. Wołek, K., Gómez-Sicilia, Á., Cieplak, M.: Determination of contact maps in proteins: a
combination of structural and chemical approaches. J. Chem. Phys. 143, 243105 (2015)
39. Veitshans, T., Klimov, D., Thirumalai, D.: Protein folding kinetics: timescales, pathways and
energy landscapes in terms of sequence dependent properties. Fold. Des. 2, 1–22 (1997)
40. Szymczak, P., Cieplak, M.: Stretching of proteins in a uniform flow. J. Chem. Phys. 125, 164903
(2006)
41. Valbuena, A., Oroz, J., Hervas, R., Vera, A.M., Rodriguez, D., Menendez, M., Sułkowska, J.I.,
Cieplak, M., Carrion-Vazquez, M.: On the remarkable mechanostability of scaffoldins and the
mechanical clamp motif. Proc. Natl. Acad. Sci. USA 106, 13791–13796 (2009)
42. Sikora, M., Sułkowska, J.I., Cieplak, M.: Mechanical strength of 17 132 model proteins and
cysteine slipknots. PLoS Comp. Biol. 5, e1000547 (2008)
43. Wołek, K., Cieplak, M.: Criteria for folding in structure-based models of proteins. J. Chem.
Phys. 144, 185102 (2016)
44. Sikora, M., Cieplak, M.: Mechanical stability of multidomain proteins and novel mechanical
clamps. Proteins: Struct. Funct. Bioinf. 79, 1786–1799 (2011)
45. Sikora, M., Sułkowska, J.I., Witkowski, B.S., Cieplak, M.: BSDB: the biomolecule stretching
database. Nucl. Acid. Res. 39, D443–D450 (2011)
46. Chen, J., Callis, P.R., King, J.: Mechanism of the very efficient quenching of tryptophan
fluorescence in human γD- and γS-crystallins: the γ-crystallin fold may have evolved to protect
tryptophan residues from ultraviolet photodamage. Biochemistry 48, 3708–3716 (2009)
47. Flaugh, S.L., Kosinski-Collins, M.S., King, J.: Interdomain side-chain interactions in human
γD-crystallin influencing folding and stability. Prot. Sci. 14, 2030–2043 (2005)
48. McDonald, N.Q., Lapatto, R., Murray-Rust, J., Gunning, J., Wlodawer, A., Blundell, T.L.: New
protein fold revealed by a 2.3-Å resolution crystal structure of nerve growth factor. Nature 354,
411–414 (1991)
49. Murray-Rust, J., McDonald, N.Q., Blundell, T.L., Hosang, M., Oefner, C., Winkler, F., Brad-
shaw, R.A.: Topological similarities in TGF-beta 2, PDGF-BB and NGF define a superfamily
of polypeptide growth factors. Structure 1, 153–159 (1993)
50. Sun, P.D., Davies, D.R.: The cystine-knot growth-factor superfamily. Annu. Rev. Biophys.
Biomol. Struct. 24, 269–291 (1995)
51. Iyer, S., Acharya, K.R.: The cystine signature and molecular-recognition processes of the
vascular endothelial growth factor family of angiogenic cytokines. FEBS J. 278, 4304–4322
(2011)
52. Peplowski, L., Sikora, M., Nowak, W., Cieplak, M.: Molecular jamming—the cysteine slipknot
mechanical clamp in all-atom simulations. J. Chem. Phys. 134, 085102 (2011)
53. Sikora, M., Cieplak, M.: Cystine plug and other novel mechanisms of large mechanical stability
in dimeric proteins. Phys. Rev. Lett. 109, 208101 (2012)
54. Sikora, M., Cieplak, M.: Formation of cystine slipknots in dimeric proteins. PLoS ONE 8,
e57443 (2013)
55. Niewieczerzał, S., Cieplak, M.: Hydrodynamic interactions in protein folding. J. Chem. Phys.
21, 124905 (2009)
56. Plaxco, K.W., Simons, K.T., Baker, D.: Contact order, transition state placement and the refold-
ing rates of single domain proteins. J. Mol. Biol. 277, 985–994 (1998)
57. Plaxco, K.W., Simons, K.T., Ruczinski, I., Baker, D.: Topology, stability, sequence, and length:
defining the determinants of two-state protein folding kinetics. Biochemistry 39, 11177–11183
(2000)
58. Cieplak, M., Hoang, T.X., Robbins, M.O.: Stretching of proteins in the entropic limit. Phys.
Rev. E 69, 011912 (2004)
59. Yang, G., Cecconi, C., Baase, W.A., Vetter, I.R., Breyer, W.A., Haack, J.A., Matthews, B.W.,
Dahlquist, F.W., Bustamante, C.: Solid-state synthesis and mechanical unfolding of polymers
of T4 lysozyme. Proc. Natl. Acad. Sci. USA 97, 139–144 (2000)
60. Janowski, R., Kozak, M., Jankowska, E., Grzonka, Z., Grubb, A., Abrahamson, M., Jaskólski,
M.: Human cystatin C, an amyloidogenic protein dimerizes through three-dimensional domain
swapping. Nature Struct. Biol. 8, 316–320 (2001)
61. Chwastyk, M., Jaskólski, M., Cieplak, M.: The volume of cavities in proteins and virus capsids.
Proteins 84, 1275–1286 (2016)
62. Caspar, D., Klug, A.: Physical principles in the construction of regular viruses. Cold Spring
Harbor Symp. Quant. Biol. 27, 1–24 (1962)
63. Roos, W.H., Bruinsma, R., Wuite, G.J.L.: Physical virology. Nat. Phys. 6, 733–743 (2010)
64. Michel, J.P., Ivanovska, I.L., Gibbons, M.M., Klug, W.S., Knobler, C.M., Wuite, G.J.L.,
Schmidt, C.F.: Nanoindentation studies of full and empty viral capsids and the effects of cap-
sid protein mutations on elasticity and strength. Proc. Natl. Acad. Sci. USA 103, 6184–6189
(2006)
65. Klug, W.S., Bruinsma, R.F., Michel, J.-P., Knobler, C.M., Ivanovska, I.L., Schmidt, C.F., Wuite,
G.J.L.: Failure of viral shells. Phys. Rev. Lett. 97, 228101 (2006)
66. Carrasco, C., Carreira, A., Schaap, I.A.T., Serena, P.A., Gomez-Herrero, J., Mateu, M.G., de
Pablo, P.J.: DNA-mediated anisotropic mechanical reinforcement of a virus. Proc. Natl. Acad.
Sci. USA 103, 13706–13711 (2006)
67. Carrasco, C., Castellanos, M., de Pablo, P.J., Mateu, M.G.: Manipulation of the mechanical
properties of a virus by protein engineering. Proc. Natl. Acad. Sci. USA 105, 4150–4155 (2008)
68. Cieplak, M., Robbins, M.O.: Nanoindentation of virus capsids in a molecular model. J. Chem.
Phys. 132, 015101 (2010)
69. Cieplak, M., Robbins, M.O.: Nanoindentation of 35 virus capsids in a molecular model: relating
mechanical properties to structure. PLoS ONE 8, e63640 (2013)
70. Carrillo-Tripp, M., Shepherd, C.M., Borelli, I.A., Venkataraman, S., Lander, G., Natarajan, P.,
Johnson, J.E., Brooks III, C.L., Reddy, V.S.: VIPERdb2: an enhanced and web API enabled
relational database for structural virology. Nucl. Acids Res. 37, D436–D442 (2009).
http://viperdb.scripps.edu/
71. Gibbons, M.M., Klug, W.S.: Nonlinear finite-element analysis of nanoindentation of viral
capsids. Phys. Rev. E 75, 031901 (2007)
72. Gibbons, M.M., Klug, W.S.: Influence of nonuniform geometry on nanoindentation of viral
capsids. Biophys. J. 95, 3640–3649 (2008)
73. Endres, D., Zlotnick, A.: Model-based analysis of assembly kinetics for virus capsids or other
spherical polymers. Biophys. J. 83, 1217–1230 (2002)
74. Wales, D.J.: The energy landscape as a unifying theme in molecular science. Phil. Trans. R.
Soc. 363, 357–377 (2005)
75. Johnston, I.G., Louis, A.A., Doye, J.P.K.: Modelling the self-assembly of virus capsids. J.
Phys.: Cond. Matter 22, 104101 (2010)
76. Elrad, O.M., Hagan, M.F.: Mechanisms of size control and polymorphism in viral capsid
assembly. Nano Lett. 8, 3850–3857 (2008)
77. Elrad, O.M., Hagan, M.F.: Encapsulation of a polymer by an icosahedral virus. Phys. Biol. 7,
045003 (2010)
78. Rapaport, D.C.: Role of reversibility in viral capsid growth: a paradigm for self-assembly. Phys.
Rev. Lett. 101, 186101 (2008)
79. Zlotnick, A., Porterfield, J.Z., Wang, J.C.-Y.: To build a virus on a nucleic acid substrate.
Biophys. J. 104, 1595–1604 (2013)
80. Garmann, R.F., Comas-Garcia, M., Gopal, A., Knobler, C.M., Gelbart, W.M.: The assembly
pathway of an icosahedral single-stranded RNA virus depends on the strength of inter-subunit
attractions. J. Mol. Biol. 426, 1050–1060 (2014)
81. Wołek, K., Cieplak, M.: Self-assembly of model proteins into virus capsids. J. Phys. Cond.
Matter 47, 474003 (2017)
82. Cieplak, M., Allen, D.B., Leheny, R.L., Reich, D.H.: Proteins at air-water interfaces: a
coarse-grained approach. Langmuir 30, 12888–12896 (2014)
83. Zhao, Y., Cieplak, M.: Structural changes in barley protein LTP1 isoforms at air-water inter-
faces. Langmuir 33, 4769–4780 (2017)
Computer Modelling of the Lipid Matrix
of Biomembranes
Abstract The best recognised functions of biomembranes are to separate and pro-
tect the cell or the organelle from the environment and to enable communication
and transport between their interior and exterior. The main structural element of any
biomembrane is its lipid matrix, which, in most cases, is a lipid bilayer. Lipid matrix
is a supramolecular dynamic structure where molecules undergo a broad range of
motions. Such structures are difficult to study experimentally; in contrast, classi-
cal molecular modelling methods are well suited for this purpose. In this chapter
we present computational approaches based on classical molecular modelling with
atomic resolution to study lipid bilayers and their limitations, the studied bilayer
models and the results obtained using these methods. The necessity of model vali-
dation is stressed.
Fig. 1 Schematic picture of animal cell membranes. The plasma membrane and internal membranes
are indicated
biomembrane. The matrix determines the bulk membrane properties and provides a
proper dynamic and active milieu for membrane proteins such that they can perform
their biological functions, among which are the inter-compartmental communication
and controlled transport of various types of molecules. The matrix also constitutes a
protective barrier that prevents the uncontrolled flow of larger polar molecules and
ions from or to the cytoplasm, although small molecules such as oxygen and
carbon dioxide and, to a lesser extent, water readily diffuse through membranes. The
integrity of the lipid matrix is assured by weak intermolecular interactions, mainly
hydrogen bonding, dispersion and electrostatic interactions.
In most cases, the lipid matrix is a phospholipid bilayer whose molecular com-
position varies among cell types within the same organism and depends on the cell
function [216]. The composition may change with time and environmental factors,
but it is strictly controlled [135]. Usually, changes in the lipid composition would
result in alteration of the physical properties of the membrane, which would then
affect the function of proteins immersed in the lipid bilayer [42]. A general feature
of the matrix is heterogeneity not only with respect to the lipid composition but also
with respect to the lateral distribution of the lipids. Cholesterol, which is a natural
component of the animal cell plasma membrane, enhances inhomogeneous lateral
distribution of membrane lipids by stimulating the formation of transient membrane
domains enriched in cholesterol. Moreover, cholesterol locally modulates physical
properties of the bilayer. Both are crucial for the biological activity of membrane
proteins and peptides, which depends on the lipid composition and physical state of
their local surroundings (domain) in the membrane.
A lipid bilayer is a supramolecular soft liquid-crystalline material of certain struc-
tural features and physical properties that are key to the biological functions of
biomembranes. Bilayer properties follow directly from the structural characteristics
of lipids, the main bilayer building-blocks, and of water. Lipid molecules are amphi-
pathic and in water spontaneously form bilayers or other ordered aggregates. This
chapter is devoted to the computer modelling of lipid bilayers predominantly com-
posed of phospholipids, mainly phosphatidylcholine (PC) (Fig. 2), and of cholesterol
(Chol) (Figs. 2 and 3), which model the lipid matrices of animal cell membranes using
molecular modelling methodology with atomic resolution. Excellent reviews of the
earlier and the later stages of the computer modelling of biologically relevant lipid
systems can be found in Refs. [10, 47, 187, 206] and Refs. [104, 113, 217], respectively.
Fig. 2 Chemical structures of the main fragments of commonly occurring phospholipids and
cholesterol. On the left-hand side are phosphatidylcholine (PC), phosphatidylethanolamine (PE),
phosphatidylserine (PS), phosphatidylglycerol (PG) heads; in the middle are glycerol (GLY) and
sphingosine (SPH) skeleton; on the right-hand side are myristoyl (M), palmitoyl (P), stearoyl (S),
oleoyl (O) acyl chains. The atoms in the PC head, glycerol skeleton, and myristoyl chain have been
numbered in accordance with Sundaralingam [200]. At the bottom there are monogalactosyldiacyl-
glycerol (MGDG) head and cholesterol (Chol) with atoms numbered in accordance with the IUPAC
convention. The chemical symbols for carbon atoms, C, and hydrogen atoms in the CH3, CH2 and
CH groups have been omitted
Fig. 3 A space-filling representation of the cholesterol molecule. The smooth α-face (Alpha) and
rough β-face (Beta) of cholesterol are apparent
Fig. 4 Examples of various conformations of a PC molecule. The molecules were arbitrarily chosen
from a liquid-crystalline POPC bilayer simulated for 70 ns [111]. PC molecules are in the united
atom representation and atoms are represented in standard colours
However, due to the existence of distinct horizontal regions within the bilayer [10]
of contrasting properties (water phase, interfacial region, hydrophobic core) (Fig. 5),
conformational disorder of phospholipid acyl chains (Fig. 4) and motional freedom of
lipid molecules, even model membranes create experimental difficulties. In effect,
experimental methods provide detailed information on global bilayer parameters
such as the membrane width and average surface area per lipid e.g. [95, 132, 133,
225], the thickness of the hydration shell e.g. [132, 133, 165], the phase state e.g.
[96, 103] etc. However, they only provide averaged conformational and motional
characteristics of bilayer lipids, where the averaging is strictly related to the time
window of the experimental method used e.g., [118].
As has already been stressed, the main characteristic of a lipid bilayer is the
dynamics of the constituting lipid molecules. A single molecule contributes to the
global properties of the bilayer but its actual conformational state does not have
much significance as it changes over a short time scale. Nevertheless, to understand
the supramolecular, extended, integral, and flexible structure of a lipid bilayer, the
details of the dynamical behaviour of individual lipid molecules in the bilayer must
be well recognised.
336 M. Pasenkiewicz-Gierula and M. Markiewicz
Fig. 5 Snapshot of a liquid-crystalline POPC bilayer at the end of 70-ns MD simulation [111].
The POPC molecules are in the united atom representation and atoms are represented in standard
colours. Water molecules are blue
Detailed information about the dynamic structure of the model membrane and
of each lipid molecule as well as the motional events that occur over time scales
up to microseconds can be obtained using classical molecular modelling methods.
In principle, this methodology has atomic spatial resolution and a time resolution
on the femtosecond time scale; thus, it is particularly well suited for studying such
disordered and dynamic structures as lipid bilayers. Nevertheless, models gener-
ated with molecular modelling methodology have to be validated against a range of
experimentally obtained properties e.g. [7, 160, 161].
Amphipathic phospholipid molecules can form a lamellar structure (as is the lipid
bilayer) only in the presence of water and this is a spontaneous self-assembling pro-
cess. In addition to the phospholipid shape (the ratio of the cross-section of the head
group to that of the acyl chains), water, ions, and temperature determine the lyotropic
phase state (e.g. lamellar, hexagonal, micellar, cubic) of the assembly. Above the main
thermotropic phase transition temperature, when the phospholipid acyl chains are in
a melted state (disordered), PC bilayers are in the lamellar phase when the system
composition is ~40 wt% water, e.g. [132, 133, 165, 223]. Other phospholipids such as
phosphatidylethanolamine (PE), phosphatidylserine (PS), sphingomyelin (SM), and
phosphatidylglycerol (PG) (Fig. 2) require different amounts of water, depending
on the charge and volume of their polar head groups, their capacity as H-bond
donors, and also on the length and degree of unsaturation (the number of double
C=C bonds) of their hydrocarbon chains [24, 119, 165, 227]. In multi-lamellar lipo-
somes, the equilibrium number of water molecules that hydrate a saturated PC bilayer
is ~30/PC [50, 132, 207, 211] of which, on average, ~5 water molecules are strongly
bound by a PC [50, 132].
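The two hydration figures quoted above (~40 wt% water and ~30 water molecules per PC) can be related by simple arithmetic. The sketch below is only an illustration: it assumes DPPC (molar mass ~734 g/mol) as a representative saturated PC, and the function names are hypothetical.

```python
# Molar masses in g/mol. DPPC (C40H80NO8P) is taken as an example of a
# saturated PC; this choice is an assumption made for illustration only.
M_WATER = 18.015
M_DPPC = 734.04


def water_weight_percent(waters_per_lipid, lipid_molar_mass=M_DPPC):
    """Weight percent of water for a given number of waters per lipid."""
    water_mass = waters_per_lipid * M_WATER
    return 100.0 * water_mass / (water_mass + lipid_molar_mass)


def waters_per_lipid_at(weight_percent, lipid_molar_mass=M_DPPC):
    """Number of waters per lipid that corresponds to a given water wt%."""
    fraction = weight_percent / 100.0
    return fraction * lipid_molar_mass / ((1.0 - fraction) * M_WATER)


if __name__ == "__main__":
    print(f"{water_weight_percent(30):.1f} wt% water at 30 waters per PC")
    print(f"{waters_per_lipid_at(40):.1f} waters per PC at 40 wt% water")
```

Under these assumptions, 30 water molecules per DPPC correspond to slightly more than 40 wt% water, so the two quoted values are mutually consistent.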
One of the first monographs that provided practical information and some theoreti-
cal background on building realistic and reliable computer models of a lipid bilayer,
related problems and limitations is Ref. [151]. In those early days, the starting con-
figuration of the bilayer was created from spatially ordered phospholipid molecules
with acyl chains in the extended all-trans conformation, and, thus, the initial structure
corresponded to the crystal state, e.g., Refs. [30, 146]. However, the biologically
active lipid matrix is in the liquid-crystalline phase, where phospholipid acyl chains
are in a melted state. This means that, on average, a certain percentage of torsion
angles in a chain (~25%) are in the gauche conformation and the probability of the
gauche conformation changes little along the chain, except for the last torsion angle,
where the probability of gauche is higher [92, 97]. A molecular dynamics (MD) simulation
of a lipid bilayer that was initially in the crystal state required a long equilibration
time, and part of the equilibration process was often carried out at an elevated
temperature to speed up the breaking of the crystal order, e.g., Refs. [142, 146]. It thus
seems more rational to start the simulation from a random initial configuration of lipid
molecules in the bilayer with disordered acyl chains (gauche conformations of torsion
angles distributed randomly along the chains in accordance with the equilibrium
population), as has been done e.g. in Refs. [11, 31].
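The idea of starting from disordered chains can be sketched as follows. This is a minimal illustration and not the procedure of Refs. [11, 31]: it assigns each torsion independently, uses the ~25% gauche probability quoted above, splits gauche equally between gauche+ and gauche−, and uses a hypothetical value of 0.35 for the last torsion, for which the text states only that the probability is higher.

```python
import random

# Idealised torsion angles in degrees (assumed values for illustration).
TRANS, GAUCHE_PLUS, GAUCHE_MINUS = 180.0, 60.0, -60.0


def random_chain_torsions(n_torsions, p_gauche=0.25, p_gauche_last=0.35, rng=None):
    """Draw a random trans/gauche sequence for one acyl chain.

    p_gauche follows the ~25% figure in the text; p_gauche_last is a
    hypothetical value standing in for the higher probability at the
    last torsion. Torsions are drawn independently of each other.
    """
    if rng is None:
        rng = random.Random()
    torsions = []
    for k in range(n_torsions):
        p = p_gauche_last if k == n_torsions - 1 else p_gauche
        if rng.random() < p:
            torsions.append(rng.choice((GAUCHE_PLUS, GAUCHE_MINUS)))
        else:
            torsions.append(TRANS)
    return torsions


if __name__ == "__main__":
    rng = random.Random(1)
    print(random_chain_torsions(13, rng=rng))
```

A real bilayer builder would additionally have to reject conformations with steric clashes and place the lipids in the two leaflets; neither step is attempted here.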
In classical molecular modelling, atoms move in a conservative potential on the
potential energy surface that is calculated in the framework of the force field description
[9]. A force field comprises a functional form and its parameters, which should be
considered as a single entity [89]. The most widely used functional form has three terms
describing bonded interactions (bond stretching, angle bending, and bond rotations) and two
terms describing non-bonded interactions (van der Waals and Coulomb), and sometimes
also improper torsion and 1–4 interaction terms [89]. The potential energy
of the molecular system is an analytical function of the positions of the atoms in
the system [94]. The force field parameters are necessary to compute the value of
the total energy of the molecular system and forces acting on each atom. The force
field can contain parameters for all atoms in the system (all-atom force field) or
parameters in which some groups of atoms, typically methyl and methylene groups,
are treated as single interaction units (united-atom force field). The most commonly used
force fields in the molecular modelling of lipid bilayers are OPLS (optimized poten-
tials for liquid simulations) [76, 77], CHARMM (chemistry at Harvard molecular
mechanics) [106], AMBER (assisted model building with energy refinement) [11,
29] and GROMOS (Groningen molecular simulation) [214]. A search of the PubMed
Central database indicates that of 196 papers on molecular modelling of the POPC
bilayer published after the year 2010, 79 used CHARMM, 40 Berger, 35 GROMOS,
20 OPLS, 10 AMBER/Lipid and 5 Slipids [68, 69] force fields. All these force fields
have similar functional forms (Eq. 1), but their parameters were adjusted to reproduce
different physico-chemical quantities of the molecular system and thus should
not be interchanged. These force fields also use different ways of assigning atom
types to atoms in the system. One should always keep in mind that due to the way in
which the parameters were derived, the force field can be used to predict only certain
properties of a molecular system.
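\[
E(R) = \sum_{b} K_b (b - b_0)^2 + \sum_{\theta} K_\theta (\theta - \theta_0)^2
+ \sum_{\phi} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \phi_0) \right]
+ \sum_{i<j} \varepsilon \left[ \left( \frac{r^*}{r_{ij}} \right)^{12} - 2 \left( \frac{r^*}{r_{ij}} \right)^{6} \right]
+ \sum_{i,j} \frac{q_i q_j}{r_{ij}}, \qquad (1)
\]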
The first three summations in Eq. (1) (bonded interactions) are over bonds (1–2
interactions), angles (1–3 interactions), and torsions (1–4 interactions). The last two
summations (non-bonded interactions) over pairs of atoms i and j exclude 1–2 and
1–3 interactions and often use separate parameters for 1–4 interactions as compared
with those used for atoms separated by more than three covalent bonds. Non-bonded
interactions include the “van der Waals” term (dispersion and repulsion) represented
by a Lennard-Jones 6–12 potential, and the electrostatic term, where partial charges
qi of atoms interact via Coulomb’s law. b0, θ0, Kb, Kθ, Vn, φ0, ε, r*, and qi are the
potential function parameters. R represents coordinates of the atoms present in the
molecular system [162].
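The individual terms of Eq. (1) can be evaluated with a few lines of code. The sketch below is only an illustration of the functional form: the function names are hypothetical, angles are assumed to be in radians, the list of non-bonded pairs is assumed to be already filtered for 1–2 and 1–3 exclusions, and the unit conversion factor of the Coulomb term (absent from Eq. (1)) is exposed as the assumed parameter k_e.

```python
import math


def bonded_energy(bonds, angles, torsions):
    """Sum of the three bonded terms of Eq. (1).

    bonds:    iterable of (b, b0, K_b)
    angles:   iterable of (theta, theta0, K_theta), angles in radians
    torsions: iterable of (phi, V_n, n, phi0), angles in radians
    """
    e_bond = sum(k * (b - b0) ** 2 for b, b0, k in bonds)
    e_angle = sum(k * (t - t0) ** 2 for t, t0, k in angles)
    e_torsion = sum(0.5 * v * (1.0 + math.cos(n * phi - phi0))
                    for phi, v, n, phi0 in torsions)
    return e_bond + e_angle + e_torsion


def lennard_jones(r, epsilon, r_star):
    """6-12 term of Eq. (1); the minimum is -epsilon at r = r_star."""
    s6 = (r_star / r) ** 6
    return epsilon * (s6 * s6 - 2.0 * s6)


def coulomb(r, q_i, q_j, k_e=1.0):
    """Coulomb term; k_e is an assumed unit conversion factor."""
    return k_e * q_i * q_j / r


def nonbonded_energy(pairs, k_e=1.0):
    """pairs: iterable of (r_ij, epsilon, r_star, q_i, q_j), one per atom pair."""
    return sum(lennard_jones(r, eps, rs) + coulomb(r, qi, qj, k_e)
               for r, eps, rs, qi, qj in pairs)


if __name__ == "__main__":
    # Arbitrary toy values, not taken from any published force field.
    e_b = bonded_energy([(1.54, 1.53, 300.0)], [(1.95, 1.91, 60.0)],
                        [(math.pi, 2.0, 3, 0.0)])
    e_nb = nonbonded_energy([(4.0, 0.1, 3.8, 0.2, -0.2)])
    print(e_b, e_nb)
```

Production MD codes evaluate the same terms with neighbour lists, cut-offs and lattice-sum electrostatics; none of these is attempted in this sketch.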
The functional form and parameters of a given force field are transferable, which
means that molecules of similar atom types can be modelled using the same set of
parameters and energy function [9, 89]. OPLS, CHARMM, GROMOS, and AMBER
force fields are used to model large molecular systems and therefore their functional
forms are simple as an adequate compromise between accuracy and computational
efficiency. Newer versions of the lipid force field parameters can be found in a number
of papers, namely OPLS [84, 105, 191], CHARMM [81, 152], AMBER [183, 205], and
GROMOS [185].
The set of parameters used to model water or aqueous solutions (force field for
water, called a water model) should be compatible with that for the biomolecules.
The most common water models used in MD simulations of lipid bilayers hydrated
with explicit water are TIP3P (transferable intermolecular potential three-point) [75]
with further modifications for simulations with Ewald summation [164], and SPC
(simple point charge) [8]. These water models are rigid and have three interaction
sites (three-point models, where point charges are centred on each of the three water
atoms). TIP3P and SPC have no Lennard-Jones parameters on the hydrogen atoms
and this makes the models compatible with most classical force fields, although they
perform differently with different force fields [109]. For use with the CHARMM
force field, the TIP3P water model was slightly modified and Lennard-Jones terms
on the hydrogen atoms were included [107, 109, 134]. A rigid water model with four
interaction sites (TIP4P) [75] has also been used in MD simulations of lipid bilayers
[33, 63, 190], although it is less common due to its additional computational expense.
All force fields listed above use fixed-point charges. In order to allow the elec-
tron density to respond to the local electric fields, a polarizable force field for lipid
molecules based on the Drude oscillator [87, 88] was developed [27, 98]. Polarizable
force fields reproduce electrostatic interactions better and, while adding
computational complexity [74], provide a more accurate representation of
a molecular system.
A typical mammalian cell has a diameter of ~10 × 10⁻⁶ m (10 μm) and thus a surface
area of ~10⁻¹⁰ m². Estimating that the cross-sectional area of a lipid molecule
is ~100 × 10⁻²⁰ m² and assuming that lipids occupy only 10% of the membrane surface
(the rest being occupied by proteins), one can roughly estimate that one leaflet of the
lipid matrix is built of ~10⁷ lipid molecules. A computer model cannot be built of so
many molecules because of the computational cost. In classical molecular modelling,
atoms that constitute the model interact through a many-body potential. This
potential explicitly depends upon the atoms’ positions. As many-body interactions
are an intractable problem to solve, the non-bonded interactions are in most cases
approximated by the sum of pairwise interactions. For N atoms in the model, there are
approximately N² interactions (the complexity of the algorithm is denoted O(N²)). In
effect, the time required to compute non-bonded interactions without further
approximations is proportional to N². Thus, the first limitation of the computer model with
an atomic resolution is related to the number of its atoms. The model of the membrane
matrix must thus be a patch of the lipid bilayer that, by applying two-dimensional
periodic boundary conditions, is algorithmically made horizontally continuous, and
by applying three-dimensional periodic boundary conditions, is additionally made
vertically periodic.
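The estimates in this paragraph can be reproduced with a short calculation. The sketch below treats the cell as a sphere (an assumption; the text gives only orders of magnitude), counts the pairwise interactions for N atoms, and shows the minimum-image convention commonly used to implement periodic boundary conditions in a rectangular box. All function names are hypothetical.

```python
import math


def leaflet_lipid_count(cell_diameter_m=10e-6, lipid_area_m2=100e-20,
                        lipid_surface_fraction=0.10):
    """Order-of-magnitude number of lipids in one leaflet of a spherical cell."""
    surface_area = math.pi * cell_diameter_m ** 2  # sphere: 4*pi*r^2 = pi*d^2
    return lipid_surface_fraction * surface_area / lipid_area_m2


def pair_count(n_atoms):
    """Number of distinct atom pairs, N(N-1)/2, which grows as N^2."""
    return n_atoms * (n_atoms - 1) // 2


def minimum_image_distance(r_i, r_j, box):
    """Distance between two atoms under periodic boundary conditions.

    Assumes a rectangular box with edge lengths given in `box`; each
    coordinate difference is shifted to the nearest periodic image.
    """
    d2 = 0.0
    for a, b, length in zip(r_i, r_j, box):
        d = a - b
        d -= length * round(d / length)
        d2 += d * d
    return math.sqrt(d2)


if __name__ == "__main__":
    print(f"lipids per leaflet: {leaflet_lipid_count():.1e}")
    for n in (10_000, 100_000, 1_000_000):
        print(f"N = {n:>9,d}: {pair_count(n):.2e} pairs")
    print(minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (10.0, 10.0, 10.0)))
```

Increasing N tenfold increases the number of pairs roughly a hundredfold, which is why a small periodic patch is simulated instead of a whole membrane.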
The second main limitation of the computer model is the time scale of dynamical
processes that can be simulated. In classical molecular dynamics (MD) simulations,
the movements of atoms are governed by the classical equation of motion, which
in most cases is Newton’s equation. The position of each atom is obtained by the
numerical solution of the equation at successive discrete time points, every time step.
When the time step is constant, its value is determined by the fastest movements in
the model, which are bond vibrations. The fastest vibrations in the molecule are those
of covalent bonds that link hydrogen atoms, and their time constant is ~10 fs. To probe
this motion faithfully, the time step should be less than 1 fs (10⁻¹⁵ s). When these
vibrations are eliminated, the time step can be extended to 2 fs. Thus, to evaluate
the dynamical characteristics of a lipid bilayer at equilibrium, often 10⁹ or even more
time steps are required. When designing a computer model of the bilayer, one should
thus consider the number of atoms in the model and the number of integration time
steps as well as the computational power available in order to estimate the total
elapsed real time of the simulation. If the calculations are not likely to finish within
a reasonable time period, one has to compromise on the size of the model or the length
of the time scale investigated. To expand the time and length scales of these systems
beyond what is feasible with atomic models, coarse-grained (CG) models for lipid
aggregates can be employed. A very successful and widely used CG lipid model
is the MARTINI force field [115]. How lipids and cholesterol are mapped to the
MARTINI CG representation is shown in Fig. 6 of the chapter Modeling of Membrane
Proteins.
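The relation between the time step, the number of integration steps and the elapsed real time discussed above can be sketched as follows. The throughput value (ns_per_day) is a hypothetical number that depends entirely on the system size, the software and the hardware available; it is not taken from this chapter.

```python
def steps_required(simulated_time_ns, time_step_fs):
    """Number of integration steps needed to cover a given simulated time."""
    return round(simulated_time_ns * 1e6 / time_step_fs)  # 1 ns = 1e6 fs


def wall_clock_days(simulated_time_ns, ns_per_day):
    """Elapsed real time for an assumed throughput in ns of simulation per day."""
    return simulated_time_ns / ns_per_day


if __name__ == "__main__":
    # A 100-ns run with a 2-fs time step (bonds to hydrogen constrained).
    print(steps_required(100, 2.0))          # 50 million steps
    # 10^9 steps of 2 fs correspond to 2 microseconds of simulated time.
    print(steps_required(2000, 2.0))
    # Hypothetical throughput of 100 ns/day.
    print(wall_clock_days(2000, ns_per_day=100.0), "days")
```

Such an estimate, made before the simulation is set up, indicates whether the size of the model or the simulated time has to be reduced.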
Fig. 6 Examples of (a) water molecules H-bonded to phosphate oxygen atoms, (b) a water molecule
bridging phosphate oxygen atoms of two PC molecules (intermolecular water bridge), (c) a water
molecule anchoring clathrate around the choline moiety and a phosphate oxygen atom (intramolecular
water anchor), (d) charge pairs between two methyl groups of a choline moiety and a phosphate
oxygen atom, (e) Na+ coordinated by four PC molecules
Due to limited computational power, the first computer models of hydrated phospholipid
bilayers with an atomic resolution described in the literature consisted of
lipids of a single type. These computer models comprised from 36 to 200 PC or PE
molecules e.g., Refs. [30, 32, 38, 61, 169, 196], and in most cases their MD
simulation times were far below 1 ns [30, 32, 38, 61, 169], or ~2 ns [196]. The aim of
these simulations was mainly to assess the reliability of computer models by comparing
the results of simulations with the experimental results and to improve the
methodology. For these reasons, computer models comprised predominantly those
phospholipids for which experimental data were available; these were mainly the
saturated PCs DPPC and DMPC [6, 38, 112, 196], but also DLPE [32] and mono-unsaturated
POPC [61]. Nevertheless, even these short simulations provided a wealth
of information about lipid bilayers, particularly about the dynamics of lipids and their
interactions with water. A significant extension of both the spatial and temporal scales
of bilayer MD simulations was made by Lindahl and Edholm [99], who carried out
simulations of a fully hydrated bilayer consisting of 1024 DPPC molecules for 10 ns.
With technological advances, particularly advances in the development of algorithms
[7], much longer time scales are accessible for simulations these days, and a 100-ns MD
simulation of a lipid bilayer is now standard. At present, single-type lipid bilayers
are mainly used in computer modelling studies of membrane proteins or peptides e.g.
[78, 80, 136] (this subject is broadly discussed in the chapter Modeling of Membrane
Proteins), of the collective behaviour of lipids in the bilayer e.g., [45, 47, 168], of
membrane permeation e.g., [154, 159, 184, 203] or interactions with ions in different
membrane thermotropic phases e.g., [195]. In an impressive, large-scale study
(2.7 million atoms) of a ribosome anchored to a membrane channel embedded in
a single-lipid POPC bilayer, a 50-ns MD simulation was performed by the Schulten
group [54]. Single-lipid bilayers are also used as reference systems in studies of
the effects of a certain membrane component on the main bilayer constituents (see
below).
The PCs that occur most frequently in nature are those with cis unsaturated acyl
chains. As such, like PCs with saturated acyl chains, they are the most commonly
used PCs in model studies. In contrast, PCs with trans unsaturated acyl chains are
rather rare in nature. Nevertheless, they have a negative impact on human health.
Even though the effect of trans unsaturation of the PC acyl chains has been studied
both experimentally and computationally, e.g. [127, 172, 177, 194, 197, 222] such
studies are scarce. A recent comparative MD simulation study of saturated, cis and
trans mono-unsaturated bilayers of Kulig et al. [85] indicated that trans unsaturated
chains are more flexible than cis unsaturated chains (cf. Sect. 4.1). In effect, the
packing of trans unsaturated chains, thus their order in the bilayer, is higher than that of
cis unsaturated chains. Also, interactions between cholesterol and trans unsaturated
chains are stronger than those with cis unsaturated chains, which results in a higher ordering
effect of cholesterol in trans unsaturated bilayers.
The lipid matrix of a cell membrane contains different kinds of lipids [215]. A
mixed-lipid bilayer is thus a more realistic model of the lipid matrix of biomem-
branes, although it is more difficult to analyse than a single-lipid bilayer. As lipid
molecules of the same kind tend to cluster together [62], and mix nonideally with
lipids of other kinds [62, 166], the lateral distribution of lipids in the bilayer is often
inhomogeneous and the bilayer has compositionally distinct microdomains. A recent
comprehensive review on the molecular modelling of bilayers of heterogeneous composition
can be found in Ref. [144]. The first atomistic computer models of mixed-lipid bilayers
consisted of two kinds of phospholipids. The Berkowitz group carried out MD simulations
of bilayers comprising DPPC and DPPS at a ratio of 5:1 [142] and the
simulations provided detailed information about lipid-lipid interactions and showed
that ions strongly affect them. More exotic binary bilayers of DMPC and dimyris-
toyltrimethylammonium propane (DMTAP, a cationic lipid with no phosphate group)
at a varying mole ratio were constructed and MD simulated by the Vattulainen group
[55]. There, the effect of the lipid composition on the structure and electrostatic
properties of the bilayer was investigated. Bilayers composed of DOPC and DOPE
at a varying mole ratio were simulated by the Marrink group [34]. They found that
the equilibrium properties of these bilayers depend nonlinearly on their PC/PE
composition. They found no indication of domain formation but suggested that only
MD simulations in the microsecond range might reveal whether
this process really takes place. Yet another binary bilayer made of POPE and POPG
in the proportion 3:1 was MD simulated by the Pasenkiewicz-Gierula group [128].
There, the organization of the bilayer interfacial region was analysed in detail. Other
computer simulations of binary phospholipid bilayers followed e.g. [91, 229, 231].
As was mentioned above, at present, atomistic MD simulations cannot be used to
model the process of micro-domain formation in binary lipid bilayers due to the
timescale of the process. However, using the CG MD simulation, the Voth group
[188] observed phase separation of a mixed 1:1 DPPC/DPPE bilayer.
The binary lipid bilayers that have been most studied using molecular modelling
methodology are composed of PC or SM and Chol. This is because SM, PC, and Chol
(Figs. 2 and 3) constitute three major classes of lipids in the outer leaflet of the animal
cell membrane. The cholesterol content of cell membranes is usually 20–50 mol%
of the total lipids [124], but in ocular lens membranes, the Chol content often exceeds
that of the phospholipids [208]. Chol has numerous functions in biomembranes.
From a biophysical perspective the main membrane function of Chol is to modulate
the physical properties of the lipid matrix, for example to regulate its fluidity and
the phase behaviour [125, 220], to increase its mechanical strength [15, 40], and to
increase its hydrophobic barrier [72, 199]. The first MD simulation of a fully hydrated
PC-Chol bilayer was carried out by Robinson et al. [170]. This simulation was short;
nevertheless, it provided interesting insight into the cholesterol ordering effect
and showed the formation of hydrogen bonds (H-bonds) between Chol and PC. This
simulation was followed by a much longer one by Tu et al. [209], which demonstrated
that Chol has a significant influence on the subnanosecond time scale PC dynamics.
Computer simulation studies on bilayers containing cholesterol published before
2009 are reviewed and summarised in Refs. [12, 181].
The molecular-level membrane effects of cholesterol that were identified early on
are the so-called ordering [138] and condensing [116] effects. The ordering
effect describes the ability of Chol molecules to increase the order of acyl chains in
phospholipid-Chol bilayers in the liquid-crystalline phase. A measure of the chain
order is the molecular order parameter, Smol, or the deuterium order parameter, SCD. The
condensing effect, which is closely related to the ordering effect, denotes that Chol
induces an increase in the membrane surface density or, in other words, decreases
the surface area occupied by phospholipid molecules in bilayers containing Chol.
Both effects are easily detected in MD simulations, but the basic atomic-level
mechanisms responsible for them are not easy to identify and have not yet been
fully explained.
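As an illustration of how the chain order is quantified, the deuterium order parameter of a given chain segment is commonly defined as SCD = ⟨(3 cos²θ − 1)/2⟩, where θ is the angle between a C–H (C–D) bond and the bilayer normal and the average runs over lipids and time. The sketch below assumes that the bond vectors have already been extracted from a trajectory and that the bilayer normal is the z axis; the function name is hypothetical.

```python
import math


def deuterium_order_parameter(ch_vectors, normal=(0.0, 0.0, 1.0)):
    """Average of (3 cos^2(theta) - 1) / 2 over a set of C-H bond vectors.

    ch_vectors: iterable of (x, y, z) bond vectors, e.g. collected for one
    carbon position over all lipids and all saved frames (assumed input).
    normal: bilayer normal, assumed here to be the z axis.
    """
    nx, ny, nz = normal
    n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
    total = 0.0
    count = 0
    for x, y, z in ch_vectors:
        v_len = math.sqrt(x * x + y * y + z * z)
        cos_theta = (x * nx + y * ny + z * nz) / (v_len * n_len)
        total += 1.5 * cos_theta ** 2 - 0.5
        count += 1
    return total / count


if __name__ == "__main__":
    # Bonds perpendicular to the normal (as in an all-trans chain aligned
    # with the normal) give -0.5; an isotropic distribution gives 0.
    print(deuterium_order_parameter([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
```

Because SCD is negative for ordered chains, the absolute value or −SCD is often reported; an increase in this value upon addition of Chol is the signature of the ordering effect.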
The atomic and molecular level mechanisms behind the cholesterol effects on the
membrane are reviewed in Ref. [181]. In short, as in the case of most biomolecules,
there is a direct relationship between the Chol structure that has been optimised over
the long process of natural evolution, and its biological function [122]. Chol consists
of three structural elements, namely the rigid steroid ring, the polar 3β-hydroxyl
group, and a short hydrocarbon chain attached to the ring at position 17 (cf. Fig. 2).
In addition, two methyl substituents, called C18 and C19 for short, are attached to the
ring at positions 10 and 13 (Figs. 2 and 3). They make the cholesterol ring asymmet-
ric—one of its sides is flat (α-face), the other is rough (β-face). Any modification of
these structural elements decreases the effects of Chol on lipid bilayers. A systematic
MD simulation study of the effect of modifying the chemical structure of Chol on the
ability of Chol to affect the properties of the bilayer was carried out by Róg et al. [163,
176, 180, 182, 213]. The first modification involved a change of the β-configuration
of the Chol hydroxyl group to α [176]. This epimeric form of cholesterol (epicholes-
terol, Echol) is rare in nature. MD simulations of the DMPC-Echol bilayer confirmed
the experimental results of Dufourc et al. [36] and Demel et al. [35] that Echol has
weaker ordering and condensing effects on bilayers than Chol. The second modifica-
tion deprived Chol of the ability to be an H-bond donor by substituting the Chol OH
group with a ketone group [182]. Ketosterone is an artificial steroid as the 3-ketone
group is not present in sterols. The interactions of PC polar groups as well as water
with the ketone group are much weaker than those with the Chol OH group. Thus,
ketosterone is not as firmly anchored in the bilayer interfacial region as Chol, and
its ordering and condensing effects are much weaker. Moreover, MD simulations
showed that ketosterone is able to undergo flip-flops between the bilayer leaflets in
a relatively short time of ~50 ns, whereas Chol does not flip-flop even on a much
longer time scale. The third modification deprived Chol of two methyl groups (C18
and C19) from the rough, β-face [180]. This made the cholesterol ring symmetric
and both its faces flat. Contrary to expectations, the effects of such a modified sterol
on the membrane order and condensation were weaker than those of cholesterol. To
obtain a better understanding of the functional significance of each methyl group of
Chol, one or two methyl groups were sequentially removed from the Chol molecule
[163]. This “chemical” experiment clearly showed that the removal of a single C18
methyl group or simultaneous removal of the other two methyl groups (C19 and
C21, the latter attached to C20 in the hydrocarbon chain) strongly affects the Chol ordering
effect. Desmosterol, which is a direct precursor of Chol and differs from Chol only
by one double bond in the sterol hydrocarbon chain, influences a saturated bilayer less than
cholesterol [213]. Smondyrev and Berkowitz carried out MD simulation studies of
other chemically modified structures of Chol and showed that an additional ketone
group at position 6 [193] as well as replacing the Chol OH group with an SO4 group
(cholesterol sulphate) [192] decreases the Chol effect on the lipid bilayer.
Detailed analyses of the results of studies on the ordering and condensing effects
of various sterols allowed Aittoniemi et al. [1] to find a strong correlation between
the tilt of the sterol ring (the angle between the ring plane and the bilayer normal) and
the sterol ordering and condensing abilities—the smaller the tilt, the more ordered
and condensed the bilayer is. This correlation arises from basic interactions between
Chol and lipids, and, as was shown in the studies of “chemical” modifications of
the Chol molecule as well as those with Chol precursors, all structural elements
of the cholesterol molecule are important and effective in these interactions. In all
binary lipid bilayers that contain sterol molecules investigated in the MD simulations
cited above, Chol had the smallest tilt and the strongest effect on the bilayer of
all these sterols [1]. A more recent MD simulation study of the Chol condensing
effect confirmed a correlation between an average tilt angle of the Chol ring and the
magnitude of the Chol condensing effect [3].
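The tilt discussed above can be computed from a trajectory once a vector representing the sterol ring has been chosen. In the sketch below the ring is represented by a single vector along its long axis (for instance from atom C3 to atom C17); this choice, the use of the z axis as the bilayer normal, and the function names are assumptions made for illustration and are not necessarily the definitions used in Refs. [1, 3].

```python
import math


def tilt_angle_deg(ring_vector, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between a sterol ring vector and the bilayer normal."""
    dot = sum(a * b for a, b in zip(ring_vector, normal))
    norm = (math.sqrt(sum(a * a for a in ring_vector))
            * math.sqrt(sum(b * b for b in normal)))
    cos_tilt = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cos_tilt))


def mean_tilt_deg(ring_vectors, normal=(0.0, 0.0, 1.0)):
    """Average tilt over a set of ring vectors (sterol molecules and/or frames)."""
    angles = [tilt_angle_deg(v, normal) for v in ring_vectors]
    return sum(angles) / len(angles)


if __name__ == "__main__":
    # Toy vectors; for sterols in the opposite leaflet the normal
    # would have to be reversed, i.e. (0, 0, -1).
    print(mean_tilt_deg([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```

A smaller average tilt obtained in this way would correspond, according to the correlation described above, to a more ordered and more condensed bilayer.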
The PC-Chol bilayers discussed in this section so far contained no more than
50 mol% Chol and modelled a “typical” animal cell membrane [124]. However,
there are natural cell membranes that contain more than 50 mol% Chol. An example
of such membranes is the fibre cell membrane of the eye lens [208] where Chol
not only saturates the membrane but also causes pure Chol domains to form within
the membrane [66]. Model studies on PC-Chol bilayers with an increasing Chol
content allowed the Subczynski group to extend the phase diagrams
for Chol/PC mixtures to the region where PC bilayers are saturated and oversaturated
with Chol [108].
The biological purpose of the oversaturating amount of Chol in the membranes of
the eye lens cells was puzzling. Computer modelling studies on the PC-Chol bilayer
revealed that at saturating Chol content, cholesterol suppresses vertical fluctuations
of atoms in a bilayer [158, 224], which smooths the bilayer surface. As one of the
principal properties of the lens is transparency and light-scattering is one of the
factors compromising the transparency, cholesterol-induced smoothing of the surface
of the eye lens membranes helps to maintain lens transparency by decreasing light-
scattering [158, 224]. A very recent MD simulation study [159] strongly supported
the hypothesis that pure Chol domains present in the lipid matrix of the eye lens cell
membranes provide barriers for oxygen transport to the lens centre, and thus protect
the lens against cataract development [198].
The distribution of lipids in the lipid matrix of a biomembrane is not only laterally
inhomogeneous but also asymmetric across the matrix. The latter means that the lipid
composition of the two bilayer leaflets is different. In animal cell membranes the outer
leaflet is enriched in SM, PC, and Chol, and the inner leaflet in PS, phosphatidylinositol (PI)
(both are anionic), and PE. In the first computer model of an asymmetric bilayer
found in the literature [23], one leaflet consisted of DPPC and the other of randomly
distributed DPPC and DPPS. An MD simulation of this bilayer did not show any effect
of the mixed-lipid leaflet on the single-lipid leaflet. An asymmetric bilayer consisting
of four lipid species, PC, SM, PE, and PS was constructed and MD simulated by
Vacha et al. [212]. In that study a realistic model of the inner and outer bilayer
leaflets was created as the system comprised two parallel asymmetric bilayers. The
inner leaflets of both bilayers, separated by the “interior” water layer, consisted of
PS and PE, the outer leaflets consisted of PC and SM. The number of added Na+ and
K+ ions exceeded the number needed to neutralize the negative charge on PS. The
simulations indicated that phospholipid head groups preferentially bind sodium over
potassium ions, and also that some water molecules are able to permeate across the
bilayers on a 100 ns timescale. An asymmetric bilayer containing Chol and SM in
one leaflet and Chol and PS in the other, was MD simulated by Bhide et al. [14]. The
authors observed practically no interaction between the two leaflets but observed a
more extended network of interactions between SM and Chol than between PS and
Chol. This might suggest that SM is more effective in the formation of domains than
PS.
The Marrink group [65] carried out large-scale CG 40-μs MD simulations of
a multicomponent bilayer consisting of 63 different lipid species asymmetrically
distributed across the two leaflets, to make a realistic model of the lipid matrix
of a mammalian plasma membrane. This model showed the formation of transient
domains with a liquid-ordered character in both leaflets, although in each leaflet
they consisted of different lipids. The domains were coupled across the two bilayer
leaflets. The latter result might seem at variance with the experimental results obtained
for a much simpler bilayer which did not reveal evidence of transbilayer coupling
between the leaflets [39].
As has already been stressed in the Introduction, the lamellar structure and properties of
lipid bilayers follow directly from the structural characteristics of lipids and water.
Phospholipid bilayers form spontaneously in water and do not exist on their own in
the absence of water. Water must thus play a significant role not only in the formation
but also in the stability of the bilayer. The hydrophobic effect causes the lipid acyl
chains to assemble together in order to minimise their contact with water. At the
same time, the lipid head groups stay in contact with water—polar phosphate and
carbonyl groups can form hydrogen bonds with water molecules but the non-polar
choline group cannot form such bonds. The formation of H-bonds between PC and
water is evident in MD simulations of hydrated phospholipid bilayers (Fig. 6a). The
first thorough analysis of interactions between water and polar groups of PC in an
MD simulated bilayer was carried out by Alper et al. [2]. Also, they and Damodaran
and Merz [30] were the first to identify clathrate-like structures of water around
choline groups in PC bilayers.
A careful analysis of the interfacial water of an MD simulated PC bilayer showed
that water molecules can simultaneously form H-bonds with two PC polar groups
[149]. These bifurcated H-bonds were named “water bridges” (Fig. 6b). The earlier
quantum mechanical calculations of Frischleder et al. [48] showed that the binding
energy of a water bridge between two phosphate oxygen atoms is significantly higher
than that of a single H-bond. Thus, water bridges linking two or more PC molecules
lower the system’s energy and stabilize the bilayer structure. Such water-mediated
interactions between PC oxygen atoms had been postulated previously, e.g. [16,
131], but only recently has their existence been shown experimentally [221]. Water
molecules can also bridge choline and phosphate or carbonyl groups by simultane-
ously belonging to a clathrate around the choline group and being H-bonded to one
of the polar groups. Such water molecules were evidenced in MD simulations of a PC
bilayer hydrated by normal and heavy water [181]; to distinguish them from water
bridges they were named “anchoring water” (Fig. 6c). Intermolecular water anchors
can also be expected to contribute to the stabilization of the bilayer structure.
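The bookkeeping behind the identification of water bridges can be illustrated with a minimal Python sketch. It assumes that PC-water H-bonds have already been detected by some geometric criterion (not specified here) and are stored as (water, lipid, polar group) records; the function name and the data layout are hypothetical, and only bridges between different PC molecules are counted.

```python
from collections import defaultdict

def find_water_bridges(hbonds):
    """Return {water_id: sorted lipid ids} for waters H-bonded to two or more different lipids.

    hbonds: iterable of (water_id, lipid_id, group) tuples, one per PC-water H-bond.
    """
    partners = defaultdict(set)
    for water_id, lipid_id, _group in hbonds:
        partners[water_id].add(lipid_id)
    return {w: sorted(lipids) for w, lipids in partners.items() if len(lipids) >= 2}

# Hypothetical H-bond list for a single trajectory frame
hbonds = [
    (101, 1, "phosphate"),
    (101, 2, "carbonyl"),   # water 101 links lipids 1 and 2
    (102, 2, "phosphate"),  # water 102 is bound to one lipid only
    (103, 3, "phosphate"),
    (103, 3, "carbonyl"),   # both H-bonds to the same lipid: not an inter-lipid bridge
]
bridges = find_water_bridges(hbonds)
```

Counting the same records per lipid rather than per water molecule would give the number of water bridges a given PC molecule participates in.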
PC molecules cannot form direct H-bonds among themselves as they are only H-
bond acceptors but, as was discussed above, in the hydrated PC bilayer they may be
linked indirectly, via water bridges and anchors. PC molecules can, however, interact
directly via Coulomb interactions as they contain groups that are positively (choline
moiety) and negatively (phosphate and carbonyl oxygen atoms) charged, whereas
their net electrostatic charge is zero. These charge-charge interactions were named
“charge pairs” (Fig. 6d) and they certainly contribute to the bilayer stability [150].
Detailed analyses of water bridges and charge pairs formed at the PC bilayer/water
interface in the POPC, palmitoylelaidoylPC (PEPC), and DMPC bilayers revealed
that these interactions make up an extended network that links PC molecules; this
network involves a large majority (more than 96%) of the bilayer lipid molecules
at any instant [127]. An analysis of the inter-lipid network discussed above did not
include water anchors. Murzyn et al. [127] found a strong correlation between the
cross-sectional surface area available to a PC head group, either average or individual,
in the bilayer and the number of H-bonds, water bridges and charge pairs a given PC
molecule makes—the larger the area the greater the number of PC-water H-bonds
but the smaller the number of short distance PC-PC interactions; the latter results in
a less branched inter-lipid network in bilayers with a larger average surface area per
lipid.
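The notion of an inter-lipid network can be made concrete with a short Python sketch. It assumes that the pairwise links (water bridges and charge pairs) found in one frame are already available as pairs of lipid indices; the function names and the toy data are hypothetical and do not reproduce the analysis of Ref. [127].

```python
from collections import defaultdict

def network_clusters(n_lipids, links):
    """Group lipids 0..n_lipids-1 into clusters connected by pairwise links."""
    neighbours = defaultdict(set)
    for i, j in links:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, clusters = set(), []
    for start in range(n_lipids):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this lipid
        stack, cluster = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            cluster.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(cluster))
    return clusters

def fraction_in_network(n_lipids, links):
    """Fraction of lipids that take part in at least one link."""
    linked = {i for pair in links for i in pair}
    return len(linked) / n_lipids

# Hypothetical frame: six lipids, three links
links = [(0, 1), (1, 2), (3, 4)]
clusters = network_clusters(6, links)
fraction = fraction_in_network(6, links)
```

The number of links per lipid, averaged over the bilayer, would be one simple measure of how branched the network is.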
Computer Modelling of the Lipid Matrix of Biomembranes 349
A lipid bilayer has a strong effect on the properties of the water near its sur-
faces. The results of earlier studies of the effect of the phospholipid bilayer on the
properties of the hydrating water are summarized in Refs. [13, 130, 173]. In a recent
comparative MD simulation study [110] the effect of the DOPC and monogalactosyl-
diacylglycerol, MGDG, bilayers on the properties of the surface water was analysed
in detail. The study showed that the ordering of the water dipoles by the PC head groups
extended further into the water phase than that by the galactolipid head groups,
whereas inside the bilayer/water interface the ordering was higher in the galactolipid
than the PC bilayer. The study also showed that near the surface of both bilayers the
net orientation of water dipoles was close to horizontal.
In the PC bilayer containing Chol the repertoire of short-distance inter-lipid interactions
is greater than in the pure PC bilayer [147]. Chol is both an H-bond donor
and acceptor and the OH group of Chol can form direct H-bonds with phosphate and
carbonyl oxygen atoms of PC. Also, a Chol OH group and a choline moiety of PC
can form a charge pair. Such a charge pair was identified by Chiu et al. as a weak PC-
Chol hydrogen bond [25]. Unfortunately, high level quantum chemistry calculations
have not been performed yet to establish how to classify this short-distance PC-Chol
interaction. In the DMPC bilayer containing Chol [147], a network of inter-lipid
interactions forms as in the bilayer without Chol, and it involves a large majority of
DMPC and Chol molecules, although it is less branched than in the DMPC bilayer
without Chol [150].
Several phospholipids, in particular PE, PS, PG, and SM, unlike PC (Fig. 2), are
both H-bond donors and acceptors, thus they are able to make direct inter-lipid H-
bonds. Short-distance interactions between these phospholipids at the bilayer/water
interface in the absence or presence of PC have been analysed e.g. in Refs. [32, 34, 41,
91, 126, 128, 141, 230]. A comparative MD simulation study of DPPE and DPPC
bilayers [91] showed that these direct inter-lipid H-bonds at the bilayer interface
result in a smaller cross-sectional surface area per lipid, and a higher acyl chain
order, and are responsible for the higher main phase transition temperature of
the PE bilayer than of the PC bilayer. In binary PC-PE bilayers, with increasing PE content, the
average surface area per lipid noticeably decreases and the chain order increases [34,
91].
At the water/bilayer interface, ions also interact with phospholipids. One of the
first bilayer simulations that included ions was carried out on a PS bilayer by Pandit
and Berkowitz [141]. PS is a donor and acceptor of H-bonds but is also negatively
charged. The authors [141] observed that, once the negative charge of the PS serine
group (cf. Fig. 2) is compensated by Na+ counterions, the PS molecule becomes
analogous to the PE molecule, and a PS bilayer in the presence of Na+ has similar
properties to a PE bilayer. They also showed that Na+ ions are generally coordinated
by both serine carboxyl and phosphate groups. In a much longer MD simulation of a
PS bilayer, Mukhopadhyay et al. [126] observed that Na+ ions penetrate deeper into
the bilayer/water interface and are mainly coordinated by carbonyl oxygen atoms.
The disparity between the results of Pandit and Berkowitz [141] and Mukhopadhyay
et al. [126] was, most likely, due to the slow penetration of the bilayer/water interface
by Na+; to reach a stable distribution of Na+ ions the bilayer has to be equilibrated
Once a bilayer is formed, in its hydrophobic core there is a balance between attrac-
tive van der Waals interactions among adjacent acyl chains and inter-chain entropic
repulsion. The extent of the attractive interaction depends on the phospholipid chain
length and the degree of unsaturation. Longer saturated chains attract one another
more strongly than shorter chains. They are therefore more densely packed in the
bilayer core. In consequence, their mobility is decreased and the main phase tran-
sition temperature of the bilayer is increased. A cis-double bond located near the
middle of the chain, which is typical for mono-unsaturated chains of PCs in animal
cell membranes, interferes with the chain packing. In effect, cis-unsaturated chains
are less densely packed and have considerable motional freedom in the bilayer core.
These effects decrease the cooperativity of the chain interactions and cause a decline in
the main phase transition temperature of cis-unsaturated compared with that of sat-
urated bilayers. It is interesting to note that bilayers made of phospholipids with
trans-double bonds have a significantly higher main phase transition temperature
than those made of corresponding cis-unsaturated phospholipids and, in general,
their properties are more similar to those of bilayers made of saturated than cis-
unsaturated phospholipids [83]. MD simulations of Róg et al. [172] and Kulig et al.
[85] provided a plausible explanation of these similarities (cf. Sects. 2.5.1 and 4.1).
As has already been discussed above (Sect. 2.6.2), in binary PC-Chol bilayers,
Chol both induces a higher order of PC acyl chains (ordering effect) [138] and
makes their packing denser (condensing effect) [116], although the atomic-level
mechanisms that are responsible for the effects are not easy to indicate precisely.
Thus, there is still no general consensus regarding the molecular basis of both effects.
Many researchers claim that phospholipid acyl chains strongly interact with steroid
rings and this makes the chains more straight and ordered—this concept was first put
forward by Levine and Wilkins [93]—and the attractive character of the interaction
increases the packing of atoms in the bilayer. There are two ways to increase the
chain order as measured by one of the order parameters. One of them is to reduce the
number of gauche rotamers along acyl chains, and the other is to reduce the tilting
of acyl chains; tilt, by definition, is the angle (θ) between the chain vector (linking
the carbon atom next to the carbonyl group with the last in the chain) and the bilayer
normal (Fig. 7). However, such a definition of the chain tilt might be ambiguous. In
the liquid-crystalline bilayer, there is no collective tilt of chains. To see this, one has
to consider both the azimuthal, φ, and the polar, θ, chain angles (Fig. 7). Generally
speaking, no collective tilt means that, due to the axial symmetry of the bilayer, for a
given θ angle all φ angles in the range 0–2π are equally probable; this means that the average
value of θ over the whole range of angles is zero. In the liquid-crystalline bilayer
the chains are randomly tilted relative to the normal within the confines of a cone
[86, 156] with some distribution. In the tilt analysis, one is interested only in the
absolute value of θ angles. Due to the internal flexibility of phospholipid acyl chains,
the chain tilt in the liquid-crystalline bilayer cannot be measured in spectroscopic
experiments. However, for a rigid molecule such as Chol, spectroscopic methods can
provide an average tilt of the molecule from the average cosine square of θ [148]. In
MD simulations, the distributions of both θ and φ angles can be determined, e.g. [1,
146, 202]. MD simulations clearly show that Chol increases Smol along the whole
chain, whether saturated, e.g. [44], or mono-unsaturated, e.g. [158, 179], by decreasing
the average chain tilt and narrowing the tilt angle distribution, e.g. [158, 174].
However, Chol has a relatively mild effect on the probability of the trans conformation
of torsion angles along the chain, particularly in the case of mono-unsaturated chains
[158, 179].
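The distinction between the absence of a collective tilt and a non-zero average absolute tilt can be illustrated with a small NumPy sketch. It assumes that the bilayer normal is the z axis and that an order parameter of the form S = (3⟨cos²θ⟩ − 1)/2 is of interest; the function names and the four test vectors are hypothetical.

```python
import numpy as np

def tilt_and_azimuth(chain_vectors, normal=(0.0, 0.0, 1.0)):
    """Polar (theta) and azimuthal (phi) angles, in degrees, of chain vectors.

    phi is measured in the xy plane, so the normal is assumed to be the z axis.
    """
    v = np.asarray(chain_vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    theta = np.degrees(np.arccos(np.clip(v @ n, -1.0, 1.0)))
    phi = np.degrees(np.arctan2(v[:, 1], v[:, 0])) % 360.0
    return theta, phi

def order_parameter(theta_deg):
    """S = (3 <cos^2 theta> - 1) / 2, averaged over the supplied angles."""
    cos2 = np.cos(np.radians(theta_deg)) ** 2
    return 0.5 * (3.0 * cos2.mean() - 1.0)

# Four hypothetical chains, each tilted by 30 degrees, with phi = 0, 90, 180, 270 degrees
theta0 = np.radians(30.0)
phis = np.radians([0.0, 90.0, 180.0, 270.0])
chains = np.column_stack([
    np.sin(theta0) * np.cos(phis),
    np.sin(theta0) * np.sin(phis),
    np.full(4, np.cos(theta0)),
])
theta, phi = tilt_and_azimuth(chains)
mean_in_plane = chains[:, :2].mean(axis=0)  # vanishes: no collective tilt
s_value = order_parameter(theta)            # still well below 1: chains are tilted
```

In this toy example every chain is tilted by 30°, yet the in-plane components of the chain vectors average to zero, which is the sense in which there is no collective tilt.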
Based on the analysis of the radial distribution function of carbon atoms in the
hydrophobic bilayer core, Róg and Pasenkiewicz-Gierula [174, 175, 177] postulated
that an increased packing of atoms in the bilayer (Chol condensing effect) originates
from interactions between the chains, and not between the chains and the Chol
rings. This explanation of the Chol condensing effect is in line with the experimental
hypothesis postulated by Hyslop et al. [64], i.e. that Chol induces an increase in the
van der Waals interactions of acyl chains, while its van der Waals interactions with
the chains are less favourable [64]. Also, the free energy calculations of Zhang et al.
showed favourable changes in lipid–lipid interactions near cholesterol molecules
[228]. In binary PC-Chol bilayers, the Chol induced condensing effect is limited
only to that fragment of each chain that penetrates the bilayer core to the same depth
as the cholesterol ring [3, 175]. A more recent MD simulation study [117] reveals
that in the PC-Chol bilayer Chol molecules avoid direct Chol-Chol contacts, and at a
higher Chol content form a three-fold symmetric arrangement with the nearest Chol
molecules. This induces a particular relative orientation of Chol adjacent PC acyl
chains and their ordering. The main conclusion of this study was that Chol molecules
act collectively in the lipid bilayer [117].
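For readers unfamiliar with the radial distribution function (RDF) used in such packing analyses, a generic NumPy sketch is given below. It is not the procedure of Refs. [174, 175, 177]: it assumes a single set of atoms in an orthorhombic periodic box, a three-dimensional normalisation, and a cut-off no larger than half the shortest box edge; the function name and the random test coordinates are hypothetical.

```python
import numpy as np

def radial_distribution(positions, box, r_max, n_bins):
    """g(r) for one set of atoms in an orthorhombic periodic box (minimum image)."""
    pos = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    n = len(pos)
    # All pair separation vectors, wrapped to the nearest periodic image
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box * np.round(diff / box)
    dist = np.linalg.norm(diff, axis=2)[np.triu_indices(n, k=1)]
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box.prod()
    # Each pair is counted once, hence the factor 2 / n
    g = 2.0 * counts / (n * density * shell_volume)
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, g

# Uncorrelated random points should give g(r) close to 1 at all distances
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(500, 3))
r, g = radial_distribution(points, box=(10.0, 10.0, 10.0), r_max=4.0, n_bins=8)
```

Denser packing of chain carbon atoms would show up as higher peaks of g(r) at short distances relative to this uncorrelated baseline.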
The fastest motion having a direct influence on the bilayer properties is trans-gauche
isomerisation. This causes constant conformational changes in lipid acyl chains and,
together with the vibrations of the covalent bonds and valence angles, makes lipid
molecules internally flexible. This gives rise to the liquid-like (fluid) character of
the bilayer. In saturated acyl chains, there are three low energy conformations: trans
(t, torsion angle 180°), gauche-plus (g+, torsion angle 60°) and gauche-minus (g−,
torsion angle −60°). The trans conformation has the lowest torsional energy, thus it is
the most probable and has the longest lifetime of the three conformations. In naturally
occurring mono-unsaturated acyl chains the torsion angle associated with the double
bond is mainly in cis conformation. This conformation is stable (has a much longer
lifetime than those for single bonds) because the rotation around the double bond
is restricted. The rigidity of the double bond obviously affects the rotational states
of the single bonds connected directly to the double bond. The effect of the double
bond on the conformation of the adjacent single bonds was first observed in MD
simulations described in Refs. [129, 172], even though the torsional parameters used there for
these bonds were not fully correct, as the rotation around these single bonds
was unrestricted (no barriers for rotation). The parameterisation for the single bonds
derived in a rigorous way [4] takes into account that the most probable conformations
around each of the single bonds next to the double bond are skew-plus (s+, torsion
angle 120°) and skew-minus (s−, torsion angle −120°). The profiles of probabilities
and lifetimes for t, g± and s± along saturated and mono-unsaturated chains of POPC
in pure POPC and POPC-Chol 1:1 bilayers were calculated by Plesnar et al. [158].
These results are in overall agreement with the experimental data of Tuchtenhagen
et al. [210]. The most recent calculations, which led to revised parameters for the
single bonds next to the trans double bond, determined that, in addition to their most
probable s+ and s− conformations mentioned above, the cis conformation is also
highly probable, as is, to a lesser extent, any other conformation of these single
bonds [85]. This is due to the relatively low barriers for rotation around the single
bonds next to the trans double bond.
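How torsion angles from a trajectory are assigned to these conformations can be sketched as follows. The torsion angle is computed from four atom positions, with the sign chosen so that a clockwise rotation viewed along the central bond is positive, and is then assigned to the nearest ideal value quoted above. The nearest-value rule, the state dictionaries and the function names are assumptions made for illustration, not the procedure of Refs. [85, 158].

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four consecutive atom positions."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    y = np.dot(b2 / np.linalg.norm(b2), np.cross(n1, n2))
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))

# Ideal torsion values (degrees); the second set is an illustrative choice
SATURATED = {"t": 180.0, "g+": 60.0, "g-": -60.0}
NEXT_TO_DOUBLE_BOND = {"s+": 120.0, "s-": -120.0, "cis": 0.0}

def classify_torsion(angle_deg, states):
    """Name of the state whose ideal angle is closest on the circle."""
    def circular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min(states, key=lambda name: circular_distance(angle_deg, states[name]))

def four_atoms(angle_deg):
    """Hypothetical test geometry with a known torsion angle about the z axis."""
    a = np.radians(angle_deg)
    return (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (np.cos(a), np.sin(a), 1.0)

angle = dihedral_deg(*four_atoms(65.0))
state = classify_torsion(angle, SATURATED)
```

Lifetimes of the conformations could then be estimated from the lengths of uninterrupted runs of the same state label along the trajectory.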
treated as a rigid rod motion. However, over a timescale much longer than trans-
gauche isomerisation, the overall effect of the isomerisation along the acyl chain
might be approximated by a fast rotation of the chain around its long axis, which
would give the chain an apparent cylindrical shape. As has already been mentioned
in Sect. 3.2, chains are randomly tilted relative to the normal within the confines of
a cone; this tilting results from chain rotation around the axis perpendicular to the
bilayer normal and restrictions from a relatively dense environment of other acyl
chains [86, 101, 156]. However, it is not easy to indicate whether the perpendicu-
lar axis is associated with one particular or several covalent bonds or whether it is
the axis of rotation of the whole phospholipid molecule. As the timescale of this
perpendicular rotation is much longer than that of isomerisation [156], in the first approximation
it might be acceptable to assume that over a longer time scale the
rotational motion of acyl chains can indeed be approximated by rigid rod rotational
diffusion. This rotation takes place in a restoring potential [86, 101, 120, 151] that
acts to align the chains along the bilayer normal. A thorough analysis of the nuclear
magnetic resonance (NMR) spectra of PC bilayers provided correlation times for
trans-gauche isomerisation of the order of 10⁻¹⁰ s (~0.1 ns) and for chain reorientation
of the order of 10⁻⁸–10⁻⁷ s (10–100 ns) [46, 156]. These times generally agree
well with those obtained in MD simulations of lipid bilayers e.g., [43, 101, 123,
146, 158]. The lifetimes of trans and gauche rotamers along the PC chain do not
change significantly and fall within the ranges
150–300 ps and ~50–80 ps, respectively, e.g. [146, 151, 158]. The rotational motion
of a PC molecule or of its fragments was analysed in e.g. Refs. [43, 101,
123, 146, 151]. In each of these papers a different approach was used to calculate
the motional parameters. Pasenkiewicz-Gierula and Róg [146] assessed rotational
correlation times from the rotational autocorrelation function (RAF) for Legendre
polynomials P1(cosθ) and P2(cosθ), where θ is the angle between the chosen vector
at time t0 and at time t0 + nΔt. RAFs were calculated from a 2-ns MD trajectory
for three fragments of the DMPC molecule: P-N vector, O21-C1 (shoulder) vector,
and the chain vector defined as a vector linking a carbon atom next to the carbonyl
group with the centre of gravity of the chain. The RAF as a function of time was
then fitted to the sum of exponentials, although each decay curve was practically
a single-exponential function. This analysis clearly indicated that each of the three
fragments of the DMPC molecule rotates with a different correlation time and that the
rotation of the acyl chain is the slowest. The rotational correlation times estimated
from RAFs for P1(cosθ) are ~4–6 × 10⁻⁸ s for the chain vector, ~2 × 10⁻⁸ s for the
shoulder vector, and ~0.7 × 10⁻⁸ s for the P-N vector [146]. A qualitatively similar
result was obtained by Moore et al. [123], who calculated the rotational diffusion
coefficients for the rotation of certain DMPC vectors relative to the molecular-fixed
reference frame, from an angular mean square displacement (MSD) function. It is
not possible to obtain, in general, the rotational correlation time from the diffusion
coefficient for restricted rotation in a restoring potential, so a numerical comparison
of the results of both papers is not possible. Nevertheless, both papers demonstrated
that different lipid fragments rotate to a large extent independently of one another.
However, of the fragments, chain rotation was the fastest in Ref. [123] and the slowest
in Ref. [146]. Essmann and Berkowitz [43] derived rotational diffusion coefficients
from time correlation functions for Wigner rotation matrices, first assuming a free
rotor model for the DPPC molecule rotating within a pre-defined reference frame,
and roughly estimated that rotation around the long molecular axis is one order of
magnitude faster than that around the perpendicular axis.
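The RAF approach described above can be sketched in a few lines of NumPy. The sketch assumes a time series of one molecular vector sampled every Δt, averages over all time origins t0, and estimates the correlation time from a single-exponential fit constrained to start at 1; the function names and the synthetic test data are hypothetical, and the fit ignores any plateau that a restoring potential may produce.

```python
import numpy as np

def rotational_acf(vectors, max_lag, order=1):
    """C_l(n*dt) = <P_l(u(t0) . u(t0 + n*dt))>, averaged over time origins t0."""
    u = np.asarray(vectors, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        cos_t = np.sum(u[: len(u) - lag] * u[lag:], axis=1)
        if order == 1:
            acf[lag] = cos_t.mean()
        elif order == 2:
            acf[lag] = (0.5 * (3.0 * cos_t**2 - 1.0)).mean()
        else:
            raise ValueError("only orders 1 and 2 are implemented")
    return acf

def correlation_time(acf, dt):
    """Correlation time from a least-squares fit of ln C(t) = -t / tau."""
    acf = np.asarray(acf, dtype=float)
    positive = acf > 0.0
    # Use only the initial stretch where the logarithm is defined
    stop = len(acf) if positive.all() else int(np.argmin(positive))
    if stop < 2:
        raise ValueError("need at least one positive point after lag 0")
    t = dt * np.arange(1, stop)
    slope = np.sum(t * np.log(acf[1:stop])) / np.sum(t * t)
    return -1.0 / slope

# Synthetic check: a vector rotating uniformly in the xy plane
dt = 1.0
times = dt * np.arange(200)
omega = 0.1
u = np.column_stack([np.cos(omega * times), np.sin(omega * times), np.zeros_like(times)])
c1 = rotational_acf(u, max_lag=20, order=1)
c2 = rotational_acf(u, max_lag=20, order=2)
tau = correlation_time(np.exp(-times[:50] / 5.0), dt)
```

For the uniformly rotating test vector the P1 function is simply cos(ωt), which provides a convenient check of the averaging over time origins.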
The results of Pasenkiewicz-Gierula and Róg [146] and Moore et al. [123] indicate
that a PC molecule in the bilayer does not rotate as a rigid rod and actually each of the
PC chains rotates independently. As the azimuthal angle φ of an acyl chain vector
(Fig. 7) is not restricted and covers the whole range of angles 0–360° with equal
probability, there is certainly not a single axis of the perpendicular chain rotation. So
what is the origin of the PC acyl chain rotation? Using NMR and X-ray diffraction,
Hauser et al. [59] determined that the glycerol backbone of a PC molecule is not
rigid and that there are two conformations about the C2–C3 bond (cf. Fig. 2) that rapidly
interconvert on the NMR time scale (estimated as 10¹⁰ conversions per s). This
interconversion destroys the parallel alignment of the PC acyl chains. To compensate
for the effect of this interconversion and maintain the parallel alignment of the PC
acyl chains, the first four torsion angles in each of the chains must synchronously and
appropriately change [59]. However, in the liquid-crystalline bilayer, the chains are
not aligned parallel to each other, and the transient tilt of one chain is independent
of that of the other chain. On the basis of the analyses of Hauser et al. [59] one could
conclude that a transition between low energy conformations of any of the first four
torsion angles can bring about changes in the tilt of the acyl chain even though all other
torsions are trans. This was checked in a simple test (unpublished results) on a well equilibrated
PC bilayer that had been MD simulated for 200 ns [158]: all torsion angles in the acyl
chains were manually changed to the trans conformation, whereas the conformations of
those in the glycerol backbone (torsions for the C2–C3, C2–O21,
C3–O31, and O31–C31 bonds, cf. Fig. 2) were left unchanged. The resulting structure
showed a broad distribution of tilt angles
of PC acyl chains. This indicates that the chain tilting is to a large extent governed
by conformational states about the bonds in the glycerol backbone and that the chain
perpendicular rotation involves a combination of torsional events in the backbone.
In addition to this, the third torsion angle in each PC chain (corresponding to the
C31–C32 and C21–C22 bond, respectively, cf. Fig. 2) has markedly low barriers for
rotation [105], and thus can rapidly change its value triggering fast local changes
in the orientation of the associated acyl chain fragment; this change can propagate
along the chain.
attention of researchers to the study of the bilayer surface and mechanical proper-
ties. These properties can also be studied using MD simulation methodology. The
mechanical properties of the DOPC and MGDG bilayers are compared in Ref. [5].
The bending rigidity modulus calculated is higher for the MGDG than for the DOPC bilayer
due to the higher number of inter-lipid interactions at the MGDG bilayer surface. This results
in a smaller surface area per molecule and thus in an increased rigidity of the MGDG
bilayer compared to the DOPC bilayer.
One of the basic surface properties is its curvature. Unfortunately, limitations on
the spatial and temporal scales of current atomistic MD simulations, as well as the
use of periodic boundary conditions, make direct observation and calculation of the
lipid bilayer curvature a non-trivial task. One of the methods for determining the
curvature involves calculating the depth-dependent distribution of intra-membrane
pressures, the lateral pressure profile. To calculate the profile, the bilayer is divided
into thin slices parallel to the interface plane and then the pressure tensor is calculated
for each slice [100, 139]. On the basis of Helfrich’s theory [60], one can calculate
the spontaneous curvature and Gaussian curvature modulus by integrating the lateral
pressure profile [140].
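The slicing procedure and the subsequent integration can be sketched with NumPy. The sketch assumes that the diagonal components of the pressure tensor are already available for each slice, takes the bilayer normal along z, and defines the lateral pressure as the difference between the lateral and normal components. The zeroth, first and second moments of the profile are the integrals that enter, respectively, the surface tension, the product of the bending modulus and the spontaneous curvature, and the Gaussian curvature modulus; signs and prefactors differ between conventions and are not fixed here. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def lateral_pressure_profile(p_xx, p_yy, p_zz):
    """pi(z) = (P_xx + P_yy) / 2 - P_zz for each slice (bilayer normal along z)."""
    p_xx, p_yy, p_zz = (np.asarray(p, dtype=float) for p in (p_xx, p_yy, p_zz))
    return 0.5 * (p_xx + p_yy) - p_zz

def profile_moment(z, profile, order):
    """Trapezoidal estimate of the integral of z**order * pi(z) over z."""
    z = np.asarray(z, dtype=float)
    f = z**order * np.asarray(profile, dtype=float)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z)))

# Hypothetical four-slice monolayer, z measured from the bilayer centre
z = [0.0, 1.0, 2.0, 3.0]
profile = lateral_pressure_profile(
    p_xx=[1.0, 3.0, 3.0, 1.0],
    p_yy=[1.0, 3.0, 3.0, 1.0],
    p_zz=[1.0, 1.0, 1.0, 1.0],
)
m0 = profile_moment(z, profile, 0)
m1 = profile_moment(z, profile, 1)
m2 = profile_moment(z, profile, 2)
```

In practice the slices are far thinner and the pressure components must be converged over long trajectories, so these moments are only as reliable as the underlying profile.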
The lateral pressure profile model is a valuable analytical tool for explaining pro-
cesses such as membrane protein activation. It was shown that changes in the lateral
pressure profile may result in biologically significant changes in protein conforma-
tions [17, 18, 20, 21, 121, 155]. Along these lines is the lateral pressure hypothesis of
the anaesthetic mode of action. Recent computer simulation studies on the influence
of anaesthetics such as ethanol [204] or 1-alkanols [52] on the lateral pressure profile
of a membrane seem to confirm the mechanically driven mechanism of anaesthesia
[17, 19]. Since the lipid composition of cell membranes strongly affects the activity
of membrane proteins, the effects of the phospholipid head group, acyl chain length,
unsaturation, cholesterol content, and surface area per lipid on the pressure profile
across the bilayer were studied using MD simulation methods e.g. in Refs. [22, 53,
139, 153]. It was shown in those studies that all these factors have a considerable
effect on the lateral pressure profile.
Some of the lipid bilayers discussed in this chapter may be viewed as models of lipid
matrices of specific biomembranes. Binary POPC-Chol [26, 179] or SM-Chol [143,
178] bilayers may serve as simple models of a “generic” animal cell membrane,
particularly of its outer leaflet. A binary bilayer made of PE and PG at a 3:1 molar
ratio [128, 231] can serve as a model for the inner bacterial membrane. Binary
bilayers of mono- and digalactosyldiacylglycerol with polyunsaturated acyl chains
are good models of a photosynthetic membrane. More realistic models of an animal
cell membrane are discussed in Sect. 2.6.4 on asymmetric bilayers where two leaflets
of the bilayer have a different but relevant lipid composition. POPC and Chol are the
main lipid species found in human and pig gastric mucus, thus POPC-Chol bilayer
can also serve as a model for the gastric mucosal cell membrane [111]. A mixture of
DPPC and DPPG at a 7:3 molar ratio in the form of a monolayer might be used as a
model for the lung surfactant [79]. A ternary mixed cardiolipin, PC, and PE bilayer
may constitute a model for the inner mitochondrial membrane [171]. As was already
discussed in Sect. 2.6.3, a ternary mixed bilayer composed of saturated, unsaturated
phospholipids and cholesterol can model a raft-like domain in the bulk membrane
[143].
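When such models are built, a molar ratio has to be translated into whole numbers of lipid molecules. A minimal Python sketch of one way to do this is given below; the leaflet size of 64 lipids, the function name and the largest-remainder rounding are assumptions made for illustration only.

```python
def lipid_counts(molar_ratio, n_total):
    """Split n_total lipids among species according to a molar ratio.

    Uses largest-remainder rounding so that the counts always sum to n_total.
    """
    total = sum(molar_ratio.values())
    exact = {name: n_total * part / total for name, part in molar_ratio.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = n_total - sum(counts.values())
    # Hand the remaining lipids to the species with the largest fractional parts
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

# Hypothetical leaflets of 64 lipids each
inner_bacterial = lipid_counts({"PE": 3, "PG": 1}, 64)
lung_surfactant = lipid_counts({"DPPC": 7, "DPPG": 3}, 64)
```

With 64 lipids the 3:1 ratio is met exactly, whereas the 7:3 ratio can only be approximated, which is worth stating when the composition of a small model system is reported.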
7 Concluding Remarks
range intermolecular interactions that stabilise the bilayer, the effect of cholesterol
and lipid dynamics in the bilayer and therefore certain important issues relating to
lipid bilayers are not referenced here. Excellent reviews on a broader range of topics
were cited at the end of the Introduction and more specific topics are discussed in
papers cited throughout this chapter.
References
1. Aittoniemi, J., Rog, T., Niemela, P., Pasenkiewicz-Gierula, M., Karttunen, M., Vattulainen,
I.: Tilt: major factor in sterols’ ordering capability in membranes. J. Phys. Chem. B 110(51),
25562–25564 (2006)
2. Alper, H.E., Bassolinoklimas, D., Stouch, T.R.: The limiting behavior of water hydrating a
phospholipid monolayer—a computer-simulation study. J. Chem. Phys. 99(7), 5547–5559
(1993)
3. Alwarawrah, M., Dai, J.A., Huang, J.Y.: A molecular view of the cholesterol condensing
effect in DOPC lipid bilayers. J. Phys. Chem. B 114(22), 7516–7523 (2010)
4. Bachar, M., Brunelle, P., Tieleman, D.P., Rauk, A.: Molecular dynamics simulation of a
polyunsaturated lipid bilayer susceptible to lipid peroxidation. J. Phys. Chem. B 108(22),
7170–7179 (2004)
5. Baczynski, K., Markiewicz, M., Pasenkiewicz-Gierula, M.: A computer model of a polyun-
saturated monogalactolipid bilayer. Biochimie 118, 129–140 (2015)
6. Bassolinoklimas, D., Alper, H.E., Stouch, T.R.: Solute diffusion in lipid bilayer-
membranes—an atomic-level study by molecular-dynamics simulation. Biochemistry 32(47),
12624–12637 (1993)
7. Benz, R.W., Castro-Roman, F., Tobias, D.J., White, S.H.: Experimental validation of molec-
ular dynamics simulations of lipid bilayers: a new approach. Biophys. J. 88(2), 805–817
(2005)
8. Berendsen, H., Postma, J., Van Gunsteren, W., Hermans, J.: Interaction Models for Water in
Relation to Protein Hydration. Intermolecular Forces, vol. 331. Reidel, Dordrecht (1981)
9. Berendsen, H.J.C.: Simulating the Physical World, Hierarchical Modeling from Quantum
Mechanics to Fluid Dynamics. Cambridge University Press, Cambridge (2007)
10. Berendsen, H.J.C., Tieleman, D.P.: Molecular dynamics: studies of lipid bilayers. In: Schleyer,
R. (ed.) Encyclopedia of Computational Chemistry, pp. 1639–1650. Wiley and Sons (1998)
11. Berger, O., Edholm, O., Jahnig, F.: Molecular dynamics simulations of a fluid bilayer of
dipalmitoylphosphatidylcholine at full hydration, constant pressure, and constant temperature.
Biophys. J. 72(5), 2002–2013 (1997)
12. Berkowitz, M.L.: Detailed molecular dynamics simulations of model biological membranes
containing cholesterol. Biochim. Biophys. Acta-Biomem. 1788(1), 86–96 (2009)
13. Berkowitz, M.L., Bostick, D.L., Pandit, S.: Aqueous solutions next to phospholipid membrane
surfaces: insights from simulations. Chem. Rev. 106(4), 1527–1539 (2006)
14. Bhide, S.Y., Zhang, Z.C., Berkowitz, M.L.: Molecular dynamics simulations of SOPS and
sphingomyelin bilayers containing cholesterol. Biophys. J. 92(4), 1284–1295 (2007)
15. Bloom, M., Evans, E., Mouritsen, O.G.: Physical-properties of the fluid lipid-bilayer compo-
nent of cell-membranes—a perspective. Q. Rev. Biophys. 24(3), 293–397 (1991)
16. Buldt, G.: The headgroup conformation of phospholipids in membranes. J. Membr. Biol.
58(2), 81–100 (1981)
17. Cantor, R.S.: The lateral pressure profile in membranes: a physical mechanism of general
anesthesia. Biochemistry 36(9), 2339–2344 (1997)
18. Cantor, R.S.: Lateral pressures in cell membranes: a mechanism for modulation of protein
function. J. Phys. Chem. B 101(10), 1723–1725 (1997)
19. Cantor, R.S.: The lateral pressure profile in membranes: a physical mechanism of general
anesthesia. Toxicol. Lett. 101, 451–458 (1998)
20. Cantor, R.S.: The influence of membrane lateral pressures on simple geometric models of
protein conformational equilibria. Chem. Phys. Lipids 101(1), 45–56 (1999)
21. Cantor, R.S.: Lipid composition and the lateral pressure profile in bilayers. Biophys. J. 76(5),
2625–2639 (1999)
22. Carrillo-Tripp, M., Feller, S.E.: Evidence for a mechanism by which ω-3 polyunsaturated
lipids may affect membrane protein function. Biochemistry 44(30), 10164–10169 (2005)
23. Cascales, J.J.L., Otero, T.F., Smith, B.D., Gonzalez, C., Marquez, M.: Model of an asymmetric
DPPC/DPPS membrane: effect of asymmetry on the lipid properties. A molecular dynamics
simulation study. J. Phys. Chem. B 110(5), 2358–2363 (2006)
24. Cevc, G., Watts, A., Marsh, D.: Titration of the phase-transition of phosphatidylserine
bilayer-membranes—effects of pH, surface electrostatics, ion binding, and headgroup hydration.
Biochemistry 20(17), 4955–4965 (1981)
25. Chiu, S.W., Jakobsson, E., Mashl, R.J., Scott, H.L.: Cholesterol-induced modifications in lipid
bilayers: a simulation study. Biophys. J. 83(4), 1842–1853 (2002)
26. Chiu, S.W., Jakobsson, E., Scott, H.L.: Combined Monte Carlo and molecular dynamics sim-
ulation of hydrated lipid-cholesterol lipid bilayers at low cholesterol concentration. Biophys.
J. 80(3), 1104–1114 (2001)
27. Chowdhary, J., Harder, E., Lopes, P.E.M., Huang, L., MacKerell, A.D., Roux, B.: A polarizable
force field of dipalmitoylphosphatidylcholine based on the classical Drude model for molecular
dynamics simulations of lipids. J. Phys. Chem. B 117(31), 9142–9160 (2013)
28. Cordomi, A., Edholm, O., Perez, J.J.: Effect of ions on a dipalmitoyl phosphatidylcholine
bilayer. A molecular dynamics simulation study. J. Phys. Chem. B 112(5), 1397–1408 (2008)
29. Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz, K.M., Ferguson, D.M., Spellmeyer,
D.C., Fox, T., Caldwell, J.W., Kollman, P.A.: A 2nd generation force-field for the simulation
of proteins, nucleic-acids, and organic-molecules. J. Am. Chem. Soc. 117(19), 5179–5197
(1995)
30. Damodaran, K.V., Merz, K.M.: Head group water interactions in lipid bilayers—a comparison
between DMPC-based and DLPE-based lipid bilayers. Langmuir 9(5), 1179–1183 (1993)
31. Damodaran, K.V., Merz, K.M.: A comparison of DMPC-based and DLPE-based lipid bilayers.
Biophys. J. 66(4), 1076–1087 (1994)
32. Damodaran, K.V., Merz, K.M., Gaber, B.P.: Structure and dynamics of the dilauroylphos-
phatidylethanolamine lipid bilayer. Biochemistry 31(33), 7656–7664 (1992)
33. Davis, J.E., Patel, S.: Charge equilibration force fields for lipid environments: applications to
fully hydrated DPPC bilayers and DMPC-embedded gramicidin A. J. Phys. Chem. B 113(27),
9183–9196 (2009)
34. de Vries, A.H., Mark, A.E., Marrink, S.J.: The binary mixing behavior of phospholipids in a
bilayer: a molecular dynamics study. J. Phys. Chem. B 108(7), 2454–2463 (2004)
35. Demel, R.A., Bruckdorfer, K.R., van Deenen, L.L.M.: Effect of sterol structure on permeability of
liposomes to glucose, glycerol and Rb+. Biochim. Biophys. Acta 255(1), 321–330 (1972)
36. Dufourc, E.J., Parish, E.J., Chitrakorn, S., Smith, I.C.P.: Structural and dynamical details of
cholesterol-lipid interaction as revealed by deuterium NMR. Biochemistry 23(25), 6062–6071
(1984)
37. Dzieciuch-Rojek, M., Poojari, C., Bednar, J., Bunker, A., Kozik, B., Nowakowska, M., Vat-
tulainen, I., Wydro, P., Kepczynski, M., Rog, T.: Effects of membrane PEGylation on entry
and location of antifungal drug itraconazole and their pharmacological implications. Mol.
Pharmaceut. 14(4), 1057–1070 (2017)
38. Egberts, E., Marrink, S.J., Berendsen, H.J.C.: Molecular-dynamics simulation of a phospho-
lipid membrane. Eur. Biophys. J. Biophy. Let. 22(6), 423–436 (1994)
39. Eicher, B., Heberle, F.A., Marquardt, D., Rechberger, G.N., Katsaras, J., Pabst, G.: Joint
small-angle X-ray and neutron scattering data analysis of asymmetric lipid vesicles. J. Appl.
Crystallogr. 50(Pt 2), 419–429 (2017)
40. El-Sayed, M., Guion, T., Fayer, M.: Effect of cholesterol on viscoelastic properties of dipalmi-
toylphosphatidylcholine multibilayers as measured by a laser-induced ultrasonic probe. Bio-
chemistry 25(17), 4825–4832 (1986)
41. Elmore, D.E.: Molecular dynamics simulation of a phosphatidylglycerol membrane. FEBS
Lett. 580(1), 144–148 (2006)
42. Epand, R.M.: Role of membrane lipids in modulating the activity of membrane-bound
enzymes. In: Yeagle, P.L. (ed.) The Structure of Biological Membranes, pp. 499–509. CRC
Press, Boca Raton (2005)
43. Essmann, U., Berkowitz, M.L.: Dynamical properties of phospholipid bilayers from computer
simulation. Biophys. J. 76(4), 2081–2089 (1999)
44. Falck, E., Patra, M., Karttunen, M., Hyvonen, M.T., Vattulainen, I.: Lessons of slicing mem-
branes: interplay of packing, free area, and lateral diffusion in phospholipid/cholesterol bilay-
ers. Biophys. J. 87(2), 1076–1091 (2004)
45. Falck, E., Rog, T., Karttunen, M., Vattulainen, I.: Lateral diffusion in lipid membranes through
collective flows. J. Am. Chem. Soc. 130(1), 44–45 (2008)
46. Feigenson, G.W., Chan, S.I.: Nuclear magnetic relaxation behavior of lecithin multilayers. J.
Am. Chem. Soc. 96(5), 1312–1319 (1974)
47. Feller, S.E.: Molecular dynamics simulations of lipid bilayers. Curr. Opin. Colloid Interface
Sci. 5(3–4), 217–223 (2000)
48. Frischleder, H., Gleichmann, S., Krahl, R.: Quantum-chemical and empirical calculations on
phospholipids. 3. Hydration of dimethylphosphate anion. Chem. Phys. Lipids 19(2), 144–149
(1977)
49. Galla, H.J., Hartmann, W., Theilen, U., Sackmann, E.: On 2-dimensional passive random-
walk in lipid bilayers and fluid pathways in biomembranes. J. Membr. Biol. 48(3), 215–236
(1979)
50. Gawrisch, K., Arnold, K., Gottwald, T., Klose, G., Volke, F.: D-2 NMR-studies of phosphate-water
interaction in dipalmitoyl phosphatidylcholine-water systems. Stud. Biophys.
74, 13–14 (1978)
51. Goss, R., Lohr, M., Latowski, D., Grzyb, J., Vieler, A., Wilhelm, C., Strzalka, K.: Role of
hexagonal structure-forming lipids in diadinoxanthin and violaxanthin solubilization and de-
epoxidation. Biochemistry 44(10), 4028–4036 (2005)
52. Griepernau, B., Bockmann, R.A.: The influence of 1-alkanols and external pressure on the
lateral pressure profiles of lipid bilayers. Biophys. J. 95(12), 5766–5778 (2008)
53. Gullingsrud, J., Schulten, K.: Lipid bilayer pressure profiles and mechanosensitive channel
gating. Biophys. J. 86(6), 3496–3509 (2004)
54. Gumbart, J., Trabuco, L.G., Schreiner, E., Villa, E., Schulten, K.: Regulation of the protein-
conducting channel by a bound ribosome. Structure 17(11), 1453–1464 (2009)
55. Gurtovenko, A.A., Patra, M., Karttunen, M., Vattulainen, I.: Cationic DMPC/DMTAP lipid
bilayers: molecular dynamics study. Biophys. J. 86(6), 3461–3472 (2004)
56. Hall, A., Rog, T., Karttunen, M., Vattulainen, I.: Role of glycolipids in lipid rafts: a view
through atomistic molecular dynamics simulations with galactosylceramide. J. Phys. Chem.
B 114(23), 7797–7807 (2010)
57. Hamill, O.P., Martinac, B.: Molecular basis of mechanotransduction in living cells. Physiol.
Rev. 81(2), 685–740 (2001)
58. Hancock, J.F.: Lipid rafts: contentious only from simplistic standpoints. Nat. Rev. Mol. Cell
Biol. 7(6), 456–462 (2006)
59. Hauser, H., Pascher, I., Sundell, S.: Preferred conformation and dynamics of the glycerol
backbone in phospholipids—an NMR and X-ray single-crystal analysis. Biochemistry 27(26),
9166–9174 (1988)
60. Helfrich, W.: Elastic properties of lipid bilayers—theory and possible experiments. Z. Naturforsch. C
28(11–1), 693–703 (1973)
61. Heller, H., Schaefer, M., Schulten, K.: Molecular dynamics simulation of a bilayer of 200
lipids in the gel and liquid-crystal phases. J. Phys. Chem. 97, 8343–8360 (1993)
62. Huang, J., Swanson, J.E., Dibble, A.R., Hinderliter, A.K., Feigenson, G.W.: Nonideal mixing
of phosphatidylserine and phosphatidylcholine in the fluid lamellar phase. Biophys. J. 64(2),
413–425 (1993)
63. Hub, J.S., Salditt, T., Rheinstadter, M.C., de Groot, B.L.: Short-range order and collective
dynamics of DMPC bilayers: a comparison between molecular dynamics simulations, X-ray,
and neutron scattering experiments. Biophys. J. 93(9), 3156–3168 (2007)
64. Hyslop, P.A., Morel, B., Sauerheber, R.D.: Organization and interaction of cholesterol and
phosphatidylcholine in model bilayer membranes. Biochemistry 29, 1025–1038 (1990)
65. Ingolfsson, H.I., Melo, M.N., van Eerden, F.J., Arnarez, C., Lopez, C.A., Wassenaar, T.A.,
Periole, X., de Vries, A.H., Tieleman, D.P., Marrink, S.J.: Lipid organization of the plasma
membrane. J. Am. Chem. Soc. 136(41), 14554–14559 (2014)
66. Jacob, R.F., Cenedella, R.J., Mason, R.P.: Direct evidence for immiscible cholesterol domains
in human ocular lens fiber cell plasma membranes. J. Biol. Chem. 274(44), 31613–31618
(1999)
67. Jacobson, K., Mouritsen, O.G., Anderson, R.G.W.: Lipid rafts: at a crossroad between cell
biology and physics. Nat. Cell Biol. 9(1), 7–14 (2007)
68. Jambeck, J.P.M., Lyubartsev, A.P.: Derivation and systematic validation of a refined all-atom
force field for phosphatidylcholine lipids. J. Phys. Chem. B 116(10), 3164–3179 (2012)
69. Jambeck, J.P.M., Lyubartsev, A.P.: An extension and further validation of an all-atomistic
force field for biological membranes. J. Chem. Theory Comput. 8(8), 2938–2948 (2012)
70. Javanainen, M., Hammaren, H., Monticelli, L., Jeon, J.H., Miettinen, M.S., Martinez-Seara,
H., Metzler, R., Vattulainen, I.: Anomalous and normal diffusion of proteins and lipids in
crowded lipid membranes. Faraday Discuss. 161, 397–417 (2013)
71. Javanainen, M., Martinez-Seara, H., Vattulainen, I.: Nanoscale membrane domain formation
driven by cholesterol. Sci. Rep. 7 (2017)
72. Jedlovszky, P., Mezei, M.: Effect of cholesterol on the properties of phospholipid membranes.
2. Free energy profile of small molecules. J. Phys. Chem. B 107(22), 5322–5332 (2003)
73. Jeon, J.H., Javanainen, M., Martinez-Seara, H., Metzler, R., Vattulainen, I.: Protein crowding
in lipid bilayers gives rise to non-gaussian anomalous lateral diffusion of phospholipids and
proteins. Phys. Rev. X 6(2) (2016)
74. Jiang, W., Hardy, D.J., Phillips, J.C., MacKerell Jr., A.D., Schulten, K., Roux, B.:
High-performance scalable molecular dynamics simulations of a polarizable force field based on
classical Drude oscillators in NAMD. J. Phys. Chem. Lett. 2(2), 87–92 (2011)
75. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L.: Comparison of
simple potential functions for simulating liquid water. J. Chem. Phys. 79(2), 926–935 (1983)
76. Jorgensen, W.L., Maxwell, D.S., Tirado-Rives, J.: Development and testing of the OPLS
all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem.
Soc. 118(45), 11225–11236 (1996)
77. Jorgensen, W.L., Tirado-Rives, J.: The OPLS [optimized potentials for liquid simulations]
potential functions for proteins, energy minimizations for crystals of cyclic peptides and
crambin. J. Am. Chem. Soc. 110(6), 1657–1666 (1988)
78. Kaszuba, K., Rog, T., Bryl, K., Vattulainen, I., Karttunen, M.: Molecular dynamics simulations
reveal fundamental role of water as factor determining affinity of binding of beta-blocker
Nebivolol to beta(2)-adrenergic receptor. J. Phys. Chem. B 114(25), 8374–8386 (2010)
79. Kaznessis, Y.N., Kim, S.T., Larson, R.G.: Simulations of zwitterionic and anionic phospho-
lipid monolayers. Biophys. J. 82(4), 1731–1742 (2002)
80. Kim, T., Im, W.: Revisiting hydrophobic mismatch with free energy simulation studies of
transmembrane helix tilt and rotation. Biophys. J. 99(1), 175–183 (2010)
81. Klauda, J.B., Venable, R.M., Freites, J.A., O’Connor, J.W., Tobias, D.J., Mondragon-Ramirez,
C., Vorobyov, I., MacKerell, A.D., Pastor, R.W.: Update of the CHARMM all-atom additive
force field for lipids: validation on six lipid types. J. Phys. Chem. B 114(23), 7830–7843
(2010)
82. Kneller, G.R., Baczynski, K., Pasenkiewicz-Gierula, M.: Communication: consistent picture
of lateral subdiffusion in lipid bilayers: molecular dynamics simulation and exact results. J.
Chem. Phys. 135(14) (2011)
83. Koynova, R., Caffrey, M.: Phases and phase transitions of the phosphatidylcholines. Biochim.
Biophys. Acta-Rev. Biomem. 1376(1), 91–145 (1998)
84. Kulig, W., Pasenkiewicz-Gierula, M., Rog, T.: Topologies, structures and parameter files for
lipid simulations in GROMACS with the OPLS-aa force field: DPPC, POPC, DOPC, PEPC,
and cholesterol. Data Brief 5, 333–336 (2015)
85. Kulig, W., Pasenkiewicz-Gierula, M., Rog, T.: Cis and trans unsaturated phosphatidylcholine
bilayers: a molecular dynamics simulation study. Chem. Phys. Lipids 195, 12–20 (2016)
86. Kusumi, A., Pasenkiewicz-Gierula, M.: Rotational diffusion of a steroid molecule in phos-
phatidylcholine membranes—effects of alkyl chain-length, unsaturation, and cholesterol as
studied by a spin-label method. Biochemistry 27(12), 4407–4415 (1988)
87. Lamoureux, G., MacKerell, A.D., Roux, B.: A simple polarizable model of water based on
classical Drude oscillators. J. Chem. Phys. 119(10), 5185–5197 (2003)
88. Lamoureux, G., Roux, B.: Modeling induced polarization with classical Drude oscillators:
theory and molecular dynamics simulation algorithm. J. Chem. Phys. 119(6), 3025–3039
(2003)
89. Leach, A.R.: Molecular Modelling, Principles and Applications, 2nd edn. Pearson Education,
Harlow, UK (2001)
90. Lee, A.G.: How to understand lipid–protein interactions in biological membranes. In: Yeagle,
P.L. (ed.) Structure of Biological Membranes. CRC Press, Boca Raton (2012)
91. Leekumjorn, S., Sum, A.K.: Molecular simulation study of structural and dynamic properties
of mixed DPPC/DPPE bilayers. Biophys. J. 90(11), 3951–3965 (2006)
92. Lehnert, R., Eibl, H.-J., Müller, K.: Order and dynamics in lipid bilayers from 1,2-dipalmitoyl-
sn-glycero-phospho-diglycerol as studied by NMR spectroscopy. J. Phys. Chem. B 108,
12141–12150 (2004)
93. Levine, Y.K., Wilkins, M.H.F.: Structure of oriented lipid bilayers. Nat. New Biol. 230(11),
69 (1971)
94. Levitt, M., Hirshberg, M., Sharon, R., Daggett, V.: Potential-energy function and parameters
for simulations of the molecular-dynamics of proteins and nucleic-acids in solution. Comput.
Phys. Commun. 91(1–3), 215–231 (1995)
95. Lewis, B.A., Engelman, D.M.: Lipid bilayer thickness varies linearly with acyl chain-length
in fluid phosphatidylcholine vesicles. J. Mol. Biol. 166(2), 211–217 (1983)
96. Lewis, R.N.A.H., McElhaney, R.N.: Calorimetric and spectroscopic studies of the ther-
motropic phase behavior of lipid bilayer model membranes composed of a homologous series
of linear saturated phosphatidylserines. Biophys. J. 79(4), 2043–2055 (2000)
97. Lewis, R.N.A.H., McElhaney, R.N., Monck, M.A., Cullis, P.R.: Studies of highly asymmetric
mixed-chain diacyl phosphatidylcholines that form mixed-interdigitated gel phases—Fourier-transform
infrared and H-2 NMR spectroscopic studies of hydrocarbon chain conformation and
orientational order in the liquid-crystalline state. Biophys. J. 67(1), 197–207 (1994)
98. Li, H., Chowdhary, J., Huang, L., He, X.B., MacKerell, A.D., Roux, B.: Drude polarizable
force field for molecular dynamics simulations of saturated and unsaturated zwitterionic lipids.
J. Chem. Theory Comput. 13(9), 4535–4552 (2017)
99. Lindahl, E., Edholm, O.: Mesoscopic undulations and thickness fluctuations in lipid bilayers
from molecular dynamics simulations. Biophys. J. 79(1), 426–433 (2000)
100. Lindahl, E., Edholm, O.: Spatial and energetic-entropic decomposition of surface tension
in lipid bilayers from molecular dynamics simulations. J. Chem. Phys. 113(9), 3882–3893
(2000)
101. Lindahl, E., Edholm, O.: Molecular dynamics simulation of NMR relaxation rates and slow
dynamics in lipid bilayers. J. Chem. Phys. 115(10), 4938–4950 (2001)
102. Lingwood, D., Simons, K.: Lipid rafts as a membrane-organizing principle. Science
327(5961), 46–50 (2010)
103. Luzzati, V., Husson, F.: Structure of liquid-crystalline phases of lipid-water systems. J. Cell
Biol. 12(2), 207 (1962)
104. Lyubartsev, A.P., Rabinovich, A.L.: Recent development in computer simulations of lipid
bilayers. Soft Matter 7(1), 25–39 (2011)
105. Maciejewski, A., Pasenkiewicz-Gierula, M., Cramariuc, O., Vattulainen, I., Rog, T.: Refined
OPLS all-atom force field for saturated phosphatidylcholine bilayers at full hydration. J. Phys.
Chem. B 118(17), 4571–4581 (2014)
106. MacKerell, A.D. Jr., Brooks, B., Brooks III, C.L., Nilsson, L., Roux, B., Won, Y., Karplus, M.:
CHARMM: the energy function and its parameterization with an overview of the program. In:
von Rague Schleyer, P. (ed.) Encyclopedia of Computational Chemistry, vol. 2, pp. 271–277.
Wiley (1998)
107. MacKerell, A.D., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J.,
Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau,
F.T.K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D.T., Prodhom, B., Reiher, W.E., Roux,
B., Schlenkrich, M., Smith, J.C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera,
J., Yin, D., Karplus, M.: All-atom empirical potential for molecular modeling and dynamics
studies of proteins. J. Phys. Chem. B 102(18), 3586–3616 (1998)
108. Mainali, L., Raguz, M., Subczynski, W.K.: Formation of cholesterol bilayer domains precedes
formation of cholesterol crystals in cholesterol/dimyristoylphosphatidylcholine membranes:
EPR and DSC studies. J. Phys. Chem. B 117(30), 8994–9003 (2013)
109. Mark, P., Nilsson, L.: Structure and dynamics of the TIP3P, SPC, and SPC/E water models at
298 K. J. Phys. Chem. A 105(43), 9954–9960 (2001)
110. Markiewicz, M., Baczynski, K., Pasenkiewicz-Gierula, M.: Properties of water hydrating the
galactolipid and phospholipid bilayers: a molecular dynamics simulation study. Acta Biochim.
Pol. 62(3), 475–481 (2015)
111. Markiewicz, M., Pasenkiewicz-Gierula, M.: Comparative model studies of gastric toxicity of
nonsteroidal anti-inflammatory drugs. Langmuir 27(11), 6950–6961 (2011)
112. Marrink, S.J., Berkowitz, M., Berendsen, H.J.C.: Molecular dynamics simulation of a mem-
brane/water interface: the ordering of water and its relation to the hydration force. Langmuir
9(11), 3122–3131 (1993)
113. Marrink, S.J., de Vries, A.H., Tieleman, D.P.: Lipids on the move: simulations of membrane
pores, domains, stalks and curves. Biochim. Biophys. Acta-Biomem. 1788(1), 149–168 (2009)
114. Marrink, S.J., Lindahl, E., Edholm, O., Mark, A.E.: Simulation of the spontaneous aggregation
of phospholipids into bilayers. J. Am. Chem. Soc. 123(35), 8638–8639 (2001)
115. Marrink, S.J., Risselada, H.J., Yefimov, S., Tieleman, D.P., de Vries, A.H.: The MARTINI
force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111(27),
7812–7824 (2007)
116. Marsh, D., Smith, I.C.P.: Interacting spin label study of fluidizing and condensing effects of
cholesterol on lecithin bilayers. Biochim. Biophys. Acta 298(2), 133–144 (1973)
117. Martinez-Seara, H., Rog, T., Karttunen, M., Vattulainen, I., Reigada, R.: Cholesterol induces
specific spatial and orientational order in cholesterol/phospholipid membranes. PLoS One 5(6)
(2010)
118. McConnell, H.: Molecular motion in biological membranes. In: Berliner, L. (ed.) Spin Label-
ing: Theory and Applications, pp. 525–561. Academic Press, New York (1976)
119. McIntosh, T.J., Simon, S.A.: Area per molecule and distribution of water in fully hydrated
dilauroylphosphatidylethanolamine bilayers. Biochemistry 25(17), 4948–4952 (1986)
120. Meirovitch, E., Igner, D., Igner, E., Moro, G., Freed, J.H.: Electron-spin relaxation and order-
ing in smectic and supercooled nematic liquid-crystals. J. Chem. Phys. 77(8), 3915–3938
(1982)
121. Meyer, G.R., Gullingsrud, J., Schulten, K., Martinac, B.: Molecular dynamics study of MscL
interactions with a curved lipid bilayer. Biophys. J. 91(5), 1630–1637 (2006)
122. Miao, L., Nielsen, M., Thewalt, J., Ipsen, J.H., Bloom, M., Zuckermann, M.J., Mouritsen,
O.G.: From lanosterol to cholesterol: structural evolution and differential effects on lipid
bilayers. Biophys. J. 82(3), 1429–1444 (2002)
123. Moore, P.B., Lopez, C.F., Klein, M.L.: Dynamical properties of a hydrated lipid bilayer from
a multinanosecond molecular dynamics simulation. Biophys. J. 81(5), 2484–2494 (2001)
124. Mouritsen, O.G.: Life—As a Matter of Fat, The Emerging Science of Lipidomics. Springer-
Verlag, Berlin Heidelberg (2005)
125. Mouritsen, O.G., Jorgensen, K.: Dynamical order and disorder in lipid bilayers. Chem. Phys.
Lipids 73(1–2), 3–25 (1994)
126. Mukhopadhyay, P., Monticelli, L., Tieleman, D.P.: Molecular dynamics simulation of a
palmitoyl-oleoyl phosphatidylserine bilayer with Na+ counterions and NaCl. Biophys. J.
86(3), 1601–1609 (2004)
127. Murzyn, K., Rog, T., Jezierski, G., Takaoka, Y., Pasenkiewicz-Gierula, M.: Effects of phospho-
lipid unsaturation on the membrane/water interface: a molecular simulation study. Biophys.
J. 81(1), 170–183 (2001)
128. Murzyn, K., Rog, T., Pasenkiewicz-Gierula, M.: Phosphatidylethanolamine-
phosphatidylglycerol bilayer as a model of the inner bacterial membrane. Biophys. J.
88(2), 1091–1103 (2005)
129. Murzyn, K., Róg, T., Pasenkiewicz-Gierula, M.: Comparison of the conformation and the
dynamics of saturated and monounsaturated hydrocarbon chains of phosphatidylcholines.
Curr. Top. Biophys. 23(1), 87–94 (1999)
130. Murzyn, K., Zhao, W., Karttunen, M., Kurdziel, M., Rog, T.: Dynamics of water at membrane
surfaces: effect of headgroup structure. Biointerphases 1(3), 98–105 (2006)
131. Nagle, J.F.: Theory of lipid monolayer and bilayer phase-transitions—effect of headgroup
interactions. J. Membr. Biol. 27(3), 233–250 (1976)
132. Nagle, J.F.: Area/lipid of bilayers from NMR. Biophys. J. 64(5), 1476–1481 (1993)
133. Nagle, J.F., Tristram-Nagle, S.: Structure of lipid bilayers. Biochim. Biophys. Acta-Rev.
Biomem. 1469(3), 159–195 (2000)
134. Neria, E., Fischer, S., Karplus, M.: Simulation of activation free energies in molecular systems.
J. Chem. Phys. 105(5), 1902–1921 (1996)
135. Neumann, S., van Meer, G.: Sphingolipid management by an orchestra of lipid transfer pro-
teins. Biol. Chem. 389(11), 1349–1360 (2008)
136. Niemela, P.S., Miettinen, M.S., Monticelli, L., Hammaren, H., Bjelkmar, P., Murtola, T.,
Lindahl, E., Vattulainen, I.: Membrane proteins diffuse as dynamic complexes with lipids. J.
Am. Chem. Soc. 132(22), 7574–7575 (2010)
137. Niemela, P.S., Ollila, S., Hyvonen, M.T., Karttunen, M., Vattulainen, I.: Assessing the nature
of lipid raft membranes. PLoS Comput. Biol. 3(2), 304–312 (2007)
138. Oldfield, E., Meadows, M., Rice, D., Jacobs, R.: Spectroscopic studies of specifically deu-
terium labeled membrane systems. Nuclear magnetic resonance investigation of the effects
of cholesterol in model systems. Biochemistry 17(14), 2727–2740 (1978)
139. Ollila, S., Hyvonen, M.T., Vattulainen, I.: Polyunsaturation in lipid membranes: dynamic
properties and lateral pressure profiles. J. Phys. Chem. B 111(12), 3139–3150 (2007)
140. Orsi, M., Michel, J., Essex, J.W.: Coarse-grain modelling of DMPC and DOPC lipid bilayers.
J. Phys. Condens. Mat. 22(15) (2010)
141. Pandit, S.A., Berkowitz, M.L.: Molecular dynamics simulation of dipalmitoylphos-
phatidylserine bilayer with Na+ counterions. Biophys. J. 82(4), 1818–1827 (2002)
142. Pandit, S.A., Bostick, D., Berkowitz, M.L.: Mixed bilayer containing dipalmitoylphos-
phatidylcholine and dipalmitoylphosphatidylserine: lipid complexation, ion binding, and elec-
trostatics. Biophys. J. 85(5), 3120–3131 (2003)
143. Pandit, S.A., Jakobsson, E., Scott, H.L.: Simulation of the early stages of nano-domain forma-
tion in mixed bilayers of sphingomyelin, cholesterol, and dioleylphosphatidylcholine. Bio-
phys. J. 87(5), 3312–3322 (2004)
144. Pandit, S.A., Scott, H.L.: Multiscale simulations of heterogeneous model membranes.
Biochim. Biophys. Acta-Biomem. 1788(1), 136–148 (2009)
145. Pasenkiewicz-Gierula, M., Baczynski, K., Markiewicz, M., Murzyn, K.: Computer modelling
studies of the bilayer/water interface. Biochim. Biophys. Acta-Biomem. 1858(10), 2305–2321
(2016)
146. Pasenkiewicz-Gierula, M., Rog, T.: Conformations, orientations and time scales characterising
dimyristoylphosphatidylcholine bilayer membrane. Molecular dynamics simulation studies.
Acta Biochim. Pol. 44(3), 607–624 (1997)
147. Pasenkiewicz-Gierula, M., Rog, T., Kitamura, K., Kusumi, A.: Cholesterol effects on the
phosphatidylcholine bilayer polar region: a molecular simulation study. Biophys. J. 78(3),
1376–1389 (2000)
148. Pasenkiewicz-Gierula, M., Subczynski, W.K., Kusumi, A.: Rotational diffusion of a steroid
molecule in phosphatidylcholine-cholesterol membranes: fluid-phase microimmiscibility in
unsaturated phosphatidylcholine-cholesterol membranes. Biochemistry 29(17), 4059–4069
(1990)
149. Pasenkiewicz-Gierula, M., Takaoka, Y., Miyagawa, H., Kitamura, K., Kusumi, A.: Hydrogen
bonding of water to phosphatidylcholine in the membrane as studied by a molecular dynamics
simulation: location, geometry, and lipid-lipid bridging via hydrogen-bonded water. J. Phys.
Chem. A 101(20), 3677–3691 (1997)
150. Pasenkiewicz-Gierula, M., Takaoka, Y., Miyagawa, H., Kitamura, K., Kusumi, A.: Charge
pairing of headgroups in phosphatidylcholine membranes: a molecular dynamics simulation
study. Biophys. J. 76(3), 1228–1240 (1999)
151. Pastor, R.W., Feller, S.E.: Time scales of lipid dynamics and molecular dynamics. In: Merz,
K.M., Roux, B. (eds.) Biological Membranes, a Molecular Perspective from Computation
and Experiment, pp. 3–29. Birkhäuser, Boston (1996)
152. Pastor, R.W., MacKerell, A.D.: Development of the CHARMM force field for lipids. J. Phys.
Chem. Lett. 2(13), 1526–1532 (2011)
153. Patra, M.: Lateral pressure profiles in cholesterol-DPPC bilayers. Eur. Biophys. J. Biophy.
Let. 35(1), 79–88 (2005)
154. Patra, M., Salonen, E., Terama, E., Vattulainen, I., Faller, R., Lee, B.W., Holopainen, J.,
Karttunen, M.: Under the influence of alcohol: the effect of ethanol and methanol on lipid
bilayers. Biophys. J. 90(4), 1121–1135 (2006)
155. Perozo, E., Rees, D.C.: Structure and mechanism in prokaryotic mechanosensitive channels.
Curr. Opin. Struct. Biol. 13(4), 432–442 (2003)
156. Petersen, N.O., Chan, S.I.: More on motional state of lipid bilayer membranes—interpretation
of order parameters obtained from nuclear magnetic-resonance experiments. Biochemistry
16(12), 2657–2667 (1977)
157. Pike, L.J.: Rafts defined: a report on the keystone symposium on lipid rafts and cell function.
J. Lipid Res. 47(7), 1597–1598 (2006)
158. Plesnar, E., Subczynski, W.K., Pasenkiewicz-Gierula, M.: Saturation with cholesterol
increases vertical order and smoothes the surface of the phosphatidylcholine bilayer: a molec-
ular simulation study. Biochim. Biophys. Acta-Biomem. 1818(3), 520–529 (2012)
159. Plesnar, E., Subczynski, W.K., Pasenkiewicz-Gierula, M.: Is the cholesterol bilayer domain
a barrier to oxygen transport into the eye lens? Biochim. Biophys. Acta-Biomem. 1860,
434–441 (2018)
160. Poger, D., Caron, B., Mark, A.E.: Validating lipid force fields against experimental data:
progress, challenges and perspectives. Biochim. Biophys. Acta-Biomem. 1858(7), 1556–1565
(2016)
161. Poger, D., Mark, A.E.: On the validation of molecular dynamics simulations of saturated and
cis-monounsaturated phosphatidylcholine lipid bilayers: a comparison with experiment. J.
Chem. Theory Comput. 6(1), 325–336 (2010)
162. Ponder, J.W., Case, D.A.: Force fields for protein simulations. Adv. Protein Chem. 66, 27–85
(2003)
163. Poyry, S., Rog, T., Karttunen, M., Vattulainen, I.: Significance of cholesterol methyl groups.
J. Phys. Chem. B 112(10), 2922–2929 (2008)
164. Price, D.J., Brooks, C.L.: A modified TIP3P water potential for simulation with Ewald sum-
mation. J. Chem. Phys. 121(20), 10096–10103 (2004)
165. Rand, R.P., Parsegian, V.A.: Hydration forces between phospholipid-bilayers. Biochim. Bio-
phys. Acta 988(3), 351–376 (1989)
166. Reviakine, I., Brisson, A.: Formation of supported phospholipid bilayers from unilamellar
vesicles investigated by atomic force microscopy. Langmuir 16(4), 1806–1815 (2000)
167. Risselada, H.J., Marrink, S.J.: The molecular face of lipid rafts in model membranes. Proc.
Natl. Acad. Sci. USA 105(45), 17367–17372 (2008)
168. Roark, M., Feller, S.E.: Molecular dynamics simulation study of correlated motions in phos-
pholipid bilayer membranes. J. Phys. Chem. B 113(40), 13229–13234 (2009)
169. Robinson, A.J., Richards, W.G., Thomas, P.J., Hann, M.M.: Head group and chain behavior
in biological-membranes—a molecular-dynamics computer-simulation. Biophys. J. 67(6),
2345–2354 (1994)
170. Robinson, A.J., Richards, W.G., Thomas, P.J., Hann, M.M.: Behavior of cholesterol and its
effect on head group and chain conformations in lipid bilayers—a molecular-dynamics study.
Biophys. J. 68(1), 164–170 (1995)
171. Rog, T., Martinez-Seara, H., Munck, N., Oresic, M., Karttunen, M., Vattulainen, I.: Role of
cardiolipins in the inner mitochondrial membrane: insight gained through atom-scale simu-
lations. J. Phys. Chem. B 113(11), 3413–3422 (2009)
172. Rog, T., Murzyn, K., Gurbiel, R., Takaoka, Y., Kusumi, A., Pasenkiewicz-Gierula, M.: Effects
of phospholipid unsaturation on the bilayer nonpolar region: a molecular simulation study. J.
Lipid Res. 45(2), 326–336 (2004)
173. Rog, T., Murzyn, K., Pasenkiewicz-Gierula, M.: The dynamics of water at the phospholipid
bilayer surface: a molecular dynamics simulation study. Chem. Phys. Lett. 352(5–6), 323–327
(2002)
174. Rog, T., Pasenkiewicz-Gierula, M.: Cholesterol effects on the phosphatidylcholine bilayer
nonpolar region: a molecular simulation study. Biophys. J. 81, 2190–2202 (2001)
175. Rog, T., Pasenkiewicz-Gierula, M.: Cholesterol effects on the phospholipid condensation and
packing in the bilayer: a molecular simulation study. FEBS Lett. 502, 68–71 (2001)
176. Rog, T., Pasenkiewicz-Gierula, M.: Effects of epicholesterol on the phosphatidylcholine
bilayer: a molecular simulation study. Biophys. J. 84(3), 1818–1826 (2003)
177. Rog, T., Pasenkiewicz-Gierula, M.: Non-polar interactions between cholesterol and phospho-
lipids: a molecular dynamics simulation study. Biophys. Chem. 107(2), 151–164 (2004)
178. Rog, T., Pasenkiewicz-Gierula, M.: Cholesterol-sphingomyelin interactions: a molecular
dynamics simulation study. Biophys. J. 91(10), 3756–3767 (2006)
179. Rog, T., Pasenkiewicz-Gierula, M.: Cholesterol effects on a mixed-chain phosphatidylcholine
bilayer: a molecular dynamics simulation study. Biochimie 88(5), 449–460 (2006)
180. Rog, T., Pasenkiewicz-Gierula, M., Vattulainen, I., Karttunen, M.: What happens if choles-
terol is made smoother: importance of methyl substituents in cholesterol ring structure on
phosphatidylcholine-sterol interaction. Biophys. J. 92(10), 3346–3357 (2007)
181. Rog, T., Pasenkiewicz-Gierula, M., Vattulainen, I., Karttunen, M.: Ordering effects of choles-
terol and its analogues. Biochim. Biophys. Acta 1788, 97–121 (2009)
182. Rog, T., Stimson, L.M., Pasenkiewicz-Gierula, M., Vattulainen, I., Karttunen, M.: Replacing
the cholesterol hydroxyl group with the ketone group facilitates sterol flip-flop and promotes
membrane fluidity. J. Phys. Chem. B 112(7), 1946–1952 (2008)
183. Rosso, L., Gould, I.R.: Structure and dynamics of phospholipid bilayers using recently devel-
oped general all-atom force fields. J. Comput. Chem. 29(1), 24–37 (2008)
184. Samanta, S., Hezaveh, S., Milano, G., Roccatano, D.: Diffusion of 1,2-Dimethoxyethane and
1,2-dimethoxypropane through phosphatidycholine bilayers: a molecular dynamics study. J.
Phys. Chem. B 116(17), 5141–5151 (2012)
185. Schuler, L.D., Daura, X., Van Gunsteren, W.F.: An improved GROMOS96 force field for
aliphatic hydrocarbons in the condensed phase. J. Comput. Chem. 22(11), 1205–1218 (2001)
186. Schwille, P., Korlach, J., Webb, W.W.: Fluorescence correlation spectroscopy with single-
molecule sensitivity on cell and model membranes. Cytometry 36(3), 176–182 (1999)
368 M. Pasenkiewicz-Gierula and M. Markiewicz
187. Scott, H.L.: Modeling the lipid component of membranes. Curr. Opin. Struct. Biol. 12(4),
495–502 (2002)
188. Shi, Q., Voth, G.A.: Multi-scale modeling of phase separation in mixed lipid bilayers. Biophys.
J. 89(4), 2385–2394 (2005)
189. Shin, Y.K., Ewert, U., Budil, D.E., Freed, J.H.: Microscopic versus macroscopic diffusion
in model membranes by electron-spin-resonance spectral-spatial imaging. Biophys. J. 59(4),
950–957 (1991)
190. Shinoda, W., Shimizu, M., Okazaki, S.: Molecular dynamics study on electrostatic properties
of a lipid bilayer: polarization, electrostatic potential, and the effects on structure and dynamics
of water near the interface. J. Phys. Chem. B 102(34), 6647–6654 (1998)
191. Siu, S.W.I., Pluhackova, K., Bockmann, R.A.: Optimization of the OPLS-AA force field for
long hydrocarbons. J. Chem. Theory. Comput. 8(4), 1459–1470 (2012)
192. Smondyrev, A.M., Berkowitz, M.L.: Molecular dynamics simulation of dipalmitoylphos-
phatidylcholine membrane with cholesterol sulfate. Biophys. J. 78(4), 1672–1680 (2000)
193. Smondyrev, A.M., Berkowitz, M.L.: Effects of oxygenated sterol on phospholipid bilayer
properties: a molecular dynamics simulation. Chem. Phys. Lipids 112(1), 31–39 (2001)
194. Soni, S.P., Ward, J.A., Sen, S.E., Feller, S.E., Wassall, S.R.: Effect of trans unsaturation on
molecular organization in a phospholipid membrane. Biochemistry 48(46), 11097–11107
(2009)
195. Stepniewski, M., Bunker, A., Pasenkiewicz-Gierula, M., Karttunen, M., Rog, T.: Effects of
the lipid bilayer phase state on the water membrane interface. J. Phys. Chem. B 114(36),
11784–11792 (2010)
196. Stouch, T.R.: Lipid-membrane structure and dynamics studied by all-atom molecular-
dynamics simulations of hydrated phospholipid-bilayers. Mol. Simulat. 10(2–6), 335–362
(1993)
197. Subczynski, W.K., Hyde, J.S., Kusumi, A.: Effect of alkyl chain unsaturation and cholesterol
intercalation on oxygen transport in membranes: a pulse ESR spin labeling study. Biochem-
istry 30(35), 8578–8590 (1991)
198. Subczynski, W.K., Mainali, L., Raguz, M., O’Brien, W.J.: Organization of lipids in fiber-cell
plasma membranes of the eye lens. Exp. Eye Res. 156, 79–86 (2017)
199. Subczynski, W.K., Wisniewska, A., Yin, J.-J., Hyde, J.S., Kusumi, A.: Hydrophobic barriers of
lipid bilayer membranes formed by reduction of water penetration by alkyl chain unsaturation
and cholesterol. Biochemistry 33, 7670–7681 (1994)
200. Sundaralingam, M.: Molecular structures and conformations of the phospholipids and sphin-
gomyelins. Ann. NY Acad. Sci. 195, 324–355 (1972)
201. Tabony, J., Perly, B.: Quasi-elastic neutron-scattering measurements of fast local translational
diffusion of lipid molecules in phospholipid-bilayers. Biochim. Biophys. Acta 1063(1), 67–72
(1991)
202. Takaoka, Y., Pasenkiewicz-Gierula, M., Miyagawa, H., Kitamura, K., Tamura, Y., Kusumi, A.:
Molecular dynamics generation of nonarbitrary membrane models reveals lipid orientational
correlations. Biophys. J. 79(6), 3118–3138 (2000)
203. Tepper, H.L., Voth, G.A.: Mechanisms of passive ion permeation through lipid bilayers:
insights from simulations. J. Phys. Chem. B 110(42), 21327–21337 (2006)
204. Terama, E., Ollila, O.H.S., Salonen, E., Rowat, A.C., Trandum, C., Westh, P., Patra, M.,
Karttunen, M., Vattulainen, I.: Influence of ethanol on lipid membranes: from lateral pressure
profiles to dynamics and partitioning. J. Phys. Chem. B 112(13), 4131–4139 (2008)
205. Tessier, M.B., DeMarco, M.L., Yongye, A.B., Woods, R.J.: Extension of the GLYCAM06
biomolecular force field to lipids, lipid bilayers and glycolipids. Mol. Simulat. 34(4), 349–363
(2008)
206. Tieleman, D.P., Marrink, S.J., Berendsen, H.J.C.: A computer perspective of membranes:
molecular dynamics studies of lipid bilayer systems. Biochim. Biophys. Acta-Rev. Biomem.
1331(3), 235–270 (1997)
207. Tristram-Nagle, S., Nagle, J.F.: Lipid bilayers: thermodynamics, structure, fluctuations, and
interactions. Chem. Phys. Lipids 127(1), 3–14 (2004)
Computer Modelling of the Lipid Matrix of Biomembranes 369
208. Truscott, R.J.: Age-related nuclear cataract: a lens transport problem. Ophthalmic. Res. 32,
185–194 (2000)
209. Tu, K.C., Klein, M.L., Tobias, D.J.: Constant-pressure molecular dynamics investigation of
cholesterol effects in a dipalmitoylphosphatidylcholine bilayer. Biophys. J. 75(5), 2147–2156
(1998)
210. Tuchtenhagen, J., Ziegler, W., Blume, A.: Acyl-chain conformational ordering in liquid-
crystalline bilayers—comparative Ft-Ir and H-2-Nmr studies of phospholipids differing in
headgroup structure and chain-length. Eur. Biophys. J. 23(5), 323–335 (1994)
211. Ulrich, A.S., Volke, F., Watts, A.: The dependence of phospholipid headgroup mobility on
hydration as studied by deuterium-Nmr spin-lattice relaxation-time measurements. Chem.
Phys. Lipids. 55(1), 61–66 (1990)
212. Vacha, R., Berkowitz, M.L., Jungwirth, P.: Molecular model of a cell plasma membrane with
an asymmetric multicomponent composition: water permeation and ion effects. Biophys. J.
96(11), 4493–4501 (2009)
213. Vainio, S., Jansen, M., Koivusalo, M., Rog, T., Karttunen, M., Vattulainen, I., Ikonen, E.:
Significance of sterol structural specificity—desmosterol cannot replace cholesterol in lipid
rafts. J. Biol. Chem. 281(1), 348–355 (2006)
214. van Gunsteren, W.F., Daura, X., Mark, A.E.: Gromos force field. In: von Rague Schleyer, P.
(ed.) Encyclopedia of Computational Chemistry, vol. 2, pp. 1211–1216. Wiley (1998)
215. van Meer, G.: Cellular lipidomics. EMBO J. 24(18), 3159–3165 (2005)
216. van Meer, G., Voelker, D.R., Feigenson, G.W.: Membrane lipids: where they are and how they
behave. Nat. Rev. Mol. Cell Biol. 9(2), 112–124 (2008)
217. Vattulainen, I., Rog, T.: Lipid simulations: a perspective on lipids in action. Cold Spring
Harbor Perspect. Biol. 3(4) (2011)
218. Vaz, W.L.C., Almeida, P.F.: Microscopic versus macroscopic diffusion in one-component
fluid phase lipid bilayer-membranes. Biophys. J. 60(6), 1553–1554 (1991)
219. Veatch, S.L., Keller, S.L.: Seeing spots: complex phase behavior in simple membranes.
Biochim. Biophys. Acta-Mol. Cell Res. 1746(3), 172–185 (2005)
220. Vist, M.R., Davis, J.H.: Phase-Equilibria of cholesterol dipalmitoyl-phosphatidylcholine
mixtures—H-2 nuclear magnetic-resonance and differential scanning calorimetry. Biochem-
istry 29(2), 451–464 (1990)
221. Volkov, V.V., Palmer, D.J., Righini, R.: Heterogeneity of water at the phospholipid membrane
interface. J. Phys. Chem. B 111(6), 1377–1383 (2007)
222. Vollhardt, D.: Effect of unsaturation in fatty acids on the main characteristics of Langmuir
monolayers. J. Phys. Chem. C 111(18), 6805–6812 (2007)
223. White, S.H., Jacobs, R.E., King, G.I.: Partial specific volumes of lipid and water in mixtures
of egg lecithin and water. Biophys. J. 52(4), 663–665 (1987)
224. Widomska, J., Raguz, M., Subczynski, W.K.: Oxygen permeability of the lipid bilayer mem-
brane made of calf lens lipids. Biochim. Biophys. Acta-Biomem. 1768(10), 2635–2645 (2007)
225. Wiener, M.C., White, S.H.: Structure of a Fluid Dioleoylphosphatidylcholine bilayer deter-
mined by joint refinement of X-Ray and neutron-diffraction data. 2. Distribution and packing
of terminal methyl-groups. Biophys. J. 61(2), 428–433 (1992)
226. Wiener, M.C., White, S.H.: Structure of a Fluid Dioleoylphosphatidylcholine bilayer deter-
mined by joint refinement of X-ray and neutron-diffraction data. 3. Complete structure. Bio-
phys. J. 61(2), 434–447 (1992)
227. Wilkinson, D.A., Nagle, J.F.: Dilatometry and calorimetry of saturated phos-
phatidylethanolamine dispersions. Biochemistry 20(1), 187–192 (1981)
228. Zhang, Z., Lu, L., Berkowitz, M.L.: Energetics of cholesterol transfer between lipid bilayers.
J. Phys. Chem. B 112(12), 3807–3811 (2008)
229. Zhao, W., Gurtovenko, A.A., Vattulainen, I., Karttunen, M.: Cationic Dimyristoylphosphatidylcholine
and Dioleoyloxytrimethylammonium propane lipid bilayers: atomistic insight
for structure and dynamics. J. Phys. Chem. B 116(1), 269–276 (2012)
230. Zhao, W., Rog, T., Gurtovenko, A.A., Vattulainen, I., Karttunen, M.: Atomic-scale struc-
ture and electrostatics of anionic palmitoyloleoylphosphatidyl-glycerol lipid bilayers with
Na+ counterions. Biophys. J. 92(4), 1114–1124 (2007)
231. Zhao, W., Rog, T., Gurtovenko, A.A., Vattulainen, I., Karttunen, M.: Role of phosphatidyl-
glycerols in the stability of bacterial membranes. Biochimie 90(6), 930–938 (2008)
Modeling of Membrane Proteins
Abstract Membrane proteins are still the “Wild West” of structural biology.
Although more and more membrane protein structures are being determined, their
function is still difficult to investigate because they are fully functional only in a
membranous environment. Several specific methodologies have been developed to
investigate various aspects of their cellular life, but these proteins remain challenging
for computational methods. In this chapter we summarize the efforts made to elucidate the
structural and dynamical properties of different types of membrane proteins, emphasizing
those computational methods that were designed and employed specifically
to study membrane proteins, including their interactions in complex membranous
systems. This chapter was updated in all subsections compared to the 1st edition.
1 Introduction
About 30% of the genes included in the human genome encode membrane pro-
teins. These proteins participate in a large number of normal and abnormal cell
processes, including: (1) transport of ions, water and small solutes via pumps and
channels; (2) signaling via receptors; (3) metabolism via membrane enzymes; (4)
entry of pathogens into cells; (5) programmed cell death; and (6) intercellular structural
interactions. This is why greater attention must be paid to the structures of
these proteins and how they relate to normal and abnormal function. Crystallization
is the method of choice for generating high-resolution structural models. However,
membrane proteins have both hydrophobic and hydrophilic surfaces, a duality that
makes them more difficult to crystallize than water-soluble proteins. Therefore, rel-
atively few structures of membrane proteins have been solved at the level of atomic
resolution compared to soluble proteins. In addition, high-resolution structures are
important but not sufficient to understand how membrane proteins (and soluble pro-
teins as well) function. To explore questions of molecular mechanism, protein-protein
interactions, and others, it is necessary to carry out biochemical, biophysical but also
computational studies that are assisted by structural knowledge. Molecular dynamics
simulations will become increasingly valuable for understanding membrane protein
function, as they can reveal the dynamic behavior not seen in the static structures.
The significant increase in computational power, in synergy with more efficient
computational methodologies, makes it possible to carry out molecular dynamics simulations
of any structurally known membrane protein in its native environment, covering timescales
of up to 0.1 ms in all-atom simulations. At the frontiers of membrane protein
simulations are receptors, ion channels, aquaporins, passive and active transporters,
and bioenergetic proteins. The membrane environment influences the function of
membrane proteins, through electrostatic and steric interactions as well as through
the membrane’s internal pressure. Therefore, the environment needs to be properly
taken into account in simulation studies.
This chapter describes the use of the major methodologies that can be employed
to study membrane protein structure and function. Quantum methods
can be used to investigate the active sites of membrane enzymes, such as membrane
proteases, and to study their mechanisms of action in detail, much as for
soluble enzymes. In contrast, methods for membrane protein structure
prediction must be highly specialized to account for the specific nature of these proteins
and the effect of the membrane. Structure prediction is usually followed by prediction of
the protein's location in the membrane, including its tilt. Factors such as
lipid tension and hydrophobic mismatch must also be taken into account. Steered
molecular dynamics simulations help to investigate unfolding processes of membrane
proteins. The stable regions uncovered in this way, which keep the whole protein
stable, provide unique insight into the balance between intra-protein and protein-lipid
interactions. Interactions in the membrane between proteins lead to the formation
of homo- and hetero-oligomers. Such assemblies can be very important for the proper
function of the cell, though the properties of large protein-lipid rafts are still to be
discovered because of their size. Coarse-grained approaches are used to overcome
the space and time limitations of molecular dynamics simulations. Specific coarse-grained
force fields are successfully used to explain the dynamics of large portions of
membrane with proteins inside. On the other hand, implicit solvent methods
provide smooth potentials to investigate processes inside the membrane as well as at
the water-membrane border. As for soluble proteins, one can also use docking
methods to locate ligands such as agonists, antagonists and inverse agonists in the
binding site of membrane receptors. After binding, they change the shape of the receptor,
and through the action of molecular switches linked together by an extended hydrogen
bond network this change propagates through the receptor to the other side of the lipid
bilayer. In this way, specifically for membrane proteins, the signal is transmitted
from the exterior to the inside of the cell, and this process can be traced to some extent
by simulation methods. Ligands can reach the binding site either from the aqueous side
(as in soluble proteins) or directly from the membrane, provided they are hydrophobic
enough. Little is known about the folding processes of membrane proteins, which are
markedly different from those of soluble proteins, and unfortunately computational
methods in this area are still in their infancy.
In a common classification the integral membrane proteins are divided into five
main types depending on their localization inside the membrane: type I single-pass
transmembrane with cytoplasmic C-termini, type II single-pass transmembrane
with extracellular C-termini, multipass transmembrane, lipid chain-anchored,
GPI-anchored and peripheral membrane proteins [1]. Anchored membrane proteins do not
span the membrane like integral proteins, but are attached to it on one side
through a covalently bound lipid or glycosylphosphatidylinositol (GPI), a glycolipid
attached to the C-terminus during posttranslational modification. Although not
discussed here, an important type of membrane protein should also be mentioned,
namely peripheral proteins, which bind noncovalently to the surface of the membrane
or to a transmembrane protein. A distinct anisotropic environment of the lipid
374 D. Latek et al.
globular) for one of the best predictors (PHOBIUS) is quite high—99% [35] enabling
a reliable genome annotation of sequence data.
3 Prediction Methods
Apart from the general tools for genome annotation, there are also classifiers that
target specific membrane protein families and their division into classes. For example,
several computational methods have been used to classify members of the GPCR family,
namely phylogenetic analysis (the A-F GPCR classification system [36]; combined with
a Hidden Markov Model-based search in GRAFS [37], see Fig. 1), self-organizing
maps [38], neighbor-joining [39], the unweighted pair group method with arithmetic
mean [40] and multidimensional scaling [41]. A useful hierarchical integration of various
alignment-based and alignment-free classification methods was implemented
in the 7TMRmine web server for discovering 7TMRs (seven transmembrane region-containing
receptors) [42]. Several methods have also been developed to identify β-barrel
transmembrane proteins and members of the OMP (outer membrane protein) family;
they use machine-learning methods [43–47] combined with analysis of amino acid
composition [48, 49], sequence profiles, alignment of secondary structure blocks
[50], C-terminal pattern identification [51] or empirical scores [52].
Even more important than the classification of a membrane protein is information
about its topology. The correct topology can be predicted for 70% of all membrane
proteins, mostly by predictors based on Hidden Markov Models (HMMs) (see
Table 2). However, accurate prediction of the start and end of a TM segment still
represents a challenge [34]. Most methods for predicting membrane protein
topology target either transmembrane helical proteins (TMH) or transmembrane
β-barrel proteins (TMB), because slightly different rules apply in the two cases.
For TMH proteins, predictors use the following rules to
distinguish them from globular proteins and to find their topology [53]. Membrane-spanning
helices are 20–30 amino acids long and contain a high fraction of hydrophobic amino
acids. However, one issue concerning the detection of TM helices based on their
hydrophobicity has to be mentioned: other motifs are also highly hydrophobic,
such as signal peptides, signal anchors, amphipathic helices and re-entrant helices
(helices that enter and exit the membrane on the same side, e.g. in aquaporins [54]).
Filtering out such motifs, e.g. with SignalP [55] or TargetP [56], prior to TMH topology
prediction is certainly beneficial. In some TM topology predictors the detection of
signal peptides or re-entrant regions is already implemented, e.g. in Phobius and
PolyPhobius [57, 58], TOP-MOD [59] and OCTOPUS [60]. Globular regions between
transmembrane helices are relatively short, and the charge distribution in loops follows
the “positive-inside” rule, which states that loops that are not translocated across the
membrane are more positively charged (i.a. Lys and Arg) than those that are
translocated [61].
Some membrane proteins have the “inside-out” topology, which means that they
consist of a hydrophilic interior and a hydrophobic exterior exposed to lipids, e.g.
bacteriorhodopsin [62]. However, in most cases the presence of motifs at helix interfaces,
together with the hydrogen bonding network, turned out to be more crucial for the
stability of membrane proteins than the hydrophobic effect [63, 64].
Fig. 1 The phylogenetic tree of GPCRs. Image taken with permission from http://gpcr.scripps.edu
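The “positive-inside” rule quoted above can be illustrated with a minimal sketch. It assumes that the TM segments are already known as (start, end) index pairs and simply counts Lys and Arg in the alternating loops; the function names, the use of whole loops instead of short flanking regions, and the tie-breaking choice are illustrative assumptions and are not taken from TopPred or any other cited predictor.

```python
def loops_between(seq_len, tm_segments):
    """Return the non-membrane regions flanking the TM segments, N- to C-terminal.

    tm_segments: list of (start, end) pairs, 0-based, end exclusive, sorted.
    """
    loops, prev_end = [], 0
    for start, end in tm_segments:
        loops.append((prev_end, start))
        prev_end = end
    loops.append((prev_end, seq_len))
    return loops


def n_terminus_side(seq, tm_segments):
    """Predict "in" or "out" for the N-terminus using the positive-inside rule."""
    positives = [sum(seq[a:b].count(x) for x in "KR")
                 for a, b in loops_between(len(seq), tm_segments)]
    # Loops alternate sides: even-indexed loops lie on the N-terminal side.
    same_side = sum(positives[0::2])
    other_side = sum(positives[1::2])
    # Assumption: a tie is arbitrarily resolved as "in".
    return "in" if same_side >= other_side else "out"
```

For a two-helix protein whose terminal regions are rich in Lys and Arg while the connecting loop is acidic, the sketch places both termini on the inside.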
The above rules for TMH protein topology prediction were implemented in algorithms
that follow either a statistical or a machine-learning approach. Development of
the former methods was started by Kyte and Doolittle [30] with a simple predictor
of membrane-spanning helical regions based on calculating an average hydrophobicity
index for amino acids in a window moving along the protein sequence (a
sliding window). If the average hydrophobicity was above a certain threshold, the
current region was proposed to be a TM helix. In addition to hydrophobicity, the
commonly observed amphiphilicity of TM helices was also taken into account [92].
The above-mentioned “positive-inside” rule was incorporated into TM helix
prediction by von Heijne in TopPred [93]. Later approaches to TM region prediction
improved the definition of the hydrophobicity scale [78], e.g. by adding backbone
constraints related to α-helix dehydration and salt-bridge formation [94] or
by creating knowledge-based scales derived from a database limited to membrane
proteins [95]. Some methods used scales based on amino acid properties other than
hydrophobicity [96, 97], such as charge, aromaticity, size, conformational
properties and electronic properties [98], by which TM regions can be described. Such
amino acid properties were, for example, estimated from TM proteins with known
topologies, as in TMpred [65]. Combining different scales and properties of amino
acids also proved to be successful, as in the SPLIT predictor [73] or the SOSUI predictor
[69], which is based on the Kyte-Doolittle hydrophobicity scale, amphiphilicity, relative
and net charges and protein length. An interesting approach implemented in
PRED-TMR [70] focused on the propensities of terminal amino acids in each TM
helix. As in other fragmentary predictions, such as secondary structure or solvent
accessibility prediction, the use of sequence profiles instead of protein sequences also
improved prediction of TM regions [66, 99, 100]. Nevertheless, the lack of close
homologs for 20–30% of membrane proteins (e.g. the GPCR family) [101] still decreased
prediction accuracy and prompted the development of the DAS (dense-alignment
surface) method [68]. In DAS, the sequence alignment to non-homologous membrane
proteins used to predict TM regions is improved by a special scoring matrix
and by so-called low-stringency dot plots, which represent similarities between segments
of a certain length rather than between whole protein sequences. TM regions can be easily
identified by the grid-like arrangements they produce on such plots.
Table 2 (continued)
Web server    Website                                                              Method used                     References
TMB proteins
B2TMPRED      http://gpcr.biocomp.unibo.it/cgi/predictors/outer/pred_outercgi.cgi  SVM                             [83]
HMM-B2TMR     http://gpcr.biocomp.unibo.it/biodec                                  HMM                             [46]
PRED-TMBB     http://biophysics.biol.uoa.gr/PRED-TMBB                              HMM                             [84]
TBBPred       http://crdd.osdd.net/raghava/tbbpred/                                NN + SVM                        [85]
ConBBPRED     http://bioinformatics.biol.uoa.gr/ConBBPRED                          Consensus method: HMM&NN&SVM    [86]
ProfTMB       http://www.predictprotein.org                                        HMM                             [87, 67]
TransFold     http://bioinformatics.bc.edu/clotelab/transFold                      Statistical potentials          [88]
TMBpro        http://tmbpro.ics.uci.edu                                            NN                              [89]
BOCTOPUS2     http://boctopus.cbr.su.se                                            SVM & HMM                       [90, 91]
Abbreviations used: k-NN, k-nearest neighbor algorithm; SVM, support vector machines; NN, neural network; HMM, Hidden Markov Model
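The sliding-window idea behind the earliest statistical predictors can be sketched in a few lines. The hydropathy values below are those of the Kyte-Doolittle scale; the window length of 19 residues and the threshold of 1.6 are commonly quoted settings and should be treated here as assumptions, as should the function names and the simple merging of overlapping windows.

```python
# Kyte-Doolittle hydropathy index per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(seq, window=19):
    """Average hydropathy of every full window sliding along seq."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose average hydropathy exceeds threshold.

    Returns (start, end) pairs, 0-based, end exclusive.
    """
    segments = []
    for i, avg in enumerate(window_averages(seq, window)):
        if avg <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Extend the previous segment instead of opening a new one.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Because whole windows are merged, the reported segment boundaries are only approximate, which mirrors the difficulty of locating the exact start and end of a TM segment noted earlier in this section.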
Not only was the description of TM regions improved in topology prediction, but
so were the algorithms themselves. Kitsas et al. [102] implemented higher-order
statistics in their predictor. The machine-learning approach was started by Rost [66] with
PHDhtm, a predictor employing a neural network. Later, Hidden Markov Models
(HMMs) were used in prediction of TM helices in HMMTOP [71] and TMHMM
[72]. Lio and Vannucci [103] incorporated wavelets in a TM region predictor and
Nugent and Jones used support vector machines (SVM) in their predictor [104].
Ahmed combined SVM and HMMs with commonly used rules
of TM region prediction, Shen and Chou [100] used a K-nearest neighbor method
and, more recently, Osmanbeyoglu et al. [105] used an active learning approach.
Consensus methods have also proved efficient in TM topology prediction: e.g.
TOPCONS [79] merges results from OCTOPUS, TMHMM and SCAMPI, and MetaTM
[81] derives a consensus TM prediction based on TopPred, PHDhtm, HMMTOP,
TMHMM, PolyPhobius and Memsat.
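The idea of a consensus prediction can be illustrated with a per-residue majority vote over several topology strings. This is a deliberately simplified stand-in and not the algorithm used by TOPCONS or MetaTM; the label alphabet ("i" inside, "M" membrane, "o" outside), the function name and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def consensus_topology(predictions):
    """Majority vote per residue over equal-length topology strings.

    Labels assumed here: "i" inside, "M" membrane, "o" outside.
    Ties are resolved in favour of the earlier-listed predictor.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # First label, in predictor order, that reaches the top count.
        result.append(next(label for label in column if counts[label] == best))
    return "".join(result)
```

A plain vote of this kind can yield segments that no single predictor proposed or that are too short to span the membrane, which is one reason why practical consensus servers add further constraints.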
An interesting approach to the prediction of one-dimensional structural features of
TMH proteins was presented recently by Ahmad et al. [82] as the HTM-ONE server.
HTM-ONE is based on a neural network which is trained not with one structural
feature, e.g. TM topology as in most of the predictors described above, but simultaneously
with a number of features: solvent-accessible surface, dihedral angles, kink
angles of TM helices, contacts between helices and PSSM (position-specific scoring
matrices).
The number of crystallized TM β-barrel proteins is much lower than that of TMH proteins.
Additionally, the membrane-spanning β-strands are shorter and have a less distinctive
amino acid composition than TM helices [34]. Consequently, topology prediction is more
difficult in the case of TMB proteins. Schultz [106] analyzed β-barrel membrane proteins
and proposed several rules describing their topology. The number of β-strands
is always even, with the N- and C-termini at the periplasmic barrel end. The tilt of a
β-strand is around 45 degrees and only one of the possible tilt directions is energetically
favorable. The shear number of a β-barrel is positive and around n + 2, where n is the
number of β-strands in the barrel. β-Strands are anti-parallel and connected through short
turns at the periplasmic side and long loops with high sequence variability at the external
side. The features of β-barrels described above were implemented in several algorithms
for topology prediction that use HMMs [107, 87], which are the most efficient [86], SVM
[108], neural networks [89] or statistical methods [109]. As in the case of TMH prediction,
amino acid composition is taken into account [74], together with sequence
profiles [87] and statistical potentials [88].
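Two of the rules listed above, the strand tilt of around 45 degrees and the shear number of around n + 2, are linked geometrically. The sketch below uses the standard relation for a regular barrel, tan(α) = S·a / (n·b), where S is the shear number and n the number of strands. The relation itself and the constants a = 3.3 Å (rise per residue along a strand) and b = 4.4 Å (spacing between neighbouring strands) are not given in this chapter and are introduced here as assumptions, as are the function names.

```python
import math

# Assumed geometric constants for a regular barrel (not given in this chapter).
RISE_PER_RESIDUE = 3.3   # Angstrom, along a strand
STRAND_SPACING = 4.4     # Angstrom, between neighbouring strands


def strand_tilt_degrees(n_strands, shear_number):
    """Tilt of the strands relative to the barrel axis, in degrees."""
    tan_alpha = (shear_number * RISE_PER_RESIDUE) / (n_strands * STRAND_SPACING)
    return math.degrees(math.atan(tan_alpha))


def follows_counting_rules(n_strands, shear_number):
    """Check the two counting rules: even strand number and positive shear number."""
    return n_strands % 2 == 0 and shear_number > 0
```

For a 16-stranded barrel with S = n + 2 = 18 the relation gives a tilt of about 40 degrees, and for an 8-stranded barrel with S = 10 about 43 degrees, both in line with the rule of a tilt of around 45 degrees.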
Prediction of the solvent (or lipid) accessible surface (i.e. identification of buried
residues) provides an additional source of information to determine the TM topology of a
protein and may help to design mutagenesis experiments aimed at identifying catalytically
important TM residues [110]. Accuracy of burial status predictions is relatively high:
above 70% [110, 111] for TM regions of membrane proteins and 58% for entire membrane
proteins [112], which is comparable to the accuracy achieved for globular proteins. The
main difference between buried and exposed residues in globular proteins is their
hydrophobicity, but in the case of membrane proteins this feature is far less
distinctive [34]. The few methods developed to date that specifically target the membrane
protein accessible surface area (ASA) are based on sequence conservation patterns,
as exposed residues are assumed to evolve faster than buried residues. Before running the
burial status prediction, such conservation patterns can be translated e.g. into
a knowledge-based surface propensity scale which is highly correlated with other
propensity scales for membrane proteins such as hydrophobicity or hydropathy [113].
Like TM topology predictors, burial predictors also use BLAST and PSI-BLAST
generated sequence profiles and support vector machines [110, 111]. Different
conservation patterns in TM and globular regions of membrane proteins were taken into
account in MPRAP [112], a web server that predicts buried and exposed residues
for entire membrane proteins. This unified prediction is possible due to the prior
optimization of the SVM, which included information about the location of residues with
respect to the membrane.
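The conservation signal on which these burial predictors rely can be illustrated by computing a per-column variability profile from a multiple sequence alignment. The sketch below uses Shannon entropy as the variability measure; this choice, the gap handling and the function names are assumptions for illustration, and the cited servers feed such information into SVMs rather than using it directly.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, gaps ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(residues).values())


def variability_profile(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]
```

Under the assumption stated in the text, positions with high entropy would be candidates for exposed residues and fully conserved positions candidates for buried ones.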
The lack of reliable algorithms which mimic the folding of membrane proteins in silico
and the sparse structural information from crystallographic studies prompted the
development of methods that extract a more fine-grained description of membrane proteins
than a simple definition of their topology. Namely, several additional features were
predicted from the membrane protein sequence: kinks of TM helices,
location of re-entrant regions [59] (where entry and exit of a protein fragment are at
the same side of the membrane, a common feature of ion and water channel proteins)
and finally interfacial residues in a TM core. The key element in the detection of
TMH kinks is the presence of proline at a particular position of a TM helix, either in the
query sequence or in a significant fraction of its close homologs [114–116]. Recently,
Kneissl et al. [117] reported a new kink predictor that includes ASA predictions and
statistics of Ser and Gly occurrences in kinks. Early methods for contact prediction
were based on correlated mutation analysis (CMA) [118], assuming that residues
close in space mutate in tandem. Additional information about predicted secondary
structure, solvent accessibility and homologous proteins, and the use of advanced
machine-learning algorithms, improved the rather weak performance of CMA-based methods
and made it possible to use them not only in large-scale globular protein structure
prediction [119] but also in GPCR structure prediction [120]. In the latter case only a
simple sequence conservation filter was used. This shows that, owing to the relative
structural simplicity that the lipid bilayer imposes on membrane proteins compared with
globular ones, contact prediction requires less sophisticated algorithms, e.g. ones based
only on CMA, which still achieve quite high prediction accuracy [121]. Although contact
predictors specifically targeting membrane proteins are less common, several attempts have
been made in this field. The methods developed introduced factors into contact
prediction similar to those used for globular proteins: sequence conservation and CMA
[121, 122], packing motifs of TM helices and β-strands [123], either structural
(‘knob-into-hole’ and ‘ridge-into-groove’ [124]) or sequential [123, 125], amino acid
propensities [126], evolutionary [127] and knowledge-based data [128]. The distinct
packing of TM helices is crucial for interface contact prediction, since such interactions
are mainly accomplished by weakly polar amino acids that create contacts every fourth
residue of a helix in TM channels, or by large polar amino acids every 3.5th residue of a
helix in TM receptors and membrane-integral redox proteins. The former type of
contacts was named right-handed interactions because the interacting residues are
placed in such a way that they form a right-handed curve when viewed along the
main axis of the helix. The latter were named left-handed interactions
[129]. Detection of both right- and left-handed interactions in contact prediction was
implemented e.g. in the RHYTHM server [129, 130].
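The assumption behind correlated mutation analysis, that residues close in space mutate in tandem, can be illustrated by scoring pairs of alignment columns for co-variation. The sketch below uses mutual information as the coupling score; this is a swapped-in, commonly used measure and not the scoring scheme of the cited CMA methods, and the function names and the minimum sequence separation are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum((c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())


def top_coupled_pairs(alignment, min_separation=4, top=5):
    """Return up to `top` column pairs (i, j, score) with the highest co-variation.

    Pairs closer than min_separation in sequence are skipped, since they are
    trivially close in space.
    """
    columns = list(zip(*alignment))
    scored = [(mutual_information(columns[i], columns[j]), i, j)
              for i in range(len(columns))
              for j in range(i + min_separation, len(columns))]
    scored.sort(reverse=True)
    return [(i, j, mi) for mi, i, j in scored[:top]]
```

Fully conserved columns score zero with every other column, so in practice such a score is combined with a conservation filter and needs a reasonably large and diverse alignment to be meaningful.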
Prediction of kinks of TM helices together with prediction of other structural
deformations such as bulges or constrictions is an important issue in GPCR struc-
ture modeling. Such distinct structural features can be crucial, e.g. for the GPCR
ligand selectivity [131]. Two recently updated web services for GPCR structure modeling,
GPCR-SSFE 2.0 [16] and GPCRDB [14], use structural fingerprint
features such as kinks or bulges to search for the best template for model building.
Attempts at tertiary structure prediction for membrane proteins are even more problematic
than for globular proteins, since the number of membrane protein structures
deposited in the PDB is substantially smaller. Thus, comparative modeling, the most
common approach to structure prediction, is severely hampered for membrane proteins.
On the other hand, de novo methods developed for globular proteins assume a polar
solvent around the protein and thus can hardly be used for proteins
embedded in a specific anisotropic membrane environment. Empirical force
fields designed to simulate the behavior of membrane proteins are used
mostly in molecular dynamics, which models biological systems on much shorter time
scales than protein folding and cannot be used in structure prediction. Coarse-grained
force fields combined with a Monte Carlo algorithm, which made it possible to predict the
folding of at least small globular proteins [132], are very rare in the case of membrane
proteins (Rosetta-membrane [133] and HBMPs [134] are notable exceptions). For that reason,
recent attempts by Ueno et al. [135] to develop a coarse-grained algorithm for folding
TM helices into the shape derived from a low-resolution electron microscopy
image will certainly attract the interest of the research community. Despite these obvious
hindrances to structure modeling of membrane proteins, several attempts have been
made at either template-based or de novo modeling (see Table 3), as knowledge
of the 3D structure is crucial not only in the drug discovery process but also for reliable
classification of members of membrane protein families [136].
The first step in comparative modeling is the choice of a template structure (or
structures) and generation of the target-template alignment. Apart from the
similarity between target and template sequences, the biological context should
also be taken into account, e.g., the expected activation state of the modeled
structure in the case of membrane receptors (GPCRs) [145], similar structural
fingerprints such as kinks or bulges [16], and coverage of functionally important
sequence motifs [131]. Since classification of membrane proteins into families is
not always straightforward (see above), an extensive search for close homologs
should be performed prior to structure prediction by comparative modeling [136],
e.g., using an algorithm based on Hidden Markov Models as in SSFE [15]. Standard
scoring matrices such as BLOSUM and PAM, used to align protein sequences, were
derived mostly from globular proteins and do not take into account the different
sequence conservation patterns observed in membrane proteins. The distinct
evolutionary divergence of membrane proteins, high in loops and low in TM regions,
was taken into account in new substitution matrices for TM helical proteins, JTT
[75], PHAT [146] and SLIM [147], and also for β-barrels [148]. Using these
membrane-specific substitution matrices improves sequence alignment in many cases
[149, 150]; however, attempts to use them only for TM regions and, e.g., a standard
BLOSUM matrix for scoring the alignment of loop regions (so-called bipartite
alignments) were not always successful [151]. A simple increase of the gap cost for
TM regions, together with aligning them separately from the rest of the protein
even without switching to a membrane-specific matrix, seems to be more beneficial,
as was first shown by Shafrir and Guy [152]. Such detection of a TM core, and
inclusion of this information in alignment generation through a more restrictive
gap treatment and a membrane-specific substitution score, was recently implemented
in the Medeller software [141]. Another approach to target-template alignment for
membrane proteins is anchored realignment [145], which preserves important
functional motifs of membrane proteins and the integrity of template TM helices
(only one-residue gaps in the alignment are allowed [153] with only slight
intervention into the original
Table 3 Web servers and stand-alone applications targeting structure prediction of membrane proteins
Name | Website | Method | References
Interface/contact predictors
HelixCorr | http://webclu.bio.wzw.tum.de/helixcorr | Consensus method and CMA | [121]
RHYTHM | http://proteinformatics.charite.de/rhythm | PSSM and secondary structure prediction and sequence conservation | [130]
Full 3D model predictors
Rosetta-membrane and Rosetta Broker | http://www.rosettacommons.org | Fragment-assembly and membrane proteins-based statistical potentials | [133, 137]
BCL::MP-Fold | http://www.meilerlab.org/bclcommons | Fragment-assembly and membrane proteins-based statistical potentials | [138]
FILM3 | http://bioinf.cs.ucl.ac.uk/introduction | Fragment-assembly based on the Fragfold method | [139]
ModWeb | https://modbase.compbio.ucsf.edu/scgi/modweb.cgi | Comparative modeling by Modeller | [140]
Medeller | http://opig.stats.ox.ac.uk/webapps/medeller/ | TM core detection in the alignment generation | [141]
Predictors targeting specific families
GPCRM | http://gpcrm.biomodellab.eu | Comparative modeling by Modeller and Rosetta; multiple templates and profile-profile alignment | [142]
GPCR-SSFE 2.0 | http://www.ssfa-7tmr.de/ssfe2/ | Comparative modeling by Modeller | [15, 16]
GPCR-ModSim | http://open.gpcr-modsim.org/ | Comparative modeling by Modeller with identification of structural fingerprint features | [143]
GPCR-I-TASSER | https://zhanglab.ccmb.med.umich.edu/GPCR-I-TASSER/ | Comparative modeling by I-TASSER threading method | [10]
GoMoDo | http://molsim.sci.univr.it/cgi-bin/cona/begin.php | Comparative modeling by Modeller and docking by Autodock VINA | [144]
Abbreviations used:
SVM: Support vector machines
CMA: Correlated mutations analysis
OMP: Outer membrane proteins
REMC: Replica Exchange Monte Carlo
PSSM: Position-specific scoring matrix
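The TM-specific gap treatment discussed before Table 3 (a higher gap cost inside TM regions, as proposed by Shafrir and Guy [152] and used in Medeller [141]) can be illustrated with a minimal sketch. It is a plain Needleman-Wunsch global alignment with a position-dependent linear gap penalty; the function name `align_tm_aware`, the identity-based match/mismatch scores and all penalty values are assumptions chosen for illustration and do not reproduce any of the cited tools.

```python
def align_tm_aware(target, template, tm_mask,
                   match=2, mismatch=-1, gap_loop=-2, gap_tm=-8):
    """Global (Needleman-Wunsch) alignment with a heavier linear gap
    penalty next to template residues flagged as transmembrane.

    All scores are illustrative values, not those of a published method.
    Returns (score, aligned_target, aligned_template).
    """
    n, m = len(target), len(template)

    def gap(j):
        # A gap step taken in DP column j is charged the TM penalty when
        # template residue j-1 is in a TM helix; column 0 counts as loop.
        return gap_tm if j > 0 and tm_mask[j - 1] else gap_loop

    def sub(i, j):
        # Simple identity scoring stands in for a substitution matrix.
        return match if target[i - 1] == template[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap(0)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap(j),
                              score[i][j - 1] + gap(j))

    # Trace back one optimal path from the bottom-right corner.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(target[i - 1])
            b.append(template[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and score[i][j] == score[i][j - 1] + gap(j):
            a.append("-")
            b.append(template[j - 1])
            j -= 1
        else:
            a.append(target[i - 1])
            b.append("-")
            i -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


# Toy case: the target lacks the loop residue G of the template.
template = "AAGLLLLL"
tm_mask = [False, False, False, True, True, True, True, True]
print(align_tm_aware("AALLLLL", template, tm_mask))
```

In this toy case the only gap is placed opposite the loop residue, so the TM segment stays intact. A realistic protocol would replace the identity scoring with a substitution matrix and take `tm_mask` from a topology predictor.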
Since the model building procedure hardly ever takes into account the different
distribution of amino acid rotamers in the membrane compared with the polar
environment of globular proteins, even a short minimization in an implicit or
explicit membrane environment improves the local accuracy of the final protein
model [153, 162]. Performing a molecular dynamics simulation in a membrane at least
as long as the protein relaxation time before, e.g., a docking procedure is
undoubtedly more beneficial, but it requires a significant amount of computational
resources and can be skipped in many cases when experimental data confirm the
reliability of the generated models [153, 163].
More crucial than model refinement in a membrane-like environment is reliable
refinement of loops, especially in the binding site area. The accuracy of such
refinement greatly depends on the position of the loop anchoring residues in a
given homology model [164]. Many methods for membrane protein modeling use the
loop-modeling procedure implemented in Modeller, which includes statistical
potentials (a DOPE score) [165] and can be characterized as a fragment-based
method, like the SuperLooper web server based on a database of protein fragments
[166]. Less popular, but of equal performance [167], is another fragment-based
method implemented in Rosetta, i.e., the cyclic coordinate descent algorithm [168].
The less optimal treatment of disulfide bonds in Rosetta applications, compared
with the efficient disulfide patch in Modeller (based either on the template’s
local geometry or on general rules of stereochemistry and the CHARMM force field),
slightly favors the latter approach [145]. This matters because disulfide bonds are
very common in membrane proteins, e.g., in the extracellular loop 2 (EC2) of GPCRs.
In both Modeller and Rosetta, secondary structure predictions can be used during
loop modeling, which improves performance especially for long loops (more than 10
amino acids). As for de novo methods useful in modeling long loops and the N- or
C-termini of membrane proteins, successful results were obtained with the CABS
method [169] for GPCR models, the Rosetta kinematic closure algorithm [170], and
PLOP, a dihedral angle search procedure with the all-atom OPLS-AA force field
energy function and a Generalized Born implicit solvent model, which was
implemented commercially as Prime (Schrödinger, LLC) [171].
As in structure prediction of globular proteins, selection of the final, most
probable model of a protein is an important step. Yet only a few MQAPs (Model
Quality Assessment Programs) have been developed specifically for membrane
proteins: the IQ method [172], based on the analysis of four types of inter-residue
interactions (hydrophobic interactions, hydrogen bonds, ionic bonds, and disulfide
bonds) within the transmembrane domains, and ProQM [173], which uses support vector
machines trained on structural features of membrane proteins such as inter-atomic
and inter-residue contacts, solvent-accessible surfaces, secondary structure,
topology of the TM region, and a Z-coordinate (describing the position of residues
with respect to the membrane center), combined with evolutionary information
(profiles and sequence conservation). MQAPs developed for globular proteins perform
much worse on membrane proteins due to significant differences in amino acid
propensity, packing density, and side-chain rotamer frequencies between soluble and
membrane proteins [174]. As an alternative to MQAPs, membrane protein models can be
assessed successfully by their stability during molecular dynamics simulations
[175] or by scoring functions provided by model building programs, even those
lacking a representation of a lipid bilayer [174], e.g., the Rosetta total energy
[145] or low-resolution energy function [173], or the Modeller DOPE score [143,
145]. Progress in structure determination of membrane proteins has enabled the use
of statistical potentials for scoring models, e.g., by BCL::Score [176, 177].
Selection of the most suitable model quality assessment method depends on the
purpose. For example, in GPCR modeling aimed at drug discovery, a ligand-based
approach, in which interactions with known binding ligands are used in the model
assessment, is believed to be the most beneficial [177–180].
Since the number of crystal structures of membrane proteins in PDB is limited,
comparative modeling frequently does not provide protein models that can be
confirmed by experimental data, e.g., in the case of early rhodopsin-based models
of GPCRs [181] or hERG channels [162]. Consequently, de novo methods for membrane
protein structure modeling are of great interest. Methods used for globular
proteins can still be used in some cases for membrane proteins, provided that the
solvent-related components of the force field are adjusted, e.g., in
Rosetta-membrane (or Rosetta Broker). Rosetta-membrane employs statistical
potentials derived from the known 3D structures of membrane proteins, which take
into account two types of environment: polar and hydrophobic [133]. The TM topology
predicted by external servers should be supplied during the modeling procedure. The
performance of Rosetta-membrane is comparable with that of Rosetta for de novo
modeling of globular proteins as long as the membrane protein is smaller than 150
amino acids [182]. Unfortunately, most membrane proteins of interest are longer
than 200 residues, and thus at least a limited set of constraints on the packing of
structural elements has to be incorporated during Rosetta-membrane folding [183].
Tertiary restraints derived from the template structure are also needed for the
CHARMM-based hierarchical approach using an implicit membrane in the foldGPCR tool
[184].
Nevertheless, a few groups have developed their own membrane protein-specific de
novo tools, i.e., GEnSeMBLE [185] and PREDICT [186], both of which target members
of the GPCR family that are 300 or more residues long. The latter approach is based
on sampling a reduced space of TM helices represented as discs on a 2D plane. The
former, more realistic approach is based on the BiHelix algorithm [187] and its
ancestor Membstruk [188], which sample the space of helix orientation angles (a
tilt angle θ, a sweep angle φ and a rotation angle η) in a homology-based starting
model. Since calculating the energy of all possible combinations for 7 TM helices
is computationally expensive, the seven-helix bundle is split into pairs of
interacting helices in the first step and reassembled only from the low-energy
conformations [187]. A recently published de novo algorithm [134] based on the
Replica Exchange Monte Carlo (REMC) method also samples TMH orientation angles, but
with a reduced representation of amino acids: C-alpha atoms joined with united side
chains. The lowest-energy model is refined by all-atom molecular dynamics in the
AMBER9 force field. The idea of rotating TM helices with respect to template
structures proved its relevance during the last GPCRDock 2010 competition [153],
where reliable model generation for the chemokine receptor CXCR4 required a ~100°
rotation of a part of TM2 with respect to the template. Such a rotation could also
be obtained by introducing a certain gap into the target-template alignment [145,
153].
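The saving obtained by splitting the seven-helix bundle into helix pairs can be made concrete with a short arithmetic sketch. The numbers used here (12 rotation angles per helix, i.e. 30° steps, and 12 interacting helix pairs) are illustrative assumptions and are not taken from the text; only the rotation angle is sampled in this toy count.

```python
def full_bundle_combinations(n_helices, n_angles):
    """Number of bundle conformations if every helix is rotated independently."""
    return n_angles ** n_helices


def pairwise_evaluations(n_pairs, n_angles):
    """Number of two-helix conformations if only interacting pairs are scanned."""
    return n_pairs * n_angles ** 2


# Assumed values: 7 TM helices, 12 rotation angles each, 12 interacting pairs.
n_full = full_bundle_combinations(7, 12)
n_pair = pairwise_evaluations(12, 12)
print(n_full, n_pair, n_full // n_pair)
```

Under these assumptions the exhaustive scan needs tens of millions of energy evaluations while the pairwise scan needs fewer than two thousand, which is why the bundle is rebuilt only from low-energy pair conformations.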
Several methods for comparative and de novo modeling of membrane proteins have
been developed to date (see Table 3), some of them in the form of web servers,
which are the most convenient form for researchers. Most of them target the GPCR
family, for which only a few structures are available in PDB despite great interest
from the pharmaceutical industry. Apart from the web servers, precomputed 3D models
of membrane proteins with unknown crystal structure can be accessed in various
databases, e.g., GPCRDB (all human non-olfactory GPCRs in inactive, intermediate
and active states, using a main template and alternative local templates) [14],
GPCRRD (I-TASSER-generated models) [10], ModBase (Modeller-generated comparative
models) [140], and GPCR-SSFE 2.0 (Modeller-generated models) [15, 16]. Critical
assessment of the available structure modeling methods targeting membrane proteins
is still limited because membrane protein structures in PDB are few and appear
rarely.
5 Docking Methods
In the field of molecular modeling, docking is a method for predicting the
preferred orientation of one molecule relative to a second when they are bound to
each other to form a stable complex. Knowledge of the preferred orientation may in
turn be used to predict the strength of association, or binding affinity, between
the two molecules using, for example, scoring functions. In the 1970s, complex
modeling revolved around manually identifying features on the surfaces of the
interacting molecules and interpreting the consequences for binding, function and
activity. Computer programs were typically used at the end of the modeling process
to discriminate between the relatively few orientations that remained after the
heuristic constraints had been imposed. Computers were first employed in a study on
hemoglobin interactions in sickle-cell fibers by Levinthal et al. [192].
Molecular docking can be thought of as a “lock-and-key” problem, in which one is
interested in finding the correct relative orientation of the “key” that will open
the “lock” (and where on the surface of the lock the keyhole is). Here, the protein
can be thought of as the “lock” and the ligand as the “key”. Molecular docking may
also be defined as an optimization problem that describes the “best-fit”
orientation of a ligand binding to a particular protein of interest (Fig. 2).
Three questions should be answered before a docking experiment. The first is: is
there an experimental structure of the protein I want to use as the target during
docking? To answer this question, it is necessary to check the PDB repository
(www.pdb.org) and download the corresponding target structure. If no 3D structure
of the receptor is available, extensive structure prediction studies should be
performed, preferably followed by experimental studies confirming the reliability
of the obtained protein model.
The second question to answer is: where could my ligand be docked? The binding
site can be determined based on experimental data such as mutagenesis. If a
receptor loses its ligand binding ability after mutation of certain amino acids,
those residues are most probably close to or inside the binding site. When
experimental data are lacking, the binding site can be predicted from the geometry
or electrostatics of the protein surface. Several binding pocket prediction tools
are described in Table 4.
Fig. 2 Formyl peptide fMLF docked to model of FPR1 (Formyl Peptide Receptor 1)
The third question is how to obtain a 3D structure of the ligand, including its
total and partial charges. To build such a 3D structure one can use many
stand-alone applications as well as databases with online access (see Table 5). In
many cases the lowest-energy conformation of a ligand downloaded from a database
or produced by standard tools is not sufficient for docking purposes, because the
ligand is flexible and changes its conformation while fitting to the receptor
binding site. For that reason, all docking programs have the following features:
(a) an exhaustive conformational search algorithm, which changes the starting
conformation of the ligand (and sometimes also of the receptor) and provides
candidate 3D structures of the complex; and (b) a scoring function that scores all
those candidates and ranks them according to the intermolecular interaction energy
(i.e., the more negative this energy, the higher the candidate’s score). Despite
these shared features, each docking program differs from the others in the search
method used, the level of flexibility
docking programs such as DOCK [213], LUDI [212], FlexX [214], ADAM [215],
and eHiTs [209]. The last subtype of systematic search algorithms comprises the
database methods, which use libraries of pre-generated conformations (so-called
conformational ensembles) that are subsequently subjected to rigid-body docking.
This method is employed in Glide [216, 217] and FRED [218].
In random or stochastic search algorithms the conformational space is sampled by
applying a random conformational change to the ligand structure and then accepting
or rejecting the resulting conformer based on a predefined probability function. If
the generated ligand conformation is accepted, it is used as the starting point for
a new random conformational change. Random search methods are divided into three
subtypes: (a) Monte Carlo (MC) methods; (b) Genetic Algorithm (GA) methods; and (c)
tabu search (TS) methods. In MC methods, the position and conformation of the
ligand are subjected to successive random changes, each followed by a minimization
step, and the changes are accepted based on the energy-dependent Metropolis
criterion [211]. Docking programs based on MC include ICM [219], QXP [220], Prodock
[221], and MCDOCK [222]. Another subtype of random methods, GA, uses concepts
derived from the theory of biological evolution to explore the conformational space
of the ligand. Unlike MC methods, GAs start from an initial population of different
conformations of the ligand, defined by sets of state variables, or genes, that
describe the conformation of the ligand and its translation and orientation
relative to the receptor. GOLD [223], AutoDock [224] and SwissDock [225] are
docking programs in which evolutionary algorithms are implemented. It is worth
noting that Autodock VINA, a newer docking engine from the Autodock family, employs
parallel processing and accelerates small-molecule ligand docking to such an extent
that it can be used for on-line docking. Namely, Autodock VINA was implemented in
the GoMoDo [144] and, more recently, GUT-DOCK [226] and MTiOpenScreen [227] web
services.
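The Metropolis acceptance rule used by MC docking methods can be sketched in a few lines. This is a toy illustration, not the search routine of any of the programs named above: the "pose" is a single number, the energy function is supplied by the caller, the minimization step after each random change is omitted, and the names `metropolis_accept` and `mc_search` as well as all parameter values are assumptions.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT=0.593, rng=random):
    """Metropolis criterion: always accept a move that lowers the energy,
    otherwise accept with probability exp(-(e_new - e_old) / kT).
    kT = 0.593 kcal/mol corresponds to roughly 298 K."""
    delta = e_new - e_old
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)


def mc_search(energy, x0, steps=2000, max_step=0.5, kT=0.593, seed=0):
    """Random walk over a one-dimensional 'pose' coordinate.
    Returns the lowest-energy pose visited and its energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-max_step, max_step)  # random change
        e_new = energy(x_new)
        if metropolis_accept(e, e_new, kT, rng):      # accept or reject
            x, e = x_new, e_new                       # new starting point
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Toy energy surface with a single minimum at x = 3.
print(mc_search(lambda x: (x - 3.0) ** 2, x0=0.0))
```

Occasional acceptance of uphill moves is what lets the search leave local minima; in a docking program the single coordinate would be replaced by the ligand's translation, orientation and torsion angles.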
The last subtype of random search algorithms comprises the tabu search (TS)
methods, which impose restrictions that prevent already explored areas of the
ligand conformational space from being visited again and therefore favor the
analysis of new conformations. To exclude already explored conformations, the
root-mean-square deviation (RMSD) of each new ligand conformation relative to the
previously visited conformations is computed. The lowest RMSD is compared with a
certain threshold value and, if it is higher, the analyzed conformation of the
ligand is accepted and its coordinates are stored and used to accept or reject new
conformations.
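The RMSD test at the heart of tabu search can be written down directly. The sketch below assumes that every conformation is a list of (x, y, z) tuples with the atoms in the same order and already in a common frame (no superposition is performed); the threshold of 0.75 Å and the function names are illustrative assumptions.

```python
import math


def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two conformations given as
    equal-length lists of (x, y, z) tuples, without superposition."""
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(conf_a, conf_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(conf_a))


def is_novel(candidate, tabu_list, threshold=0.75):
    """Accept a candidate only if its lowest RMSD to all stored
    conformations is higher than the threshold."""
    if not tabu_list:
        return True
    return min(rmsd(candidate, seen) for seen in tabu_list) > threshold


def tabu_filter(candidates, threshold=0.75):
    """Store accepted conformations and use them to judge later ones."""
    tabu_list = []
    for conf in candidates:
        if is_novel(conf, tabu_list, threshold):
            tabu_list.append(conf)
    return tabu_list


c0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
c1 = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]   # almost identical to c0
c2 = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]   # clearly different pose
print(len(tabu_filter([c0, c1, c2])))
```

Here `c1` is rejected because it lies within the threshold of the stored `c0`, while `c2` is kept as a new region of conformational space.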
Once the candidate ligand poses have been predicted, their binding affinity for
the receptor must be scored. This is done by means of a scoring function that
evaluates the search results and, ideally, gives the highest score to the correct
pose. If the search algorithm finds the correct pose but the scoring function
cannot recognize it, the program will make an invalid and useless suggestion to the
scientist. Therefore, the role of the scoring function is critical in every docking
protocol. The scoring functions commonly used in protein-ligand docking can be
divided into four major classes: (a) force field-based; (b) empirical-based; (c)
knowledge-based; and (d) consensus-based.
Force field-based scoring functions are similar to empirical-based functions (see
below) in that both predict the binding free energy of a protein-ligand complex by
adding individual contributions from different types of interactions. However, the
interaction terms of the former are derived from the theoretical physics that
underlies molecular mechanics, as opposed to the experimental affinities used to
derive the latter. Dock [213] is a classic example of a force field-based tool;
created in the 1980s, it was the first docking program. Empirical-based scoring
functions are based on the idea that the binding energy can be obtained by adding
several individual, uncorrelated terms. Many of the terms in empirical scoring
functions have equivalents in force field scoring functions, but they are usually
simpler in form. GlideScore [216, 217], SYBYL/F-score [214], X-score [228] and
Chemscore [229] all belong to the class of empirical scoring methods.
Knowledge-based scoring functions are based on ligand geometry and contact
preferences derived, via the Boltzmann distribution, from databases of known
protein-ligand complexes. Examples include DrugScore [230], SMoG [231, 232], BLEEP
[233, 234] and GOLD/ASP [235]. Last but not least, consensus scoring functions
combine the information obtained from different scoring approaches to compensate
for the errors introduced by each of them and thus to improve the probability of
finding the correct solution.
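How a knowledge-based term can be derived from observed contact statistics through the Boltzmann distribution is shown in the following minimal sketch. It applies the generic inverse Boltzmann relation E = -kT ln(p_obs / p_ref) to counts in distance bins; it is not the functional form of any specific program listed above, and the counts, the pseudocount and the function name are assumptions.

```python
import math


def inverse_boltzmann(observed, reference, kT=0.593, pseudocount=1e-6):
    """Convert contact counts per distance bin into energies (kcal/mol).

    observed:  counts of an atom-pair type in known complexes
    reference: counts expected without specific interactions
    Bins that are over-represented relative to the reference get a
    negative (favorable) energy, under-represented bins a positive one.
    """
    n_obs, n_ref = sum(observed), sum(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        p_obs = obs / n_obs + pseudocount   # pseudocount avoids log(0)
        p_ref = ref / n_ref + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical counts for three distance bins (short, optimal, long).
print(inverse_boltzmann([10, 60, 30], [33, 34, 33]))
```

A pose would then be scored by summing the bin energies over all protein-ligand atom pairs it contains.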
If the bond angles, bond lengths and torsion angles of the components are not
modified at any stage of the docking, it is called rigid-body docking. Whether
rigid-body docking is sufficiently accurate for most studies is a subject of
debate. When a substantial conformational change occurs within the components upon
formation of the protein-ligand complex, rigid-body docking is inadequate. However,
scoring all possible conformational changes is computationally too expensive when
both the ligand and the receptor structure are changed. For that reason, flexible
docking procedures, which permit conformational changes, must efficiently select
only a small subset of possible conformational changes for consideration. Flexible
docking that involves flexibility of the receptor side chains is called “Induced
Fit Docking”. The “Induced-Fit Docking” (IFD) module from Schrödinger has been
reported to be a robust and accurate method to account for both ligand and receptor
flexibility. The average ligand root-mean-square deviation (RMSD) for traditional
rigid-receptor docking in 21 cases was 5.5 Å, while the RMSD from the Schrödinger
IFD module was 1.4 Å [236]. Recently, Hanson et al. used the IFD method to dock
ligands into the crystal structure of the lysophospholipid sphingosine-1-phosphate
(S1P) G protein-coupled receptor in order to compensate for the differences between
agonists and antagonists, which affect the receptor structure differently [237].
Other programs such as Gold [223], Autodock [224] and FlexX [214] can also perform
flexible docking.
Table 6 Selected topology builder applications and topology databases of small ligands
Application | Website | Force fields | References
ATB | http://compbio.biosci.uq.edu.au/atb/ | Gromos family | [241]
PRODRG | http://davapc1.bioch.dundee.ac.uk/prodrg/ | Gromos 87 | [242]
SwissParam | http://swissparam.ch/ | CHARMM | [243]
CGenFF | http://mackerell.umaryland.edu/~kenno/cgenff/ | CHARMM | [244]
MKTOP | http://www.aribeiro.net.br/mktop/ | AMBER03, OPLS/AA | [245]
Acpype | http://code.google.com/p/acpype/ | GAFF | [246, 247]
AutoSMILES | http://www.yasara.org/autosmilesserver.htm | GAFF | [248]
Virtual chemistry | http://virtualchemistry.org/ | GAFF, OPLS/AA | [249, 250]
Lipidbook | http://lipidbook.bioch.ox.ac.uk/ | set of force fields | [251]
[253]). Such experimental data justify the use of simple membrane models,
consisting of one or two phospholipid types, in MD simulations. Nevertheless, each
protein is a separate case. Therefore, data on the sensitivity of a given protein
to the lipid composition of the membrane should be checked prior to MD simulations,
since in certain cases it may influence the results [256, 257]. For a thorough
review of this subject please refer to [254].
When the lipid composition is finally established, the next step is to generate
an input file with the pre-equilibrated membrane along with topology files of all
molecules inside that bilayer. An excellent lipid topology repository called
Lipidbook [251] stores topologies parameterized for commonly used force fields such
as GROMOS43a1/53a6 [258, 259], CHARMM22/27/36 [260–262], GAFF [263], OPLS/AA [264,
265], Slipids [266], Martini [267] and Bondini [268, 269], which are implemented in
the popular molecular dynamics software packages GROMACS [270–274], NAMD [275],
CHARMM [276] and Amber [277]. If the available packages do not include the membrane
topology needed for a particular study, because of either an unsuitable size of the
required periodic box or an unsuitable composition, the membrane may be built
automatically with CHARMM-GUI [278–280] or VMD [281], which allow the membrane size
to be adjusted.
The position of the protein in the bilayer is another key factor heavily influencing
the outcome of MD simulations. Since the membrane position is not provided in PDB
files, a number of computational methods have been developed to facilitate the step
of membrane positioning. The key concept at this stage is the hydrophobicity of the
protein that determines the orientation and thickness of the membrane into which the
protein will be inserted. For a comprehensive review of methods for transmembrane
region prediction and related databases please refer to Sect. 3.2.
When the protein of interest is finally positioned with respect to the bilayer,
several lipid molecules must be deleted so that they do not overlap with the
positioned molecule. A simple, naïve deletion of lipids may require a long
equilibration because of the very loose lipid packing around the protein.
Fortunately, more sophisticated methods exist to perform this step, and the tools
developed over the last decade implement several approaches. The inflategro Perl
script [282] inflates the membrane, deletes lipids within a given cutoff and then
gradually compresses the membrane, with the protein coordinates kept constant
during the whole process. Another example, a tool from the GROMACS suite [270–274]
called g_membed [283] (currently included in the code of the main program mdrun),
contracts the protein, deletes lipids within a given cutoff and gradually
decompresses the protein to its initial size, performing one step of molecular
dynamics during every iteration of the decompression stage. The same approach is
implemented in other tools, for instance the Yasara macro md_runmembrane.mcr, which
was designed to automate the setup of membrane simulations [284]. Both methods,
g_membed and md_runmembrane.mcr, result in dense lipid packing around the protein,
which reduces the equilibration time. The advent of multiscale simulations opened a
new route in which insertion and equilibration are performed using a coarse-grained
representation, and a transformation to all-atom resolution is carried out before
running the production simulation. Tools supporting this route include Insane [285]
and Backward [286], which can handle many types of lipids and thereby allow the
setup of complex membrane environments. Both tools use MARTINI force fields.
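The lipid deletion step that all of these protocols share, removing every lipid that comes within a cutoff of the protein, can be sketched as a brute-force distance check. This corresponds only to the simple deletion stage, not to the inflation or protein-contraction schemes of inflategro or g_membed; the data layout, the 2.0 Å cutoff and the function name are assumptions.

```python
def lipids_overlapping_protein(protein_atoms, lipids, cutoff=2.0):
    """Return the IDs of lipids that have at least one atom closer than
    `cutoff` (same length unit as the coordinates) to any protein atom.

    protein_atoms: list of (x, y, z) tuples
    lipids:        dict mapping a lipid ID to its list of (x, y, z) tuples
    """
    cutoff_sq = cutoff * cutoff          # compare squared distances
    to_delete = []
    for lipid_id, atoms in lipids.items():
        clash = any(
            (ax - px) ** 2 + (ay - py) ** 2 + (az - pz) ** 2 < cutoff_sq
            for ax, ay, az in atoms
            for px, py, pz in protein_atoms
        )
        if clash:
            to_delete.append(lipid_id)   # whole lipid is removed, not atoms
    return to_delete


protein = [(0.0, 0.0, 0.0)]
lipids = {
    1: [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],   # one atom overlaps the protein
    2: [(10.0, 0.0, 0.0)],                   # far from the protein
}
print(lipids_overlapping_protein(protein, lipids))
```

The all-pairs loop is adequate for a small example; for real systems a cell list or k-d tree would be the usual choice.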
If the system does not contain water layers after protein insertion, a solvation
step is required. Since the software used takes into account only the space
criterion and not the properties of the environment, the final system should be
checked for misplaced water molecules. Such misplacement may involve water
molecules inserted into the hydrophobic core of the membrane or into
solvent-inaccessible protein cavities. Although in the former case water molecules
will diffuse out of the membrane during the equilibration step, it is reasonable to
remove them before starting the simulation, at least to save computational time.
The latter type of misplaced water molecules is more problematic, since running a
simulation with water in buried cavities drives the system into unphysical states,
which undermines conclusions drawn from such a study. A sudden crash of the
simulation may indicate that water molecules are present in a closed cavity.
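A first-pass check for the former kind of misplacement, water inside the hydrophobic core, is a simple slab test along the membrane normal. The sketch assumes that the bilayer normal is the z axis, that the core is a slab of half-thickness 10 Å around `z_center`, and that only water oxygen coordinates are passed in; all of these, and the function name, are illustrative assumptions. It does not detect water in buried protein cavities.

```python
def waters_in_membrane_core(water_oxygens, z_center, half_thickness=10.0):
    """Return indices of water oxygens whose z coordinate lies inside the
    slab |z - z_center| < half_thickness (membrane normal assumed along z)."""
    return [
        index
        for index, (_, _, z) in enumerate(water_oxygens)
        if abs(z - z_center) < half_thickness
    ]


waters = [
    (0.0, 0.0, 0.0),     # in the middle of the bilayer
    (0.0, 0.0, 25.0),    # bulk water above the membrane
    (3.0, 3.0, -9.5),    # just inside the lower edge of the core
]
print(waters_in_membrane_core(waters, z_center=0.0))
```

Flagged molecules should be inspected before deletion, because water inside a channel pore or a solvent-accessible binding pocket also falls within the slab and is legitimate.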
The last question to consider is how long the equilibration step should last and
how to detect its end, when one may move on to the production run. Obviously, the
preparation of the investigated system should be designed so that at the beginning
of the equilibration step the system is as close to equilibrium as possible.
Several steps that shorten the equilibration time were discussed in the previous
paragraphs. They include the use of a pre-equilibrated membrane, more sophisticated
protein insertion methods and proper solvation of the system. A reliable protein
model is also important, and this is a primary concern of researchers performing
homology modeling. Since the equilibration time depends on many factors, it is
essential to choose reasonable criteria that, once fulfilled, mark the end of the
equilibration process. One of the most commonly used criteria is the
root-mean-square deviation (RMSD) calculated with respect to a reference structure.
Other criteria to consider include various interaction energies (e.g., lipid-water
or protein-lipid) or the simulation box volume (when pressure coupling is applied).
Once the properties of interest converge to stable values, the equilibration is
finished.
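One possible way to turn "converged to a stable value" into an operational test is to compare the last two windows of a monitored time series (RMSD, an interaction energy or the box volume). The criterion below, the window length and the tolerance are assumptions chosen for illustration, not a standard prescribed by any MD package.

```python
from statistics import mean, pstdev


def has_converged(series, window=50, tol=0.05):
    """Heuristic plateau test on a time series of a monitored property.

    The series counts as converged when the means of the last two
    windows differ by less than `tol` and the spread within the last
    window is also below `tol` (same unit as the series).
    """
    if len(series) < 2 * window:
        return False                      # not enough data to judge
    previous = series[-2 * window:-window]
    last = series[-window:]
    drift = abs(mean(last) - mean(previous))
    return drift < tol and pstdev(last) < tol


# Synthetic RMSD trace: linear rise for 200 frames, then a plateau at 2.0.
trace = [min(i * 0.01, 2.0) for i in range(400)]
print(has_converged(trace), has_converged(trace[:150]))
```

The same function can be applied to each property of interest, and the equilibration considered finished only when all of them pass.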
A step-by-step manual setup of a membrane protein system is a labour-intensive
task. Notable progress in the development of tools automating this process has been
observed recently. One such pipeline is used by the MemProtMD database [287]. The
tool automatically identifies new membrane proteins in the Protein Data Bank and
performs membrane insertion, system equilibration and resolution transformation,
for which it uses the already mentioned Insane and Backward tool duo. The popular
CHARMM-GUI web server has gained new features such as Martini Maker [288] and the
Martini to All-atom Converter. The latter relies on the same toolset as MemProtMD.
A set of Membrane Builder improvements allows for more efficient construction of
even more complex all-atom membranes [289]. There are also tools that can be
installed and used locally. QwikMD [290] is a recent addition to the VMD [291]
visualization toolkit and facilitates both the setup and analysis of molecular
dynamics simulations through a graphical user interface. It provides workflows for
both beginners and more advanced users. High Throughput Molecular Dynamics (HTMD)
is a platform that integrates many functionalities, from structure manipulation
through running calculations on different resources to trajectory analysis [292].
Its features are available as a set of Python classes and functions. The popularity
of this language in the scientific community also boosts further community-driven
development of extensions.
This part of the chapter highlighted selected topics regarding the setup of MD
simulations of membrane proteins. While the development of automated tools capable
of simulation setup, running and analysis serves the scientific community, some
systems or steps might yield errors and require detailed inspection. In such cases,
a more detailed knowledge of the underlying procedures is crucial.
Fig. 3 Exemplary steps of the unfolding pathway of rhodopsin. a Unfolding of helix TM1 (in blue).
b Unfolding of the protein region containing a disulphide bridge
400 D. Latek et al.
force constant characteristic of the type and model of the cantilever used. In SMD
simulations the external force can be applied in various ways. (1) Since the AFM
cantilever obeys Hooke's law, its attachment to the sample can be modeled as a
harmonic restraint to a dummy atom (the equivalent of the tip) that moves with a
constant speed. This method is very often used for mechanical unfolding of proteins,
e.g. titin [293] and bacteriorhodopsin [294], and for investigating intermolecular
forces between proteins and smaller molecules [295]. Owing to its similarity to
SMFS, the simulation results can easily be compared with experimental
force-displacement (F–D) plots. (2) Another implementation of SMD applies not a
constant speed but a constant force or torque to selected atoms. The force is added
directly to the selected atoms at each step of the MD simulation, so neither a dummy
atom nor a virtual spring is needed. This implementation is useful for achieving a
nearly equilibrium state during pulling, especially when the applied force equals
the resistance forces, so that one can investigate the internal rearrangement of
parts of a protein during ligand unbinding or during a movement (even rotational)
of domains [296]. Depending on the applied force, the resulting displacement can
resemble slightly biased thermal motion (very small forces), molecular diffusion
(moderate forces) or drift (strong forces) [297]. (3) The third method uses a frozen
dummy atom and a spring that is relatively weak and initially stretched. During the
simulation the force constant of the spring is gradually increased, so the force
increases and enables the atoms to move. This method was used to investigate
unbinding of the avidin-biotin complex [298], but nowadays it is rarely used
because the direction of the applied force cannot be changed.
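The constant-velocity protocol (1) can be illustrated with a minimal sketch. It is
not the implementation of any MD package: a single particle in one dimension stands
in for the pulled group, the Gaussian well is a toy potential, the dynamics are
overdamped (no inertia, no thermal noise), and all parameter values and function
names are assumptions chosen for illustration in reduced units.

```python
import numpy as np

def gaussian_well_force(x, depth=10.0, width=1.0):
    """Force -dU/dx for the toy potential U(x) = -depth * exp(-x^2 / (2 width^2))."""
    return -depth * x / width**2 * np.exp(-x**2 / (2.0 * width**2))

def constant_velocity_pull(k_spring=5.0, speed=0.02, n_steps=30000, dt=0.01,
                           friction=1.0):
    """Pull a particle out of the well with a spring tied to a moving dummy atom.

    Returns the dummy-atom positions and the spring forces, i.e. the data
    behind a force-displacement (F-D) curve.
    """
    x = 0.0  # the particle starts at the bottom of the well
    dummy_positions = np.empty(n_steps)
    spring_forces = np.empty(n_steps)
    for step in range(n_steps):
        dummy = speed * step * dt          # dummy atom moves at constant speed
        f_spring = k_spring * (dummy - x)  # Hooke's law restraint
        dummy_positions[step] = dummy
        spring_forces[step] = f_spring
        # overdamped (friction-dominated) equation of motion, Euler step
        x += dt / friction * (gaussian_well_force(x) + f_spring)
    return dummy_positions, spring_forces

dummy_positions, spring_forces = constant_velocity_pull()
peak_force = spring_forces.max()
```

In this toy setting the F–D curve shows a single force peak when the particle leaves
the well. Protocol (2) would correspond to replacing the spring term with a fixed
force added at every step.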
Although SMD methods are extremely useful in providing details of processes that are
not available from experiment, they also have some drawbacks. The most important
one is that the pulling speed used in SMD is much larger (by about six orders of
magnitude) than in experiment, because a single AFM pulling experiment can last
even a few seconds, while the longest SMD simulations are on a microsecond time
scale. As a result, the forces recorded in SMD simulations are about one order of
magnitude higher than those in experiment [297]. Nevertheless, since the obtained
F–D curves are very similar to the experimental ones, the mechanisms of unfolding or
unbinding should also be similar, so the results obtained from SMD can be considered
valid. Given the constant increase in computer performance, the gap between theory
and experiment is expected to narrow.
SMD simulations have been used successfully in various investigations. Mechanical
unfolding of bacteriorhodopsin (BR) unveiled the sequential unfolding pathway of
that protein and showed that the dominant molecular interactions are networked
hydrogen bonds and van der Waals interactions between nonpolar groups. The
researchers suggested that a similar dynamic interaction network could be a key
factor stabilizing GPCRs and other membrane proteins [294]. A series of fast SMD
simulations of the unfolding of various rhodopsin mutants associated with an
autosomal dominant form of retinitis pigmentosa also confirmed the importance of the
dynamic interaction network. For the 20 selected point mutants, all force curves
were very similar to those of wild-type rhodopsin, indicating that mutation of one
amino acid is not enough to disrupt the rhodopsin structure and stability even if
the protein function
Modeling of Membrane Proteins 401
is lost [299]. Another SMD study [300] concerned the retinal extraction pathway
from the bacteriorhodopsin binding site into the membrane. It was assumed that the
protein structure remains intact during the extraction, so that the same path
could be used for the insertion. Since there is no straight path for retinal to
leave the protein, the time-dependent force SMD protocol was applied. It was
observed that retinal formed stable interactions at the assumed entry/exit site,
suggesting that they may be formed prior to entering the protein cavity [300].
To model transitions between two conformations of a system, a variation of SMD
called Targeted Molecular Dynamics (TMD) may be used. It consists of a series of
forced atom movements that drive the system along a pathway to the final state
[301]. In recent years, TMD was used to study, for example, the behaviour of the
C-loop and channel gating in nicotinic receptors. The TMD protocol was used to
displace the C-loop from an "open" to a "closed" position, in which it covers the
active site. This conformational change resulted in a structural reorganization of
the ligand-binding pocket, the β1-β2 loop, the Cys-loop and the β10 strand, leading
to channel widening [302].
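A common way to realize such forced movements is a harmonic restraint on the RMSD to
the target structure, with a reference RMSD that decreases during the run. The
sketch below assumes this form, U = (1/2)(k/N)[RMSD(t) − ρ(t)]², and a linear
schedule for ρ(t). The function names, the force constant and the omission of
optimal superposition (structures are assumed to be pre-aligned) are illustrative
assumptions, not details taken from [301].

```python
import numpy as np

def rmsd(coords, target):
    """Root-mean-square deviation between two pre-aligned (N, 3) coordinate sets."""
    diff = np.asarray(coords, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))

def reference_rmsd(step, n_steps, rmsd_start, rmsd_end=0.0):
    """Linearly decreasing reference RMSD rho(t) used to steer towards the target."""
    fraction = min(max(step / n_steps, 0.0), 1.0)
    return rmsd_start + fraction * (rmsd_end - rmsd_start)

def tmd_energy(coords, target, rho, k=200.0):
    """Restraint energy U = 0.5 * (k / N) * (RMSD - rho)^2."""
    n_atoms = len(coords)
    return 0.5 * (k / n_atoms) * (rmsd(coords, target) - rho) ** 2

# Toy example: four atoms displaced by (1, 1, 1) from the target.
start = np.zeros((4, 3))
target = np.ones((4, 3))
rho_0 = rmsd(start, target)
energy_at_start = tmd_energy(start, target, rho_0)
energy_if_rho_zero = tmd_energy(start, target, 0.0)
```

At the start of the run the reference equals the actual RMSD, so the bias is zero;
as ρ(t) shrinks, atoms that lag behind feel a restoring force towards the target.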
SMD requires a predefined direction and magnitude of the applied force, yet the
ligand access path to the receptor active site can be hard to find. Probing a
complex system with numerous potential solutions would require running a large
number of SMD simulations. Some of these calculations can take a very long time and
are therefore costly in terms of high-performance computing resources. In addition,
the ability to quickly screen experimental hypotheses may be essential for the
success of the whole project. A practical solution to these problems is to combine
an efficient MD algorithm with a molecular modeling tool to allow low-cost,
simplified simulations with live interaction, in other words Interactive Molecular
Dynamics (IMD). In such simulations a researcher can use standard human interface
devices (e.g. a mouse or a special haptic device) to apply forces that pull or
restrain particular atoms in the system. A haptic device additionally allows
bidirectional passing of the force information, so that the resistance of the
system to the applied movement can be felt by hand.
The computing times of IMD simulations are much shorter than those of SMD:
up to hours versus up to a few months, respectively. Thus, the forces applied in IMD
have to be high to complete the pulling procedure. Compared with SMD, it is
difficult to extract useful quantitative information from interactive simulations.
Nevertheless, IMD may be used to provide initial conformations for SMD.
The IMD protocol with a haptic device was used to investigate the transition
pathways of arabitol and ribitol through GlpF, a member of the aquaporin family of
membrane proteins. Significant transition states were chosen from the interactive
runs for further study in MD simulations. Moreover, the IMD runs directly revealed
which hydrogen bonds are responsible for the selectivity of the water channel in
aquaporin [303].
Fig. 4 A graphical representation of an exemplary SuMD algorithm. This particular
supervision algorithm was used to identify the most probable ligand entrance pathway
into the CB1 cannabinoid receptor [305]
Fig. 5 Selected frames from SuMD simulation trajectories of two agonists, anandamide
(left) and Δ9-THC (right), entering the binding site of the CB1 cannabinoid
receptor. These simulation results indicate that the most probable ligand entrance
pathway for the CB1 cannabinoid receptor lies between TM7 and TM1/TM2 and that
ligands access the binding site directly from the membrane [305]
[308]. The SuMD methodology has been tested extensively not only for GPCRs but also
for other membrane proteins and globular proteins [309].
The SuMD approach is also very useful for analyzing both orthosteric and allosteric
binding events, broadening our perspectives in several scientific areas, from
molecular pharmacology to drug discovery. In particular, it can be applied in a drug
design campaign for lead optimization in order to design novel binders with
preferable pharmacodynamic profiles. Moreover, SuMD is a powerful tool to assist the
design of site-directed mutagenesis experiments that investigate the molecular
recognition process. Future drug design will very likely involve detailed
characterization of not only the bound state but also the whole ligand-protein
network of recognition pathways, including all metastable intermediate states, and
for this reason SuMD is expected to become a very useful tool.
Membrane proteins play a crucial role in passing information and transporting small
molecules between membrane-separated compartments. To perform their function
they interact with other proteins, forming transient or more stable homo- or
hetero-oligomeric complexes [310–313]. Owing to the difficulties in solving the
structures of membrane proteins by X-ray diffraction or NMR, computational methods
of structure and interaction prediction have become quite important, offering
insight into details at a resolution inaccessible to current experimental methods.
In this chapter we briefly review selected methods of protein-protein interface
prediction in the context of membrane proteins.
The methods used for protein-protein interface prediction can be classified into
two groups:
• structure-based methods that use atom coordinates and atom types. This category
is used for membrane proteins for which structural information is
available. The most prominent methods in this group are:
– Docking
– Molecular Dynamics (MD)
• sequence-based methods that rely on sequence alignments and residue conservation.
7.1 Docking
The procedure of docking involves three general steps: (a) generation of a complex
structure followed by (b) filtering out false positives based on a scoring function and
(c) refinement of the best ranked models. Various methods of searching the solution
space and ranking the results are reviewed in [314–316].
The most commonly used protein-protein docking engines are listed in Table 7.
Note that they have not been developed specifically for membrane proteins.
This is mostly because proper validation of any new method requires
a sufficient amount of experimental data, such as structures of proteins and their
complexes. This condition is not easy to meet in the case of membrane proteins
because of the experimental difficulties in solving their structures. Complexes of
membrane proteins are therefore underrepresented, and the docking programs may have
problems delivering good results in this area. Nevertheless, in many cases it is
possible to obtain reasonable structures using the available programs. The issues a
researcher has to be aware of when attempting membrane protein docking are briefly
outlined below. While some of them are specific to membrane proteins, others are
more general.
First of all, the presence of the membrane is not taken into account during the
ranking of the results. Therefore, a solution that would be perfectly valid in a
cytoplasmic environment is often invalid when placed in a lipid bilayer. The burden
of creating a filter that successfully selects and ranks membrane-compatible
complexes from a population of results is left to the researcher, but the docking
methods themselves were shown to work even in such hard cases (see for instance
[337, 338]). Second, in the case of membrane proteins it may be hard to identify
obvious interaction sites such as surface bulges and cavities, and a small contact
area may not suffice for a good prediction. Furthermore, if at least one of the
proteins undergoes a significant conformational change during the formation of a
protein-protein interface, the docking engine, particularly a rigid one, will likely
fail to yield a native-like structure. What is more, in order to further validate
the model obtained from docking, the stability of a complex may have to be confirmed
by MD. Since this is a time-consuming step, one should employ other available
filters to limit the number of initial configurations. Last but not least, a docking
program may allow the use of certain restraints in order to limit the search space
and produce more significant results. If any experimental data, such as distance
restraints between certain residues or the reciprocal orientation of complex
subunits, are known, one is encouraged to use them to improve the quality of the
generated model. However, this step requires caution, especially when the
interpretation of experimental data is ambiguous. For instance, a mutation of an
amino acid at site A may induce conformational changes in a protein so that a
distant binding site B can no longer interact with its partner. If this amino acid
is used to restrain the search, the results will be invalid. In this situation it
is preferable to generate more structures and to use the experimental data as a
filter. The crystal structures of protein oligomers that can be employed for testing
of the above methods are shown in Fig. 6.
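A membrane-compatibility filter of the kind described above can be sketched as a
simple geometric test. The sketch assumes that both proteins were oriented with the
membrane normal along z before docking and that each pose is given as a rigid-body
rotation matrix and translation vector for the mobile partner. The function names
and the thresholds (30° tilt, 6 Å shift along the normal) are hypothetical choices
for illustration, not values taken from the cited studies.

```python
import numpy as np

MEMBRANE_NORMAL = np.array([0.0, 0.0, 1.0])  # assumed membrane normal (z axis)

def is_membrane_compatible(rotation, translation,
                           max_tilt_deg=30.0, max_normal_shift=6.0):
    """Return True if a docked pose keeps the partner upright and in the bilayer.

    rotation    -- 3x3 rotation matrix applied to the mobile partner
    translation -- translation vector (in Angstrom) applied after the rotation
    """
    rotated_axis = np.asarray(rotation, dtype=float) @ MEMBRANE_NORMAL
    cos_tilt = np.clip(rotated_axis @ MEMBRANE_NORMAL, -1.0, 1.0)
    tilt_deg = np.degrees(np.arccos(cos_tilt))
    normal_shift = abs(np.asarray(translation, dtype=float) @ MEMBRANE_NORMAL)
    return bool(tilt_deg <= max_tilt_deg and normal_shift <= max_normal_shift)

def filter_poses(poses, **thresholds):
    """Keep only the (rotation, translation) poses that pass the membrane test."""
    return [pose for pose in poses if is_membrane_compatible(*pose, **thresholds)]

upright = (np.eye(3), np.array([20.0, 0.0, 1.0]))
lying_flat = (np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]]), np.array([20.0, 0.0, 0.0]))
out_of_plane = (np.eye(3), np.array([20.0, 0.0, 15.0]))
kept = filter_poses([upright, lying_flat, out_of_plane])
```

A tilt above 90° corresponds to a flipped topology and is rejected as well. Such a
filter only removes poses and leaves the ranking of the remaining ones to the
scoring function of the docking engine.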
Biomolecules are dynamic systems, and exploring their dynamic properties can reveal
their true nature. This is the reason why molecular dynamics is a widely used tool
in computational research. Yet if one attempts to find a proper interface by
simulating a set of random starting complexes (even if the presence of the membrane
is taken into account), the task becomes too costly in time and resources unless
the interacting proteins are really small. The reason is the timescale of complex
formation, which may not be reachable with MD, particularly when the complex
formation induces large conformational changes. This is why MD is usually used as a
complementary tool with a docking engine of one's choice: docking delivers a set of
starting structures and MD determines whether the complex is transient or stable.
Fig. 6 The protein-protein interfaces in crystal structures. a The trimer of
bacteriorhodopsin. PDB id: 1BRR. b Two different interfaces in the oligomer of the
opioid receptor μOR. PDB id: 4DKL. The interfaces are encircled by red dashed
ellipses. The interacting helices are colored and labeled
As previously noted, docking engines lack proper filters that remove
membrane-infeasible solutions. This drawback transfers the responsibility to the
researcher. The structures that pass the test can be subjected to molecular dynamics
simulation. For the sake of accuracy, the simulations should be carried out in a
membrane environment, which requires a longer system preparation procedure than for
water-soluble proteins. For more details please see Sect. 6.
Trajectory analysis provides valuable information on the properties of the studied
protein complexes: (a) the area and type of the protein-protein interface, (b) the
energy of interaction, (c) various structural changes of protomers upon binding,
and even (d) the kinetics of complex formation and dissociation for sufficiently
long simulations. The role of computational research is not limited to validation of
experimental data. The results of simulations delineate new research paths for
experimental labs, for instance by picking residues for mutation and predicting the
resulting interfaces. Therefore, molecular dynamics is an important tool in the
toolkit of a modern scientist interested in the formation of protein-protein
complexes.
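As an example of item (a), interface residues in a single trajectory frame can be
identified with a distance cutoff. The sketch below assumes one representative point
per residue (e.g. a Cβ atom or side-chain centroid) and an 8 Å cutoff. Both choices,
as well as the function name, are illustrative assumptions rather than a prescribed
protocol.

```python
import numpy as np

def interface_residues(coords_a, coords_b, cutoff=8.0):
    """Find residues of protomers A and B that lie within `cutoff` of each other.

    coords_a, coords_b -- (N, 3) and (M, 3) arrays, one point per residue
    Returns residue indices in A, residue indices in B, and the contact count.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    # pairwise distance matrix of shape (N, M)
    distances = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=2)
    contacts = distances < cutoff
    in_a = np.flatnonzero(contacts.any(axis=1))
    in_b = np.flatnonzero(contacts.any(axis=0))
    return in_a, in_b, int(contacts.sum())

# Toy frame: three residues in A, two in B (coordinates in Angstrom).
protomer_a = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
protomer_b = [[0.0, 6.0, 0.0], [40.0, 40.0, 0.0]]
in_a, in_b, n_contacts = interface_residues(protomer_a, protomer_b)
```

Repeating the calculation over all frames gives the persistence of each contact,
which helps to distinguish transient from stable interfaces.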
Protein sequence records vastly outnumber the protein structures solved to date.
It is not uncommon that for a certain protein family very few, if any, structures
are known. This was the case for G protein-coupled receptors (GPCRs) at
the beginning of the 2000s, when rhodopsin was the only member of this important
family with a solved structure [339]. The sequence-based methods, often combined
with a reasonable template structure, may still bring valuable information on
residues of primary significance for protein structure and function, including
protein-protein interfaces. These methods rely on sequence homology and produce
their output after analyzing multiple sequence alignments. Below is a brief overview
of selected sequence-based approaches for protein-protein interface prediction.
The evolutionary trace (ET) method [340] uses a multiple sequence alignment to
build a phylogenetic tree. The sequences are then divided into several groups by
clustering. The alignment is scanned for residues that are conserved within
each group but differ between groups. Such residues are labeled evolutionary trace
residues and are claimed to be important because of their lower probability of
mutation. The ET residues are subsequently mapped onto the structure of the protein
in order to visualize the location of functional sites. Different flavors of ET
analysis were used to distinguish residues responsible for binding ligands,
G proteins and other monomers [341–343].
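The core test of the ET idea, conservation within groups combined with variation
between them, can be sketched as follows. The sketch assumes that the alignment and
the group assignment (which the actual method derives from a phylogenetic tree) are
already available. The function name and the toy sequences are hypothetical, and
the ranking over successive tree partitions used by the full method is omitted.

```python
def trace_positions(alignment, groups):
    """Return alignment columns conserved within every group but not across groups.

    alignment -- list of equal-length aligned sequences
    groups    -- list of group labels, one per sequence
    """
    labels = sorted(set(groups))
    positions = []
    for col in range(len(alignment[0])):
        residues_per_group = [
            {seq[col] for seq, g in zip(alignment, groups) if g == label}
            for label in labels
        ]
        conserved_within = all(len(res) == 1 for res in residues_per_group)
        differs_between = len(set.union(*residues_per_group)) > 1
        if conserved_within and differs_between:
            positions.append(col)
    return positions

# Toy alignment with two groups of two sequences each.
toy_alignment = ["ACDK", "ACEK", "AGDR", "AGDR"]
toy_groups = [0, 0, 1, 1]
toy_trace = trace_positions(toy_alignment, toy_groups)
```

In the toy alignment column 0 is invariant, column 2 varies within the first group,
and columns 1 and 3 are the class-specific positions returned by the function.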
Correlated mutation analysis (CMA) searches for mutations that occur together
in a multiple sequence alignment [344]. The underlying assumption is that the effect
of one mutation is compensated by the other, so that the protein-protein interface
remains functional. In general, this method is used to determine structurally
important residues, not only between protein molecules but also within a single one
(please see Sect. 3.3). It was shown to be useful when applied to membrane
protein interface predictions [342, 345]. The subtractive correlated mutation (SCM)
method can be used for membrane dimers formed by paralogs [346]. A more recent
method, structure-based CMA (SCMA), combines protein structural information with
co-evolutionary information [347] and overcomes the low signal-to-noise ratio, a
well-known disadvantage of CMA that had also been addressed earlier [348].
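Co-variation between two alignment columns can be quantified in several ways. The
sketch below uses mutual information, a common generic measure. It is a swapped-in
illustration and not necessarily the correlation score used in [344] or in the SCM
and SCMA methods. The function names, the threshold and the toy alignment are
assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(column_i, column_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(column_i)
    count_i = Counter(column_i)
    count_j = Counter(column_j)
    count_ij = Counter(zip(column_i, column_j))
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )

def correlated_pairs(alignment, threshold):
    """List column pairs (i, j, MI) whose mutual information reaches the threshold."""
    columns = list(zip(*alignment))
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            score = mutual_information(columns[i], columns[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs

# Toy alignment: columns 1 and 3 swap K/E together, columns 0 and 2 are invariant.
toy_alignment = ["AKLE", "AKLE", "AELK", "AELK"]
toy_pairs = correlated_pairs(toy_alignment, threshold=0.5)
```

Invariant columns carry no co-variation signal, which is one reason why raw scores
of this kind are noisy for small or highly conserved alignments.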
Each method has its strengths and weaknesses. Therefore, to avoid a distorted
view and gain a predictive edge, it is advisable to use both structure-based and
sequence-based methods. Careful selection of the input data should never be
underestimated, since the computer only processes what it is given and the onus is
on the researcher to produce meaningful results.
The environment has a great impact on the properties and function of biomolecules.
For proper modeling of, e.g., proteins, one has to simulate all necessary
surroundings, mainly water and/or lipids or, more generally, the solvent. However,
the number of solvent atoms is at least one order of magnitude larger than that of
the molecule of interest. As a result, most of the computer resources in all-atom
explicit simulations are spent on solvent-solvent interactions.
8.1 Theory
Implicit solvent models usually assume that the examined part of a system is treated
with a full-atom description, whereas the solvent is represented as a continuous
medium with properties that reflect the real, but only average, qualities of the
environment (usually water). This transformation leads to an additional energy term,
the free energy of solvation, which accounts for all the effects that the solvent
has on the solute and is thermodynamically represented by the change in free energy
when the molecule is transferred from vacuum to solvent.
Here we present only a very basic theory of implicit solvation; for a more complete
description see Roux [349]. In general, the energy of a solvent-solute system depends
on the configuration of the solute (coordinate vector X) and of the solvent
(coordinate vector Y):
where $\Delta G_{slv}(X)$ is the solvation term, i.e. the averaged influence of the
solvent on the solute at a fixed configuration X. One can decompose the free energy
of solvation $\Delta G_{slv}(X)$ into two terms: the nonpolar solvation effects
$\Delta G_{slv}^{np}(X)$ and the electrostatic contribution
$\Delta G_{slv}^{elec}(X)$. The latter is mainly the electrostatic potential acting
on the charges of the molecule from the polarized solvent and is commonly called the
reaction field. The nonpolar term is mainly governed by the work needed to displace
solvent molecules from the space occupied by the solute and is commonly called
cavity formation. The calculation of $\Delta G_{slv}^{np}(X)$ and
$\Delta G_{slv}^{elec}(X)$ depends on the specific implicit solvation method. These
methods can be divided into two groups: those based on continuum electrostatics and
semi-empirical ones. Here we describe methods that are suitable for molecular
dynamics, i.e. those with analytical derivatives, which allow the forces acting on
the system to be calculated.
Methods based on electrostatics assume that the solute charges reside in a
low-dielectric cavity immersed in a continuous dielectric environment (the solvent).
Calculations of $\Delta G_{slv}^{elec}(X)$ are therefore based on the
Poisson-Boltzmann equation [350, 351], a differential equation derived from
Maxwell's laws that describes the electrostatic potential for a given charge
distribution. The solution of this equation strongly depends on the geometrical
factors of the solute, e.g. the charge distribution or the cavity shape. In
practice, the solution has an analytical form only for very basic, symmetrical
problems. The equation can be solved numerically, but the high computational cost
limits this approach to stationary problems, in which the solute position is fixed,
making it impractical for molecular dynamics. To overcome these limitations,
semi-analytical approximations have been developed, of which the generalized Born
(GB) formalism is the most commonly used [352]. GB methods estimate
$\Delta G_{slv}^{elec}(X)$ as a pairwise sum over all interacting charges, each
described by a so-called effective Born radius.
The complete calculation of the solvation term $\Delta G_{slv}(X)$ also requires an
estimate of the nonpolar entropic solvation effects, $\Delta G_{slv}^{np}(X)$. This
is achieved by introducing a solvent-accessible surface area (SASA) potential:

$$\Delta G_{slv}^{np}(X) = \lambda S(X) \qquad (3)$$

where S(X) is the surface area of the solute and λ has the interpretation of a
surface tension; it is adjusted phenomenologically to reproduce the solvation free
energies of simple molecules in water, such as alkanes [353]. GB methods combined
with SASA are commonly called GBSA methods and have many variations [354–356].
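The two GBSA terms can be illustrated with a short sketch. For the electrostatic
part it assumes the widely used pairwise form of Still and co-workers, in which the
effective distance is f = sqrt(r² + R_i R_j exp(−r² / (4 R_i R_j))). The Born radii
and per-atom surface areas are taken as given inputs (computing them is the hard
part of any GB model), and the surface tension of 0.005 kcal/(mol Å²) is only an
illustrative value. None of the names below belong to a specific MD package.

```python
import numpy as np

COULOMB_CONSTANT = 332.06  # kcal * Angstrom / (mol * e^2)

def gb_electrostatic_energy(coords, charges, born_radii, eps_in=1.0, eps_out=80.0):
    """Generalized Born estimate of the electrostatic solvation free energy (kcal/mol)."""
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    born_radii = np.asarray(born_radii, dtype=float)
    # squared distances between all atom pairs, including i == j (self terms)
    r2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=2)
    radii_product = np.outer(born_radii, born_radii)
    f_gb = np.sqrt(r2 + radii_product * np.exp(-r2 / (4.0 * radii_product)))
    prefactor = -0.5 * COULOMB_CONSTANT * (1.0 / eps_in - 1.0 / eps_out)
    return float(prefactor * np.sum(np.outer(charges, charges) / f_gb))

def sasa_nonpolar_energy(areas, surface_tension=0.005):
    """Nonpolar term of Eq. (3): surface tension times total accessible area."""
    return float(surface_tension * np.sum(areas))

# A single monovalent ion with a 2 Angstrom Born radius reduces to the Born formula.
ion_energy = gb_electrostatic_energy([[0.0, 0.0, 0.0]], [1.0], [2.0])
nonpolar = sasa_nonpolar_energy([100.0, 50.0])
```

For a single ion the double sum contains only the self term, so the result equals
the classical Born energy, −(1/2)(1/ε_in − 1/ε_out) q²/R in the same units, which is
a convenient sanity check for any GB implementation.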
Another approach is used in semi-empirical methods such as EEF1, which is based on
solvent exclusion functions [357]. The main idea is to take reference parameters for
small model molecules and extrapolate them to bigger systems, such as proteins.
Hence, the solvation term is calculated from a combination of experimental knowledge
and theoretical considerations. It is based on reference solvation parameters,
$\Delta G_{ref}$ (the solvation free energy of a reference molecule), and takes into
account the burial of the group:

$$\Delta G_{slv} = \Delta G_{ref} - \int f(r)\,dr \qquad (4)$$
For a newly developed force field, one of the most fundamental features is the
ability to recognize the native fold of a protein. To test this ability one can
consider two proteins of the same length but with different sequences and folds.
Next, each sequence is built onto the 3D structure of the other protein to obtain
so-called decoys: known folds carrying a different sequence. Implicit methods were
able to discriminate between natively folded proteins and decoys using an energy
function that includes implicit solvation [361, 362]. The implicit solvent methods
are also employed for protein structure prediction [363] and for ligand docking
[364] in Rosetta.
Fig. 7 An implicit solvent method, IMM1. a A continuous change of the solvation
potential in a water-membrane system. b Rhodopsin simulated in an implicit membrane
environment. Red surfaces denote the purely hydrophobic part of the membrane, blue
surfaces denote bulk water areas
Some basic considerations about the influence of biological membranes on protein
structure and conformational changes were discussed by Im et al. [365]. They
examined three small membrane proteins: melittin, the transmembrane domain of the M2
protein from Influenza A (M2-TMP) and the transmembrane domain of glycophorin A
(GpA), using a newly developed GB/SA implicit membrane model. One of the most
interesting experiments concerned the GpA protein. Starting from two separated
helices in the membrane system, they were able to reproduce the NMR structure of the
GpA dimer and its GxxxG interface with an RMSD as low as 1.2 Å.
The same authors also studied the problem of membrane protein folding [366].
Five artificially designed peptides (WALP16, WALP19, WALP23, TMX-1, TMX-3) were
tested with replica-exchange molecular dynamics (REX-MD) in the GB/SA implicit
membrane model. The initial configuration was an extended conformation placed about
30 Å away from the membrane. Four peptides, all WALPs and TMX-1, acquired most of
their α-helical structure at the membrane surface before they were able to fully
penetrate the bilayer. Only TMX-3 did not insert but fluctuated at the interface
with a low helical content. These findings led to the conclusion that spontaneous
peptide insertion requires a very high content of secondary structure.
The membrane protein folding problem has also been examined by Ulmschneider
et al. [367]. The transmembrane part of virus protein U (Vpu) was subjected to
several Monte Carlo folding simulations, and the folded structures were shown to
converge to the one obtained in an NMR study. An interesting advantage of implicit
solvation is straightforward free energy evaluation. Here, the authors investigated
the free energy landscape of protein insertion into the membrane and the role of
charged terminal residues in the insertion profiles. The dependence of
$\Delta G_{slv}(X)$ on position and tilt angle was checked both for a charged
N-terminus and for one capped with a neutral methyl group. It was found that the
lack of a charged residue at the N-terminus lowers the energy barrier and could
result in the peptide leaving the membrane. Additionally, they were able to
reproduce the so-called hydrophobic mismatch effect: an increase of helix tilt with
decreasing hydrophobic thickness of the membrane.
An extension of the IMM1 model is discussed by Mottamal and Lazaridis [368]. They
showed that a transmembrane voltage correction has a great impact on the optimal
orientation of alamethicin helices in membranes. Without the transmembrane potential
the protein lies rather parallel to the membrane and stays at the interface,
whereas the TM voltage compels the protein to adopt a more perpendicular,
transmembrane orientation.
One of the biggest advantages of MD is that it provides insight into real protein
dynamics. In addition, the IM method makes it possible to obtain several independent
trajectories. These facts were exploited to explore unfolding pathways and stability
in simulations of atomic force microscopy experiments on bacterioopsin [369]. The
authors applied an external force to the C-terminus of bacterioopsin and pulled it
with constant velocity (SMD, steered molecular dynamics) or constant force (CFMD,
constant force molecular dynamics) along the direction perpendicular to the
membrane. The force-distance profiles obtained from SMD simulations were in very
good agreement with the AFM experiment: the number of main peaks, their relative
heights and the distances between the maxima show significant similarity to AFM
studies. This suggests that the unfolding mechanisms in SMD and AFM are also similar
(although the pulling velocities differ by about five orders of magnitude). Because
molecular dynamics allows examination at the molecular level with atomic resolution,
the authors could interpret the AFM force peaks and correlate them with structural
changes during unfolding. Among other things, they explained the origin of the
highest resistance: threading and flipping of helix F through the bundle of the
other helices in the membrane.
Another successful use of the IM method is reported by Park et al. [370], who
investigated the effects of palmitoylation both experimentally and computationally.
Palmitate-deficient rhodopsin was examined to study the molecular interactions that
stabilize its structure. The palmitate had the biggest impact on the stability of
the small helix H8, which is believed to mediate transducin activation. Indeed,
experiments show that the activation rate drops significantly in the absence of
palmitoylation.
Although implicit membrane methods are still under development and their use
is limited, the growing number of known crystal structures of membrane proteins
allows for interesting validity tests and applications. In particular, the methods
show great potential in modeling protein-protein interactions. They were able to
restore known protein features and interactions (e.g. the GxxxG interface) without
any constraints. They allow fast and reliable energy evaluation, which is extremely
useful for creating free energy landscapes and for gaining basic knowledge of
protein folding with all-atom resolution of the peptide chain. Moreover, the IM
methods make it possible to run hundreds of individual, independent in silico
experiments. Such experiments do not need massively parallel computers to reach
biologically relevant timescales. The absence of periodic boundary conditions, of
artifacts arising from a finite simulation box and of complicated lattice
electrostatics calculations (e.g. Ewald summation [371]) makes implicit methods much
easier to set up than standard all-atom molecular dynamics. Recent improvements,
such as the inclusion of the membrane dipole potential, make implicit methods more
detailed and reliable [372].
Of course, there are still fields in which implicit solvation fails. Besides the
obvious cases in which solvent-mediated interactions are important, the implicit
membrane methods are questionable when one wants to examine protein interactions
with the water/membrane interface. The lack of data on the exact properties of water
molecules in the vicinity of the lipid head groups makes these properties very hard
to incorporate into the present models. It is also not possible to simulate
water channels in the membrane or to mimic membrane deformations caused by proteins.
These models usually do not include friction terms; however, this problem may be
overcome by solving the Langevin equation of motion [373]. Finally, a new generation
of mixed implicit/explicit methods could overcome the present difficulties
[374, 375].
9 Coarse-Grain Methods
Although the coarse-grained strategy has recently become very popular and many
researchers now rely on coarse-grained simulations of large biomolecular systems,
it was developed many years ago. The main reason to use coarse-grained modeling is
that it provides a significant speed-up compared with classical all-atom molecular
dynamics simulations. Coarse-grained simulation allows large biological systems to
be investigated using simplified but reasonable models that are able to reproduce
the experimental data. The idea behind coarse-grained methods is to represent a
group of atoms as one united bead and to use a longer time step, which enables
researchers to study the behavior of the system over longer periods of time.
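The mapping of atoms onto beads can be sketched as a centre-of-mass reduction. This
is a generic illustration: actual coarse-grained force fields define their own
mapping rules, and the function name, the bead assignment and the toy coordinates
below are assumptions.

```python
import numpy as np

def map_to_beads(coords, masses, bead_index):
    """Place one bead at the centre of mass of each group of atoms.

    coords     -- (N, 3) atomic coordinates
    masses     -- (N,) atomic masses
    bead_index -- (N,) integer bead label of each atom, numbered from 0
    Returns (n_beads, 3) bead coordinates and (n_beads,) bead masses.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    bead_index = np.asarray(bead_index, dtype=int)
    n_beads = int(bead_index.max()) + 1
    bead_masses = np.bincount(bead_index, weights=masses, minlength=n_beads)
    weighted = np.zeros((n_beads, 3))
    # accumulate mass-weighted coordinates per bead
    np.add.at(weighted, bead_index, coords * masses[:, None])
    return weighted / bead_masses[:, None], bead_masses

# Toy example: four atoms mapped onto two beads.
atom_coords = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
atom_masses = [1.0, 1.0, 1.0, 3.0]
bead_coords, bead_masses = map_to_beads(atom_coords, atom_masses, [0, 0, 1, 1])
```

Here the first bead lands midway between its two equal-mass atoms, while the second
is shifted towards the heavier atom.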
From the very beginning of MD, scientists thought about simplified representations
of the investigated systems, and of proteins in particular. The first step towards a
transferable coarse-grained model was made by Levitt, who reported a knowledge-based
parameterization [379]. Probably the earliest example of the CG idea in biology
was the development of the simplified protein folding model by Levitt and Warshel
[380]. The process of protein folding presents an enormous challenge in light of the
Levinthal paradox [381], which states that it is close to impossible to rationalize
how a protein with so many degrees of freedom is capable of folding within any
reasonable timescale. In 1975 Levitt and Warshel [380], aware of the fact that even
a minor energy minimization of an all-atom protein took an extremely long time,
attacked the Levinthal paradox by drastically simplifying the protein representation
while retaining the main functionality of the system. The much simpler and less
physical Go model [382–385] was also developed at that time.
The first idea of reducing the amino acid representation by grouping atoms into
a bead called a united atom or pseudo-atom was based on the uniaxial Gay-Berne
model [386, 387]. The united-atom approach was further improved by grouping each
carbon with its bonded hydrogen atoms into one united atom [388]. More precisely, an
aliphatic carbon atom and its attached hydrogen atoms were represented as one bead.
The united-atom representation is widely used because it is computationally
efficient and provides results in reasonable agreement with available experimental
data. The idea of united atoms was further extended by coarse-grained force fields
in which several heavy atoms are mapped onto one bead. Coarse-grained force fields
are available for commonly used MD programs. Even though they share the same idea,
they differ in details. In this work we compare popular coarse-grained models used
in GROMACS [270] and NAMD [389], two MD program suites commonly used in research
studies.
In many coarse-grain methods an implicit solvent is used instead of explicit
water and ion molecules, and this simplification alone reduces the size of the system
by about one order of magnitude. Representing each amino acid, containing on average 20
atoms, by two beads reduces the number of particles in a protein by a factor of 10. For
large systems, the cost of calculating forces scales with the square of the number
of particles, so the acceleration may reach even two orders of magnitude.
The second source of speed-up is the integration time step, which is limited
by the fastest motions in the protein. These are about 10 times slower in the
coarse-grained representation than in an all-atom model, so the integration time step
can be proportionally larger. Another source of speed-up is that the
energy landscape is much smoother, with fewer of the local energy minima
that are present in all-atom molecular dynamics. The above assessment of the
possible speed-up is very simplified; the actual gain depends on the applied
coarse-grained method and the investigated system.
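The back-of-the-envelope estimate above can be written down explicitly. The sketch below assumes that the two factors simply multiply, which is a simplification of ours; the function name and the `force_scaling_exponent` parameter are hypothetical. The exponent is left adjustable because the quadratic scaling quoted in the text is an upper bound, and codes that use cutoffs scale more favorably.

```python
def estimated_speedup(particle_reduction: float,
                      timestep_gain: float,
                      force_scaling_exponent: float = 2.0) -> float:
    """Rough speed-up estimate: (fewer particles)^exponent times the longer time step.

    Ignores the additional, harder to quantify gain from the smoother energy landscape.
    """
    return particle_reduction ** force_scaling_exponent * timestep_gain


# Numbers quoted in the text: 10x fewer particles, 10x longer time step.
print(estimated_speedup(10, 10))        # quadratic force scaling
print(estimated_speedup(10, 10, 1.0))   # linear force scaling, a more conservative case
```

With the values from the text the quadratic case gives three orders of magnitude, and the linear case two, which illustrates how strongly the estimate depends on the assumed scaling.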
Interesting surveys of coarse-grained models of proteins can be found in [390–392],
and a survey focused entirely on membrane proteins in [393]. The coarse-grained models
of proteins available at present can be divided into two categories based on their
treatment of nonbonded interactions. In the first category, those interactions rely
on an initial (e.g. crystal) structure of the protein: the models
use the initial structure of the investigated molecule to define the potential of
the system. Such models are widely applied to study the functional dynamics of larger
biomolecules.
The nonbonded interactions of coarse-grained models belonging to the second
category are defined in a similar way to those in Molecular Mechanics force fields.
The initial structure is not considered in the definition of the interactions in the system.
These models are directly or indirectly based on physicochemical interactions.
The elastic network models (ENM) and Go-like models belong to
the first of the categories introduced above and have a very strong structure-based
bias. In the ENM approach the system is represented by a network of beads connected
by harmonic springs. These connections are introduced between beads that are spatially
close to each other in the native structure. Usually one bead represents a whole amino
acid. Despite its simplicity, the ENM is able to reproduce the correct pattern of the
principal modes (those with the largest amplitude), which are usually the most important
for protein function. This method was applied to study the mechanism of
pore opening in five different potassium channels [394]. The study revealed that all
five structures display a common gating mechanism and the same intrinsic motions
at their gating region despite differences in their sequences, structures, and activation
mechanisms. The equilibrium dynamics of these five potassium channels were found
to obey similar patterns on a global scale.
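The construction of an elastic network can be sketched in a few lines. This is a minimal illustration, assuming one bead per residue, a single uniform spring constant and a distance cutoff; the cutoff of 8.0, the spring constant `k` and the four-bead toy chain are assumptions made for the example and are not taken from the cited study.

```python
import math
from itertools import combinations


def build_network(native, cutoff):
    """Connect every pair of beads closer than `cutoff` in the native structure.

    Returns a list of (i, j, native_distance) springs.
    """
    springs = []
    for i, j in combinations(range(len(native)), 2):
        d0 = math.dist(native[i], native[j])
        if d0 <= cutoff:
            springs.append((i, j, d0))
    return springs


def enm_energy(coords, springs, k=1.0):
    """Harmonic energy 1/2 * k * sum (d_ij - d0_ij)^2 over all springs."""
    return 0.5 * k * sum(
        (math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in springs
    )


# Toy "native structure": four beads on a line, 3.8 apart (hypothetical values).
native = [(3.8 * i, 0.0, 0.0) for i in range(4)]
springs = build_network(native, cutoff=8.0)

# Move the last bead by 1.0 along x and evaluate the energy penalty.
displaced = native[:3] + [(native[3][0] + 1.0, 0.0, 0.0)]
print(len(springs), enm_energy(native, springs), enm_energy(displaced, springs))
```

By construction the native structure is the energy minimum, which is the structure-based bias mentioned above. The principal modes follow from diagonalizing the second-derivative matrix of this energy, a step omitted here.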
The Go model was developed by Taketomi et al. in Go's group, published in
1975 [382], and later improved and modified [383–385]. In this model a
protein is represented as a chain of beads, each of which represents one amino
acid. The protein structure is biased toward the native conformation by means of simple
attractive and repulsive non-bonded interactions between beads, represented by the
Lennard-Jones potential. Despite its simplicity, this approach was very successful in
reproducing several aspects of the thermodynamics and especially the kinetics of folding.
This is because an inherent feature of the original Go model is that the system
is minimally frustrated, so it can reproduce the folding process of many proteins. There
is a large variety of Go models with many modifications, e.g. the addition of
energy terms that decrease the frustration of the system.
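A Go-like nonbonded energy can be sketched as follows. This is one common variant and not necessarily the formulation of the cited papers: native contacts are defined by a distance cutoff, each native pair interacts through a 12-6 Lennard-Jones potential whose minimum sits at the native distance, and all other pairs are purely repulsive. The cutoff, `eps`, `sigma_rep` and the six-bead toy hairpin are assumptions made for the example.

```python
import math
from itertools import combinations


def native_contacts(native, cutoff=7.5, min_seq_sep=3):
    """Map (i, j) -> native distance for non-local pairs within `cutoff`."""
    contacts = {}
    for i, j in combinations(range(len(native)), 2):
        if j - i >= min_seq_sep:
            d0 = math.dist(native[i], native[j])
            if d0 <= cutoff:
                contacts[(i, j)] = d0
    return contacts


def go_nonbonded_energy(coords, contacts, eps=1.0, sigma_rep=4.0, min_seq_sep=3):
    """Attractive LJ for native contacts, soft repulsion for all other non-local pairs."""
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_seq_sep:
            continue  # local pairs are handled by bonded terms (not shown)
        r = math.dist(coords[i], coords[j])
        if (i, j) in contacts:
            sigma = contacts[(i, j)] / 2 ** (1 / 6)  # puts the LJ minimum at d0
            sr6 = (sigma / r) ** 6
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
        else:
            energy += eps * (sigma_rep / r) ** 12
    return energy


# Toy hairpin (hypothetical): two antiparallel strands of three beads each.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
stretched = [(3.8 * i, 0.0, 0.0) for i in range(6)]

contacts = native_contacts(hairpin)
e_native = go_nonbonded_energy(hairpin, contacts)
e_stretched = go_nonbonded_energy(stretched, contacts)
print(len(contacts), e_native, e_stretched)
```

Because only native pairs attract, the native conformation has the lowest nonbonded energy, which is what makes such a model minimally frustrated.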
A Go-like model was applied to investigate the pulling of a single
bacteriorhodopsin molecule out of the membrane [395]. First, an all-atom representation
of the bacteriorhodopsin-membrane system was generated. Second, a
Go-like model representation of the protein conformation was constructed. The
membrane was kept frozen and represented by the C atoms of the phospholipids.
Additionally, it was determined which of those carbon atoms form contacts in the starting
conformation. Those interactions were represented in the same way as the non-local
native interactions within the protein, namely by the Lennard-Jones potential. The
model qualitatively reproduced the experimentally observed
differences between force-extension patterns obtained for bacteriorhodopsin at
different temperatures. Moreover, an asymmetry was observed when pulling by different
termini. The authors also showed that the interactions of the protein with the
membrane play a decisive role in determining the force pattern and thus the stability of
transmembrane proteins.
A different approach to investigating a protein-membrane system with a Go model
was presented by Orlandini et al. [396]. The authors studied the insertion into a membrane
and the folding kinetics of a two-helix fragment of bacteriorhodopsin. The membrane
was introduced as a slab, i.e. a defined region of space. The native contacts
were divided into different classes depending on the location, with respect to the
membrane, of the residues forming a given contact. This model allowed for the
characterization of the thermodynamics and dynamics of the protein folding process.
The authors identified various intermediates and the free energy barriers between them,
and the folding process was predicted to involve many pathways with a dominant
folding channel.
Among the models belonging to the second category, the MARTINI force field,
initially developed for coarse-grained simulations of lipids [397–399], currently
receives the most attention. The MARTINI potential for proteins is mainly based
on physico-chemical modeling, with a weak bias to the native structure introduced
mostly through secondary structure constraints. The MARTINI force field was
constructed by extensive calibration of the coarse-grained force field for
peptide-bilayer systems against thermodynamic data, in particular oil/water
partitioning coefficients. In this model, four heavy atoms on average are represented by
one interaction site (bead), and water is represented in the same way. Each bead is
assigned to one of four main types: polar, nonpolar, apolar, or charged. Within each
type there are subtypes that describe the interacting sites in more detail
(e.g. hydrogen-bonding capabilities or degree of polarity). Beads i and j
interact with each other similarly to atoms in all-atom force fields. The nonbonded
potential involves the Lennard-Jones potential:
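$$V_{\text{Lennard-Jones}}(r_{ij}) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \tag{5}$$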
The energy parameter ε, which determines the depth of the potential well, depends on
the bead types and varies between 2.0 and 5.6 kJ/mol. All particles have the effective
size σ = 0.47 nm, apart from the beads comprising ring-like molecules (σ =
0.43 nm). Electrostatic interactions between charged beads are incorporated via the
Coulombic potential with an appropriately adjusted dielectric constant (ε_rel = 15):
$$V_{\text{electrostatic}} = \frac{q_i q_j}{4\pi\varepsilon_0\varepsilon_{rel}\, r_{ij}} \tag{6}$$
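Equations (5) and (6) can be evaluated directly. The sketch below uses the parameter values quoted in the text (σ = 0.47 nm, ε up to 5.6 kJ/mol, ε_rel = 15). The constant `COULOMB_CONSTANT` is the value of 1/(4πε0) in kJ mol^-1 nm e^-2 and is our addition, as are the function names. Cutoffs and shift functions, which production implementations typically apply, are omitted.

```python
COULOMB_CONSTANT = 138.935458  # 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (assumed unit system)


def lennard_jones(r: float, epsilon: float, sigma: float = 0.47) -> float:
    """Eq. (5): r and sigma in nm, epsilon and result in kJ/mol."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r: float, qi: float, qj: float, eps_rel: float = 15.0) -> float:
    """Eq. (6): r in nm, charges in units of e, result in kJ/mol."""
    return COULOMB_CONSTANT * qi * qj / (eps_rel * r)


# The LJ minimum lies at r = 2^(1/6) * sigma, where V equals -epsilon.
r_min = 2 ** (1 / 6) * 0.47
print(r_min, lennard_jones(r_min, epsilon=5.6))

# Two oppositely charged beads at contact distance.
print(coulomb(0.47, +1.0, -1.0))
```

The example shows that, with the screened dielectric constant, a pair of unit charges at contact interacts on a scale comparable to a few of the strongest Lennard-Jones contacts.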
Bonded interactions are used for chemically bonded sites, to represent chain
stiffness, and to impose the secondary structure of the peptide backbone. The potential
energy functions for bonded sites i, j, k and l with the equilibrium distance d_b, angle
ϕ_a and dihedral angles ψ_d and ψ_id have the following forms:
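$$V_b = \frac{1}{2} K_b \left(d_{ij} - d_b\right)^2 \tag{7}$$

$$V_a = \frac{1}{2} K_a \left[\cos(\phi_{ijk}) - \cos(\phi_a)\right]^2 \tag{8}$$

$$V_d = K_d \left[1 + \cos(n\psi_{ijkl} - \psi_d)\right] \tag{9}$$

$$V_{id} = K_{id} \left(\psi_{ijkl} - \psi_{id}\right)^2 \tag{10}$$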
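The four bonded terms, Eqs. (7) to (10), translate directly into code. The sketch below is a plain transcription of the formulas; the function names are ours and the numerical force constants in the example calls are illustrative placeholders rather than parameters of any published force field. Angles are in radians.

```python
import math


def bond_energy(d: float, d_b: float, k_b: float) -> float:
    """Eq. (7): harmonic bond between sites i and j."""
    return 0.5 * k_b * (d - d_b) ** 2


def angle_energy(phi: float, phi_a: float, k_a: float) -> float:
    """Eq. (8): cosine-harmonic angle for sites i, j, k."""
    return 0.5 * k_a * (math.cos(phi) - math.cos(phi_a)) ** 2


def dihedral_energy(psi: float, psi_d: float, k_d: float, n: int = 1) -> float:
    """Eq. (9): periodic proper dihedral with multiplicity n."""
    return k_d * (1.0 + math.cos(n * psi - psi_d))


def improper_energy(psi: float, psi_id: float, k_id: float) -> float:
    """Eq. (10): harmonic improper dihedral."""
    return k_id * (psi - psi_id) ** 2


# Illustrative evaluation with placeholder parameters.
print(bond_energy(0.57, d_b=0.47, k_b=1250.0))
print(angle_energy(math.radians(150.0), phi_a=math.pi, k_a=25.0))
print(dihedral_energy(0.0, psi_d=math.pi, k_d=10.0))
print(improper_energy(0.1, psi_id=0.0, k_id=50.0))
```

Note that the angle term is harmonic in the cosine of the angle rather than in the angle itself, so it stays well behaved when the equilibrium angle approaches 180 degrees.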
Fig. 8 The MARTINI coarse-graining procedure for membrane components, amino acids and
solvents. New types of potential for grains are specified. Image taken with permission from
http://md.chem.rug.nl/cgmartini/
mechanism is general and likely to be relevant for protein sorting, also in vivo. In
another example [402], systems with up to 16 rhodopsin molecules at a protein-to-lipid
ratio of 1:100 were simulated for time scales of up to 8 microseconds. The
results obtained for four different phospholipid environments showed that localized
adaptation of the membrane bilayer to the presence of receptors is reproducibly most
pronounced near transmembrane helices 2, 4, and 7 of rhodopsin. This local
membrane deformation appears to be a key factor defining the rate, extent, and
orientation preference of the protein-protein association. Other models of
protein-membrane systems have also been based on the methods derived for lipids by
Marrink et al. [397]; for example, Bond and Sansom [403] explored the interactions of a
phospholipid bilayer with the voltage sensor domain and the S4 helix from the
archaebacterial voltage-gated potassium channel (KvAP).
A simplified version of MARTINI was presented in [404]. The authors proposed an
implicit-solvent version called Dry-MARTINI, in which the solvation effect is
introduced only by adjusting the strength of the existing pairwise Lennard-Jones
interactions so as to retain the hydrophobic/hydrophilic behavior of molecules in
standard MARTINI. As a consequence, some bonded parameters were also adapted to
preserve the equilibrium values in the studied lipid molecules. The reparametrized model
reproduces the main features of lipid systems observed in standard (wet) MARTINI.
However, Dry-MARTINI does not mimic the aqueous phase realistically enough, which has an
impact on protein interactions in solvent. All nonbonded interactions are attractive
(Lennard-Jones potential), so simulations of soluble proteins would in general lead to
global aggregation of the molecules. The authors, however, suggest
modifications needed to solve this problem in the future. Moreover, more systematic
testing of peptide-lipid systems is required before applying Dry-MARTINI to study
membrane protein systems.
Shih et al. [405] from Schulten’s group proposed a model that was applied to
simulations of discoidal high-density lipoprotein particles. That model, although based
on the original MARTINI approach [397], differs from the MARTINI protein extension.
Here, each amino acid is represented by only two beads (apart from glycine). The
types of the amino acid side chains were previously defined in the lipid MARTINI
force field. Microsecond simulations of lipoprotein assembly showed that the
overall structural features of high-density lipoproteins were reproduced accurately and
revealed the formation of a protein-lipid complex.
As mentioned above, the MARTINI-like approach imposes a priori
knowledge of the secondary structure on the model. Spijker et al. [406] introduced
a force field that does not incorporate secondary structure information.
This model is an extension of the lipid-water model by Markvoort et al. [407]. Each
amino acid is represented by two sites (one for the backbone and one for the side
chain). For the protein backbone the authors do not use an angle potential in the
harmonic form (as the V_a potential in MARTINI); instead, it is represented by a
double-well potential in the form of a fourth-order polynomial, whose parameters were
derived from MD simulations of two membrane proteins. Torsion terms, described by the
dihedral (V_d) and improper (V_id) potentials in MARTINI, are not present in this model.
Their role of stabilizing the secondary structure of the protein is played by an additional
non-bonded interaction, which mimics the formation of a hydrogen bond between
the i-th and (i + 4)-th backbone beads. The H-bond contribution has the following
form:
where μ_ij is the location of the H-bond minimum, κ_ij determines the width of the
H-bond well, and η_ij represents the depth of the H-bond minimum. The authors
used the model in simulations of WALP peptides of different lengths immersed in
lipid membranes of different thickness. The results showed that, as long as it is
possible, the membrane adapts to the TM helix length. When the membrane thickness
cannot be increased further, the peptides tilt with respect to the membrane normal. These
events are not observed simultaneously but sequentially.
Another coarse-graining approach combines a reduced
protein representation with a fully implicit membrane model. One
example is PRIMO-M [408], which is an extension of PRIMO (PRotein Intermediate
Model) for soluble proteins [409]. To mimic an environment with two phases, the authors
applied the heterogeneous dielectric generalized Born methodology. The PRIMO energy
function consists of standard molecular dynamics energy terms with an additional
hydrogen-bonding potential term. The backbone is represented by N, Cα, and a
combined carbonyl site (CO). The detailed backbone representation, coupled with the
preservation of hydrogen bonding, allows an accurate description of the secondary
structure of proteins. Each non-glycine side chain is represented by another CG site.
The PRIMO-M model reproduces such phenomena as the water-to-membrane free
energy of insertion for amino acids and the tilt angles of simulated transmembrane
peptides. This force field also provides trajectories of membrane proteins with calculated
B-factors in agreement with experiment.
Recently, the PRIMO and PRIMO-M models were combined with an all-atom force
field (CHARMM36) within an all-atom/coarse-grained scheme, in a preliminary attempt to
build a hybrid model with the solvent environment treated at the continuum level via the
generalized Born with molecular volume method [410].
The force fields that are commonly used for simulations of the coarse-grained
membrane protein systems are summarized in Table 8.
Due to their large computational requirements and poor scaling, quantum chemistry
(QM) methods are usually not suitable for describing membrane proteins, or
proteins in general. Quantum chemistry is based on converging to the exact solution of
the electronic Schrödinger equation and, while it usually gives very good accuracy, it
is simply not possible to solve this equation for a system of the size of a protein. Still,
QM methods can be very useful and are commonly used for various tasks in
computational biology/chemistry of proteins; selected examples of such QM treatment
The presence and importance of retinal for the activation and action of rhodopsin had
been known for many years before the X-ray structure of this system was obtained in
2000 [339]. Before that date several computational studies were performed to better
understand the chemistry of retinal and the energetics of the cis-to-trans transition. In
1996 Terstegen and Buss performed Hartree-Fock (HF) calculations on three different
retinal conformers with different protonation states of the N-methyl Schiff base
using the standard 6-31G** basis set [412]. They showed very good agreement
with the experimental data and noticed that protonation is accompanied by the loss
of double-bond fixation. In follow-up articles the authors estimated the energy
minima and transition states of various retinal conformers [413] and also performed
ab initio molecular dynamics [414]. According to their calculations the rotational
barriers around the relevant dihedral angles were in the range of 2–5 kcal/mol and the
ring inversion barriers in the range of 5–6 kcal/mol, making the whole system labile. Some
of these calculations were repeated in the following year using the density functional
B3LYP method, which gave an improved description of the retinal conformational
space [415]. Another approach to retinal analysis was presented in a series of
papers by Bifone et al. [416]. They performed Car-Parrinello ab initio molecular
dynamics (CPMD) simulations (using the DFT local density approximation) of all-trans
and 13-cis retinal molecules and showed good agreement with experimental data for the
structure and vibrational modes of this molecule. In all these calculations the protein
part of the system was not included due to computational limitations. In the same year the
first simple model of the rhodopsin chromophore was built based on available NMR
data [417]. Using this very simple model, which included the retinal molecule, a chloride
ion placed in the position of Glu113, and a CH2–CH3 group mimicking the linkage
of the chromophore to Lys296, the authors observed a coherent propagation of a
conjugation defect, which was associated with charge transport along the chromophore
backbone. A year later the same model and approach were used to study the energy storage
mechanism in bathorhodopsin [418]. In the final paper in this series La Penna et al.
used CPMD simulations with an additional external force to obtain information about
the transition state of the 11-cis to all-trans isomerization [419].
Around the same time a series of studies by Garavelli et al. explained the
mechanism of retinal photoisomerization using accurate MC-SCF or CASSCF methods,
though without the protein environment [420]. This group later
continued the research on the photoisomerization of conjugated and protonated imines,
modelling the retinal protonated Schiff base chromophore, using increasingly
sophisticated computational approaches such as multireference configuration interaction
with single and double excitations, multireference second-order perturbation
theory, time-dependent DFT methods and equation-of-motion coupled-cluster methods
[421].
The solution of the first crystal structures of membrane proteins gave rise to a much
more detailed description of the ligand binding sites and to much improved
calculations. In a classic paper from 2002 Sugihara et al. [422] used the
self-consistent-charge density functional based tight-binding (SCC-DFTB) method [423] to
study the retinal binding site, which included the retinal molecule and 27 amino acid
moieties. Using structure optimization and MD simulations they were able to investigate
the influence of the protein pocket on the structure of the ligand. They showed that both
the 6-s-cis and 6-s-trans conformations of retinal are tolerated by the binding pocket, and
that the pocket forces the ligand to adopt a slightly distorted conformation.
In the following years similar studies were performed on rhodopsin, using
various sets of residues from the binding site and various computational methods. To
study rhodopsin chromophore excitation, Hufen et al. [424] applied high-level DFT and
ab initio CASSCF/CASPT2 approaches to a model of the binding pocket including
the ligand, two amino acid residues and a water molecule. They obtained good
agreement with the experimental data on the electric dipole moment of the
chromophore upon excitation and showed the importance of using correlated theoretical
methods for a proper description of the protonated Schiff base. The excitation energies of
the protonated Schiff base of retinal were also studied by means of the time-dependent
DFT (TD-DFT) method using a model of the binding site consisting of 23 amino acid
residues and five water molecules, showing good agreement with the experimental
spectral data [425]. In another paper, Sugihara et al. explored the importance of
several counterions in the binding pocket for the stability of the chromophore using a DFT
approach [426]. The performance of various ab initio methods in the description of
retinal was summarized a year later by Blomgren and Larsson [427].
The fast development of new computational methods led to a new set of
publications in which the whole protein was taken into account. This was possible due to
a two-layer description of the system, in which the binding site is simulated using
QM approaches and the rest of the protein is simulated using molecular mechanics
(MM) methods [428]. One such QM/MM method is ONIOM [429], which
was first applied to the rhodopsin system in 2004 by Gascon and Batista [430]. In
this study rhodopsin was divided into an inner layer, consisting of retinal and a part
of Lys296 and treated with the B3LYP/6-31G* and TD-B3LYP/6-31G* methods,
and the rest of the protein, which was simulated using classical MM with the AMBER
force field. The authors of this study obtained very accurate storage energies and
electronic excitation energies for the chromophore, in very good agreement with the
experimental data. A follow-up article using the same method also showed the strength of
the gauge-independent atomic orbital (GIAO) method by predicting the NMR
spectrum of the rhodopsin chromophore [431]. Similar QM/MM studies are now routinely
performed for membrane proteins of similar size [432–435] and allow for a precise
description of the pharmacophore interacting with the whole protein, which may be
additionally embedded in the membrane and/or solvent. Some of the approaches used
to study the rhodopsin chromophore are summarized in Fig. 9.
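The two-layer idea can be made concrete with the standard subtractive ONIOM energy expression, in which the MM energy of the whole (real) system is corrected by the difference between the QM and MM energies of the inner (model) layer. The sketch below only combines three precomputed energies; the function name and the numerical values are hypothetical, and the treatment of the layer boundary (link atoms) and of electrostatic embedding is not shown.

```python
def oniom2_energy(e_qm_model: float, e_mm_real: float, e_mm_model: float) -> float:
    """Two-layer subtractive scheme: E = E_QM(model) + E_MM(real) - E_MM(model)."""
    return e_qm_model + e_mm_real - e_mm_model


# Hypothetical energies in arbitrary but consistent units.
total = oniom2_energy(e_qm_model=-10.0, e_mm_real=-250.0, e_mm_model=-4.0)
print(total)
```

Subtracting the MM energy of the model region avoids counting the inner layer twice, once at the QM level and once as part of the MM description of the whole protein.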
In recent years the rise of computational power has made it possible to replace
TD-DFT methods with much more accurate CASSCF and CASPT2 schemes in the QM/MM
description of rhodopsins [436]. A thorough description of the history and most recent
advances in the simulation of double-bond isomerization of biological chromophores is
available in a recent review by Gozem et al. [437].
Fig. 9 Evolution of the structural models of retinal binding site in rhodopsin used in classical
and hybrid quantum chemical calculations. a A model of retinal chromophore [412]. b The model
including part of Lys296 [417]. c The model of chromophore and two amino acids to study excitation
[424]. d The model used for study of counterions in retinal binding pocket [426]. e An extended
retinal binding site model including 27 amino acids [422]. f All-atom rhodopsin model used in
QM/MM approach [430]
The linear-scaling MOZYME approach has been used in several studies of membrane
proteins. In 2001 Ren et al. [444] studied microbial sensory rhodopsin II and
optimized the chromophore within its binding site using MOZYME, which allowed
them to identify, using other semiempirical methods, the principal mechanism and
residues responsible for the spectral blue shift in this protein. They showed that their
calculations can reproduce well the experimental facts of formation of Schiff bases at
various residues. In a study from 2006 Lee et al. [445] used this computational
approach to obtain an all-atom model of a bacteriorhodopsin mutant and the
electrostatic difference map of the whole protein. A recent study by the author of the
PM6 method describes in detail its strengths and disadvantages in protein modeling
[446]. The MOZYME approach can also be combined with other computational methods
within the ONIOM framework; the most commonly used implementation, combining
MOZYME with DFT, was developed in 2001 by Ohno et al. and used for pKa
prediction of various proteins [447], including membrane proteins [448].
A second area of biological systems calculations where QM is very important is
the determination of molecular interaction potentials and, more specifically, the
determination of partial charges of ligands. Many membrane proteins interact with
various ligands and form complexes, e.g. drug-receptor systems. The binding of a
ligand occurs via a recognition process at relatively large distances, and the electrostatic
field surrounding each molecule (as well as other molecular features such as
polarizability and hydrophobicity) plays an important role in this process. Also, molecular
docking simulations usually need a proper parametrization of ligands, including
partial charges. In most computational approaches the electron distribution in molecules
is mimicked by a set of partial charges assigned to each atom/nucleus center of the system.
For amino acids these partial charges are usually parametrized in each force field
to reproduce a large range of experimental data and are rarely changed. If one wants to
consider a protein complex with a ligand, a set of partial charges has to be calculated,
and this is usually a task for QM methods.
Charge densities can be obtained from wavefunctions using very different
procedures; a comparison of different schemes is also available [449]. Traditionally,
Mulliken population analysis has been the most widely used method for determining
atomic charges, though it gives unnatural values in a number of cases and depends
strongly on the basis set used [450, 451]. The ESP method, which is also commonly
used, derives partial charges by fitting the molecular electrostatic potential available
from calculations or crystallographic data [452]. Most of these methods give
reasonable results even with moderate-size basis sets. In some cases it is
advisable to validate the calculated partial charges by deriving a theoretical dipole
moment and comparing it to the experimental one, which is usually easy to obtain or
find in the literature. A recent example of a force-field improvement that is important
from the membrane protein point of view is an advanced parametrization of the
tyrosine-choline cation-π interaction, based on a very accurate symmetry-adapted
perturbation theory potential energy surface [453].
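The dipole-moment check mentioned above is straightforward to carry out for a set of point charges. The sketch below computes the dipole of a neutral molecule from partial charges and coordinates and converts it to debye. The water geometry and the TIP3P-style charges used as input, as well as the gas-phase experimental value of about 1.85 D quoted in the comment, are our illustrative choices and do not come from the chapter.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye


def dipole_moment(charges, coords):
    """Magnitude of sum(q_i * r_i) in debye; charges in e, coordinates in Angstrom.

    For a neutral molecule the result does not depend on the choice of origin.
    """
    mu = [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]
    return math.sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE


# Water with an O-H distance of 0.9572 Angstrom and an H-O-H angle of 104.52 degrees.
half_angle = math.radians(104.52 / 2.0)
r_oh = 0.9572
water_coords = [
    (0.0, 0.0, 0.0),                                                   # O
    (r_oh * math.sin(half_angle), r_oh * math.cos(half_angle), 0.0),   # H1
    (-r_oh * math.sin(half_angle), r_oh * math.cos(half_angle), 0.0),  # H2
]
water_charges = [-0.834, 0.417, 0.417]  # TIP3P-style point charges (assumed input)

mu_model = dipole_moment(water_charges, water_coords)
print(round(mu_model, 2))  # compare with the gas-phase experimental value of about 1.85 D
```

In this example the model dipole is noticeably larger than the gas-phase experimental value. Charges intended for condensed-phase simulations are often enhanced on purpose to mimic polarization, so the comparison should be made with the target environment in mind.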
The previously mentioned MOZYME method may also be used to facilitate
protein-ligand docking. One of the most commonly used docking programs,
AutoDock, uses simple Gasteiger partial charges for both protein and ligand, which
in some cases leads to a poor description of the complex [454]. It has been shown that
the accuracy of AutoDock docking may be enhanced by using MOZYME-derived
partial charges [455]. In another study, from 2010, Fanfrlik et al. [456] used the
corrected PM6-DH2 method of MOZYME combined with AMBER interaction entropy
and SMD deformation and desolvation energies of the ligand to construct a fast and
reliable docking scheme. They showed a dramatic improvement over standard
DOCK results, which were not able to distinguish between binders and non-binders.
Finally, there are a number of problems in studying membrane proteins where the
use of the QM/MM approach is indispensable, or at least desirable, for an accurate
description of the mechanistic features of the system. The first example is any redox
system, where the QM part is needed for the elucidation of the electron transfer mechanism,
as in the previously described rhodopsins. A recent example of such an approach is a
B3LYP/CHARMM investigation of respiratory complex I, a redox-driven
proton pump activated by the reduction of a quinone molecule [457]. Results obtained
from the study, which involved more than 800,000 atoms, revealed that the initial
activation steps involve a charge imbalance arising from quinone reduction in the
soluble domain, leading to a local proton-coupled electron transfer process in the
quinone-binding site, and that the effect of the excess charge is transmitted by concerted
side-chain reorientations of charged residues at the interface of the soluble and
membrane domains. The second problem is the accurate description of ion selectivity in
ion channels, an important group of membrane proteins. While the mechanisms of ion
conductance and channel gating can be, and have been, extensively studied in detail
with classical MD approaches [458, 459], the proper description of ion selectivity can
be a challenging problem due to the relative simplicity of the force-field-based
description of ions. To overcome this challenge Sadhu et al. [460] used a DFT approach to
obtain accurate free binding energies of Na+, K+ and Cs+ ions at different, well-defined
ion-chelating sites of the NaK channel, which, combined with an MD approach, gave a
more realistic description of channel permeabilities.
References
1. Chou, K.C., Elrod, D.W.: Prediction of membrane protein types and subcellular locations.
Proteins 34(1), 137–153 (1999)
2. White, S.H., Snaider, C.: http://blanco.biomol.uci.edu/mpstruc/listAll/list
3. Lomize, M.A., Pogozheva, I.D., Joo, H., Mosberg, H.I., Lomize, A.L.: OPM database and
PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res.
40(Database issue), D370–376 (2012). https://doi.org/10.1093/nar/gkr703
4. Jayasinghe, S., Hristova, K., White, S.H.: MPtopo: a database of membrane protein topology.
Protein Sci. 10(2), 455–458 (2001). https://doi.org/10.1110/ps.43501
5. Tusnady, G.E., Dosztanyi, Z., Simon, I.: PDB_TM: selection and membrane localization of
transmembrane proteins in the protein data bank. Nucleic Acids Res. 33(Database issue),
D275–278 (2005). https://doi.org/10.1093/nar/gki002
6. Kozma, D., Simon, I., Tusnady, G.E.: PDBTM: protein data bank of transmembrane proteins
after 8 years. Nucleic Acids Res. 41(Database issue), D524–529 (2013). https://doi.org/10.
1093/nar/gks1169
7. Raman, P., Cherezov, V., Caffrey, M.: The membrane protein data bank. Cell. Mol. Life Sci.
63(1), 36–51 (2006). https://doi.org/10.1007/s00018-005-5350-6
8. Kazius, J., Wurdinger, K., van Iterson, M., Kok, J., Back, T., Ijzerman, A.P.: GPCR NaVa
database: natural variants in human G protein-coupled receptors. Hum. Mutat. 29(1), 39–44
(2008). https://doi.org/10.1002/humu.20638
9. Okuno, Y., Tamon, A., Yabuuchi, H., Niijima, S., Minowa, Y., Tonomura, K., Kunimoto, R.,
Feng, C.: GLIDA: GPCR—ligand database for chemical genomics drug discovery–database
and tools update. Nucleic Acids Res. 36(Database issue), D907–912 (2008). https://doi.org/
10.1093/nar/gkm948
10. Zhang, J., Zhang, Y.: GPCRRD: G protein-coupled receptor spatial restraint database for
3D structure modeling and function annotation. Bioinformatics 26(23), 3004–3005 (2010).
https://doi.org/10.1093/bioinformatics/btq563
11. Tsirigos, K.D., Bagos, P.G., Hamodrakas, S.J.: OMPdb: a database of beta-barrel outer
membrane proteins from Gram-negative bacteria. Nucleic Acids Res. 39(Database issue),
D324–331 (2011). https://doi.org/10.1093/nar/gkq863
12. Vroling, B., Sanders, M., Baakman, C., Borrmann, A., Verhoeven, S., Klomp, J., Oliveira,
L., de Vlieg, J., Vriend, G.: GPCRDB: information system for G protein-coupled recep-
tors. Nucleic Acids Res. 39(Database issue), D309–319 (2011). https://doi.org/10.1093/nar/
gkq1009
13. Isberg, V., Mordalski, S., Munk, C., Rataj, K., Harpsoe, K., Hauser, A.S., Vroling, B., Bojarski,
A.J., Vriend, G., Gloriam, D.E.: GPCRdb: an information system for G protein-coupled recep-
tors. Nucleic Acids Res. 44(D1), D356–D364 (2016). https://doi.org/10.1093/nar/gkv1178
14. Pandy-Szekeres, G., Munk, C., Tsonkov, T.M., Mordalski, S., Harpsoe, K., Hauser, A.S.,
Bojarski, A.J., Gloriam, D.E.: GPCRdb in 2018: adding GPCR structure models and ligands.
Nucleic Acids Res. 46(D1), D440–D446 (2018). https://doi.org/10.1093/nar/gkx1109
15. Worth, C.L., Kreuchwig, A., Kleinau, G., Krause, G.: GPCR-SSFE: a comprehensive database
of G-protein-coupled receptor template predictions and homology models. BMC Bioinform.
12, 185 (2011). https://doi.org/10.1186/1471-2105-12-185
16. Worth, C.L., Kreuchwig, F., Tiemann, J.K.S., Kreuchwig, A., Ritschel, M., Kleinau, G.,
Hildebrand, P.W., Krause, G.: GPCR-SSFE 2.0-a fragment-based molecular modeling web
tool for Class A G-protein coupled receptors. Nucleic Acids Res. (2017).
https://doi.org/10.1093/nar/gkx399
17. Sharman, J.L., Mpamhanga, C.P., Spedding, M., Germain, P., Staels, B., Dacquet, C., Laudet,
V., Harmar, A.J.: IUPHAR-DB: new receptors and tools for easy searching and visualization
of pharmacological data. Nucleic Acids Res. 39(Database issue), D534–538 (2011).
https://doi.org/10.1093/nar/gkq1062
18. Harding, S.D., Sharman, J.L., Faccenda, E., Southan, C., Pawson, A.J., Ireland, S., Gray,
A.J.G., Bruce, L., Alexander, S.P.H., Anderton, S., Bryant, C., Davenport, A.P., Doerig,
C., Fabbro, D., Levi-Schaffer, F., Spedding, M., Davies, J.A., NC-IUPHAR: The IUPHAR/BPS
guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide
to IMMUNOPHARMACOLOGY. Nucleic Acids Res. (2017). https://doi.org/10.1093/nar/gkx1121
19. Saier, M.H., Jr., Yen, M.R., Noto, K., Tamang, D.G., Elkan, C.: The transporter classification
database: recent advances. Nucleic Acids Res. 37(Database issue), D274–278 (2009).
https://doi.org/10.1093/nar/gkn862
20. Saier, M.H., Jr., Reddy, V.S., Tamang, D.G., Vastermark, A.: The transporter classification
database. Nucleic Acids Res. 42(Database issue), D251–258 (2014).
https://doi.org/10.1093/nar/gkt1097
21. Neumann, S., Fuchs, A., Mulkidjanian, A., Frishman, D.: Current status of membrane protein
structure classification. Proteins 78(7), 1760–1773 (2010). https://doi.org/10.1002/prot.22692
22. Bernsel, A., Viklund, H., Elofsson, A.: Remote homology detection of integral membrane
proteins using conserved sequence features. Proteins 71(3), 1387–1399 (2008).
https://doi.org/10.1002/prot.21825
23. Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S.,
Pagni, M., Sigrist, C.J.: The PROSITE database. Nucleic Acids Res. 34(Database issue),
D227–230 (2006). https://doi.org/10.1093/nar/gkj063
24. Tusnady, G.E., Kalmar, L., Hegyi, H., Tompa, P., Simon, I.: TOPDOM: database of domains
and motifs with conservative location in transmembrane proteins. Bioinformatics 24(12),
1469–1470 (2008). https://doi.org/10.1093/bioinformatics/btn202
25. Senes, A., Engel, D.E., DeGrado, W.F.: Folding of helical membrane proteins: the role of polar,
GxxxG-like and proline motifs. Curr. Opin. Struct. Biol. 14(4), 465–479 (2004).
https://doi.org/10.1016/j.sbi.2004.07.007
26. Shen, H.B., Yang, J., Chou, K.C.: Fuzzy KNN for predicting membrane protein types from
pseudo-amino acid composition. J. Theor. Biol. 240(1), 9–13 (2006).
https://doi.org/10.1016/j.jtbi.2005.08.016
27. Cai, Y.D., Ricardo, P.W., Jen, C.H., Chou, K.C.: Application of SVM to predict membrane
protein types. J. Theor. Biol. 226(4), 373–376 (2004).
https://doi.org/10.1016/j.jtbi.2003.08.015
28. Wang, S.-Q., Yang, J., Chou, K.-C.: Using stacked generalization to predict membrane protein
types based on pseudo-amino acid composition. J. Theor. Biol. 242(4), 941–946 (2006).
https://doi.org/10.1016/j.jtbi.2006.05.006
29. Cedano, J., Aloy, P., Perez-Pons, J.A., Querol, E.: Relation between amino acid composition
and cellular location of proteins. J. Mol. Biol. 266(3), 594–600 (1997).
https://doi.org/10.1006/jmbi.1996.0804
30. Kyte, J., Doolittle, R.F.: A simple method for displaying the hydropathic character of a protein.
J. Mol. Biol. 157(1), 105–132 (1982)
31. Steitz, T.A., Goldman, A., Engelman, D.M.: Quantitative application of the helical hairpin
hypothesis to membrane proteins. Biophys. J. 37(1), 124–125 (1982)
32. Engelman, D.M., Steitz, T.A.: The spontaneous insertion of proteins into and across
membranes: the helical hairpin hypothesis. Cell 23(2), 411–422 (1981)
33. Hedin, L.E., Illergard, K., Elofsson, A.: An introduction to membrane proteins. J. Proteome
Res. 10(8), 3324–3331 (2011). https://doi.org/10.1021/pr200145a
34. Elofsson, A., von Heijne, G.: Membrane protein structure: prediction versus reality. Annu. Rev.
Biochem. 76, 125–140 (2007). https://doi.org/10.1146/annurev.biochem.76.052705.163539
35. Bernsel, A., Viklund, H., Falk, J., Lindahl, E., von Heijne, G., Elofsson, A.: Prediction of
membrane-protein topology from first principles. Proc. Natl. Acad. Sci. U.S.A. 105(20),
7177–7181 (2008)
36. Attwood, T.K., Findlay, J.B.: Fingerprinting G-protein-coupled receptors. Protein Eng. 7(2),
195–203 (1994)
37. Fredriksson, R., Lagerström, M.C., Lundin, L.-G., Schiöth, H.B.: The G-protein-coupled
receptors in the human genome form five main families. Phylogenetic analysis, paralogon
groups, and fingerprints. Mol. Pharmacol. 63(6), 1256–1272 (2003).
https://doi.org/10.1124/mol.63.6.1256
38. Otaki, J.M., Mori, A., Itoh, Y., Nakayama, T., Yamamoto, H.: Alignment-free
classification of G-protein-coupled receptors using self-organizing maps. J. Chem. Inf. Model. 46(3),
1479–1490 (2006). https://doi.org/10.1021/ci050382y
39. Deville, J., Rey, J., Chabbert, M.: An indel in transmembrane helix 2 helps to trace the
molecular evolution of class A G-protein-coupled receptors. J. Mol. Evol. 68(5), 475–489
(2009)
40. Surgand, J.S., Rodrigo, J., Kellenberger, E., Rognan, D.: A chemogenomic analysis of
the transmembrane binding cavity of human G-protein-coupled receptors. Proteins 62(2),
509–538 (2006)
41. Pele, J., Abdi, H., Moreau, M., Thybert, D., Chabbert, M.: Multidimensional scaling reveals
the main evolutionary pathways of class A G-protein-coupled receptors. PLoS ONE 6(4),
e19094 (2011)
42. Lu, G., Wang, Z., Jones, A.M., Moriyama, E.N.: 7TMRmine: a Web server for hierarchical
mining of 7TMR proteins. BMC Genom. 10, 275 (2009).
https://doi.org/10.1186/1471-2164-10-275
43. Park, K.-J., Gromiha, M.M., Horton, P., Suwa, M.: Discrimination of outer membrane proteins
using support vector machines. Bioinformatics 21(23), 4223–4229 (2005).
https://doi.org/10.1093/bioinformatics/bti697
44. Gromiha, M.M., Suwa, M.: Discrimination of outer membrane proteins using machine
learning algorithms. Proteins 63(4), 1031–1037 (2006). https://doi.org/10.1002/prot.20929
45. Gromiha, M.M., Ahmad, S., Suwa, M.: Neural network-based prediction of transmembrane
beta-strand segments in outer membrane proteins. J. Comput. Chem. 25(5), 762–767 (2004).
https://doi.org/10.1002/jcc.10386
46. Martelli, P.L., Fariselli, P., Krogh, A., Casadio, R.: A sequence-profile-based HMM for
predicting and discriminating beta barrel membrane proteins. Bioinformatics 18(Suppl 1),
S46–S53 (2002)
47. Remmert, M., Linke, D., Lupas, A.N., Soding, J.: HHomp–prediction and classification of
outer membrane proteins. Nucleic Acids Res. 37(Web Server issue), W446–451 (2009).
https://doi.org/10.1093/nar/gkp325
48. Garrow, A.G., Agnew, A., Westhead, D.R.: TMB-Hunt: an amino acid composition based
method to screen proteomes for beta-barrel transmembrane proteins. BMC Bioinform. 6, 56
(2005). https://doi.org/10.1186/1471-2105-6-56
49. Gromiha, M.M., Ahmad, S., Suwa, M.: Application of residue distribution along the sequence
for discriminating outer membrane proteins. Comput. Biol. Chem. 29(2), 135–142 (2005).
https://doi.org/10.1016/j.compbiolchem.2005.02.006
50. Yan, R.-X., Chen, Z., Zhang, Z.: Outer membrane proteins can be simply identified using
secondary structure element alignment. BMC Bioinform. 12(1), 76 (2011)
51. Berven, F.S., Flikka, K., Jensen, H.B., Eidhammer, I.: BOMP: a program to predict integral
β-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria. Nucleic
Acids Res. 32(suppl 2), W394–W399 (2004). https://doi.org/10.1093/nar/gkh351
52. Freeman, T.C., Wimley, W.C.: A highly accurate statistical approach for the prediction of
transmembrane β-barrels. Bioinformatics 26(16), 1965–1974 (2010).
https://doi.org/10.1093/bioinformatics/btq308
53. van Geest, M., Lolkema, J.S.: Membrane topology and insertion of membrane proteins: search
for topogenic signals. Microbiol. Mol. Biol. Rev. 64(1), 13–33 (2000).
https://doi.org/10.1128/mmbr.64.1.13-33.2000
54. Fu, D., Libson, A., Miercke, L.J., Weitzman, C., Nollert, P., Krucinski, J., Stroud, R.M.:
Structure of a glycerol-conducting channel and the basis for its selectivity. Science 290(5491),
481–486 (2000)
55. Bendtsen, J.D., Nielsen, H., von Heijne, G., Brunak, S.: Improved prediction of signal peptides:
SignalP 3.0. J. Mol. Biol. 340(4), 783–795 (2004). https://doi.org/10.1016/j.jmb.2004.05.028
56. Emanuelsson, O., Brunak, S., von Heijne, G., Nielsen, H.: Locating proteins in the cell using
TargetP, SignalP and related tools. Nat. Protoc. 2(4), 953–971 (2007).
https://doi.org/10.1038/nprot.2007.131
57. Kall, L., Krogh, A., Sonnhammer, E.L.: An HMM posterior decoder for sequence feature
prediction that includes homology information. Bioinformatics 21(Suppl 1), i251–i257 (2005).
https://doi.org/10.1093/bioinformatics/bti1014
58. Kall, L., Krogh, A., Sonnhammer, E.L.: Advantages of combined transmembrane topology
and signal peptide prediction—the Phobius web server. Nucleic Acids Res. 35(Web Server
issue), W429–432 (2007). https://doi.org/10.1093/nar/gkm256
59. Viklund, H., Granseth, E., Elofsson, A.: Structural classification and prediction of reentrant
regions in alpha-helical transmembrane proteins: application to complete genomes. J. Mol.
Biol. 361(3), 591–603 (2006). https://doi.org/10.1016/j.jmb.2006.06.037
60. Viklund, H., Elofsson, A.: OCTOPUS: improving topology prediction by two-track
ANN-based preference scores and an extended topological grammar. Bioinformatics 24(15),
1662–1668 (2008). https://doi.org/10.1093/bioinformatics/btn221
61. von Heijne, G.: Membrane protein structure prediction: hydrophobicity analysis and the
positive-inside rule. J. Mol. Biol. 225(2), 487–494 (1992).
https://doi.org/10.1016/0022-2836(92)90934-c
62. Engelman, D.M., Zaccai, G.: Bacteriorhodopsin is an inside-out protein. Proc. Natl. Acad.
Sci. U.S.A. 77(10), 5894–5898 (1980)
63. Stevens, T.J., Arkin, I.T.: Turning an opinion inside-out: Rees and Eisenberg’s commentary
(Proteins 2000;38:121–122) on “Are membrane proteins ‘inside-out’ proteins?” (Proteins
1999;36:135–143). Proteins: Struct. Funct. Bioinf. 40(3), 463–464 (2000)
64. Adamian, L., Liang, J.: Interhelical hydrogen bonds and spatial motifs in membrane proteins:
polar clamps and serine zippers. Proteins 47(2), 209–218 (2002)
65. Hofmann, K.: TMbase—a database of membrane spanning proteins segments. Biol. Chem.
Hoppe-Seyler 374, 166 (1993)
66. Rost, B., Sander, C., Casadio, R., Fariselli, P.: Transmembrane helices predicted at 95%
accuracy. Protein Sci. 4(3), 521–533 (1995)
67. Yachdav, G., Kloppmann, E., Kajan, L., Hecht, M., Goldberg, T., Hamp, T., Honigschmid,
P., Schafferhans, A., Roos, M., Bernhofer, M., Richter, L., Ashkenazy, H., Punta, M.,
Schlessinger, A., Bromberg, Y., Schneider, R., Vriend, G., Sander, C., Ben-Tal, N., Rost, B.:
PredictProtein–an open resource for online prediction of protein structural and functional
features. Nucleic Acids Res. 42(Web Server issue), W337–343 (2014).
https://doi.org/10.1093/nar/gku366
68. Cserzo, M., Wallin, E., Simon, I., von Heijne, G., Elofsson, A.: Prediction of transmembrane
alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein
Eng. 10(6), 673–676 (1997)
69. Hirokawa, T., Boon-Chieng, S., Mitaku, S.: SOSUI: classification and secondary structure
prediction system for membrane proteins. Bioinformatics 14(4), 378–379 (1998)
70. Pasquier, C., Promponas, V.J., Palaios, G.A., Hamodrakas, J.S., Hamodrakas, S.J.: A novel
method for predicting transmembrane segments in proteins based on a statistical analysis of
the SwissProt database: the PRED-TMR algorithm. Protein Eng. 12(5), 381–385 (1999)
71. Tusnady, G.E., Simon, I.: The HMMTOP transmembrane topology prediction server.
Bioinformatics 17(9), 849–850 (2001)
72. Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein
topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305(3),
567–580 (2001). https://doi.org/10.1006/jmbi.2000.4315
73. Juretic, D., Zoranic, L., Zucic, D.: Basic charge clusters and predictions of membrane protein
topology. J. Chem. Inf. Comput. Sci. 42(3), 620–632 (2002)
74. Liu, Q., Zhu, Y.S., Wang, B.H., Li, Y.X.: A HMM-based method to predict the transmembrane
regions of beta-barrel membrane proteins. Comput. Biol. Chem. 27(1), 69–76 (2003)
75. Jones, D.T.: Improving the accuracy of transmembrane protein topology prediction using
evolutionary information. Bioinformatics 23(5), 538–544 (2007).
https://doi.org/10.1093/bioinformatics/btl677
76. Peters, C., Tsirigos, K.D., Shu, N., Elofsson, A.: Improved topology prediction using the
terminal hydrophobic helices rule. Bioinformatics 32(8), 1158–1162 (2016).
https://doi.org/10.1093/bioinformatics/btv709
77. Viklund, H., Bernsel, A., Skwark, M., Elofsson, A.: SPOCTOPUS: a combined predictor of
signal peptides and membrane protein topology. Bioinformatics 24(24), 2928–2929 (2008)
78. Snider, C., Jayasinghe, S., Hristova, K., White, S.H.: MPEx: a tool for exploring membrane
proteins. Protein Sci. 18(12), 2624–2628 (2009). https://doi.org/10.1002/pro.256
79. Bernsel, A., Viklund, H., Hennerdal, A., Elofsson, A.: TOPCONS: consensus prediction of
membrane protein topology. Nucleic Acids Res. 37(Web Server issue), W465–468 (2009).
https://doi.org/10.1093/nar/gkp363
80. Tsirigos, K.D., Peters, C., Shu, N., Kall, L., Elofsson, A.: The TOPCONS web server for
consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Res.
43(W1), W401–W407 (2015). https://doi.org/10.1093/nar/gkv485
81. Klammer, M., Messina, D.N., Schmitt, T., Sonnhammer, E.L.: MetaTM—a consensus method
for transmembrane protein topology prediction. BMC Bioinform. 10, 314 (2009).
https://doi.org/10.1186/1471-2105-10-314
82. Ahmad, S., Singh, Y.H., Paudel, Y., Mori, T., Sugita, Y., Mizuguchi, K.: Integrated prediction
of one-dimensional structural features and their relationships with conformational flexibility
in helical membrane proteins. BMC Bioinform. 11, 533 (2010).
https://doi.org/10.1186/1471-2105-11-533
83. Jacoboni, I., Martelli, P.L., Fariselli, P., De Pinto, V., Casadio, R.: Prediction of the
transmembrane regions of beta-barrel membrane proteins with a neural network-based predictor.
Protein Sci. 10(4), 779–787 (2001). https://doi.org/10.1110/ps.37201
84. Bagos, P.G., Liakopoulos, T.D., Spyropoulos, I.C., Hamodrakas, S.J.: PRED-TMBB: a web
server for predicting the topology of beta-barrel outer membrane proteins. Nucleic Acids Res.
32(Web Server issue), W400–404 (2004). https://doi.org/10.1093/nar/gkh417
85. Natt, N.K., Kaur, H., Raghava, G.P.: Prediction of transmembrane regions of beta-barrel
proteins using ANN- and SVM-based methods. Proteins: Struct. Funct. Bioinf. 56(1), 11–18
(2004). https://doi.org/10.1002/prot.20092
86. Bagos, P.G., Liakopoulos, T.D., Hamodrakas, S.J.: Evaluation of methods for predicting the
topology of beta-barrel outer membrane proteins and a consensus prediction method. BMC
Bioinform. 6, 7 (2005). https://doi.org/10.1186/1471-2105-6-7
87. Bigelow, H., Rost, B.: PROFtmb: a web server for predicting bacterial transmembrane beta
barrel proteins. Nucleic Acids Res. 34(Web Server issue), W186–188 (2006).
https://doi.org/10.1093/nar/gkl262
88. Waldispuhl, J., Berger, B., Clote, P., Steyaert, J.M.: Predicting transmembrane beta-barrels
and interstrand residue interactions from sequence. Proteins 65(1), 61–74 (2006).
https://doi.org/10.1002/prot.21046
89. Randall, A., Cheng, J., Sweredoski, M., Baldi, P.: TMBpro: secondary structure, beta-contact
and tertiary structure prediction of transmembrane beta-barrel proteins. Bioinformatics 24(4),
513–520 (2008). https://doi.org/10.1093/bioinformatics/btm548
90. Hayat, S., Elofsson, A.: BOCTOPUS: improved topology prediction of transmembrane
beta barrel proteins. Bioinformatics 28(4), 516–522 (2012).
https://doi.org/10.1093/bioinformatics/btr710
91. Hayat, S., Peters, C., Shu, N., Tsirigos, K.D., Elofsson, A.: Inclusion of dyad-repeat pattern
improves topology prediction of transmembrane beta-barrel proteins. Bioinformatics 32(10),
1571–1573 (2016). https://doi.org/10.1093/bioinformatics/btw025
92. Eisenberg, D., Weiss, R.M., Terwilliger, T.C.: The hydrophobic moment detects periodicity
in protein hydrophobicity. Proc. Natl. Acad. Sci. U.S.A. 81(1), 140–144 (1984)
93. Claros, M.G., von Heijne, G.: TopPred II: an improved software for membrane protein
structure predictions. Comput. Appl. Biosci. 10(6), 685–686 (1994)
94. Jayasinghe, S., Hristova, K., White, S.H.: Energetics, stability, and prediction of
transmembrane helices. J. Mol. Biol. 312(5), 927–934 (2001). https://doi.org/10.1006/jmbi.2001.5008
95. Koehler, J., Woetzel, N., Staritzbichler, R., Sanders, C.R., Meiler, J.: A unified hydrophobicity
scale for multispan membrane proteins. Proteins 76(1), 13–29 (2009).
https://doi.org/10.1002/prot.22315
96. Deber, C.M., Wang, C., Liu, L.P., Prior, A.S., Agrawal, S., Muskat, B.L., Cuticchia, A.J.:
TM Finder: a prediction program for transmembrane protein segments using a combination
of hydrophobicity and nonpolar phase helicity scales. Protein Sci. 10(1), 212–219 (2001).
https://doi.org/10.1110/ps.30301
97. Zhou, H., Zhou, Y.: Predicting the topology of transmembrane helical proteins using mean
burial propensity and a hidden-Markov-model-based method. Protein Sci. 12(7), 1547–1555
(2003). https://doi.org/10.1110/ps.0305103
98. Ganapathiraju, M., Balakrishnan, N., Reddy, R., Klein-Seetharaman, J.: Transmembrane helix
prediction using amino acid property features and latent semantic analysis. BMC Bioinform.
9(Suppl 1), S4 (2008)
99. Persson, B., Argos, P.: Prediction of membrane protein topology utilizing multiple sequence
alignments. J. Protein Chem. 16(5), 453–457 (1997)
100. Shen, H., Chou, J.J.: MemBrain: improving the accuracy of predicting transmembrane helices.
PLoS ONE 3(6), e2399 (2008). https://doi.org/10.1371/journal.pone.0002399
101. Cserzo, M., Bernassau, J.M., Simon, I., Maigret, B.: New alignment strategy for
transmembrane proteins. J. Mol. Biol. 243(3), 388–396 (1994). https://doi.org/10.1006/jmbi.1994.1666
102. Kitsas, I.K., Panas, S.M., Hadjileontiadis, L.J.: Linear discrimination of transmembrane from
non-transmembrane segments in proteins using higher-order crossings. Conf. Proc. IEEE Eng.
Med. Biol. Soc. 1, 5818–5821 (2006)
103. Lio, P., Vannucci, M.: Wavelet change-point prediction of transmembrane proteins.
Bioinformatics 16(4), 376–382 (2000)
104. Nugent, T., Jones, D.T.: Transmembrane protein topology prediction using support vector
machines. BMC Bioinform. 10, 159 (2009). https://doi.org/10.1186/1471-2105-10-159
105. Osmanbeyoglu, H.U., Wehner, J.A., Carbonell, J.G., Ganapathiraju, M.K.: Active machine
learning for transmembrane helix prediction. BMC Bioinform. 11(Suppl 1), S58 (2010).
https://doi.org/10.1186/1471-2105-11-s1-s58
106. Schulz, G.E.: Beta-Barrel membrane proteins. Curr. Opin. Struct. Biol. 10(4), 443–447 (2000).
https://doi.org/10.1016/s0959-440x(00)00120-2
107. Bagos, P.G., Liakopoulos, T.D., Spyropoulos, I.C., Hamodrakas, S.J.: A Hidden Markov
Model method, capable of predicting and discriminating beta-barrel outer membrane proteins.
BMC Bioinform. 5, 29 (2004). https://doi.org/10.1186/1471-2105-5-29
108. Ou, Y., Chen, S., Gromiha, M.M.: Prediction of membrane spanning segments and topology
in β-barrel membrane proteins at better accuracy. J. Comput. Chem. 31(1), 217–223 (2010)
109. Gromiha, M.M., Suwa, M.: A simple statistical method for discriminating outer membrane
proteins with better accuracy. Bioinformatics 21(7), 961–968 (2005).
https://doi.org/10.1093/bioinformatics/bti126
110. Park, Y., Hayat, S., Helms, V.: Prediction of the burial status of transmembrane residues of
helical membrane proteins. BMC Bioinform. 8, 302 (2007).
https://doi.org/10.1186/1471-2105-8-302
111. Yuan, Z., Zhang, F., Davis, M.J., Boden, M., Teasdale, R.D.: Predicting the solvent
accessibility of transmembrane residues from protein sequence. J. Proteome Res. 5(5), 1063–1070
(2006). https://doi.org/10.1021/pr050397b
112. Illergard, K., Callegari, S., Elofsson, A.: MPRAP: an accessibility predictor for α-helical
transmembrane proteins that performs well inside and outside the membrane. BMC Bioinform.
11, 333 (2010). https://doi.org/10.1186/1471-2105-11-333
113. Beuming, T., Weinstein, H.: A knowledge-based scale for the analysis and prediction of buried
and exposed faces of transmembrane domain proteins. Bioinformatics 20(12), 1822–1835
(2004). https://doi.org/10.1093/bioinformatics/bth143
114. von Heijne, G.: Proline kinks in transmembrane alpha-helices. J. Mol. Biol. 218(3), 499–503
(1991)
115. Yohannan, S., Faham, S., Yang, D., Whitelegge, J.P., Bowie, J.U.: The evolution of
transmembrane helix kinks and the structural diversity of G protein-coupled receptors. Proc. Natl.
Acad. Sci. U.S.A. 101(4), 959–963 (2004)
116. Meruelo, A.D., Samish, I., Bowie, J.U.: TMKink: a method to predict transmembrane helix
kinks. Protein Sci. 20(7), 1256–1264 (2011). https://doi.org/10.1002/pro.653
117. Kneissl, B., Mueller, S.C., Tautermann, C.S., Hildebrandt, A.: String kernels and high-quality
data set for improved prediction of kinked helices in alpha-helical membrane proteins. J.
Chem. Inf. Model. 51(11), 3017–3025 (2011). https://doi.org/10.1021/ci200278w
118. Göbel, U., Sander, C., Schneider, R., Valencia, A.: Correlated mutations and residue contacts
in proteins. Proteins: Struct. Funct. Bioinf. 18(4), 309–317 (1994)
119. Latek, D., Kolinski, A.: Contact prediction in protein modeling: scoring, folding and
refinement of coarse-grained models. BMC Struct. Biol. 8, 36 (2008).
https://doi.org/10.1186/1472-6807-8-36
120. Michino, M., Brooks 3rd, C.L.: Predicting structurally conserved contacts for homologous
proteins using sequence conservation filters. Proteins 77(2), 448–453 (2009).
https://doi.org/10.1002/prot.22456
121. Fuchs, A., Martin-Galiano, A.J., Kalman, M., Fleishman, S., Ben-Tal, N., Frishman, D.:
Co-evolving residues in membrane proteins. Bioinformatics 23(24), 3312–3319 (2007).
https://doi.org/10.1093/bioinformatics/btm515
122. Taylor, W.R., Jones, D.T., Green, N.M.: A method for alpha-helical integral membrane protein
fold prediction. Proteins 18(3), 281–294 (1994). https://doi.org/10.1002/prot.340180309
123. Walters, R.F., DeGrado, W.F.: Helix-packing motifs in membrane proteins. Proc. Natl. Acad.
Sci. U.S.A. 103(37), 13658–13663 (2006). https://doi.org/10.1073/pnas.0605878103
124. Langosch, D., Heringa, J.: Interaction of transmembrane helices by a knobs-into-holes packing
characteristic of soluble coiled coils. Proteins 31(2), 150–159 (1998)
125. Russ, W.P., Engelman, D.M.: The GxxxG motif: a framework for transmembrane helix-helix
association. J. Mol. Biol. 296(3), 911–919 (2000). https://doi.org/10.1006/jmbi.1999.3489
126. Pilpel, Y., Ben-Tal, N., Lancet, D.: kPROT: a knowledge-based scale for the propensity of
residue orientation in transmembrane segments. Application to membrane protein structure
prediction. J. Mol. Biol. 294(4), 921–935 (1999). https://doi.org/10.1006/jmbi.1999.3257
127. Lo, A., Chiu, Y.Y., Rodland, E.A., Lyu, P.C., Sung, T.Y., Hsu, W.L.: Predicting helix-helix
interactions from residue contacts in membrane proteins. Bioinformatics 25(8), 996–1003
(2009). https://doi.org/10.1093/bioinformatics/btp114
128. MacKenzie, K.R., Engelman, D.M.: Structure-based prediction of the stability of
transmembrane helix-helix interactions: the sequence dependence of glycophorin A dimerization. Proc.
Natl. Acad. Sci. U.S.A. 95(7), 3583–3590 (1998)
129. Hildebrand, P.W., Lorenzen, S., Goede, A., Preissner, R.: Analysis and prediction of
helix-helix interactions in membrane channels and transporters. Proteins 64(1), 253–262 (2006).
https://doi.org/10.1002/prot.20959
130. Rose, A., Lorenzen, S., Goede, A., Gruening, B., Hildebrand, P.W.: RHYTHM–a server to
predict the orientation of transmembrane helices in channels and membrane-coils. Nucleic
Acids Res. 37(Web Server issue), W575–580 (2009). https://doi.org/10.1093/nar/gkp418
131. Isberg, V., de Graaf, C., Bortolato, A., Cherezov, V., Katritch, V., Marshall, F.H., Mordalski, S.,
Pin, J.P., Stevens, R.C., Vriend, G., Gloriam, D.E.: Generic GPCR residue numbers—aligning
topology maps while minding the gaps. Trends Pharmacol. Sci. 36(1), 22–31 (2015).
https://doi.org/10.1016/j.tips.2014.11.001
132. Kolinski, A., Skolnick, J.: Reduced models of proteins and their applications. Polymer 45(2),
511–524 (2004). https://doi.org/10.1016/j.polymer.2003.10.064
133. Yarov-Yarovoy, V., Schonbrun, J., Baker, D.: Multipass membrane protein structure prediction
using Rosetta. Proteins 62(4), 1010–1025 (2006). https://doi.org/10.1002/prot.20817
134. Wu, H.H., Chen, C.C., Chen, C.M.: Replica exchange Monte-Carlo simulations of helix
bundle membrane proteins: rotational parameters of helices. J. Comput. Aided Mol. Des.
26(3), 363–374 (2012). https://doi.org/10.1007/s10822-012-9562-1
135. Ueno, Y., Kawasaki, K., Saito, O., Arai, M., Suwa, M.: Folding elastic transmembrane helices
to fit in a low-resolution image by electron microscopy. J. Bioinform. Comput. Biol. 9(Suppl
1), 37–50 (2011)
136. Hurwitz, N., Pellegrini-Calace, M., Jones, D.T.: Towards genome-scale structure prediction
for transmembrane proteins. Philos. Trans. R. Soc. Lond. B Biol. Sci. 361(1467), 465–475
(2006). https://doi.org/10.1098/rstb.2005.1804
137. Porter, J.R., Weitzner, B.D., Lange, O.F.: A framework to simplify combined sampling
strategies in Rosetta. PLoS ONE 10(9), e0138220 (2015).
https://doi.org/10.1371/journal.pone.0138220
138. Weiner, B.E., Woetzel, N., Karakas, M., Alexander, N., Meiler, J.: BCL:MP-fold: folding
membrane proteins through assembly of transmembrane helices. Structure 21(7), 1107–1117
(2013). https://doi.org/10.1016/j.str.2013.04.022
139. Pellegrini-Calace, M., Carotti, A., Jones, D.T.: Folding in lipid membranes (FILM): a novel
method for the prediction of small membrane protein 3D structures. Proteins 50(4), 537–545
(2003). https://doi.org/10.1002/prot.10304
140. Pieper, U., Webb, B.M., Barkan, D.T., Schneidman-Duhovny, D., Schlessinger, A., Braberg,
H., Yang, Z., Meng, E.C., Pettersen, E.F., Huang, C.C., Datta, R.S., Sampathkumar, P.,
Madhusudhan, M.S., Sjolander, K., Ferrin, T.E., Burley, S.K., Sali, A.: ModBase, a database of
annotated comparative protein structure models, and associated resources. Nucleic Acids Res.
39(Database issue), D465–474 (2011). https://doi.org/10.1093/nar/gkq1091
141. Kelm, S., Shi, J., Deane, C.M.: MEDELLER: homology-based coordinate generation
for membrane proteins. Bioinformatics 26(22), 2833–2840 (2010).
https://doi.org/10.1093/bioinformatics/btq554
142. Miszta, P., Pasznik, P., Jakowiecki, J., Sztyler, A., Latek, D., Filipek, S.: GPCRM: a homology
modelling web service with triple membrane-fitted quality assessment of GPCR models.
Nucleic Acids Res. 46(W1), W387–W395 (2018). https://doi.org/10.1093/nar/gky429
143. Rodríguez, D., Bello, X., Gutiérrez-de-Terán, H.: Molecular modelling of G protein-coupled
receptors through the web. Mol. Inform. 31(5), 334–341 (2012)
144. Sandal, M., Duy, T.P., Cona, M., Zung, H., Carloni, P., Musiani, F., Giorgetti, A.: GOMoDo:
a GPCRs online modeling and docking webserver. PLoS ONE 8(9), e74092 (2013).
https://doi.org/10.1371/journal.pone.0074092
145. Latek, D., Pasznik, P., Carlomagno, T., Filipek, S.: Towards improved quality of GPCR models
by usage of multiple templates and profile-profile comparison. PLoS ONE 8(2), e56742
(2013). https://doi.org/10.1371/journal.pone.0056742
146. Ng, P.C., Henikoff, J.G., Henikoff, S.: PHAT: a transmembrane-specific substitution matrix.
Predicted hydrophobic and transmembrane. Bioinformatics 16(9), 760–766 (2000)
147. Muller, T., Rahmann, S., Rehmsmeier, M.: Non-symmetric score matrices and the detection
of homologous transmembrane proteins. Bioinformatics 17(Suppl 1), S182–S189 (2001)
148. Jimenez-Morales, D., Adamian, L., Liang, J.: Detecting remote homologues using scoring
matrices calculated from the estimation of amino acid substitution rates of beta-barrel
membrane proteins. Conf. Proc. IEEE Eng. Med. Biol. Soc. 1347–1350 (2008)
149. Pirovano, W., Feenstra, K.A., Heringa, J.: PRALINETM: a strategy for improved multiple
alignment of transmembrane proteins. Bioinformatics 24(4), 492–497 (2008).
https://doi.org/10.1093/bioinformatics/btm636
150. Hill, J.R., Kelm, S., Shi, J., Deane, C.M.: Environment specific substitution tables improve
membrane protein alignment. Bioinformatics 27(13), i15–i23 (2011).
https://doi.org/10.1093/bioinformatics/btr230
151. Forrest, L.R., Tang, C.L., Honig, B.: On the accuracy of homology modeling and sequence
alignment methods applied to membrane proteins. Biophys. J. 91(2), 508–517 (2006).
https://doi.org/10.1529/biophysj.106.082313
152. Shafrir, Y., Guy, H.R.: STAM: simple transmembrane alignment method. Bioinformatics
20(5), 758–769 (2004). https://doi.org/10.1093/bioinformatics/btg482
153. Kufareva, I., Rueda, M., Katritch, V., Stevens, R.C., Abagyan, R.: Status of GPCR modeling
and docking as reflected by community-wide GPCR Dock 2010 assessment. Structure 19(8),
1108–1126 (2011)
154. Khafizov, K., Staritzbichler, R., Stamm, M., Forrest, L.R.: A study of the evolution of
inverted-topology repeats from LeuT-fold transporters using AlignMe. Biochemistry 49(50),
10702–10713 (2010). https://doi.org/10.1021/bi101256x
155. Rychlewski, L., Jaroszewski, L., Li, W., Godzik, A.: Comparison of sequence profiles.
Strategies for structural predictions using sequence information. Protein Sci. 9(2), 232–241 (2000).
https://doi.org/10.1110/ps.9.2.232
156. Fiser, A., Sali, A.: Modeller: generation and refinement of homology-based protein
structure models. Methods Enzymol. 374, 461–491 (2003).
https://doi.org/10.1016/S0076-6879(03)74020-8
157. Krieger, E., Darden, T., Nabuurs, S.B., Finkelstein, A., Vriend, G.: Making optimal use of
empirical energy functions: Force-field parameterization in crystal space. Proteins: Struct.
Funct. Bioinf. 57(4), 678–683 (2004)
158. Schwede, T., Kopp, J., Guex, N., Peitsch, M.C.: SWISS-MODEL: an automated protein
homology-modeling server. Nucleic Acids Res. 31(13), 3381–3385 (2003)
159. Raman, S., Vernon, R., Thompson, J., Tyka, M., Sadreyev, R., Pei, J., Kim, D., Kellogg, E.,
DiMaio, F., Lange, O., Kinch, L., Sheffler, W., Kim, B.-H., Das, R., Grishin, N.V., Baker,
D.: Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins: Struct.
Funct. Bioinf. 77(S9), 89–99 (2009)
160. Zhang, Y.: I-TASSER server for protein 3D structure prediction. BMC Bioinform. 9, 40
(2008). https://doi.org/10.1186/1471-2105-9-40
161. Latek, D.: Rosetta Broker for membrane protein structure prediction: concentrative nucleoside
transporter 3 and corticotropin-releasing factor receptor 1 test cases. BMC Struct. Biol. 17(1),
8 (2017). https://doi.org/10.1186/s12900-017-0078-8
162. Recanatini, M., Cavalli, A., Masetti, M.: Modeling HERG and its interactions with drugs:
recent advances in light of current potassium channel simulations. ChemMedChem 3(4),
523–535 (2008). https://doi.org/10.1002/cmdc.200700264
163. Latek, D., Kolinski, M., Ghoshdastider, U., Debinski, A., Bombolewski, R., Plazinska, A.,
Jozwiak, K., Filipek, S.: Modeling of ligand binding to G protein coupled receptors:
cannabinoid CB1, CB2 and adrenergic beta 2 AR. J. Mol. Model. 17(9), 2353–2366 (2011).
https://doi.org/10.1007/s00894-011-0986-7
164. Arora, B., Coudrat, T., Wootten, D., Christopoulos, A., Noronha, S.B., Sexton, P.M.: Prediction
of loops in G protein-coupled receptor homology models: effect of imprecise surroundings
and constraints. J. Chem. Inf. Model. 56(4), 671–686 (2016). https://doi.org/10.1021/acs.jcim.
5b00554
165. Shen, M.Y., Sali, A.: Statistical potential for assessment and prediction of protein structures.
Protein Sci. 15(11), 2507–2524 (2006). https://doi.org/10.1110/ps.062416606
166. Hildebrand, P.W., Goede, A., Bauer, R.A., Gruening, B., Ismer, J., Michalsky, E., Preissner,
R.: SuperLooper–a prediction server for the modeling of loops in globular and membrane
proteins. Nucleic Acids Res. 37(Web Server issue), W571–574 (2009). https://doi.org/10.
1093/nar/gkp338
167. Jamroz, M., Kolinski, A.: Modeling of loops in proteins: a multi-method approach. BMC
Struct. Biol. 10, 5 (2010). https://doi.org/10.1186/1472-6807-10-5
168. Canutescu, A.A., Dunbrack Jr., R.L.: Cyclic coordinate descent: a robotics algorithm for
protein loop closure. Protein Sci. 12(5), 963–972 (2003). https://doi.org/10.1110/ps.0242703
169. Kolinski, M., Filipek, S.: Study of a structurally similar kappa opioid receptor agonist and
antagonist pair by molecular dynamics simulations. J. Mol. Model. 16(10), 1567–1576 (2010).
https://doi.org/10.1007/s00894-010-0678-8
170. Mandell, D.J., Coutsias, E.A., Kortemme, T.: Sub-angstrom accuracy in protein loop recon-
struction by robotics-inspired conformational sampling. Nat. Methods 6(8), 551–552 (2009).
https://doi.org/10.1038/nmeth0809-551
171. Jacobson, M.P., Pincus, D.L., Rapp, C.S., Day, T.J., Honig, B., Shaw, D.E., Friesner, R.A.:
A hierarchical approach to all-atom protein loop prediction. Proteins 55(2), 351–367 (2004).
https://doi.org/10.1002/prot.10613
172. Heim, A.J., Li, Z.: Developing a high-quality scoring function for membrane protein structures
based on specific inter-residue interactions. J. Comput. Aided Mol. Des. 26(3), 301–309
(2012). https://doi.org/10.1007/s10822-012-9556-z
173. Ray, A., Lindahl, E., Wallner, B.: Model quality assessment for membrane proteins. Bioin-
formatics 26(24), 3067–3074 (2010). https://doi.org/10.1093/bioinformatics/btq581
174. Gao, C., Stern, H.A.: Scoring function accuracy for membrane protein structure prediction.
Proteins 68(1), 67–75 (2007). https://doi.org/10.1002/prot.21421
175. Law, R.J., Capener, C., Baaden, M., Bond, P.J., Campbell, J., Patargias, G., Arinaminpathy,
Y., Sansom, M.S.: Membrane protein structure quality in molecular dynamics simulation. J.
Mol. Graph. Model. 24(2), 157–165 (2005). https://doi.org/10.1016/j.jmgm.2005.05.006
176. Woetzel, N., Karakas, M., Staritzbichler, R., Muller, R., Weiner, B.E., Meiler, J.:
BCL:score–knowledge based energy potentials for ranking protein models represented by
idealized secondary structure elements. PLoS ONE 7(11), e49242 (2012). https://doi.org/10.
1371/journal.pone.0049242
177. Latek, D., Bajda, M., Filipek, S.: A hybrid approach to structure and function modeling of
G protein-coupled receptors. J. Chem. Inf. Model. 56(4), 630–641 (2016). https://doi.org/10.
1021/acs.jcim.5b00451
178. Mordalski, S., Witek, J., Smusz, S., Rataj, K., Bojarski, A.J.: Multiple conformational
states in retrospective virtual screening—homology models vs. crystal structures: beta-2
adrenergic receptor case study. J. Cheminform. 7, 13 (2015). https://doi.org/10.1186/s13321-
015-0062-x
179. Coudrat, T., Simms, J., Christopoulos, A., Wootten, D., Sexton, P.M.: Improving virtual
screening of G protein-coupled receptors via ligand-directed modeling. PLoS Comput. Biol.
13(11), e1005819 (2017). https://doi.org/10.1371/journal.pcbi.1005819
180. Kufareva, I., Katritch, V., Participants of GPCR DOCK 2013, Stevens, R.C., Abagyan, R.:
Advances in GPCR modeling evaluated by the GPCR Dock 2013 assessment: meeting new
challenges. Structure 22(8), 1120–1139 (2014). https://doi.org/10.1016/j.str.2014.06.012
181. Bissantz, C., Bernard, P., Hibert, M., Rognan, D.: Protein-based virtual screening of chemical
databases. II. Are homology models of G-protein coupled receptors suitable targets? Proteins
50(1), 5–25 (2003). https://doi.org/10.1002/prot.10237
182. Barth, P., Schonbrun, J., Baker, D.: Toward high-resolution prediction and design of trans-
membrane helical protein structures. Proc. Natl. Acad. Sci. U.S.A. 104(40), 15682–15687
(2007). https://doi.org/10.1073/pnas.0702515104
183. Barth, P., Wallner, B., Baker, D.: Prediction of membrane protein structures with complex
topologies using limited constraints. Proc. Natl. Acad. Sci. U.S.A. 106(5), 1409–1414 (2009).
https://doi.org/10.1073/pnas.0808323106
184. Michino, M., Chen, J., Stevens, R.C., Brooks 3rd, C.L.: FoldGPCR: structure prediction
protocol for the transmembrane domain of G protein-coupled receptors from class A. Proteins
78(10), 2189–2201 (2010). https://doi.org/10.1002/prot.22731
185. Abrol, R., Griffith, A.R., Bray, J.K., Goddard 3rd, W.A.: Structure prediction of G protein-coupled
receptors and their ensemble of functionally important conformations. Complementary
experimental and computational techniques to study membrane protein structure, dynamics
and interactions (Methods in Molecular Biology) (2011)
186. Shacham, S., Marantz, Y., Bar-Haim, S., Kalid, O., Warshaviak, D., Avisar, N., Inbal, B.,
Heifetz, A., Fichman, M., Topf, M., Naor, Z., Noiman, S., Becker, O.M.: PREDICT modeling
and in-silico screening for G-protein coupled receptors. Proteins 57(1), 51–86 (2004). https://
doi.org/10.1002/prot.20195
187. Abrol, R., Bray, J.K., Goddard 3rd, W.A.: Bihelix: towards de novo structure prediction of
an ensemble of G-protein coupled receptor conformations. Proteins 80(2), 505–518 (2011).
https://doi.org/10.1002/prot.23216
188. Trabanino, R.J., Hall, S.E., Vaidehi, N., Floriano, W.B., Kam, V.W., Goddard 3rd, W.A.: First
principles predictions of the structure and function of G-protein-coupled receptors: validation
for bovine rhodopsin. Biophys. J. 86(4), 1904–1921 (2004). https://doi.org/10.1016/S0006-
3495(04)74256-3
189. Chun, L., Zhang, W.H., Liu, J.F.: Structure and ligand recognition of class C GPCRs. Acta
Pharmacol. Sin. 33(3), 312–323 (2012). https://doi.org/10.1038/aps.2011.186
190. Nussinov, R., Tsai, C.J., Csermely, P.: Allo-network drugs: harnessing allostery in cellular net-
works. Trends Pharmacol. Sci. 32(12), 686–693 (2011). https://doi.org/10.1016/j.tips.2011.
08.004
191. Canals, M., Sexton, P.M., Christopoulos, A.: Allostery in GPCRs: ‘MWC’ revisited. Trends
Biochem. Sci. 36(12), 663–672 (2011). https://doi.org/10.1016/j.tibs.2011.08.005
192. Levinthal, C., Wodak, S.J., Kahn, P., Dadivanian, A.K.: Hemoglobin interaction in sickle cell
fibers. I: theoretical approaches to the molecular contacts. Proc. Natl. Acad. Sci. U.S.A. 72(4),
1330–1334 (1975)
193. Brylinski, M., Konieczny, L., Roterman, I.: Ligation site in proteins recognized in silico.
Bioinformation 1(4), 127–129 (2006)
194. Soga, S., Shirai, H., Kobori, M., Hirayama, N.: Use of amino acid composition to predict
ligand-binding sites. J. Chem. Inf. Model. 47(2), 400–406 (2007). https://doi.org/10.1021/
Ci6002202
195. Koczyk, G., Wyrwicz, L.S., Rychlewski, L.: LigProf: a simple tool for in silico prediction of
ligand-binding sites. J. Mol. Model. 13(3), 445–455 (2007). https://doi.org/10.1007/s00894-
006-0165-4
196. Lo, Y.T., Wang, H.W., Pai, T.W., Tzou, W.S., Hsu, H.H., Chang, H.T.: Protein-ligand binding
region prediction (PLB-SAVE) based on geometric features and CUDA acceleration. BMC
Bioinform. 14(Suppl 4), S4 (2013). https://doi.org/10.1186/1471-2105-14-s4-s4
197. Chang, D.T., Weng, Y.Z., Lin, J.H., Hwang, M.J., Oyang, Y.J.: Protemot: prediction of protein
binding sites with automatically extracted geometrical templates. Nucleic Acids Res. 34(Web
Server issue), W303–309 (2006). https://doi.org/10.1093/nar/gkl344
198. Dundas, J., Ouyang, Z., Tseng, J., Binkowski, A., Turpaz, Y., Liang, J.: CASTp: computed atlas
of surface topography of proteins with structural and topographical mapping of functionally
annotated residues. Nucleic Acids Res. 34, W116–W118 (2006). https://doi.org/10.1093/Nar/
Gkl282
199. Chang, D.T., Oyang, Y.J., Lin, J.H.: MEDock: a web server for efficient prediction of ligand
binding sites based on a novel optimization algorithm. Nucleic Acids Res. 33(Web Server
issue), W233–238 (2005)
200. Brady Jr., G.P., Stouten, P.F.: Fast prediction and visualization of protein binding pockets with
PASS. J. Comput. Aided Mol. Des. 14(4), 383–401 (2000)
201. Molecular Operating Environment (MOE), 2013.08. Chemical Computing Group ULC, 1010
Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7 (2017)
202. Dimitropoulos, D., Ionides, J., Henrick, K.: Using PDBeChem to search the PDB ligand
dictionary. Curr. Protoc. Bioinform. 14.13.11–14.13.13 (2006)
203. Irwin, J.J., Sterling, T., Mysinger, M.M., Bolstad, E.S., Coleman, R.G.: ZINC: a free tool to dis-
cover chemistry for biology. J. Chem. Inf. Model. (2012). https://doi.org/10.1021/ci3001277
204. Sterling, T., Irwin, J.J.: ZINC 15–Ligand discovery for everyone. J. Chem. Inf. Model. 55(11),
2324–2337 (2015). https://doi.org/10.1021/acs.jcim.5b00559
205. Li, Q., Cheng, T., Wang, Y., Bryant, S.H.: PubChem as a public resource for drug discovery.
Drug Discov. Today 15(23–24), 1052–1057 (2010). https://doi.org/10.1016/j.drudis.2010.10.
003
206. Kim, S., Thiessen, P.A., Bolton, E.E., Chen, J., Fu, G., Gindulyte, A., Han, L., He, J., He,
S., Shoemaker, B.A., Wang, J., Yu, B., Zhang, J., Bryant, S.H.: PubChem substance and
compound databases. Nucleic Acids Res. 44(D1), D1202–D1213 (2016). https://doi.org/10.
1093/nar/gkv951
207. Liu, T., Lin, Y., Wen, X., Jorissen, R.N., Gilson, M.K.: BindingDB: a web-accessible
database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res.
35(Database issue), D198–201 (2007). https://doi.org/10.1093/nar/gkl999
208. Gaulton, A., Bellis, L.J., Bento, A.P., Chambers, J., Davies, M., Hersey, A., Light, Y.,
McGlinchey, S., Michalovich, D., Al-Lazikani, B., Overington, J.P.: ChEMBL: a large-scale
bioactivity database for drug discovery. Nucleic Acids Res. 40(Database issue), D1100–1107
(2012). https://doi.org/10.1093/nar/gkr777
209. Zsoldos, Z., Reid, D., Simon, A., Sadjad, B.S., Johnson, A.P.: eHITS: an innovative approach
to the docking and scoring function problems. Curr. Protein Pept. Sci. 7(5), 421–435 (2006)
210. Vaque, M., Ardrevol, A., Blade, C., Salvado, M.J., Blay, M., Fernandez-Larrea, J., Arola,
L., Pujadas, G.: Protein-ligand docking: a review of recent advances and future perspectives.
Curr. Pharm. Anal. 4(1), 1–19 (2008)
211. Curco, D., Rodriguez-Ropero, F., Aleman, C.: Force-field parametrization of retro-inverso
modified residues: development of torsional and electrostatic parameters. J. Comput. Aided
Mol. Des. 20(1), 13–25 (2006). https://doi.org/10.1007/s10822-005-9032-0
212. Bohm, H.J.: The computer program LUDI: a new method for the de novo design of enzyme
inhibitors. J. Comput. Aided Mol. Des. 6(1), 61–78 (1992)
213. Ewing, T.J.A., Kuntz, I.D.: Critical evaluation of search algorithms for automated molecular
docking and database screening. J. Comput. Chem. 18(9), 1175–1189 (1997)
214. Rarey, M., Kramer, B., Lengauer, T., Klebe, G.: A fast flexible docking method using an
incremental construction algorithm. J. Mol. Biol. 261(3), 470–489 (1996)
215. Mizutani, M.Y., Tomioka, N., Itai, A.: Rational automatic search method for stable docking
models of protein and ligand. J. Mol. Biol. 243(2), 310–326 (1994)
216. Halgren, T.A., Murphy, R.B., Friesner, R.A., Beard, H.S., Frye, L.L., Pollard, W.T., Banks,
J.L.: Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment fac-
tors in database screening. J. Med. Chem. 47(7), 1750–1759 (2004). https://doi.org/10.1021/
jm030644s
217. Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz, D.T., Repasky,
M.P., Knoll, E.H., Shelley, M., Perry, J.K., Shaw, D.E., Francis, P., Shenkin, P.S.: Glide: a
new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking
accuracy. J. Med. Chem. 47(7), 1739–1749 (2004). https://doi.org/10.1021/jm0306430
218. McGann, M.R., Almond, H.R., Nicholls, A., Grant, J.A., Brown, F.K.: Gaussian docking
functions. Biopolymers 68(1), 76–90 (2003). https://doi.org/10.1002/bip.10207
219. Abagyan, R., Totrov, M., Kuznetsov, D.: ICM - a new method for protein modeling and
design—applications to docking and structure prediction from the distorted native conforma-
tion. J. Comput. Chem. 15(5), 488–506 (1994)
220. McMartin, C., Bohacek, R.S.: QXP: powerful, rapid computer algorithms for structure-based
drug design. J. Comput. Aided Mol. Des. 11(4), 333–344 (1997)
221. Trosset, J.Y., Scheraga, H.A.: PRODOCK: software package for protein modeling and dock-
ing. J. Comput. Chem. 20(4), 412–427 (1999)
222. Liu, M., Wang, S.M.: MCDOCK: A Monte Carlo simulation approach to the molecular
docking problem. J. Comput. Aided Mol. Des. 13(5), 435–451 (1999)
223. Jones, G., Willett, P., Glen, R.C., Leach, A.R., Taylor, R.: Development and validation of a
genetic algorithm for flexible docking. J. Mol. Biol. 267(3), 727–748 (1997)
224. Namasivayam, V., Gunther, R.: A fast flexible molecular docking program based on swarm
intelligence. Chem. Biol. Drug Des. 70(6), 475–484 (2007). https://doi.org/10.1111/j.1747-
0285.2007.00588.x
225. Grosdidier, A., Zoete, V., Michielin, O.: SwissDock, a protein-small molecule docking web
service based on EADock DSS. Nucleic Acids Res. 39, W270–W277 (2011). https://doi.org/
10.1093/Nar/Gkr366
226. Pasznik, P., Rutkowska, E., Niewieczerzal, S., Cielecka-Piontek, J., Filipek, S., Latek, D.:
GUT-DOCK—a web-service to predict off-target interactions of drugs with gut hormone
GPCRs. Submitted
227. Labbe, C.M., Rey, J., Lagorce, D., Vavrusa, M., Becot, J., Sperandio, O., Villoutreix, B.O.,
Tuffery, P., Miteva, M.A.: MTiOpenScreen: a web server for structure-based virtual screening.
Nucleic Acids Res. 43(W1), W448–W454 (2015). https://doi.org/10.1093/nar/gkv306
228. Wang, R.X., Liu, L., Lai, L.H., Tang, Y.Q.: SCORE: a new empirical method for estimating
the binding affinity of a protein-ligand complex. J. Mol. Model. 4(12), 379–394 (1998)
229. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V., Mee, R.P.: Empirical scoring functions.
1. The development of a fast empirical scoring function to estimate the binding affinity
of ligands in receptor complexes. J. Comput. Aided Mol. Des. 11(5), 425–445 (1997)
230. Gohlke, H., Hendlich, M., Klebe, G.: Knowledge-based scoring function to predict protein-
ligand interactions. J. Mol. Biol. 295(2), 337–356 (2000)
231. DeWitte, R.S., Shakhnovich, E.: SMoG: De novo design method based on simple, fast and
accurate free energy estimates. Abstr. Pap. Am. Chem. Soc. 214, 6-Comp (1997)
232. DeWitte, R.S., Ishchenko, A.V., Shakhnovich, E.I.: SMoG: De novo design method based on
simple, fast, and accurate free energy estimates. 2. Case studies in molecular design. J. Am.
Chem. Soc. 119(20), 4608–4617 (1997)
233. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Thornton, J.M.: BLEEP—potential of mean
force describing protein-ligand interactions: I. Generating potential. J. Comput. Chem. 20(11),
1165–1176 (1999)
234. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M.J., Thornton, J.M.: BLEEP - Potential
of mean force describing protein-ligand interactions: II. Calculation of binding energies and
comparison with experimental data. J. Comput. Chem. 20(11), 1177–1185 (1999)
235. Mooij, W.T.M., Verdonk, M.L.: General and targeted statistical potentials for protein-ligand
interactions. Proteins 61(2), 272–287 (2005). https://doi.org/10.1002/Prot.20588
236. Sherman, W., Day, T., Jacobson, M.P., Friesner, R.A., Farid, R.: Novel procedure for modeling
ligand/receptor induced fit effects. J. Med. Chem. 49(2), 534–553 (2006). https://doi.org/10.
1021/Jm050540c
237. Hanson, M.A., Roth, C.B., Jo, E., Griffith, M.T., Scott, F.L., Reinhart, G., Desale, H., Clemons,
B., Cahalan, S.M., Schuerer, S.C., Sanna, M.G., Han, G.W., Kuhn, P., Rosen, H., Stevens,
R.C.: Crystal structure of a lipid G protein-coupled receptor. Science 335(6070), 851–855
(2012). https://doi.org/10.1126/science.1215904
238. Shoichet, B.K., Kobilka, B.K.: Structure-based drug screening for G-protein-coupled recep-
tors. Trends Pharmacol. Sci. 33(5), 268–272 (2012). https://doi.org/10.1016/j.tips.2012.03.
007
239. Kandt, C., Schlitter, J., Gerwert, K.: Dynamics of water molecules in the bacteriorhodopsin
trimer in explicit lipid/water environment. Biophys. J. 86(2), 705–717 (2004). https://doi.org/
10.1016/S0006-3495(04)74149-1
240. Lemkul, J.A., Allen, W.J., Bevan, D.R.: Practical considerations for building GROMOS-
compatible small-molecule topologies. J. Chem. Inf. Model. 50(12), 2221–2235 (2010).
https://doi.org/10.1021/Ci100335w
241. Malde, A.K., Zuo, L., Breeze, M., Stroet, M., Poger, D., Nair, P.C., Oostenbrink, C., Mark,
A.E.: An automated force field topology builder (ATB) and repository: Version 1.0. J. Chem.
Theory Comput. 7(12), 4026–4037 (2011). https://doi.org/10.1021/ct200196m
242. Schuttelkopf, A.W., van Aalten, D.M.F.: PRODRG: a tool for high-throughput crystallogra-
phy of protein-ligand complexes. Acta Crystallogr. Sect. D-Biol. Crystallogr. 60, 1355–1363
(2004). https://doi.org/10.1107/S0907444904011679
243. Zoete, V., Cuendet, M.A., Grosdidier, A., Michielin, O.: SwissParam: a fast force field gener-
ation tool for small organic molecules. J. Comput. Chem. 32(11), 2359–2368 (2011). https://
doi.org/10.1002/jcc.21816
244. Vanommeslaeghe, K., Hatcher, E., Acharya, C., Kundu, S., Zhong, S., Shim, J., Darian, E.,
Guvench, O., Lopes, P., Vorobyov, I., Mackerell Jr., A.D.: CHARMM general force field: a
force field for drug-like molecules compatible with the CHARMM all-atom additive biological
force fields. J. Comput. Chem. 31(4), 671–690 (2010). https://doi.org/10.1002/jcc.21367
245. Ribeiro, A.A.S.T., Horta, B.A.C., de Alencastro, R.B.: MKTOP: a program for automatic
construction of molecular topologies. J. Brazil Chem. Soc. 19(7), 1433–1435 (2008)
246. Sousa da Silva, A.W., Vranken, W.F., Laue, E.: ACPYPE - AnteChamber PYthon Parser interfacE.
In
247. Sousa da Silva, A.W., Vranken, W.F.: ACPYPE—anteChamber PYthon parser interfacE.
BMC Res. Notes 5, 367 (2012). https://doi.org/10.1186/1756-0500-5-367
248. Jakalian, A., Jack, D.B., Bayly, C.I.: Fast, efficient generation of high-quality atomic charges.
AM1-BCC model: II. Parameterization and validation. J. Comput. Chem. 23(16), 1623–1641
(2002). https://doi.org/10.1002/Jcc.10128
249. Caleman, C., van Maaren, P.J., Hong, M.Y., Hub, J.S., Costa, L.T., van der Spoel, D.: Force
field benchmark of organic liquids: density, enthalpy of vaporization, heat capacities, surface
tension, isothermal compressibility, volumetric expansion coefficient, and dielectric constant.
J. Chem. Theory Comput. 8(1), 61–74 (2012). https://doi.org/10.1021/Ct200731v
250. van der Spoel, D., van Maaren, P.J., Caleman, C.: GROMACS molecule & liquid database.
Bioinformatics 28(5), 752–753 (2012). https://doi.org/10.1093/bioinformatics/bts020
251. Domanski, J., Stansfeld, P.J., Sansom, M.S., Beckstein, O.: Lipidbook: a public repository
for force-field parameters used in membrane simulations. J. Membr. Biol. 236(3), 255–258
(2010). https://doi.org/10.1007/s00232-010-9296-8
252. Adamian, L., Naveed, H., Liang, J.: Lipid-binding surfaces of membrane proteins: evi-
dence from evolutionary and structural analysis. Biochim. Biophys. Acta 1808(4), 1092–1102
(2011). https://doi.org/10.1016/j.bbamem.2010.12.008
253. Opekarova, M., Tanner, W.: Specific lipid requirements of membrane proteins—a putative
bottleneck in heterologous expression. Biochim. Biophys. Acta-Biomembr. 1610(1), 11–22
(2003). https://doi.org/10.1016/S0005-2736(02)00708-3
254. Sanders, C.R., Mittendorf, K.F.: Tolerance to changes in membrane lipid composition as a
selected trait of membrane proteins. Biochemistry 50(37), 7858–7867 (2011). https://doi.org/
10.1021/bi2011527
255. Berger, C., Ho, J.T.C., Kimura, T., Hess, S., Gawrisch, K., Yeliseev, A.: Preparation of stable
isotope-labeled peripheral cannabinoid receptor CB2 by bacterial fermentation. Protein Expr.
Purif. 70(2), 236–247 (2010). https://doi.org/10.1016/j.pep.2009.12.011
256. Soubias, O., Gawrisch, K.: The role of the lipid matrix for structure and function of the
GPCR rhodopsin. Biochim. Biophys. Acta 1818(2), 234–240 (2012). https://doi.org/10.1016/
j.bbamem.2011.08.034
257. Lee, S.Y., Lee, A., Chen, J.Y., MacKinnon, R.: Structure of the KvAP voltage-dependent
K+ channel and its dependence on the lipid membrane. Proc. Natl. Acad. Sci. U.S.A. 102(43),
15441–15446 (2005). https://doi.org/10.1073/pnas.0507651102
258. Oostenbrink, C., Villa, A., Mark, A.E., Van Gunsteren, W.F.: A biomolecular force field based
on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5
and 53A6. J. Comput. Chem. 25(13), 1656–1676 (2004). https://doi.org/10.1002/jcc.20090
259. Scott, W.R.P., Hunenberger, P.H., Tironi, I.G., Mark, A.E., Billeter, S.R., Fennen, J., Torda,
A.E., Huber, T., Kruger, P., van Gunsteren, W.F.: The GROMOS biomolecular simulation
program package. J. Phys. Chem. A 103(19), 3596–3607 (1999)
260. Foloppe, N., MacKerell, A.D.: All-atom empirical force field for nucleic acids: I. Parameter
optimization based on small molecule and condensed phase macromolecular target data. J.
Comput. Chem. 21(2), 86–104 (2000)
261. Klauda, J.B., Venable, R.M., Freites, J.A., O’Connor, J.W., Tobias, D.J., Mondragon-Ramirez,
C., Vorobyov, I., MacKerell Jr., A.D., Pastor, R.W.: Update of the CHARMM all-atom additive
force field for lipids: validation on six lipid types. J. Phys. Chem. B 114(23), 7830–7843
(2010). https://doi.org/10.1021/jp101759q
262. MacKerell, A.D., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J.,
Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau,
F.T.K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D.T., Prodhom, B., Reiher, W.E., Roux,
B., Schlenkrich, M., Smith, J.C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera,
J., Yin, D., Karplus, M.: All-atom empirical potential for molecular modeling and dynamics
studies of proteins. J. Phys. Chem. B 102(18), 3586–3616 (1998)
263. Wang, J.M., Wolf, R.M., Caldwell, J.W., Kollman, P.A., Case, D.A.: Development and testing
of a general amber force field. J. Comput. Chem. 25(9), 1157–1174 (2004)
264. Jorgensen, W.L., Maxwell, D.S., Tirado-Rives, J.: Development and testing of the OPLS all-atom
force field on conformational energetics and properties of organic liquids. J. Am. Chem.
Soc. 118(45), 11225–11236 (1996)
265. Kaminski, G.A., Friesner, R.A., Tirado-Rives, J., Jorgensen, W.L.: Evaluation and
reparametrization of the OPLS-AA force field for proteins via comparison with accurate
quantum chemical calculations on peptides. J. Phys. Chem. B 105(28), 6474–6487 (2001).
https://doi.org/10.1021/Jp003919d
266. Jambeck, J.P., Lyubartsev, A.P.: Derivation and systematic validation of a refined all-atom
force field for phosphatidylcholine lipids. J. Phys. Chem. B 116(10), 3164–3179 (2012).
https://doi.org/10.1021/jp212503e
267. Marrink, S.J., Risselada, H.J., Yefimov, S., Tieleman, D.P., de Vries, A.H.: The MARTINI
force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111(27),
7812–7824 (2007). https://doi.org/10.1021/jp071097f
268. Sansom, M.S.P., Scott, K.A., Bond, P.J.: Coarse-grained simulation: a high-throughput com-
putational approach to membrane proteins. Biochem. Soc. Trans. 36, 27–32 (2008). https://
doi.org/10.1042/Bst0360027
269. Scott, K.A., Bond, P.J., Ivetac, A., Chetwynd, A.P., Khalid, S., Sansom, M.S.P.: Coarse-
grained MD simulations of membrane protein-bilayer self-assembly. Structure 16(4), 621–630
(2008). https://doi.org/10.1016/j.str.2008.01.014
270. Berendsen, H.J.C., van der Spoel, D., van Drunen, R.: GROMACS: a message-passing parallel
molecular dynamics implementation. Comput. Phys. Commun. 91(1–3), 43–56 (1995)
271. Hess, B., Kutzner, C., van der Spoel, D., Lindahl, E.: GROMACS 4: algorithms for highly
efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 4(3),
435–447 (2008)
272. Lindahl, E., Hess, B., van der Spoel, D.: GROMACS 3.0: a package for molecular simulation
and trajectory analysis. J. Mol. Model. 7(8), 306–317 (2001)
273. Van der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A.E., Berendsen, H.J.C.:
GROMACS: fast, flexible, and free. J. Comput. Chem. 26(16), 1701–1718 (2005). https://
doi.org/10.1002/jcc.20291
274. Abraham, M.J., Murtola, T., Schulz, R., Páll, S., Smith, J.C., Hess, B., Lindahl, E.: GRO-
MACS: high performance molecular simulations through multi-level parallelism from lap-
tops to supercomputers. SoftwareX 1–2, 19–25 (2015). https://doi.org/10.1016/j.softx.2015.
06.001
275. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel,
R.D., Kale, L., Schulten, K.: Scalable molecular dynamics with NAMD. J. Comput. Chem.
26(16), 1781–1802 (2005)
276. Brooks, B.R., Brooks 3rd, C.L., Mackerell Jr., A.D., Nilsson, L., Petrella, R.J., Roux, B., Won, Y., Archontis,
G., Bartels, C., Boresch, S., Caflisch, A., Caves, L., Cui, Q., Dinner, A.R., Feig, M., Fischer,
S., Gao, J., Hodoscek, M., Im, W., Kuczera, K., Lazaridis, T., Ma, J., Ovchinnikov, V., Paci,
E., Pastor, R.W., Post, C.B., Pu, J.Z., Schaefer, M., Tidor, B., Venable, R.M., Woodcock,
H.L., Wu, X., Yang, W., York, D.M., Karplus, M.: CHARMM: the biomolecular simulation
program. J. Comput. Chem. 30(10), 1545–1614 (2009)
277. Case, D.A., Cheatham, T.E., Darden, T., Gohlke, H., Luo, R., Merz, K.M., Onufriev, A.,
Simmerling, C., Wang, B., Woods, R.J.: The amber biomolecular simulation programs. J.
Comput. Chem. 26(16), 1668–1688 (2005). https://doi.org/10.1002/Jcc.20290
278. Jo, S., Kim, T., Iyer, V.G., Im, W.: CHARMM-GUI: a web-based graphical user interface for
CHARMM. J. Comput. Chem. 29(11), 1859–1865 (2008). https://doi.org/10.1002/jcc.20945
279. Jo, S., Lim, J.B., Klauda, J.B., Im, W.: CHARMM-GUI membrane builder for mixed bilayers
and its application to yeast membranes. Biophys. J. 97(1), 50–58 (2009). https://doi.org/10.
1016/j.bpj.2009.04.013
280. Jo, S., Kim, T., Im, W.: Automated builder and database of protein/membrane complexes
for molecular dynamics simulations. PLoS ONE 2(9), e880 (2007). https://doi.org/10.1371/
journal.pone.0000880
281. Humphrey, W., Dalke, A., Schulten, K.: VMD: visual molecular dynamics. J. Mol. Graph.
Model. 14(1), 33–38 (1996)
282. Kandt, C., Ash, W.L., Tieleman, D.P.: Setting up and running molecular dynamics simulations
of membrane proteins. Methods 41(4), 475–488 (2007). https://doi.org/10.1016/j.ymeth.2006.
08.006
283. Wolf, M.G., Hoefling, M., Aponte-Santamaria, C., Grubmuller, H., Groenhof, G.: g_membed:
efficient insertion of a membrane protein into an equilibrated lipid bilayer with minimal
perturbation. J. Comput. Chem. 31(11), 2169–2174 (2010). https://doi.org/10.1002/jcc.21507
284. Krieger, E., Darden, T., Nabuurs, S.B., Finkelstein, A., Vriend, G.: Making optimal use
of empirical energy functions: force-field parameterization in crystal space. Proteins 57(4),
678–683 (2004)
285. Wassenaar, T.A., Ingolfsson, H.I., Bockmann, R.A., Tieleman, D.P., Marrink, S.J.: Computa-
tional lipidomics with insane: a versatile tool for generating custom membranes for molecular
simulations. J. Chem. Theory Comput. 11(5), 2144–2155 (2015). https://doi.org/10.1021/acs.
jctc.5b00209
286. Wassenaar, T.A., Pluhackova, K., Bockmann, R.A., Marrink, S.J., Tieleman, D.P.: Going
backward: a flexible geometric approach to reverse transformation from coarse grained to
atomistic models. J. Chem. Theory Comput. 10(2), 676–690 (2014). https://doi.org/10.1021/
ct400617g
287. Stansfeld, P.J., Goose, J.E., Caffrey, M., Carpenter, E.P., Parker, J.L., Newstead, S., Sansom,
M.S.: MemProtMD: automated insertion of membrane protein structures into explicit lipid
membranes. Structure 23(7), 1350–1361 (2015). https://doi.org/10.1016/j.str.2015.05.006
288. Qi, Y., Ingolfsson, H.I., Cheng, X., Lee, J., Marrink, S.J., Im, W.: CHARMM-GUI Martini
maker for coarse-grained simulations with the Martini force field. J. Chem. Theory Comput.
11(9), 4486–4494 (2015). https://doi.org/10.1021/acs.jctc.5b00513
289. Wu, E.L., Cheng, X., Jo, S., Rui, H., Song, K.C., Davila-Contreras, E.M., Qi, Y., Lee, J.,
Monje-Galvan, V., Venable, R.M., Klauda, J.B., Im, W.: CHARMM-GUI membrane builder
toward realistic biological membrane simulations. J. Comput. Chem. 35(27), 1997–2004
(2014). https://doi.org/10.1002/jcc.23702
290. Ribeiro, J.V., Bernardi, R.C., Rudack, T., Stone, J.E., Phillips, J.C., Freddolino, P.L., Schulten,
K.: QwikMD—integrative molecular dynamics toolkit for novices and experts. Sci. Rep. 6,
26536 (2016). https://doi.org/10.1038/srep26536
291. Humphrey, W., Dalke, A., Schulten, K.: VMD: visual molecular dynamics. J. Mol. Graph. 14(1),
33–38, 27–28 (1996)
292. Doerr, S., Harvey, M.J., Noe, F., De Fabritiis, G.: HTMD: high-throughput molecular dynam-
ics for molecular discovery. J. Chem. Theory Comput. 12(4), 1845–1852 (2016). https://doi.
org/10.1021/acs.jctc.6b00049
293. Lu, H., Isralewitz, B., Krammer, A., Vogel, V., Schulten, K.: Unfolding of titin immunoglob-
ulin domains by steered molecular dynamics simulation. Biophys. J. 75(2), 662–671 (1998).
https://doi.org/10.1016/S0006-3495(98)77556-3
294. Kappel, C., Grubmuller, H.: Velocity-dependent mechanical unfolding of bacteriorhodopsin
is governed by a dynamic interaction network. Biophys. J. 100(4), 1109–1119 (2011). https://
doi.org/10.1016/j.bpj.2011.01.004
295. Grubmuller, H., Heymann, B., Tavan, P.: Ligand binding: molecular mechanics calculation
of the streptavidin-biotin rupture force. Science 271(5251), 997–999 (1996)
296. Wriggers, W., Schulten, K.: Stability and dynamics of G-actin: back-door water diffusion
and behavior of a subdomain 3/4 loop. Biophys. J. 73(2), 624–639 (1997). https://doi.org/10.
1016/S0006-3495(97)78098-6
297. Izrailev, S., Stepaniants, S., Isralewitz, B., Kosztin, D., Lu, H., Molnar, F., Wriggers, W.,
Schulten, K.: Steered molecular dynamics. In: Deuflhard, P., Hermans, J., Leimkuhler, B.,
Mark, A.E., Reich, S., Skeel, R.D. (eds.) Computational Molecular Dynamics: Challenges,
Methods, Ideas, vol. 4. pp. 39–65. Springer, Berlin (1998)
298. Izrailev, S., Stepaniants, S., Balsera, M., Oono, Y., Schulten, K.: Molecular dynamics study
of unbinding of the avidin-biotin complex. Biophys. J. 72(4), 1568–1581 (1997). https://doi.
org/10.1016/S0006-3495(97)78804-0
299. Fanelli, F., Seeber, M.: Structural insights into retinitis pigmentosa from unfolding simula-
tions of rhodopsin mutants. FASEB J. 24(9), 3196–3209 (2010). https://doi.org/10.1096/fj.
09-151084
300. Isralewitz, B., Izrailev, S., Schulten, K.: Binding pathway of retinal to bacterio-opsin: a pre-
diction by molecular dynamics simulations. Biophys. J. 73(6), 2972–2979 (1997). https://doi.
org/10.1016/S0006-3495(97)78326-7
301. Wroblowski, B., Diaz, J.F., Schlitter, J., Engelborghs, Y.: Modelling pathways of alpha-
chymotrypsin activation and deactivation. Protein Eng. 10(10), 1163–1174 (1997)
302. Cheng, X., Wang, H., Grant, B., Sine, S.M., McCammon, J.A.: Targeted molecular dynamics
study of C-loop closure and channel gating in nicotinic receptors. PLoS Comput. Biol. 2(9),
e134 (2006). https://doi.org/10.1371/journal.pcbi.0020134
303. Grayson, P., Tajkhorshid, E., Schulten, K.: Mechanisms of selectivity in channels and enzymes
studied with interactive molecular dynamics. Biophys. J. 85(1), 36–48 (2003). https://doi.org/
10.1016/S0006-3495(03)74452-X
304. Sabbadin, D., Moro, S.: Supervised molecular dynamics (SuMD) as a helpful tool to depict
GPCR-ligand recognition pathway in a nanosecond time scale. J. Chem. Inf. Model. 54(2),
372–376 (2014). https://doi.org/10.1021/ci400766b
305. Jakowiecki, J., Filipek, S.: Hydrophobic ligand entry and exit pathways of the CB1 cannabi-
noid receptor. J. Chem. Inf. Model. 56(12), 2457–2466 (2016). https://doi.org/10.1021/acs.
jcim.6b00499
306. Deganutti, G., Cuzzolin, A., Ciancetta, A., Moro, S.: Understanding allosteric interactions in G
protein-coupled receptors using supervised molecular dynamics: a prototype study analysing
the human A3 adenosine receptor positive allosteric modulator LUF6000. Bioorg. Med. Chem.
23(14), 4065–4071 (2015). https://doi.org/10.1016/j.bmc.2015.03.039
307. Deganutti, G., Moro, S.: Supporting the identification of novel fragment-based positive
allosteric modulators using a supervised molecular dynamics approach: a retrospective analy-
sis considering the human A2A adenosine receptor as a key example. Molecules 22(5) (2017).
https://doi.org/10.3390/molecules22050818
308. Paoletta, S., Sabbadin, D., von Kugelgen, I., Hinz, S., Katritch, V., Hoffmann, K., Abdelrah-
man, A., Strassburger, J., Baqi, Y., Zhao, Q., Stevens, R.C., Moro, S., Muller, C.E., Jacobson,
K.A.: Modeling ligand recognition at the P2Y12 receptor in light of X-ray structural infor-
mation. J. Comput. Aided Mol. Des. 29(8), 737–756 (2015). https://doi.org/10.1007/s10822-
015-9858-z
309. Cuzzolin, A., Sturlese, M., Deganutti, G., Salmaso, V., Sabbadin, D., Ciancetta, A., Moro, S.:
Deciphering the complexity of ligand-protein recognition pathways using supervised molec-
ular dynamics (SuMD) simulations. J. Chem. Inf. Model. 56(4), 687–705 (2016). https://doi.
org/10.1021/acs.jcim.5b00702
310. Fotiadis, D., Liang, Y., Filipek, S., Saperstein, D.A., Engel, A., Palczewski, K.: Atomic-force
microscopy: rhodopsin dimers in native disc membranes. Nature 421(6919), 127–128 (2003).
https://doi.org/10.1038/421127a
311. Gorman, P.M., Kim, S., Guo, M., Melnyk, R.A., McLaurin, J., Fraser, P.E., Bowie, J.U.,
Chakrabartty, A.: Dimerization of the transmembrane domain of amyloid precursor proteins
and familial Alzheimer’s disease mutants. BMC Neurosci. 9, 17 (2008). https://doi.org/10.
1186/1471-2202-9-17
312. George, S.R., O’Dowd, B.F., Lee, S.P.: G-protein-coupled receptor oligomerization and its
potential for drug discovery. Nat. Rev. Drug Discov. 1(10), 808–820 (2002). https://doi.org/
10.1038/nrd913
313. De Strooper, B.: Aph-1, Pen-2, and Nicastrin with Presenilin generate an active gamma-
Secretase complex. Neuron 38(1), 9–12 (2003)
314. Janin, J.: Protein-protein docking tested in blind predictions: the CAPRI experiment. Mol.
BioSyst. 6(12), 2351–2362 (2010). https://doi.org/10.1039/c005060c
315. Moreira, I.S., Fernandes, P.A., Ramos, M.J.: Protein-protein docking dealing with the
unknown. J. Comput. Chem. 31(2), 317–342 (2010). https://doi.org/10.1002/jcc.21276
316. Zacharias, M.: Accounting for conformational changes during protein-protein docking. Curr.
Opin. Struct. Biol. 20(2), 180–186 (2010). https://doi.org/10.1016/j.sbi.2010.02.001
317. Comeau, S.R., Gatchell, D.W., Vajda, S., Camacho, C.J.: ClusPro: a fully automated algorithm
for protein-protein docking. Nucleic Acids Res. 32(Web Server issue), W96–99 (2004).
https://doi.org/10.1093/nar/gkh354
318. Comeau, S.R., Gatchell, D.W., Vajda, S., Camacho, C.J.: ClusPro: an automated docking and
discrimination method for the prediction of protein complexes. Bioinformatics 20(1), 45–50
(2004)
319. Kozakov, D., Brenke, R., Comeau, S.R., Vajda, S.: PIPER: an FFT-based protein docking
program with pairwise potentials. Proteins 65(2), 392–406 (2006). https://doi.org/10.1002/
prot.21117
320. Kozakov, D., Beglov, D., Bohnuud, T., Mottarella, S.E., Xia, B., Hall, D.R., Vajda, S.: How
good is automated protein docking? Proteins 81(12), 2159–2166 (2013). https://doi.org/10.
1002/prot.24403
321. Kozakov, D., Hall, D.R., Xia, B., Porter, K.A., Padhorny, D., Yueh, C., Beglov, D., Vajda, S.:
The ClusPro web server for protein-protein docking. Nat. Protoc. 12(2), 255–278 (2017).
https://doi.org/10.1038/nprot.2016.169
322. Tovchigrechko, A., Vakser, I.A.: GRAMM-X public web server for protein-protein docking.
Nucleic Acids Res. 34(Web Server issue), W310–314 (2006). https://doi.org/10.1093/nar/
gkl206
323. Pierce, B.G., Hourai, Y., Weng, Z.: Accelerating protein docking in ZDOCK using an advanced
3D convolution library. PLoS ONE 6(9), e24657 (2011). https://doi.org/10.1371/journal.pone.
0024657
324. Chen, R., Li, L., Weng, Z.: ZDOCK: an initial-stage protein-docking algorithm. Proteins
52(1), 80–87 (2003). https://doi.org/10.1002/prot.10389
325. Li, L., Chen, R., Weng, Z.: RDOCK: refinement of rigid-body protein docking predictions.
Proteins 53(3), 693–707 (2003). https://doi.org/10.1002/prot.10460
326. Chaudhury, S., Gray, J.J.: Conformer selection and induced fit in flexible backbone protein-
protein docking using computational and NMR ensembles. J. Mol. Biol. 381(4), 1068–1087
(2008). https://doi.org/10.1016/j.jmb.2008.05.042
327. Lyskov, S., Gray, J.J.: The RosettaDock server for local protein-protein docking. Nucleic
Acids Res. 36(Web Server issue), W233–238 (2008). https://doi.org/10.1093/nar/gkn216
328. Gray, J.J., Moughon, S., Wang, C., Schueler-Furman, O., Kuhlman, B., Rohl, C.A., Baker,
D.: Protein-protein docking with simultaneous optimization of rigid-body displacement and
side-chain conformations. J. Mol. Biol. 331(1), 281–299 (2003)
329. Lyskov, S., Chou, F.C., Conchuir, S.O., Der, B.S., Drew, K., Kuroda, D., Xu, J., Weitzner,
B.D., Renfrew, P.D., Sripakdeevong, P., Borgo, B., Havranek, J.J., Kuhlman, B., Kortemme,
T., Bonneau, R., Gray, J.J., Das, R.: Serverification of molecular modeling applications: the
Rosetta Online Server that Includes Everyone (ROSIE). PLoS ONE 8(5), e63906 (2013).
https://doi.org/10.1371/journal.pone.0063906
330. Chaudhury, S., Berrondo, M., Weitzner, B.D., Muthu, P., Bergman, H., Gray, J.J.: Benchmark-
ing and analysis of protein docking performance in Rosetta v3.2. PLoS ONE 6(8), e22477
(2011). https://doi.org/10.1371/journal.pone.0022477
331. de Vries, S.J., van Dijk, M., Bonvin, A.M.: The HADDOCK web server for data-driven
biomolecular docking. Nat. Protoc. 5(5), 883–897 (2010). https://doi.org/10.1038/nprot.2010.
32
332. Karaca, E., Melquiond, A.S., de Vries, S.J., Kastritis, P.L., Bonvin, A.M.: Building macro-
molecular assemblies by information-driven docking: introducing the HADDOCK multibody
docking server. Mol. Cell. Proteomics: MCP 9(8), 1784–1794 (2010). https://doi.org/10.1074/
mcp.M000051-MCP201
333. de Vries, S.J., van Dijk, A.D., Krzeminski, M., van Dijk, M., Thureau, A., Hsu, V., Wassenaar,
T., Bonvin, A.M.: HADDOCK versus HADDOCK: new features and performance of HAD-
DOCK2.0 on the CAPRI targets. Proteins 69(4), 726–733 (2007). https://doi.org/10.1002/
prot.21723
334. Dominguez, C., Boelens, R., Bonvin, A.M.: HADDOCK: a protein-protein docking approach
based on biochemical or biophysical information. J. Am. Chem. Soc. 125(7), 1731–1737
(2003). https://doi.org/10.1021/ja026939x
335. van Zundert, G.C.P., Rodrigues, J., Trellet, M., Schmitz, C., Kastritis, P.L., Karaca, E.,
Melquiond, A.S.J., van Dijk, M., de Vries, S.J., Bonvin, A.: The HADDOCK2.2 Web Server:
user-friendly integrative modeling of biomolecular complexes. J. Mol. Biol. 428(4), 720–725
(2016). https://doi.org/10.1016/j.jmb.2015.09.014
336. Schneidman-Duhovny, D., Inbar, Y., Nussinov, R., Wolfson, H.J.: PatchDock and SymmDock:
servers for rigid and symmetric docking. Nucleic Acids Res. 33(Web Server issue), W363–367
(2005). https://doi.org/10.1093/nar/gki481
337. Casciari, D., Seeber, M., Fanelli, F.: Quaternary structure predictions of transmembrane pro-
teins starting from the monomer: a docking-based approach. BMC Bioinform. 7, 340 (2006).
https://doi.org/10.1186/1471-2105-7-340
338. Canals, M., Marcellino, D., Fanelli, F., Ciruela, F., de Benedetti, P., Goldberg, S.R., Neve,
K., Fuxe, K., Agnati, L.F., Woods, A.S., Ferre, S., Lluis, C., Bouvier, M., Franco, R.:
Adenosine A2A-dopamine D2 receptor-receptor heteromerization: qualitative and quantita-
tive assessment by fluorescence and bioluminescence energy transfer. J. Biol. Chem. 278(47),
46741–46749 (2003). https://doi.org/10.1074/jbc.M306451200
339. Palczewski, K., Kumasaka, T., Hori, T., Behnke, C.A., Motoshima, H., Fox, B.A., Le Trong,
I., Teller, D.C., Okada, T., Stenkamp, R.E., Yamamoto, M., Miyano, M.: Crystal structure of
rhodopsin: A G protein-coupled receptor. Science 289(5480), 739–745 (2000)
340. Lichtarge, O., Bourne, H.R., Cohen, F.E.: An evolutionary trace method defines binding
surfaces common to protein families. J. Mol. Biol. 257(2), 342–358 (1996)
341. Madabushi, S., Gross, A.K., Philippi, A., Meng, E.C., Wensel, T.G., Lichtarge, O.: Evolution-
ary trace of G protein-coupled receptors reveals clusters of residues that determine global and
class-specific functions. J. Biol. Chem. 279(9), 8126–8132 (2004). https://doi.org/10.1074/
jbc.M312671200
342. Gouldson, P.R., Higgs, C., Smith, R.E., Dean, M.K., Gkoutos, G.V., Reynolds, C.A.:
Dimerization and domain swapping in G-protein-coupled receptors: a computational study.
Neuropsychopharmacology 23(4), S60–S77 (2000)
343. Dean, M.K., Higgs, C., Smith, R.E., Bywater, R.P., Snell, C.R., Scott, P.D., Upton, G.J.G.,
Howe, T.J., Reynolds, C.A.: Dimerization of G-protein-coupled receptors. J. Med. Chem.
44(26), 4595–4614 (2001)
344. Gobel, U., Sander, C., Schneider, R., Valencia, A.: Correlated mutations and residue contacts
in proteins. Proteins 18(4), 309–317 (1994)
345. Gouldson, P.R., Dean, M.K., Snell, C.R., Bywater, R.P., Gkoutos, G., Reynolds, C.A.: Lipid-
facing correlated mutations and dimerization in G-protein coupled receptors. Protein Eng.
14(10), 759–767 (2001)
346. Filizola, M., Olmea, O., Weinstein, H.: Prediction of heterodimerization interfaces of G-
protein coupled receptors with a new subtractive correlated mutation method. Protein Eng.
15(11), 881–885 (2002)
347. Park, K., Kim, D.: Structure-based rebuilding of coevolutionary information reveals functional
modules in rhodopsin structure. Biochim. Biophys. Acta (2012). https://doi.org/10.1016/j.
bbapap.2012.05.015
348. Noivirt, O., Eisenstein, M., Horovitz, A.: Detection and reduction of evolutionary noise in
correlated mutation analysis. Protein Eng. Des. Sel. 18(5), 247–253 (2005). https://doi.org/
10.1093/protein/gzi029
349. Roux, B.: Implicit solvent models. In: Becker, O.M., MacKerell Jr, A.D., Roux, B. (eds.)
Computational Biochemistry and Biophysics. CRC Press (2001)
350. Jackson, J.D.: Classical Electrodynamics. New York (1975)
351. Landau, L.D., Lifshitz, E.M., Pitaevskii, L.P.: Electrodynamics of Continuous Media.
Butterworth-Heinemann, Boston (1982)
352. Still, W.C., Tempczyk, A., Hawley, R.C., Hendrickson, T.: Semianalytical treatment of sol-
vation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112, 6127–6129 (1990)
353. Lee, B., Richards, F.M.: The interpretation of protein structures: estimation of static
accessibility. J. Mol. Biol. 55, 379–400 (1971)
354. Lee, M.S., Salsbury, F.R., Brooks, C.L.: Novel generalized Born methods. J. Chem. Phys.
116(24), 10606–10614 (2002). https://doi.org/10.1063/1.1480013
355. Gallicchio, E., Levy, R.M.: AGBNP: an analytic implicit solvent model suitable for molec-
ular dynamics simulations and high-resolution modeling. J. Comput. Chem. 25(4), 479–499
(2004). https://doi.org/10.1002/Jcc.10400
356. Lee, M.S., Feig, M., Salsbury, F.R., Brooks, C.L.: New analytic approximation to the standard
molecular volume definition and its application to generalized born calculations. J. Comput.
Chem. 24(11), 1348–1356 (2003). https://doi.org/10.1002/Jcc.10272
357. Lazaridis, T., Karplus, M.: Effective energy function for proteins in solution. Proteins 35(2),
133–152 (1999)
358. Spassov, V.Z., Yan, L., Szalma, S.: Introducing an implicit membrane in generalized
Born/solvent accessibility continuum solvent models. J. Phys. Chem. B 106(34), 8726–8738
(2002). https://doi.org/10.1021/Jp020674r
359. Tanizaki, S., Feig, M.: A generalized Born formalism for heterogeneous dielectric
environments: application to the implicit modeling of biological membranes. J. Chem. Phys.
122(12), 124706 (2005). https://doi.org/10.1063/1.1865992
360. Lazaridis, T.: Effective energy function for proteins in lipid membranes. Proteins 52(2),
176–192 (2003)
361. Lazaridis, T., Karplus, M.: Discrimination of the native from misfolded protein models with
an energy function including implicit solvation. J. Mol. Biol. 288(3), 477–487 (1999)
362. Felts, A.K., Gallicchio, E., Wallqvist, A., Levy, R.M.: Distinguishing native conformations
of proteins from decoys with an effective free energy estimator based on the OPLS all-atom
force field and the surface generalized born solvent model. Proteins 48(2), 404–422 (2002).
https://doi.org/10.1002/Prot.10171
363. Rohl, C.A., Strauss, C.E., Misura, K.M., Baker, D.: Protein structure prediction using Rosetta.
Methods Enzymol. 383, 66–93 (2004). https://doi.org/10.1016/S0076-6879(04)83004-0
364. Davis, I.W., Baker, D.: RosettaLigand docking with full ligand and receptor flexibility. J. Mol.
Biol. 385(2), 381–392 (2009). https://doi.org/10.1016/j.jmb.2008.11.010
365. Im, W., Feig, M., Brooks, C.L.: An implicit membrane generalized born theory for the study
of structure, stability, and interactions of membrane proteins. Biophys. J. 85(5), 2900–2918
(2003)
366. Im, W., Brooks, C.L.: Interfacial folding and membrane insertion of designed peptides studied
by molecular dynamics simulations. Proc. Natl. Acad. Sci. U.S.A. 102(19), 6771–6776 (2005).
https://doi.org/10.1073/pnas.0408135102
367. Ulmschneider, J.P., Ulmschneider, M.B.: Folding Simulations of the transmembrane helix of
virus protein U in an implicit membrane model. J. Chem. Theory Comput. 3(6), 2335–2346
(2007). https://doi.org/10.1021/Ct700103k
368. Mottamal, M., Lazaridis, T.: Voltage-dependent energetics of alamethicin monomers in the
membrane. Biophys. Chem. 122(1), 50–57 (2006). https://doi.org/10.1016/j.bpc.2006.02.005
369. Seeber, M., Fanelli, F., Paci, E., Caflisch, A.: Sequential unfolding of individual helices of
bacterioopsin observed in molecular dynamics simulations of extraction from the purple mem-
brane. Biophys. J. 91(9), 3276–3284 (2006). https://doi.org/10.1529/biophysj.106.088591
370. Park, P.S.H., Sapra, K.T., Jastrzebska, B., Maeda, T., Maeda, A., Pulawski, W., Kono, M.,
Lem, J., Crouch, R.K., Filipek, S., Muller, D.J., Palczewski, K.: Modulation of molecular
interactions and function by rhodopsin palmitylation. Biochemistry 48(20), 4294–4304 (2009)
371. Ewald, P.P.: Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys. 64,
253–287 (1921)
372. Zhan, H., Lazaridis, T.: Influence of the membrane dipole potential on peptide binding to lipid
bilayers. Biophys. Chem. 161, 1–7 (2012). https://doi.org/10.1016/j.bpc.2011.10.002
373. Zagrovic, B., Pande, V.: Solvent viscosity dependence of the folding rate of a small protein:
distributed computing study. J. Comput. Chem. 24(12), 1432–1436 (2003). https://doi.org/
10.1002/Jcc.10297
374. Lee, M.S., Olson, M.A.: Evaluation of poisson solvation models using a hybrid
explicit/implicit solvent method. J. Phys. Chem. B 109(11), 5223–5236 (2005). https://doi.
org/10.1021/Jp046377z
375. Kelly, C.P., Cramer, C.J., Truhlar, D.G.: Adding explicit solvent molecules to continuum
solvent calculations for the calculation of aqueous acid dissociation constants. J. Phys. Chem.
A 110(7), 2493–2499 (2006). https://doi.org/10.1021/jp055336f
376. Stagg, S.M., Harvey, S.C.: Exploring the flexibility of ribosome recycling factor using molec-
ular dynamics. Biophys. J. 89(4), 2659–2666 (2005). https://doi.org/10.1529/biophysj.104.
052373
377. Bast, T., Hentschke, R.: Molecular dynamics simulation of a micellar system. J. Mol. Model.
2(9), 330–340 (1996)
378. Freddolino, P.L., Arkhipov, A.S., Larson, S.B., McPherson, A., Schulten, K.: Molecular
dynamics simulations of the complete satellite tobacco mosaic virus. Structure 14(3), 437–449
(2006). https://doi.org/10.1016/j.str.2005.11.014
379. Levitt, M.: A simplified representation of protein conformations for rapid simulation of protein
folding. J. Mol. Biol. 104(1), 59–107 (1976)
380. Levitt, M., Warshel, A.: Computer simulation of protein folding. Nature 253(5494), 694–698
(1975)
381. Levinthal, C.: Are there pathways for protein folding? J. Chim. Phys. 65, 44–45 (1968)
382. Taketomi, H., Ueda, Y., Go, N.: Studies on protein folding, unfolding and fluctuations by
computer simulation. I. The effect of specific amino acid sequence represented by specific
inter-unit interactions. Int. J. Pept. Protein Res. 7(6), 445–459 (1975)
383. Ueda, Y., Taketomi, H., Gō, N.: Studies on protein folding, unfolding, and fluctuations by
computer simulation. II. A three-dimensional lattice model of lysozyme. Biopolymers 17(6),
1531–1548 (1978)
384. Go, N., Taketomi, H.: Studies on protein folding, unfolding and fluctuations by computer
simulation. III. Effect of short-range interactions. Int. J. Pept. Protein Res. 13(3), 235–252
(1979)
385. Go, N., Taketomi, H.: Studies on protein folding, unfolding and fluctuations by computer
simulation. IV. Hydrophobic interactions. Int. J. Pept. Protein Res. 13(5), 447–461 (1979)
386. Gay, J.G., Berne, B.J.: Modification of the overlap potential to mimic a linear site-site potential.
J. Chem. Phys. 74(6), 3316–3319 (1981)
387. Berne, B.J., Pechukas, P.: Gaussian model potentials for molecular interactions. J. Chem.
Phys. 56(8), 4213–4216 (1972)
388. Smith, G.D., Paul, W.: United atom force field for molecular dynamics simulations of 1,4-
Polybutadiene based on quantum chemistry calculations on model molecules. J. Phys. Chem.
A 102(7), 1200–1208 (1998)
389. Kale, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., Phillips, J., Shi-
nozaki, A., Varadarajan, K., Schulten, K.: NAMD2: greater scalability for parallel molecular
dynamics. J. Comput. Phys. 151(1), 283–312 (1999)
390. Takada, S.: Coarse-grained molecular simulations of large biomolecules. Curr. Opin. Struct.
Biol. 22(2), 130–137 (2012)
391. Tozzini, V.: Coarse-grained models for proteins. Curr. Opin. Struct. Biol. 15(2), 144–150
(2005)
392. Rader, A.J.: Coarse-grained models: getting more with less. Curr. Opin. Pharmacol. 10(6),
753–759 (2010)
393. Lindahl, E., Sansom, M.S.: Membrane proteins: molecular dynamics simulations. Curr. Opin.
Struct. Biol. 18(4), 425–431 (2008)
394. Shrivastava, I.H., Bahar, I.: Common mechanism of pore opening shared by five different
potassium channels. Biophys. J. 90(11), 3929–3940 (2006)
395. Cieplak, M., Filipek, S., Janovjak, H., Krzysko, K.A.: Pulling single bacteriorhodopsin out of
a membrane: comparison of simulation and experiment. Biochim. Biophys. Acta 1758(4),
537–544 (2006)
396. Orlandini, E., Seno, F., Banavar, J.R., Laio, A., Maritan, A.: Deciphering the folding kinetics
of transmembrane helical proteins. Proc. Natl. Acad. Sci. U.S.A. 97(26), 14229–14234 (2000)
397. Marrink, S.J., de Vries, A.H., Mark, A.E.: Coarse grained model for semiquantitative lipid
simulations. J. Phys. Chem. B 108(2), 750–760 (2004)
398. Monticelli, L., Kandasamy, S.K., Periole, X., Larson, R.G., Tieleman, D.P., Marrink, S.-J.:
The MARTINI coarse-grained force field: extension to proteins. J. Chem. Theory Comput.
4(5), 819–834 (2008). https://doi.org/10.1021/ct700324x
399. Yesylevskyy, S.O., Schafer, L.V., Sengupta, D., Marrink, S.J.: Polarizable water model for
the coarse-grained MARTINI force field. PLoS Comput. Biol. 6(6), e1000810 (2010)
400. Holdbrook, D.A., Leung, Y.M., Piggot, T.J., Marius, P., Williamson, P.T., Khalid, S.: Stability
and membrane orientation of the fukutin transmembrane domain: a combined multiscale
molecular dynamics and circular dichroism study. Biochemistry 49(51), 10796–10802 (2010)
401. Schafer, L.V., de Jong, D.H., Holt, A., Rzepiela, A.J., de Vries, A.H., Poolman, B., Killian,
J.A., Marrink, S.J.: Lipid packing drives the segregation of transmembrane helices into disor-
dered lipid domains in model membranes. Proc. Natl. Acad. Sci. U.S.A. 108(4), 1343–1348
(2011)
402. Periole, X., Huber, T., Marrink, S.J., Sakmar, T.P.: G protein-coupled receptors self-assemble
in dynamics simulations of model bilayers. J. Am. Chem. Soc. 129(33), 10126–10132 (2007)
403. Bond, P.J., Sansom, M.S.P.: Bilayer deformation by the Kv channel voltage sensor domain
revealed by self-assembly simulations. Proc. Natl. Acad. Sci. U.S.A. 104(8), 2631–2636 (2007).
https://doi.org/10.1073/pnas.0606822104
404. Arnarez, C., Uusitalo, J.J., Masman, M.F., Ingolfsson, H.I., de Jong, D.H., Melo, M.N., Periole,
X., de Vries, A.H., Marrink, S.J.: Dry Martini, a coarse-grained force field for lipid membrane
simulations with implicit solvent. J. Chem. Theory Comput. 11(1), 260–275 (2015). https://
doi.org/10.1021/ct500477k
405. Shih, A.Y., Arkhipov, A., Freddolino, P.L., Schulten, K.: Coarse grained protein-lipid model
with application to lipoprotein particles. J. Phys. Chem. B 110(8), 3674–3684 (2006)
406. Spijker, P., van Hoof, B., Debertrand, M., Markvoort, A.J., Vaidehi, N., Hilbers, P.A.: Coarse
grained molecular dynamics simulations of transmembrane protein-lipid systems. Int. J. Mol.
Sci. 11(6), 2393–2420 (2010)
407. Markvoort, A.J., Pieterse, K., Steijaert, M.N., Spijker, P., Hilbers, P.A.: The bilayer-vesicle
transition is entropy driven. J. Phys. Chem. B 109(47), 22649–22654 (2005)
408. Kar, P., Gopal, S.M., Cheng, Y.M., Panahi, A., Feig, M.: Transferring the PRIMO coarse-
grained force field to the membrane environment: simulations of membrane proteins and
helix-helix association. J. Chem. Theory Comput. 10(8), 3459–3472 (2014). https://doi.org/
10.1021/ct500443v
409. Kar, P., Gopal, S.M., Cheng, Y.M., Predeus, A., Feig, M.: PRIMO: a transferable coarse-
grained force field for proteins. J. Chem. Theory Comput. 9(8), 3769–3788 (2013). https://
doi.org/10.1021/ct400230y
410. Kar, P., Feig, M.: Hybrid all-atom/coarse-grained simulations of proteins by direct coupling
of CHARMM and PRIMO force fields. J. Chem. Theory Comput. 13(11), 5753–5765 (2017).
https://doi.org/10.1021/acs.jctc.7b00840
411. Májek, P., Elber, R.: A coarse-grained potential for fold recognition and molecular dynamics
simulations of proteins. Proteins: Struct. Funct. Bioinf. 76(4), 822–836 (2009). https://doi.
org/10.1002/prot.22388
412. Terstegen, F., Buss, V.: All-trans- and 11-cis-retinal, their N-methyl Schiff base and N-methyl
protonated Schiff base derivatives: a comparative ab initio study. Theochem-J. Mol. Struc. 369,
53–65 (1996)
413. Terstegen, F., Buss, V.: Geometries and interconversion pathways of free and protonated beta-
ionone Schiff bases. An ab initio study of photoreceptor chromophore model compounds.
Chem. Phys. 225(1–3), 163–171 (1997). https://doi.org/10.1016/s0301-0104(97)00194-8
414. Terstegen, F., Carter, E.A., Buss, V.: Interconversion pathways of the protonated beta-ionone
Schiff base: An ab initio molecular dynamics study. Int. J. Quantum Chem. 75(3), 141–145
(1999). https://doi.org/10.1002/(sici)1097-461x(1999)75:3%3c141::aid-qua4%3e3.3.co;2-0
415. Terstegen, F., Buss, V.: Influence of DFT-calculated electron correlation on energies and
geometries of retinals and of retinal derivatives related to the bacteriorhodopsin and rhodopsin
chromophores. Theochem-J. Mol. Struc. 430, 209–218 (1998)
416. Bifone, A., deGroot, H.J.M., Buda, F.: Ab initio molecular dynamics of retinals. Chem. Phys.
Lett. 248(3–4), 165–172 (1996). https://doi.org/10.1016/0009-2614(95)01312-1
417. Buda, F., deGroot, H.J.M., Bifone, A.: Charge localization and dynamics in rhodopsin. Phys.
Rev. Lett. 77(21), 4474–4477 (1996). https://doi.org/10.1103/PhysRevLett.77.4474
418. Bifone, A., deGroot, H.J.M., Buda, F.: Energy storage in the primary photoproduct of vision.
J. Phys. Chem. B 101(15), 2954–2958 (1997). https://doi.org/10.1021/jp9623397
419. La Penna, G., Buda, F., Bifone, A., de Groot, H.J.M.: The transition state in the isomeriza-
tion of rhodopsin. Chem. Phys. Lett. 294(6), 447–453 (1998). https://doi.org/10.1016/s0009-
2614(98)00870-7
420. Garavelli, M., Negri, F., Olivucci, M.: Initial excited-state relaxation of the isolated 11-cis
protonated Schiff base of retinal: evidence for in-plane motion from ab initio quantum chemical
simulation of the resonance Raman spectrum. J. Am. Chem. Soc. 121(5), 1023–1029 (1999).
https://doi.org/10.1021/ja981719y
421. Gozem, S., Melaccio, F., Lindh, R., Krylov, A.I., Granovsky, A.A., Angeli, C., Olivucci,
M.: Mapping the excited state potential energy surface of a retinal chromophore model with
multireference and equation-of-motion coupled-cluster methods. J. Chem. Theory Comput.
9(10), 4495–4506 (2013). https://doi.org/10.1021/ct400460h
422. Sugihara, M., Buss, V., Entel, P., Elstner, M., Frauenheim, T.: 11-cis-retinal protonated Schiff
base: influence of the protein environment on the geometry of the rhodopsin chromophore.
Biochemistry 41(51), 15259–15266 (2002). https://doi.org/10.1021/bi020533f
423. Elstner, M., Porezag, D., Jungnickel, G., Elsner, J., Haugk, M., Frauenheim, T., Suhai, S.,
Seifert, G.: Self-consistent-charge density-functional tight-binding method for simulations
of complex materials properties. Phys. Rev. B 58(11), 7260–7268 (1998). https://doi.org/10.
1103/PhysRevB.58.7260
424. Hufen, J., Sugihara, M., Buss, V.: How the counterion affects ground- and excited-state prop-
erties of the rhodopsin chromophore. J. Phys. Chem. B 108(52), 20419–20426 (2004). https://
doi.org/10.1021/jp046147k
425. Tachikawa, H., Kawabata, H.: Effects of the residues on the excitation energies of protonated
Schiff base of retinal (PSBR) in bR: A TD-DFT study. J. Photochem. Photobiol. B-Biol.
79(3), 191–195 (2005). https://doi.org/10.1016/j.jphotobiol.2005.01.004
426. Sugihara, M., Buss, V., Entel, P., Hafner, J.: The nature of the complex counterion of the
chromophore in rhodopsin. J. Phys. Chem. B 108(11), 3673–3680 (2004). https://doi.org/10.
1021/jp0362786
427. Blomgren, F., Larsson, S.: Exploring the potential energy surface of retinal, a comparison of
the performance of different methods. J. Comput. Chem. 26(7), 738–742 (2005). https://doi.
org/10.1002/jcc.20210
428. Maseras, F., Morokuma, K.: IMOMM—a new integrated ab-initio plus molecular mechanics
geometry optimization scheme of equilibrium structures and transition-states. J. Comput.
Chem. 16(9), 1170–1179 (1995). https://doi.org/10.1002/jcc.540160911
429. Warshel, A., Levitt, M.: Theoretical studies of enzymic reactions—dielectric, electrostatic and
steric stabilization of carbonium-ion in reaction of lysozyme. J. Mol. Biol. 103(2), 227–249
(1976). https://doi.org/10.1016/0022-2836(76)90311-9
430. Gascon, J.A., Batista, V.S.: QM/MM study of energy storage and molecular rearrangements
due to the primary event in vision. Biophys. J. 87(5), 2931–2941 (2004)
431. Gascon, J.A., Sproviero, E.M., Batista, V.S.: QM/MM study of the NMR spectroscopy of the
retinyl chromophore in visual rhodopsin. J. Chem. Theory Comput. 1(4), 674–685 (2005).
https://doi.org/10.1021/ct0500850
432. Gascon, J.A., Sproviero, E.M., Batista, V.S.: Computational studies of the primary photo-
transduction event in visual rhodopsin. Acc. Chem. Res. 39(3), 184–193 (2006). https://doi.
org/10.1021/ar050027t
433. Illingworth, C.J.R., Gooding, S.R., Winn, P.J., Jones, G.A., Ferenczy, G.G., Reynolds, C.A.:
Classical polarization in hybrid QM/MM methods. J. Phys. Chem. A 110(20), 6487–6497
(2006). https://doi.org/10.1021/jp046944i
434. Altun, A., Yokoyama, S., Morokuma, K.: Spectral tuning in visual pigments: an ONIOM(QM:
MM) study on bovine rhodopsin and its mutants. J. Phys. Chem. B 112(22), 6814–6827 (2008).
https://doi.org/10.1021/jp709730b
435. Hernandez-Rodriguez, E.W., Sanchez-Garcia, E., Crespo-Otero, R., Montero-Alejo, A.L.,
Montero, L.A., Thiel, W.: Understanding rhodopsin mutations linked to the
retinitis pigmentosa disease: a QM/MM and DFT/MRCI study. J. Phys. Chem. B 116(3),
1060–1076 (2012). https://doi.org/10.1021/jp2037334
436. Manathunga, M., Yang, X., Luk, H.L., Gozem, S., Frutos, L.M., Valentini, A., Ferre, N.,
Olivucci, M.: Probing the photodynamics of rhodopsins with reduced retinal chromophores.
J. Chem. Theory Comput. 12(2), 839–850 (2016). https://doi.org/10.1021/acs.jctc.5b00945
437. Gozem, S., Luk, H.L., Schapiro, I., Olivucci, M.: Theory and simulation of the ultrafast
double-bond isomerization of biological chromophores. Chem. Rev. 117(22), 13502–13565
(2017). https://doi.org/10.1021/acs.chemrev.7b00177
438. Stewart, J.J.P.: Application of localized molecular orbitals to the solution of semiempirical
self-consistent field equations. Int. J. Quantum Chem. 58(2), 133–146 (1996). https://doi.org/
10.1002/(sici)1097-461x(1996)58:2%3c133::aid-qua2%3e3.0.co;2-z
439. Daniels, A.D., Millam, J.M., Scuseria, G.E.: Semiempirical methods with conjugate gradient
density matrix search to replace diagonalization for molecular systems containing thousands
of atoms. J. Chem. Phys. 107(2), 425–431 (1997). https://doi.org/10.1063/1.474404
440. Dixon, S.L., Merz, K.M.: Fast, accurate semiempirical molecular orbital calculations for
macromolecules. J. Chem. Phys. 107(3), 879–893 (1997). https://doi.org/10.1063/1.474386
441. Stewart, J.J.P.: Optimization of parameters for semiempirical methods V: modification of
NDDO approximations and application to 70 elements. J. Mol. Model. 13(12), 1173–1213
(2007). https://doi.org/10.1007/s00894-007-0233-4
442. Rezac, J., Fanfrlik, J., Salahub, D., Hobza, P.: Semiempirical quantum chemical PM6 method
augmented by dispersion and H-bonding correction terms reliably describes various types of
noncovalent complexes. J. Chem. Theory Comput. 5(7), 1749–1760 (2009). https://doi.org/
10.1021/ct9000922
443. Rezac, J., Hobza, P.: Advanced corrections of hydrogen bonding and dispersion for semiem-
pirical quantum mechanical methods. J. Chem. Theory Comput. 8(1), 141–151 (2012). https://
doi.org/10.1021/ct200751e
444. Ren, L., Martin, C.H., Wise, K.J., Gillespie, N.B., Luecke, H., Lanyi, J.K., Spudich, J.L.,
Birge, R.R.: Molecular mechanism of spectral tuning in sensory rhodopsin II. Biochemistry
40(46), 13906–13914 (2001). https://doi.org/10.1021/bi0116487
445. Lee, I., Greenbaum, E., Budy, S., Hillebrecht, J.R., Birge, R.R., Stuart, J.A.: Photoinduced
surface potential change of bacteriorhodopsin mutant D96N measured by scanning surface
potential microscopy. J. Phys. Chem. B 110(22), 10982–10990 (2006). https://doi.org/10.
1021/jp052948r
446. Stewart, J.J.P.: Application of the PM6 method to modeling proteins. J. Mol. Model. 15(7),
765–805 (2009). https://doi.org/10.1007/s00894-008-0420-y
447. Ohno, K., Kamiya, N., Asakawa, N., Inoue, Y., Sakurai, M.: Application of an integrated
MOZYME plus DFT method to pKa calculations for proteins. Chem. Phys. Lett. 341(3–4),
387–392 (2001). https://doi.org/10.1016/s0009-2614(01)00499-7
448. Yoda, M., Inoue, Y., Sakurai, M.: Effect of protein environment on pK(a) shifts in the active
site of photoactive yellow protein. J. Phys. Chem. B 107(51), 14569–14575 (2003). https://
doi.org/10.1021/jp0364102
449. Gross, K.C., Seybold, P.G., Hadad, C.M.: Comparison of different atomic charge schemes for
predicting pK(a) variations in substituted anilines and phenols. Int. J. Quantum Chem. 90(1),
445–458 (2002). https://doi.org/10.1002/qua.10108
450. Mulliken, R.S.: Electronic population analysis on LCAO-MO molecular wave functions. I. J.
Chem. Phys. 23(10), 1833–1840 (1955). https://doi.org/10.1063/1.1740588
451. Reed, A.E., Weinstock, R.B., Weinhold, F.: Natural-population analysis. J. Chem. Phys. 83(2),
735–746 (1985). https://doi.org/10.1063/1.449486
452. Wang, B., Ford, G.P.: Atomic charges derived from a fast and accurate method for electrostatic
potentials based on modified AM1 calculations. J. Comput. Chem. 15(2), 200–207 (1994).
https://doi.org/10.1002/jcc.540150210
453. Khan, H.M., Grauffel, C., Broer, R., MacKerell Jr., A.D., Havenith, R.W., Reuter, N.: Improv-
ing the force field description of tyrosine-choline cation-pi interactions: QM investigation of
Phenol-N(Me)4(+) interactions. J. Chem. Theory Comput. 12(11), 5585–5595 (2016). https://
doi.org/10.1021/acs.jctc.6b00654
454. Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R., Hart, W.E., Belew, R.K., Olson, A.J.:
Automated docking using a Lamarckian genetic algorithm and an empirical binding free
energy function. J. Comput. Chem. 19(14), 1639–1662 (1998)
455. Bikadi, Z., Hazai, E.: Application of the PM6 semi-empirical method to modeling proteins
enhances docking accuracy of AutoDock. J. Cheminform. 1 (2009). https://doi.org/10.1186/
1758-2946-1-15
456. Fanfrlik, J., Bronowska, A.K., Rezac, J., Prenosil, O., Konvalinka, J., Hobza, P.: A reliable
Docking/scoring scheme based on the semiempirical quantum mechanical PM6-DH2 method
accurately covering dispersion and H-bonding: HIV-1 protease with 22 ligands. J. Phys. Chem.
B 114(39), 12666–12678 (2010). https://doi.org/10.1021/jp1032965
457. Sharma, V., Belevich, G., Gamiz-Hernandez, A.P., Rog, T., Vattulainen, I., Verkhovskaya,
M.L., Wikstrom, M., Hummer, G., Kaila, V.R.: Redox-induced activation of the proton pump
in the respiratory complex I. Proc Natl Acad Sci USA 112(37), 11571–11576 (2015). https://
doi.org/10.1073/pnas.1503761112
Modeling of Membrane Proteins 451
458. Maffeo, C., Bhattacharya, S., Yoo, J., Wells, D., Aksimentiev, A.: Modeling and simulation
of ion channels. Chem. Rev. 112(12), 6250–6284 (2012). https://doi.org/10.1021/cr3002609
459. Kutzner, C., Kopfer, D.A., Machtens, J.P., de Groot, B.L., Song, C., Zachariae, U.: Insights
into the function of ion channels by computational electrophysiology simulations. Biochim.
Biophys. Acta 1858(7 Pt B), 1741–1752 (2016). https://doi.org/10.1016/j.bbamem.2016.02.
006
460. Sadhu, B., Sundararajan, M., Bandyopadhyay, T.: Selectivity of a singly permeating ion in
nonselective NaK channel: combined QM and MD based investigations. J. Phys. Chem. B
119(40), 12783–12797 (2015). https://doi.org/10.1021/acs.jpcb.5b05996
Peptide Folding in Cellular
Environments: A Monte Carlo
and Markov Modeling Approach
1 Introduction
In the crowded interior of living cells, proteins are surrounded by high concentrations
of macromolecules. For instance, the cytosol of Escherichia coli bacteria has been
estimated to contain 300–400 g/L of proteins and RNA [1]. However, biophysical
studies of proteins are usually conducted in dilute solutions. A fundamental and
long-standing question, therefore, is how macromolecular crowding affects reactions
such as protein folding, binding and aggregation. This question is currently being
intensely studied by both experimental [2, 3] and computational [4, 5] methods.
Most computational and theoretical studies have so far focused on the universal excluded-
volume effect [6, 7], which is independent of the precise nature of the crowders. This
effect favors reactions that increase the available volume, such as the folding of a
globular protein to its compact native state, or the binding of proteins to each other.
Its implications have been extensively studied through simulations, typically with
hard spheres as crowders [8–19]. In particular, it was shown that volume exclusion
can lead to a significant stabilization of globular proteins, depending on the size and
density of the crowders [8, 11, 13]. Moreover, good agreement was found between
simulations with hard-sphere crowders and experiments with inert crowders [10].
While universal, the excluded-volume effect need not dominate the interaction of a
protein with surrounding macromolecules. In fact, both stabilization and destabiliza-
tion of globular proteins have been observed in experiments conducted in cells and
concentrated protein solutions [20, 21]. However, the precise nature of the non-steric
effects involved remains incompletely understood.
Recent years have seen increasing efforts to conduct protein simulations with
explicit crowder molecules [22, 23], rather than hard-sphere crowders. One approach
is to build crowding environments mimicking cellular conditions [24]. A recent exam-
ple is the detailed and extensive model of a bacterial cytoplasm (Mycoplasma geni-
talium) developed by Feig et al. [25], which includes proteins, RNAs, protein/RNA
complexes, metabolites, ions as well as explicit solvent molecules. Another approach
is to use simplified homogeneous crowding environments [26–28], as in experiments
conducted in concentrated protein solutions. In this case, the number of crowder
molecules can be smaller, so that larger timescales can be reached. A common choice
is to have around ten crowder molecules. Nevertheless, even with a moderate number
of crowder molecules, examining the conformational properties of the test protein in
question represents a challenge.
In this article, we summarize some recent Monte Carlo (MC) studies of peptide
folding in the presence of explicit protein crowders [29–32], performed by us using an
all-atom protein model along with an implicit solvent force field. The peptides studied
are the compact α-helical trp-cage [33] and the β-hairpin-forming GB1m3 [34]. Each
peptide is studied using two different crowding agents, namely bovine pancreatic
trypsin inhibitor (BPTI) and the B1 domain of streptococcal protein G (GB1). Both
these proteins are thermally highly stable [35, 36] and therefore modeled using a
fixed-backbone approximation, whereas the peptides are free to fold and unfold in
the simulations.
A challenge when analyzing data from crowding simulations is in identifying
the relevant states and dynamical modes, which may not be easily anticipated. Two
methods that can be used to tackle this problem are time-lagged independent compo-
nent analysis (TICA) [37–40] and Markov state modeling [41–45]. These methods
have in recent years found widespread use in studies of biomolecular processes such
as folding and binding [46, 47]. In this article, we briefly discuss the results obtained
when using these techniques to elucidate the interplay between peptide folding and
peptide-crowder interactions in our simulations of the β-hairpin-forming GB1m3
peptide [32].
This article is organized as follows. Section 2 briefly describes the systems stud-
ied and our computational methodology. Section 3 gives an overview of our main
findings. The article ends with a brief summary in Sect. 4.
2 Methods
This section describes the simulated systems and outlines the biophysical model,
sampling techniques and data analysis methods used.
Throughout this article, we consider systems consisting of one test molecule (trp-
cage or GB1m3) and eight crowder molecules (BPTI or GB1), confined to a cubic
box and subject to periodic boundary conditions. The crowder density is around 100
g/L. This value is somewhat lower than that for the E. coli cytosol mentioned earlier,
but sufficiently high for the presence of the crowders to have significant effects on
the test peptides in the simulations (see below). The volume fraction occupied by the
crowders is around 7%.
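As a rough consistency check, the quoted density and volume fraction can be reproduced from the box size of (95 Å)^3 given in the results section. The sketch below is illustrative only: the crowder mass of 6.5 kDa (roughly that of BPTI; GB1 is slightly lighter) and the partial specific volume of 0.73 mL/g are assumed typical values, not numbers taken from the simulations.

```python
AVOGADRO = 6.02214076e23   # particles per mole
A3_TO_LITRE = 1e-27        # 1 cubic angstrom = 1e-30 m^3 = 1e-27 L


def crowder_density_g_per_l(n_crowders, mass_da, box_edge_angstrom):
    """Mass concentration (g/L) of n_crowders molecules in a cubic box."""
    total_mass_g = n_crowders * mass_da / AVOGADRO
    volume_l = box_edge_angstrom ** 3 * A3_TO_LITRE
    return total_mass_g / volume_l


def volume_fraction(density_g_per_l, specific_volume_ml_per_g=0.73):
    """Occupied volume fraction; 0.73 mL/g is an assumed typical protein value."""
    return density_g_per_l * specific_volume_ml_per_g / 1000.0


# Eight crowders of about 6.5 kDa (assumed mass) in a (95 angstrom)^3 box.
rho = crowder_density_g_per_l(8, 6500.0, 95.0)
phi = volume_fraction(rho)
print(f"density ~ {rho:.0f} g/L, volume fraction ~ {phi:.1%}")
```

Under these assumptions the estimate comes out close to 100 g/L and 7%, in line with the values stated above.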
The trp-cage peptide is a designed mini-protein with 20 residues [33]. Its NMR-
derived native fold is compact and helical. The 16-residue GB1m3 peptide is an
optimized variant of the second β-hairpin (residues 41–56) in protein GB1, with
enhanced stability [34]. It differs from the original sequence at 7 of the 16 positions.
To our knowledge, no experimental structure is available for GB1m3, but its native
fold is expected to be similar to the parent β-hairpin in GB1.
Both proteins used as crowders, BPTI and GB1, are small but thermally highly
stable [35, 36], with 58 and 56 residues, respectively.
and charge ($E_{\mathrm{sc}}$). This potential is an effective energy function for protein folding
simulations, parameterized through folding thermodynamics studies for a structurally
diverse set of peptides and small proteins [48, 49]. In multi-chain simulations, inter-
molecular interaction terms are taken to have the same form and strength as the
corresponding intramolecular ones.
The model has been applied to study folding/unfolding properties of several pro-
teins with >90 residues [50–55]. Previous applications also include simulations of
peptide aggregation [56–60].
As indicated above, the thermally highly stable BPTI and GB1 proteins are mod-
eled with side-chain rotations as their only internal degrees of freedom; their back-
bones are held fixed in the simulations. The assumed backbone conformations are
model approximations of the crystal structures (PDB codes 4PTI and 2GB1), derived
by MC with minimization. The structures were selected for both low energy and high
similarity to the experimental structures. The root-mean-square deviations from the
experimental structures were 1 Å.
2.3 MC Simulations
The model described above is implemented in the open-source MC simulation code
PROFASI [61]. All simulations discussed below were run with this program, using
both vector and thread parallelization.
The efficiency with which the conformational space is sampled in a MC simulation
depends critically on the move set used. Our simulations are based on the following
four elementary moves: (i) pivot-type rotation about individual backbone bonds, (ii) a
semi-local backbone update, Biased Gaussian Steps (BGS) [62], involving concerted
rotation of up to eight angles, (iii) rotation of individual side-chain angles, and (iv)
rigid-body translation or rotation of whole chains. The pivot move can generate
large-scale deformations of a chain, and can, despite its simplicity, be very useful
for unfolded chains in implicit solvent. The semi-local BGS move is an important
complement to the pivot update, especially for folded chains. There are also strictly
local torsion-angle updates available [63, 64], but the computationally convenient
BGS move works well for the peptides studied in this article.
A potentially valuable addition to the move set above would be to include rigid-
body motion of whole clusters of interacting molecules, based, for example, on the
stochastic cluster construction procedure in [65, 66].
The simulations discussed in this article are of two types. Our first set of simu-
lations focuses entirely on the equilibrium thermodynamics of the systems. These
simulations use the full move set described above (i–iv), and the replica exchange,
or parallel tempering, technique [67]. This method, and extensions of it [68–70], are
often used with the aim to enhance the sampling efficiency. Here, we used replica
exchange primarily as a convenient method to study a range of temperatures in a
single simulation.
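The text does not spell out the exchange step, so the following is a minimal sketch of the standard parallel-tempering swap rule, not of the PROFASI implementation. The function names, the use of inverse temperatures (beta) in model units, and the even/odd neighbour pairing are assumptions made for illustration.

```python
import math
import random


def swap_probability(e_i, e_j, beta_i, beta_j):
    """Metropolis probability of exchanging the configurations held at
    inverse temperatures beta_i and beta_j, with energies e_i and e_j."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbour_swaps(energies, betas, rng, start=0):
    """Attempt swaps between temperature neighbours (k, k+1) for
    k = start, start+2, ...; returns the new configuration labels and
    the energies reordered accordingly."""
    energies = list(energies)
    labels = list(range(len(energies)))
    for k in range(start, len(betas) - 1, 2):
        p = swap_probability(energies[k], energies[k + 1], betas[k], betas[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            labels[k], labels[k + 1] = labels[k + 1], labels[k]
    return labels, energies


rng = random.Random(0)
labels, new_energies = attempt_neighbour_swaps([-10.0, -20.0], [1.0, 0.5], rng)
```

In the usage line, the colder replica (beta = 1.0) holds the higher energy, so the swap is accepted with probability one. Alternating `start` between 0 and 1 on successive attempts lets configurations travel across the whole temperature ladder.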
TICA and MSM methods are becoming increasingly popular tools for analyzing
biomolecular simulations, and several software packages are available for this kind
of analysis [71–74]. The calculations discussed in this article were done using the
pyEMMA software [71].
TICA can be used as a dimensionality reduction method. It is somewhat similar
to principal component analysis, but identifies high-autocorrelation (or slow) rather
than high-variance coordinates. Given time trajectories of a set of observables, $\{o_n\}$,
one constructs the time-lagged covariance matrix
$$c_{nm}(\tau_{\mathrm{cm}}) = \langle o_n(t)\,o_m(t+\tau_{\mathrm{cm}})\rangle_t - \langle o_n(t)\rangle_t\,\langle o_m(t+\tau_{\mathrm{cm}})\rangle_t ,$$
where $\tau_{\mathrm{cm}}$ is the lag time and $\langle\cdot\rangle_t$ denotes an average over
time $t$. By solving the generalized eigenvalue problem $C(\tau_{\mathrm{cm}})\hat{v}_i = \hat{\lambda}_i C(0)\hat{v}_i$, slow
linear combinations of the original observables can be identified.
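The TICA construction just described can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the pyEMMA code used for the calculations: the function name `tica` is hypothetical, the lagged covariance matrix is symmetrized (which presupposes reversible dynamics), and the generalized eigenvalue problem is solved by whitening with C(0).

```python
import numpy as np


def tica(X, lag):
    """Time-lagged independent component analysis (illustrative sketch).

    X   : array of shape (n_frames, n_observables)
    lag : lag time in frames
    Returns eigenvalues (descending) and eigenvectors (as columns).
    """
    X = np.asarray(X, dtype=float)
    A, B = X[:-lag], X[lag:]
    A = A - A.mean(axis=0)              # subtract <o_n(t)>
    B = B - B.mean(axis=0)              # subtract <o_m(t + lag)>
    n = len(A)
    C_tau = A.T @ B / n                 # time-lagged covariance matrix
    C_tau = 0.5 * (C_tau + C_tau.T)     # assumption: symmetrize (reversibility)
    C_0 = 0.5 * (A.T @ A + B.T @ B) / n
    # Solve C_tau v = lambda C_0 v by whitening with C_0.
    s, U = np.linalg.eigh(C_0)
    keep = s > 1e-10 * s.max()          # drop numerically degenerate directions
    W = U[:, keep] / np.sqrt(s[keep])
    lam, V = np.linalg.eigh(W.T @ C_tau @ W)
    order = np.argsort(lam)[::-1]
    return lam[order], (W @ V)[:, order]
```

Projecting the mean-free data onto the leading eigenvectors gives the slow coordinates. Eigenvalues close to one correspond to slowly decorrelating linear combinations.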
To build an MSM, the state space needs to be discretized. In our calculations,
following [40], the discretization is achieved by clustering the data with the k-means
algorithm [75] in a low-dimensional subspace spanned by slow TICA coordinates. By
computing the probabilities of transition among these clusters in a time $\tau_{\mathrm{tm}}$ (which,
like $\tau_{\mathrm{cm}}$, is an adjustable parameter), a transition matrix is obtained. Assuming Markovian
dynamics, the eigenvectors of this matrix have relaxation times given by
$$\tilde{t}_i = -\frac{\tau_{\mathrm{tm}}}{\ln \tilde{\lambda}_i} , \tag{1}$$
where $1 = \tilde{\lambda}_0 > \tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots > 0$ are the eigenvalues. The eigenvalue $\tilde{\lambda}_0$ corresponds
to a stationary distribution ($\tilde{t}_0 = \infty$), whereas all other eigenvalues correspond
to relaxation modes with finite timescales $\tilde{t}_i$. The timescales obtained using
Eq. (1) are expected to reproduce the dominant relaxation times of the full system if
the discretization is sufficiently fine [76, 77], or if the lag time is sufficiently large
[77, 78]. However, for a given discretization and a given lag time, the use of Eq. (1)
may entail significant systematic errors.
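To make the eigenvalue route concrete, the sketch below estimates a transition matrix from a discretized trajectory and converts its eigenvalues into timescales with the standard relation t_i = -τ / ln λ_i. It is a simplified illustration rather than the estimator used in the calculations: the function names are hypothetical, reversibility is imposed crudely by symmetrizing the count matrix, and every state is assumed to be visited at least once.

```python
import numpy as np


def transition_matrix(dtraj, n_states, lag):
    """Row-stochastic transition matrix from a discrete trajectory at a given lag."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    counts = 0.5 * (counts + counts.T)   # assumption: crude reversible estimate
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """Relaxation times -lag / ln(lambda_i) for the non-stationary eigenvalues."""
    lam = np.sort(np.linalg.eigvals(T).real)[::-1]
    lam = lam[1:]                        # drop the stationary eigenvalue (= 1)
    lam = lam[(lam > 0.0) & (lam < 1.0)]
    return -lag / np.log(lam)
```

Repeating the calculation for several lag times and checking whether the resulting timescales level off is the usual way to see the lag-time dependence discussed in the text.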
Another way of estimating the relaxation times of the MSM eigenfunctions is
by computing their autocorrelations. The (normalized) autocorrelation function of a
general property $f$ is given by $C_f(\tau) = [\langle f(t) f(t+\tau)\rangle_t - \langle f(t)\rangle_t \langle f(t+\tau)\rangle_t]/\sigma_f^2$,
458 D. Nilsson et al.
where $\sigma_f^2$ is the variance of $f$. Let $\psi_i^{\mathrm{MSM}}$ be the $i$th eigenfunction of a given MSM,
and let $\psi_i$ be the true $i$th eigenfunction of the system’s time transfer operator [45].
The autocorrelation function of $\psi_i^{\mathrm{MSM}}$, $C_i(\tau)$, may be expanded as
$$C_i(\tau) = \sum_j c_j\, e^{-\tau/t_j} , \tag{2}$$
where $c_j = |\langle \psi_j, \psi_i^{\mathrm{MSM}} \rangle|^2$ and $t_j$ is the exact $j$th relaxation time. Now, if $\psi_i^{\mathrm{MSM}}$
is a good approximation of $\psi_i$, then $c_j \ll c_i$ for $j \neq i$. If this holds, $C_i(\tau)$ decays
approximately as $e^{-\tau/t_i}$ for not too large $\tau$ (compared to $t_i$), so that $t_i$ can be estimated
through a simple exponential fit. In the calculations discussed below, we used data
for $C_i(\tau)$ in the range of $\tau$ where $0.2 < C_i(\tau) < 0.8$. Over this range, $C_i(\tau)$ was
approximately single exponential for all MSM eigenfunctions studied. It is worth
noting that the upper bound on τ is set primarily by statistical uncertainties, rather
than by deviations from single-exponential behavior.
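The autocorrelation-based estimate can be sketched as follows. The window 0.2 < C(τ) < 0.8 is taken from the text; the function names, the log-linear least-squares fit, and the choice to use only the first contiguous stretch of lags inside the window are assumptions of this illustration, not details of the published analysis.

```python
import numpy as np


def autocorrelation(f, max_lag):
    """Normalized autocorrelation C_f(tau) for tau = 0 .. max_lag (in frames)."""
    f = np.asarray(f, dtype=float)
    var = f.var()
    c = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        a, b = f[:len(f) - tau], f[tau:]
        c[tau] = (np.mean(a * b) - a.mean() * b.mean()) / var
    return c


def fit_relaxation_time(c, lo=0.2, hi=0.8):
    """Single-exponential fit of C(tau) on the first stretch with lo < C < hi."""
    start = np.flatnonzero(c < hi)[0]           # first lag below the upper bound
    below = np.flatnonzero(c[start:] <= lo)     # first lag at or below the lower bound
    stop = start + below[0] if below.size else len(c)
    tau = np.arange(start, stop)
    slope, _ = np.polyfit(tau, np.log(c[start:stop]), 1)
    return -1.0 / slope
```

Here `f` would be the time series of an MSM eigenfunction evaluated along the trajectory, and the returned value is in units of the frame spacing.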
3 Results
This section briefly describes the main findings of our studies of the trp-cage and
GB1m3 peptides in the presence of protein crowders (BPTI or GB1) [29–32]. The
first two subsections describe results obtained using the replica-exchange method.
The third and final subsection discusses findings obtained by applying TICA and MSM
techniques to data from constant-temperature simulations.
Using replica exchange with a wide range of temperatures, the folding thermodynam-
ics of the trp-cage and GB1m3 peptides were studied under the following conditions:
with BPTI crowders, with GB1 crowders, with hard-sphere crowders, and without
crowders. The three systems with crowders had the same number of crowders, eight,
and the same box size, $(95\,\text{Å})^3$. However, the volume of the hard spheres was taken to be
approximately three times larger than that of the BPTI and GB1 molecules, to enhance
the otherwise very weak effects of these crowders.
Figure 1 compares the behavior of trp-cage in the different simulated environments.
To this end, the temperature dependence of four structural properties of trp-cage
is shown, namely the helix content, the radius of gyration, the root-mean-square
deviation from the native structure, and the end-to-end distance. The effects of the
purely steric crowders are, despite their larger size, modest. As expected, the effects
are largest at high temperatures, where the peptide is unfolded and requires the most
Fig. 1 Folding thermodynamics of trp-cage without crowders (red line), with hard-sphere crowders
(red dashes), with BPTI crowders (blue), and with GB1 crowders (magenta). The properties shown
are (a) the helix content, $H$, (b) the radius of gyration, $R_g$, (c) the root-mean-square deviation from the
native state, and (d) the end-to-end distance, $R_{ee}$. Reproduced from [30], with the permission of
AIP Publishing
volume. The smaller protein crowders cause only tiny changes at these temperatures.
At low temperatures, the BPTI and GB1 crowders tend to distort the native structure
of trp-cage. In the GB1 case, this effect is weak but noticeable, and in line with a
previous molecular dynamics-based study [27]. In the BPTI case, the distortion is
easily visible, especially from the data for the end-to-end distance (Fig. 1d). BPTI
interacts primarily with the C-terminal tail of trp-cage (see below), and this interac-
tion prevents a native-like packing of this part against the N-terminal α-helix, which
leads to an increased end-to-end distance.
Figure 2 shows a similar compilation of data from the GB1m3 simulations. When
adding hard-sphere crowders, the response of GB1m3 resembles that of trp-cage.
However, GB1m3 responds differently than trp-cage upon the addition of BPTI or
GB1 crowders. While distorting the trp-cage fold, these crowders have a stabilizing
effect on GB1m3 (Fig. 2c). A comparison with the results obtained using hard-sphere
crowders shows that this stabilization cannot be explained in terms of steric inter-
actions alone. Rather, the main cause is the ability of the folded GB1m3 to interact
favorably with both BPTI and GB1. The results obtained with BPTI crowders suggest
an increase in the melting temperature of GB1m3 by as much as roughly 15 K.
Fig. 2 Folding thermodynamics of GB1m3 without crowders (red line), with hard-sphere crowders
(red dashes), with BPTI crowders (blue), and with GB1 crowders (magenta). The properties shown
are (a) the strand content, $S$, (b) the radius of gyration, $R_g$, (c) a hydrogen bond-based measure of
nativeness, $q$, and (d) the end-to-end distance, $R_{ee}$. Reproduced from [30], with the permission of
AIP Publishing
The above comparison with data obtained using hard spheres strongly indicates that
attractive peptide-crowder interactions play an important role in the systems with
protein crowders. Insight into the nature of these attractive interactions can be gained
by computing test peptide-crowder protein residue-pair contact maps. Figure 3 shows
contact maps for all the four test peptide-crowder protein combinations studied,
calculated at the melting temperatures of the respective free peptides, where the
peptides sample a wide spectrum of conformations.
The contact maps reveal that both BPTI and GB1 have specific surface patches
that dominate their interaction with the peptides. A large majority of the contacts
formed by BPTI involve a hydrophobic surface patch centered around its proline
residues Pro8 and Pro9. On GB1, which contains a four-stranded β-sheet, a similar,
although somewhat less dominant, role is played by the two edge strands.
Fig. 3 Test peptide-crowder protein residue-pair contact maps for the simulated trp-cage–BPTI (left
upper panel), trp-cage–GB1 (left lower panel), GB1m3-BPTI (right upper panel) and GB1m3-GB1
(right lower panel) systems, calculated at the melting temperatures of the respective free peptides.
The color indicates the average number of contacts that a given residue in the test peptide forms
with residues in a given position in any of the eight crowder proteins. Note the differences in scale.
Two residues are in contact if their Cα atoms are within 8 Å of each other. Red lines indicate
the hydrophobic surface patch of BPTI mentioned in the text and the two edge strands of GB1.
Reproduced from [30], with the permission of AIP Publishing
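A contact map of the kind shown in Fig. 3 can be computed along the following lines. The 8 Å Cα cutoff and the summation over all crowder copies follow the caption; the array layout, the function names and the minimum-image treatment of a cubic periodic box are assumptions of this sketch.

```python
import numpy as np


def min_image_distances(a, b, box):
    """Pairwise distances between point sets a (n, 3) and b (m, 3) in a
    cubic periodic box of edge length box (minimum-image convention)."""
    d = np.asarray(a, dtype=float)[:, None, :] - np.asarray(b, dtype=float)[None, :, :]
    d -= box * np.round(d / box)
    return np.linalg.norm(d, axis=-1)


def contact_map(peptide_ca, crowders_ca, box, cutoff=8.0):
    """Average number of contacts between peptide residue i and residue
    position j, summed over all crowder copies.

    peptide_ca  : (n_frames, n_pep, 3) C-alpha coordinates
    crowders_ca : (n_frames, n_crowders, n_res, 3) C-alpha coordinates
    """
    n_frames, n_pep, _ = peptide_ca.shape
    n_res = crowders_ca.shape[2]
    cmap = np.zeros((n_pep, n_res))
    for pep, crowd in zip(peptide_ca, crowders_ca):
        for mol in crowd:                      # loop over crowder copies
            cmap += min_image_distances(pep, mol, box) < cutoff
    return cmap / n_frames
```

Because contacts are summed over the crowder copies, an entry can in principle exceed one when the same residue position of several crowders touches the same peptide residue.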
The previous two subsections dealt separately with the folding properties of the pep-
tides and their interactions with the crowders. For a proper understanding of the
systems, one also has to analyze the interplay between peptide folding and peptide-
crowder interactions. To this end, one needs to identify suitable coordinates in a
high-dimensional space with both intra- and intermolecular degrees of freedom,
which are not easy to guess. A possible approach to this problem is to use TICA
and MSM techniques. These methods have proven useful for analyzing biomolec-
ular simulations [46, 47], but the systems studied were typically relatively small.
Recently, we tested the usefulness of these methods for analyzing data from crowd-
ing simulations, by applying them to data from constant-temperature simulations of
GB1m3 with BPTI and GB1 crowders [32].
This analysis used time trajectories for a broad set of observables, consisting of
all (non-constant) intramolecular Cα–Cα distances within the peptide as well as a
collection of intermolecular distances between the peptide and the crowders, called
$d_{ij}$. Specifically, $d_{ij}$ was defined as the shortest Cα–Cα (periodic) distance between
peptide residue i and residue j in any of the eight crowder molecules. The total
number of intra- and intermolecular distances used as input for the analysis was
around 1000 for each of the two systems studied. Using TICA, a handful of slow
linear combinations of these observables were identified in each system.
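The input features for one frame can be assembled as sketched below. This is an illustration, not the published code: the function name is hypothetical, the peptide coordinates are assumed to be unwrapped (not split across the periodic boundary), and "non-constant" intramolecular distances are assumed to mean all Cα pairs except sequence neighbours, whose separation is fixed in a model with rigid bond geometry.

```python
import numpy as np


def feature_vector(pep_ca, crowders_ca, box):
    """Intra- and intermolecular C-alpha distance features for one frame.

    pep_ca      : (n_pep, 3) peptide C-alpha coordinates (unwrapped)
    crowders_ca : (n_crowders, n_res, 3) crowder C-alpha coordinates
    box         : edge length of the cubic periodic box
    """
    pep_ca = np.asarray(pep_ca, dtype=float)
    crowders_ca = np.asarray(crowders_ca, dtype=float)
    # Intramolecular distances for residue pairs with |i - j| >= 2 (assumption).
    i, j = np.triu_indices(len(pep_ca), k=2)
    intra = np.linalg.norm(pep_ca[i] - pep_ca[j], axis=1)
    # d_ij: shortest periodic distance to residue j in any crowder copy.
    d = pep_ca[None, :, None, :] - crowders_ca[:, None, :, :]
    d -= box * np.round(d / box)
    inter = np.linalg.norm(d, axis=-1).min(axis=0)   # shape (n_pep, n_res)
    return np.concatenate([intra, inter.ravel()])
```

With these assumptions, a 16-residue peptide and 58-residue crowders give 105 + 928 = 1033 features, and 56-residue crowders give 105 + 896 = 1001, consistent with the figure of around 1000 quoted above.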
The slow TICA coordinates turned out to be capable of separating the major free-
energy minima of the peptide. Additionally, the slow TICA coordinates were used to
define a low-dimensional subspace in which the simulated conformations could be
efficiently clustered. After this discretization, MSMs were built and used to estimate
the dominant (longest) relaxation times. Relaxation times can be conveniently esti-
mated from the MSM eigenvalues via Eq. (1), which, however, assumes Markovian
dynamics. Unfortunately, the results obtained this way showed a strong dependence
on the lag time $\tau_{\mathrm{tm}}$. A more direct way of estimating relaxation times from the MSMs
is to measure and analyze the autocorrelations of the eigenfunctions. It turned out
that fits to autocorrelation data for the MSM eigenfunctions yield much more robust
relaxation time estimates, with essentially no $\tau_{\mathrm{tm}}$ dependence. A detailed discussion
of these findings can be found in [32].
4 Concluding Remarks
Acknowledgements The work discussed in this article was in part supported by the Swedish
Research Council (Grant no. 621-2014-4522) and the Swedish strategic research program eSSENCE.
The simulations were performed on resources provided by the Swedish National Infrastructure for
Computing (SNIC) at LUNARC, Lund University, Sweden, and Jülich Supercomputing Centre,
Forschungszentrum Jülich, Germany.
References
20. Miklos, A.C., Sarkar, M., Wang, Y., Pielak, G.J.: Protein crowding tunes protein stability. J.
Am. Chem. Soc. 133, 7116 (2011)
21. Guzman, I., Gelman, H., Tai, J., Gruebele, M.: The extracellular protein VlsE is destabilized
inside cells. J. Mol. Biol. 426, 11 (2014)
22. Feig, M., Yu, I., Wang, P.H., Nawrocki, G., Sugita, Y.: Crowding in cellular environments at
an atomistic level from computer simulations. J. Phys. Chem. B 121, 8009 (2017)
23. Qin, S., Zhou, H.X.: Protein folding, binding, and droplet formation in cell-like conditions.
Curr. Opin. Struct. Biol. 43, 28 (2017)
24. McGuffee, S.R., Elcock, A.H.: Diffusion, crowding & protein stability in a dynamic molecular
model of the bacterial cytoplasm. PLOS Comput. Biol. 6, e1000694 (2010)
25. Yu, I., Mori, T., Ando, T., Harada, R., Jung, J., Sugita, Y., Feig, M.: Biomolecular interactions
modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm.
eLife 5, 18457 (2016)
26. Feig, M., Sugita, Y.: Variable interactions between protein crowders and biomolecular solutes
are important in understanding cellular crowding. J. Phys. Chem. B 116, 599 (2012)
27. Predeus, A.V., Gul, S., Gopal, S.M., Feig, M.: Conformational sampling of peptides in the
presence of protein crowders from AA/CG-multiscale simulations. J. Phys. Chem. B 116,
8610 (2012)
28. Macdonald, B., McCarley, S., Noeen, S., van Giessen, A.E.: Protein–protein interactions affect
alpha helix stability in crowded environments. J. Phys. Chem. B 119, 2956 (2015)
29. Bille, A., Linse, B., Mohanty, S., Irbäck, A.: Equilibrium simulation of trp-cage in the presence
of protein crowders. J. Chem. Phys. 143, 175102 (2015)
30. Bille, A., Mohanty, S., Irbäck, A.: Peptide folding in the presence of interacting protein crow-
ders. J. Chem. Phys. 144, 175105 (2016)
31. Irbäck, A., Mohanty, S.: Protein folding/unfolding in the presence of interacting macromolec-
ular crowders. Eur. Phys. J. - Spec. Top. 226, 627 (2017)
32. Nilsson, D., Mohanty, S., Irbäck, A.: Markov modeling of peptide folding in the presence of
protein crowders. J. Chem. Phys. 148, 055101 (2018)
33. Neidigh, J.W., Fesinmeyer, R.M., Andersen, N.H.: Designing a 20-residue protein. Nat. Struct.
Biol. 9, 425 (2002)
34. Fesinmeyer, R.M., Hudson, F.M., Andersen, N.H.: Enhanced hairpin stability through loop
design: the case of the protein G B1 domain hairpin. J. Am. Chem. Soc. 126, 7238 (2004)
35. Moses, E., Hinz, H.J.: Basic pancreatic trypsin inhibitor has unusual thermodynamic stability
parameters. J. Mol. Biol. 170, 765 (1983)
36. Gronenborn, A.M., Filpula, D.R., Essig, N.Z., Achari, A., Whitlow, M., Wingfield, P.T., Clore,
G.M.: A novel, highly stable fold of the immunoglobulin binding domain of streptococcal
protein G. Science 253, 657 (1991)
37. Molgedey, L., Schuster, H.G.: Separation of a mixture of independent signals using time delayed
correlations. Phys. Rev. Lett. 72, 3634 (1994)
38. Naritomi, Y., Fuchigami, S.: Slow dynamics of a protein backbone in molecular dynamics
simulation revealed by time-structure based independent component analysis. J. Chem. Phys.
139, 215102 (2013)
39. Schwantes, C.R., Pande, V.S.: Improvements in Markov state model construction reveal many
non-native interactions in the folding of NTL9. J. Chem. Theor. Comput. 9, 2000 (2013)
40. Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G., Noé, F.: Identification of slow
molecular order parameters for Markov model construction. J. Chem. Phys. 139, 015102 (2013)
41. Schütte, C., Fischer, A., Huisinga, W., Deuflhard, P.: A direct approach to conformational
dynamics based on Hybrid Monte Carlo. J. Comput. Phys. 151, 146 (1999)
42. Chodera, J.D., Singhal, N., Pande, V.S., Dill, K.A., Swope, W.C.: Automatic discovery of
metastable states for the construction of markov models of macromolecular conformational
dynamics. J. Chem. Phys. 126, 155101 (2007)
43. Buchete, N.V., Hummer, G.: Coarse master equations for peptide folding dynamics. J. Phys.
Chem. B 112, 6057 (2008)
44. Bowman, G.R., Beauchamp, K.A., Boxer, G., Pande, V.S.: Progress and challenges in the
automated construction of Markov state models for full protein systems. J. Chem. Phys. 131,
124101 (2009)
45. Prinz, J.H., Wu, H., Sarich, M., Keller, B., Senne, M., Held, M., Chodera, J.D., Schütte, C.,
Noé, F.: Markov models of molecular kinetics: generation and validation. J. Chem. Phys. 134,
174105 (2011)
46. Chodera, J.D., Noé, F.: Markov state models of biomolecular conformational dynamics. Curr.
Opin. Struct. Biol. 25, 135 (2014)
47. Noé, F., Clementi, C.: Collective variables for the study of long-time kinetics from molecular
trajectories: theory and methods. Curr. Opin. Struct. Biol. 43, 141 (2017)
48. Irbäck, A., Mitternacht, S., Mohanty, S.: An effective all-atom potential for proteins. BMC
Biophys. 2, 2 (2009)
49. Irbäck, A., Mohanty, S.: Folding thermodynamics of peptides. Biophys. J. 88, 1560 (2005)
50. Mitternacht, S., Luccioli, S., Torcini, A., Imparato, A., Irbäck, A.: Changing the mechanical
unfolding pathway of FnIII10 by tuning the pulling strength. Biophys. J. 96, 429 (2009)
51. Jónsson, S.Æ., Mohanty, S., Irbäck, A.: Distinct phases of free α-synuclein – a Monte Carlo
study. Proteins 80, 2169 (2012)
52. Mohanty, S., Meinke, J.H., Zimmermann, O.: Folding of Top7 in unbiased all-atom Monte
Carlo simulations. Proteins 81, 1446 (2013)
53. Bille, A., Jónsson, S.Æ., Akke, M., Irbäck, A.: Local unfolding and aggregation mechanisms
of SOD1 – a Monte Carlo exploration. J. Phys. Chem. B 117, 9194 (2013)
54. Jónsson, S.Æ., Mitternacht, S., Irbäck, A.: Mechanical resistance in unstructured proteins.
Biophys. J. 104, 2725 (2013)
55. Petrlova, J., Bhattacherjee, A., Boomsma, W., Wallin, S., Lagerstedt, J.O., Irbäck, A.: Confor-
mational and aggregation properties of the 1–93 fragment of apolipoprotein A-I. Protein Sci.
23, 1559 (2014)
56. Favrin, G., Irbäck, A., Mohanty, S.: Oligomerization of amyloid Aβ16–22 peptides using hydrogen
bonds and hydrophobicity forces. Biophys. J. 87, 3657 (2004)
57. Cheon, M., Chang, I., Mohanty, S., Luheshi, L.M., Dobson, C.M., Vendruscolo, M., Favrin,
G.: Structural reorganisation and potential toxicity of oligomeric species formed during the
assembly of amyloid fibrils. PLOS Comput. Biol. 3, e173 (2007)
58. Irbäck, A., Mitternacht, S.: Spontaneous β-barrel formation: an all-atom Monte Carlo study of
Aβ(16–22) oligomerization. Proteins 71, 207 (2008)
59. Li, D., Mohanty, S., Irbäck, A., Huo, S.: Formation and growth of oligomers: a Monte Carlo
study of an amyloid tau fragment. PLOS Comput. Biol. 4, e1000238 (2008)
60. Mitternacht, S., Staneva, I., Härd, T., Irbäck, A.: Monte Carlo study of the formation and
conformational properties of dimers of Aβ42 variants. J. Mol. Biol. 410, 357 (2011)
61. Irbäck, A., Mohanty, S.: PROFASI: a Monte Carlo simulation package for protein folding and
aggregation. J. Comput. Chem. 27, 1548 (2006)
62. Favrin, G., Irbäck, A., Sjunnesson, F.: Monte Carlo update for chain molecules: biased Gaussian
steps in torsional space. J. Chem. Phys. 114, 8154 (2001)
63. Dodd, L.R., Boone, T.D., Theodorou, D.N.: A concerted rotation algorithm for atomistic Monte
Carlo simulation of polymer melts and glasses. Mol. Phys. 78, 961 (1993)
64. Zamuner, S., Rodriguez, A., Seno, F., Trovato, A.: An efficient algorithm to perform local
concerted movements of a chain molecule. PLOS One 10, e0118342 (2015)
65. Irbäck, A., Jónsson, S.Æ., Linnemann, N., Linse, B., Wallin, S.: Aggregate geometry in amyloid
fibril nucleation. Phys. Rev. Lett. 110, 058101 (2013)
66. Irbäck, A., Wessén, J.: Thermodynamics of amyloid formation and the role of intersheet inter-
actions. J. Chem. Phys. 143, 105104 (2015)
67. Swendsen, R.H., Wang, J.S.: Replica Monte Carlo simulation of spin glasses. Phys. Rev. Lett.
57, 2607 (1986)
68. Neuhaus, T., Hager, J.S.: Free-energy calculations with multiple Gaussian modified ensembles.
Phys. Rev. E 74, 036702 (2006)
69. Kim, J., Straub, J.E.: Generalized simulated tempering for exploring strong phase transitions.
J. Chem. Phys. 133, 154101 (2010)
70. Lindahl, V., Lidmar, J., Hess, B.: Accelerated weight histogram method for exploring free
energy landscapes. J. Chem. Phys. 141, 044110 (2014)
71. Scherer, M.K., Trendelkamp-Schroer, B., Paul, F., Pérez-Hernández, G., Hoffmann, M., Plat-
tner, N., Wehmeyer, C., Prinz, J.H., Noé, F.: PyEMMA 2: a software package for estimation,
validation, and analysis of Markov models. J. Chem. Theor. Comput. 11, 5525 (2015)
72. Seeber, M., Felline, A., Raimondi, F., Muff, S., Friedman, R., Rao, F., Caflisch, A., Fanelli,
F.: Wordom: A user-friendly program for the analysis of molecular structures, trajectories, and
free energy surfaces. J. Comput. Chem. 32, 1183 (2010)
73. Biarnés, X., Pietrucci, F., Marinelli, F., Laio, A.: METAGUI. A VMD interface for analyzing
metadynamics and molecular dynamics simulations. Comput. Phys. Commun. 183, 203 (2012)
74. Harrigan, M.P., Sultan, M.M., Hernández, C.X., Husic, B.E., Eastman, P., Schwantes, C.R.,
Beauchamp, K.A., McGibbon, R.T., Pande, V.S.: MSMBuilder: statistical models for biomolec-
ular dynamics. Biophys. J. 112, 10 (2017)
75. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129 (1982)
76. Kube, S., Weber, M.: A coarse graining method for the identification of transition rates between
molecular conformations. J. Chem. Phys. 126, 024103 (2007)
77. Djurdjevac, N., Sarich, M., Schütte, C.: Estimating the eigenvalue error of Markov state models.
Multiscale Model. Simul. 10, 61 (2012)
78. Prinz, J.H., Chodera, J.D., Noé, F.: Spectral rate theory for two-state kinetics. Phys. Rev. X 4,
011020 (2014)
Molecular Dynamics Studies
on Amyloidogenic Proteins
mental results and suggest a mechanism of elongation for the fibril protofilament
formation.
1 Introduction
Fig. 1 Formation of proteins and various paths of denatured or partly unfolded proteins (according
to [6])
protein “factory” in the cell. Proteins are folded in the ribosome within seconds, thus
gaining their secondary structure. Tertiary structure is obtained within a few minutes
in the cytosol or endoplasmic reticulum [4]. This process is assisted by additional
enzymatic proteins (so-called chaperones) and disulfide isomerases. Before the protein is
transported to its final destination, it is subjected to a control that rejects misfolded
proteins. Properly folded proteins get into the Golgi apparatus, from where they are
directed into the cytosol. Misfolded proteins are intercepted by the proteasome and
“digested” by a group of proteolytic enzymes. If a misfolded protein is not intercepted
and “digested” in time, it gets into the cell, where it can be “repaired”. Unfortunately,
despite this sophisticated quality control system, formation of aggregates and amyloid
deposits may occur.
Oligomerization of proteins may occur spontaneously. This process can have
physiological functions or can be an adverse phenomenon. Both pathological and
normal processes are based on the same mechanisms. It is generally accepted that in
the initial stages of formation of amyloid fibrils monomeric proteins show a partly
unfolded conformation caused by partial denaturation or misfolding. As a result,
470 S. Rodziewicz-Motowidło et al.
hydrophobic areas of the protein become exposed to the solvent, which promotes
aggregation. The first stage of amyloid fibril formation, common to all
processes of amyloid aggregation, is the formation of metastable oligomers. During
the phase preceding amyloid formation, some amyloid proteins have been shown
to form round, non-fibrillar structures resembling little “tires”. These “tires” can form
channels (pores) in the cell membrane. It is now considered that these structures, rather
than the amyloid fibrils, are the toxic and pathogenic factor. The round intermediate
structures have been observed during formation of amyloid fibrils of amyloid β peptide (Aβ)
[5], transthyretin [6], insulin [7], β2-microglobulin [8], immunoglobulin light chains
[9], lysozyme [10] and cystatin C [11]. It is assumed that the aggregates share common
structural features, since oligomers formed through aggregation of different proteins
bind specific anti-oligomer antibodies [12]. On the other hand, they also show
different characteristics, because in some cases the oligomers retain the original,
native structure of the monomers to some extent [13]. There are indications that oligomeric
forms are highly cytotoxic, and it has been claimed that they are the main pathogenic factor
in many amyloidoses. The second stage of amyloid fibril formation is aggregation of
the above-mentioned oligomers [14], which leads to formation of amyloid fibrils or
amorphous deposits, the so-called inclusion bodies detected, for example, in Parkinson’s
disease.
Experimental studies of protein aggregation are unfortunately limited by the
non-crystallizable structure of aggregates, their insolubility in water and often by their
involvement in the cell membrane. These difficulties have stimulated the use of
computational methods in studies of amyloid structure, the development
of new experimental methods, and intensive efforts to match computational
results with the results of experimental investigations. The number of papers
published on simulations of amyloidogenic proteins has increased rapidly during the last
decade. The simulated systems range from simple peptides (Alzheimer Aβ
peptides or peptides that are fragments of amyloidogenic proteins) [15–18] to large
proteins (transthyretin, prion protein, cystatin C, β2-microglobulin etc.) [19–25].
In studies of aggregation, the complementarity and comparison
of experimental and computational results is very important. Computational simulations
constitute an “analytical tool” to explain the mechanisms of amyloidogenesis and
to understand the role of amino-acid sequences in amyloidogenic proteins.
Theoretical methods for predicting protein aggregation propensities from the primary
sequence have been proposed [26–28]. These computational methods can predict
putative aggregation-prone regions (“hot-spots”) within a protein sequence, whose
experimental determination is very expensive and time-consuming. In general,
in silico simulations increase our understanding of the protein aggregation process.
In recent years, many theoretical methods have been applied to model fibril formation,
and most of the simulation studies have aimed at understanding the molecular
mechanism of protein aggregation. Currently, MD simulations are the major
computational tool used to help define the structure of many molecular systems, including
amyloid proteins and fibrils. MD is now an important tool for understanding
conformational and aggregation phenomena at the molecular level [29].
Different algorithms and parameters have been used, depending on the problem
to be solved. Both explicit treatment of solvation [30, 31] and the Generalized Born
solvation model [32] are used. Traditional all-atom simulations with explicit water or
another solvent are used most often to test the stability of β-structures in amyloid fibrils
and oligomers [17, 33]. Simulations can also be used to study the conformational
changes of native or intermediate states that trigger amyloid fibril formation [34,
35]. Because in traditional MD simulations the protein molecule can get trapped
in a local minimum, enhanced sampling techniques such as replica exchange MD
(REMD) are often used to overcome this problem. REMD
is particularly useful for simulating the large conformational changes
related to modeling of misfolded protein associations [36, 37]. The conformational
changes of proteins and aggregation processes have also been simulated with discrete
molecular dynamics (DMD) [15, 30] or the ‘activation-relaxation technique’ [31].
Coarse-grained models at a number of resolutions have been applied to model
protein folding and aggregation, because these methods allow longer
simulation times at lower computational cost than all-atom simulations [38,
39]. One example of united-atom MD simulations is the investigation of Aβ
peptide folding [40]. Coarse-grained models (united-atom, united-residue, or
other) have also been applied to study protein aggregation [39, 41]. Another approach uses
simplified models in which the polypeptide chain is represented by a tube and the
interactions between amino acids are determined by geometry and symmetry [42].
Very interesting results were obtained by combining the REMD technique with a
united-residue model to study the Aβ peptide aggregation process [43]. Using this
method, Smith and Hall proposed a description of the mechanism of fibril
growth at the molecular level (see Fig. 2).
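The way replica exchange helps a trapped system escape local minima can be illustrated with the standard Metropolis swap criterion used in temperature REMD. The following is a minimal sketch, not the protocol of any study cited above; the function names, the energy units (kcal/mol) and the example temperatures are assumptions made for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def exchange_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of swapping the configurations of two replicas.

    energy_i is the potential energy of the configuration currently simulated
    at temp_i, energy_j the one at temp_j. The swap is accepted with
    probability min(1, exp[(beta_i - beta_j) * (energy_i - energy_j)]).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the swap between the two replicas is accepted."""
    return rng.random() < exchange_probability(energy_i, energy_j, temp_i, temp_j)


if __name__ == "__main__":
    # Hypothetical energies: the colder replica (300 K) holds the higher-energy
    # configuration, so handing it to the hotter replica is always accepted.
    print(exchange_probability(-100.0, -120.0, 300.0, 320.0))
    # The reverse situation is accepted only with probability below one.
    print(exchange_probability(-120.0, -100.0, 300.0, 320.0))
```

Because low-energy configurations found at high temperature are handed down to colder replicas, the cold replica can cross barriers it would rarely cross in a conventional MD run.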
as well as emphasized greater impact of low pH on prion stability [49]. In line with
this, spectroscopic data show a strong pH dependence of PrP stability and
conformation [74–78]. An equilibrium unfolding intermediate of PrP125–228 that
shows spectral characteristics similar to those of β-sheet proteins has been observed
exclusively at acidic pH [74]. Both an acidic and a high-temperature environment can
lead to partial unfolding of the PrP protein. MD simulations point to the high
flexibility of the loop 167–171 and of the loop between helix 2 and helix 3. The high
flexibility of these two loops may cause the characteristic instability of the PrP protein
[51, 52, 71], which has also been confirmed by NMR studies [61].
The MD studies also suggested the delicate stability of the PrP native structure and
the great impact of disturbed electrostatic interactions on the wt conformation.
The main observed changes in conformation were the extension of the already present
β-sheet and a different position and structure of helix H1 and the adjacent S1-H1 loop.
Mutation of some amino acids in the prion protein can influence its conformational
transition from PrPC to PrPSc [79–82]. Human familial prion diseases are associated
with about 40 point mutations of the gene coding for the prion protein (PrP), most
of them located in the globular domain of the protein [83]. Many simulations have been
performed on prion protein variants involved in prion diseases, e.g. D202N, E211Q,
Q217R [79], D178N [71], and protonated Asp202 and Glu196 [79]. As most of the
destabilizing mutations involve polar residues, special attention should
be paid to proper treatment of the electrostatic interactions. Zuegg and Gready [73]
and El-Bastawissy et al. [71] reported that stabilization of the native structure
of PrPC could only be achieved by treating the long-range electrostatic interactions
with the particle mesh Ewald (PME) method and by neutralizing the system with counter ions.
The all-atom MD simulations of the D202N, E211Q, and Q217R variants in the
third native α-helix of human PrP (see Fig. 3) show that the globular domain was
stable during the simulations of the wt PrP protein and its variants, with only minor
changes in the secondary structure, although an increase in the solvent-accessible area
was also reported. The results indicate that the substitutions have subtle effects on
protein structure but substantially influence the electrostatic potential distribution.
These changes may affect intermolecular interactions and facilitate the aggregation
process [79]. MD studies of the D178N PrP variant by Gsponer and coworkers
showed only a slight increase in β-sheet content and no other significant structural
changes [51]. The authors suggested that the Arg164–Asp178 salt bridge did not
seem to contribute to the overall stability of mPrPC. In contrast, all-atom
simulations of human and Syrian hamster PrPC indicated the importance of three salt
bridges (Glu146/Asp144–Arg208, Arg164–Asp178, Arg156–Glu196) for the
stability of PrPC [72]. Gu et al. investigated the roles of Glu196 and Asp202 in salt bridge
formation with MD simulations by studying the effect of their protonation [49]. In
these simulations some conformational changes, such as partial unfolding of helix 2,
bending of helix 3, or elongation of the overall structure without bending of helix 3,
could be observed. The results indicated that the elimination of even a single charge
at certain positions may significantly disturb the native conformation [49].
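How the persistence of a salt bridge such as Arg164–Asp178 might be quantified from a trajectory can be sketched as follows. The example assumes a common geometric criterion that is not stated in the studies cited above: a salt bridge counts as formed when any acidic side-chain oxygen lies within about 4 Å of any basic side-chain nitrogen. The function names, the cutoff and the coordinates are hypothetical.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest distance between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def salt_bridge_occupancy(frames, cutoff=4.0):
    """Fraction of frames in which the salt bridge is formed.

    frames is a sequence of (acid_oxygens, base_nitrogens) pairs, each a list
    of (x, y, z) coordinates in angstroms for one trajectory snapshot.
    The 4.0 angstrom cutoff is an assumed, commonly used default.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("at least one frame is required")
    formed = sum(1 for oxygens, nitrogens in frames
                 if min_distance(oxygens, nitrogens) <= cutoff)
    return formed / len(frames)


if __name__ == "__main__":
    # Two made-up snapshots: the bridge is formed in the first one only.
    frames = [
        ([(0.0, 0.0, 0.0), (2.2, 0.0, 0.0)], [(5.0, 0.0, 0.0)]),
        ([(0.0, 0.0, 0.0), (2.2, 0.0, 0.0)], [(8.0, 0.0, 0.0)]),
    ]
    print(salt_bridge_occupancy(frames))
```

Comparing such occupancies between wt and variant simulations is one simple way to express how much a given charge pair contributes to keeping the native fold together.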
Fig. 3 NMR structure of the globular domain of wt human PrP (PDB ID: 1HJN), residues
125–228. Secondary structure elements in the C-terminal, globular domain are labeled, and the
mutated residues analyzed in another study [79] are shown as sticks. The domain contains three
α-helices (H1, H2, H3) and a very short antiparallel β-sheet (S1, S2)
3.2 Transthyretin
Fig. 4 Three-dimensional structure of wt-TTR in the tetrameric form. The eight β-strands are
named from A to H. The inner sheet (DAGH) is shown at the front, whereas the outer sheet
(CBEF) is at the back [151]
Human cystatin C (HCC) is a small cysteine proteinase inhibitor (120 amino acids)
present in all human body fluids at physiologically relevant concentrations [100].
The physiological role of HCC is to regulate the activity of endogenous cysteine
proteases [101]. The HCC monomer structure consists of a core composed of a
five-stranded antiparallel β-sheet wrapped around a central α-helix. Two hairpin loops
(L1 and L2), together with the N-terminal fragment, are involved in interactions with
target proteolytic enzymes [102]. In pathological processes, HCC and its mutant
(L68Q) form part of the amyloid deposits in the brain arteries of young adults, which
leads to brain hemorrhages and finally to the death of patients with Hereditary Cystatin C
Fig. 5 Superposition of the Cα atoms of the final wt (green) and L68Q (blue) cystatin C structures.
The inset in the right corner shows the placement of Leu68 and Gln68 in a hydrophobic
pocket formed by the β-sheet and α-helix residues [25]
MD simulations of native cystatin C and its variants were used as a tool to
analyze the influence of a single-point mutation on the secondary and tertiary
conformation [25, 110–112]. The MD results at a temperature of 300 K [111] or 308 K
[25] indicate that the L68Q cystatin C monomer undergoes substantially larger
structural changes during the simulation than the wt cystatin C monomer. However, the
global structure remains native-like in both proteins, although some hydrogen bonds
between the β4 and β5 strands were broken. As a result, the β5 strand was disrupted in the
wt and L68Q molecules at the end of the simulations. In contrast to the
experimental data [113], no significant changes in the α-helix structures of the investigated
Fig. 6 The proposed mechanism of domain swapping in monomeric HCC. a The closed-form of
monomeric HCC with a hydrophobic core intact; b partially unfolded monomeric HCC with a
disrupted hydrophobic core; c partially unfolded monomeric HCC with the central helix moving
away from the β-region; d partially unfolded monomeric HCC with the β2-L1-β3 hairpin unfolded
via destruction of three salt bridges following the “zip-up” mechanism; and e open-form structure
of monomeric HCC [111]
Val residues in the L1 loop of cystatins might be important for the interactions with
the inhibited enzyme. Molecular dynamics (MD) investigations of cystatin C
fragments containing point mutations at the Val57 position confirm the significance of
this position in the L1 loop of human cystatin C for the loop structure [118]. We replaced
Val57 in the L1 loop with residues known to stabilize (Asp, Asn) or destabilize (Pro)
β-turns in proteins and conducted MD simulations on these variants and on the wt loop.
We observed an expansion of the wt HCC L1 loop that may have been caused by
the relief of distortions present in the loop with Val57. During MD simulation
of the HCC monomer the size of the L1 loop remains stable (data not shown), which is
probably caused by interactions with the rest of the protein that do not allow the
expansion of the L1 loop. The L1 loops with the V57N and V57D mutations do not expand
during MD simulations, whereas the loop with the V57P mutation expands to a greater
extent than the wt loop. This implies that the residue at position 57 is
of great importance to the conformation of the β2-L1-β3 fragment of HCC. It seems
that the conformation of the Val57 residue, which is imposed by interactions with the
rest of the protein, may be strained and thus gives the loop an intrinsic tendency to expand
toward a more favorable conformation. In addition to the influence of the L68Q mutation on the
stability of the hydrophobic part of the protein, the tendency of the L1 loop to expand
may trigger the partial unfolding of the HCC monomer, leading to dimerization and
oligomerization. The opening of the monomeric HCC structure takes place only in
the L68Q mutant or in the native HCC protein under denaturing conditions. This suggests
that the strained Val57 conformation in the L1 loop of the HCC protein does not
provide a sufficient force to open the monomeric structure, but can provide such
force when combined with other mutations or under denaturing conditions [118].
Polypeptides and proteins able to form amyloid do not share any common
structural features. However, amyloid deposits show a homogeneous morphology. X-ray
diffraction images of amyloid fibrils show characteristic reflections: a meridional one
at around 4.75 Å and an equatorial one at 10 Å [119, 120]. Such a diffraction pattern is
characteristic of β-sheet structures, so it is generally accepted that the amyloid structure
is an extended β-sheet in which the β-strands are oriented perpendicular to the long
axis of the fibril, and the hydrogen bonds between the β-strands run
parallel to that axis. The presence of a β-structure in amyloid is confirmed by
the thioflavin T binding test. This binding is characteristic of proteins that are rich
in β-structures. Amyloid fibrils can also be stained with Congo red, which results
in apple-green birefringence under polarized light [121, 122]. Fibrillar structures that
form amyloid have been investigated by transmission electron microscopy (EM) and
atomic force microscopy (AFM) [123]. It has been shown that the amyloid fibril is
an extended structure most frequently consisting of a few protofilaments of 2–5 nm
in diameter, which are twisted around each other, forming fibrils 7–13 nm in
diameter and 1000–1600 nm long [124]. Protofibrils are transitional structures observed
in vitro during formation of mature amyloid fibrils.
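The two characteristic spacings quoted above can be related to positions on a diffraction image through Bragg's law, λ = 2d sin θ (first order). The short sketch below converts a spacing d into a scattering angle 2θ. The Cu Kα wavelength of 1.5418 Å is an assumption chosen for illustration, since the radiation used in the cited experiments is not stated here, and the function name is hypothetical.

```python
import math


def bragg_two_theta_deg(d_spacing, wavelength=1.5418):
    """Scattering angle 2*theta in degrees for a first-order Bragg reflection.

    d_spacing and wavelength are in angstroms. The default wavelength is the
    Cu K-alpha value, an assumption made for this illustration.
    """
    sin_theta = wavelength / (2.0 * d_spacing)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("no first-order reflection for this spacing")
    return 2.0 * math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # Inter-strand (meridional) and inter-sheet (equatorial) spacings.
    for label, d in (("meridional, 4.75 A", 4.75), ("equatorial, 10 A", 10.0)):
        print(f"{label}: 2theta = {bragg_two_theta_deg(d):.2f} deg")
```

The smaller spacing scatters to the larger angle, which is why the inter-strand reflection appears farther from the beam centre than the inter-sheet reflection.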
In the case of amyloidogenic proteins, three models of the oligomerization mechanism
have been proposed by Nelson and Eisenberg [125]: refolding, natively disordered,
and gain of interaction (see Fig. 7).
In the refolding model, the protein unfolds and then folds into a defective structure
which is stabilized mostly by hydrogen bonds (Fig. 7a). The hydrogen bonds
influence the structure and stability of fibrils. This model was proposed for the SH3 domain,
insulin, and prion protein [126, 127]. The natively disordered model (Fig. 7b) was
proposed for amyloid β peptide and huntingtin [128, 129]. In the process of fibril
formation, part or all of the previously unstructured polypeptide is organized into
β-sheets that form the core of amyloid fibrils. The gain of interaction model (Fig. 7c) is
based on conformational changes that lead to the exposure of previously inaccessible
fragments of the structure. This enables interaction between those
fragments, thus leading to fibril formation. The model includes four sub-models: direct
stacking, cross-β spine, three-dimensional domain swapping, and three-dimensional
domain swapping with a cross-β spine. In the stacking model, the newly exposed
fragments of identical molecules stack on each other, forming fibrils (Fig. 8a). This
model was proposed for transthyretin [130]. In the cross-β spine model (Fig. 8b), β-sheet
structures align antiparallel to those of other, identical molecules. In this way, a β-spine is
created. The rest of the structural fragments protrude from the spine. An example of
a protein that forms fibrils according to this mechanism is β2-microglobulin [131].
Fig. 7 Formation of fibrils according to different models: refolding (a), natively disordered (b),
gain of interaction (c); (according to [125])
Fig. 8 Sub-models of fibril formation of the “gain of interaction” model: stacking (a), cross-β spine
(b), three-dimensional domain swapping with a cross-β spine (c), and three-dimensional domain
swapping (d) (according to [125])
many computational studies that provide insight into the characteristics of the short
segments of amyloid-like aggregates [140]. For example, the contributions of differ-
ent structural elements of trimeric and pentameric, full-length Aβ (1–42) peptides to
the aggregation in solution were analyzed [141]. Kent et al. reported that a solvent-
exposed hydrophobic patch is important for the aggregation of Aβ(10–35) [142].
Nussinov and coworkers studied Aβ40 elongation, association, and the aggregation
pathway of β2-microglobulin amyloid [143]. Wang et al. studied the
disaggregation behaviour of GNNQQNY oligomers during microsecond-scale simulations
[144]. Gnanakaran et al. investigated the aggregation of a simple amyloid beta peptide
dimer with the REMD technique [145]. The MD results indicate that studies of short
peptide aggregation could reveal some common, fundamental mechanisms of fibril
formation.
There are many computational studies that provide insight into the characteristics
of short segments of protofibrils or of aggregates built from short peptides [15,
18, 140, 143, 145–148], whereas for protein structures a docking procedure was mainly
used to model the protofilament of the fibril, e.g., for the prion protofibril [24] (Fig. 9).
MD studies of protofilaments have been carried out, for example, for transthyretin [149]
(Fig. 10) and ribonuclease A [150].
To build amyloid protofilaments of transthyretin from partially disrupted TTR
monomeric structures, a docking-and-alignment protocol was used [149]. The
constructed model of the TTR protofibril was in good agreement with known experimental
data and general amyloid properties. The final structure was formed by two extended
continuous β-sheets with the β-strands nearly perpendicular to the main axis of the
protofilament. The protofilament, with a diameter of 50 Å, was twisted along its
helical axis with a period of 48 β-strands, that is, 16 monomeric units with two
three-stranded β-sheets each (BEF and AGH) (Fig. 10). After a 100 ps MD
simulation the global fold of the protofilament was unchanged. Not all the features of the
model agree with the experimental data; for instance, there are differences
in the helical period. The model of the TTR protofibril can therefore be further refined
using new experimentally derived constraints.
In our laboratory we studied oligomers of HCC using the MD method
and built a model of the HCC protofibril. The results are described below.
Based on the data published so far [45, 115] we developed four models of HCC
oligomers with domain-swapped HCC dimer serving as a building block. In the first
proposed model of HCC oligomer, the dimers with swapped domains were arranged
one after another interacting with “front-back” surfaces, i.e. alternately with β-sheet
and α-helix surfaces (Fig. 11). The dimers were aligned evenly one after another, thus
forming an oligomer which, by analogy to nucleic acids, can be called an oligomer
with “blunt ends”. The propagation of such an oligomer occurs through addition of
consecutive domain-swapped dimers to the already associated ones.
The second considered model was proposed by Janowski et al. [45]. The HCC
dimers are stacked one on another and form the oligomer through the interactions
of top and bottom surfaces of consecutive dimers (Fig. 12). Like in the previous
Fig. 10 a Schematic representation of the TTR protofilament model, showing the size of half of
the repeating unit. b Protofilament cross-section dimension including only the core β-strands [149]
Fig. 11 Model I of HCC oligomer structure. The picture contains numeration of dimers. Figure
based on [45]
model the oligomer formed this way can be called an oligomer with “blunt ends”.
Propagation of this oligomer also occurs through addition of consecutive dimers with
swapped domains to the oligomer.
In the third model, proposed for the cystatin family in general by Staniforth [115],
the oligomer consists of dimers swapping their domains in an asymmetric way,
with an unpaired monomer at the end of the structure (Fig. 13). In contrast to the
previous models, propagation of this oligomer
occurs not through addition of domain-swapped dimers but through addition of
“open” monomers, which allows domain swapping. By analogy to nucleic acids,
such an oligomer can be called an oligomer with “sticky ends”, because of the unpaired
monomer at its end. In model III, the oligomer was built from an HCC dimer
subunit in which the conformation of the β-L structure was changed in order to allow
domain swapping between the subunits, which are positioned at an angle rather than
in parallel as in a dimer.
Fig. 12 Model II of HCC oligomer structure. The picture contains numeration of dimers. According
to [45]
Model IV (Fig. 14) has a topology similar to that of model II, but the domain-swapped
HCC dimers that stack one upon another are rotated about the long axis of the
oligomer by an angle of 55°.
The analysis of model stability after nanosecond-scale MD simulations suggests
that the most stable structures were models II and III. The first tested type of dimer
organization, model I, was clearly unstable. All three dimers involved in the oligomer
changed their positions relative to each other, and at the same time the dimer
structure itself was unstable. Model IV was also unstable, as one of the dimers
Fig. 13 Model III of HCC oligomer structure. The picture contains numeration of dimers. Accord-
ing to [45]
involved in it changed its position relative to the rest of the oligomer. Thus it seems
that the structures of the oligomers in models I and IV did not maintain the
“fibril-like” topology, i.e. the elongated shape, during the simulation. Moreover the two
Fig. 14 Model IV of HCC oligomer structure. The picture contains numeration of dimers. Based
on Fig. 2 in [45]
models show a higher energy of interaction between the subunits within the oligomer,
determined with the MM-GBSA (Molecular Mechanics Generalized Born Surface Area)
method, than models II and III. On the other hand, the topologies of the oligomers
of models II and III were stable during the simulation, partly owing to
hydrogen bonds between subunits. Model II, built with the dimers stacked on
one another, showed high stability. The dimers formed stable hydrogen bonds and
salt bridges between each other. The dimer building blocks in this oligomer did not
shift significantly relative to each other and showed only minor changes in their
internal structure. The top and bottom surfaces of HCC are populated with many polar
or charged amino acids capable of forming salt bridges and hydrogen bonds, which
favours this arrangement. The arrangement of subunits in model III, which used
unfolded monomers arranged in a structure in which domain swapping was possible,
was also stable. The subunits approached each other during the simulation and formed a
network of stable hydrogen bonds. In model III it was also possible for the dimers
stacked one on another to form a continuous β structure, as suggested by Wahlbom
et al. [11]. However, during the simulation, only side-chain hydrogen bonds were
formed. The results are consistent with the values of the Gibbs energy of the interactions
between oligomer subunits. The most favorable energy was observed between
the subunits of model III. The second most favorable energy was observed in
model II. The highest energy of interaction between subunits was found in the least
stable model I (Fig. 15).
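The interaction energies compared above come from an MM-GBSA type of analysis. As a rough sketch of the bookkeeping involved, and not of the authors' actual protocol, the interaction free energy between two subunits is commonly estimated as the free energy of the complex minus those of the separated subunits, with each free energy taken as the sum of a molecular mechanics term, a Generalized Born polar solvation term and a surface-area nonpolar term. The class and function names below are hypothetical, the entropy contribution is omitted, and the numbers are made up purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeEnergyTerms:
    """Averaged MM-GBSA components for one species, in kcal/mol."""
    e_mm: float  # molecular mechanics energy (bonded + van der Waals + electrostatic)
    g_gb: float  # polar solvation free energy from the Generalized Born model
    g_sa: float  # nonpolar solvation free energy from the surface-area term

    @property
    def total(self):
        return self.e_mm + self.g_gb + self.g_sa


def interaction_energy(complex_terms, subunit_a, subunit_b):
    """Delta G = G(complex) - G(subunit A) - G(subunit B); entropy is ignored."""
    return complex_terms.total - subunit_a.total - subunit_b.total


if __name__ == "__main__":
    # Hypothetical values, not taken from any simulation discussed in the text.
    pair = FreeEnergyTerms(e_mm=-5000.0, g_gb=-3000.0, g_sa=40.0)
    dimer_1 = FreeEnergyTerms(e_mm=-2450.0, g_gb=-1520.0, g_sa=22.0)
    dimer_2 = FreeEnergyTerms(e_mm=-2460.0, g_gb=-1530.0, g_sa=23.0)
    print(interaction_energy(pair, dimer_1, dimer_2))
```

A more negative value indicates a more favorable interaction between subunits, which is the sense in which the oligomer models are ranked in the paragraph above.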
It is believed that domain swapping is associated with the formation of amyloid
deposits of HCC. The dimers with swapped domains or the monomers, which swap
Fig. 16 Schematic representation of the HCC protofilament model obtained for 24 HCC units with
swapped domains (built according to model III)
5 Conclusion
References
1. Virchow, R.: Ueber eine im Gehirn und Rückenmark des Menschen aufgefundene Substanz
mit der chemischen Reaction der Cellulose. Acad. Sci. (Paris) 37, 860–861 (1854)
2. Gertz, M.A., Lacy, M.Q., Dispenzieri, A., Hayman, S.R.: Amyloidosis. Best. Pract. Res. Clin.
Haematol. 18, 709–727 (2005)
3. Hawkins, P.N.: Diagnosis and treatment of amyloidosis. Ann. Rheum. Dis. 56, 631–633 (1997)
4. Stryer, L., Berg, J.M.: Biochemistry 5e+ Hemoglobin Chapter for Biochem 6e. W H Freeman
& Company, New York (2005)
5. Harper, J.D., Wong, S.S., Lieber, C.M., Lansbury, P.T.: Observation of metastable Abeta
amyloid protofibrils by atomic force microscopy. Chem. Biol. 4, 119–125 (1997)
6. Reixach, N., Deechongkit, S., Jiang, X., Kelly, J.W., Buxbaum, J.N.: Tissue damage in the
amyloidoses: transthyretin monomers and nonnative oligomers are the major cytotoxic species
in tissue culture. Proc. Natl. Acad. Sci. U S A 101, 2817–2822 (2004)
7. Krebs, M.R.H., Macphee, C.E., Miller, A.F., Dunlop, I.E., Dobson, C.M., Donald, A.M.: The
formation of spherulites by amyloid fibrils of bovine insulin. Proc. Natl. Acad. Sci. U S A
101, 14420–14424 (2004)
8. Gosal, W.S., Morten, I.J., Hewitt, E.W., Smith, D.A., Thomson, N.H., Radford, S.E.: Compet-
ing pathways determine fibril morphology in the self-assembly of beta2-microglobulin into
amyloid. J. Mol. Biol. 351, 850–864 (2005)
9. Ionescu-Zanetti, C., Khurana, R., Gillespie, J.R., Petrick, J.S., Trabachino, L.C., Minert, L.J.,
Carter, S.A., Fink, A.L.: Monitoring the assembly of Ig light-chain amyloid fibrils by atomic
force microscopy. Proc. Natl. Acad. Sci. U S A 96, 13175–13179 (1999)
10. Malisauskas, M., Zamotin, V., Jass, J., Noppe, W., Dobson, C.M., Morozova-Roche, L.A.:
Amyloid protofilaments from the calcium-binding protein equine lysozyme: formation of ring
and linear structures depends on pH and metal ion concentration. J. Mol. Biol. 330, 879–890
(2003)
11. Wahlbom, M., Wang, X., Lindström, V., Carlemalm, E., Jaskolski, M., Grubb, A.: Fibrillogenic
oligomers of human cystatin C are formed by propagated domain swapping. J. Biol. Chem.
282, 18318–18326 (2007)
12. Kayed, R., Head, E., Thompson, J.L., McIntire, T.M., Milton, S.C., Cotman, C.W., Glabe,
C.G.: Common structure of soluble amyloid oligomers implies common mechanism of patho-
genesis. Science 300, 486–489 (2003)
13. Rousseau, F., Wilkinson, H., Villanueva, J., Serrano, L., Schymkowitz, J.W.H., Itzhaki, L.S.:
Domain swapping in p13suc1 results in formation of native-like, cytotoxic aggregates. J. Mol.
Biol. 363, 496–505 (2006)
14. Xu, S.: Aggregation drives “misfolding” in protein amyloid fiber formation. Amyloid 14,
119–131 (2007)
15. Nguyen, H.D., Hall, C.K.: Spontaneous fibril formation by polyalanines; discontinuous molec-
ular dynamics simulations. J. Am. Chem. Soc. 128, 1890–1901 (2006)
16. Buchete, N.-V., Tycko, R., Hummer, G.: Molecular dynamics simulations of Alzheimer’s
β-amyloid protofilaments. J. Mol. Biol. 353, 804–821 (2005)
17. Haspel, N., Zanuy, D., Ma, B., Wolfson, H., Nussinov, R.: A comparative study of amyloid
fibril formation by residues 15–19 of the human calcitonin hormone: a single beta-sheet model
with a small hydrophobic core. J. Mol. Biol. 345, 1213–1227 (2005)
18. Röhrig, U.F., Laio, A., Tantalo, N., Parrinello, M., Petronzio, R.: Stability and structure of
oligomers of the Alzheimer peptide Abeta16-22: from the dimer to the 32-mer. Biophys. J.
91, 3217–3229 (2006)
19. Deng, N.-J., Yan, L., Singh, D., Cieplak, P.: Molecular basis for the Cu2+ binding-induced
destabilization of β2-microglobulin revealed by molecular dynamics simulation. Biophys. J.
90, 3865–3879 (2006)
20. Yang, M., Lei, M., Huo, S.: Why is Leu55 → Pro55 transthyretin variant the most amyloido-
genic: Insights from molecular dynamics simulations of transthyretin monomers. Protein Sci.
12, 1222–1231 (2003)
21. Park, S., Saven, J.G.: Simulation of pH-dependent edge strand rearrangement in human beta-2
microglobulin. Protein Sci. 15, 200–207 (2005)
22. Armen, R.S., Daggett, V.: Characterization of two distinct beta2-microglobulin unfolding
intermediates that may lead to amyloid fibrils of different morphology. Biochemistry 44,
16098–16107 (2005)
23. Santini, S., Derreumaux, P.: Helix H1 of the prion protein is rather stable against environmental
perturbations: molecular dynamics of mutation and deletion variants of PrP(90-231). Cell.
Mol. Life Sci. 61, 951–960 (2004)
24. DeMarco, M.L., Daggett, V.: From conversion to aggregation: protofibril formation of the
prion protein. Proc. Natl. Acad. Sci. U S A 101, 2293–2298 (2004)
25. Rodziewicz-Motowidło, S., Wahlbom, M., Wang, X., Lagiewka, J., Janowski, R., Jaskolski,
M., Grubb, A., Grzonka, Z.: Checking the conformational stability of cystatin C and its L68Q
variant by molecular dynamics studies: why is the L68Q variant amyloidogenic? J. Struct.
Biol. 154, 68–78 (2006)
26. DuBay, K.F.K., Pawar, A.P.A., Chiti, F.F., Zurdo, J.J., Dobson, C.M.C., Vendruscolo, M.M.:
Prediction of the absolute aggregation rates of amyloidogenic polypeptide chains. J. Mol.
Biol. 341, 10–10 (2004)
27. Fernandez-Escamilla, A.-M., Rousseau, F., Schymkowitz, J., Serrano, L.: Prediction of
sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat.
Biotechnol. 22, 1302–1306 (2004)
494 S. Rodziewicz-Motowidło et al.
28. Tartaglia, G.G., Cavalli, A., Pellarin, R., Caflisch, A.: Prediction of aggregation rate and
aggregation-prone segments in polypeptide sequences. Protein Sci. 14, 2723–2734 (2005)
29. Ma, B., Nussinov, R.: Simulations as analytical tools to understand protein aggregation and
predict amyloid conformation. Curr. Opin. Chem. Biol. 10, 445–452 (2006)
30. Borreguero, J.M., Urbanc, B., Lazo, N.D., Buldyrev, S.V., Teplow, D.B., Stanley, H.E.: Folding
events in the 21-30 region of amyloid beta-protein (Abeta) studied in silico. Proc. Natl. Acad.
Sci. U S A 102, 6015–6020 (2005)
31. Wei, G., Mousseau, N., Derreumaux, P.: Sampling the self-assembly pathways of KFFE
hexamers. Biophys. J. 87, 9–9 (2004)
32. Baumketner, A., Shea, J.-E.: Free energy landscapes for amyloidogenic tetrapeptides dimer-
ization. Biophys. J. 89, 1493–1503 (2005)
33. Han, W., Wu, Y.-D.: A strand-loop-strand structure is a possible intermediate in fibril elon-
gation: long time simulations of amyloid-beta peptide (10-35). J. Am. Chem. Soc. 127,
15408–15416 (2005)
34. Ma, B., Nussinov, R.: Molecular dynamics simulations of the unfolding of 2-microglobulin
and its variants. Protein Eng. Des. Sel. 16, 561–575 (2003)
35. Moraitakis, G., Goodfellow, J.M.: Simulations of human lysozyme: probing the conformations
triggering amyloidosis. Biophys. J. 84, 10–10 (2003)
36. Tsai, H.-H.G., Reches, M., Tsai, C.-J., Gunasekaran, K., Gazit, E., Nussinov, R.: Energy land-
scape of amyloidogenic peptide oligomerization by parallel-tempering molecular dynamics
simulation: significant role of Asn ladder. Proc. Natl. Acad. Sci. U S A 102, 8174–8179 (2005)
37. Wu, K.-P., Weinstock, D.S., Narayanan, C., Levy, R.M., Baum, J.: Structural reorganization
of alpha-synuclein at low pH observed by NMR and REMD simulations. J. Mol. Biol. 391,
784–796 (2009)
38. Li, M.S., Klimov, D.K., Straub, J.E., Thirumalai, D.: Probing the mechanisms of fibril for-
mation using lattice models. J. Chem. Phys. 129, 175101 (2008)
39. Zhang, J., Muthukumar, M.: Simulations of nucleation and elongation of amyloid fibrils. J.
Chem. Phys. 130, 035102 (2009)
40. Rojas, A., Liwo, A., Browne, D., Scheraga, H.A.: Mechanism of fiber assembly: treatment
of Aβ peptide aggregation with a coarse-grained united-residue force field. J. Mol. Biol. 404,
537–552 (2010)
41. Fawzi, N.L., Chubukov, V., Clark, L.A., Brown, S., Head-Gordon, T.: Influence of denatured
and intermediate states of folding on protein aggregation. Protein Sci. 14, 993–1003 (2005)
42. Auer, S., Dobson, C.M., Vendruscolo, M.: Characterization of the nucleation barriers for
protein aggregation and amyloid formation. HFSP J. 1, 137–146 (2007)
43. Smith, A.V., Hall, C.K.: Protein refolding versus aggregation: computer simulations on an
intermediate-resolution protein model. J. Mol. Biol. 312, 16–16 (2001)
44. Thirumalai, D., Klimov, D.K., Dima, R.I.: Emerging ideas on the molecular basis of protein
and peptide aggregation. Curr. Opin. Struct. Biol. 13, 14–14 (2003)
45. Janowski, R., Kozak, M., Jankowska, E., Grzonka, Z., Grubb, A., Abrahamson, M., Jaskól-
ski, M.: Human cystatin C, an amyloidogenic protein, dimerizes through three-dimensional
domain swapping. Nat. Struct. Mol. Biol. 8, 316–320 (2001)
46. Bennett, M.J., Sawaya, M.R., Eisenberg, D.: Deposition diseases and 3D domain swapping.
Structure 14, 811–824 (2006)
47. Armen, R.S., DeMarco, M.L., Alonso, D.O.V., Daggett, V.: Pauling and Corey’s alpha-pleated
sheet structure may define the prefibrillar amyloidogenic intermediate in amyloid disease.
Proc. Natl. Acad. Sci. U S A 101, 11622–11627 (2004)
48. Ma, B., Nussinov, R.: The Stability of monomeric intermediates controls amyloid formation:
Aβ25-35 and its N27Q mutant. Biophys. J. 90, 10–10 (2006)
49. Gu, W., Wang, T., Zhu, Y., Shi, J., Liu, H.: Molecular dynamics simulation of the unfolding
of the human prion protein domain under low pH and high temperature conditions. Biophys.
Chem. 104, 16–16 (2003)
50. Alonso, D.O., Alm, E., Daggett, V.: Characterization of the unfolding pathway of the cell-
cycle protein p13suc1 by molecular dynamics simulations: implications for domain swapping.
Structure 8, 101–110 (2000)
Molecular Dynamics Studies on Amyloidogenic Proteins 495
51. Gsponer, J., Ferrara, P., Caflisch, A.: Flexibility of the murine prion protein and its Asp178Asn
mutant investigated by molecular dynamics simulations. J. Mol. Graph. Model. 20, 169–182
(2001)
52. Alonso, D.O., DeArmond, S.J., Cohen, F.E., Daggett, V.: Mapping the early steps in the pH-
induced conformational conversion of the prion protein. Proc. Natl. Acad. Sci. U S A 98,
2985–2989 (2001)
53. Yang, M., Lei, M., Bruschweiler, R., Huo, S.: Initial conformational changes of human
transthyretin under partially denaturing conditions. Biophys. J. 89, 11–11 (2005)
54. Skoulakis, S., Goodfellow, J.M.: The pH-dependent stability of wild-type and mutant
transthyretin oligomers. Biophys. J. 84, 10–10 (2003)
55. Mu, Y., Nordenskiöld, L., Tam, J.P.: Folding, misfolding, and amyloid protofibril formation
of WW domain FBP28. Biophys. J. 90, 10–10 (2006)
56. Nowak, M.: Immunoglobulin kappa light chain and its amyloidogenic mutants: a molecular
dynamics study. Proteins 55, 11–21 (2004)
57. Prusiner, S.B.: Biology and genetics of prion diseases. Annu. Rev. Microbiol. 48, 655–686
(1994)
58. Prusiner, S.B.: Neurodegenerative diseases and prions. N. Engl. J. Med. 344, 1516–1526
(2001)
59. Stahl, N., Prusiner, S.B.: Prions and prion proteins (1991)
60. Riesner, D.: Biochemistry and structure of PrP(C) and PrP(Sc). Br. Med. Bull. 66, 21–33
(2003)
61. Zahn, R.: NMR solution structure of the human prion protein. Proc. Natl. Acad. Sci. 97,
145–150 (2000)
62. Cox, D.L., Lashuel, H., Lee, K.Y.C., Singh, R.R.P.: The materials science of protein aggre-
gation. MRS Bull. 30, 452–457 (2005)
63. Lansbury, P.T., Lashuel, H.A.: A century-old debate on protein aggregation and neurodegen-
eration enters the clinic. Nature 443, 774–779 (2006)
64. Dima, R.I., Thirumalai, D.: Exploring the propensities of helices in PrPC to form β sheet
using NMR structures and sequence alignments. Biophys. J. 83, 1268–1280 (2002)
65. Lu, X., Wintrode, P.L., Surewicz, W.K.: Beta-sheet core of human prion protein amyloid
fibrils as determined by hydrogen/deuterium exchange. Proc. Natl. Acad. Sci. U S A 104,
1510–1515 (2007)
66. Cohen, F.E., Pan, K.M., Huang, Z., Baldwin, M., Fletterick, R.J., Prusiner, S.B.: Structural
clues to prion replication. Science 264, 530–531 (1994)
67. Dima, R.I., Thirumalai, D.: Probing the instabilities in the dynamics of helical fragments from
mouse PrPC. Proc. Natl. Acad. Sci. U S A 101, 15335–15340 (2004)
68. Kunes, K.C., Clark, S.C., Cox, D.L., Singh, R.R.P.: Left handed beta helix models for mam-
malian prion fibrils. Prion 2, 81–90 (2008)
69. Cobb, N.J., Apetri, A.C., Surewicz, W.K.: Prion protein amyloid formation under native-like
conditions involves refolding of the C-terminal alpha-helical domain. J. Biol. Chem. 283,
34704–34711 (2008)
70. Prusiner, S.B., McKinley, M.P., Bowman, K.A., Bolton, D.C., Bendheim, P.E., Groth, D.F.,
Glenner, G.G.: Scrapie prions aggregate to form amyloid-like birefringent rods. Cell 35,
349–358 (1983)
71. El-Bastawissy, E., Knaggs, M.H., Gilbert, I.H.: Molecular dynamics simulations of wild-type
and point mutation human prion protein at normal and elevated temperature. J. Mol. Graph.
Model. 20, 145–154 (2001)
72. Parchment, O.G., Essex, J.W.: Molecular dynamics of mouse and Syrian hamster PrP: impli-
cations for activity. Proteins 38, 327–340 (2000)
73. Zuegg, J., Gready, J.E.: Molecular dynamics simulations of human prion protein: importance
of correct treatment of electrostatic interactions. Biochemistry 38, 13862–13876 (1999)
74. Hornemann, S., Glockshuber, R.: A scrapie-like unfolding intermediate of the prion protein
domain PrP(121-231) induced by acidic pH. Proc. Natl. Acad. Sci. U S A 95, 6010–6014
(1998)
496 S. Rodziewicz-Motowidło et al.
75. Swietnicki, W., Morillas, M., Chen, S.G., Gambetti, P., Surewicz, W.K.: Aggregation and
fibrillization of the recombinant human prion protein huPrP90-231. Biochemistry 39, 424–431
(2000)
76. Swietnicki, W., Petersen, R., Gambetti, P., Surewicz, W.K.: pH-dependent stability and
conformation of the recombinant human prion protein PrP(90-231). J. Biol. Chem. 272,
27517–27520 (1997)
77. Zhang, H., Stockel, J., Mehlhorn, I., Groth, D., Baldwin, M.A., Prusiner, S.B., James, T.L.,
Cohen, F.E.: Physical studies of conformational plasticity in a recombinant prion protein.
Biochemistry 36, 3543–3553 (1997)
78. Jackson, G.S., Hosszu, L.L., Power, A., Hill, A.F., Kenney, J., Saibil, H., Craven, C.J., Waltho,
J.P., Clarke, A.R., Collinge, J.: Reversible conversion of monomeric human prion protein
between native and fibrilogenic conformations. Science 283, 1935–1937 (1999)
79. Guo, J., Ren, H., Ning, L., Liu, H., Yao, X.: Exploring structural and thermodynamic stabilities
of human prion protein pathogenic mutants D202N, E211Q and Q217R. J. Struct. Biol. 178,
225–232 (2012)
80. Collinge, J.: Prion diseases of humans and animals: their causes and molecular basis. Ann.
Rev. Neurosci. 519–550 (2001)
81. Mead, S.: Prion disease genetics. Eur. J. Hum. Genet. 14, 273–281 (2006)
82. van der Kamp, M.W., Daggett, V.: The consequences of pathogenic mutations to the human
prion protein. Protein Eng. Des. Sel. 22, 461–468 (2009)
83. Rossetti, G., Cong, X., Caliandro, R., Legname, G., Carloni, P.: Common structural traits
across pathogenic mutants of the human prion protein and their implications for familial
prion diseases. J. Mol. Biol. 411, 13–13 (2011)
84. Hamilton, J.A., Steinrauf, L.K., Braden, B.C., Liepnieks, J., Benson, M.D., Holmgren, G.,
Sandgren, O., Steen, L.: The x-ray crystal structure refinements of normal human transthyretin
and the amyloidogenic Val-30–> Met variant to 1.7-A resolution. J. Biol. Chem. 268,
2416–2424 (1993)
85. Sebastião, M.P., Saraiva, M.J., Damas, A.M.: The crystal structure of amyloidogenic Leu55–>
Pro transthyretin variant reveals a possible pathway for transthyretin polymerization into
amyloid fibrils. J. Biol. Chem. 273, 24715–24722 (1998)
86. Hammarström, P.: Trans-suppression of misfolding in an amyloid disease. Science 293,
2459–2462 (2001)
87. Hammarström, P., Jiang, X., Hurshman, A.R., Powers, E.T., Kelly, J.W.: Sequence-dependent
denaturation energetics: a major determinant in amyloid disease diversity. Proc. Natl. Acad.
Sci. U S A 99(Suppl 4), 16427–16432 (2002)
88. Schneider, F., Hammarström, P., Kelly, J.W.: Transthyretin slowly exchanges subunits under
physiological conditions: a convenient chromatographic method to study subunit exchange
in oligomeric proteins. Protein Sci. 10, 1606–1613 (2001)
89. Hurshman, A.R., White, J.T., Powers, E.T., Kelly, J.W.: Transthyretin aggregation under par-
tially denaturing conditions is a downhill polymerization. Biochemistry 43, 7365–7381 (2004)
90. Jiang, X., Smith, C.S., Petrassi, H.M., Hammarström, P., White, J.T., Sacchettini, J.C., Kelly,
J.W.: An engineered transthyretin monomer that is nonamyloidogenic, unless it is partially
denatured. Biochemistry 40, 11442–11452 (2001)
91. Armen, R.S., Alonso, D.O.V., Daggett, V.: Anatomy of an amyloidogenic intermediate -
conversion of β-sheet to α-sheet structure in transthyretin at acidic pH. Structure 12, 17–17
(2004)
92. Liu, K., Cho, H.S., Hoyt, D.W., Nguyen, T.N., Olds, P., Kelly, J.W., Wemmer, D.E.: Deuterium-
proton exchange on the native wild-type transthyretin tetramer identifies the stable core of the
individual subunits and indicates mobility at the subunit interface. J. Mol. Biol. 303, 555–565
(2000)
93. Saraiva, M.J.: Transthyretin mutations in hyperthyroxinemia and amyloid diseases. Hum.
Mutat. 17, 493–503 (2001)
94. Lashuel, H.A., Lai, Z., Kelly, J.W.: Characterization of the transthyretin acid denaturation path-
ways by analytical ultracentrifugation: implications for wild-type, V30M, and L55P amyloid
fibril formation. Biochemistry 37, 17851–17864 (1998)
Molecular Dynamics Studies on Amyloidogenic Proteins 497
95. Hörnberg, A., Eneqvist, T., Olofsson, A., Lundgren, E., Sauer-Eriksson, A.E.: A comparative
analysis of 23 structures of the amyloidogenic protein transthyretin. J. Mol. Biol. 302, 21–21
(2000)
96. Wojtczak, A., Neumann, P., Cody, V.: Structure of a new polymorphic monoclinic form of
human transthyretin at 3 Å resolution reveals a mixed complex between unliganded and
T4-bound tetramers of TTR. Acta Crystallogr. D: Biol. Crystallogr. 57, 957–967 (2001)
97. Hörnberg, A., Olofsson, A., Eneqvist, T., Lundgren, E., Sauer-Eriksson, A.E.: The beta-
strand D of transthyretin trapped in two discrete conformations. Biochim. Biophys. Acta
1700, 93–104 (2004)
98. Banerjee, A., Bairagya, H.R., Mukhopadhyay, B.P.B., Nandi, T.K., Bera, A.K.: Structural
insight to mutated Y116S transthyretin by molecular dynamics simulation. Indian J. Biochem.
Biophys. 47, 197–202 (2010)
99. Xu, X., Wang, X., Xiao, Z., Li, Y., Wang, Y.: Probing the structural and functional link
between mutation- and pH-dependent hydration dynamics and amyloidosis of transthyretin.
Soft Matter 8, 324–336 (2011)
100. Abrahamson, M., Barrett, A.J., Salvesen, G., Grubb, A.: Isolation of six cysteine proteinase
inhibitors from human urine. Their physicochemical and enzyme kinetic properties and con-
centrations in biological fluids. J. Biol. Chem. 261, 11282–11289 (1986)
101. Grubb, A.O.: Cystatin C-properties and use as diagnostic marker. In: Advances in Clinical
Chemistry. Elsevier, pp. 63–99 (2001)
102. Grzonka, Z., Jankowska, E., Kasprzykowski, F., et al.: Structural studies of cysteine proteases
and their inhibitors. Acta Biochim. Pol. 48, 1–20 (2001)
103. Ghiso, J., Jensson, O., Frangione, B.: Amyloid fibrils in hereditary cerebral hemorrhage with
amyloidosis of Icelandic type is a variant of gamma-trace basic protein (cystatin C). Proc.
Natl. Acad. Sci. U S A 83, 2974–2978 (1986)
104. Abrahamson, M.: Molecular basis for amyloidosis related to hereditary brain hemorrhage.
Scand. J. Clin. Lab. Invest. Suppl. 226, 47–56 (1996)
105. Olafsson, I., Grubb, A.O.: Hereditary cystatin C amyloid angiopathy. Amyloid 7, 70–79
(2000)
106. Gerhartz, B., Ekiel, I., Abrahamson, M.: Two stable unfolding intermediates of the disease-
causing L68Q variant of human cystatin C. Biochemistry 37, 17309–17317 (1998)
107. Abrahamson, M., Grubb, A.: Increased body temperature accelerates aggregation of the Leu-
68–> Gln mutant cystatin C, the amyloid-forming protein in hereditary cystatin C amyloid
angiopathy. Proc. Natl. Acad. Sci. U S A 91, 1416–1420 (1994)
108. Jankowska, E., Wiczk, W., Grzonka, Z.: Thermal and guanidine hydrochloride-induced denat-
uration of human cystatin C. Eur. Biophys. J. 33, 454–461 (2004)
109. Nilsson, M., Wang, X., Rodziewicz-Motowidlo, S., Janowski, R., Lindström, V., Onnerfjord,
P., Westermark, G., Grzonka, Z., Jaskolski, M.M., Grubb, A.A.: Prevention of domain swap-
ping inhibits dimerization and amyloid fibril formation of cystatin C: use of engineered disul-
fide bridges, antibodies, and carboxymethylpapain to stabilize the monomeric form of cystatin
C. J. Biol. Chem. 279, 24236–24245 (2004)
110. Liu, H.-L., Lin, Y.-M., Zhao, J.-H., Hsieh, M.-C., Lin, H.-Y., Huang, C.-H., Fang, H.-W., Ho,
Y., Chen, W.-Y.: Molecular dynamics simulations of human cystatin C and its L68Q varient
to investigate the domain swapping mechanism. J. Biomol. Struct. Dyn. 25, 135–144 (2007)
111. Lin, Y.-M., Liu, H.-L., Zhao, J.-H., Huang, C.-H., Fang, H.-W., Ho, Y., Chen, W.-Y.: Molecular
dynamics simulations to investigate the domain swapping mechanism of human cystatin C.
Biotechnol. Prog. 23, 577–584 (2008)
112. Yu, Y., Wang, Y., He, J., Liu, Y., Li, H., Zhang, H., Song, Y.: Structural and dynamic properties
of a new amyloidogenic chicken cystatin mutant I108T. J. Biomol. Struct. Dyn. 27, 641–649
(2010)
113. Ekiel, I., Abrahamson, M., Fulton, D.B., et al.: NMR structural studies of human cystatin C
dimers and monomers. J. Mol. Biol. 271, 12–12 (1997)
114. Sinha, N., Tsai, C.J., Nussinov, R.: A proposed structural model for amyloid fibril elongation:
domain swapping forms an interdigitating beta-structure polymer. Protein Eng. 14, 93–103
(2001)
498 S. Rodziewicz-Motowidło et al.
115. Staniforth, R.A., Giannini, S., Higgins, L.D., Conroy, M.J., Hounslow, A.M., Jerala, R.,
Craven, C.J., Waltho, J.P.: Three-dimensional domain swapping in the folded and molten-
globule states of cystatins, an amyloid-forming structural superfamily. EMBO J. 20,
4774–4781 (2001)
116. Stubbs, M.T., Laber, B., Bode, W., Huber, R., Jerala, R., Lenarcic, B., Turk, V.: The refined
2.4 A X-ray crystal structure of recombinant human stefin B in complex with the cysteine
proteinase papain: a novel type of proteinase inhibitor interaction. EMBO J. 9, 1939–1947
(1990)
117. Engh, R.A., Dieckmann, T., Bode, W., Auerswald, E.A., Turk, V., Huber, R., Oschkinat, H.:
Conformational variability of chicken cystatin. Comparison of structures determined by X-ray
diffraction and NMR spectroscopy. J. Mol. Biol. 234, 1060–1069 (1993)
118. Rodziewicz-Motowidło, S., Iwaszkiewicz, J., Sosnowska, R., Czaplewska, P., Sobolewski,
E., Szymańska, A., Stachowiak, K., Liwo, A.: The role of the Val57 amino-acid residue in the
hinge loop of the human cystatin C. Conformational studies of the beta2-L1-beta3 segments
of wild-type human cystatin C and its mutants. Biopolymers 91, 373–383 (2009)
119. Sunde, M., Serpell, L.C., Bartlam, M., Fraser, P.E., Pepys, M.B., Blake, C.C.: Common core
structure of amyloid fibrils by synchrotron X-ray diffraction. J. Mol. Biol. 273, 11–11 (1997)
120. Blake, C., Serpell, L.: Synchrotron X-ray studies suggest that the core of the transthyretin
amyloid fibril is a continuous β-sheet helix. Structure 4, 10–10 (1996)
121. Cohen, A.S., Shirahama, T., Skinner, M.: Electron microscopy of amyloid. Electron
microscopy of proteins 3, 165–205 (1982)
122. Puchtler, H., Sweat, F.: Congo red as a stain for fluorescence microscopy of amyloid. J.
Histochem. Cytochem. 13, 693–694 (1965)
123. Chiti, F., Dobson, C.M.: Protein misfolding, functional amyloid, and human disease. Ann.
Rev. Biochem. 75, 333–366 (2006)
124. Serpell, L.C., Sunde, M., Benson, M.D., Tennent, G.A., Pepys, M.B., Fraser, P.E.: The protofil-
ament substructure of amyloid fibrils. J. Mol. Biol. 300, 1033–1039 (2000)
125. Nelson, R., Eisenberg, D.: Recent atomic models of amyloid fibril structure. Curr. Opin.
Struct. Biol. 16, 260–265 (2006)
126. Jiménez, J.L., Guijarro, J.I., Orlova, E., Zurdo, J., Dobson, C.M., Sunde, M., Saibil, H.R.:
Cryo-electron microscopy structure of an SH3 amyloid fibril and model of the molecular
packing. EMBO J. 18, 815–821 (1999)
127. Govaerts, C., Wille, H., Prusiner, S.B., Cohen, F.E.: Evidence for assembly of prions with
left-handed beta-helices into trimers. Proc. Natl. Acad. Sci. U S A 101, 8342–8347 (2004)
128. Sikorski, P., Atkins, E.: New model for crystalline polyglutamine assemblies and their con-
nection with amyloid fibrils. Biomacromol 6, 425–432 (2005)
129. Lührs, T., Ritter, C., Adrian, M., Riek-Loher, D., Bohrmann, B., Döbeli, H., Schubert, D.,
Riek, R.: 3D structure of Alzheimer’s amyloid-beta(1-42) fibrils. Proc. Natl. Acad. Sci. U S
A 102, 17342–17347 (2005)
130. Serag, A.A., Altenbach, C., Gingery, M., Hubbell, W.L., Yeates, T.O.: Arrangement of subunits
and ordering of beta-strands in an amyloid sheet. Nat. Struct. Biol. 9, 734–739 (2002)
131. Ivanova, M.I., Sawaya, M.R., Gingery, M., Attinger, A., Eisenberg, D.: An amyloid-forming
segment of beta2-microglobulin suggests a molecular model for the fibril. Proc. Natl. Acad.
Sci. U S A 101, 10584–10589 (2004)
132. Gronenborn, A.M.: Protein acrobatics in pairs—dimerization via domain swapping. Curr.
Opin. Struct. Biol. 19, 39–49 (2009)
133. la Paz de, M.L., de Mori, G.M.S., Serrano, L., Colombo, G.: Sequence dependence of amyloid
fibril formation: insights from molecular dynamics simulations. J. Mol. Biol. 349, 14–14
(2005)
134. Li, L., Darden, T.A., Bartolotti, L., Kominos, D., Pedersen, L.G.: An atomic model for the
pleated beta-sheet structure of Abeta amyloid protofilaments. Biophys. J. 76, 2871–2878
(1999)
135. Zanuy, D., Nussinov, R.: The sequence dependence of fiber organization. A comparative
molecular dynamics study of the islet amyloid polypeptide segments 22-27 and 22-29. J.
Mol. Biol. 329, 20–20 (2003)
Molecular Dynamics Studies on Amyloidogenic Proteins 499
136. Haspel, N., Gunasekaran, K., Ma, B., Tsai, C.-J.C., Nussinov, R.: The stability and dynamics
of the human calcitonin amyloid peptide DFNKF. Biophys. J. 87, 13–13 (2004)
137. Ye, W., Chen, Y., Wang, W., Yu, Q., Li, Y., Zhang, J., Chen, H.-F.: Insight into the stability
of cross-β amyloid fibril from VEALYL short peptide with molecular dynamics simulation.
PLoS ONE 7, e36382 (2012)
138. Periole, X., Rampioni, A., Vendruscolo, M., Mark, A.E.: Factors that affect the degree of twist
in beta-sheet structures: A molecular dynamics simulation study of a cross-beta filament of
the GNNQQNY peptide. J. Phys. Chem. B 113, 10548–10548 (2009)
139. Song, W., Wei, G., Mousseau, N., Derreumaux, P.: Self-assembly of the beta2-microglobulin
NHVTLSQ peptide using a coarse-grained protein model reveals a beta-barrel species. J.
Phys. Chem. B 112, 4410–4418 (2008)
140. Berryman, J.T., Radford, S.E., Harris, S.A.: Systematic examination of polymorphism in
amyloid fibrils by molecular-dynamics simulation. Biophys. J. 100, 9–9 (2011)
141. Connelly, L., Jang, H., Arce, F.T., Capone, R., Kotler, S.A., Ramachandran, S., Kagan, B.L.,
Nussinov, R., Lal, R.: Atomic force microscopy and MD simulations reveal pore-like struc-
tures of all-d-enantiomer of Alzheimer’s β-amyloid peptide: relevance to the ion channel
mechanism of AD pathology. J. Phys. Chem. B 116, 1728–1735 (2012)
142. Kent, A., Jha, A.K., Fitzgerald, J.E., Freed, K.F.: Benchmarking implicit solvent folding
simulations of the amyloid beta(10-35) fragment. J. Phys. Chem. B 112, 6175–6186 (2008)
143. Zheng, J., Jang, H., Nussinov, R.: Beta2-microglobulin amyloid fragment organization and
morphology and its comparison to Abeta suggests that amyloid aggregation pathways are
sequence specific. Biochemistry 47, 2497–2509 (2008)
144. Wang, J., Tan, C., Chen, H.-F., Luo, R.: All-atom computer simulations of amyloid fibrils
disaggregation. Biophys. J. 95, 5037–5047 (2008)
145. Gnanakaran, S., Nussinov, R., García, A.E.: Atomic-level description of amyloid beta-dimer
formation. J. Am. Chem. Soc. 128, 2158–2159 (2006)
146. Boucher, G., Mousseau, N., Derreumaux, P.: Aggregating the amyloid Abeta(11-25) peptide
into a four-stranded beta-sheet structure. Proteins 65, 877–888 (2006)
147. Lipfert, J., Franklin, J., Wu, F., Doniach, S.: Protein misfolding and amyloid formation for the
peptide GNNQQNY from yeast prion protein Sup35: simulation by reaction path annealing.
J. Mol. Biol. 349, 11–11 (2005)
148. Soto, P., Cladera, J., Mark, A.E., Daura, X.: Stability of SIV gp32 fusion-peptide single-layer
protofibrils as monitored by molecular-dynamics simulations. Angew. Chem. 117, 1089–1091
(2005)
149. Correia, B.E., Loureiro-Ferreira, N., Rodrigues, J.R., Brito, R.M.M.: A structural model of
an amyloid protofilament of transthyretin. Protein Sci. 15, 28–32 (2005)
150. Colombo, G., Meli, M., De Simone, A.: Computational studies of the structure, dynamics and
native content of amyloid-like fibrils of ribonuclease A. Proteins 70, 863–872 (2007)
151. Cendron, L., Trovato, A., Seno, F., Folli, C., Alfieri, B., Zanotti, G., Berni, R.: Amyloidogenic
potential of transthyretin variants: insights from structural and computational analyses. J. Biol.
Chem. 284, 25832–25841 (2009)
Raman and Infrared Spectra of Acoustical, Functional Modes of Proteins from All-Atom and Coarse-Grained Normal Mode Analysis
Abstract The directions of the largest thermal fluctuations of the structure of a
protein in its native state are the directions of its low-frequency modes (below 1 THz),
named acoustical modes by analogy with the acoustical phonons of a material. The
acoustical modes of a protein assist its conformational changes and are related to its
biological functions. Low-frequency modes are difficult to detect experimentally. A
survey of experimental data on the low-frequency modes of proteins is presented.
Theoretical approaches based on normal mode analysis are of primary interest for
understanding the role of the acoustical modes in proteins. In this chapter, the
fundamentals of normal mode analysis using all-atom models and coarse-grained elastic
models are reviewed. They are then applied to two proteins: first, conalbumin, a protein
studied in recent single-molecule experiments, and second, the 70 kDa Heat-Shock Protein
(Hsp70), a protein intimately related to human diseases. The conalbumin protein consists
of two homologous N- and C-lobes and was recently used as a benchmark protein for
Extraordinary Acoustic Raman (EAR) spectroscopy. The present all-atom calculations
demonstrate that the acoustical modes of conalbumin recently measured experimentally
are both infrared and Raman active. The molecular chaperone Hsp70 is an exemplary
model for illustrating the different properties of the low-frequency modes of a
multi-domain protein that occurs in two clearly distinct structural states (open and
closed), which might also be detectable in the sub-THz frequency range by
single-molecule spectroscopy. The role of the low-frequency modes in the transition
between the two states of Hsp70 is analyzed in detail. It is shown that the
low-frequency modes provide an easy means of communication between protein
domains separated by a large distance.
1 Introduction
The directions of the largest thermal fluctuations of the structure of a protein in its
native state are the directions of its low-frequency modes (below 1 THz), named
acoustical modes by analogy with the acoustical phonons of a material. The acoustical
modes assist the conformational changes of proteins necessary to perform their
function [24, 25]. The low-frequency modes are related to the amino-acid sequence
of the protein because they depend on the tertiary structure. Proteins whose
amino-acid sequences lead to the same fold (having the same main-chain conformation)
have similar confined acoustical modes because the lowest-frequency modes
depend mainly on the connectivity of the main chain of the protein and not on
the atomistic details. Natural selection of an amino-acid sequence thus selects not
only a structure, and hence a biological function, but also the low-frequency
collective modes associated with it.
For nearly four decades, there has been considerable interest in establishing
the possible role of the low-frequency (<200 cm−1) modes of proteins in
their biological function [13, 19–21, 26–31]. To perform their functions, most
proteins need to alternate between different states separated by an activation
barrier. The passage from one state to another is coupled to the binding/release of
one or several ligands and could be assisted by confined acoustical modes [29, 30,
32]. The directions of the low-frequency modes provide the directions of the largest
deformations at thermal equilibrium and can serve as collective coordinates to describe
the conformational changes [19–21]. The intrinsic dynamics of proteins correlates with
the structural changes induced by ligand or protein binding [32–34]. In enzymes, MD
simulations have revealed long-range interactions that manifest as correlated motions
of distant residues and might play a role in enzyme catalysis [31, 35, 36]. However,
the details and the importance of the collective modes of proteins for their biological
function are still not fully understood. This is because neither theoretical nor
experimental approaches have so far been able to follow biological events on the
multiple timescales on which they occur (from femtoseconds to seconds, Fig. 1)
[37, 38]. In spite of these limitations, a perturbative approach, based on the dynamics
of a protein on a harmonic all-atom or coarse-grained potential energy surface,
has proven useful for understanding the conformational changes of proteins.
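To relate the two frequency scales quoted in this section, the sub-THz range of the acoustical modes and the <200 cm−1 range of the low-frequency modes, the following minimal Python sketch converts between frequency and wavenumber using the relation wavenumber = frequency / c. The function names are illustrative choices and are not taken from the chapter.

```python
# Speed of light in cm/s (exact SI value expressed in cm).
C_CM_PER_S = 2.99792458e10


def thz_to_wavenumber(freq_thz):
    """Convert a frequency in THz to a wavenumber in cm^-1."""
    return freq_thz * 1.0e12 / C_CM_PER_S


def wavenumber_to_thz(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a frequency in THz."""
    return wavenumber_cm * C_CM_PER_S / 1.0e12


# Hypothetical illustration using the two limits quoted in the text.
one_thz = thz_to_wavenumber(1.0)   # upper limit of the acoustical modes
upper = wavenumber_to_thz(200.0)   # upper limit of the low-frequency modes
print(f"1 THz = {one_thz:.1f} cm^-1; 200 cm^-1 = {upper:.1f} THz")
```

With this conversion, 1 THz corresponds to about 33.4 cm−1 and 200 cm−1 to about 6.0 THz, so the acoustical modes occupy the lowest part of the low-frequency range.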
The chapter is organized as follows. The fundamentals of the theory of the
vibrational modes in the harmonic approximation (also named normal modes) for proteins
are reviewed in Sect. 2. There, we present the equations to compute the absorption
(infrared) and Raman spectra of proteins. The application of normal mode analysis
(NMA) to describe transition pathways is briefly reviewed, and the limits of the
harmonic approximation on which NMA is based are described. In Sect. 3, we present
a survey of the measurements of the low-frequency modes of proteins in their native
state by Extraordinary Acoustic Raman (EAR) spectroscopy, and their relation to
the present all-atom normal mode calculations for a model protein, conalbumin. As
shown theoretically elsewhere [39], the acoustical modes of proteins studied by EAR
spectroscopy are both infrared and Raman active, with a remarkable agreement
between theory and experiment. In Sect. 4, an analysis of the low-frequency
collective motions of a large multi-domain protein of primary interest in medicine,
the human 70 kDa heat-shock protein (hHsp70), is described. The vibrational modes of
hHsp70 were studied in the vicinity (harmonic approximation) of the two main local
minima of its free-energy landscape: the nucleotide-free or ADP-bound hHsp70 (named
closed state) and the ATP-bound hHsp70 (named open state). As shown elsewhere
[40, 41], the open and closed states of human Hsp70 represent the initial and final
structures of the conformational transitions of the functional cycle of this chaperone.
In an attempt to identify the functionally important motions for the transition between
the open and closed states, we computed the collective modes of the open model and the
closed model of hHsp70 using, first, a coarse-grained normal mode analysis based on the
popular Anisotropic Network Model (ANM) [7] and, second, an all-atom normal mode
analysis. All-atom and coarse-grained calculations of the low-frequency motions of
hHsp70 were compared. The chapter ends with concluding remarks.
2 Theory
2.1 Introduction
For several decades [4, 13–17, 26], NMA has been used successfully to determine
the slow motions of proteins, which are coupled to conformational changes. The NMA
method describes all possible (small) deformations a protein can undergo around its
native state by representing the protein as a set of harmonic oscillators [42, 43]. The
low-frequency vibrational modes correspond to collective or global motions, whereas
the higher-frequency modes correspond to local deformations. Several studies have
shown that these low-frequency modes are related to relevant, much slower motions
in proteins and that conformational transitions often follow one or a combination of
a few normal modes [18, 30, 44–47].
Two main NMA approaches have been used in the present chapter. The first is the
all-atom NMA (aa-NMA) with the standard all-atom representation of the protein
(Fig. 2a) and an all-atom force-field. The aa-NMA is limited to proteins of hundreds
of residues due to the memory requirements for the diagonalization of the 3N × 3N
force constant (Hessian) matrix, where N is the number of atoms. This is the main
computational limitation of aa-NMA. A reduction of these degrees of freedom is
commonly used to reduce the size of the Hessian matrix. This can be achieved
Raman and Infrared Spectra of Acoustical, Functional Modes … 505
by holding the bond lengths and angles fixed, for example [48], or by considering
only the rotation of several residues [49]. The second NMA approach used here is
the coarse-grained NMA, where each residue of a protein is represented by a point
(effective) mass (Fig. 2b). The most well-known coarse-grained model for NMA is
the elastic network model (ENM). In ENM, the all-atom force field is replaced by a
ball-and-spring harmonic potential with a single force constant parameter (Fig. 2c).
The first elastic network model was proposed by Tirion [50], who showed that an
all-atom homogeneous elastic network model can reproduce the shape of the low-
frequency part of the density of states of a protein as well as the fluctuations of
its Cα atoms very well. Later, Hinsen introduced a simplified coarse-grained elastic
network model, based on the position of the Cα atoms only and demonstrated its
usefulness to identify dynamical domains in proteins [51]. Since then, many variants
were developed [7, 52] and applied to a large number of proteins [53–55]. Combined
with a coarse-grained representation of the protein where only the Cα atoms are
considered (Fig. 2b), ENM has emerged as the preferred approach to perform NMA
on large systems. Although simple and efficient (a calculation can be done within
seconds on a regular desktop computer), it has been shown to provide robust and
reliable results.
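To make the memory limitation of aa-NMA mentioned above concrete, the following minimal sketch estimates the storage needed for a dense 3N × 3N Hessian. It assumes double-precision (8-byte) elements and full storage (neither symmetric-packed nor sparse); the function name and the atom counts are illustrative only and are not taken from the calculations of this chapter.

```python
def hessian_memory_gib(n_atoms: int, bytes_per_element: int = 8) -> float:
    """Memory in GiB needed to store a dense 3N x 3N matrix."""
    dim = 3 * n_atoms                      # three Cartesian coordinates per atom
    return dim * dim * bytes_per_element / 2**30


# Illustrative system sizes (hypothetical, not from the chapter).
for n_atoms in (1_000, 10_000, 100_000):
    print(f"N = {n_atoms:>7d} atoms -> {hessian_memory_gib(n_atoms):10.2f} GiB")
```

The quadratic growth with N is the point of the example: a tenfold larger system needs a hundredfold more memory, which is why a Cα-only representation is so much cheaper.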
The collective modes of a protein can be defined from the eigenvectors and eigenvalues
of the covariance matrix of the displacements of the atoms (or groups of atoms
in a coarse-grained representation) relative to their equilibrium positions [42]. The
structure of the protein is described by a set of point masses $M_1, M_2, \ldots, M_N \equiv \{M_i\}$
located at $\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_N \equiv \{\mathbf{R}_i\}$, respectively. The most probable position of the
mass $M_i$ is $\mathbf{R}_i^0$ ($i = 1$ to $N$). Each mass represents either an atomic mass (in an
all-atom representation of the protein) or an effective mass (in a coarse-grained
representation of the protein). The probability distribution of the displacements of the
point masses relative to their equilibrium positions, $\Delta\mathbf{R}_i = \mathbf{R}_i - \mathbf{R}_i^0$, is assumed to
be a multivariate Gaussian distribution
$$P(\{\Delta\mathbf{R}_i\}) = P(0)\exp\left[-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\mathbf{A}_{ij} : \Delta\mathbf{R}_i\,\Delta\mathbf{R}_j\right], \tag{1}$$
Fig. 2 a Representative hydrated structure of hHsp70 in the closed state used as input for both
coarse-grained and all-atom NMA. The color code is the following: NBD-IA = blue, NBD-IB =
marine, NBD-IIA = lightblue, NBD-IIB = cyan, linker = magenta, SBD-β = green, SBD-α = red and
C-terminal = gray. b Structure of hHsp70 in the closed state where only the Cα atoms are represented,
in the same view and with the same color code as in panel a. Elastic network connections
between the Cα atoms used to construct the ANM force constant matrix: Cα atoms within a cutoff
of 11 Å are shown connected via a “bond” in black. c Schematic representation of the nodes in the
elastic network of ANM. Every node is connected to its spatial neighbors by uniform springs. d The
distance vector between two nodes, i and j, is shown by an arrow and labeled $\mathbf{R}_{ij}$. Equilibrium positions
of the ith and jth nodes, $\mathbf{R}_i^0$ and $\mathbf{R}_j^0$, are shown in the xyz coordinate system. $R_{ij}^0$ is the equilibrium
distance between nodes i and j. Instantaneous fluctuation vectors, $\Delta\mathbf{R}_i$ and $\Delta\mathbf{R}_j$, and the instantaneous
distance vector, $\mathbf{R}_{ij}$, are shown by dashed arrows. Panels c and d were prepared with PyMOL
(http://www.pymol.org)
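and for a rigid rotation of the molecule around an axis of direction $\boldsymbol{\Omega}$, one finds also
$$P(\{\Delta\mathbf{R}_i = \mathbf{R}_i^0 \times \boldsymbol{\Omega}\}) = P(0). \tag{3}$$
By using Eqs. 2 and 3 in Eq. 1, we deduce that the matrix A obeys two relations
$$\sum_{j=1}^{N} A_{ij}^{\alpha\beta} = 0, \tag{4}$$
and
$$\sum_{j=1}^{N}\sum_{\beta} A_{ij}^{\alpha\beta}\left(\mathbf{R}_j^0 \times \boldsymbol{\Omega}\right)^{\beta} = 0, \tag{5}$$
$$\mathbf{A}\,\mathbf{e}_k = a_k\,\mathbf{e}_k, \tag{6}$$
$$\sum_{j=1}^{N}\mathbf{A}_{ij}\,\mathbf{e}_k(j) = a_k\,\mathbf{e}_k(i). \tag{7}$$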
In Eq. 7, $\mathbf{e}_k(i) = \left(e_k^x(i), e_k^y(i), e_k^z(i)\right)$ is the projection of the 3N-dimensional
eigenvector $\mathbf{e}_k$ on the site i located at $\mathbf{R}_i = \left(R_i^x, R_i^y, R_i^z\right)$. It is easy to show that
the relation given in Eq. 4 implies that three eigenvalues of A are zero. Each of these
modes has a normalized eigenvector corresponding to a rigid translation along one
of the Cartesian axes. Similarly, the relation given in Eq. 5 implies that three other
eigenvalues are zero, each with an eigenvector corresponding to a rigid rotation around
one of the three Cartesian axes. The eigenvalues of A being ranked by increasing
values, the first non-zero eigenvalue of A is $a_7$.
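The translational case can be checked in one line. As a sketch, take a unit vector t (a symbol introduced here for illustration only) and the normalized rigid-translation vector with the same component e(j) = t/√N on every site j. Inserting it into the left-hand side of Eq. 7 and using Eq. 4 gives

```latex
\begin{equation*}
\sum_{j=1}^{N}\sum_{\beta} A_{ij}^{\alpha\beta}\, e^{\beta}(j)
  = \frac{1}{\sqrt{N}} \sum_{\beta} t^{\beta}
    \left( \sum_{j=1}^{N} A_{ij}^{\alpha\beta} \right)
  = 0 ,
\end{equation*}
```

so that this vector is an eigenvector of A with eigenvalue zero. The three independent choices of t (along x, y and z) give the three translational zero modes; the rotational case follows in the same way from Eq. 5.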
The eigenvectors of A form a complete basis set:
$$\sum_{k=1}^{3N} e_k^{\alpha}(i)\,e_k^{\beta}(j) = \delta_{ij}\,\delta_{\alpha\beta}. \tag{8}$$
The scalar product of each eigenvector of A with the displacements $\Delta\mathbf{R}_i$ defines
a scalar collective coordinate $q_k$:
$$q_k \equiv \sum_{i=1}^{N}\mathbf{e}_k(i)\cdot\Delta\mathbf{R}_i. \tag{9}$$
Any displacement $\Delta\mathbf{R}_i$ (including the rigid translation and rotation of the
molecule as a whole) can be expanded in collective modes:
508 A. Nicolaï et al.
$$\Delta\mathbf{R}_i = \sum_{k=1}^{3N} q_k\,\mathbf{e}_k(i) \tag{10}$$
The relation given in Eq. 10 follows from Eqs. 8 and 9. Using Eq. 6 in Eqs. 1 and
10, we find
$$P(\{\Delta\mathbf{R}_i\}) = P(\{q_k\}) = P(0)\exp\left[-\frac{1}{2}\sum_{k=7}^{3N} a_k\,q_k^2\right]. \tag{11}$$
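Equations 8 to 10 can be illustrated numerically. The sketch below uses a toy system of N = 2 sites (3N = 6 components) and, in place of true eigenvectors of A, an arbitrary orthonormal basis built from a Householder reflection. This basis, the displacement values and all function names are assumptions made for the example. It computes the collective coordinates of Eq. 9 and rebuilds the displacement with Eq. 10.

```python
def householder_basis(v):
    """Rows of the orthogonal matrix I - 2 v v^T / (v.v).

    The rows are orthonormal, so they can stand in for the eigenvectors e_k.
    """
    norm2 = sum(x * x for x in v)
    n = len(v)
    return [[(1.0 if r == c else 0.0) - 2.0 * v[r] * v[c] / norm2
             for c in range(n)] for r in range(n)]


def collective_coordinates(modes, displacement):
    """Eq. 9: q_k = sum_i e_k(i) . dR_i, with all vectors flattened to 3N numbers."""
    return [sum(e * d for e, d in zip(mode, displacement)) for mode in modes]


def expand(modes, q):
    """Eq. 10: dR = sum_k q_k e_k."""
    n = len(modes[0])
    return [sum(q[k] * modes[k][c] for k in range(len(modes))) for c in range(n)]


# Two sites, hence six components; all numbers are arbitrary.
modes = householder_basis([1.0, -2.0, 0.5, 3.0, 1.5, -1.0])
displacement = [0.10, -0.05, 0.02, 0.00, 0.07, -0.03]

q = collective_coordinates(modes, displacement)
rebuilt = expand(modes, q)
print(max(abs(a - b) for a, b in zip(rebuilt, displacement)))  # rounding error only
```

Because the basis is complete (Eq. 8), the reconstruction is exact up to rounding, and the sum of the squared collective coordinates equals the squared norm of the displacement.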
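$$\sigma_{ij}^{\alpha\beta} \equiv \left\langle \Delta R_i^{\alpha}\,\Delta R_j^{\beta} \right\rangle = \sum_{k=7}^{3N}\frac{1}{a_k}\,e_k^{\alpha}(i)\,e_k^{\beta}(j) = \left(A^{-1}\right)_{ij}^{\alpha\beta}, \tag{13}$$
where $\langle\cdots\rangle$ means an average over all possible values of the collective coordinates.
The quantities $1/a_k$ and $\mathbf{e}_k$ represent the eigenvalues and the eigenvectors of the
covariance matrix of the displacements, respectively. From Eqs. 9 and 13, one finds
that $1/a_k$ ($k > 6$) is simply the average value of the square of the collective coordinate,
i.e.,
$$\left\langle q_k^2 \right\rangle = \frac{1}{a_k}. \tag{14}$$
$$B_i \equiv \frac{8\pi^2}{3}\left\langle |\Delta\mathbf{R}_i|^2 \right\rangle = \frac{8\pi^2}{3}\sum_{k=7}^{3N}\frac{|\mathbf{e}_k(i)|^2}{a_k}. \tag{15}$$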
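$$E - E(0) = \sum_{i=1}^{N}\left(\frac{\partial E}{\partial\mathbf{R}_i}\right)_0\cdot\Delta\mathbf{R}_i + \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\frac{\partial^2 E}{\partial\mathbf{R}_i\,\partial\mathbf{R}_j}\right)_0 : \Delta\mathbf{R}_i\,\Delta\mathbf{R}_j = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\boldsymbol{\Phi}_{ij} : \Delta\mathbf{R}_i\,\Delta\mathbf{R}_j, \tag{16}$$
$$\lambda_k = a_k\,k_B T, \tag{18}$$
with
$$A_{ij} = D_{ij}/k_B T \equiv \left[\Phi_{ij}/k_B T\right]/\sqrt{M_i M_j}. \tag{21}$$
$$\sum_{j=1}^{N} D_{ij}\,\hat{\mathbf{e}}_k(j) = \omega_k^2\,\hat{\mathbf{e}}_k(i), \tag{22}$$
and
$$a_k = \frac{\omega_k^2}{k_B T}, \tag{23}$$
one can reformulate the $B_i$ factors (Eq. 15) in terms of the normal modes:
$$B_i = \frac{8\pi^2 k_B T}{3 M_i}\sum_{k=7}^{3N}\frac{\left|\hat{\mathbf{e}}_k(i)\right|^2}{\omega_k^2} \tag{24}$$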
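As a minimal sketch of how Eq. 24 can be evaluated for one site, the function below sums the mode contributions in SI units. It assumes that ω_k is an angular frequency in rad/s obtained from a wavenumber through ω = 2πc ν̃, and that the eigenvector components are dimensionless and mass-weighted. The site mass and the eigenvector components are invented for illustration; only the three wavenumbers are the values quoted for conalbumin in Sect. 3 of this chapter.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant in J/K
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
AMU_KG = 1.66053906660e-27   # atomic mass unit in kg


def wavenumber_to_omega(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to an angular frequency in rad/s."""
    return 2.0 * math.pi * C_CM_PER_S * wavenumber_cm


def b_factor(mass_kg, modes, temperature_k=300.0):
    """Eq. 24 for one site i, returned in m^2.

    modes: iterable of (omega_k, (ex, ey, ez)) over the non-zero modes (k >= 7),
    where (ex, ey, ez) is the component of the eigenvector on site i.
    """
    total = 0.0
    for omega, (ex, ey, ez) in modes:
        total += (ex * ex + ey * ey + ez * ez) / omega ** 2
    return 8.0 * math.pi ** 2 * K_B * temperature_k / (3.0 * mass_kg) * total


# Hypothetical input: one carbon-like site and three low-frequency modes.
modes = [
    (wavenumber_to_omega(1.90), (0.010, 0.000, 0.000)),
    (wavenumber_to_omega(2.34), (0.000, 0.008, 0.000)),
    (wavenumber_to_omega(2.86), (0.000, 0.000, 0.006)),
]
b_m2 = b_factor(12.0 * AMU_KG, modes, temperature_k=300.0)
print(f"B_i = {b_m2 * 1e20:.3f} A^2")
```

The 1/ω² weighting shows why the lowest-frequency modes dominate the B factors: halving the frequency of a mode multiplies its contribution by four.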
The summation of the influence over all the atoms of a subdomain is a measure
of the contribution of this domain to the normal mode.
where h is the Planck constant, W is the energy absorbed by the molecule, $\omega_k$ and $\gamma_k$
are respectively the vibrational frequency and damping of the kth vibrational mode,
N is the total number of atoms and $\Delta\boldsymbol{\rho}_k$ is the variation of the molecular dipole
moment in the vibrational mode k, with
$$\Delta\boldsymbol{\rho}_k = \sum_{i=1}^{N}\frac{q_i\,\hat{\mathbf{e}}_k(i)}{\sqrt{m_i}}. \tag{27}$$
In Eq. 27, $q_i$ and $m_i$ are the charge and the mass of the atom i of the protein,
respectively. The vector $\hat{\mathbf{e}}_k(i)$ is the eigenvector component of the atom i of the kth
mode (Eq. 22). The damping factor $\gamma_k$ was arbitrarily taken to be identical
($\gamma_k = \gamma = 0.1$ cm−1) for all acoustical modes because their frequencies and the scale of
their motions are similar [39].
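Equation 27 is a charge-weighted sum over the mode displacements. The following minimal sketch evaluates it for a toy two-atom system; the charges, masses, mode vectors and the function name are hypothetical and serve only to show that an antiphase motion of opposite charges changes the dipole moment, whereas an in-phase motion does not.

```python
import math


def dipole_variation(charges, masses, mode):
    """Eq. 27: sum_i q_i e_k(i) / sqrt(m_i), returned as an (x, y, z) tuple.

    mode[i] is the 3-vector component of the eigenvector on atom i.
    """
    total = [0.0, 0.0, 0.0]
    for q, m, e in zip(charges, masses, mode):
        scale = q / math.sqrt(m)
        for axis in range(3):
            total[axis] += scale * e[axis]
    return tuple(total)


# Hypothetical two-atom system with opposite unit charges and equal masses.
charges = [+1.0, -1.0]
masses = [4.0, 4.0]
s = 1.0 / math.sqrt(2.0)                      # normalization of the toy modes
antiphase = [(s, 0.0, 0.0), (-s, 0.0, 0.0)]   # atoms move in opposite directions
in_phase = [(s, 0.0, 0.0), (s, 0.0, 0.0)]     # atoms move together

print(dipole_variation(charges, masses, antiphase))
print(dipole_variation(charges, masses, in_phase))
```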
The Raman activity of the vibrational modes of proteins is computed from normal mode
calculations as follows. In a Raman active mode, the elastic deformation of
the molecule induces a variation of the molecular electronic polarizability α [56]
and the Raman intensity is proportional to the square of the derivative of the molecular
polarizability relative to the collective normal coordinate q (Eq. 9). As shown
elsewhere [57], the electronic polarizability of an amino acid, computed ab initio, is
simply proportional to its number of electrons. Therefore, making the assumption of
an average electronic density for all amino acids, the polarizability of an amino acid
is simply proportional to its steric volume [39]. Using this property, the Raman activity
A of each mode k of frequency $\omega_k$ can be estimated by computing the following
quantity:
$$A(\omega_k) \equiv \left|\frac{\partial\alpha}{\partial q_k}\right|^2 = \left|\frac{\partial\alpha}{\partial V}\,\frac{\partial V}{\partial q_k}\right|^2 \sim C^2\left|\frac{\partial V}{\partial q_k}\right|^2, \tag{28}$$
where V is the steric volume of the protein and the constant C is 353.34 a.u./nm$^3$
(1 a.u. $= 1.649 \times 10^{-41}$ C$^2$ m$^2$ J$^{-1}$). The derivative in Eq. 28 is computed by
finite difference using $\Delta q_k = \pm 0.1$ and the steric volume V is computed using the
software GROMACS [58]. Finally, using the Raman activities (Eq. 28), we defined
a continuous Raman spectrum $P'(\omega)$ using a Lorentzian broadening:
$$P'(\omega) = \sum_{k}\frac{4\pi^2 A(\omega_k)}{\left(\omega_k^2 - \omega^2\right)^2 + (\gamma/2)^2}, \tag{29}$$
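Equations 28 and 29 can be sketched in a few lines. The code below reads the finite-difference step ±0.1 as a central difference, sums Eq. 29 over the modes, and reuses γ = 0.1 cm−1 for the Raman spectrum; these three choices are assumptions of the example. The volume function is a hypothetical stand-in for the steric volume that the chapter obtains from GROMACS, and the activities in the usage lines are invented.

```python
import math

C_POLARIZABILITY = 353.34  # a.u./nm^3, value of the constant C quoted in the text


def raman_activity(volume_of_q, dq=0.1):
    """Eq. 28 with dV/dq estimated by a central difference at q = +dq and q = -dq.

    volume_of_q: callable returning the steric volume (nm^3) of the structure
    displaced by q along one normal mode (hypothetical stand-in).
    """
    dv_dq = (volume_of_q(+dq) - volume_of_q(-dq)) / (2.0 * dq)
    return (C_POLARIZABILITY * dv_dq) ** 2


def raman_spectrum(omega, mode_frequencies, activities, gamma=0.1):
    """Eq. 29 as printed, summed over all modes (frequencies in cm^-1)."""
    return sum(
        4.0 * math.pi ** 2 * a / ((wk ** 2 - omega ** 2) ** 2 + (gamma / 2.0) ** 2)
        for wk, a in zip(mode_frequencies, activities)
    )


# Hypothetical linear volume response and invented activities.
activity = raman_activity(lambda q: 100.0 + 0.02 * q)
mode_frequencies = [1.90, 2.34, 2.86]   # cm^-1, computed values quoted in Sect. 3
activities = [1.0, 0.5, 0.8]
for omega in (1.5, 1.9, 2.1, 2.34):
    print(omega, raman_spectrum(omega, mode_frequencies, activities))
```

With a small γ the spectrum is sharply peaked at each ω_k, so the relative peak heights are controlled by the activities A(ω_k).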
where $\mathbf{R}_i^0$ is the equilibrium position of the Cα atom of residue i (Fig. 2d) and H is
the Heaviside function.
There are only two parameters in ANM: the force constant A and the cutoff radius
$R_c$. The model is strictly equivalent to the Born-von Karman model developed in the
early days of solid state physics to describe the phonons of crystals [59]. Indeed,
Eq. 30 is the simplest form which is invariant under global translation and rotation
of the molecule and obeys the relations given in Eqs. 4 and 5.
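The invariance properties stated above can be verified numerically. The sketch below builds the force-constant matrix in the form commonly used for ANM in the literature (a spring of uniform strength between every pair of nodes closer than the cutoff, projected on the inter-node direction); this form is assumed here and may differ in detail from Eq. 30. The six coordinates are hypothetical, the 11 Å cutoff is the value given in the caption of Fig. 2, and the function names are illustrative. The code then checks Eqs. 4 and 5 by applying the matrix to a rigid translation and to a rigid rotation.

```python
def anm_hessian(coords, cutoff=11.0, force_constant=1.0):
    """Dense 3N x 3N ANM force-constant matrix as nested lists."""
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][a] - coords[i][a] for a in range(3)]
            dist2 = sum(x * x for x in d)
            if dist2 >= cutoff ** 2:
                continue  # Heaviside factor: no spring beyond the cutoff radius
            for a in range(3):
                for b in range(3):
                    block = -force_constant * d[a] * d[b] / dist2
                    hess[3 * i + a][3 * j + b] += block   # off-diagonal block (i, j)
                    hess[3 * j + a][3 * i + b] += block   # off-diagonal block (j, i)
                    hess[3 * i + a][3 * i + b] -= block   # diagonal block of site i
                    hess[3 * j + a][3 * j + b] -= block   # diagonal block of site j
    return hess


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def rigid_translation(n_sites, t):
    """Same displacement t on every site."""
    return [t[a] for _ in range(n_sites) for a in range(3)]


def rigid_rotation(coords, omega):
    """Infinitesimal rotation: displacement R_j x Omega on site j."""
    return [c for r in coords for c in cross(r, omega)]


# Hypothetical node positions in angstrom (not a real protein).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.5, 3.4, 0.0),
          (3.0, 5.5, 2.5), (0.5, 4.0, 5.0), (6.0, 9.0, 8.0)]
hessian = anm_hessian(coords)

trans_residual = max(abs(x) for x in matvec(hessian, rigid_translation(len(coords), (0.3, -1.2, 0.7))))
rot_residual = max(abs(x) for x in matvec(hessian, rigid_rotation(coords, (0.2, 0.5, -0.4))))
print(trans_residual, rot_residual)  # both vanish up to rounding
```

Both residuals vanish because each spring only resists changes of the distance between its two nodes, which rigid-body motions leave unchanged to first order.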
The contribution of a given collective mode to the transition between two states of
a protein (for example, the open and closed states of Hsp70) can be defined by
an individual and a cumulative involvement coefficient adapted from Ref. [30] and
computed as follows. A “transition pathway” is determined by linearly interpolating
between two structural states of the protein (say A, an initial state, and B, a final
state) after optimal superposition of all the Cα atoms of these two structural states
(Fig. 3a). Only the positions of the Cα atoms are considered to describe the transition
pathway in ANM whereas in the all-atom calculation, the positions of all atoms are
considered. The positions of the ith atom in the structural states A and B are defined
by $\mathbf{R}_i^A$ and $\mathbf{R}_i^B$, respectively. The linear pathway followed by the ith atom is defined
by $\mathbf{R}_i^A - \mathbf{R}_i^B$ (Fig. 3b). The contribution of the ith atom in the mode k to the transition
between A and B is measured by the following projection
Fig. 3 Illustrations of the linear interpolated transition pathway between the Cα atoms (in black,
panel a) and the involvement coefficients (panel b) between the open (red cartoon) and closed (blue
cartoon) states of hHsp70. The superposition of the structures in panel a was done by minimizing
the RMSD of the Cα atoms of the full-length structure
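$$\tilde{I}_i^k \equiv \frac{\mathbf{R}_i^A - \mathbf{R}_i^B}{\sqrt{\sum_j \left|\mathbf{R}_j^A - \mathbf{R}_j^B\right|^2}}\cdot\mathbf{e}_k(i), \tag{31}$$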
where the sum is over all the N sites considered to represent the molecule, i.e. all
atoms of the protein in aa-NMA and only the Cα atoms in ANM.
Thus, the value of the involvement coefficients $I_k$ indicates in a semi-quantitative
way the contribution of each collective motion to a given conformational change.
The maximum value of $I_k$ is 1 and corresponds to a situation in which a single
mode contributes to the conformational change between the states A and B. In this
case, the eigenvector components are exactly in the direction of the linear interpolated
pathway between the structures A and B. A complementary quantity is the cumulative
involvement coefficient $CI_K$, which is computed as:
$$CI_K = \sum_{k=1}^{K} I_k^2, \tag{33}$$
which measures the contribution of the K first lowest-frequency modes to the
conformational change.
The cumulative coefficient is normalized:
$$\sum_{k=1}^{3N} I_k^2 = 1. \tag{34}$$
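The involvement coefficients can be computed with a few lines of code. In the sketch below, the per-site projection follows Eq. 31, the mode coefficient is taken as the absolute value of the sum of the per-site projections (an assumed form for the relation linking Eq. 31 to $I_k$), and the cumulative coefficient follows Eq. 33. The two structures are assumed to be already superposed; the coordinates, the trivial orthonormal basis used as "modes" and the function names are hypothetical.

```python
import math


def involvement_coefficients(coords_a, coords_b, modes):
    """Return I_k for each mode; modes[k][i] is the 3-vector e_k(i)."""
    diff = [[a[x] - b[x] for x in range(3)] for a, b in zip(coords_a, coords_b)]
    norm = math.sqrt(sum(c * c for d in diff for c in d))
    coeffs = []
    for mode in modes:
        # Eq. 31: normalized displacement of site i projected on e_k(i)
        per_site = [sum(d[x] * e[x] for x in range(3)) / norm
                    for d, e in zip(diff, mode)]
        coeffs.append(abs(sum(per_site)))  # assumed definition of I_k
    return coeffs


def cumulative_involvement(coeffs, n_modes):
    """Eq. 33: sum of I_k^2 over the first n_modes modes."""
    return sum(c * c for c in coeffs[:n_modes])


# Hypothetical superposed structures A and B with two sites each.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coords_b = [(0.2, -0.1, 0.0), (1.0, 0.3, 0.4)]

# Trivial complete orthonormal basis: each "mode" moves one coordinate of one site.
modes = [[tuple(1.0 if (site, axis) == (s, x) else 0.0 for x in range(3))
          for s in range(2)]
         for site in range(2) for axis in range(3)]

coeffs = involvement_coefficients(coords_a, coords_b, modes)
print(coeffs)
print(cumulative_involvement(coeffs, len(modes)))  # normalization of Eq. 34
```

With a complete orthonormal basis the cumulative coefficient over all modes equals 1, and a single mode exactly parallel to the A to B displacement has an involvement coefficient of 1.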
The starting point of the harmonic approximation is the representation of the protein
by a single structure corresponding to the structure at the minimum potential energy.
In practice, it is the structure found by minimizing the structure measured by XRD
using a model of the potential energy surface. However, in solution, a protein occurs
in many conformational substates [60]. The free-energy landscape of a protein is
best regarded as a multi-dimensional surface with multiple local minima separated
by barriers. The static structure used in normal mode calculations corresponds to only
one of these minima. The conformational substates of the multi-dimensional
free-energy landscape of a protein can be projected along the amino-acid sequence [61],
showing which parts of the backbone and side chains occur in multiple substates. At
the level of one residue or bond, the protein motion within local minima corresponds
to an anomalous diffusion [62] which can be related to NMR data [63].
Because the multiple substates are separated by activation barriers, two types of
collective atomic motions are possible in the native state: either intra-minima motion
or jumps between the minima [64]. Because jumps between the minima of the
free-energy landscape are transient events (the probability is minimal at the activation
barrier), a protein spends most of its time oscillating on a multi-dimensional parabolic
free-energy surface [65]. One expects therefore that most of the collective modes
of a protein are actually harmonic in the native state. Principal component analysis
(PCA) of the protein structural fluctuations computed in molecular dynamics (MD)
indeed leads to that conclusion [43, 66]. A small fraction (12–20%) of the lowest
frequency modes (<80 cm−1) are anharmonic at room temperature according to MD
simulations [43, 66]. Strictly speaking, the (harmonic) collective modes between 20
and 80 cm−1 are only well defined in crystals and crystal powders at low hydration in
the harmonic approximation (and are actually measured in these conditions). However,
the directions of these modes in the harmonic approximation are, to a
first approximation, well correlated with the actual conformational changes of proteins
in solution.
The frequencies of the confined acoustical phonons of proteins are much smaller
than the vibrational frequencies of the chemical groups of the organic molecules and
are more difficult to measure [67]. In proteins, the lowest vibrational frequencies
of the chemical groups correspond to the librations of the methyl groups of the
side chains of the amino acids, which form a large band of modes at about 8 THz
(240 cm−1) [5, 68, 69]. Direct experimental observation of low-frequency modes in
proteins (<200 cm−1) is hampered by several factors: proximity of the frequencies
to the elastic peak, anharmonicity which leads to asymmetric broadening, damping
of the modes by the hydration layer or by the solvent, the large density of modes and
the absence of symmetry. To the best of our knowledge, the lowest frequency of a
normal mode measured in proteins is about 0.3 cm−1 (10 GHz) and corresponds to
the frequency of a longitudinal acoustical phonon in collagen [5, 70].
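Frequencies are quoted in this chapter both in GHz or THz and in wavenumbers (cm−1). The two are related by ν̃ = ν/c with c expressed in cm/s; the short sketch below, with illustrative function names, converts between them.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def ghz_to_wavenumber(freq_ghz):
    """Convert a frequency in GHz to a wavenumber in cm^-1."""
    return freq_ghz * 1.0e9 / C_CM_PER_S


def wavenumber_to_ghz(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a frequency in GHz."""
    return wavenumber_cm * C_CM_PER_S / 1.0e9


for freq_ghz in (10.0, 100.0, 600.0, 1200.0):
    print(f"{freq_ghz:7.1f} GHz = {ghz_to_wavenumber(freq_ghz):7.3f} cm^-1")
```

A convenient rule of thumb that follows from this relation is that 1 cm−1 corresponds to about 30 GHz.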
The most important source of experimental data about the vibrational spectra of
proteins arises from Inelastic Incoherent Neutron Scattering (IINS) experiments [68].
In IINS, neutrons of thermal energy, i.e., with a typical incoming energy of the order
of 100 meV and a wavelength of 0.1 nm, are diffracted by a protein crystal while
exciting (energy loss) or de-exciting (energy gain) a delocalized vibrational mode of
the crystal (a phonon). Because of the laws of energy and momentum conservation,
the neutrons lose or gain an energy quantum ħω and exchange a momentum ħQ,
where Q is the wave-vector of the phonon excited or de-excited with a wavelength
λ = 2π/Q and an energy ħω. The scattering intensity measured in the detector is
proportional to the so-called incoherent dynamic structure factor Sinc (Q, ω) (where
ħω and ħQ are the energy transfer and momentum transfer of scattered neutrons).
The function Sinc (Q, ω) is the space-time Fourier transform of the self-correlation
function Gs (r, t) which describes the correlation of the position of an atom at time 0
with its position at time t. Therefore, Sinc (Q, ω) reflects the single particle dynamical
spectra. The number of vibrational modes (phonons) per unit frequency, named the
vibrational density of states (VDOS), can be approximately extracted from Sinc (Q,
ω) at low temperature for small wave-vectors (Q < 10 nm−1) [69, 71]. The VDOS
extracted from the IINS function reflects the dynamics of the hydrogen atoms of the
protein. Scattering of neutrons is from the nuclei and all vibrational modes can be
excited/de-excited.
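The energy and the wavelength quoted above for the incoming neutrons are linked by the de Broglie relation λ = h/√(2 m_n E). A minimal sketch of this relation, using CODATA values of the constants and an illustrative function name:

```python
import math

H_PLANCK = 6.62607015e-34        # Planck constant in J s
M_NEUTRON = 1.67492749804e-27    # neutron mass in kg
EV_IN_J = 1.602176634e-19        # one electronvolt in joule


def neutron_wavelength_nm(energy_mev):
    """de Broglie wavelength (nm) of a neutron with kinetic energy in meV."""
    energy_j = energy_mev * 1.0e-3 * EV_IN_J
    return H_PLANCK / math.sqrt(2.0 * M_NEUTRON * energy_j) * 1.0e9


for energy_mev in (25.0, 100.0):
    print(f"E = {energy_mev:6.1f} meV -> lambda = {neutron_wavelength_nm(energy_mev):.3f} nm")
```

For 100 meV the wavelength is close to 0.09 nm, consistent with the order of magnitude given in the text; it is comparable to interatomic distances, which is what makes thermal neutrons sensitive to atomic motions.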
IINS measurements of collagen [69, 72], lysozyme [73–76] and myoglobin [71,
75] crystals at low temperatures revealed one or several peaks between about 600 GHz
(20 cm−1 ) and 1.2 THz (40 cm−1 ) in Sinc (Q, ω). It is worth noting that the positions
of the maxima in the VDOS of proteins do not correspond to the positions of the
peaks in Sinc (Q, ω). In collagen, the IINS VDOS contains only two features at
low frequencies: a broad band with a maximum around 100 cm−1 and a narrower
distribution of modes with a maximum around 250 cm−1 [69]. The low-frequency
VDOS of hydrated [71] and dry [77] myoglobin at 100 K extracted from Sinc (Q,
ω) resembles that of collagen, with a broad band of modes and a maximum
around 80–100 cm−1. For proteins, the modes around 20 cm−1 contribute the most to
the dynamic structure factor [69, 71]. Indeed, because of the Bose-Einstein statistics,
the population of these vibrational levels is large.
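The effect of the Bose-Einstein statistics can be made concrete with the mean occupation number n = 1/(exp(hcν̃/k_BT) − 1) of a harmonic mode of wavenumber ν̃. The sketch below evaluates it at 100 K, the temperature of the myoglobin measurements mentioned above; the choice of wavenumbers and the function name are illustrative.

```python
import math

HC_OVER_KB_CM_K = 1.438776877  # second radiation constant hc/k_B in cm K


def bose_einstein_occupation(wavenumber_cm, temperature_k):
    """Mean number of quanta in a harmonic mode at thermal equilibrium."""
    x = HC_OVER_KB_CM_K * wavenumber_cm / temperature_k
    return 1.0 / math.expm1(x)   # expm1(x) = exp(x) - 1, accurate for small x


for wavenumber in (20.0, 100.0, 250.0):
    n = bose_einstein_occupation(wavenumber, 100.0)
    print(f"{wavenumber:6.1f} cm^-1 at 100 K: n = {n:.3f}")
```

At 100 K a mode at 20 cm−1 holds about three quanta on average, roughly ten times more than a mode at 100 cm−1, which illustrates why the lowest-frequency modes weigh so heavily in the measured signal.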
Low-frequency modes of proteins were also studied by Raman spectroscopy
[78–84]. Raman scattering is an inelastic light scattering process in which incident
photons with an energy typically of hν ∼ 1 eV (visible light) excite (energy loss) or
de-excite (energy gain) vibrational modes of matter. In addition to scattered light
at the same frequency as the incident light, the energy loss and gain of the photons
appear as scattered light at smaller frequency (named Stokes lines in the
spectra) and larger frequency (named anti-Stokes lines in the spectra), respectively. In
Raman spectroscopy, inelastic scattering occurs through the electronic density and only
vibrational modes which modify the electronic polarizability of the molecules are probed. Light
scattering probes the vibrational modes at long wavelength and does not require a
protein crystal: it can be applied in solution. Because of the lack of symmetry in
proteins, the usual Raman selection rules are broken and most of the normal modes of
the macromolecule should contribute to the Raman intensity. Therefore, the Raman
intensity of the (slow) modes of proteins should be closely related to their weight in
the VDOS [85].
The low-frequency modes of lysozyme were extensively studied by Raman scat-
tering [79, 81–84]. A peak at 29 cm−1 was observed in the Raman spectra of powders
and crystals of α-chymotrypsin in the native state [78]. In the denatured state, this
peak disappeared and was replaced by a broad band between 20 and 150 cm−1 [78]. A
peak in the Raman spectra at frequencies below 30 cm−1 was observed in the native
state of several other proteins (powders and crystals) [80]: bovine serum albumin
(BSA) (14 cm−1), thyroglobulin (17 cm−1), pepsin and concanavalin A (20 cm−1),
insulin and ovalbumin (22 cm−1), lysozyme [79] and β-lactoglobulin (25 cm−1). For
lysozyme, the peak at 25 cm−1 was observed in protein crystals but not in solu-
tion [79]. The frequency of this peak in the Raman spectra varied with the level of
hydration: from 17 cm−1 for wet lysozyme powders to 27 cm−1 for dried lysozyme
powders [81].
The spectroscopic feature around 20–30 cm−1 observed both in Sinc (Q, ω) by
IINS and in the Raman spectra of proteins is often referred to as the “boson peak”
in the literature, by analogy with the boson peak observed in disordered (glass)
materials, see for example Ref. [86]. The interpretation of this peak in glasses [87]
and biopolymers [88, 89] is still debated. In proteins, the boson peak appears at low
temperature (180 K) in protein crystals and persists at high temperature (300 K)
only in dry protein powders. In hydrated powders at high temperature (above the
so-called dynamical transition [90] at about 200 K) or in solution, the hydration water and
side chains of the amino acids diffuse and their contributions to Sinc (Q, ω) overlap the
frequency range of the boson peak. MD simulations of protein powders in realistic
environments compare very well to the IINS data [89]. From these MD simulations,
the boson peak in proteins is believed to arise from (transverse) motions of both the
backbone and of the nonpolar buried side chains and polar hydrated side chains of
the amino acids [89]. Besides the 20–30 cm−1 peak, the Raman spectra of lysozyme
crystals show peaks at 75, 115 and 160 cm−1 [77]. The Raman spectra of dry and
wet lysozyme powders were fitted by using a Brownian oscillator model, revealing
four contributions above 30 cm−1: at 42, 83, 114 and 162 cm−1 in dry lysozyme,
which are shifted to 45, 85, 112 and 183 cm−1 in wet lysozyme [79]. Because of the
solvent, one expects the lowest frequency protein modes to be overdamped in general
[91]. In lysozyme, no spectral feature was observed below 75 cm−1 in solution [79].
In solution, the motions of the atoms of the protein are stochastic due to water-
protein collisions. However, in the harmonic approximation of the potential energy
of the protein, the random motions of the atoms are oscillations along the directions
identical to those of the undamped vibrational modes of the molecule.
Infrared spectroscopy, scattering of infrared electromagnetic radiation, provides
information only on normal modes which modify the dipole moment of a molecule.
Infrared, Raman and IINS are hence complementary techniques. Far-infrared absorption
using synchrotron radiation detected absorption at 19 cm−1 in low hydration
lysozyme [92], which could be an undamped vibrational protein mode. At high hydration,
the same technique only showed infrared absorption around 26 and 38 cm−1,
which resembles that of pure water [92]. Applications of new spectroscopic techniques
to protein samples, such as Surface Enhanced Raman Spectroscopy [93, 94],
UV resonance Raman [95] and new circular polarization Raman spectroscopies [96],
should provide more accurate vibrational spectra of proteins in the near future.
So far, the spectroscopic technique which provides the most detailed information
about low-frequency excitations of proteins (<100 GHz) is the very recent single-
molecule spectroscopy named Extraordinary Acoustic Raman (EAR) spectroscopy
[97]. In this technique, a single protein molecule is trapped in a nanohole and then
excited by two optical lasers of slightly different wavelengths which produce a beat
signal at low-frequency (<100 GHz). The beat signal corresponds to an electro-
magnetic field which can interact with the protein acoustical modes. Vibrational
resonances are then detected by measuring the increase of the molecule fluctuations
when the frequency of the beat field matches the frequency of an acoustical mode.
The mechanism of excitation of the acoustical (Raman) active modes of proteins in
EAR spectroscopy is not fully explained from experiments but is believed to be due
to the modulation of the electrostriction force at the trapping site of the molecule.
Electrostriction is a nonlinear phenomenon in which the strain induced by an elec-
trostatic electric field applied to a dielectric body is proportional to the square of the
applied electric field [98]. At the microscopic level, it is related to the anharmonicity
of the interaction potential between the atoms of a molecule and to the nonlinearity of
its electronic polarisability [99]. Electrostriction is a general nonlinear phenomenon
occurring for all dielectrics to which an electric field is applied.
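The order of magnitude of the beat frequency produced by two lasers of slightly different wavelengths follows from Δν = c |1/λ1 − 1/λ2|. The sketch below evaluates it for a pair of near-infrared wavelengths that are hypothetical and are not taken from Ref. [97]; the function name is illustrative.

```python
C_M_PER_S = 299_792_458.0  # speed of light in m/s


def beat_frequency_ghz(wavelength1_nm, wavelength2_nm):
    """Beat frequency (GHz) of two optical fields with the given vacuum wavelengths."""
    f1 = C_M_PER_S / (wavelength1_nm * 1.0e-9)
    f2 = C_M_PER_S / (wavelength2_nm * 1.0e-9)
    return abs(f1 - f2) / 1.0e9


# Hypothetical wavelengths differing by 0.2 nm.
print(f"{beat_frequency_ghz(853.0, 853.2):.1f} GHz")
```

A wavelength difference of a fraction of a nanometre is thus sufficient to place the beat signal in the sub-100 GHz range discussed in the text.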
Three different proteins with different sizes and shapes were tested using EAR
[97], in particular conalbumin. In the frequency range between 0 and 2.7 cm−1,
a spectral feature made of three peaks around 0.9, 1.5 and 2.5 cm−1 was observed
for this protein. To the best of our knowledge, these are the lowest-frequency acoustical
modes of proteins ever detected experimentally. In addition, different fingerprints
were measured for aprotinin, carbonic anhydrase and conalbumin, showing that the
low-frequency spectra depend on the protein size and shape [97].
To illustrate the main vibrational features of a protein in its native state, we performed
all-atom NMA of the model protein conalbumin. Conalbumin or Ovotransferrin was
identified in 1944 [100] and is well known as an iron-transport protein, which can also
bind other metal ions, including toxic ones, and is considered to play an important role
in the transportation of such metal ions. Conalbumin is a ~80 kDa single-chain protein
which is folded into two homologous lobes (N- and C-lobes) with two domains, the
two metal-binding sites being located within the inter-domain clefts of each lobe.
The structure of hydrated conalbumin used to compute its vibrational modes
was taken from the protein data bank (PDB ID: 2D3I) [101]. The normal modes
calculations were performed with the GROMACS package [58, 102] using the TIP3P
water model and the AMBER99sb-ILDN force field [103]. Only the first hydration
water layer of conalbumin was kept, corresponding to 1612 water molecules, all
within 3 Å of the protein atoms. More details about the simulations can be found
in Ref. [39]. The hydrated structure is shown in Fig. 4 (panel a).
After optimization of the hydrated structure, the normal modes were calculated
and the computed spectra of infrared active modes P(ω) (Eq. 26) and Raman active
modes P′(ω) (Eq. 29) were compared below 10 cm−1 in Fig. 4b.
First, an interesting property is that the lowest frequency acoustical modes of
conalbumin are separated by a gap from the rest of the normal modes. The first three
frequencies computed using all-atom NMA are ω7 = 1.9 cm−1, ω8 = 2.34 cm−1 and
ω9 = 2.86 cm−1 (named modes 1, 2 and 3 in the present work), whereas the fourth one
is characterized by a frequency ω10 = 4.74 cm−1 (gap of around 2.0 cm−1).
Moreover, as shown in Fig. 4b, signatures of the low-frequency modes occur both in the
infrared and Raman spectra of active modes. In Fig. 4c, we compare the spectra of the
lowest-frequency modes (below 3 cm−1 ∼ 100 GHz) with EAR experimental data. In
the absorption spectra P(ω), we clearly distinguish three low-frequency acoustical
modes for conalbumin. Similar signatures are found in the spectra of the Raman
active modes, P′(ω). There is a striking similarity between the computed spectra of
the acoustical modes of conalbumin and those measured by EAR, except for a
frequency shift of the whole computed vibrational spectrum to higher frequencies
compared to the experimental one. More precisely, the frequency shift Δω between
the computed spectra and the experimental one is around 1.0 cm−1. This frequency
shift was shown to be dependent on the size of the protein structure [39]. The limits
of the harmonic approximation, as detailed in Sect. 2.7 of the present chapter, might
be the key to understanding the frequency shift between theory and experiment. Indeed,
the anharmonicity is not included in the present calculations. Another hypothesis
is the possible softening of the acoustical modes of proteins by the bulk solvent,
which is ignored in the present simulations. For a hydrated biomolecule, there is a
strong coupling between the biopolymer and water, which decreases with the particle
size [39]. Indeed, we observe that the hydration water contributes 40% of
the atomic displacements in the computed acoustical vibrational modes of conalbumin
(Rg = 3.0 nm) whereas the hydration water contributes 65% of the atomic
displacements in the computed acoustical vibrational modes of aprotinin (Rg = 1.1
nm).
The fact that the acoustical modes of conalbumin are infrared and Raman active
modes, as shown by the theoretical spectra in Fig. 4, means that the beat signal used
experimentally to excite the single molecule trapped in the nanohole induces two
things: first, a variation of the molecular dipole moment of the biomolecule due to
the excitation of the infrared modes which is an absorption, and second, a variation
of the real part of the molecular electronic polarizability due to the excitation of the
Raman active modes. Therefore, NMA calculations here are helpful to get insights
into experimental mechanisms of excitation in the EAR spectroscopic technique.
Finally, from the theoretical point of view, the corresponding motions of these
active acoustical modes can be depicted in order to understand the mechanisms of
excitation at the atomic level. As shown in Fig. 5a, the global motions described by
the lowest-frequency modes of conalbumin correspond to torsional motions of the N-
and C-lobes along different axes. In order to decipher the origin of the dipole moment
variations due to electric field excitation, we also computed the distribution of the
variation of the molecular dipole moment $\Delta\boldsymbol{\rho}_k$ (Eq. 27) along the protein sequence.
As shown in Fig. 5b, the largest contribution to $\Delta\boldsymbol{\rho}_k$ is due almost exclusively to the
positively charged residues, Arginine and Lysine, which are characterized by longer
side-chains than negatively charged residues. A further analysis of the role of these
modes in the biological function of conalbumin will be presented by the authors
elsewhere.
Fig. 5 a Cartoon representation of the acoustic collective modes extracted from classical spectra
shown in Fig. 4 for modes 1, 2 and 3 of conalbumin. Black arrows represent the direction and the
strength of the atomic displacement vector in the corresponding mode for the Cα atoms. Colored
arrows represent the global motion of each protein in the corresponding mode. Spheres represent the
position of the Cα and the color code corresponds to the strength of the displacement per residue. The
figure was prepared with PyMOL (http://www.pymol.org). b Norm of the dipole moment variation
$\Delta\boldsymbol{\rho}_k$ along the amino acid sequence of conalbumin for each normal mode k shown in panel (a).
Positively and negatively charged residues are shown with blue and red dots along the sequence,
respectively. Other residues are shown by black dots
the hydrophobic pocket within SBD-β, we name this state the “open” conformation
of the chaperone (as shown in Fig. 6 conformation a), or the lid is closed and the
peptide is trapped in the pocket, we refer to this as the “closed” conformation (as
shown in Fig. 6 conformation b) [40, 41]. In ATP-bound Hsp70 (open structural
state), the SBD is opened with fast binding and release of the protein substrate,
and the SBD and NBD are docked, as shown by low-resolution Small-Angle X-ray
Scattering (SAXS) data [115, 116] and suggested by the XRD structure of an ATP-
Hsp110 homologue [112–114]. In the nucleotide-free Hsp70 and in ADP-bound
Hsp70 (closed structural state), the SBD is assumed to be closed with low binding
and release rate of the protein substrate. The SBD and the NBD are undocked and
Fig. 6 Hsp70 chaperone cycle. The color code is the same as in Fig. 1. The main states of Hsp70
are named [A] for the open state, [B] for the closed state and [B*] for the intermediate state after
release of ADP. The directions of motion in the lowest frequency acoustic mode found in all-atom
calculations of the normal modes of hHsp70 in the states [A] and [B] are schematically represented
the inter-domain linker is exposed to solvent, as shown by SAXS data [115, 116]
and by the two-domain Hsp70 NMR-derived structure [117].
Although the main steps of the chaperoning cycle of Hsp70s (Fig. 6) are clearly
identified as described above, the details of the conformational changes and the mech-
anism of communication between the NBD and the SBD remain unclear. Numerical
simulations of the Hsp70 cycle could help to understand the mechanism of commu-
nication between the different subdomains of this rather large (10 nm in the closed
state) molecule. On the one hand, only coarse-grained models using realistic anharmonic
potentials are able to reach the time-scale of the conformational changes (opening of
the SBD and docking of the NBD onto the SBD) [118]. However, such a simulation
method misses the detailed interactions between the nucleotides and the NBD pocket
as well as the possible role of water. On the other hand, all-atom simulations easily
include these effects, but at the expense of long computational times; because of
that, they are still limited to the microsecond time-scale [40, 41], which is far from
the actual time-scale of the conformational changes (milliseconds to seconds).
Another approach consists in not simulating explicitly the transition between the
open and closed states of the Hsp70 chaperone but only its structural fluctuations in
the vicinity of the two main local minima of its free-energy landscape: the nucleotide-
free or ADP-bound Hsp70 (closed state) and the ATP-bound Hsp70 (open state). There
Raman and Infrared Spectra of Acoustical, Functional Modes … 523
4.2.1 Methods
As described in detail elsewhere [40], the initial models of human Hsp70 (hHsp70)
in an open state and in a closed state were built by homology modeling based on the
templates Hsp110 (PDB ID: 3C7N chain A) and DnaK (PDB ID: 2KHO), respec-
tively. The models were relaxed by using all-atom MD simulations in explicit water
with the GROMACS software package [58, 102] using the Simple Point Charge
(SPC) water model and the GROMOS96 ffG43a1 force field [119, 120]. The two
hydrated structures of hHsp70 used to compute the normal modes were the repre-
sentative structures extracted from the MD run APO1 of the open model and from
the MD run APO1 of the closed model [40]. Only the first hydration water layer of
hHsp70 was kept corresponding to 939 (open) and 915 (closed) water molecules, all
within 3 Å from the protein atoms, as shown in Fig. 7a.
In the all-atom normal mode analysis, the structure of the protein (including the
first hydration layer) is described by the set of point masses $M_1, M_2, \ldots, M_N$
(denoted $\{M_i\}$) located at $R_1, R_2, \ldots, R_N$ (denoted $\{R_i\}$), respectively.
Each point mass $M_i$ represents an atomic mass and all degrees of freedom are taken
into account explicitly. In this case, E in Eq. 16 is simply the all-atom potential
energy of the hydrated protein.
The harmonic vibrational modes of the open and closed structures of hHsp70 were
determined using the GROMACS [58, 102] software package and the GROMOS96
ffG43a1 force field [119, 120]. The resulting number of modes is 3N, where N, the
number of atoms, is 6265 for the protein plus 2745 (915 water molecules) or 2817
(939 water molecules) atoms for the solvent, which gives 27,030 modes for the
structure with 915 water molecules and 27,246 modes for the structure with 939
water molecules. The first six modes, corresponding to global translation and rotation
of the system, are not considered here, so the index of the modes starts from 7.
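The mode counts quoted above follow from simple bookkeeping, which the short sketch below reproduces. The atom and water numbers are those given in the text; the function name and the assumption of three atoms per SPC water molecule are ours and serve only as an illustration.

```python
# Bookkeeping sketch for the number of normal modes (hypothetical helper).
PROTEIN_ATOMS = 6265      # number of protein atoms quoted in the text
ATOMS_PER_WATER = 3       # assumption: three-site SPC water (one O, two H)

def count_modes(n_waters):
    """Return atom count, total modes (3N) and modes left after removing
    the six global translations and rotations."""
    n_atoms = PROTEIN_ATOMS + ATOMS_PER_WATER * n_waters
    n_modes = 3 * n_atoms
    return {"atoms": n_atoms, "modes": n_modes, "internal": n_modes - 6}

for n_waters in (915, 939):
    print(n_waters, count_modes(n_waters))
```

Since the six rigid-body modes are discarded, the first internal mode carries the index 7, as stated above.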
Fig. 7 a Cartoon representation of the atomic structures of hydrated hHsp70 protein in the open (left
panel) and closed (right panel) conformations. Water molecules are shown as transparent spheres.
The color code is the following: subdomain IA, blue; IB, marine; IIA, lightblue; IIB, cyan; linker,
magenta; SBD-β, green; SBD-α, red and C-term, gray. These figures were prepared with PyMOL
(https://www.pymol.org). b Density of states D S (ω) of low-frequency vibrations of hHsp70 in the
open (green) and closed (blue) conformations. c Classical absorption spectra P(ω) of hHsp70 for
the open (green) and closed (blue) conformations computed from NMA using the GROMOS96
ffG43a1 force field. Different values of the damping constant γ are shown: 0.1 cm−1 (top panel),
1.0 cm−1 (middle panel) and 10.0 cm−1 (bottom panel)
We also applied ANM to hHsp70 by using Eq. 34 with Rc = 11 Å [40] and with A
fitted to the Bi factors (Eq. 24) computed from the all-atom normal mode calculations
for the same systems (Fig. 8). The best values of the force constant A reproducing
the all-atom Bi factors were 4.8 and 4.2 kcal/mol/Å2 for hHsp70 in the open and closed
states, respectively. With these force constants, ANM closely reproduces the
structural fluctuations along the sequence of hHsp70, with a correlation coefficient
ρ = 0.86 (open) and 0.89 (closed) between the Bi factors computed in ANM and those
computed with an all-atom force field, as shown in Fig. 8.
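The ANM procedure described above can be sketched in a few lines. The code below is a minimal illustration of the standard anisotropic network model, not the authors' implementation: springs of constant A connect all Cα pairs closer than the cutoff Rc, B factors are obtained from the non-zero modes, and A is fitted by least squares using the fact that the ANM B factors scale as 1/A. The function names, the value of kT (0.593 kcal/mol, roughly room temperature) and the least-squares fitting criterion are assumptions made for this example.

```python
import numpy as np

def anm_hessian(coords, cutoff=11.0, spring=1.0):
    """Build the 3N x 3N ANM Hessian for N C-alpha positions (N x 3 array, in Å)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue                      # no spring beyond the cutoff Rc
            block = -spring * np.outer(d, d) / r2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian

def anm_bfactors(hessian, kT=0.593):
    """B_i = (8 pi^2 / 3) <|dR_i|^2>, summed over the non-zero modes."""
    evals, evecs = np.linalg.eigh(hessian)
    evals, evecs = evals[6:], evecs[:, 6:]    # drop the six rigid-body modes
    msf_xyz = kT * (evecs ** 2 / evals).sum(axis=1)
    return (8.0 * np.pi ** 2 / 3.0) * msf_xyz.reshape(-1, 3).sum(axis=1)

def fit_spring(b_reference, b_unit_spring):
    """Least-squares spring constant, using B_ANM(A) = B_ANM(A=1) / A."""
    scale = (b_reference @ b_unit_spring) / (b_unit_spring @ b_unit_spring)
    return 1.0 / scale
```

Because the ANM B factors depend on A only through a global factor, the Pearson correlation with the reference B factors (for instance via `np.corrcoef`) does not depend on A; the fit only sets the absolute scale of the fluctuations.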
Fig. 8 B-factors computed in all-atom normal mode calculations (black line) and in the anisotropic
network model (ANM) (red line) for the open state (panel a) and for the closed state (panel b) of
hHsp70 using Eqs. 24 and 15 in the text, respectively. The values of the constants A (Eq. 33) given
in the figure are the best values giving the highest correlation ρ between the two sets of computed
B factors
From aa-NMA calculations, there are 24 and 26 non-zero collective modes below
10 cm−1 for the open and closed conformations, respectively (Fig. 7b). The lowest
non-zero frequency mode of hHsp70, i.e. ω7, occurs at 2.77 cm−1 for the open
conformational state and at 1.22 cm−1 for the closed conformational state. This difference
of 1.5 cm−1 can be explained by the fact that the closed state has a more elongated
structure than the open state, for which the two domains are docked. Consequently,
the closed state of hHsp70 may support modes of longer wavelength, and thus of
lower frequency, than the open state of hHsp70. However, these differences between
the two main conformational states of hHsp70 are not visible in their density of states
D S (ω), as shown in panel b of Fig. 7.
Therefore, at first glance, it does not seem possible to identify the two main
conformational states of hHsp70 based solely on a measurement of D S (ω) using, for example,
inelastic neutron scattering [68]. However, based on the experimental results of EAR
[97], one may expect the acoustical modes to interact with an electric field. Each
acoustical mode should have a different signature in the EAR or in the far-infrared
spectra of hHsp70 depending on its dipole moment. A variation of the molecu-
lar dipole moment at acoustical frequency is expected as shown for conalbumin in
Sect. 3.2 because hHsp70 has a strong dipolar character, as 81 residues are positively
charged and 92 are negatively charged [121].
We computed the classical infrared (absorption) spectra of hHsp70 from aa-NMA
as done for conalbumin using Eq. 26. In contrast to conalbumin, no experimental EAR
data are available for hHsp70, so we cannot estimate the damping effects. Therefore, we decided
to study the infrared spectra of hHsp70 using different orders of magnitude for the
damping factor γ (Eq. 26), from weakly damped modes (as observed in EAR for conalbumin
[97], i.e. 0.1 cm−1) to overdamped modes (as observed in dielectric spectroscopy for
lysozyme [67], i.e. 10 cm−1). As explained above for conalbumin, the damping has
a strong impact on both the intensities and the positions of the peaks in the IR spectra.
Figure 7c shows the spectra P(ω) of hHsp70 for the open and closed conformational
states as a function of the value of the damping constant γ. First of all, it is clear
from Fig. 7c that the open and closed conformations show different spectra P(ω),
independently of the value of the damping constant γ and also independently of the
force field used for the calculations, as detailed elsewhere [121]. In fact, the closed
conformation shows an intense peak at a frequency ω of 1.2 cm−1, whereas the same peak is
shifted to 3.5 cm−1 for the open conformation. As expected, an increase of the
damping constant γ from 0.1 to 1.0 cm−1 goes together with an increase of the width
of the peaks and with a decrease of the spectral resolution, but does not change the
position of the peaks because all acoustical modes have frequencies larger than γ
(regime of damped modes). When the damping constant is increased from 1.0 to 10 cm−1,
another phenomenon is observed in Fig. 7c: the most intense peak
of the closed conformation shifts to a lower frequency, namely 0.3 cm−1, because the most
intense and lowest-frequency modes have frequencies ω7, ω8 and ω9 that are smaller
than γ (regime of overdamped modes). Finally, as shown in Fig. 7c, even for γ as large
as 10.0 cm−1, the two conformational states of the protein can be distinguished.
Note that exactly the same conclusions were drawn from the NMA and IR spectra
calculations using the AMBER99sb-ILDN and CHARMM27 force fields [121].
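The effect of the damping constant on the peak position can be illustrated with a single damped harmonic oscillator. The sketch below does not reproduce Eq. 26 of this chapter; as an assumption, it uses the imaginary part of the textbook damped-oscillator response, γω / [(ω0² − ω²)² + γ²ω²], in arbitrary units, and locates its maximum numerically. The function names and the frequency grid are illustrative choices.

```python
import numpy as np

def lineshape(omega, omega0, gamma):
    """Imaginary part of a damped-oscillator response (arbitrary units).
    Assumed model form; frequencies and damping in cm^-1."""
    return gamma * omega / ((omega0 ** 2 - omega ** 2) ** 2 + (gamma * omega) ** 2)

def peak_position(omega0, gamma, grid=None):
    """Frequency at which the assumed lineshape is maximal, found on a grid."""
    if grid is None:
        grid = np.linspace(0.001, 5.0, 50000)
    return grid[np.argmax(lineshape(grid, omega0, gamma))]

# Mode at 1.22 cm^-1 (the lowest mode of the closed state quoted in the text)
for gamma in (0.1, 1.0, 10.0):
    print(gamma, peak_position(1.22, gamma))
```

In this simple model the maximum stays close to ω0 as long as γ is smaller than ω0 and moves toward approximately ω0²/γ once γ is much larger than ω0, which is the qualitative behaviour discussed above for the overdamped regime. The peak positions of the full spectra result from several overlapping modes and are therefore not expected to match this single-mode estimate quantitatively.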
Among the lowest-frequency acoustical modes, only some are useful to identify the
functionally important motions for the transition between the open and closed states of
the chaperone cycle (Fig. 6). As shown for a few proteins, the collective modes of the
structures in the initial and final states of a conformational change contain information
about the dynamics of the transition [24]. The relevance of the low-frequency modes
for the conformational transition of hHsp70 between its open and closed states can be
quantified by their involvement coefficients (see Sect. 2.6). In brief, a linear pathway
interpolating between the two conformations (open and closed) was built. For each
collective mode k, the projection of the atomic displacements within the mode k
on the interpolated pathway defines the involvement coefficient Ik of the mode (the
maximum value is 1, corresponding to a perfect match between the displacements
of the atoms within the mode and the interpolating pathway, see Eq. 32). The sum of
the squared involvement coefficients of the modes up to an index K is the cumulative
involvement coefficient CIK (see Eq. 33). The coefficients Ik and the cumulative
coefficients CIK for the transition from the open to the closed state (Fig. 9a, b) and
vice versa (Fig. 9c, d) were computed in the coarse-grained approach for the first 100
modes and in the all-atom approach up to 25 cm−1 (corresponding to 250 modes, i.e.
less than 1% of the total number of degrees of freedom).
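The definitions summarized above can be written compactly. The following sketch assumes that the two structures have already been superposed and flattened into vectors of length 3N, and that the modes are stored as orthonormal rows; any mass weighting used in the all-atom calculation is ignored here. The function and variable names are hypothetical.

```python
import numpy as np

def involvement(modes, x_start, x_end):
    """Involvement coefficients I_k and cumulative coefficients CI_K.

    modes   : (n_modes, 3N) array with orthonormal rows
    x_start : flattened coordinates of the initial state (length 3N)
    x_end   : flattened coordinates of the final state (length 3N)
    """
    delta = np.asarray(x_end, dtype=float) - np.asarray(x_start, dtype=float)
    delta = delta / np.linalg.norm(delta)     # unit vector of the linear pathway
    coeffs = np.abs(modes @ delta)            # I_k, between 0 and 1
    cumulative = np.cumsum(coeffs ** 2)       # CI_K, sum of I_k^2 up to K
    return coeffs, cumulative
```

With a complete orthonormal set of modes the cumulative coefficient reaches 1, so a value such as 0.45 for the first 10 modes means that these modes alone account for 45% of the squared displacement along the linear pathway.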
In ANM, the cumulative involvement coefficient indicates that the first 10 and the
first 100 slow modes, out of a total of 1917 modes of nonzero frequency of hHsp70,
account for 45% (CI10 = 0.45) and 69% (CI100 = 0.69) of the displacement from the
open to the closed state, respectively (Fig. 9a). A similar result is observed for the
reverse transition, from the closed state to the open state (CI10 = 0.60 and CI100 =
0.73, Fig. 9c). This emphasizes the high contribution of the slowest modes to the
transition. In addition, the mode contributing the most to the open → closed transition
is the mode having the lowest nonzero “frequency” λ7, which has an involvement
coefficient I7 = 0.62, whereas, for the closed → open transition, the mode λ12 (I12 =
0.57) is the mode which contributes the most to the transition (Fig. 9c).
Fig. 9 The individual (boxes) and cumulative (full line) involvement coefficients of the modes
computed in the coarse-grained (red) and all-atom (black) normal mode calculations for the transition
open → closed (panels a and b) and closed → open (panels c and d) of hHsp70
In the all-atom NMA calculation, the cumulative coefficient CIK for the
open → closed transition (Fig. 9b) and for the closed → open transition (Fig. 9d)
reaches about 0.35 at 5 cm−1 and increases linearly at higher frequency to reach
about 0.50 at 25 cm−1. For the transition open → closed, the lowest nonzero
frequency mode ω7 = 2.78 cm−1 (83.4 GHz) has a large involvement coefficient I7 =
0.49 (Fig. 9b), whereas, for the transition closed → open, the mode ω11 = 3.22 cm−1
(96.6 GHz) has the largest contribution (I11 = 0.39, Fig. 9d). In addition, the modes
ω7 = 1.22 cm−1 (36.6 GHz) and ω13 = 4.19 cm−1 (125.7 GHz) have involvement
coefficients significantly larger than those of the other modes (Fig. 9d), with I7 =
0.26 and I13 = 0.31, respectively.
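The frequencies above are quoted both in wavenumbers and in GHz. The conversion is ν = c·ν̃, sketched below with the exact speed of light; the function name is hypothetical. The values in the text correspond to the rounded factor of about 30 GHz per cm−1, so small differences in the last digit are expected.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s (exact by definition)

def wavenumber_to_ghz(nu_tilde_cm):
    """Convert a wavenumber in cm^-1 to a frequency in GHz."""
    return nu_tilde_cm * C_CM_PER_S / 1.0e9

for nu_tilde in (1.22, 2.78, 3.22, 4.19):
    print(nu_tilde, round(wavenumber_to_ghz(nu_tilde), 1))
```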
First, we compare the collective mode of the open conformation of hHsp70 having the
largest contribution in the coarse-grained and in the all-atom NMA for the transition
from the open to the closed state, i.e. the lowest nonzero frequency modes λ7 and
ω7 (Fig. 9a, b). The global motion described by the lowest-frequency mode is the
same in ANM and in the all-atom calculation, i.e. it corresponds to the closure of the
SBD (Fig. 10a, b). Indeed, in the mode λ7 computed in ANM, the SBD is the most
mobile part, whereas the NBD moves as a rigid unit (Fig. 10a). In the global motion
described by the modes λ7 and ω7 , the helix A of the SBD-α serves as a hinge region
around which the SBD-β and the rest of the SBD-α (helices B + C + D) move toward each other
(Fig. 10a, b). The SBD-β and SBD-α move in opposite directions from each other in
both all-atom and ANM calculations (Fig. 10a, b) [40].
Because the superposition of the modes within the frequency range 0–25 cm−1
(1% of the total number of modes of hHsp70) captures about 50% of the structural
change (Fig. 9b), we decided to build the ICw mode from the all-atom normal modes
with ω < 25 cm−1. In all-atom NMA, the mode ICw, which best describes the transition
from the open to the closed state in the harmonic approximation, corresponds to a
global motion of the two parts of the SBD moving in opposite directions and
simulating the closure of the SBD (Fig. 11a). In addition, deformations of the subdomains
IB, IIA and IIB of the NBD were observed: they modify the structure of the NBD.
Indeed, the subdomain IB tends to follow the motion of the SBD-β. Considering the
fact that the SBD-β is very close to the subdomain IB of the NBD and the fact that
the SBD-α is bound to the lobe I of the NBD in the open conformation of hHsp70, it
seems logical that a rearrangement of the lobe I of the NBD must be coupled to the
undocking of the SBD from the NBD. In the lobe II of the NBD, the motion in the
subdomain IIA tends to modify the surface binding cleft between the subdomains IA
and IIA. It has been demonstrated that this cleft IA/IIA is crucial for the conformational
dynamics of Hsp70 [123].
Fig. 10 a Graphical representation of the collective mode λ7 of the open state of hHsp70 computed
from ANM. b Graphical representation of the collective mode ω7 of the open state of hHsp70
computed with the all-atom method. c Graphical representation of the collective mode λ12 of the
closed state of hHsp70 computed from ANM. d Graphical representation of the collective mode
ω11 of the closed state of hHsp70 computed with the all-atom method. Eigenvectors are represented
by gray arrows and black arrows represent the sum of the eigenvectors of the residues belonging
to the same subdomain, i.e. NBD-IA, IB, IIA, IIB, linker, SBD-β, SBD-α and C-terminal. Black
spheres represent the center of mass of each subdomain. The color code is the same as in Fig. 1.
The panels a and b (c and d) correspond to the same view. The figure was prepared with PyMOL
(http://www.pymol.org)
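A combined mode such as ICw, built from the low-frequency modes weighted by their involvement in the transition, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the modes are stored as orthonormal rows, the structures are superposed and flattened, and the signed projection of each mode on the pathway is used as its weight so that the arbitrary sign of an eigenvector does not matter. The function name is hypothetical.

```python
import numpy as np

def weighted_mode(modes, x_start, x_end):
    """Involvement-weighted combination of modes (sketch of an ICw-like mode).

    Returns the normalized combined mode and its overlap with the
    normalized displacement between the two states.
    """
    delta = np.asarray(x_end, dtype=float) - np.asarray(x_start, dtype=float)
    delta = delta / np.linalg.norm(delta)
    weights = modes @ delta          # signed projections; |weights| are the I_k
    combined = weights @ modes       # sum over k of weight_k * mode_k
    overlap = np.linalg.norm(combined)
    return combined / overlap, overlap
```

For orthonormal modes the returned overlap equals the square root of the cumulative involvement coefficient of the selected modes, so a cumulative coefficient of about 0.50 corresponds to an overlap of about 0.71 between the combined mode and the linear pathway.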
Fig. 11 a Representation of the collective mode ICw of hHsp70 for the transition open → closed
computed from all-atom NMA. b Representation of the collective mode ICw of hHsp70 for the
transition closed → open computed from all-atom NMA. The representation properties are the
same as in Fig. 10. Panels a and b were prepared with PyMOL (http://www.pymol.org)
In ANM, as shown in Fig. 10c, the mode contributing the most to the transition
closed → open is the mode λ12 whereas in the all-atom calculation, the mode con-
tributing the most to the transition closed → open is the mode ω11 (Fig. 9d). The global
motion described by the mode λ12 corresponds to a compression/elongation which restricts the
mobility of the linker (Fig. 10c). This mode does not correspond to a direct opening
of the lid although there are important fluctuations in the SBD. The motion in the
SBD corresponds to a sliding motion of the lid coupled to a sideward motion of
the C-terminal part (Fig. 10c), the SBD-β tending to perform an upward motion. In
the NBD, the rotation of the subdomains IIA and IIB is observed. In the all-atom
calculation, the global motion described by the mode ω11 corresponds to a compres-
sion/elongation of the structure, as observed in ANM. In addition, the same motion
is observed within the SBD, i.e. a sliding motion of the lid coupled with a side-
ward motion of the C-terminal part with the SBD-β performing an upward motion
(Fig. 10d). In the NBD, we observed a rigid rotation of the complete lobe II of the
NBD. In the all-atom NMA, the fluctuations are less distributed in the whole structure
compared with ANM (Fig. 10c). The all-atom calculations confirm the dynamical
coupling between the lobe II of the NBD and the SBD-α observed in ANM [40].
This provides a natural mechanism of communication between the C-terminal part
of the protein and its N-terminal part which are about 10 nm apart in the closed state.
In all-atom normal mode analysis, the mode ICw which best describes the transition
from the closed to the open state corresponds to a coordinated motion of the
NBD and of the SBD (Fig. 11b). The global motion observed in the ICw mode for
the transition closed → open is due to a large displacement of the SBD-α (Fig. 11b),
which tends to open the SBD (the SBD-β and the SBD-α move in different directions,
Fig. 11b) and which is coupled to large motions of the lobe II and of the
subdomain IB of the NBD (Fig. 11b). The deformation of the NBD in the ICw mode
corresponds to a rotation of the lobe II of the NBD. The subdomain IA is rather
immobile and the subdomain IB tends to follow the motion of the subdomain IIA
(Fig. 11b). The deformation of the SBD in the ICw mode corresponds to an upward
motion of the SBD-β as well as a sideward motion of the lid (Fig. 11b).
As observed for the transition open → closed, the motions within the NBD are
coupled to the motions of the SBD, establishing a communication channel between
the NBD and the SBD. It is very interesting to observe that the rotation of the
subdomain IIB is coupled to the motion of the SBD as observed in the ANM calcu-
lation [40]. In addition, rotation of the subdomain IIB was shown by NMR to be an
important characteristic of the structural changes induced by a nucleotide-exchange
co-chaperone and by the replacement of ADP by ATP in the NBD structures [113,
124, 125].
Involvement coefficients and the ICw modes make it possible to define a few collective
coordinates describing the structural changes between two states of hHsp70. For
this purpose, the weight attributed to each low-frequency mode corresponds to its
involvement coefficient, in order to construct the best collective mode interpolating
linearly between the geometries of the two states of the molecule [24]. This standard
approach does not consider the statistics, i.e. the probability of finding the protein in
a given mode at a given temperature, which we explore here. At equilibrium, in the
harmonic approximation, the number of modes excited at a given frequency ω is
given by the Bose-Einstein statistics $n_{BE}(\omega)$:
$$n_{BE}(\omega) = \frac{1}{\exp\left(\frac{\hbar\omega}{kT}\right) - 1}, \qquad (35)$$
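As a numerical illustration of the Bose-Einstein occupation in Eq. 35, the sketch below evaluates it for a mode given by its wavenumber in cm−1, using ħω = hcν̃ and the second radiation constant hc/k. The function name and the choice of 300 K are assumptions made for this example.

```python
import math

HC_OVER_K = 1.438776877   # second radiation constant hc/k in cm*K

def bose_einstein(nu_tilde_cm, temperature_k):
    """Mean occupation of a harmonic mode of wavenumber nu_tilde (cm^-1) at T (K)."""
    x = HC_OVER_K * nu_tilde_cm / temperature_k   # hbar*omega / kT
    return 1.0 / math.expm1(x)

for nu_tilde in (1.22, 2.78, 25.0):
    print(nu_tilde, bose_einstein(nu_tilde, 300.0))
```

At room temperature kT corresponds to roughly 200 cm−1, so all the acoustical modes discussed here lie in the classical limit where the occupation is approximately kT/ħω and the lowest-frequency modes carry the largest thermal weight.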
5 Concluding Remarks
The low-frequency modes of proteins have been studied for about four decades.
Experimentally, low-frequency vibrational modes of proteins were measured by
neutron scattering, Raman spectroscopy and far-infrared spectroscopy. The first
well-resolved acoustical modes of several proteins at frequencies as low as 0–3.3 cm−1
(0–100 GHz) were detected very recently by using a nanobiosensing technique:
Extraordinary Acoustic Raman (EAR) spectroscopy. In EAR, a single molecule is
trapped and excited by a low-frequency electric field. From all-atom normal mode
calculations applied to conalbumin, we demonstrated that the detected modes are both
IR and Raman active and we identified the type of motions and the origin of the
mechanisms: they are torsional large-scale vibrational modes producing significant
local variation of the molecular dipole moment due to the motions of the charged
residues, i.e. Arginine and Lysine residues, with the longest side chains.
Fig. 12 Graphical representation of the functional mode BEw of hHsp70 in the open (panel a) and
closed (panel b) states. The sum of the eigenvectors per subdomain of the BEw and ICw modes are
represented by black and gray arrows, respectively. Overlap of the eigenvectors of the BEw and
ICw modes computed for each subdomain for the open (panel c, mode BEw vs. mode ICw) and for
the closed (panel d, mode BEw vs. mode ICw) states of hHsp70. The color code is the same as in
Fig. 1. Panels a and b were prepared with PyMOL (http://www.pymol.org)
In human Hsp70, the modes at a frequency below 30 cm−1 contribute the most to
the transition between its two structural states. In fact, only a few modes are enough to
describe the motions of the protein which are the most collinear to a simplified inter-
polated pathway between its two structural states. These findings are in agreement
References
1. Benedek, G., Ellis, J., Reichmuth, A., Ruggerone, P., Schief, H., Toennies, J.P.: Organ-pipe
modes of sodium epitaxial multilayers on Cu(001) observed by inelastic helium-atom scat-
tering. Phys. Rev. Lett. 69, 2951–2954 (1992)
2. Senet, P., Lambin, P., Lucas, A.A.: Standing-wave optical phonons confined in ultrathin over-
layers of ionic materials. Phys. Rev. Lett. 74, 570–573 (1995)
3. de Gennes, P.G., Papoular, M.: Polarisation, matière et rayonnement. In: Volume in honor of
Alfred Kastler, Presse Univ Fr, Paris (1969)
4. Gō, N.: Shape of the conformational energy surface near the global minimum and low-
frequency vibrations in the native conformation of globular proteins. Biopolymers 17,
1373–1379 (1977)
5. Petitcolas, W.L., Dowley, M.W.: Acoustical phonon spectra of biological polymers. Nature
212, 400–401 (1966)
6. Keskin, O., Jernigan, R.L., Bahar, I.: Proteins with similar architecture exhibit similar large-
scale dynamic behavior. Biophys. J. 78, 2093–2106 (2000)
7. Atilgan, A.R., Durell, S.R., Jernigan, R.L., Demirel, M.C., Keskin, O., Bahar, I.: Anisotropy
of fluctuation dynamics of proteins with an elastic network. Biophys. J. 80, 505–515 (2001)
8. Lamb, H.: On the vibration of an elastic sphere. Proc. London Math. Soc. 13, 189–212 (1881)
9. Koizumi, H., Tachibana, M., Kojima, K.: Elastic constants in tetragonal hen egg-white
lysozyme crystals containing large amount of water. Phys. Rev. E 79, 061917 (2009)
10. Bellissent-Funel, M.-C., Teixeira, J., Chen, S.H., Dorner, B., Middendorf, H.D., Crespi, H.L.:
Low-frequency collective mode in dry and hydrated proteins. Biophys. J. 56, 713–716 (1989)
11. Edwards, C., Palmer, S.B., Emsley, P., Helliwell, J.R., Glover, I.D., Harris, G.W., Moss, D.S.:
Thermal motion in protein crystals estimated using laser-generated ultrasound and Young’s
modulus measurements. Acta Cryst. A 46, 315–320 (1990)
12. Tachibana, M., Kojima, K., Ikuyama, R., Kobayashi, Y., Ataka, M.: Sound velocity and
dynamic elastic constants of lysozyme single crystals. Chem. Phys. Lett. 332, 259–264 (2000)
13. McCammon, J.A., Gelin, B.R., Karplus, M.: The hinge-bending mode in lysozyme. Nature
262, 325–326 (1976)
14. Gō, N., Noguti, T., Nishikawa, T.: Dynamics of a small globular protein in terms of low-
frequency vibrational modes. Proc. Natl. Acad. Sci. U S A 80, 3696–3700 (1983)
15. Brooks, B., Karplus, M.: Harmonic dynamics of proteins: normal mode and fluctuations in
bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. U S A 80, 6571–6575 (1983)
16. Levitt, M., Sander, C., Stern, P.S.: Protein normal mode dynamics: trypsin inhibitor, crambin,
ribonuclease and lysozyme. J. Mol. Biol. 181, 423–447 (1985)
17. Brooks, B., Karplus, M.: Normal modes for specific motions of macromolecules: application
to hinge-bending mode of lysozyme. Proc. Natl. Acad. Sci. U S A 82, 4995–4999 (1985)
18. Dykeman, E.C., Sankey, O.F.: Normal mode analysis and applications in biological physics.
J. Phys.: Condens. Matter 22, 423202 (2010)
19. Hayward, S., Berendsen, H.J.C.: Systematic analysis of domain motions in proteins from
conformational change: New results on citrate synthase and T4 lysozyme. Proteins 30, 144–154
(1998)
20. Gerstein, M., Lesk, A.M., Chothia, C.: Structural mechanisms for domain movements in
proteins. Biochemistry 33, 6739–6748 (1994)
21. Gerstein, M., Krebs, W.A.: A database of macromolecular motions. Nucleic Acids Res. 26,
4280–4290 (1998)
22. Gō, M., Gō, N.: Fluctuations of alpha-helix. Biopolymers 15, 1119–1127 (1976)
23. Gō, N., Scheraga, H.A.: Analysis of the contribution of internal vibrations to the statistical
weights of equilibrium conformations of macromolecules. J. Chem. Phys. 51, 4751–4767
(1969)
24. Cui, Q., Li, G., Ma, J., Karplus, M.: A normal mode analysis of structural plasticity in the
biomolecular motor F1-ATPase. J. Mol. Biol. 340, 345–372 (2004)
25. Gaillard, T., Dejaegere, A., Stote, R.H.: Dynamics of beta3 integrin I-like and hybrid domains:
insight from simulations on the mechanism of transition between open and closed forms.
Proteins 76, 977–994 (2009)
26. McCammon, J.A.: Protein dynamics. Rep. Prog. Phys. 47, 1–46 (1984)
27. Bennett, W.S., Huber, R.: Structural and functional aspects of domain motions in proteins.
CRCCR Rev. Bioch. Mol. 15, 291–384 (1984)
28. Karplus, M., Petsko, G.A.: Molecular dynamics simulation in biology. Nature 347, 631–639
(1990)
29. Berendsen, H.J.C., Hayward, S.: Collective protein dynamics in relation to function. Curr.
Opin. Struct. Biol. 10, 165–169 (2000)
30. Tama, F., Sanejouand, Y.H.: Conformational change of proteins arising from normal mode
calculations. Protein Eng. 14, 1–6 (2001)
31. Rod, T.H., Radkiewicz, J.L., Brooks, C.L.: Correlated motion and effect of distal mutations
in dihydrofolate reductase. Proc. Natl. Acad. Sci. U S A 100, 6980–6985 (2003)
32. Tobi, D., Bahar, I.: Structural changes involved in protein binding correlate with intrinsic
motions of proteins in the unbound state. Proc. Natl. Acad. Sci. U S A 102, 18908–18913
(2005)
33. Dobbins, S.E., Lesk, V.I., Sternberg, M.J.E.: Insights into protein flexibility: the relationship
between normal modes and conformational change upon protein-protein docking. Proc. Natl.
Acad. Sci. U S A 105, 10390–10395 (2008)
34. Bakan, A., Bahar, I.: The intrinsic dynamics of enzymes plays a dominant role in determining
the structural changes induced upon inhibitor binding. Proc. Natl. Acad. Sci. U S A 106,
14349–14354 (2009)
35. Benkovic, S.J., Hammes-Schiffer, S.: Enzyme motions inside and out. Science 312, 208–209
(2006)
36. Nashine, V.C., Hammes-Schiffer, S., Benkovic, S.J.: Coupled motions in enzyme catalysis.
Curr. Opin. Chem. Biol. 14, 644–651 (2010)
37. Henzler-Wildman, K., Kern, D.: Dynamic personalities of proteins. Nature 450, 964–971
(2007)
38. Zwier, M.C., Chong, L.T.: Reaching biological timescales with all-atom molecular dynamics
simulations. Curr. Opin. Pharm. 10, 745–752 (2010)
39. Nicolaï, A., Delarue, P., Senet, P.: Theoretical insights into sub-terahertz acoustic vibrations
of proteins measured in single molecule experiments. J. Phys. Chem. Lett. 24(7), 5128–5136
(2016)
40. Nicolaï, A., Senet, P., Delarue, P., Ripoll, D.R.: Human inducible Hsp70: structures, dynamics,
and interdomain communication from all-atom molecular dynamics simulations. J. Chem.
Theory Comput. 6, 2501–2519 (2010)
41. Nicolaï, A., Senet, P., Delarue, P.: Conformational dynamics of full-length inducible human
Hsp70 derived from microsecond molecular dynamics simulations in explicit solvent. J.
Biomol. Struct. Dyn. (2012) (in press)
42. Noguti, T., Gō, N.: Structural basis of hierarchical multiple substrates of a protein. IV: rear-
rangements in atom packing and local determinations. Proteins 5, 125–131 (1989)
43. Hayward, S., Kitao, A., Gō, N.: Harmonicity and anharmonicity in protein dynamics: a normal
mode analysis and principal component analysis. Proteins 23, 177–186 (1995)
44. Ma, J., Karplus, M.: Ligand-induced conformational changes in ras p21, a normal mode and
energy minimization analysis. J. Mol. Biol. 274, 114–131 (1997)
45. Ma, J., Karplus, M.: The allosteric mechanism of the chaperone GroEL: a dynamic analysis.
Proc. Natl. Acad. Sci. U S A 95, 8502–8507 (1998)
46. Gaillard, T., Martin, E., San Sebastian, E., Cossio, F.P., Lopez, X., Dejaegere, A., Stote, R.H.:
Comparative normal mode analysis of LFA-1 integrin I-domains. J. Mol. Biol. 374, 231–249
(2007)
47. Houdusse, A., Karplus, M., Cecchini, M.: Allosteric communication in myosin V: from small
conformational changes to large directed movements. PLoS Comput. Biol. 4(8), e1000129
(2008)
48. Durand, P., Trinquier, G., Sanejouand, Y.: New approach for determining low-frequency
normal modes in macromolecules. Biopolymers 34, 759–771 (1994)
49. Tama, F., Gadea, F.X., Marques, O., Sanejouand, Y.H.: Building-block approach for deter-
mining low-frequency normal modes of macromolecules. Proteins 41, 1–7 (2000)
50. Tirion, M.M.: Low-amplitude elastic motions in proteins from a single-parameter atomic
analysis. Phys. Rev. Lett. 77, 1905–1908 (1996)
51. Hinsen, K.: Analysis of domain motions by approximate normal mode calculations. Proteins
33, 417–429 (1998)
52. Bahar, I., Atilgan, A.R., Erman, B.: Direct evaluation of thermal fluctuations in proteins using
a single-parameter harmonic potential. Fold Des. 2, 173–181 (1997)
53. Navizet, I., Lavery, R., Jernigan, R.L.: Myosin flexibility: structural domains and collective
vibrations. Proteins 54, 384–393 (2004)
54. Bahar, I., Rader, A.J.: Coarse-grained normal mode analysis in structural biology. Curr. Opin.
Struct. Biol. 15, 586–592 (2005)
55. Yang, L., Song, G., Jernigan, R.L.: How well we can understand large-scale protein motions
using normal modes of elastic network model. Biophys. J. 83, 1620–1630 (2007)
56. Ferraro, J.R.: Introductory Raman Spectroscopy, 2nd edn. Academic Press, Boston, Amster-
dam (2002)
57. Krishtal, A., Senet, P., Van Alsenoy, C.: Local softness, softness dipole, and polarizabilities
of functional groups: application to the side chains of the 20 amino acids. J. Chem. Phys. 131,
044312 (2009)
58. Kutzner, C., Van der Spoel, D., Lindahl, E., Hess, B.: GROMACS 4: algorithms for highly
efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 4,
435–447 (2008)
59. Born, M., Huang, K.: Dynamical theory of crystal lattice. In: Texts in the Physical Sciences.
Oxford Classic (1998)
60. Frauenfelder, H.F., Parak, F., Young, R.D.: Conformational substates in proteins. Ann. Rev.
Biophys. Chem. 17, 451–479 (1988)
61. Senet, P., Maisuradze, G.G., Foulie, C., Delarue, P., Scheraga, H.A.: How main-chain of
proteins explore the free-energy landscape in native states. Proc. Natl. Acad. Sci. U S A 105,
19708–19713 (2008)
62. Cote, Y., Senet, P., Delarue, P., Maisuradze, G.G., Scheraga, H.A.: Anomalous diffusion and
dynamical correlation between the side chains and the main chain of proteins in their native
states. Proc. Natl. Acad. Sci. U S A 109, 10346–10351 (2012)
63. Cote, Y., Senet, P., Delarue, P., Maisuradze, G.G., Scheraga, H.A.: Nonexponential decay of
internal rotation correlation functions of native proteins and self-similar structural fluctuations.
Proc. Natl. Acad. Sci. U S A 107, 19844–19849 (2010)
64. Kitao, A., Hayward, S., Gō, N.: Energy-landscape of a native protein: jumping-among-minima
model. Proteins 33, 496–517 (1998)
65. Wales, D.: Energy Landscapes. Cambridge University Press, Cambridge (2003)
66. Kitao, A., Go, N.: Investigating protein dynamics in collective coordinate space. Curr. Opin.
Struct. Biol. 9, 164–169 (1999)
67. Vinh, N.Q., Allen, S.J., Plaxco, K.W.: Dielectric spectroscopy of proteins as a quantitative
experimental test of computational models of their low-frequency harmonic motions. J. Am.
Chem. Soc. 133, 8942–8947 (2011)
68. Middendorf, H.D.: Biophysical applications of quasi-elastic and inelastic neutron scattering.
Ann. Rev. Biophys. Bioeng. 13, 425–451 (1984)
69. Middendorf, H.D., Hayward, R.L., Parker, S.F., Bradshaw, J., Miller, A.: Vibrational neutron
spectroscopy of collagen and model polypeptides. Biophys. J. 69, 660–673 (1995)
70. Harney, T., James, D., Miller, A., White, J.W.: Phonons and the elastic moduli of collagen
and muscle. Nature 267, 285–287 (1977)
71. Cusack, S., Doster, W.: Temperature dependence of the low-frequency dynamics of myoglobin.
Measurement of the vibrational frequency distribution by inelastic neutron scattering. Bio-
phys. J. 58, 243–251 (1990)
72. Berney, C.V., Renugopalakrishnan, V., Bhatnagar, R.S.: Collagen. An inelastic neutron-
scattering study of low-frequency vibrational modes. Biophys. J. 52, 343–345 (1987)
73. Bartunik, H.D.: Intramolecular low-frequency vibrations in lysozyme by neutron time-of-
flight spectroscopy. Biopolymers 21, 43–50 (1982)
74. Middendorf, H.D.: Neutron studies of the dynamics of globular proteins. Phys. B 182, 415–420
(1992)
75. Diehl, M., Doster, W., Petry, W., Schober, H.: Water-coupled low-frequency modes of myo-
globin and lysozyme observed by inelastic neutron scattering. Biophys. J. 73, 2726–2732
(1997)
76. Lushnikov, S.G., Svaindze, A.V., Sashin, I.L.: Vibrational density of states of hen egg white
lysozyme. JETP Lett. 82, 31–35 (2005)
77. Paciaroni, A., Orecchini, A., Haertlein, M., Moulin, M., Conti Nibali, V., De Francesco, A.,
Petrillo, C., Sacchetti, F.: Vibrational collective dynamics of dry proteins in the terahertz
region. J. Phys. Chem. B 116, 3861–3865 (2012)
78. Brown, K.G., Erfurth, S.C., Small, E.W., Petitcolas, W.L.: Conformationally dependent low-
frequency motions of proteins by laser Raman spectroscopy. Proc. Natl. Acad. Sci. U S A 69,
1467–1469 (1972)
79. Genzel, L., Keilmann, F., Martin, T.P., Winterling, G., Yacoby, Y., Fröhlich, H., Makinen,
M.W.: Low-frequency Raman spectra of lysozyme. Biopolymers 15, 219–225 (1976)
80. Painter, P.C., Mosher, L.E., Rhoads, C.: Low-frequency modes in Raman spectra of proteins.
Biopolymers 21, 1469–1472 (1982)
81. Urabe, H., Sugawara, Y., Ataka, M., Rupprecht, A.: Low-frequency Raman spectra of
lysozyme crystals and oriented DNA films: dynamics of crystal water. Biophys. J. 74,
1533–1540 (1998)
82. Hédoux, A., Ionov, R., Willart, J.F., Lerbret, A., Affouard, F., Guinet, Y., Descamps, M.,
Prévost, D., Paccou, L., Danéde, F.: Evidence of a two-stage thermal denaturation process in
lysozyme: a Raman scattering and differential scanning calorimetry investigation. J. Chem.
Phys. 124, 014703 (2006)
538 A. Nicolaï et al.
83. Crupi, C., D’Angelo, G., Wanderlingh, U., Vasi, C.: Raman spectroscopic and low-temperature
calorimetric investigation of the low-energy vibrational dynamics of hen egg-lysozyme. Phi-
los. Mag. 91, 1956–1965 (2011)
84. Sassi, P., Perticaroli, S., Comez, L., Lupi, L., Paolantoni, M., Fioretto, D., Morresi, A.:
Reversible and irreversible denaturation processes in globular proteins: from collective to
molecular spectroscopic analysis. J. Raman Spectrosc. 43, 273–279 (2012)
85. Shuker, R., Gamon, R.W.: Raman-scattering selection rule breaking and the density of states
in amorphous materials. Phys. Rev. Lett. 25, 222–225 (1970)
86. Zorn, R.: The boson peak demystified? Physics 4, 44 (2011)
87. Chumakov, A.I., Monaco, G., Crichton, W.A., Bosak, A., Rüffer, R., Meyer, A., Kargl, F.,
Comez, L., Fioretto, D., Giefers, H., Roitsch, S., Wortmann, G., Manghnani, M.H., Hushur,
A., Williams, Q., Balogh, J., Parliński, K., Jochym, P., Piekarz, P.: Equivalence of the boson
peak in glasses to the transverse acoustic van Hove singularity in crystals. Phys. Rev. Lett.
106, 225501 (2011)
88. Leyser, H., Doster, W., Diehl, M.: Far-infrared emission by boson peak vibrations in a globular
protein. Phys. Rev. Lett. 82, 2987–2989 (1999)
89. Tarek, M., Tobias, D.J.: Effects of solvent packing on side chain and backbone contributions
to the protein boson peak. J. Chem. Phys. 115, 1607–1612 (2001)
90. Doster, W., Cusack, S., Petry, W.: Dynamical transition of myoglobin revealed by inelastic
neutron scattering. Nature 337, 754–756 (1989)
91. McCammon, J.A., Karplus, M., Gelin, B.R.: Dynamics of folded proteins. Nature 267,
585–590 (1977)
92. Moeller, K.D., Williams, G.P., Steinhauser, S., Hirschmugl, C., Smith, J.C.: Hydration-
dependent far-infrared absorption in lysozyme detected using synchrotron radiation. Biophys.
J. 61, 276–280 (1992)
93. Das, G.: Principal component analysis based methodology to distinguish protein SERS spec-
tra. J. Mol. Struct. 993, 500–505 (2011)
94. De Angelis, F., Gentile, F., Mecarini, F., Das, G., Moretti, M., Candeloro, P., Coluccio, M.L.,
Cojoc, G., Accardo, A., Liberale, C., Zaccaria, R.P., Perozziello, G., Tirinato, L., Toma, A.,
Cuda, G., Cingolani, R., Di Fabrizio, E.: Breaking the diffusion limit with super-hydrophobic
delivery of molecules to plasmonic nanofocusing SERS structures. Nat. Photonics 5, 682
(2012)
95. Oladepo, S.A., Xiong, K., Hong, Z.M., Asher, S.A., Handen, J., Lednev, I.K.: UV resonance
Raman investigations of peptide and protein structure dynamics. Chem. Rev. 112, 2604–2628
(2012)
96. Li, H., Nafie, L.A.: Simultaneous acquisition of all four forms of circular polarization Raman
optical activity: results for α-pinene and lysozyme. J. Raman Spectrosc. 43, 89–94 (2012)
97. Wheaton, S., Gelfand, R.M., Gordon, R.: Probing the Raman-active acoustic vibrations of
nanoparticles with extraordinary spectral resolution. Nat. Photonics 9, 68–72 (2015)
98. Li, F., Jin, L., Xu, Z., Zhang, S.: Electrostrictive effect in ferroelectrics: an alternative approach
to improve piezoelectricity. Appl. Phys. Rev. 1, 011103 (2014)
99. Achar, B.N.N., Barsch, G.R., Cross, L.E.: Static shell model calculation of electrostriction
and third order elastic coefficients of perovskite oxides. Ferroelectrics 37, 495–498 (1981)
100. Schade, A.L., Caroline, L.: Raw hen egg white and the role of iron in growth inhibition of
shigella dysenteriae, staphylococcus aureus, escherichia coli, and saccharomyces cerevisiae.
Science 100, 14–15 (1944)
101. Mizutani, K., Mikami, B., Aibara, S., Hirose, M.: Structure of aluminium-bound ovotransfer-
rin at 2.15 angstroms resolution. Acta Crystallogr. D 61, 1636–1642 (2005)
102. Lindahl, E., Hess, B., van der Spoel, D.: GROMACS 3.0: a package for molecular simulation
and trajectory analysis. J. Mol. Model. 7, 306–317 (2001)
103. Lindorff-Larsen, K., Piana, S., Palmo, K., Maragakis, P., Klepeis, J.L., Dror, R.O., Shaw, D.E.:
Improved side-chain torsion potentials for the amber Ff99SB protein force field. Proteins 78,
1950–1958 (2010)
104. Hartl, F.U., Hayer-Hartl, M.: Molecular chaperones in the cytosol: from nascent chain to
folded protein. Science 295, 1852–1858 (2002)
105. Bukau, B., Deuerling, E., Pfund, C., Craig, E.A.: Getting newly synthesized proteins into
shape. Cell 101, 119–122 (2000)
106. Young, J.C., Agashe, V.R., Siegers, K., Hartl, F.U.: Pathways of chaperone-mediated protein
folding in the cytosol. Nat. Rev. Mol. Cell Biol. 5, 781–791 (2004)
107. Saibil, H.R.: Chaperones machines in action. Curr. Opin. Struct. Biol. 18, 35–42 (2008)
108. Selkoe, D.J.: Folding proteins in fatal ways. Nature 426, 900–904 (2003)
109. Garrido, C., Brunet, M., Didelot, C., Zermati, Y., Schmitt, E., Kroemer, G.: Heat shock proteins
27 and 70: anti-apoptotic proteins with tumorigenic properties. Cell Cycle 5, 2592–2601 (2006)
110. Buchburger, A., Theyssen, H., Schröder, H., McCarty, J.S., Virgallita, G., Milkereit, P., Rein-
stein, J., Bukau, B.: Nucleotide-induced conformational changes in the ATPase and substrate
binding domains of the DnaK chaperone provide evidence for interdomain communication.
J. Biol. Chem. 270, 16903–16910 (1995)
111. Brehmer, D., Rudiger, S., Gassler, C.S., Klostermeier, D., Packschies, L., Reinstein, J., Mayer,
M.P., Bukau, B.: Tuning of chaperone activity of Hsp70 proteins by modulation of nucleotide
exchange. Nature 8, 427–432 (2001)
112. Liu, Q., Hendrickson, W.A.: Insights into Hsp70 chaperone activity from a crystal structure
of the yeast Hsp110 Sse1. Cell 131, 106–120 (2007)
113. Polier, S., Dragovic, Z., Hartl, F.U., Bracher, A.: Structural basis for the cooperation of Hsp70
and Hsp110 chaperones in protein folding. Cell 131, 106–120 (2008)
114. Schuermann, P.J., Jiang, J.W., Cuellar, J., Llorca, O., Wang, L.P., Gimenez, L.E., Jin, S.P.,
Taylor, A.B., Demeler, B., Morano, K.A., Hartl, P.J., Valpuesta, J.M., Lafer, E.M., Sousa, R.:
Structure of the Hsp110: Hsc70 nucleotide exchange machine. Mol. Cell 31, 232–243 (2008)
115. Wilbanks, S.M., Chen, L., Tsuruta, H., Hodgson, K.O., McKay, D.B.: Solution small-angle
X-ray scattering study of the molecular chaperone Hsc70 and its subfragments. Biochem 34,
12095–12106 (1995)
116. Shi, L., Kataka, M., Fink, A.L.: Conformational characterization of DnaK and its complexes
by small-angle X-ray scattering. Biochem 35, 3297–3308 (1996)
117. Bertelsen, E.B., Chang, L., Gestwicki, J.E., Zuiderweg, E.R.P.: Solution conformation of
wild-type E. coli Hsp70 (DnaK) chaperone complexed with ADP and substrate. Proc. Natl.
Acad. Sci. U S A 106, 8471–8476 (2009)
118. Golas, E., Maisuradze, G.G., Senet, P., Oldziej, S., Czaplewski, C., Scheraga, H.A., Liwo,
A.: Simulation of the opening and closing of Hsp70 chaperones by coarse-grained molecular
dynamics. J. Chem. Theory Comput. 8, 1750–1764 (2012)
119. Berendsen, H.J.C., Postma, J.P.M., van Gunsteren, W.F., Hermans, J.: Interaction models for
water in relation to protein hydration. In: Pullman, B. (ed.), pp. 331–338. D. Reidel
120. Scott, W.R.P., Hünenberger, P.H., Tironi, I.G., Mark, A.E., Billeter, S.R., Fennen, J., Torda,
A.E., Huber, T., Krüger, P., van Gunsteren, W.F.: The GROMOS biomolecular simulation
program package. J. Phys. Chem. A 103, 3596–3607 (1999)
121. Nicolaï, A., Barakat, F., Delarue, P., Senet, P.: Fingerprints of conformational states of human
Hsp70 at sub-THz frequencies. ACS Omega 6(1), 1067–1074 (2016)
122. Cecchini, M., Houdusse, A., Karplus, M.: Allosteric communication in myosin V: from small
conformational changes to large directed movements. PLoS Comput. Biol. 4, e1000129 (2008)
123. Swain, J.F., Dinler, G., Sivendran, R., Montgomery, D.L., Stotz, M., Gierasch, L.M.: Hsp70
chaperone ligands control domain association via an allosteric mechanism mediated by the
interdomain linker. Mol. Cell 26, 27–39 (2007)
124. Bhattacharya, A., Kurochkin, A.V., Yip, G.N.B., Zhang, Y., Bertelsen, E.B., Zuiderweg,
E.R.P.: Allostery in Hsp70 chaperones is transduced by subdomain rotations. J. Mol. Biol.
388, 475–490 (2009)
125. Zhuravleva, A., Gierasch, L.M.: Allosteric signal transmission in the nucleotide-binding
domain of 70-kDa heat shock protein (Hsp70) molecular chaperones. Proc. Natl. Acad. Sci.
U S A 108, 6987–6992 (2011)
Explicit-Solvent All-Atom Molecular
Dynamics of Peptide Aggregation
1 Introduction
Proteins are biomolecules that play key roles in every cell of the human body. The
biological functions of proteins include catalyzing chemical reactions, muscle con-
traction (titin), providing structural support, transport of ions (hemoglobin), trans-
mission of information between specific cells and organs (hormones), activity in the
immune system (antibodies), passage of molecules across cell membranes, etc. The
long process of biological evolution has designed proteins in such a way that under
normal physiological conditions (pH ≈ 7, T ~300 K, atmospheric pressure) most
of them (except intrinsically disordered ones) fold into unique three-dimensional
structures. Only in these native folded structures can proteins be stable and biolog-
ically active. Proteins unfold to more extended conformations if the conditions are
changed or upon application of mechanical force or denaturing agents, such as urea
or guanidinium chloride. If the physiological conditions are restored, most proteins
refold spontaneously to their native states [2].
Protein folding and unfolding processes are of utmost importance for control-
ling biological activity and targeting proteins to their cellular destinations. However,
when things go wrong along those processes, proteins might misfold and escape
the sophisticated system of cellular quality control. Fibril formation resulting from
protein misfolding and aggregation is a hallmark of several protein conformational
disorders (also known as protein misfolding diseases), including Alzheimer’s, Huntington’s
and Parkinson’s diseases [61, 77]. Apart from those widely known diseases,
recent experimental and theoretical evidence has shown that preeclampsia, a
pregnancy-specific disorder, shares pathophysiological features with recognized
proteinopathies [15, 42, 87], and that amyotrophic lateral sclerosis (ALS) is related
to the misfolding and aggregation of superoxide dismutase 1 (SOD1) protein [11,
70].
The growing awareness that protein aggregation is linked to a number of protein
conformational disorders has attracted considerable attention from researchers because of
the enormous medical importance of aggregation phenomena. Better understanding of protein
aggregation processes offers an opportunity to develop medical tools to alleviate the
suffering of millions of individuals with aggregation-related diseases [25]. Despite
some progress in understanding this complicated process at the basic level, many
important questions are yet to be answered. What are the general factors governing
fibril formation rates? How does the presence of small peptides disrupt the aggre-
gation pathway of proteins which are hallmarks of neurodegenerative diseases? Can
the development of effective inhibitors be facilitated? To address these questions,
we employ classical molecular dynamics simulations, shown to be of paramount
importance to our understanding of the structure-dynamics-function relationship in
biomolecules and molecular complexes.
Molecular Dynamics (MD) simulations are a computational method for studying
atoms and molecules that move according to the laws of classical mechanics. The
energy of interactions between atoms can be modeled with a variety of empirical
force-fields, which typically include bonded and non-bonded terms. The non-bonded
Although force-fields differ in their parameters, the general functional form of any
force field consists of two terms:
$$E_{\text{total}} = E_{\text{bonded}} + E_{\text{non-bonded}} \tag{1}$$
where $E_{\text{bonded}}$ describes bonded interactions that act only within molecules and
$E_{\text{non-bonded}}$ involves non-bonded interactions between and within molecules. The
bonded potential term includes 2-, 3- and 4-body interactions of covalently bonded
atoms, while the non-bonded potential term involves Lennard-Jones and Coulomb
interactions. For a more detailed description of all-atom force-field functional forms,
the book of Frenkel and Smit [26] is a good reference.
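As a minimal sketch of the non-bonded term described above, the following Python functions evaluate the Lennard-Jones and Coulomb energy of a single atom pair. The 12-6 form of the Lennard-Jones potential, the function names and the unit system (nm, kJ/mol, elementary charges) are assumptions made for illustration; they are not the parameters of any particular force field.

```python
# Electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (assumed unit system).
COULOMB_CONSTANT = 138.935458


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy (kJ/mol) for a pair separated by r (nm)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, eps_r=1.0):
    """Coulomb energy (kJ/mol) of two point charges (units of e) at distance r (nm)."""
    return COULOMB_CONSTANT * q_i * q_j / (eps_r * r)


def nonbonded_pair_energy(r, sigma, epsilon, q_i, q_j):
    """Sum of the Lennard-Jones and Coulomb contributions for one atom pair."""
    return lennard_jones(r, sigma, epsilon) + coulomb(r, q_i, q_j)
```

In this form the Lennard-Jones energy vanishes at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) sigma, which is a convenient sanity check for any implementation.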
different molecules. Van der Waals interactions involving hydrogen atoms are not
calculated.
Once the energy function is chosen and the model system is built, the next step is
to compute the forces exerted on the atoms. The force acting on each atom is calculated
as the negative gradient of the potential energy with respect to the atomic coordinates. Once
the forces are obtained, the positions and velocities of each atom
are updated according to Newton’s equations of motion. To avoid numerically
unstable results, the equations of motion are integrated with a time step, which
is limited by the fastest movements in the molecule. The small time step of 1 or
2 fs typically used for explicit solvent all-atom simulations constitutes the main
bottleneck in the practical applications of MD simulations. To reach experimentally
relevant timescales even for proteins that fold fast (microseconds to milliseconds),
iterations in the MD algorithm have to be repeated $10^{9}$–$10^{12}$ times and thus pose
a significant challenge for explicit solvent atomistic simulations. Coarse-grained
models, which speed up the computation at the cost of structural accuracy, achieve
millisecond simulations and beyond [40].
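The force evaluation and update cycle described above can be sketched for a single particle in one dimension. The example below is only an illustration under stated assumptions: it uses the leap-frog scheme, a harmonic test potential in reduced units, and hypothetical function names; it is not the integrator code of any MD package. The helper `required_steps` reproduces the step-count arithmetic for a given time step.

```python
import math


def required_steps(total_time_fs, dt_fs):
    """Number of integration steps needed to cover total_time_fs with time step dt_fs."""
    return math.ceil(total_time_fs / dt_fs)


def leapfrog(x0, v0, force, mass, dt, n_steps):
    """Leap-frog integration of one particle in 1D.

    Returns lists of positions x(t) and on-step velocities v(t).
    """
    x = x0
    # Leap-frog stores velocities at half steps: start from v(-dt/2).
    v_half = v0 - 0.5 * dt * force(x) / mass
    xs, vs = [], []
    for _ in range(n_steps):
        a = force(x) / mass                 # acceleration from F = -dU/dx
        v_next = v_half + dt * a            # v(t + dt/2)
        xs.append(x)
        vs.append(0.5 * (v_half + v_next))  # v(t) as the mean of the half-step velocities
        x = x + dt * v_next                 # x(t + dt)
        v_half = v_next
    return xs, vs


if __name__ == "__main__":
    # Harmonic oscillator U(x) = 0.5*k*x**2, so F(x) = -k*x (reduced units, assumed).
    k, m = 1.0, 1.0
    xs, vs = leapfrog(1.0, 0.0, lambda x: -k * x, m, dt=0.01, n_steps=1000)
    energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in zip(xs, vs)]
    print("energy spread:", max(energies) - min(energies))
    # 1 microsecond (1e9 fs) at a 2 fs time step
    print("steps for 1 microsecond:", required_steps(1e9, 2.0))
```

The total energy of the test oscillator stays close to its initial value only while the time step is small compared with the oscillation period, which is the one-particle analogue of the time-step limit set by the fastest motions in a molecule.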
Fig. 1 Outline of the periodic boundary condition in 2D. The red cell is surrounded by its replicas
to fill the space. If a particle leaves the red box from one side, it re-enters the same box from the
opposite side, so the total number of particles in the cell remains unchanged. The minimum image
convention implies that a particle interacts with the closest image of the remaining particles in the
system. An example of the smallest value of the relative distance between two particles is shown
by a black arrow
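The periodic wrapping and the minimum image convention outlined in Fig. 1 can be sketched as follows. This is a minimal illustration that assumes a rectangular box; the function names are hypothetical. The convention is only meaningful when the interaction cutoff does not exceed half of the shortest box edge.

```python
import math


def wrap_coordinate(x, box_length):
    """Map a coordinate back into the primary cell [0, box_length)."""
    return x % box_length


def minimum_image_vector(r_i, r_j, box):
    """Shortest vector from particle i to the nearest periodic image of particle j.

    Assumes a rectangular box with edge lengths given in `box`.
    """
    delta = []
    for a, b, length in zip(r_i, r_j, box):
        d = b - a
        d -= length * round(d / length)  # shift by a whole number of box lengths
        delta.append(d)
    return delta


def minimum_image_distance(r_i, r_j, box):
    """Distance between particle i and the closest image of particle j."""
    return math.sqrt(sum(d * d for d in minimum_image_vector(r_i, r_j, box)))
```

For two particles at x = 0.1 and x = 2.9 in a box of edge 3.0 (same y), the direct separation is 2.8 but the minimum image distance is 0.2, because the nearest image lies across the box boundary.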
facts are too large to be ignored and underlying dynamics of the process of interest
becomes physically meaningless. Making the simulation box smaller to reduce the
computational costs might come at a high price of wasting resources and rerunning
the simulations.
The remaining specific MD details are as follows. Electrostatic interactions were
computed using the particle mesh Ewald method [23]. The non-bonded interaction
pair-list was updated every 10 fs using a cutoff of 1.5 nm. All covalent bonds were
constrained by the LINCS algorithm [33] with a relative tolerance of $10^{-4}$. Initial
velocities of the atoms were generated from the Maxwell distribution at 300 K.
The temperature was maintained at 300 K using a v-rescale thermostat [16]. The equations
of motion were integrated using a leap-frog algorithm with a time step of 2 fs.
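To illustrate how initial velocities can be drawn from the Maxwell distribution, the sketch below samples each Cartesian velocity component from a Gaussian with standard deviation sqrt(kB T / m). The unit system (g/mol, nm/ps, kJ/mol), the function names and the use of 3N degrees of freedom are assumptions for illustration; production codes typically also remove the center-of-mass motion and account for constraints.

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def maxwell_velocities(masses, temperature, rng):
    """Draw one velocity vector (nm/ps) per atom from the Maxwell distribution.

    Masses are in g/mol; each Cartesian component is Gaussian with
    standard deviation sqrt(kB*T/m).
    """
    velocities = []
    for m in masses:
        sigma = math.sqrt(KB * temperature / m)
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature (K) from the kinetic energy, assuming 3N degrees of freedom."""
    kinetic = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                        for m, (vx, vy, vz) in zip(masses, velocities))
    return 2.0 * kinetic / (3 * len(masses) * KB)


if __name__ == "__main__":
    rng = random.Random(2024)        # arbitrary seed for reproducibility
    masses = [12.011] * 10000        # hypothetical system of carbon-mass particles
    v = maxwell_velocities(masses, 300.0, rng)
    print("kinetic temperature (K):", kinetic_temperature(masses, v))
```

For a finite number of atoms the instantaneous temperature fluctuates around the target value, which is one reason a thermostat is needed during the run.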
however, the structure of protein aggregates has not yet been resolved. Our recent
computational analysis by publicly available algorithms [18, 28, 44, 81] of the col-
lection of aggregated proteins and peptides extracted from urine of pregnant women
diagnosed with preeclampsia (referred to by our group as the preeclampsia mis-
foldome) [14, 37] predicted the short FVFLM peptide of SERPINA1 protein as
highly amyloidogenic (Kouza et al. unpublished data). The small size of the KLVFF and
FVFLM peptides, or of similar peptides found in vivo in states of disturbed proteostasis,
makes them good candidates for studying early stages of the aggregation process
by explicit-solvent all-atom MD simulations.
Oligomerization time correlates with the population of fibril-prone conformations in the monomeric state
We started our investigation of FVFLM and KLVFF fibrillation capacity by per-
forming MD simulation of monomers. Recent theoretical studies have shown that
oligomer formation times are strongly correlated with the population of the fibril-
prone conformation in the monomeric state [51, 64]. The population of fibril-prone
$N^*$ conformations in the monomeric state is defined as
$$P_{N^*} = \exp\left(-\frac{E}{k_B T}\right) / Z \tag{2}$$
where $Z$ is the partition function and $E$ is the barrier separating the native and $N^*$
states. The more populated the $N^*$ state, the greater the propensity for aggregation.
For this reason the population of fibril-prone conformations in the monomeric state is
an important factor governing fibril formation rates and it can be used as a measure
of aggregation propensity. With an increasing number of a peptide’s fibril-prone
conformations, fibril formation time decreases.
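The Boltzmann-weighted population in Eq. (2) can be made concrete with a small numerical sketch. The code below assumes a set of discrete states with energies given in kJ/mol relative to the lowest state; this discretization, the function name and the units are illustrative assumptions and are not taken from the cited studies.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def boltzmann_populations(energies, temperature):
    """Populations exp(-E_i/kBT)/Z for a set of discrete state energies (kJ/mol)."""
    kt = KB * temperature
    e_min = min(energies)  # subtracting the minimum leaves the ratios unchanged
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)       # partition function relative to e_min
    return [w / z for w in weights]
```

In a two-state picture, lowering the energy of the fibril-prone state relative to the ground state raises its population, in line with the argument that a more populated $N^*$ state implies a higher aggregation propensity.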
Figure 2 presents the end-to-end distance as a function of time for FVFLM and
KLVFF monomers. Using the criterion for the fibril-prone conformation, we found
that the population of peptides in the fibril-prone state was ~21% and ~13% for
FVFLM and KLVFF monomers, respectively. Such a significant difference in pop-
ulations of fibril-prone conformations implies that the propensity of FVFLM for
self-assembly is higher than for KLVFF and reflects the difference in fibril formation
times between these peptides.
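A population of this kind can be estimated from an end-to-end distance time series by counting frames that satisfy a geometric criterion. The exact criterion used for the fibril-prone state is not reproduced here, so the sketch below assumes a simple threshold on R/Rmax (in the spirit of the reference lines in Fig. 2); the threshold and the function names are hypothetical.

```python
def block_average(values, window):
    """Average consecutive, non-overlapping blocks of `window` frames."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]


def fibril_prone_fraction(distances, r_max, threshold):
    """Fraction of frames whose normalized end-to-end distance R/Rmax is at least `threshold`."""
    extended = sum(1 for r in distances if r / r_max >= threshold)
    return extended / len(distances)
```

For example, `fibril_prone_fraction(block_average(r_series, window), 1.426, 0.9)` would first smooth a hypothetical series `r_series` and then report the fraction of smoothed frames above the assumed threshold.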
Subsequently, we compared the stability of FVFLM and KLVFF peptides by gen-
erating the free energy landscapes of the systems as a function of end-to-end distance
(R) and radius of gyration (Rg ), as shown in Fig. 3. The free-energy landscape pro-
file of KLVFF shows three minima, while for FVFLM it is less complex with one
broad minimum. The typical snapshots of representative conformations for local
minima are presented in Fig. 3c. In contrast to FVFLM where mainly pre-extended
and extended configurations are populated, our results indicate that conformations
for KLVFF are more complex and diverse. Remarkably, the compact conformations
with small values of the end-to-end distance in a range of 0.4–0.8 nm were observed
for KLVFF, but not for FVFLM (Fig. 3c). The barrier-free downhill nature of the free
energy profile of FVFLM implies that fibril-prone conformations are much more easily
Fig. 2 Time dependence of the end-to-end distance normalized by Rmax for FVFLM and KLVFF
monomers. Results are averaged over a 40 ps window. Rmax = 1.426 nm is the maximum end-to-end
distance obtained in simulations. The green and yellow lines refer to R/Rmax = 0.9 and 0.8,
respectively. Reproduced from Ref. [42] with permission from the PCCP Owner Societies
accessible compared to those for KLVFF. This result explains why the oligomer formation
time for FVFLM is much shorter than for KLVFF. The time required to form
the FVFLM dimer and trimer is $\tau_{\text{dimer}}^{(\text{FVFLM})} \approx 17$ ns and $\tau_{\text{trimer}}^{(\text{FVFLM})} \approx 46$ ns, which
is shorter than that of KLVFF, $\tau_{\text{dimer}}^{(\text{KLVFF})} \approx 23$ ns and $\tau_{\text{trimer}}^{(\text{KLVFF})} \approx 100$ ns. Thus, the
more accessible the fibril-prone conformations of the monomeric peptide are, the
faster its oligomers form. This result, which implies that the population
of fibril-prone conformations of a monomer can be used to accurately predict its
self-assembly rate into higher-order structures, is of paramount importance as it
opens up new routes to understanding the aggregation process at the single-monomer
level.
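A free-energy landscape over two coordinates such as R and Rg is commonly estimated from the sampled probability as F = -kB T ln P. The sketch below illustrates that general approach with a simple two-dimensional histogram; the bin width, the function name and the choice of the most populated bin as the zero of free energy are assumptions, and this is not necessarily the procedure used to produce Fig. 3.

```python
import math
from collections import Counter


def free_energy_surface(r_values, rg_values, bin_width):
    """Free energy (in units of kBT) on a 2D grid of (R, Rg) bins.

    F = -ln(P / P_max), so the most populated bin is the zero of free energy.
    Bins that were never visited are absent from the result.
    """
    counts = Counter(
        (math.floor(r / bin_width), math.floor(rg / bin_width))
        for r, rg in zip(r_values, rg_values)
    )
    max_count = max(counts.values())
    return {bin_index: -math.log(n / max_count) for bin_index, n in counts.items()}
```

Expressing the result in units of kBT makes it directly comparable with contour spacings quoted in the same units, and the number of distinct low-lying bins gives a rough picture of how many minima the landscape has.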
Short peptides as inhibitors of fibril formation
One of the principal goals in treating neurodegenerative diseases is to devise strategies
to inhibit fibril formation [27, 99]. One of the possible ways for the prevention and
treatment of Alzheimer’s disease is to design and use molecular inhibitors that inhibit
β-secretase and γ-secretase responsible for production of beta amyloid [31]. Another
powerful strategy is to prevent the aggregation of Aβ proteins by the presence of
short peptides which may occupy the self-recognition site of the parent proteins
thereby obstructing the aggregation process [19, 85]. On one hand, short peptides
form amyloid fibrils similar to their protein precursors. On the other hand, a mixture
of short peptides and their parent proteins can block the binding sites responsible for
amyloid aggregation and thus prevent aggregation. This strategy seems to be very
promising due to its potential use as protection against aggregation [85]. Several
previous reports identified short peptides including KLVFF and LPFDD that can
disrupt the fibrillation of full-length beta amyloid protein [19, 30, 85, 88].
As FVFLM peptides have been shown to form dimers and trimers faster than
KLVFF, an intriguing question that arises is whether FVFLM can bind more effec-
tively to beta amyloid protein than KLVFF does. Or in other words, could we propose
even more effective inhibitors compared to the known KLVFF or LPFDD?
550 M. Kouza et al.
Fig. 3 Free energy landscape for monomer KLVFF (a) and FVFLM (b) as a function of radius
of gyration and end-to-end distance. Surfaces are shown with contour lines indicating the relative
0.75 kB T slope of the surface. c Typical snapshots for local minima are marked by 1, 2, 3 and 4.
Reproduced from Ref. [42] with permission from the PCCP Owner Societies
To address this question, we studied the influence of the FVFLM peptide on the
kinetics of Aβ16–20 oligomerization using all-atom simulation. The initial configura-
tion of the system of two KLVFF peptides and one FVFLM was created by randomly
placing these peptides in a periodic box far enough that no peptide-peptide interac-
tions were present. Starting from this configuration we carried out eight independent
simulations and monitored the kinetics of dimerization and trimerization.
In Fig. 4 we show the time dependence of the number of hydrogen bonds between
monomers. We defined the dimer as formed when three or more backbone hydrogen
bonds are made between monomers. Using this criterion, we found a significantly
higher probability of dimer formation between KLVFF and FVFLM peptides than
between two KLVFF peptides. From the data in Fig. 4, we found that the kinetics of
FVFLM binding to KLVFF was faster than that of KLVFF binding to itself. This
indicates that FVFLM is capable of binding the β-amyloid aggregation
hot-spot (KLVFF). The peptides incorporating the KLVFF sequence have been shown
to bind full-length β-amyloid and block the KLVFF sequence in β-amyloid, which is
critical for amyloid aggregation [19, 74, 84, 85]. Our results suggest that FVFLM can
be used as a recognition sequence to interact not only with SERPINA1, the parent
protein of peptides aggregated in preeclampsia, but also with the KLVFF sequence
in β-amyloid. Interestingly, both β-amyloid and SERPINA1 immunoreactivity were
detected in the aggregates found in the urine of women with preeclampsia [15]. Based
Fig. 4 Time dependence of the number of backbone hydrogen bonds between monomers. The
green curve represents hydrogen bonds between Aβ16–20 peptides (KLVFF), while the blue and
magenta curves show those between FVFLM and one of Aβ16–20 peptides (KLVFF). Snapshots
showing FVFLM and Aβ16–20 peptides are in blue and red colors, respectively. Reproduced from
Ref. [42] with permission from the PCCP Owner Societies
on these results, we suggest that FVFLM-like peptides could be used for the efficient
inhibition of β-amyloid (or other pro-amyloidogenic proteins) oligomerization and
aggregation.
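The dimer criterion used above (three or more backbone hydrogen bonds between monomers) lends itself to a simple analysis of hydrogen-bond time series. The sketch below assumes that the hydrogen-bond counts per frame have already been computed by some other tool; the function names and the choice to take the first frame satisfying the criterion as the formation time are illustrative assumptions.

```python
def first_formation_time(hbond_counts, dt_ns, min_hbonds=3):
    """Time (ns) of the first frame with at least `min_hbonds` hydrogen bonds, or None."""
    for frame, n_hbonds in enumerate(hbond_counts):
        if n_hbonds >= min_hbonds:
            return frame * dt_ns
    return None


def formation_probability(trajectories, min_hbonds=3):
    """Fraction of independent runs in which the dimer criterion is met at least once."""
    formed = sum(1 for counts in trajectories
                 if any(n >= min_hbonds for n in counts))
    return formed / len(trajectories)
```

Applied to one hydrogen-bond series per peptide pair and per independent run, these two quantities give a formation time and a formation probability that can be compared between the KLVFF-FVFLM and KLVFF-KLVFF pairs.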
The effects of mutations in fibril formation
The amino acid sequence determines protein propensity for folding and aggregation.
The role of sequence in aggregation may be better understood by studying mutations
which can alter aggregation pathways, rates and structure [21]. Bhavaraju and Hans-
mann [9] compared the stability of wild type and four mutants (R61N, G68D, A84T
and D82I) of the immunoglobulin light-chain protein. It was shown that amyloid formation
is triggered by the dissociation of dimers and the transition of monomers from
their native state into fibril-prone states. Stabilizing the dimer by binding to the dimer interface
or stabilizing the monomer’s ground state has been suggested as a strategy
for drug design targeting light-chain-associated systemic amyloidosis [9].
Another important example is the β-amyloid protein. The region involving
residues from 16 to 23 in β-amyloid has been shown to play a crucial role in its
fibril formation. Numerous experimental and computational studies have been performed
for various mutations such as the Flemish (A21G), Arctic (E22G), Dutch
(E22Q), Italian (E22K), Iowa (D23N) and Osaka (E22Δ) variants among many
others [7, 22, 86]. Little attention has been focused on C- and N-terminal residues.
However, recent experiments demonstrated that mutations in those regions influence
the kinetics of fibril formation. The G33A, G33I and G37L mutants as well
as the English (H6R), Taiwanese (D7H) and Tottori (D7N) variants of β-amyloid can modulate
protein aggregation rates and pathways [20, 55, 66]. For example, the A2V mutation
was found to greatly increase the Aβ40 fibril formation rates, but the mixture of the
Aβ40 and its A2V mutant peptides protects against amyloidogenesis [14, 24]. Using
traditional MD simulations in explicit solvent, Li and co-workers [89, 90] repro-
4 Conclusions
Fig. 5 Polymorphism in β-amyloid fibrils. Experimentally resolved U-shaped Aβ1–40 (a) and
S-shaped Aβ11–42 (b) fibril structures. Representative structures of the proposed out-of-register model
(c) and ring-like model (d) of Aβ1–42 fibrils
such simulations will have the capacity and resources to tackle the dangerous and
deadly structures of amyloid oligomers and aggregates.
Acknowledgements The authors thank Girik Malik for critical reading of the manuscript. M. K.
acknowledges the Polish Ministry of Science and Higher Education for financial support through
“Mobilnosc Plus” Program No. 1287/MOB/IV/2015/0. A. Kol. and M. K. would like to acknowl-
edge support from the National Science Center grant [MAESTRO 2014/14/A/ST6/00088]. IAB
acknowledges support from the Eunice Kennedy Shriver National Institute of Child Health and
Human Development (NICHD) R01HD084628 and The Research Institute at Nationwide Chil-
dren’s Hospital’s John E. Fisher Endowed Chair for Neonatal and Perinatal Research. A. Klo.
acknowledges support from National Science Foundation grant DBI 1661391, and Bridge funds
provided by The Research Institute at Nationwide Children’s Hospital. This research was sup-
ported in part by the High Performance Computing Facility at The Research Institute at Nationwide
Children’s Hospital.
References
1. Alder, B.J., Wainwright, T.E.: Phase transition for a hard sphere system. J. Chem. Phys. 27(5),
1208–1209 (1957)
2. Anfinsen, C.B.: Principles that govern folding of protein chains. Science 181(4096), 223–230
(1973)
3. Balbach, J.J., Ishii, Y., Antzutkin, O.N., Leapman, R.D., Rizzo, N.W., Dyda, F., Reed,
J., Tycko, R.: Amyloid fibril formation by Abeta(16–22), a seven-residue fragment of the
Alzheimer’s beta-amyloid peptide, and structural characterization by solid state NMR. Bio-
chemistry 39(45), 13748–13759 (2000)
4. Barz, B., Wales, D.J., Strodel, B.: A kinetic approach to the sequence-aggregation relationship
in disease-related protein assembly. J. Phys. Chem. B 118(4), 1003–1011 (2014)
5. Berendsen, H.J.C, Postma, J.P.M., van Gunsteren, W.F., Hermans, J.: Interaction models for
water in relation to protein hydration. Intermolecular Forces 14, 331–442 (1981)
6. Berg, B.A., Neuhaus, T.: Multicanonical algorithms for 1st order phase-transitions. Phys. Lett.
B 267(2), 249–253 (1991)
7. Berhanu, W.M., Alred, E.J., Hansmann, U.H.E.: Stability of Osaka mutant and wild-type fibril
models. J. Phys. Chem. B 119(41), 13063–13070 (2015)
8. Bernhardt, N.A., Xi, W.H., Wang, W., Hansmann, U.H.E.: Simulating protein fold switching
by replica exchange with tunneling (vol 12, pg 5656, 2016). J. Chem. Theory Comput. 13(1),
393–394 (2017)
9. Bhavaraju, M., Hansmann, U.H.E.: Effect of single point mutations in a form of systemic
amyloidosis. Protein Sci. 24(9), 1451–1462 (2015)
10. Blaszczyk, M., Kurcinski, M., Kouza, M., Wieteska, L., Debinski, A., Kolinski, A., Kmiecik,
S.: Modeling of protein-peptide interactions using the CABS-dock web server for binding
site search and flexible docking. Methods 93, 72–83 (2016)
11. Blokhuis, A.M., Groen, E.J.N., Koppers, M., van den Berg, L.H., Pasterkamp, R.J.: Protein
aggregation in amyotrophic lateral sclerosis. Acta Neuropathol. 125(6), 777–794 (2013)
12. Boczko, E.M., Brooks, C.L.: First-Principles calculation of the folding free-energy of a 3-helix
bundle protein. Science 269(5222), 393–396 (1995)
13. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M.:
Charmm—A program for macromolecular energy, minimization, and dynamics calculations.
J. Comput. Chem. 4(2), 187–217 (1983)
14. Buhimschi, I., Jing, H.W., Axe, M., Ray, W., Zhao, G.M., Huang, C.S., Song, Y., Wysocki, V.,
Buhimschi, C.: Shotgun proteomics of the urine misfoldome identifies molecular signatures
of preeclampsia subphenotypes. Am. J. Obstet. Gynecol. 212(1), S34 (2015)
15. Buhimschi, I.A., Nayeri, U.A., Zhao, G., Shook, L.L., Pensalfini, A., Funai, E.F., Bernstein,
I.M., Glabe, C.G., Buhimschi, C.S.: Protein misfolding, congophilia, oligomerization, and
defective amyloid processing in preeclampsia. Sci. Transl. Med. 6(245), 245–292 (2014)
16. Bussi, G., Donadio, D., Parrinello, M.: Canonical sampling through velocity rescaling. J.
Chem. Phys. 126(1), 014101 (2007)
17. Case, D.A., Cheatham, T.E., Darden, T., Gohlke, H., Luo, R., Merz, K.M., Onufriev, A.,
Simmerling, C., Wang, B., Woods, R.J.: The Amber biomolecular simulation programs. J.
Comput. Chem. 26(16), 1668–1688 (2005)
18. Castillo, V., Grana-Montes, R., Sabate, R., Ventura, S.: Prediction of the aggregation propen-
sity of proteins from the primary sequence: aggregation properties of proteomes. Biotechnol.
J. 6(6), 674–685 (2011)
19. Chafekar, S.M., Malda, H., Merkx, M., Meijer, E.W., Viertl, D., Lashuel, H.A., Baas, F.,
Scheper, W.: Branched KLVFF tetramers strongly potentiate inhibition of beta-amyloid aggre-
gation. ChemBioChem 8(15), 1857–1864 (2007)
20. Chen, W.T., Hong, C.J., Lin, Y.T., Chang, W.H., Huang, H.T., Liao, J.Y., Chang, Y.J., Hsieh,
Y.F., Cheng, C.Y., Liu, H.C., Chen, Y.R., Cheng, I.H.: Amyloid-beta (Abeta) D7H mutation
increases oligomeric Abeta42 and alters properties of Abeta-zinc/copper assemblies. PLoS
ONE 7(4), e35807 (2012)
21. Chiti, F., Dobson, C.M.: Protein misfolding, amyloid formation, and human disease: a summary
of progress over the last decade. Annu. Rev. Biochem. 86, 27–68 (2017)
22. Coskuner, O., Wise-Scira, O., Perry, G., Kitahara, T.: The structures of the E22 delta mutant-
type amyloid-beta alloforms and the impact of E22 delta mutation on the structures of the
wild-type amyloid-beta alloforms. ACS Chem. Neurosci. 4(2), 310–320 (2013)
23. Darden, T., York, D., Pedersen, L.: Particle mesh Ewald—An N.log(N) method for Ewald
sums in large systems. J. Chem. Phys. 98(12), 10089–10092 (1993)
24. Di Fede, G., Catania, M., Morbin, M., Rossi, G., Suardi, S., Mazzoleni, G., Merlin, M.,
Giovagnoli, A.R., Prioni, S., Erbetta, A., Falcone, C., Gobbi, M., Colombo, L., Bastone, A.,
Beeg, M., Manzoni, C., Francescucci, B., Spagnoli, A., Cantu, L., Del Favero, E., Levy, E.,
Salmona, M., Tagliavini, F.: A recessive mutation in the APP gene with dominant-negative
effect on amyloidogenesis. Science 323(5920), 1473–1477 (2009)
25. Dobson, C.M.: Protein folding and misfolding. Nature 426(6968), 884–890 (2003)
26. Frenkel, D., Smit, B.: Understanding Molecular Simulation: From Algorithms to Applications.
Elsevier (1996)
27. Frydman-Marom, A., Rechter, M., Shefler, I., Bram, Y., Shalev, D.E., Gazit, E.: Cognitive-
performance recovery of Alzheimer’s disease model mice by modulation of early soluble
amyloidal assemblies. Angew. Chem. Int. Ed. Engl. 48(11), 1981–1986 (2009)
28. Garbuzynskiy, S.O., Lobanov, M.Y., Galzitskaya, O.V.: FoldAmyloid: a method of prediction
of amyloidogenic regions from protein sequence. Bioinformatics 26(3), 326–332 (2010)
29. Gazit, E.: Self assembly of short aromatic peptides into amyloid fibrils and related nanostruc-
tures. Prion 1(1), 32–35 (2007)
30. Gordon, D.J., Tappe, R., Meredith, S.C.: Design and characterization of a membrane per-
meable N-methyl amino acid-containing peptide that inhibits Abeta(1–40) fibrillogenesis. J.
Peptide Res. 60(1), 37–55 (2002)
31. Hamaguchi, T., Ono, K., Yamada, M.: Anti-amyloidogenic therapies: strategies for prevention
and treatment of Alzheimer’s disease. Cell. Mol. Life Sci. 63(13), 1538–1552 (2006)
32. Hansmann, U.H.E.: Parallel tempering algorithm for conformational studies of biological
molecules. Chem. Phys. Lett. 281(1–3), 140–150 (1997)
33. Hess, B., Bekker, H., Berendsen, H.J.C., Fraaije, J.G.E.M.: LINCS: a linear constraint solver
for molecular simulations. J. Comput. Chem. 18(12), 1463–1472 (1997)
34. Hess, B., Kutzner, C., van der Spoel, D., Lindahl, E.: GROMACS 4: algorithms for highly
efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 4(3),
435–447 (2008)
35. Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., Simmerling, C.: Comparison
of multiple amber force fields and development of improved protein backbone parameters.
Proteins-Struct. Funct. Bioinf. 65(3), 712–725 (2006)
36. Hukushima, K., Nemoto, K.: Exchange Monte Carlo method and application to spin glass
simulations. J. Phys. Soc. Jpn. 65(6), 1604–1608 (1996)
37. Jing, H.W., Zhao, G.M., Axe, M., Buhimschi, C.S., Wysocki, V., Buhimschi, I.A.: Protein
enrichment using Congo red (CR) affinity enhances characterization of the urine misfoldome
in preeclampsia (PE). Am. J. Obstet. Gynecol. 214(1), S408 (2016)
38. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L.: Comparison of
simple potential functions for simulating liquid water. J. Chem. Phys. 79(2), 926–935 (1983)
39. Jorgensen, W.L., Tirado-Rives, J.: The OPLS potential functions for proteins. Energy
minimizations for crystals of cyclic peptides and crambin. J. Am. Chem. Soc. 110(6), 1657–1666
(1988)
40. Kmiecik, S., Gront, D., Kolinski, M., Wieteska, L., Dawid, A.E., Kolinski, A.: Coarse-grained
protein models and their applications. Chem. Rev. 116(14), 7898–7936 (2016)
41. Kolinski, A.: Protein modeling and structure prediction with a reduced representation. Acta
Biochim. Pol. 51(2), 349–371 (2004)
42. Kouza, M., Banerji, A., Kolinski, A., Buhimschi, I.A., Kloczkowski, A.: Oligomerization of
FVFLM peptides and their ability to inhibit beta amyloid peptides aggregation: consideration
as a possible model. Phys. Chem. Chem. Phys. 19(4), 2990–2999 (2017)
43. Kouza, M., Co, N.T., Nguyen, P.H., Kolinski, A., Li, M.S.: Preformed template fluctuations
promote fibril formation: Insights from lattice and all-atom models. J. Chem. Phys. 142(14),
04B610_1 (2015)
44. Kouza, M., Faraggi, E., Kolinski, A., Kloczkowski, A.: The GOR method of protein secondary
structure prediction, and its application as protein aggregation prediction tool. In: Zhou, Y.,
Kloczkowski, A., Faraggi, E., Yang, Y. (eds.) Prediction of Protein Secondary Structure. vol.
1484, pp. 7–24. Humana Press, New York (2017)
45. Kouza, M., Hansmann, U.H.E.: Velocity scaling for optimizing replica exchange molecular
dynamics. J. Chem. Phys. 134(4), 01B630 (2011)
46. Kouza, M., Hu, C.K., Li, M.S.: New force replica exchange method and protein folding
pathways probed by force-clamp technique. J. Chem. Phys. 128(4), 01B618 (2008)
47. Kouza, M., Hu, C.K., Li, M.S., Kolinski, A.: A structure-based model fails to probe the
mechanical unfolding pathways of the titin I27 domain. J. Chem. Phys. 139(6),
08B615 (2013)
48. Kouza, M., Hu, C.K., Zung, H., Li, M.S.: Protein mechanical unfolding: Importance of non-
native interactions. J. Chem. Phys. 131(21), 12B608 (2009)
49. Kouza, M., Lan, P.D., Gabovich, A.M., Kolinski, A., Li, M.S.: Switch from thermal to force-
driven pathways of protein refolding. J. Chem. Phys. 146(13), 135101 (2017)
50. Kubelka, J., Hofrichter, J., Eaton, W.A.: The protein folding ‘speed limit’. Curr. Opin. Struct.
Biol. 14(1), 76–88 (2004)
51. Li, M.S., Co, N.T., Reddy, G., Hu, C.K., Straub, J.E., Thirumalai, D.: Factors governing
fibrillogenesis of polypeptide chains revealed by lattice models. Phys. Rev. Lett. 105(21),
218101 (2010)
52. Lindorff-Larsen, K., Maragakis, P., Piana, S., Shaw, D.E.: Picosecond to millisecond structural
dynamics in human ubiquitin. J. Phys. Chem. B 120(33), 8313–8320 (2016)
53. Liwo, A., He, Y., Scheraga, H.A.: Coarse-grained force field: general folding theory. Phys.
Chem. Chem. Phys. 13(38), 16890–16901 (2011)
54. Lu, J.X., Qiang, W., Yau, W.M., Schwieters, C.D., Meredith, S.C., Tycko, R.: Molecular
structure of beta-amyloid fibrils in Alzheimer’s disease brain tissue. Cell 154(6), 1257–1268
(2013)
55. Lu, Y., Wei, G.H., Derreumaux, P.: Effects of G33A and G33I mutations on the structures
of monomer and dimer of the amyloid-beta fragment 29–42 by replica exchange molecular
dynamics simulations. J. Phys. Chem. B 115(5), 1282–1288 (2011)
56. Luhrs, T., Ritter, C., Adrian, M., Riek-Loher, D., Bohrmann, B., Doeli, H., Schubert, D.,
Riek, R.: 3D structure of Alzheimer’s amyloid-beta(1–42) fibrils. Proc. Natl. Acad. Sci. U S
A 102(48), 17342–17347 (2005)
57. Marrink, S.J., Risselada, H.J., Yefimov, S., Tieleman, D.P., de Vries, A.H.: The MARTINI
force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111(27),
7812–7824 (2007)
58. Mazor, Y., Gilead, S., Benhar, I., Gazit, E.: Identification and characterization of a novel
molecular-recognition and self-assembly domain within the islet amyloid polypeptide. J. Mol.
Biol. 322(5), 1013–1024 (2002)
59. McCammon, J.A., Gelin, B.R., Karplus, M.: Dynamics of folded proteins. Nature 267(5612),
585–590 (1977)
60. Micheletti, C., Laio, A., Parrinello, M.: Reconstructing the density of states by history-
dependent metadynamics. Phys. Rev. Lett. 92(17), 170601 (2004)
61. Moreno-Gonzalez, I., Soto, C.: Misfolded protein aggregates: mechanisms, structures and
potential for disease transmission. Semin. Cell Dev. Biol. 22(5), 482–487 (2011)
62. Morriss-Andrews, A., Shea, J.E.: Simulations of protein aggregation: insights from atomistic
and coarse-grained models. J. Phys. Chem. Lett. 5(11), 1899–1908 (2014)
63. Morriss-Andrews, A., Shea, J.E.: Computational studies of protein aggregation: methods and
applications. Annu. Rev. Phys. Chem. 66, 643–666 (2015)
64. Nam, H.B., Kouza, M., Hoang, Z., Li, M.S.: Relationship between population of the
fibril-prone conformation in the monomeric state and oligomer formation times of peptides: Insights
from all-atom simulations. J. Chem. Phys. 132(16), 04B613 (2010)
65. Nguyen, P.H., Li, M.S., Stock, G., Straub, J.E., Thirumalai, D.: Monomer adds to preformed
structured oligomers of Abeta-peptides by a two-stage dock-lock mechanism. Proc. Natl.
Acad. Sci. U S A 104(1), 111–116 (2007)
66. Ono, K., Condron, M.M., Teplow, D.B.: Effects of the English (H6R) and Tottori (D7N)
familial Alzheimer disease mutations on amyloid beta-protein assembly and toxicity. J. Biol.
Chem. 285(30), 23184–23195 (2010)
67. Peter, E.K., Pivkin, I.V., Shea, J.E.: A canonical replica exchange molecular dynamics imple-
mentation with normal pressure in each replica. J. Chem. Phys. 145(4), 044903 (2016)
68. Petkova, A.T., Yau, W.M., Tycko, R.: Experimental constraints on quaternary structure in
Alzheimer’s beta-amyloid fibrils. Biochemistry 45(2), 498–512 (2006)
69. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel,
R.D., Kale, L., Schulten, K.: Scalable molecular dynamics with NAMD. J. Comput. Chem.
26(16), 1781–1802 (2005)
70. Proctor, E.A., Fee, L., Tao, Y.Z., Redler, R.L., Fay, J.M., Zhang, Y.L., Lv, Z.J., Mercer,
I.P., Deshmukh, M., Lyubchenko, Y.L., Dokholyan, N.V.: Nonnative SOD1 trimer is toxic
to motor neurons in a model of amyotrophic lateral sclerosis. Proc. Natl. Acad. Sci. U S A
113(3), 614–619 (2016)
71. Pronk, S., Pall, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith,
J.C., Kasson, P.M., van der Spoel, D., Hess, B., Lindahl, E.: GROMACS 4.5: a high-throughput
and highly parallel open source molecular simulation toolkit. Bioinformatics 29(7), 845–854
(2013)
72. Rhee, Y.M., Sorin, E.J., Jayachandran, G., Lindahl, E., Pande, V.S.: Simulations of the role
of water in the protein-folding mechanism. Proc. Natl. Acad. Sci. U S A 101(17), 6456–6461
(2004)
73. Rief, M., Gautel, M., Oesterhelt, F., Fernandez, J.M., Gaub, H.E.: Reversible unfolding of
individual titin immunoglobulin domains by AFM. Science 276(5315), 1109–1112 (1997)
74. Rojas, A.V., Liwo, A., Scheraga, H.A.: A study of the alpha-helical intermediate preceding
the aggregation of the amino-terminal fragment of the beta amyloid peptide (Abeta(1–28)).
J. Phys. Chem. B 115(44), 12978–12983 (2011)
75. Scheraga, H.A., Khalili, M., Liwo, A.: Protein-folding dynamics: overview of molecular
simulation techniques. Annu. Rev. Phys. Chem. 58, 57–83 (2007)
76. Scott, W.R.P., Hunenberger, P.H., Tironi, I.G., Mark, A.E., Billeter, S.R., Fennen, J., Torda,
A.E., Huber, T., Kruger, P., van Gunsteren, W.F.: The GROMOS biomolecular simulation
program package. J. Phys. Chem. A 103(19), 3596–3607 (1999)
77. Selkoe, D.J.: Alzheimer’s disease: genes, proteins, and therapy. Physiol. Rev. 81(2), 741–766
(2001)
78. Shakhnovich, E.: Protein folding thermodynamics and dynamics: Where physics, chemistry,
and biology meet. Chem. Rev. 106(5), 1559–1588 (2006)
79. Siwy, C.M., Lockhart, C., Klimov, D.K.: Is the conformational ensemble of Alzheimer’s
Abeta 10–40 peptide force field dependent? Plos Computat. Biol. 13(1), e1005314 (2017)
80. Sugita, Y., Okamoto, Y.: Replica-exchange molecular dynamics method for protein folding.
Chem. Phys. Lett. 314(1–2), 141–151 (1999)
81. Tartaglia, G.G., Vendruscolo, M.: The Zyggregator method for predicting protein aggregation
propensities. Chem. Soc. Rev. 37(7), 1395–1401 (2008)
82. Tenidis, K., Waldner, M., Bernhagen, J., Fischle, W., Bergmann, M., Weber, M., Merkle,
M.L., Voelter, W., Brunner, H., Kapurniotu, A.: Identification of a penta- and hexapeptide of
islet amyloid polypeptide (IAPP) with amyloidogenic and cytotoxic properties. J. Mol. Biol.
295(4), 1055–1071 (2000)
83. Thirumalai, D., Reddy, G., Straub, J.E.: Role of water in protein aggregation and amyloid
polymorphism. Acc. Chem. Res. 45(1), 83–92 (2012)
84. Tjernberg, L.O., Lilliehook, C., Callaway, D.J.E., Naslund, J., Hahne, S., Thyberg, J., Tere-
nius, L., Nordstedt, C.: Controlling amyloid beta-peptide fibril formation with protease-stable
ligands (vol 272, pg 12601, 1997). J. Biol. Chem. 272(28), 17894–17895 (1997)
85. Tjernberg, L.O., Naslund, J., Lindqvist, F., Johansson, J., Karlstrom, A.R., Thyberg, J., Tere-
nius, L., Nordstedt, C.: Arrest of beta-amyloid fibril formation by a pentapeptide ligand. J.
Biol. Chem. 271(15), 8545–8548 (1996)
86. Tomiyama, T., Nagata, T., Shimada, H., Teraoka, R., Fukushima, A., Kanemitsu, H., Takuma,
H., Kuwano, R., Imagawa, M., Ataka, S., Wada, Y., Yoshioka, E., Nishizaki, T., Watanabe, Y.,
Mori, H.: A new amyloid beta variant favoring oligomerization in Alzheimer’s-type dementia.
Ann. Neurol. 63(3), 377–387 (2008)
87. Tong, M., Cheng, S.B., Chen, Q., DeSousa, J., Stone, P.R., James, J.L., Chamley, L.W.,
Sharma, S.: Aggregated transthyretin is specifically packaged into placental nano-vesicles in
preeclampsia. Sci. Rep. 7, 6694 (2017)
88. Viet, M.H., Ngo, S.T., Lam, N.S., Li, M.S.: Inhibition of aggregation of amyloid peptides by
beta-sheet breaker peptides and their binding affinity. J. Phys. Chem. B 115(22), 7433–7446
(2011)
89. Viet, M.H., Nguyen, P.H., Derreumaux, P., Li, M.S.: Effect of the English familial disease
mutation (H6R) on the monomers and dimers of Abeta40 and Abeta42. ACS Chem. Neurosci.
5(8), 646–657 (2014)
90. Viet, M.H., Nguyen, P.H., Ngo, S.T., Li, M.S., Derreumaux, P.: Effect of the Tottori familial
disease mutation (D7N) on the monomers and dimers of Abeta40 and Abeta42. ACS Chem.
Neurosci. 4(11), 1446–1457 (2013)
91. Wabik, J., Kmiecik, S., Gront, D., Kouza, M., Kolinski, A.: Combining coarse-grained protein
models with replica-exchange all-atom molecular dynamics. Int. J. Mol. Sci. 14(5), 9893–9905
(2013)
92. Walti, M.A., Ravotti, F., Arai, H., Glabe, C.G., Wall, J.S., Bockmann, A., Guntert, P., Meier,
B.H., Riek, R.: Atomic-resolution structure of a disease-relevant Abeta(1–42) amyloid fibril.
Proc. Natl. Acad. Sci. U S A 113(34), E4976–E4984 (2016)
93. Wang, J.N., Zhu, W.L., Li, G.H., Hansmann, U.H.E.: Velocity-scaling optimized replica
exchange molecular dynamics of proteins in a hybrid explicit/implicit solvent. J. Chem. Phys.
135(8), 084115 (2011)
94. Wu, C., Shea, J.E.: Coarse-grained models for protein aggregation. Curr. Opin. Struct. Biol.
21(2), 209–220 (2011)
95. Xi, W.H., Hansmann, U.H.E.: Ring-like N-fold models of Abeta(42) fibrils. Sci. Rep. 7, 40787
(2017)
96. Xi, W.H., Vanderford, E.K., Hansmann, U.H.E.: Out-of-register Abeta(42) assemblies as
models for neurotoxic oligomers and fibrils. J. Chem. Theory Comput. 14(2), 1099–1110
(2018)
97. Xi, W.H., Wang, W.H., Abbott, G., Hansmann, U.H.E.: Stability of a recently found triple-
beta-stranded Abeta 1–42 fibril motif. J. Phys. Chem. B 120(20), 4548–4557 (2016)
98. Xiao, Y.L., Ma, B.Y., McElheny, D., Parthasarathy, S., Long, F., Hoshi, M., Nussinov, R.,
Ishii, Y.: Abeta(1–42) fibril structure illuminates self-recognition and replication of amyloid
in Alzheimer’s disease. Nat. Struct. Mol. Biol. 22(6), 499 (2015)
99. Yan, L.M., Velkova, A., Tatarek-Nossol, M., Andreetto, E., Kapurniotu, A.: IAPP mimic
blocks Abeta cytotoxic self-assembly: cross-suppression of amyloid toxicity of Abeta and
IAPP suggests a molecular link between Alzheimer’s disease and type II diabetes. Angew.
Chem. Int. Ed. 46(8), 1246–1252 (2007)
100. Yasar, F., Bernhardt, N.A., Hansmann, U.H.E.: Replica-exchange-with-tunneling for fast
exploration of protein landscapes. J. Chem. Phys. 143(22), 224102 (2015)
101. Kouza, M., Co, N.T., Li, M.S., Kmiecik, S., Kolinski, A., Kloczkowski, A., Buhimschi, I.A.:
Kinetics and mechanical stability of the fibril state control fibril formation time of polypeptide
chains: A computational study. J. Chem. Phys. 148, 215106 (2018)
Part IV
Use of Structural Database or
Experimental Information in Modeling
Protein Structure and Dynamics
Bioinformatical Approaches
to Unstructured/Disordered Proteins
and Their Complexes
B. Mészáros · Z. Dosztányi
MTA-ELTE Momentum Bioinformatics Research Group, Eötvös Loránd University,
Budapest, Hungary
E. Fichó · C. Magyar · I. Simon (corresponding author)
Institute of Enzymology, RCNS, HAS, Budapest, Hungary
© Springer Nature Switzerland AG 2019
A. Liwo (ed.), Computational Methods to Study the Structure and Dynamics
of Biomolecules and Biomolecular Processes, Springer Series on Bio-
and Neurosystems 8, https://doi.org/10.1007/978-3-319-95843-9_17
For approximately the first 40 years of structural biology, the central model underlying
all biochemical studies was that a well-formed structure is a prerequisite for a
protein to carry out its function. This notion motivated a large number of
structure-function studies and has led to the structure determination of over 100,000
proteins to date. Although some proteins and protein segments were known that either
did not lend themselves to structure determination or had sequence features seemingly
incompatible with a folded structure (e.g. highly charged, repetitive sequence
regions), these were regarded as consequences of imperfect experimental conditions
or as exotic rarities of nature.
With the explosion of available genome sequences during the 1990s, the known number
of these ‘rarities’ and ‘experimental errors’ grew steadily to the point where they
could no longer be relegated to a side note. This forced molecular biologists to
reassess the structure-function paradigm [1]. The world of proteins was extended to
include proteins that do not require a stable, three-dimensional structure, even under
physiological conditions, in order to fulfill their biological role [2–4]. These
Intrinsically Unstructured/Disordered Proteins (IUPs/IDPs) lack a well-defined tertiary
structure in isolation and fluctuate between a multitude of conformations over time
and population. The importance of protein disorder is underlined by the abundance of
partially or fully disordered proteins encoded in higher eukaryotic genomes [5]. Using
bioinformatics methods (discussed in later sections) it was estimated that 30–50%
of eukaryotic proteins contain at least one long disordered segment. That protein
disorder is not merely tolerated but provides an evolutionary advantage is reflected
by studies showing a steady increase in the percentage of disordered proteins in
proteomes as organism complexity increases [6, 7]. Furthermore, disordered proteins
are involved in many critical processes [3] such as transcription, translation,
regulation, signal transduction and stress response, complementing the
functional repertoire of globular proteins [8].
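A proteome-level estimate of the kind quoted above can be sketched in a few lines once per-residue disorder scores are available. The example below is only an illustration under stated assumptions: the score cutoff of 0.5, the minimum segment length of 30 residues, the function names and the toy proteome are all hypothetical choices made here and are not taken from the chapter or from any particular predictor.

```python
from typing import Dict, List


def longest_disordered_run(scores: List[float], cutoff: float = 0.5) -> int:
    """Length of the longest run of consecutive residues scoring above cutoff."""
    best = current = 0
    for s in scores:
        current = current + 1 if s > cutoff else 0
        best = max(best, current)
    return best


def fraction_with_long_disorder(proteome: Dict[str, List[float]],
                                min_length: int = 30,
                                cutoff: float = 0.5) -> float:
    """Fraction of proteins with at least one disordered run of min_length residues."""
    if not proteome:
        return 0.0
    hits = sum(1 for scores in proteome.values()
               if longest_disordered_run(scores, cutoff) >= min_length)
    return hits / len(proteome)


# Toy per-residue scores (hypothetical values, assumed to lie between 0 and 1).
toy_proteome = {
    "protA": [0.9] * 40 + [0.1] * 60,               # one 40-residue disordered run
    "protB": [0.2] * 100,                           # fully ordered
    "protC": [0.8] * 10 + [0.1] * 50 + [0.7] * 29,  # longest run is 29, just too short
    "protD": [0.6] * 35,                            # fully disordered, 35 residues
}

print(fraction_with_long_disorder(toy_proteome))  # 0.5
```

Both thresholds strongly influence the resulting percentage, so any such estimate should be reported together with the cutoff and minimum length used.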
Characterization of IDPs based on their functions shows that disorder can help
these proteins fulfill their functions in various ways [9, 10]. In accord with the
wide variety of functions associated with it, protein disorder also comes in a variety
of forms. In some cases disordered regions are short and are found at the termini
of globular domains, such as the disordered N-terminal region of the eIF4E protein.
Similarly, globular domains can also harbor flexible loops that appear as missing
regions in solved structures. Flexible linkers that connect globular domains, such
as zinc fingers, represent another type of localized disorder. In another scenario,
especially in complex organisms, protein disorder often encompasses larger,
domain-sized regions. These regions can exhibit different degrees of flexibility,
ranging from the near-random conformation of the ACTR domain of the p160 protein,
through the presence of local transient secondary structural elements (such as in the
N-terminal region of p27), to compact molten globule regions with a considerable amount
of secondary structure but without stable tertiary structure, such as the nuclear
coactivator binding domain of the CBP protein.
Given the functional importance of disordered protein regions, their malfunction
is expected to have serious biological consequences. IDPs have been implicated
in various diseases, including neurodegenerative diseases, amyloidosis, diabetes,
cardiovascular diseases and cancer [11–14]. Although proteins involved
in these diseases were shown to have a higher disorder content, the exact role of
protein disorder in the diseases themselves is not fully understood. Probably most
results published to date concern the involvement of IDPs in cancer [15]. BRCA1,
p27, p21 and CBP are examples of proteins with a significant amount of disorder
that have been associated with various forms of cancer. One of the best characterized
disordered proteins, p53, is known to be directly inactivated in more than 50% of
cancers. At a more general level, a higher proportion of disordered proteins among
cancer-associated proteins has also been observed [15]. However, it has been shown that
the link between protein disorder and involvement in cancer is not causal. In
fact, both are strongly correlated with protein function, which links them together
[16]. This clearly calls for a more detailed understanding of the role of protein
disorder in various diseases.
Apart from basic research interests, the connection between protein disorder and
its role in diseases has implications for therapeutics as well. The pharmaceutical
industry is currently struggling to find promising new drug targets, despite substantial
increases in research funding. Drug discovery rates seem to have reached a plateau
or are perhaps even declining, suggesting the need for new strategies. Until recently,
it was unclear whether proteins without a well-defined structure could feasibly be
targeted in drug development [17]. There is now, however, a newly sparked
interest in IDPs as potential drug targets [18]. This is supported by the discovery of
specific inhibitors that block the interaction between a disordered region of p53 and
the folded MDM2, or between the disordered helix-loop-helix type transcription
factors c-Myc and Max. Recognizing the relevance of these proteins stimulated more
systematic efforts aimed at their structural characterization and the determination of
their mechanisms of action.
As the above pharmaceutical examples show, the study of interactions involving
IDPs is of special interest, with relevance from both a therapeutic viewpoint
and a basic research perspective. With the exception of a few
known disordered proteins, such as entropic chains (where the biological function is
directly mediated by disorder without interaction, as in the case of the MAP2 projection
domain, titin’s PEVK domain and the nucleoporin complex), most disordered
proteins function by binding specifically to other proteins, DNA or RNA. The lack
of structure in the unbound form has a profound effect on both the binding process
and the properties of the resulting complex [19, 20], although this effect varies
depending on the type and structural features of the partner molecule. In the most studied
scenario, the IDP interacts with ordered (globular) protein partner(s) via a process
termed coupled folding and binding. In these cases, the flexibility of the disordered
partner decreases upon binding. As a result, the complex usually lends
itself to traditional structure determination. The main source of protein structures,
the Protein Data Bank (PDB) [21], contains several hundred (or possibly even more)
such cases. These examples demonstrate the distinct differences between complexes
involving disordered proteins and complexes formed exclusively by ordered
globular proteins. Although in both cases the proteins have a stable structure in the
complex form, many of the distinct properties of bound IDPs betray their inherent
flexibility in their free form [22, 23].
In most cases, disordered segments adopt a largely extended and open conforma-
tion in the complex. Probably one of the most characteristic features of disordered
binding regions is that they are usually well localized in the sequence—in about 70%
of the cases the interacting residues can be mapped to a single continuous region of
residues. These localized interacting regions allow IDPs to have an increased modu-
larity as different binding regions can be incorporated into the same protein without
excessively increasing protein length. These binding regions can be close to each
other or can form mutually exclusive overlapping sites creating molecular switches.
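Whether the interacting residues of a chain map to a single continuous region, as described above, can be checked directly from a list of interface residue numbers. The sketch below is a minimal illustration, not the procedure used by the authors: the tolerance `max_gap` (how many sequence positions may separate two interface residues that still count as one region), the function names and the residue lists are all hypothetical.

```python
from typing import Iterable, List, Tuple


def contiguous_segments(residues: Iterable[int],
                        max_gap: int = 3) -> List[Tuple[int, int]]:
    """Group residue numbers into (start, end) segments.

    Two interface residues belong to the same segment when their
    sequence positions differ by at most max_gap (an assumed tolerance).
    """
    segments: List[Tuple[int, int]] = []
    for r in sorted(set(residues)):
        if segments and r - segments[-1][1] <= max_gap:
            segments[-1] = (segments[-1][0], r)  # extend the current segment
        else:
            segments.append((r, r))              # start a new segment
    return segments


def is_single_region(residues: Iterable[int], max_gap: int = 3) -> bool:
    """True when all interface residues fall into one continuous region."""
    return len(contiguous_segments(residues, max_gap)) == 1


# Hypothetical interface residue numbers for two chains.
linear_site = [25, 26, 27, 29, 30, 33, 34]   # sequentially local binding site
patchy_site = [12, 13, 14, 58, 59, 101]      # surface patch built from distant segments

print(contiguous_segments(linear_site))  # [(25, 34)]
print(contiguous_segments(patchy_site))  # [(12, 14), (58, 59), (101, 101)]
print(is_single_region(linear_site), is_single_region(patchy_site))  # True False
```

Applied over a set of complexes, the share of chains for which `is_single_region` returns true gives a figure comparable in spirit to the roughly 70% quoted in the text, though the exact value depends on the chosen gap tolerance and interface definition.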
The distinct binding mode of IDPs is also reflected in the physico-chemical nature
of their interfaces. These interfaces are more hydrophobic, and the preferred interac-
tion contacts are also significantly different compared to ordered proteins. As opposed
to the large number of polar-polar interactions at globular interfaces, IDPs tend to
favor hydrophobic-hydrophobic contacts with the partner protein. The increased
importance of hydrophobic interactions during binding is a hallmark of the com-
plexes involving IDPs [24].
Figure 1 shows a complex of three proteins that illustrates both a typical interaction
between two ordered proteins and an interaction between ordered proteins and a
disordered protein. The solved structure is that of the complex between the ordered
cyclinA and cyclin-dependent kinase 2 (CDK2) proteins, inhibited by the disordered p27
protein. The interaction between cyclinA and CDK2 plays an essential role in the control
of the transition between the S and G2 cell-cycle phases of eukaryotic cells. The specific
interaction between the two proteins enables CDK2 to bind ATP owing to slight structural
rearrangements that occur upon binding. The interaction surface is dominated
by polar and charged residues and is relatively planar. The interface is of moderate
size relative to the size of the proteins: about 13–14% of the residues of both
cyclinA and CDK2 visible in the structure are directly involved in the binding.
Fig. 1 Example of interfaces between two ordered proteins and a disordered protein. The ordered
CDK2 and cyclinA are shown in blue and purple surface representations, respectively. The
disordered p27 is shown in golden cartoon representation. The figure was generated from the 1jsu
PDB file
A strikingly different molecular recognition scenario is presented by the disordered
p27 in the complex. The segment of p27 involved in the binding shows only a weak
helical preference in the unbound form. However, several regions adopt a well-defined
α-helix upon binding. The group of the most strongly interacting residues of
p27 is dominated by hydrophobic/aromatic residues that fit into hydrophobic clefts
and grooves on the surface of the cyclinA-CDK2 complex. The structure shows that
the interacting region of p27 forms a largely linear binding site in the sense that all
residues of p27 interacting with the ordered partner complex are sequentially close.
This enables p27 to incorporate a significantly larger fraction of its residues into the
interaction, and accordingly over two-thirds of the visible residues of p27 are directly
involved in the binding.
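The fraction of a chain's residues that take part in an interface, the quantity compared above for cyclinA, CDK2 and p27, can be estimated from coordinates with a simple distance criterion. The following sketch is an assumption-laden illustration rather than the authors' method: it uses one representative point per residue, an arbitrary 8 Å cutoff, and made-up toy coordinates instead of the actual 1jsu structure.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def interface_fraction(chain: Dict[int, Coord],
                       partner: Dict[int, Coord],
                       cutoff: float = 8.0) -> float:
    """Fraction of residues in `chain` lying within `cutoff` of any partner residue.

    Each residue is represented by a single point (e.g. its C-alpha atom);
    both this simplification and the cutoff value are assumptions.
    """
    if not chain:
        return 0.0
    in_contact = [
        res for res, xyz in chain.items()
        if any(math.dist(xyz, p) <= cutoff for p in partner.values())
    ]
    return len(in_contact) / len(chain)


# Toy geometry: the partner surface is a row of points along the x axis.
partner_surface = {i: (3.8 * i, 0.0, 0.0) for i in range(10)}

# An extended peptide lying along the surface: every residue is in contact.
extended_peptide = {i: (3.8 * i, 6.0, 0.0) for i in range(6)}

# A chain that touches the surface with one end only.
end_on_chain = {i: (0.0, 6.0 + 4.0 * i, 0.0) for i in range(5)}

print(interface_fraction(extended_peptide, partner_surface))  # 1.0
print(interface_fraction(end_on_chain, partner_surface))      # 0.2
```

The toy values only illustrate the contrast between an extended, surface-hugging segment and a chain that contacts its partner through a small patch; real values require parsing atomic coordinates from a structure file.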
In the case of disordered proteins, the coupling between folding and binding
is not only apparent in the structural properties of the resulting complex, but also
in the thermodynamics and energetics of the binding. Following the basic rules of
thermodynamics, the resulting protein complex corresponds to the state with an
energy minimum. However—as opposed to the interaction of globular proteins—in
complexes involving IDPs, the loss of entropy during the folding of the disordered
partner plays a much larger role, which results in a weaker overall binding compared
to that of globular proteins. In this way, specificity, which is essentially independent
of the entropic terms, is uncoupled from binding strength [20]. This enables IDPs
to form specific, yet transient interactions, which are indispensable for regulatory
and signaling processes [3, 9, 10]. The increased rate of association and dissociation
of disordered proteins increases their temporal binding capacity. Furthermore,
disordered proteins are able to incorporate a higher fraction of their surface in the
binding interface, which increases their interaction capacity in a spatial sense as well
[25]. Consequently, disordered proteins in general can mediate a large number of
interactions, thus serving as hubs of protein-protein interaction networks [26].
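The entropic argument above can be made explicit with the standard thermodynamic relations for binding. The decomposition below is a schematic sketch: the symbols $\Delta S_{\mathrm{fold}}$ (conformational entropy lost when the disordered partner folds), $\Delta S_{\mathrm{other}}$ (all remaining entropy changes) and the standard concentration $c^{\circ}$ are notation introduced here for illustration and do not appear in the chapter.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}}, \\
\Delta S_{\mathrm{bind}} &= \Delta S_{\mathrm{other}} + \Delta S_{\mathrm{fold}},
  \qquad \Delta S_{\mathrm{fold}} < 0 \ \text{for a disordered partner}, \\
K_d &= c^{\circ} \exp\!\left(\frac{\Delta G_{\mathrm{bind}}}{RT}\right).
\end{align}
```

Because $-T\,\Delta S_{\mathrm{fold}} > 0$, the folding penalty makes $\Delta G_{\mathrm{bind}}$ less negative and therefore raises $K_d$, while the enthalpic contacts that determine which partner is recognized are unchanged. As a rough guide, every additional $RT\ln 10 \approx 1.4$ kcal/mol of penalty at 298 K weakens binding by about one order of magnitude in $K_d$.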
In the previously described scenario of coupled folding and binding, the interacting
IDP partner reaches the ordered state by using the ordered protein partner(s) as a
template that drives the folding process. However, some IDPs are able to adopt stable
structures during interactions without a pre-existing folded partner. As a counterpart
to coupled folding and binding, in the case of Mutual Synergistic Folding (MSF)
[27] all interacting partners are IDPs without stable tertiary structures outside of
the complex. During MSF, the folding of all participating protein partners happens
at the same time, coupled to the interaction in a synergistic manner. The first
well-documented case of MSF was presented in the early 2000s, describing the co-folding
of ACTR and CBP [27]. This study presented multiple lines of evidence from structural
and thermodynamic analyses for both the unstructured state of the interactors and
the fact that they combine with high affinity to form a cooperatively folded helical
heterodimer. The complex structure arising from this specific MSF interaction is
shown in Fig. 2.
The inherent difficulty of the experimental analysis of IDPs had a profound effect
on the advancement of the field of MSF studies. While interest was sparked early on
during the development of the IDP field in general, targeted analyses had been scarce
and typically considered only a handful of known examples [22, 28–30]. However,
recent efforts have produced a comprehensive and systematic catalogue of MSF
complexes, serving as grounds for future analysis of this binding mode [31]. While
a detailed understanding of the biophysical and biochemical features of MSF is
yet to be achieved, complex structures consisting exclusively of IDPs offer deep
insights into the underlying mechanisms even at a first glance.
As all constituent proteins in an MSF complex lack a stable structure in their
unbound form, there is no single ordered template upon which the folding occurs.
Instead, the participating IDPs form a stable hydrophobic core together, with typically
all partners donating a sufficient number of hydrophobic residues. The emerging
structures often closely resemble single domain ordered protein structures in terms
of hydrophobic core size, secondary structure content or average contact numbers,
albeit being composed of several chains. In accord, individual IDPs involved in MSF
have fairly high hydrophobic contents, on par with that of ordered proteins. Due to
this property, IDPs undergoing MSF present a special class of disorder. The most
universal hallmark of IDPs is their lack of hydrophobic residues and high net charge,
and IDPs capable of MSF defy this quasi-ubiquitous feature. Instead, their disordered
nature stems from the special sequential arrangement of their hydrophobic residues.
Furthermore, while their composition is compatible with folding, their sizes are not;
the sheer number of hydrophobic residues in a single IDP partner is insufficient for the
independent adoption of a folded structure. This shows that a deep understanding
and possible future modulation of the whole spectrum of IDP-mediated interactions
require targeted, systematic analyses in the years to come.
is rather limited with numbers in the hundreds or low thousands. This is especially
alarming in light of the fact that about half of human proteins are estimated to contain
at least one longer disordered segment. This discrepancy faithfully reflects the dif-
ficulties of the experimental identification of disordered proteins. Because of these
difficulties, bioinformatics tools that target the prediction of protein disorder from the
sequence play a very important role in the identification and characterization of IDPs
as only these tools can give us information about their basic properties, evolution
and functions on a large scale.
The experimental difficulties often hindering efficient analyses of IDPs called for the
development of bioinformatics/theoretical approaches. Possibly the most prominent
computational task addressed almost instantly in the field of IDPs is the development
of efficient algorithms for the prediction of protein disorder from the amino acid
sequence. As with all bioinformatics prediction algorithms, IDP predictions present
issues at several different levels. These include the buildup of the prediction algorithm
itself; but the proper choice of training and testing databases and the correct evaluation
of the resulting method are equally important. In the following section we give a
brief summary of the resources enabling the development of IDP-focused prediction
methods; while in the next chapter we give an overview of the basic concepts and
techniques of disorder prediction methods.
2.2 Databases
two databases differ not only in the length of these segments, but encompass two
different flavors of protein disorder.
While data from DisProt and the PDB can highlight disordered protein regions in
general, there are other databases that have a more specialized focus, concentrating on
the interacting segments of IDPs. Recent efforts in the systematic collection of inter-
actions between IDPs and protein partners have produced two distinct, yet closely
related databases. The Disordered Binding Sites (DIBS) [43] database collects cases
where an IDP interacts with an ordered protein partner via coupled folding and
binding. In contrast, the Mutual Folding Induced by Binding (MFIB) [31] database
is the repository of complexes formed exclusively by IDPs via mutual synergistic
folding. While the target interactions differ for the two databases, their underlying
approach, their architecture, and the information provided by them are highly sim-
ilar. Both databases contain the complex structures of the listed interactions. These
entries are manually inspected by database curators with a focus on the validity of
the experimental evidence for the ordered/disordered state of constituent proteins
to assure reliability. DIBS and MFIB also provide structural and functional annota-
tions of the complexes and crosslinks to other databases, as well. In addition, DIBS
also collects the dissociation constants of the interactions where available, as well as
the description of potential post-translational modifications modulating the binding
strength.
In contrast to the databases discussed so far, IDEAL (Intrinsically Disordered
proteins with Extensive Annotations and Literature) [44] is a collection of both
generic IDP regions and also disordered binding segments (albeit with a lower
stringency considering experimental verification). The database contains manual
annotations on IDP regions, and also contains annotations about interacting regions,
post-translational modification sites, and structural domain assignments. While the
primary focus of IDEAL used to be the general collection of disordered protein
regions, the newer incarnations of the database feature a new functional class of
IDP regions, called protean segments (ProS). A ProS is defined as a region of the
sequence, which is suspected to be disordered in isolation (although at times lacking
experimental proof) but which is known to be ordered when bound to a protein partner.
Such defined protein regions coincide with regions undergoing coupled folding and
binding/mutual synergistic folding; thus, with the introduction of ProS, conceptually
IDEAL can be considered to lie between generic disorder databases and disordered
binding site repositories.
All above databases that incorporate IDP-mediated interactions require a stable
bound structure for IDPs. However, several protein- and nucleic acid-interacting IDPs
were found to retain varying degrees of flexibility in their complex form. In recent
years the number of these so-called fuzzy complexes [45] has grown steadily, and a dedicated
database, FuzDB, was established to collect experimentally verified instances of
such interactions [46]. FuzDB currently contains over 100 fuzzy complexes, together
with their structural and biochemical evidence for disorder. The database also pro-
vides interpretation of experimental results, together with additional information
about the interactors (such as regulatory sites generated by alternative splicing or
post-translational modifications).
of highly specific scenarios [60], predicted secondary structures are not expected to
be stable for disordered proteins.
The incorporation of sequence profiles calculated from evolutionarily related
sequences is also more problematic in the case of disordered proteins. The strong
sequence bias present in these proteins, especially in low complexity segments can
distort the result of sequence similarity searches. Generally, disordered proteins are
evolutionarily less conserved [61], but the dynamic behavior and the associated
molecular function can be preserved even in the absence of apparent sequence con-
servation [62]. As a result, alignments are a less reliable source of information for
disordered protein segments. Although several methods use evolutionary information
in the prediction, it leads to a smaller boost in the performance of disorder prediction
methods than observed for example in the case of secondary structure prediction
methods [63].
Most prediction methods provide predictions on a per-residue basis. The performance
of disorder predictions can be evaluated using the Matthews correlation
coefficient (MCC), balanced accuracy (ACC) that weighs the performance on the
positive and negative datasets based on the respective size of the datasets, and the
area under the receiver operating characteristic (ROC) curve (AUC, with possible
values ranging from 0.5 for random predictions to 1.0 for perfect predictors). Since
2002, the performance of various disorder prediction methods has been critically
assessed at the CASP (Critical Assessment of Protein Structure Prediction) experi-
ments [64–69]. CASP evaluations are restricted to residues with missing X-ray coor-
dinates and there is no similar blind testing for long disordered regions. According to
the CASP10 assessment, top disorder prediction methods can reach 0.90 AUC [70]
and around 70% ACC (evaluation of disorder prediction methods was discontinued
after CASP10 due to insufficient amount of data on disordered residues). Testing on
disordered regions culled from DisProt usually places different methods at the top.
On these datasets, methods can discriminate between ordered and disordered segments
with around 80% accuracy on a per-residue basis [63, 71–73]. A recent novel
benchmarking dataset collected in the new release of the DisProt database confirmed
that disorder predictors work quite well, especially for long disordered segments.
However, a large fraction of such regions still goes virtually undetected [73]. Gen-
erally, the performance of disorder predictors critically depends on the dataset used
for testing, or more generally, the type of disorder studied. It is also influenced by
the evaluation criteria. Nevertheless, modern disorder prediction methods can be
considered quite reliable in general.
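The three measures mentioned above can be made concrete with a short sketch. The following Python functions are an illustration only, not the CASP evaluation code: the function names, the toy labels and scores, and the 0.5 cutoff used to binarize the scores are assumptions. Balanced accuracy is computed here as the mean of sensitivity and specificity, and the AUC as the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered one.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = disordered, 0 = ordered)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 if it is undefined."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; both classes must be present."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both ordered and disordered residues are required")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of scores; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both ordered and disordered residues are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 3 disordered and 5 ordered residues.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s > 0.5 else 0 for s in scores]

print(round(mcc(y_true, y_pred), 3))                # 0.467
print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.733
print(round(roc_auc(y_true, scores), 3))            # 0.933
```

The pairwise AUC is quadratic in the number of residues, which is acceptable for a demonstration but would be replaced by a rank-based computation for proteome-scale benchmarks.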
programs, and provide residue based predictions. A summary of these methods can
be found in Table 1 at the end of the section.
The first member of the PONDR-family of disorder prediction methods was
PONDR VL-XT [39]. The training set of this method was composed of variously
characterized long (>30 residues) disordered regions [75], and two additional training
sets of X-ray-characterized terminal regions, one for the amino-terminus and one for
the carboxy-terminus [34]. The method uses the amino acid compositions, attributes
derived from compositions such as sequence complexity, and attributes derived from
compositions via some function or scale such as hydropathy, net charge, etc. The
attributes were selected by analyzing their discriminatory power, their orthogonality,
and based on their effect on the performance. Then, the various types of attributes
were weighted and combined via artificial neural networks (ANNs). The resulting
method was found particularly useful to pinpoint certain regions that are candidates
for undergoing disorder-to-order transitions [76, 77].
PONDR-VL3-BA [78, 79] also uses an artificial neural network but the training
dataset was much larger compared to that of VL-XT. The input is formed by 18
amino acid frequencies, the average flexibility and sequence complexity, calculated
within a window of 41 residues. Sequence profiles generated by PSI-BLAST [80] can
also be added as an input attribute to improve the accuracy of predicting disordered
regions. Similarly to VL-XT, a neural network with a fully connected hidden layer of
ten neurons was trained on the specific datasets and it outputs a value for the central
amino acid in the window. These predictions are augmented by a specific predictor
that was trained to recognize the boundary between ordered and disordered regions.
Based on this, the closest maximum prediction from the boundary predictor became
the new boundary between the ordered and disordered regions.
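To illustrate the kind of windowed input described above, the following sketch computes amino acid frequencies and a simple sequence complexity measure in a 41-residue window centered on a given position. It is not the PONDR implementation: the function name is hypothetical, all 20 amino acid types are used rather than the 18 selected by the authors, Shannon entropy stands in for the sequence complexity attribute, truncating the window at the chain termini is an assumption, and the flexibility scale is omitted.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(seq, center, window=41):
    """Return (frequencies, entropy) for the window centered at `center`.

    `center` is a 0-based index. The window is truncated at the termini,
    so positions near the ends are described by fewer residues.
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(seq), center + half + 1)
    segment = seq[start:end]
    counts = Counter(segment)
    n = len(segment)
    # Frequencies of the 20 standard residue types inside the window.
    freqs = [counts.get(aa, 0) / n for aa in AMINO_ACIDS]
    # Shannon entropy (in bits) as a simple proxy for sequence complexity.
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return freqs, entropy


# A low-complexity stretch has low entropy; a mixed one has higher entropy.
_, low = window_features("S" * 60, 30)
_, high = window_features("ACDEFGHIKLMNPQRSTVWY" * 3, 30)
print(low < high)  # True
```

In a predictor of this type, such a feature vector would be computed for every position and passed to the trained network, which returns a score for the central residue.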
DisEMBL, another computational tool for predicting disordered/unstructured
regions, was developed by Linding et al. [81]. Because of some uncertainties in the
definition of protein disorder, they developed three separate neural network based pre-
dictors using alternative definitions of disorder. These correspond to missing residues
indicated by REMARK 465 in the PDB files, residues with high B-factor (hot loops)
and residues within loops and coils. The differences in these three predictors under-
lined the distinct features of each group. By investigating the relationships between
the different disorder definitions, they found that hot loops showed less correlation
with coils and more with the missing residues.
Using an original approach, RONN (Regional Order Neural Network) [82] recog-
nizes disordered segments based on their similarity to well-characterized prototype
sequences with known disordered status. In this method, sub-sequences of a query
sequence are aligned to all prototype segments, and the similarity to these sequence
fragments is calculated using a standard mutation matrix. The resulting homology
scores are converted into distances and are used to train a modified version of radial
basis function networks called a bio-basis function neural network.
Table 1 (continued)
| Name of method | Training dataset for disorder | Algorithm | Input data of the algorithm |
| DISOclust3 [84, 99] | Structures from PDB | Consensus | Structural models, predicted disorder |
| IUPred [92] | None | Biophysical model | Amino acid composition |
| IsUnstruct [100] | | Biophysical model | Amino acid propensities |
Column 2 shows the dataset on which the methods were trained, column 3 shows the basic implemented
algorithm and column 4 shows the quantities the algorithm uses to calculate the final
prediction score. Abbreviations: SVM Support vector machine; PSSM Position specific scoring
matrix
Along with artificial neural networks, the most widely used class of standard
machine learning algorithms are support vector machines (SVMs). SVMs have several
advantages over neural networks as they are less prone to overfitting, can be
trained more efficiently and handle noisy datasets better. SVMs can also handle
unbalanced datasets, which is the case for disordered residues defined based on missing
residues, as these usually comprise only 10% of all residues. The first method
utilizing SVMs for the prediction of disorder was implemented in DISOPRED2 [7].
This method was trained on a large dataset of missing residues of high resolution
structures. Separate models were created for N- and C-terminal regions besides the
model for the middle regions of the sequences. The input of the predictions is a
sequence profile for each protein, generated using a PSI-BLAST search [80] against
a filtered sequence database. One of the keys to the high accuracy of DISOPRED2
was that it was trained by placing a larger cost on false positive predictions. The latest
version of this method is DISOPRED3 [83], which has a two-layer design. The first
layer uses three models to predict disorder: DISOPRED2, a neural network based
method trained on long disordered regions, and a model based on nearest neighbour
prediction of disorder. These predictions are combined by a second layer using a
neural network that helps to increase the accuracy of the predictions. DISOPRED3
is also capable of predicting disordered binding sites using SVM based techniques.
DISOclust3 also relies on the DISOPRED3 predictions but it also incorporates
structural information for the prediction of disorder. The main premise of this
approach is that structured residues are conserved in three-dimensional space across
multiple structural models. Residues missing or exhibiting high variations in certain
positions across the models are highly likely to be disordered. These predictions are
combined with the results generated by DISOPRED3. DISOclust3 is now part of the
IntFOLD [84] platform.
In the case of feed forward neural networks and SVMs, the prediction for each
residue is independent of the prediction for other residues. In contrast, recurrent
networks can also propagate data from later processing stages to earlier stages. Such a
technique is used in DISpro [58]. It employs a one-dimensional recursive neural
network that combines the flexibility of a Bayesian model with the fast and convenient
parameterization of neural networks. The method also incorporates evolutionary
information as well as predicted secondary structure and solvent accessibility. Instead
of using a fixed window size, the prediction at each position depends on the entire
and a predictor of disorder content, DisCon [95]. DisCoP uses a regression model to
produce a new disorder prediction from seven methods (DisProt and X-ray versions
of EsPritz, CSpritz [96], SPINE-D [97], DISOPRED2, MD [72] and DISOclust)
selected empirically to maximize predictive performance. It was shown that the
consensus-based method offers a better performance compared to other predictors.
Meta approaches that integrate the results of several prediction methods have
been very successful in various areas of structure predictions [98] and appeared for
the prediction of protein disorder as well. These methods achieve improved perfor-
mance by decreasing the noise of individual predictors. Since individual disorder
prediction methods are often specific to certain types of protein disorder, their combination
could cover more aspects of disorder. The last rounds of the CASP experiment
were clearly dominated by meta-predictors [69]. Nevertheless, there is still an urgent
need for specialized predictors that can accurately capture certain types of disorder.
Although these predictors might be inferior to meta-predictors in certain evaluations,
they provide more insights into the structural and even the functional properties of
disordered regions.
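The noise-reduction idea behind meta-predictors can be sketched in a few lines. The example below is a deliberately simple stand-in, not the scheme of any published method: it takes the unweighted mean of per-residue scores from several predictors (published meta-predictors use regression or neural networks instead) and then reports the segments scoring above a cutoff. The function names, the toy scores and the 0.5 cutoff are assumptions.

```python
def consensus_scores(predictions):
    """Unweighted mean of per-residue scores from several predictors."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictors must score the same residues")
    return [sum(column) / len(column) for column in zip(*predictions)]


def disordered_segments(scores, threshold=0.5, min_length=1):
    """Return 0-based, inclusive (start, end) runs scoring above threshold."""
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i  # a new disordered run begins
        elif score <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores) - 1))
    return segments


# Two invented predictor outputs for a four-residue toy chain.
predictor_a = [0.9, 0.8, 0.2, 0.1]
predictor_b = [0.7, 0.4, 0.4, 0.3]
mean_scores = consensus_scores([predictor_a, predictor_b])
print(disordered_segments(mean_scores))  # [(0, 1)]
```

Note that the second residue is called ordered by one predictor and disordered by the other; the averaged score resolves the disagreement, which is the effect a consensus is meant to exploit.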
enough favorable intrachain contacts, it will not adopt a stable position in the 3D
structure of the chain. If such residues are clustered along a segment of a protein or
the whole protein, then this segment or the entire protein will be disordered.
The implementation of the above principle in IUPred is done taking an energetics
point of view. For globular proteins, the contribution of interresidue interactions
to total energy is often approximated by low-resolution force fields, or statistical
potentials, which are energy-like quantities derived from globular proteins based on
the observed amino acid pairing frequencies [103]. In deriving the actual potentials,
different principles have been applied. The resulting empirical energy functions are
well suited to assess the quality of structural models and have been used for fold
recognition or threading but also in docking, ab initio folding, or predicting protein
stability. Their success in a wide range of applications suggests the existence of a
common set of interactions, simultaneously favored in all native—as opposed to
alternate—structures.
In the case of IUPred, a dedicated statistical potential is optimized to estimate
the pairwise interaction energies between residues. The total pairwise energy E of
a protein in its native state is the sum of the energies of all the pairwise residue-
residue interactions in the protein. E is the function of the conformation as well
as the amino acid sequence, as they define the list of residue-residue interactions
that have a contribution to the total energy. This total energy can be calculated by
taking all contacts in the protein, and weighting them by the corresponding interaction
energies. The interaction energy between any two types of amino acids can be inferred
by calculating the frequency of interactions between these two types in a dataset of
known protein structures. These frequencies are transformed into interaction energies
using the Boltzmann hypothesis [104] and are described by the 20 by 20 interaction
energy matrix of amino acid pairs, M. Hence, the pairwise energy content calculated
based on the structure can be written as:
$$E_{\mathrm{calculated}} = \sum_{i,j} M_{ij}\, C_{ij} \qquad (1)$$
where $M_{ij}$ is the interaction energy between amino acid types i and j, and $C_{ij}$ is the
number of interactions between residues of types i and j in the given conformation.
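As a worked illustration of Eq. (1), the sketch below counts the contacts between residue types from a list of contacting positions and weights them with an interaction energy matrix. Everything specific in it is an assumption made for the example: the function names, the three energy values (invented, not taken from any published statistical potential), the toy sequence and contact list, and the convention of counting each unordered pair of residue types once.

```python
from collections import Counter


def contact_counts(sequence, contacts):
    """C_ij: number of contacts per unordered pair of residue types.

    `contacts` is a list of (a, b) pairs of 0-based residue positions
    that are in contact in the given conformation.
    """
    counts = Counter()
    for a, b in contacts:
        pair = tuple(sorted((sequence[a], sequence[b])))
        counts[pair] += 1
    return counts


def pairwise_energy(M, counts):
    """E = sum over residue type pairs of M_ij * C_ij (Eq. 1)."""
    return sum(M[pair] * n for pair, n in counts.items())


# Invented energies for a two-letter alphabet (arbitrary units).
M = {("A", "A"): -0.5, ("A", "K"): 0.2, ("K", "K"): 1.0}
sequence = "AKAA"
contacts = [(0, 2), (0, 3), (1, 3)]  # two A-A contacts and one A-K contact

C = contact_counts(sequence, contacts)
print(round(pairwise_energy(M, C), 3))  # -0.8
```

With a real potential, M would be the 20 by 20 matrix described in the text, and the contact list would be derived from the atomic coordinates using a distance cutoff.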
This energy calculation, however, assumes the knowledge of the 3D structure of
the protein and as such, is not directly applicable to proteins whose structure cannot
be determined. To get around this problem, a novel estimation scheme was
established and implemented in IUPred to enable the estimation of the E interaction
energy without the structure, using the protein sequence alone. The rationale behind
this approach is that the energy contribution of a residue depends not only on its
amino acid type, but also on its potential partners in the sequence. It is assumed that
if the sequence contains more amino acid residues that can form favorable contacts
with the given residue, its expected energy contribution will be more favorable. The
simplest approximating formula for the specific estimated pairwise energy can be
expressed with a quadratic formula as:
$$\frac{E_{\mathrm{estimated}}}{L} = \sum_{i,j} P_{ij}\, f_i\, f_j \qquad (2)$$
$$E_k^j = \sum_{i=1}^{20} P_{ij}\, f_i^k(w_0) \qquad (3)$$
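The quadratic approximation of Eq. (2) can be illustrated as follows, reading it as the estimated energy per residue being the sum of P_ij f_i f_j over all pairs of amino acid types, and assuming that f_i denotes the fraction of residues of type i in the sequence and L the sequence length. The function names are hypothetical and the entries of P are invented for a two-letter alphabet; the optimized IUPred parameters are not reproduced here.

```python
from collections import Counter


def composition(seq):
    """f_i: fraction of residues of each type present in the sequence."""
    n = len(seq)
    return {aa: count / n for aa, count in Counter(seq).items()}


def estimated_energy_per_residue(seq, P):
    """E_estimated / L = sum_ij P_ij * f_i * f_j (Eq. 2)."""
    f = composition(seq)
    return sum(P[a][b] * f[a] * f[b] for a in f for b in f)


# Invented, symmetric energy predictor matrix (arbitrary units).
P = {
    "A": {"A": -1.0, "K": 0.5},
    "K": {"A": 0.5, "K": 2.0},
}

# f_A = 0.75 and f_K = 0.25, so the sum is
# -1.0 * 0.5625 + 2 * 0.5 * 0.1875 + 2.0 * 0.0625 = -0.25
print(round(estimated_energy_per_residue("AAAK", P), 3))  # -0.25
```

The example needs only the amino acid composition, which is the point of the estimation scheme: no contact list or 3D structure enters the calculation.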
Fig. 3 Screenshot of the IUPred server output for the human Wiskott-Aldrich protein. The horizon-
tal axis represents the protein chain and the vertical axis represents the probability of each residue
to be disordered. Residues with values above 0.5 are predicted to be disordered and values below
0.5 indicate an ordered structure
As discussed in Sects. 1.2 and 1.3, many disordered proteins carry out important
functions via binding to other proteins that involves coupled folding and binding or
mutual synergistic folding. Due to their specific functional and structural properties,
these binding regions have distinct properties compared to both globular proteins and
disordered proteins in general, and these properties—in principle—enable the con-
struction of prediction algorithms to recognize them from the protein sequence. While
there are many algorithms for predicting IDPs, apparently the choice of methods for
predicting regions undergoing disorder-to-order transition upon protein binding is
rather limited.
The first publicly available method for the prediction of disordered binding regions
undergoing coupled folding and binding was ANCHOR [6]. ANCHOR aims to cap-
ture the basic biophysical properties of disordered binding segments. The essential
feature of these regions is that they exist in a disordered state in isolation, but they can
favorably interact with a globular protein and adopt a rigid conformation upon binding.
In this model the combination of the high disorder tendency of the sequential
environment, and high energetic gain by interacting with a globular protein partner
indicates the presence of a disordered binding region. The implementation of these
principles follows the basic idea behind IUPred, and these criteria for the presence
of a disordered binding region are quantified with the use of estimated energies.
The testing of ANCHOR showed that the predictor recognizes around 70% of
disordered binding regions, while falsely predicting only 5% of residues in ordered
proteins. As the available dataset for experimentally verified disordered protein com-
plexes is limited in size, the benefit of using physical models instead of machine learn-
ing algorithms is evident. Another strength of ANCHOR comes from the fact that
the efficiency of the prediction is largely independent of the amino acid composition
of the query protein. For example, acidic binding regions, such as certain calmod-
ulin binding sites, are recovered with approximately the same success rate as proline
rich binding regions, such as SH2 and SH3 domain binding sites, or hydrophobic
sites, such as the MDM2 binding region of p53. Furthermore, the quality of the
prediction is also independent of the conformation the binding region adopts in the
bound state. This independence also shows the generality of ANCHOR.
The method combines the transparency of simplified biophysical models with the
usability of bioinformatical approaches.
The predictions obtained with IUPred and ANCHOR are demonstrated through
the example of the human calcium/calmodulin-dependent protein kinase IV (UniProt
ID: Q16566), shown in Fig. 4. The plot was generated with the online version
of ANCHOR [106], available at http://anchor.enzim.hu/ and http://anchor.elte.hu/.
Calcium/calmodulin-dependent kinase IV binds to calmodulin near its C-terminal
end (residues 322–341). This patch is correctly identified using ANCHOR as shown
in the figure. The binding region can also be identified based on one of the subclasses
of calmodulin binding motifs, namely the basic 1-8-14 binding motif consisting of
three positively charged residues followed by three hydrophobic ones in the 1st, 8th
and 14th position C-terminal from the positive sequence patch. The location of this
motif is also indicated in the figure.
Although IUPred and ANCHOR rely on the same approach and use the same
interaction energy prediction scheme, their outputs are distinctly different. However,
IUPred also reacts to the presence of disordered binding regions: as can be seen
from the example presented in Fig. 4, disordered binding regions tend to appear
more ordered than their surrounding disordered protein segments. This tendency
is not exclusive to IUPred; many other disorder prediction outputs reflect binding
regions in a similar way. In the case of PONDR VL-XT the presence of these ‘dips’
in the prediction profile was exploited to construct a disordered binding region pre-
diction algorithm [76, 77]. In this framework, regions undergoing a coupled folding
and binding process adopting an α-helical conformation in their bound form were
targeted. These regions, termed α-MoRFs (molecular recognition features) were pre-
dicted using the local drops in the prediction score as an input to a neural network that
was trained on known examples of α-helical binding sites. The neural network then
tries to discriminate the potential binding regions using various sequence features,
including disorder, secondary structure predictions and amino acid indices.
Fig. 4 Output of the ANCHOR prediction server for calcium/calmodulin-dependent protein kinase
IV. The plot shows the predicted disordered binding regions in blue with the output of the general
disorder prediction method IUPred in red and the location of the calmodulin binding motif in orange
The construction of the PONDR-based α-MoRF prediction algorithm marked
the introduction of machine learning approaches into the field of disordered bind-
ing site prediction. This line of research has been actively pursued in recent years
yielding novel prediction algorithms with increasing efficiency. The first publicly
available MoRF prediction algorithm was MoRFpred, which is able to predict bind-
ing regions of IDPs regardless of their bound structures [107]. MoRFpred utilizes an
SVM architecture with various sequence features—evolutionary profiles, predicted
disorder, relative solvent accessibility, physicochemical properties—as input. The
latest incarnation of the MoRF family of disordered binding site prediction methods
is available at the MoRFchibi SYSTEM site [108]. The basis of the suite is the MoR-
Fchibi method, which is a significantly improved version of MoRFpred and can also
be easily integrated into custom bioinformatics analysis pipelines. The suite also
offers MoRFchibiWeb, which utilizes a meta design, meaning it predicts putative
MoRF annotation computed by MoRFchibi, while improving the predictive performance.
The server offers a third variant of the method, MoRFchibiLIGHT, which is a
more lightweight and run-time optimized version of the algorithm, best suited for
large-scale computation tasks.
5 Linear Motifs
As discussed in the previous section and Sects. 1.2 and 1.3, the study of protein-
protein interactions formed by disordered proteins is based on structural considera-
tions. However, the study of interactions between protein domains and short, linear
protein regions—a description which fits most cases of IDPs undergoing coupled
folding and binding—has a distinctly separate approach as well, with the use of
linear motifs.
Linear motifs, also referred to as short linear motifs (SLiMs) or minimotifs, are
short functional sites typically found in disordered protein regions [109]. In the
framework of linear motifs, the interaction is not described focusing on the short
disordered partner, but the larger one, which is usually a protein domain. It was
found for many domains such as SH2/SH3, 14-3-3, WW and kinase domains that their
interacting partners—albeit in many cases not being homologues—share a limited
number (typically between 2 and 10) of common residues in the short interaction
region [110, 111]. Apart from these residues, the binding region also incorporates
other, flexible positions that can contain various amino acids without disrupting
the binding [112].
Fig. 5 The figure shows the known interaction partners of nuclear receptors that all bind using the
same binding mode. The upper left structure shows a solved complex structure (based on PDB entry
1m2z) between a small region of the human NCOA2 nuclear receptor coactivator (shown in red and
yellow) and a glucocorticoid receptor (shown in blue). Although the actual sequences around the
binding region do not share a high level of similarity, they all contain three key leucine residues.
These three amino acids interspersed and flanked by flexible positions constitute the consensus
LIG_NRBOX motif (shown in red in the structure and the partner sequences)
Figure 5 shows the example of nuclear receptors that are able
to bind a large variety of protein partners. Although most partner proteins are not
homologues, they all share three key leucine residues at their interacting sites. During
the interaction, the region that binds to the receptor forms an α-helix and the three
leucines form a hydrophobic patch on the surface of the helix. This patch in turn
recognizes the appropriate complementary hydrophobic region of the interface of
the receptor, and anchors the helix to the binding groove. The consensus sequence
of the binding region is xLxxLLx, where x can stand for any amino acid, except for
proline, as it would disrupt the helix formation. This motif is called LIG_NRBOX and
ligands of many nuclear receptors are able to recognize their receptor partners via this
sequence pattern. The theory of linear motifs, used to describe such interactions, is
based on the assumption that these common residues (constituting the motif) mediate
the binding largely independent of the other regions of the protein they are embedded
in, functioning autonomously. However, in many cases the role of the context was
shown to be larger than originally expected [113].
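Because a linear motif is defined at the sequence level, searching for candidate sites reduces to pattern matching. The sketch below transcribes the consensus xLxxLLx (x being any residue except proline) into a regular expression and scans a sequence for it. The function name and the test peptides are illustrative assumptions, and a match is only a candidate: as noted above, the sequence context and the structural state of the region also matter.

```python
import re

# xLxxLLx with x = any residue except proline. The lookahead makes the
# search report overlapping candidates as well.
NRBOX = re.compile(r"(?=([^P]L[^P][^P]LL[^P]))")


def find_nrbox(seq):
    """Return (1-based start position, matched segment) for each candidate."""
    return [(m.start() + 1, m.group(1)) for m in NRBOX.finditer(seq.upper())]


# Illustrative peptides: the second carries a proline at the first x position.
print(find_nrbox("GSKHKILHRLLQDSS"))  # [(6, 'ILHRLLQ')]
print(find_nrbox("GSKHKPLHRLLQDSS"))  # []
```

In practice such a scan would be combined with a disorder or disordered binding region prediction to filter out matches that fall inside folded domains.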
The majority of protein-protein interaction mediating linear motifs were described
in eukaryotes. Currently the largest and most comprehensive available database of
these motifs is the Eukaryotic Linear Motif (ELM) database [114]. Motifs are cat-
egorized according to the type of interaction partners and functions (cleavage sites,
degradation sites, docking sites, ligand binding sites, post-translational modification
sites and targeting signals). Although the majority of these motifs were described in
The disordered binding region and the linear motif concepts describe molecular
interactions on different bases: the former focusing on the structure (or the initial
lack and the formation of it) and the latter approaching the problem through the
sequence. However, the interactions described by the two concepts share a high
degree of similarity. In both cases the interaction is confined to a relatively short, linear
sequence region in one of the partners. Furthermore, most experimentally described
linear motif instances were found in disordered protein regions. Accordingly, in many
cases, such as the binding of p53 to MDM2 and the N-terminal region of p27 binding
to the cyclinA-CDK2 complex, the same interaction was categorized as an example
of both linear motif mediated binding and of disordered binding regions. Through
many common examples, both the binding of disordered proteins and linear motifs
have been shown to play vital roles in eukaryotic regulation and signaling [117], as
well as serving as target points for viruses [115]. Apart from individual examples,
the connection between protein disorder and motif regulation has also been shown
at a more general level [118].
Despite the very different approaches used to describe interactions via disordered
binding and linear motifs, the two fields not only share a large number of com-
mon examples but also struggle with essentially the same problems. Probably the
most serious bottleneck in both cases is the low number of experimentally verified
examples. About 50% of human proteins are predicted to contain at least one larger
disordered region, and it was shown that the primary reason for the emergence of
these regions is to harbor binding regions [6]. In contrast, the number of experi-
mentally verified disordered regions collected in the DisProt database is in the low
thousands [41] and the number of known disordered binding regions is even lower
[43]. Similarly, a moderate estimate places the number of individual motif-mediated
interactions in the human proteome alone above 35,000 [119]. Despite this high esti-
mated occurrence, the number of experimentally verified, true motif instances in all
eukaryotic proteins described in the ELM database has only reached 3000. While it
is clear that the two concepts—linear motifs and disordered binding regions—could
be used in conjunction to strengthen each other’s predictions, this connection between
the two fields is yet to be established in detail.
Disorder prediction methods can be used in two different ways. On the one hand, they can
be used in large scale studies where many proteins are analyzed. These projects usu-
ally aim to uncover statistically meaningful differences between classes of proteins,
for example considering proteomes of different organisms, with regard to disorder
content. In this scenario usually only longer, contiguous disordered segments are
considered, and short runs (typically fewer than 20 or 30 residues) predicted to be
disordered are filtered out. In this setup, methods that are trained to recognize longer
stretches of disordered residues, such as IUPred, PONDR VL3-BA or MFDp2 clearly
have an advantage. Practically all state-of-the-art methods assign to each residue a
continuous score, which represents the probability of it being disordered. However,
when using these methods, this score is converted to a binary classification. Residues
with scores above a predetermined threshold are classified as disordered, and residues
with lower scores are assigned an ordered status. It is worth noting, however, that
various methods are optimized for different false prediction rates (usually in the
2–15% range) and the predetermined cutoff is set accordingly. In comparative
studies, where the basic questions are similar to “which of these groups of
proteins contains more disorder” or “how does the disorder content of proteomes
change during evolution”, this does not affect the final results to a great extent, but it
should be kept in mind that the actual numbers depend on the choice of algorithm.
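The conversion from per-residue scores to filtered disordered segments can be sketched as follows. This is an illustrative sketch only: the function names, the default cutoff of 0.5 and the default minimum length of 30 are assumptions, and the appropriate cutoff depends on the method used.

```python
def disordered_segments(scores, threshold=0.5, min_length=30):
    """Return 1-based inclusive (start, end) runs of residues whose score
    is above the threshold, keeping only runs of at least min_length."""
    segments = []
    run_start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = i  # a new disordered run begins here
        elif run_start is not None:
            if i - run_start >= min_length:
                segments.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(scores) - run_start >= min_length:
        segments.append((run_start + 1, len(scores)))
    return segments


def disorder_content(scores, threshold=0.5, min_length=30):
    """Fraction of residues that lie in the retained disordered segments."""
    if not scores:
        return 0.0
    kept = disordered_segments(scores, threshold, min_length)
    return sum(end - start + 1 for start, end in kept) / len(scores)
```

Running the same score profile with different `threshold` or `min_length` values shows how strongly the reported disorder content depends on these choices.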
The other typical use of disorder prediction methods is the analysis of individual
proteins. In these cases the different false positive rates of various methods present a
problem that needs to be addressed, as the choice of method clearly affects the results.
Although this in theory can be circumvented by re-calibrating various methods on
a standardized dataset, this solution is not feasible for casual users. Furthermore,
the fact that various methods are optimized for various typical lengths of disorder
presents an additional level of difficulty when choosing a single method to use.
These considerations point towards the combined use of disorder prediction methods
when investigating individual protein sequences. A good starting point can be the
application of methods sensitive to larger, contiguous regions of order/disorder to
establish the basic structural composition of the protein in question. As a next step,
methods capable of detecting more localized disorder regions—such as OnD-CRF,
DisEMBL or DISOPRED3—can be applied.
Probably one of the most difficult tasks from the viewpoint of successful disor-
der prediction is presented by partial or transient structural elements. In the case of
stable, globular domains, or highly flexible disordered regions without a strong struc-
tural preference, most methods tend to show good agreement. However, considering
regions with partial or transient structure, such as molten globules, coiled-coil regions
or some disordered binding regions, almost all methods react to the underlying struc-
tural preferences with a lowered prediction score [120]. This type of behavior and
the resulting lack of a clear consensus prediction is highly characteristic of these
structurally ambiguous regions, and for the experienced researcher it can serve
as a telltale sign. However, for the successful identification of the nature of the
underlying structural reasons, dedicated predictions—such as ANCHOR for identi-
fying disordered binding regions or COILS [121] for the identification of coiled-coil
regions—are indispensable.
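One simple way to locate regions where predictors disagree is to compare their binary calls residue by residue. The sketch below assumes that all scores have already been brought to a common scale with a shared cutoff; the function name and the input layout (a dictionary from method name to score list) are hypothetical.

```python
def ambiguous_positions(predictions, threshold=0.5):
    """Return 1-based positions where the predictors give conflicting
    order/disorder calls.

    predictions maps a method name to its per-residue scores, all on a
    common scale and all computed for the same sequence.
    """
    lengths = {len(scores) for scores in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same sequence")
    n_residues = lengths.pop()
    flagged = []
    for i in range(n_residues):
        calls = [scores[i] > threshold for scores in predictions.values()]
        # Flag the residue when some, but not all, methods call it disordered.
        if any(calls) and not all(calls):
            flagged.append(i + 1)
    return flagged
```

Flagged stretches are candidates for follow-up with dedicated predictors; the disagreement itself does not say which kind of partial structure is present.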
In the next section we present a case study, where the responses of various prediction
methods are demonstrated for ordered, disordered and disordered binding regions of
the human p53 protein.
Fig. 6 Predictions for human p53 (UniProt AC: P04637). In the case of OnD-CRF and DISO-
PRED3 the original prediction scores were rescaled linearly to be directly comparable with other
methods. Disordered predictions were sorted top to bottom by decreasing average predicted disorder
tendency. The central, ordered DNA binding domain (DBD) is shown in red and experimentally
verified disordered binding regions (TAD1/2, tetramerization- and C-terminal regulatory domains)
are shown in green. The rest of the protein is disordered and is shown in white. Underneath the
disorder prediction outputs, the known biologically relevant linear motifs are shown with black and
grey boxes for ligand binding and sub-cellular localization target motifs, respectively. The middle
line (Predicted secondary structure) shows the secondary structure prediction by PSIPRED, with
black and striped boxes indicating predicted α-helical and β structures, respectively. The bottom
line shows the disordered binding site prediction by ANCHOR. Shading of the boxes corresponds
to the overall confidence of the predicted binding region, with darker shades indicating a higher
confidence
Acknowledgements This work was supported by grants from the Hungarian Research and Developments
Fund (OTKA K108798 for Z.D. and K115698 for I.S.), the “Momentum” grant from the Hungarian
Academy of Sciences (LP2014-18) for Z.D. The János Bolyai Research Scholarship of the Hungarian
Academy of Sciences for C.M. is also gratefully acknowledged. We would like to thank
Mark Adamsbaum for his critical reading of the manuscript.
References
1. Wright, P.E., Dyson, H.J.: Intrinsically unstructured proteins: re-assessing the protein
structure-function paradigm. J. Mol. Biol. 293, 321–331 (1999). https://doi.org/10.1006/jmbi.
1999.3110
2. Dunker, A.K., Lawson, J.D., Brown, C.J., et al.: Intrinsically disordered protein. J. Mol. Graph.
Model. 19, 26–59 (2001)
3. Dyson, H.J., Wright, P.E.: Intrinsically unstructured proteins and their functions. Nat. Rev.
Mol. Cell Biol. 6, 197–208 (2005). https://doi.org/10.1038/nrm1589
4. Tompa, P.: Intrinsically unstructured proteins. Trends Biochem. Sci. 27, 527–533 (2002)
5. Dunker, A.K., Obradovic, Z., Romero, P., et al.: Intrinsic protein disorder in complete
genomes. Genome Inform Ser Workshop Genome Inform 11, 161–171 (2000)
6. Mészáros, B., Simon, I., Dosztányi, Z.: Prediction of protein binding regions in disordered pro-
teins. PLoS Comput. Biol. 5, e1000376 (2009). https://doi.org/10.1371/journal.pcbi.1000376
7. Ward, J.J., Sodhi, J.S., McGuffin, L.J., et al.: Prediction and functional analysis of native
disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337, 635–645 (2004).
https://doi.org/10.1016/j.jmb.2004.02.002
8. Xie, H., Vucetic, S., Iakoucheva, L.M., et al.: Functional anthology of intrinsic disorder. 1.
Biological processes and functions of proteins with long disordered regions. J. Proteome Res.
6, 1882–1898 (2007). https://doi.org/10.1021/pr060392u
9. Tompa, P.: The interplay between structure and function in intrinsically unstructured proteins.
FEBS Lett. 579, 3346–3354 (2005). https://doi.org/10.1016/j.febslet.2005.03.072
10. Galea, C.A., Wang, Y., Sivakolundu, S.G., Kriwacki, R.W.: Regulation of cell division by
intrinsically unstructured proteins: intrinsic flexibility, modularity, and signaling conduits.
Biochemistry 47, 7598–7609 (2008). https://doi.org/10.1021/bi8006803
11. Uversky, V.N., Oldfield, C.J., Dunker, A.K.: Intrinsically disordered proteins in human dis-
eases: introducing the D2 concept. Annu Rev Biophys 37, 215–246 (2008). https://doi.org/
10.1146/annurev.biophys.37.032807.125924
12. Cheng, Y., LeGall, T., Oldfield, C.J., et al.: Abundance of intrinsic disorder in protein associated
with cardiovascular disease. Biochemistry 45, 10448–10460 (2006). https://doi.org/
10.1021/bi060981d
13. Uversky, V.N.: Intrinsic disorder in proteins associated with neurodegenerative
diseases. Front Biosci. 14, 5188 (2009). https://doi.org/10.2741/3594
14. Uversky, V.N., Oldfield, C.J., Midic, U., et al.: Unfoldomics of human diseases: linking
protein intrinsic disorder with diseases. BMC Genom. 10(Suppl 1), S7 (2009). https://doi.
org/10.1186/1471-2164-10-S1-S7
15. Iakoucheva, L.M., Brown, C.J., Lawson, J.D., et al.: Intrinsic disorder in cell-signaling and
cancer-associated proteins. J. Mol. Biol. 323, 573–584 (2002)
16. Pajkos, M., Mészáros, B., Simon, I., Dosztányi, Z.: Is there a biological cost of protein
disorder? Analysis of cancer-associated mutations. Mol. BioSyst. 8, 296–307 (2012). https://
doi.org/10.1039/c1mb05246b
17. Cheng, Y., LeGall, T., Oldfield, C.J., et al.: Rational drug design via intrinsically disordered
protein. Trends Biotechnol. 24, 435–442 (2006). https://doi.org/10.1016/j.tibtech.2006.07.
005
18. Metallo, S.J.: Intrinsically disordered proteins are potential drug targets. Curr. Opin. Chem.
Biol. 14, 481–488 (2010). https://doi.org/10.1016/j.cbpa.2010.06.169
19. Uversky, V.N.: Natively unfolded proteins: a point where biology waits for physics. Protein
Sci. 11, 739–756 (2002). https://doi.org/10.1110/ps.4210102
20. Dyson, H.J., Wright, P.E.: Coupling of folding and binding for unstructured proteins. Curr.
Opin. Struct. Biol. 12, 54–60 (2002)
21. Berman, H.M.: The protein data bank. Nucleic Acids Res. 28, 235–242 (2000). https://doi.
org/10.1093/nar/28.1.235
22. Gunasekaran, K., Tsai, C.-J., Nussinov, R.: Analysis of ordered and disordered protein com-
plexes reveals structural features discriminating between stable and unstable monomers. J.
Mol. Biol. 341, 1327–1341 (2004). https://doi.org/10.1016/j.jmb.2004.07.002
23. Mészáros, B., Tompa, P., Simon, I., Dosztányi, Z.: Molecular principles of the interactions of
disordered proteins. J. Mol. Biol. 372, 549–561 (2007). https://doi.org/10.1016/j.jmb.2007.
07.004
24. Wright, P.E., Dyson, H.J.: Linking folding and binding. Curr. Opin. Struct. Biol. 19, 31–38
(2009). https://doi.org/10.1016/j.sbi.2008.12.003
25. Uversky, V.N., Oldfield, C.J., Dunker, A.K.: Showing your ID: intrinsic disorder as an ID for
recognition, regulation and cell signaling. J. Mol. Recognit. 18, 343–384 (2005). https://doi.
org/10.1002/jmr.747
26. Dosztányi, Z., Chen, J., Dunker, A.K., et al.: Disorder and sequence repeats in hub proteins
and their implications for network evolution. J. Proteome Res. 5, 2985–2995 (2006). https://
doi.org/10.1021/pr060171o
27. Demarest, S.J., Martinez-Yamout, M., Chung, J., et al.: Mutual synergistic folding in recruit-
ment of CBP/p300 by p160 nuclear receptor coactivators. Nature 415, 549–553 (2002). https://
doi.org/10.1038/415549a
28. Rumfeldt, J.A.O., Galvagnion, C., Vassall, K.A., Meiering, E.M.: Conformational stability
and folding mechanisms of dimeric proteins. Prog. Biophys. Mol. Biol. 98, 61–84 (2008).
https://doi.org/10.1016/j.pbiomolbio.2008.05.004
29. Tsai, C.-J., Nussinov, R.: Hydrophobic folding units at protein-protein interfaces: implications
to protein folding and to protein-protein association. Protein Sci. 6, 1426–1437 (1997). https://
doi.org/10.1002/pro.5560060707
30. Nussinov, R., Xu, D., Tsai, C.-J.: Mechanism and evolution of protein dimerization. Protein
Sci. 7, 533–544 (1998). https://doi.org/10.1002/pro.5560070301
31. Fichó, E., Reményi, I., Simon, I., Mészáros, B.: MFIB: a repository of protein complexes
with mutual folding induced by binding. Bioinformatics 33, 3682–3684 (2017). https://doi.
org/10.1093/bioinformatics/btx486
32. Bracken, C., Iakoucheva, L.M., Romero, P.R., Dunker, A.K.: Combining prediction, compu-
tation and experiment for the characterization of protein disorder. Curr. Opin. Struct. Biol.
14, 570–576 (2004). https://doi.org/10.1016/j.sbi.2004.08.003
33. Garner, E., Cannon, P., Romero, P., et al.: Predicting disordered regions from amino acid
sequence: common themes despite differing structural characterization. Genome Inform Ser
Workshop Genome Inform 9, 201–213 (1998)
34. Li, X., Romero, P., Rani, M., et al.: Predicting protein disorder for N-, C-, and internal regions.
Genome Inform Ser Workshop Genome Inform 10, 30–40 (1999)
35. Radivojac, P., Obradovic, Z., Smith, D.K., et al.: Protein flexibility and intrinsic disorder.
Protein Sci. 13, 71–80 (2004). https://doi.org/10.1110/ps.03128904
36. He, B., Wang, K., Liu, Y., et al.: Predicting intrinsic disorder in proteins: an overview. Cell
Res. 19, 929–949 (2009). https://doi.org/10.1038/cr.2009.87
37. Wootton, J.C.: Non-globular domains in protein sequences: automated segmentation using
complexity measures. Comput. Chem. 18, 269–285 (1994)
38. Wootton, J.C., Federhen, S.: Analysis of compositionally biased regions in sequence
databases. Methods Enzymol. 266, 554–571 (1996)
39. Romero, P., Obradovic, Z., Li, X., et al.: Sequence complexity of disordered protein. Proteins
Struct. Funct. Genet. 42, 38–48 (2000). https://doi.org/10.1002/1097-0134(20010101)42:1%
3c38:aid-prot50%3e3.0.co;2-3
40. Vucetic, S., Obradovic, Z., Vacic, V., et al.: DisProt: a database of protein disorder. Bioinfor-
matics 21, 137–140 (2005). https://doi.org/10.1093/bioinformatics/bth476
41. Piovesan, D., Tabaro, F., Mičetić, I., et al.: DisProt 7.0: a major update of the database of
disordered proteins. Nucleic Acids Res. 45, D219–D227 (2017). https://doi.org/10.1093/nar/
gkw1056
42. Dutta, S., Burkhardt, K., Young, J., et al.: Data deposition and annotation at the worldwide
protein data bank. Mol. Biotechnol. 42, 1–13 (2009). https://doi.org/10.1007/s12033-008-
9127-7
43. Schad, E., Fichó, E., Pancsa, R., et al.: DIBS: a repository of disordered binding sites mediating
interactions with ordered proteins. Bioinformatics 34, 535–537 (2018). https://doi.org/10.
1093/bioinformatics/btx640
44. Fukuchi, S., Sakamoto, S., Nobe, Y., et al.: IDEAL: intrinsically disordered proteins with
extensive annotations and literature. Nucleic Acids Res. 40, D507–D511 (2012). https://doi.
org/10.1093/nar/gkr884
45. Tompa, P., Fuxreiter, M.: Fuzzy complexes: polymorphism and structural disorder in protein-
protein interactions. Trends Biochem. Sci. 33, 2–8 (2008). https://doi.org/10.1016/j.tibs.2007.
10.003
46. Miskei, M., Antal, C., Fuxreiter, M.: FuzDB: database of fuzzy complexes, a tool to develop
stochastic structure-function relationships for protein complexes and higher-order assemblies.
Nucleic Acids Res. 45, D228–D235 (2017). https://doi.org/10.1093/nar/gkw1019
47. Piovesan, D., Tabaro, F., Paladin, L., et al.: MobiDB 3.0: more annotations for intrinsic disor-
der, conformational diversity and interactions in proteins. Nucleic Acids Res. 46, D471–D476
(2017). https://doi.org/10.1093/nar/gkx1071
48. Ulrich, E.L., Akutsu, H., Doreleijers, J.F., et al.: BioMagResBank. Nucleic Acids Res. 36,
D402–D408 (2007). https://doi.org/10.1093/nar/gkm957
49. Necci, M., Piovesan, D., Dosztányi, Z., Tosatto, S.C.E.: MobiDB-lite: fast and highly specific
consensus prediction of intrinsic disorder in proteins. Bioinformatics 33, 1402–1404 (2017).
https://doi.org/10.1093/bioinformatics/btx015
50. Oates, M.E., Romero, P., Ishida, T., et al.: D2P2: database of disordered protein predictions.
Nucleic Acids Res. 41, D508–D516 (2013). https://doi.org/10.1093/nar/gks1226
51. Mohan, A., Uversky, V.N., Radivojac, P.: Influence of sequence changes and environment on
intrinsically disordered proteins. PLoS Comput. Biol. 5, e1000497 (2009). https://doi.org/10.
1371/journal.pcbi.1000497
52. De Biasio, A., Guarnaccia, C., Popovic, M., et al.: Prevalence of intrinsic disorder in the
intracellular region of human single-pass type I proteins: the case of the notch ligand Delta-4.
J. Proteome Res. 7, 2496–2506 (2008). https://doi.org/10.1021/pr800063u
53. Uversky, V.N., Gillespie, J.R., Fink, A.L.: Why are “natively unfolded” proteins unstructured
under physiologic conditions? Proteins 41, 415–427 (2000)
54. Galzitskaya, O.V., Garbuzynskiy, S.O., Lobanov, M.Y.: FoldUnfold: web server for the pre-
diction of disordered regions in protein chain. Bioinformatics 22, 2948–2949 (2006). https://
doi.org/10.1093/bioinformatics/btl504
55. Xie, Q., Arnold, G.E., Romero, P., et al.: The sequence attribute method for determining
relationships between sequence and protein disorder. Genome Inform Ser Workshop Genome
Inform 9, 193–200 (1998)
56. Campen, A., Williams, R.M., Brown, C.J., et al.: TOP-IDP-scale: a new amino acid scale
measuring propensity for intrinsic disorder. Protein Pept. Lett. 15, 956–963 (2008)
57. Linding, R., Russell, R.B., Neduva, V., Gibson, T.J.: GlobPlot: exploring protein sequences
for globularity and disorder. Nucleic Acids Res. 31, 3701–3708 (2003)
58. Cheng, J., Sweredoski, M.J., Baldi, P.: Accurate prediction of protein disordered regions by
mining protein structure data. Data Min. Knowl. Discov. 11, 213–222 (2005). https://doi.org/
10.1007/s10618-005-0001-y
59. Fuxreiter, M., Simon, I., Friedrich, P., Tompa, P.: Preformed structural elements feature in part-
ner recognition by intrinsically unstructured proteins. J. Mol. Biol. 338, 1015–1026 (2004).
https://doi.org/10.1016/j.jmb.2004.03.017
60. Süveges, D., Gáspári, Z., Tóth, G., Nyitray, L.: Charged single alpha-helix: a versatile protein
structural motif. Proteins 74, 905–916 (2009). https://doi.org/10.1002/prot.22183
61. Brown, C.J., Takayama, S., Campen, A.M., et al.: Evolutionary rate heterogeneity in proteins
with long disordered regions. J. Mol. Evol. 55, 104–110 (2002). https://doi.org/10.1007/
s00239-001-2309-6
62. Daughdrill, G.W., Narayanaswami, P., Gilmore, S.H., et al.: Dynamic behavior of an intrinsi-
cally unstructured linker domain is conserved in the face of negligible amino acid sequence
conservation. J. Mol. Evol. 65, 277–288 (2007). https://doi.org/10.1007/s00239-007-9011-2
63. Peng, K., Radivojac, P., Vucetic, S., et al.: Length-dependent prediction of protein intrinsic
disorder. BMC Bioinform. 7, 208 (2006). https://doi.org/10.1186/1471-2105-7-208
64. Melamud, E., Moult, J.: Evaluation of disorder predictions in CASP5. Proteins 53(Suppl 6),
561–565 (2003). https://doi.org/10.1002/prot.10533
65. Jin, Y., Dunbrack Jr., R.L.: Assessment of disorder predictions in CASP6. Proteins 61(Suppl
7), 167–175 (2005). https://doi.org/10.1002/prot.20734
66. Bordoli, L., Kiefer, F., Schwede, T.: Assessment of disorder predictions in CASP7. Proteins
69(Suppl 8), 129–136 (2007). https://doi.org/10.1002/prot.21671
67. Noivirt-Brik, O., Prilusky, J., Sussman, J.L.: Assessment of disorder predictions in CASP8.
Proteins 77(Suppl 9), 210–216 (2009). https://doi.org/10.1002/prot.22586
68. Monastyrskyy, B., Fidelis, K., Moult, J., et al.: Evaluation of disorder predictions in CASP9.
Proteins 79(Suppl 10), 107–118 (2011). https://doi.org/10.1002/prot.23161
69. Monastyrskyy, B., Kryshtafovych, A., Moult, J., et al.: Assessment of protein disorder region
predictions in CASP10. Proteins 82(Suppl 2), 127–137 (2014). https://doi.org/10.1002/prot.
24391
70. Liu, Y., Wang, X., Liu, B.: A comprehensive review and comparison of existing computational
methods for intrinsically disordered protein and region prediction. Brief. Bioinform. (2017).
https://doi.org/10.1093/bib/bbx126
71. Dosztányi, Z., Sándor, M., Tompa, P., Simon, I.: Prediction of protein disorder at the domain
level. Curr. Protein Pept. Sci. 8, 161–171 (2007)
72. Schlessinger, A., Punta, M., Yachdav, G., et al.: Improved disorder prediction by combination
of orthogonal approaches. PLoS ONE 4, e4433 (2009). https://doi.org/10.1371/journal.pone.
0004433
73. Necci, M., Piovesan, D., Dosztányi, Z., et al.: A comprehensive assessment of long intrinsic
protein disorder from the DisProt database. Bioinformatics 34, 445–452 (2018). https://doi.
org/10.1093/bioinformatics/btx590
74. Meng, F., Uversky, V.N., Kurgan, L.: Comprehensive review of methods for prediction of
intrinsic disorder and its molecular functions. Cell. Mol. Life Sci. 74, 3069–3090 (2017).
https://doi.org/10.1007/s00018-017-2555-4
75. Romero, Obradovic, Dunker, K.: Sequence data analysis for long disordered regions prediction
in the calcineurin family. Genome Inform Ser Workshop Genome Inform 8, 110–124 (1997)
76. Oldfield, C.J., Cheng, Y., Cortese, M.S., et al.: Coupled folding and binding with alpha-helix-
forming molecular recognition elements. Biochemistry 44, 12454–12470 (2005). https://doi.
org/10.1021/bi050736e
77. Cheng, Y., Oldfield, C.J., Meng, J., et al.: Mining alpha-helix-forming molecular recognition
features with cross species sequence alignments. Biochemistry 46, 13468–13477 (2007).
https://doi.org/10.1021/bi7012273
78. Radivojac, P., Obradović, Z., Brown, C.J., Dunker, A.K.: Prediction of boundaries between
intrinsically ordered and disordered protein regions. Pac. Symp. Biocomput. 216–227 (2003)
79. Obradovic, Z., Peng, K., Vucetic, S., et al.: Predicting intrinsic disorder from amino acid
sequence. Proteins 53(Suppl 6), 566–572 (2003). https://doi.org/10.1002/prot.10532
80. Schaffer, A.A.: Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements. Nucleic Acids Res. 29, 2994–3005 (2001).
https://doi.org/10.1093/nar/29.14.2994
81. Linding, R., Jensen, L.J., Diella, F., et al.: Protein disorder prediction. Structure 11, 1453–1459
(2003). https://doi.org/10.1016/j.str.2003.10.002
82. Yang, Z.R., Thomson, R., McNeil, P., Esnouf, R.M.: RONN: the bio-basis function neural
network technique applied to the detection of natively disordered regions in proteins. Bioin-
formatics 21, 3369–3376 (2005). https://doi.org/10.1093/bioinformatics/bti534
83. Jones, D.T., Cozzetto, D.: DISOPRED3: precise disordered region predictions with anno-
tated protein-binding activity. Bioinformatics 31, 857–863 (2015). https://doi.org/10.1093/
bioinformatics/btu744
84. McGuffin, L.J., Atkins, J.D., Salehe, B.R., et al.: IntFOLD: an integrated server for mod-
elling protein structures and functions from amino acid sequences. Nucleic Acids Res. 43,
W169–W173 (2015). https://doi.org/10.1093/nar/gkv236
85. Cheng, J., Randall, A.Z., Sweredoski, M.J., Baldi, P.: SCRATCH: a protein structure and
structural feature prediction server. Nucleic Acids Res. 33, W72–W76 (2005). https://doi.org/
10.1093/nar/gki396
86. Wang, L., Sauer, U.H.: OnD-CRF: predicting order and disorder in proteins using [corrected]
conditional random fields. Bioinformatics 24, 1401–1402 (2008). https://doi.org/10.1093/
bioinformatics/btn132
87. Wang, S., Weng, S., Ma, J., Tang, Q.: DeepCNF-D: predicting protein order/disorder regions
by weighted deep convolutional neural fields. Int. J. Mol. Sci. 16, 17315–17330 (2015).
https://doi.org/10.3390/ijms160817315
88. Obradovic, Z., Peng, K., Vucetic, S., et al.: Exploiting heterogeneous sequence properties
improves prediction of protein disorder. Proteins Struct. Funct. Bioinf 61, 176–182 (2005).
https://doi.org/10.1002/prot.20735
89. Walsh, I., Martin, A.J.M., Di Domenico, T., Tosatto, S.C.E.: ESpritz: accurate and fast
prediction of protein disorder. Bioinformatics 28, 503–509 (2012). https://doi.org/10.1093/
bioinformatics/btr682
90. Xue, B., Dunbrack, R.L., Williams, R.W., et al.: PONDR-FIT: a meta-predictor of intrinsically
disordered amino acids. Biochim. Biophys. Acta 1804, 996–1010 (2010). https://doi.org/10.
1016/j.bbapap.2010.01.011
91. Prilusky, J., Felder, C.E., Zeev-Ben-Mordehai, T., et al.: FoldIndex: a simple tool to predict
whether a given protein sequence is intrinsically unfolded. Bioinformatics 21, 3435–3438
(2005). https://doi.org/10.1093/bioinformatics/bti537
92. Dosztányi, Z., Csizmók, V., Tompa, P., Simon, I.: The pairwise energy content estimated from
amino acid composition discriminates between folded and intrinsically unstructured proteins.
J. Mol. Biol. 347, 827–839 (2005). https://doi.org/10.1016/j.jmb.2005.01.071
93. Mizianty, M.J., Peng, Z., Kurgan, L.: MFDp2. Intrinsically Disordered Proteins 1, e24428
(2013). https://doi.org/10.4161/idp.24428
94. Fan, X., Kurgan, L.: Accurate prediction of disorder in protein chains with a comprehensive
and empirically designed consensus. J. Biomol. Struct. Dyn. 32, 448–464 (2014). https://doi.
org/10.1080/07391102.2013.775969
95. Mizianty, M.J., Zhang, T., Xue, B., et al.: In-silico prediction of disorder content using
hybrid sequence representation. BMC Bioinform. 12, 245 (2011). https://doi.org/10.1186/
1471-2105-12-245
96. Walsh, I., Martin, A.J.M., Di Domenico, T., et al.: CSpritz: accurate prediction of protein dis-
order segments with annotation for homology, secondary structure and linear motifs. Nucleic
Acids Res. 39, W190–W196 (2011). https://doi.org/10.1093/nar/gkr411
97. Zhang, T., Faraggi, E., Xue, B., et al.: SPINE-D: accurate prediction of short and long disor-
dered regions by a single neural-network based method. J. Biomol. Struct. Dyn. 29, 799–813
(2012). https://doi.org/10.1080/073911012010525022
98. Bujnicki, J.M., Elofsson, A., Fischer, D., Rychlewski, L.: LiveBench-2: large-scale automated
evaluation of protein structure prediction servers. Proteins Suppl. 5, 184–191 (2001)
99. McGuffin, L.J.: Intrinsic disorder prediction from the analysis of multiple protein fold recog-
nition models. Bioinformatics 24, 1798–1804 (2008). https://doi.org/10.1093/bioinformatics/
btn326
100. Lobanov, M.Y., Galzitskaya, O.V.: The Ising model for prediction of disordered residues from
protein sequence alone. Phys. Biol. 8, 035004 (2011). https://doi.org/10.1088/1478-3975/8/
3/035004
101. Lobanov, M.Y., Sokolovskiy, I.V., Galzitskaya, O.V.: IsUnstruct: prediction of the residue
status to be ordered or disordered in the protein chain by a method based on the Ising model. J.
Biomol. Struct. Dyn. 31, 1034–1043 (2013). https://doi.org/10.1080/07391102.2012.718529
102. Dosztányi, Z.: Prediction of protein disorder based on IUPred. Protein Sci. 27, 331–340
(2018). https://doi.org/10.1002/pro.3334
103. Thomas, P.D., Dill, K.A.: An iterative method for extracting energy-like quantities from
protein structures. Proc. Natl. Acad. Sci. U S A 93, 11628–11633 (1996)
104. Shortle, D.: Propensities, probabilities, and the Boltzmann hypothesis. Protein Sci. 12,
1298–1302 (2003). https://doi.org/10.1110/ps.0306903
105. Dosztányi, Z., Csizmók, V., Tompa, P., Simon, I.: IUPred: web server for the prediction of
intrinsically unstructured regions of proteins based on estimated energy content. Bioinfor-
matics 21, 3433–3434 (2005). https://doi.org/10.1093/bioinformatics/bti541
106. Dosztányi, Z., Mészáros, B., Simon, I.: ANCHOR: web server for predicting protein binding
regions in disordered proteins. Bioinformatics 25, 2745–2746 (2009). https://doi.org/10.1093/
bioinformatics/btp518
107. Disfani, F.M., Hsu, W.-L., Mizianty, M.J., et al.: MoRFpred, a computational tool for
sequence-based prediction and characterization of short disorder-to-order transitioning
binding regions in proteins. Bioinformatics 28, i75–i83 (2012). https://doi.org/10.1093/
bioinformatics/bts209
B. Lesyng
Faculty of Physics, Department of Biophysics, University of Warsaw,
Żwirki i Wigury 93, 02-892 Warsaw, Poland
e-mail: [email protected]
P. Daniluk (B) · B. Lesyng
Bioinformatics Laboratory, Mossakowski Medical Research Centre,
Pawinskiego 5, 02-106 Warsaw, Poland
e-mail: [email protected]
1 Introduction
Proteins are biopolymers comprising one or more polypeptide chains. Twenty
types of amino-acid residues occur in proteins encountered in living organisms.
Thus a first approximation (primary structure) of a protein is its sequence, normally
represented as a string of letters from a 20 letter alphabet. Sequences may be com-
pared to reveal genetic, evolutionary relationships between proteins. Sequence com-
parison is a variant of a well researched string matching problem which is usually
solved with the ubiquitous Needleman-Wunsch algorithm [35] or its heuristic counterparts
[2, 39]. A polypeptide chain of a protein after synthesis undergoes a process
of folding in which it obtains a well defined characteristic spatial conformation (ter-
tiary structure). Structure is instrumental to the role a given protein performs in a
living organism. With some simplification, one may assume that a residue sequence
determines a spatial structure, which in turn determines a function. Due to the nature
of evolutionary processes it can be observed that structure is a much more conserved
property than sequence. Even remotely homologous proteins usually have similar
tertiary structure. Therefore, comparison of structures, although more difficult, may
provide more information on evolutionary and functional relationships than sequence
analysis alone.
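For readers unfamiliar with sequence comparison, the Needleman-Wunsch dynamic programming scheme mentioned above can be sketched in a few lines. The scoring values used here (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions; practical applications typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

The table has (n+1)(m+1) cells, so time and memory grow with the product of the two sequence lengths.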
Although several methods for protein structure comparison have been developed
during the past two decades, no single “best of all” method exists, and there are many
known cases of so-called difficult similarities, which cannot be correctly solved by
most methods. Relatively little effort has been put into development of formal theories
of this problem, which would enable a thorough analysis of its properties.
The purpose of this study is to give a brief overview of the existing approaches
and methodologies followed by a formal analysis of several variants of the prob-
lem of computing alignments based on a set of local similarities. Description of a
method based on presented theoretical principles along with a few practical aspects
of comparing protein structures are also provided.
This study is organized as follows. The introduction covers basic definitions,
contains a brief overview of popular methods, outlines potential pitfalls and gives a
short introduction to the theory of computational complexity. In the following section
the most popular approaches to defining and comparing structural fragments are
presented. The third and fourth sections are devoted to the problems of computing
an optimal alignment of two or more protein structures and include an analysis of the
computational complexity of several variants of these problems. In the fifth section
we present practical but rarely used techniques which may be useful in similarity
analysis, as well as several case studies.
The notion of an alignment in the context of biological sequences originates from the
concept of introducing gaps into sequences written one below the other, to maximize
the number of columns with identical or similar residues. Alternatively, one may
Theoretical and Computational Aspects … 599
Methodologies of protein structure comparison may be classified into two major cate-
gories – global and local. In the first one an alignment and superposition of molecules
are iteratively improved. Starting with a given alignment, an optimal superposition
is computed, then a new alignment is extracted from the superposition by identifying
pairs of residues spatially close to each other. Such methods are effective, assum-
ing conformational variability is limited and similarity is significant enough for the
process to converge quickly.
Alternatively, computing an alignment may start with identifying a set of local sim-
ilarities, which afterwards serve as building blocks for the global alignment. There are
several methods of decomposing structures into smaller fragments. The most popular
are inter-residue distances (SSAP [37], DALI [21], PAUL [50]), single continuous
segments of the main chain (CE [46]) or secondary structure elements (SSEs) (VAST
[17], SARF [1], MATRAS [26], GANGSTA [19]). Less popular include Delaunay
triangulation (TOPOFIT [22]), spherical polar Fourier representations (3D-BLAST
[31]), and geometric hashing (Cα -match [4]). Local descriptors of protein structures
(see Sect. 2.2) have also been successfully applied (DEDAL [10]). Global alignment
is computed by selecting the largest consistent set of local similarities. Definitions of
consistency and methods for searching the solution space vary. Usually it is required
that correspondences between residues given by two consistent alignments have to
600 P. Daniluk and B. Lesyng
agree on all residues common to both of them. Sometimes additional criteria are used,
such as the similarity of transformations required to superimpose fragments [4] or
the ordering in the protein sequence. The search of the solution space is
performed using algorithms for finding isomorphic subgraphs or cliques, clustering,
dynamic programming or other techniques. Some methods use a one-dimensional
representation of structure – where each residue is substituted with a characteriza-
tion of its local features – and use dynamic programming to align such artificial
sequences (e.g. SHEBA [23]). Due to the computational complexity caused by the
combinatoric size of the solution space, solutions containing circular permutations or
segment swaps are disregarded even if the method could find them in theory. Such a
situation takes place with the DALI method and its publicly available implementation
DaliLite [20, 21]. Sometimes spatial distortions are accommodated by introducing
“hinges” (FATCAT [52], FlexProt [43], ProtDeform [41], FlexSnap [42]) (Table 1).
The problem of computing multiple alignments of protein structures is much
harder and less popular. There are two basic approaches to defining and computing
a multiple alignment – searching for a substructure common to all structures com-
pared, or searching for all similarities as long as equivalences between residues are
unambiguous (see Sect. 4.1). Existing methods are often generalizations of methods
of computing pairwise alignments. Based on the similarity of all pairs a binary tree
is built. Its leaves correspond to structures, while nodes to multiple alignments of
structures in its descendants, which are computed in a manner similar to aligning
Table 1 Selected methods for computing alignments of two protein structures

Method name      Year  Authors                  Flexible  Segment swaps
SSAP [37]        1989  Orengo and Taylor        No        No
Cα-match [4]     1993  Bachar et al.            No        Yes
DALI [21]        1993  Holm and Sander          No        No (a)
VAST [17]        1996  Gibrat et al.            No        No
SARF [1]         1996  Alexandrov               No        Yes
CE [46]          1998  Shindyalov and Bourne    No        No
SHEBA [23]       2000  Jung and Lee             No        No
MATRAS [26]      2000  Kawabata and Nishikawa   No        No
FATCAT [52]      2003  Ye and Godzik            Yes       No
TOPOFIT [22]     2004  Ilyin et al.             No        No
FlexProt [43]    2004  Shatsky et al.           Yes       No
GANGSTA [19]     2008  Guerler and Knapp        No        Yes
ProtDeform [41]  2009  Rocha et al.             Yes       No
3D-BLAST [31]    2010  Mavridis and Ritchie     No        No
FlexSnap [42]    2010  Salem et al.             Yes       Yes
PAUL [50]        2010  Wohlers et al.           No        No
DEDAL [10]       2011  Daniluk and Lesyng       Yes       Yes

(a) DALI in principle is capable of computing alignments with segment swaps, but the publicly
two structures. When computation ends, the root node contains a multiple alignment
of all structures (MUSTANG [28], POSA [53]). Sometimes a strategy similar to
hierarchical clustering is used. Starting with single structures, at each step the two
most similar multiple alignments (or structures) are combined (Matt [33]). There
also exist methods where all structures are considered at the same time. MASS [13]
is based on searching for maximal correspondences between SSEs assuming rigid
global superpositions. On the other hand, MultiProt [44] attempts to align a chosen
pivot structure with all others. This process is repeated for all selections of pivot, and
the best multiple alignment is returned. DAMA [9] – an extension of the DEDAL
method employing an evolutionary algorithm is currently under development.
In many cases, the similarity between protein structures is either obvious or non-
existent. Nevertheless, there exists a “grey area” of so-called difficult similarities. It
comprises cases where similarity between sequences cannot be detected or is mis-
leading, the evolutionary relationship is not obvious, or where there exist significant
distortions that obscure the similarity. These distortions may include repeats, inser-
tions or deletions, permutations or substantial conformational changes.
Repeating motifs involve a significant combinatorial burden, because in principle
all assignments between occurrences of such a motif should be assessed. This is par-
ticularly challenging in case of the so-called propeller folds, which contain structures
similar to a marine propeller. They are composed of 4–8 blades resulting in up to 8!¹
possible assignments of blades and at least 8 equivalent alignments.
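The two counts quoted above can be checked with a short sketch. The function names are hypothetical, and the model is deliberately crude: blades are treated as interchangeable labels 0 … n−1, and an "equivalent alignment" is taken to be a cyclic shift that preserves blade order.

```python
import math


def blade_assignments(n):
    """Number of one-to-one assignments between the blades of two n-bladed propellers."""
    return math.factorial(n)


def cyclic_assignments(n):
    """Assignments that keep the cyclic blade order: blade i maps to blade (i + shift) mod n."""
    return [tuple((i + shift) % n for i in range(n)) for shift in range(n)]


print(blade_assignments(8))        # 40320
print(len(cyclic_assignments(8)))  # 8
```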
Insertions and deletions may be a result of genomic rearrangements. After losing
a segment of a significant length a protein may retain its conformation. Nevertheless
the similarity is obfuscated by size differences, and the fact that some fragments
of the smaller structure usually have a different conformation to fill the gap after
missing residues.
Permutations probably pose the most fundamental challenge since the whole con-
cept of an alignment has to be readjusted. Circular permutations are the most common
example. They may be caused by gene duplication or rearrangements of the protein
chain during folding [49]. Two protein chains are circular permutations of each other
if they can be divided into two subunits (A1 − B1 and A2 − B2 respectively), such
that structures A1 − B1 and B2 − A2 are similar in the traditional sense (without
permutations). More complex rearrangements (e.g. caused by changes of the num-
ber of residues in loop regions) have been observed [18]. Oligomeric structures are
another example of sequence rearrangements. Sometimes proteins composed of sev-
eral chains are similar despite the fact that chain boundaries are placed differently or
that numbers of chains differ (in such a case, chains cannot be compared separately).
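The idea of a circular permutation can be illustrated on plain strings, where "similar" is replaced by "identical". This is a toy simplification with a hypothetical function name: real comparisons operate on three-dimensional structures, and the two parts only need to be structurally similar, not equal.

```python
def circular_split(s, t):
    """Return index i such that t == s[i:] + s[:i], or None if t is not a rotation of s."""
    if len(s) != len(t):
        return None
    # Every rotation of s occurs as a substring of s + s.
    i = (s + s).find(t)
    return i if i != -1 else None


# s = A1-B1 with A1 = "ABC", B1 = "DEF"; t starts with the B part followed by the A part.
print(circular_split("ABCDEF", "DEFABC"))
```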
1 Factorial: 8! = 1 · 2 · …· 8 = 40320.
Finally, protein structures are not rigid. Many functions they perform involve
conformational changes [12, 16]. Furthermore, experimental methods used to determine
tertiary structure usually involve changing environmental conditions to nonphysiological
ones, which may distort the studied structure (see Sect. 5.3 for an example).
Conformational variability is especially difficult, because assessing structural sim-
ilarity relies on geometrical data. Distinguishing between “natural” flexibility and
dissimilarity may be challenging even to experts.
This study presents several results concerning the computational complexity of pro-
tein structure alignment. In this section we provide a brief introduction to the theory
of computational complexity.
Traditionally computational complexity theory is applied to so-called decision
problems, which originate from the formalism of recognizing languages by finite-
state automata or Turing machines. In this formalism, instances of a problem are
encoded as words over a certain alphabet, and words corresponding to instances
with positive answers belong to a language recognized by a machine. Decision prob-
lems have a strict form – “For a given instance I determine whether I satisfies a
predicate P(I ).”, which is quite different to an open form of optimization problems
which can be stated as follows “For a given instance I find a solution S which has a
maximal value of property p from all valid solutions of I .”. Any optimization problem,
however, can be transformed to a decision problem of the form “For a given
instance I and a value v, does there exist a solution with value of property p greater
than or equal to v?”.
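The relationship between the two forms can be sketched as follows: given any procedure answering the decision question for a threshold v, the optimal value is recovered by binary search over v. The names below are hypothetical, and the sketch assumes integer-valued p and a monotone oracle (if a solution of value at least v exists, one exists for every smaller threshold).

```python
def maximize_with_oracle(decide, lo, hi):
    """Largest integer v in [lo, hi] for which decide(v) is True, or None if there is none."""
    if not decide(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so that the interval always shrinks
        if decide(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


# Toy instance: the "solutions" are three candidates with values 3, 9 and 4.
values = [3, 9, 4]
print(maximize_with_oracle(lambda v: any(x >= v for x in values), 0, 100))
```

Only a logarithmic number of oracle calls is needed, so a polynomial-time decision procedure yields the optimal value in polynomial time as well.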
There are two fundamental classes of decision problems. The first one (P) contains
problems which can be solved by a Turing machine in polynomial time. In
practice this means that a polynomial time algorithm for solving such a problem exists,
and can be implemented on any computer. Such problems are considered tractable or
efficiently solvable since computation time for any instance is limited by a polynomial
function of its size. The second major class (NP) comprises problems which can be
solved in polynomial time by a non-deterministic Turing machine. Informally, this
means that given a potential solution to the problem, it is possible to check if it is
valid in polynomial time. All problems from P belong to NP, because if the solution
can be computed in polynomial time, it can also be efficiently checked. There are,
however, problems in NP for which a polynomial time algorithm is not known. Some
of them belong to a subclass of NP-complete problems, which may be deemed as
a collection of the “hardest” problems in NP. It can be proven that, if there exists
a polynomial time solution for any NP-complete problem, all problems in NP also
have a polynomial time solution, and thus P = NP. To date, finding such a solution,
or proving that it does not exist, remains an open problem.
Problems in NP can be “ranked” by their “difficulty”, with NP-complete problems
being the hardest. In order to prove that a given problem P1 is NP-complete, it is
enough to prove that it belongs to NP and that it is at least as “hard” as a known
NP-complete problem (P2 ). This is performed by constructing a so-called reduction of
P2 into P1 . A reduction is a recipe for converting all instances of P2 into instances
of P1 preserving the decision result (i.e. accepted instances of P2 are converted to
accepted instances of P1 and vice versa). The reduction has to be performed in
polynomial time. This proves that if P1 is tractable, any instance of P2 also can be
solved in polynomial time by converting it into an instance of P1 and applying an
algorithm for P1 . Therefore, if P1 belongs to P, P2 does also along with all problems
in NP.
More information on computational complexity may be found in the seminal book
[15].
2 Fragment-Based Methods
where a_{ki} (b_{ki}) denotes the ith element of vector a_k (b_k). One can easily see that M and
geometrical centers can be recycled. Extending sets A and B after computing their
RMSD can be easily implemented, greatly reducing computational complexity (e.g.
computing RMSD of segments of length n requires O(n) time, just like computing
distances between all prefixes of A and B). A pair of similar segments is usually
called an aligned fragment pair (AFP).
To the best of our knowledge, all alignment methods using AFPs employ some sort of
a global similarity measure. This is necessary because the fact that an alignment is built
from AFPs does not imply actual similarity of the aligned substructures. The inability to
capture spatial relationships between residues distant in the sequence but neighboring
in space is the main drawback of continuous segments. It can be amended by using
fragments encompassing at least two disjoint pieces of backbone.
DALI [21], a popular and highly regarded method, uses pairs of continuous segments
of length 6. Its similarity measure is based on the distances between points
representing residues. If A = {a_1, …, a_12} and B = {b_1, …, b_12} are the residues
belonging to certain pairs of hexapeptides in structures A and B, the similarity is
computed as follows:

$$S = \sum_{i=1}^{12} \sum_{j=1}^{12} \left( \theta - \left| d(a_i, a_j) - d(b_i, b_j) \right| \right)$$

where d(a, b) is the Euclidean distance between points a and b, and θ is a parameter
determining the zero level of similarity. The distance-based approach is appealing
because distance maps are invariant under isometric transformations, hence there is
no need to search for a transformation giving the optimal superposition. It is also easy
to implement and fast for small fragments (although its computational complexity is
bounded by O(N^2)).
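A minimal sketch of such a distance-based score is given below, assuming that the score sums θ − |d(a_i, a_j) − d(b_i, b_j)| over all pairs of points. The function name and the default value of theta are placeholders chosen for illustration; they are not taken from [21].

```python
import math


def distance_map_score(a, b, theta=0.2):
    """Sum of theta - |d(a_i, a_j) - d(b_i, b_j)| over all index pairs (i, j).

    a and b are equally long lists of 3D points; theta is a hypothetical default.
    """
    if len(a) != len(b):
        raise ValueError("fragments must contain the same number of points")
    total = 0.0
    for i in range(len(a)):
        for j in range(len(a)):
            total += theta - abs(math.dist(a[i], a[j]) - math.dist(b[i], b[j]))
    return total


frag = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shifted = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in frag]
print(distance_map_score(frag, shifted))  # a rigid shift leaves all distances unchanged
```

Because only internal distances enter the sum, no superposition step is required; the double loop is the source of the quadratic cost in the number of points.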
2.2.1 Definition
$$\psi(\mathrm{El}(a^{(i)})) = \psi(a^{(i-2)})\,\psi(a^{(i-1)})\,\psi(a^{(i)})\,\psi(a^{(i+1)})\,\psi(a^{(i+2)}) = b^{(j-2)}\, b^{(j-1)}\, b^{(j)}\, b^{(j+1)}\, b^{(j+2)} = \mathrm{El}(b^{(j)})$$
2. ψ(El(a1 ) = ψ(El(a2 ))
In simple terms, a mapping contains pairs of corresponding contacts. It does not
necessarily cover all contacts in both descriptors, but each contact may have only
one corresponding counterpart in the other descriptor. To be valid a mapping has to
preserve overlapping of elements. Contacts with overlapping elements can be mapped
only to contacts with the same overlap, while non-overlapping contacts may have
only non-overlapping counterparts. We will say that a valid mapping constitutes an
alignment of descriptors. One should note that under this definition an alignment may
contain so-called segment swaps (i.e. aligned segments may have different order in
structures they originate from). This is a fundamental difference between traditional
understanding of alignment and our definition.
For two descriptors to be similar, an alignment between them has to exist and
satisfy requirements imposed on its size and the spatial similarity of aligned sub-
structures. The size can be measured with the number of aligned residues, elements
or segments, while spatial similarity may be assessed using a Root Mean Square
Distance (RMSD). This is a two-objective optimization problem, since extending an
alignment will most likely increase the RMSD between substructures and vice versa.
To reliably solve this problem, we use an exhaustive search algorithm that finds
alignments satisfying the following conditions:
1. the RMSD of aligned elements must not exceed 1.5 Å,
2. for each pair of aligned elements, the RMSD of substructures consisting of these
elements and respective central elements must not exceed 2.5 Å (i.e. elements
should have the same position relative to the central element),
3. at least half of the segments must be aligned,
4. the RMSD of aligned residues must not exceed 2.5 Å.
The algorithm searches through all alignments satisfying the above conditions.
First, all pairs of elements satisfying conditions 1 and 2 are identified. Then, all
possible assemblies of those pairs are checked for condition 4. If it is not met, they
are reduced by removing the least fitting pairs of elements, until either condition 4
is met or condition 3 is no longer satisfied.
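The final reduction step (conditions 3 and 4) can be sketched as a simple pruning loop. The sketch rests on several assumptions that are not part of the method as described: coordinates are taken as already superimposed and are not refitted after each removal, the size requirement is counted in residue pairs rather than segments, and all names are hypothetical.

```python
import math


def rmsd(pairs):
    """Root mean square distance of already superimposed point pairs (p, q)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))


def prune_alignment(pairs, max_rmsd=2.5, min_fraction=0.5):
    """Drop the worst-fitting pair until RMSD <= max_rmsd.

    Returns the surviving pairs, or None if fewer than min_fraction of the
    original pairs would remain (or if there is nothing to align).
    """
    kept = list(pairs)
    min_size = math.ceil(min_fraction * len(pairs))
    while kept and rmsd(kept) > max_rmsd:
        if len(kept) - 1 < min_size:
            return None  # the size condition can no longer be satisfied
        kept.remove(max(kept, key=lambda pq: math.dist(*pq)))
    return kept or None


origin = (0.0, 0.0, 0.0)
pairs = [(origin, origin)] * 3 + [(origin, (10.0, 0.0, 0.0))]
print(len(prune_alignment(pairs)))  # the single outlier is removed
```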
In Sect. 1.4 we have briefly explained the main ideas behind the theory of compu-
tational complexity. We will demonstrate that the problem of assessing descriptor
similarity is NP-complete. We start by providing a formal definition of the decision
problem for finding an optimal descriptor alignment. The definition will be slightly
simpler than the one used in the previous section in order to avoid technical difficul-
ties.
Definition 1 For two descriptors D1 , D2 and constants n and T the Optimal Align-
ment of Descriptors (OAD) problem is to determine whether there exists an align-
ment of D1 and D2 covering no less than n residues such that the RMSD between
aligned residues is not greater than T .
Proof First we notice that it is enough to prove that OAD is NP-complete for one
particular value of T , since, if a problem contains an NP-complete sub-problem it is
NP-complete itself. Thus we will assume that T is large enough for any alignment
to be structurally acceptable (e.g. T = ∞).
$$\sum_{a \in A} s(a) = mB,$$
It is easy to see that any subset in such a partition must contain exactly three
elements. Our reduction will assign an instance of OAD to any instance of 3-PARTITION.
We will show a method of constructing descriptors D1 and D2 for any
m, B and s. Because we have assumed that the threshold for the RMSD of aligned
residues is infinitely large, we do not have to deal with providing coordinates. It is
enough to give contact patterns.
A comb of length k (Fig. 3a) is a contact pattern which contains k residues such
that subsequent residues lie one residue apart in the sequence. Let D1 contain 3m
combs (one for each element of A) of lengths given by values of s for corresponding
4 Unless P = NP.
5 A protein is a polypeptide which under physiological conditions assumes and maintains a certain
native conformation.
Definition 4 For given structures S1 and S2 , set Φ and number k, the Optimal Structure
Alignment Problem (OSA) is to determine whether there exists an alignment
of S1 and S2 with support in Φ covering at least k residues.
In other words, the task is to choose from a given set of triples (M) a subset in
which every element from sets W , X and Y occurs exactly once. There also exists a
two dimensional version of this problem – 2-DIMENSIONAL MATCHING (2DM,
also called a marriage matching problem) where M ⊆ X × Y . Although very similar,
it can, surprisingly, be solved in polynomial time. We will use a slightly modified
version of this problem.
Definition 6 For given sets M ⊆ X × Y and G ⊆ P(M),⁶ where X and Y are disjoint
sets of size q, the RESTRICTED 2-DIMENSIONAL MATCHING (R2DM)
problem is to determine whether there exists a subset M′ ⊆ M such that |M′| = q,
elements of M′ are disjoint and there exists G′ ⊆ G such that ⋃G′ = M′.
R2DM may be viewed as a case of the marriage matching where would-be wives
set conditions of the kind: “I will marry you, if Mr. X marries Ms. W and Mr. Q
marries Ms. S”. Such conditions are encoded as elements of the set G. To prove
that R2DM is NP-complete, we may use a simple reduction of 3DM. Each triple
from M in 3DM is encoded as two pairs in M and a set containing these pairs in G.
Figure 4 contains an example of such a transformation. One can easily establish that
a solution of an instance of R2DM obtained from 3DM can always be converted to
a solution of the original problem. Furthermore, if the original 3DM instance has a
valid solution, the corresponding instance of R2DM is always solvable.
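The reduction can be sketched on a toy instance. The encoding below is one possible reading of the description, not necessarily the exact construction behind Fig. 4: a triple (w, x, y) becomes the pairs (w, x′) and (x, y), tied together by one element of G, and tags such as "W" and "X'" play the role of the primes that keep the two sides disjoint. The brute-force solvers and all names are hypothetical and serve only to check that both instances agree.

```python
from itertools import combinations


def reduce_3dm_to_r2dm(triples):
    """Encode each triple (w, x, y) as two pairs plus one group that ties them together."""
    pairs, groups = set(), []
    for w, x, y in triples:
        first = (("W", w), ("X'", x))   # w paired with a primed copy of x
        second = (("X", x), ("Y", y))   # x paired with y
        pairs.update((first, second))
        groups.append(frozenset((first, second)))
    return pairs, groups


def has_3d_matching(triples, q):
    """Brute force: q triples that use q distinct elements on every coordinate."""
    return any(all(len({t[i] for t in chosen}) == q for i in range(3))
               for chosen in combinations(triples, q))


def has_r2dm_solution(groups, size):
    """Brute force: a subfamily of groups whose union consists of `size` disjoint pairs."""
    for r in range(1, len(groups) + 1):
        for chosen in combinations(groups, r):
            union = set().union(*chosen)
            lefts = {p[0] for p in union}
            rights = {p[1] for p in union}
            if len(union) == len(lefts) == len(rights) == size:
                return True
    return False


triples = [("w1", "x1", "y1"), ("w2", "x2", "y2"), ("w1", "x2", "y2")]
_, groups = reduce_3dm_to_r2dm(triples)
print(has_3d_matching(triples, 2), has_r2dm_solution(groups, 4))
```

Under this encoding each side of the R2DM instance has 2q elements, so a solution must contain 2q disjoint pairs, which forces exactly q groups (triples) to be selected.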
R2DM is very convenient for proving that OSA is intractable. In the conversion
of an R2DM instance to OSA, sets X and Y will correspond to the sets of residues of S1
and S2 ; set M – to the set of all pairs of mapped residues in alignments from Φ and
finally set G – to Φ itself. Elements of G are sets of pairs from M which have to
be picked together. In the case of alignment with support in Φ, each pair of aligned
residues has to belong to a local alignment from Φ. Let S be a set of pairs from
G. S is converted to an alignment, which for each pair in S maps together residues
corresponding to its elements. A subset A being a solution of R2DM corresponds
to the alignment covering whole structures. A set G corresponds to the support of
this alignment in Φ.
To make the reduction possible, local similarities have to allow for sequence
swaps. Otherwise, instances of R2DM for which sets X and Y cannot be ordered
in such a way that all elements in G are ordered on both positions, could not be
converted to an instance of OSA.
In the previous section we have proven the intractability of the Optimal Structure
Alignment problem. The proof presented applies to the most generic version of OSA
where local similarities may be any arbitrary mappings between residues. All “real”
approaches known to us employ local similarities having a well defined structure
(e.g. continuous segments, pairs of segments, local descriptors). If we look back to
Fig. 4 a Sample instance of 3DM. b The same instance converted to R2DM. Each triple is converted
to two pairs and an element in set G. Primes are added to element names to comply with the
requirement that sets X and Y are to be disjoint
Proof Cases of this sort can easily be solved with a modified Smith-Waterman
algorithm. Algorithms of this sort based on dynamic programming usually have
polynomial complexity. In this particular case a pessimistic estimate of the computation
time depends linearly on the number of residues in the aligned structures and the size of
Φ (O(|S1 | |S2 | |Φ|)).
Theorem 4 A variant of OSSA where the set of local similarities may contain match-
ings of three separate continuous segments is NP-complete.
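$$\varphi_i(s) = \begin{cases} t_i & s = v_i \\ c_p & s = c_p \text{ and clause } p \text{ contains a positive appearance of the } i\text{th variable} \end{cases}$$

$$\overline{\varphi}_i(s) = \begin{cases} f_i & s = v_i \\ c_p & s = c_p \text{ and clause } p \text{ contains a negative appearance of the } i\text{th variable} \end{cases}$$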
$$C = \{\{\neg u_1, u_2, u_3\}, \{u_1, \neg u_2, u_3\}, \{u_1, u_2, \neg u_3\}\}$$

$$(\neg u_1 \vee u_2 \vee u_3) \wedge (u_1 \vee \neg u_2 \vee u_3) \wedge (u_1 \vee u_2 \vee \neg u_3)$$

This instance of 3SAT could be converted to the following instance of OSSA (assuming
that all segments are of length 5) (ϕ(a) = ⊥ means that a ∉ Dom(ϕ)):
$$S_1 = \overbrace{a^{(1)} \dots a^{(5)}}^{v_1}\ \overbrace{a^{(6)} \dots a^{(10)}}^{v_2}\ \overbrace{a^{(11)} \dots a^{(15)}}^{v_3}\ \overbrace{a^{(16)} \dots a^{(20)}}^{c_1}\ \overbrace{a^{(21)} \dots a^{(25)}}^{c_2}\ \overbrace{a^{(26)} \dots a^{(30)}}^{c_3}$$
t1 t2
a (19)a (20)a (21)a (22)a (23) a (24) a (25) a (26) a (27) a (28) a (29)a (30)a (31)a (32)a (33)
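$$\varphi_1(S_1) = \underbrace{\overbrace{b^{(1)} \dots b^{(5)}}^{v_1}}_{t_1}\ \overbrace{\bot \dots \bot}^{v_2}\ \overbrace{\bot \dots \bot}^{v_3}\ \overbrace{\bot \dots \bot}^{c_1}\ \underbrace{\overbrace{b^{(24)} \dots b^{(28)}}^{c_2}}_{c_2}\ \underbrace{\overbrace{b^{(29)} \dots b^{(33)}}^{c_3}}_{c_3}$$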
v1 v2 v3
b(19) b(20) b(21) b(22) b(23) ⊥ ⊥ ⊥ ⊥ ⊥ b(29) b(30) b(31) b(32) b(33)
c1 c3
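$$\overline{\varphi}_2(S_1) = \overbrace{\bot \dots \bot}^{v_1}\ \underbrace{\overbrace{b^{(7)} \dots b^{(11)}}^{v_2}}_{f_2}\ \overbrace{\bot \dots \bot}^{v_3}\ \overbrace{\bot \dots \bot}^{c_1}\ \underbrace{\overbrace{b^{(24)} \dots b^{(28)}}^{c_2}}_{c_2}\ \overbrace{\bot \dots \bot}^{c_3}$$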
v1 v2 v3
b(19) b(20) b(21) b(22) b(23) b(24) b(25) b(26) b(27) b(28) ⊥ ⊥ ⊥ ⊥ ⊥
c1 c2
v1 v2 v3
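$$u_1 \to 0, \quad u_2 \to 1, \quad u_3 \to 1$$

$$\xi(S_1) = \underbrace{\overbrace{b^{(2)} \dots b^{(6)}}^{v_1}}_{f_1}\ \underbrace{\overbrace{b^{(7)} \dots b^{(11)}}^{v_2}}_{t_2}\ \underbrace{\overbrace{b^{(13)} \dots b^{(17)}}^{v_3}}_{t_3}\ \underbrace{\overbrace{b^{(19)} \dots b^{(23)}}^{c_1}}_{c_1}\ \underbrace{\overbrace{b^{(24)} \dots b^{(28)}}^{c_2}}_{c_2}\ \underbrace{\overbrace{b^{(29)} \dots b^{(33)}}^{c_3}}_{c_3}$$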
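That the assignment u1 → 0, u2 → 1, u3 → 1 satisfies the example formula can be checked mechanically. In the sketch below, the encoding of a literal as a signed integer (+i for u_i, −i for ¬u_i) and the function name are conventions adopted only for this example.

```python
def satisfies(clauses, assignment):
    """True if every clause contains at least one literal made true by the assignment."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# (not u1 or u2 or u3) and (u1 or not u2 or u3) and (u1 or u2 or not u3)
clauses = [(-1, 2, 3), (1, -2, 3), (1, 2, -3)]
print(satisfies(clauses, {1: False, 2: True, 3: True}))
```

The check visits every literal once, which illustrates why 3SAT belongs to NP: a proposed assignment is verified in time linear in the size of the formula.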
Theorem 5 A variant of OSA where the set of local similarities may contain match-
ings of two separate continuous segments is NP-complete.
Proof To prove this theorem it suffices to note that a variant of R2DM in which each
element of the set G contains exactly two pairs from A is NP-complete. This follows
directly from the reduction we have used to prove the intractability of R2DM. All
instances of R2DM resulting from it have this feature. Therefore, if we assume that
structures are divided into non-overlapping segments of the same length, any local
similarity in the reduction from R2DM to OSA will consist of no more than two
segments.
Fig. 5 Computational complexity of OSA and OSSA depending on the maximal number of seg-
ments in local similarities s
(RMSD equal to zero), only local similarities with zero RMSD could be used, thereby
drastically limiting the size of set Φ. Nevertheless, theoretical complexity, which
should normally be assessed for all threshold values, would not change.
7 A clique in a graph is a subset of nodes such that every two nodes in the subset are connected by an
edge.
8 We deliberately skip over the fact that multiple alignment is a relation while a pairwise alignment
is a function. The property of being a multiple alignment guarantees that it can be converted to a
function in a trivial way.
Theorem 7 For any value of k, a variant of OSMA in which the number of local
similarities in Φ for each pair of structures does not exceed k is NP-complete.
Before we prove this theorem, let us note that if the size of set Φ in OSA is limited by
a constant k, then the number of possible alignments with support in Φ is limited by
2^k. This means that an exhaustive search algorithm for finding an optimal alignment
will have a computational complexity of O(2^k) = O(1) (because k is constant). In
layman’s terms, Theorem 7 establishes that the pessimistic computation time for OSMA
is exponential with respect to the number of structures.⁹
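The exhaustive search mentioned above can be sketched as an enumeration of all non-empty subsets of Φ. The representation is an assumption made for illustration: each local similarity is a dictionary mapping residue indices of one structure to residue indices of the other, and a subset is accepted when the merged mapping is one-to-one. All names are hypothetical.

```python
from itertools import combinations


def merge_if_consistent(similarities):
    """Merge residue mappings; return the merged one-to-one mapping or None on conflict."""
    forward, backward = {}, {}
    for sim in similarities:
        for a, b in sim.items():
            # a must not already map elsewhere, and b must not already be a target of another a
            if forward.get(a, b) != b or backward.get(b, a) != a:
                return None
            forward[a], backward[b] = b, a
    return forward


def best_alignment(phi):
    """Try all 2^k - 1 non-empty subsets of phi and keep the largest consistent merge."""
    best = {}
    for r in range(1, len(phi) + 1):
        for subset in combinations(phi, r):
            merged = merge_if_consistent(subset)
            if merged is not None and len(merged) > len(best):
                best = merged
    return best


phi = [{1: 1, 2: 2}, {2: 2, 3: 3}, {3: 1}]
print(best_alignment(phi))
```

For a fixed k the number of subsets is a constant, which is why the bound holds for two structures; the difficulty established by Theorem 7 arises from combining such choices across many pairs of structures.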
Proof We once more use 3SAT (see Definition 8). Let U = {u 1 , . . . , u k }, C =
{C1 , . . . , Cl } be an instance of 3SAT. We will construct three sets of structures
corresponding to: variables (set V ), clauses (set K ) and assignment of values (set
L):
(1) $V = \{V_1, \dots, V_k\}$, where $V_i = a_1^{V_i} a_2^{V_i} \dots a_{19}^{V_i} a_{20}^{V_i}$ ¹⁰
(2) $K = \{K_1, \dots, K_l\}$, where $K_i = a_1^{K_i} a_2^{K_i} \dots a_{14}^{K_i} a_{15}^{K_i}$
(3) $L = \{L_0\}$, where $L_0 = a_1^{L_0} a_2^{L_0} \dots a_{20}^{L_0} a_{21}^{L_0}$
To simplify the notation we will define the following segments:
(1) $v_i = a_1^{V_i} \dots a_5^{V_i}$, $v'_i = a_6^{V_i} \dots a_{20}^{V_i}$

$$\Phi = (\Phi^T \cup \Phi^F \cup \Phi^K \cup \Phi^L) \cup (\Phi^T \cup \Phi^F \cup \Phi^K \cup \Phi^L)^{-1}$$
9 Unless P = NP.
10 In this proof we abstain from giving residue numbers in upper index.
11 If ϕ : A → B in an alignment between structures A and B, an inverse alignment is a function
An assignment of values which satisfies all clauses exists if and only if there
exists an alignment of $S = \{L_0, V_1, \dots, V_k, K_1, \dots, K_l\}$ with support in Φ of size
$\frac{2(20k+10l)}{(k+l)(k+l+1)}$. Furthermore, if Φ is a support of such an alignment, for every i either
$\varphi_i^T \in \Phi$ or $\varphi_i^F \in \Phi$ and an assignment:

$$u_i \to \begin{cases} 1 & \varphi_i^T \in \Phi \\ 0 & \varphi_i^F \in \Phi \end{cases}$$
$$C = \{\{\neg u_1, u_2, u_3\}, \{u_1, \neg u_2, u_3\}, \{u_1, u_2, \neg u_3\}\}$$

$$(\neg u_1 \vee u_2 \vee u_3) \wedge (u_1 \vee \neg u_2 \vee u_3) \wedge (u_1 \vee u_2 \vee \neg u_3)$$

$$L_0 = \overbrace{a_1^{L_0} \dots a_5^{L_0}}^{t}\ a_6^{L_0}\ \overbrace{a_7^{L_0} \dots a_{21}^{L_0}}^{l}, \qquad f = a_2^{L_0} \dots a_6^{L_0}$$

$$V_i = \overbrace{a_1^{V_i} \dots a_5^{V_i}}^{v_i}\ \overbrace{a_6^{V_i} \dots a_{20}^{V_i}}^{v'_i}, \qquad i = 1, 2, 3$$

$$K_i = \overbrace{a_1^{K_i} \dots a_5^{K_i}}^{k_{i,1}}\ \overbrace{a_6^{K_i} \dots a_{10}^{K_i}}^{k_{i,2}}\ \overbrace{a_{11}^{K_i} \dots a_{15}^{K_i}}^{k_{i,3}}, \qquad i = 1, 2, 3$$
$$\varphi_i^T(e) = \begin{cases} t & e = v_i \\ l & e = v'_i \end{cases} \qquad \varphi_i^F(e) = \begin{cases} f & e = v_i \\ l & e = v'_i \end{cases} \qquad (i = 1, 2, 3)$$

$$\varphi_{i,j}^K(e) = v_j, \quad e = k_{i,j} \qquad (i, j = 1, 2, 3)$$

$$\begin{array}{lll}
\varphi_{1,1}^L(e) = f,\ e = k_{1,1} & \varphi_{1,2}^L(e) = t,\ e = k_{1,2} & \varphi_{1,3}^L(e) = t,\ e = k_{1,3} \\
\varphi_{2,1}^L(e) = t,\ e = k_{2,1} & \varphi_{2,2}^L(e) = f,\ e = k_{2,2} & \varphi_{2,3}^L(e) = t,\ e = k_{2,3} \\
\varphi_{3,1}^L(e) = t,\ e = k_{3,1} & \varphi_{3,2}^L(e) = t,\ e = k_{3,2} & \varphi_{3,3}^L(e) = f,\ e = k_{3,3}
\end{array}$$
$$u_1 \to 0, \quad u_2 \to 1, \quad u_3 \to 1$$

Therefore, there exists an alignment with support containing $\varphi_1^F$, $\varphi_2^T$, $\varphi_3^T$ and size
$\frac{180}{21}$. The support of this alignment also contains $\varphi_{1,1}^K$, $\varphi_{2,3}^K$, $\varphi_{3,2}^K$, $\varphi_{1,1}^L$, $\varphi_{2,3}^L$, $\varphi_{3,2}^L$,
and an alignment is induced by the following (see also Fig. 6) ($\varphi(a) = \bot$ means that
$a \notin \mathrm{Dom}(\varphi)$):

$$\xi^{V_1 L_0}(V_1) = \underbrace{a_2^{L_0} \dots a_6^{L_0}}_{f}\ \underbrace{a_7^{L_0} \dots a_{21}^{L_0}}_{l_1\, l_2\, l_3}$$

$$\xi^{V_2 L_0}(V_2) = \underbrace{a_1^{L_0} \dots a_5^{L_0}}_{t}\ \underbrace{a_7^{L_0} \dots a_{21}^{L_0}}_{l_1\, l_2\, l_3}$$

$$\xi^{V_3 L_0}(V_3) = \underbrace{\overbrace{a_1^{L_0} \dots a_5^{L_0}}^{v_3}}_{t}\ \underbrace{\overbrace{a_7^{L_0} \dots a_{21}^{L_0}}^{v_3^1\, v_3^2\, v_3^3}}_{l_1\, l_2\, l_3}$$
k1,1 k1,2 k1,3
of a variable used to satisfy each clause ($\varphi_{ij}^K$ for all i and j = 1, 2, 3), and selection
of one of three literals used to satisfy a clause ($\varphi_{ij}^L$ for all i and j = 1, 2, 3). One can
easily see that for each clause only one of $\varphi_{ij}^L$ can be used. The same applies to $\varphi_{ij}^K$ if all
variables have values assigned. Otherwise each of the residues $a_2^{L_0} a_3^{L_0} a_4^{L_0} a_5^{L_0}$ would be
aligned to more than one residue in $K_i$. If an alignment contains local similarities
encoding an assignment of a value for each variable and this assignment satisfies all
clauses, it can easily be extended with the respective local similarities encoding choices
of variables and literals for each clause. Elements of $\Phi^L$ guarantee that a variable
can be used to satisfy a clause only if it is assigned 1 and occurs in a positive literal,
or is assigned 0 and occurs in a negative literal. Such an alignment has the required size.
It remains to be proven that every alignment of size $\frac{2(20k+10l)}{(k+l)(k+l+1)}$ contains either $\varphi_i^T$
or $\varphi_i^F$ for each variable. We begin with the observation that local similarities related
to different variables are independent in the sense that they cannot cause a contradiction
in an alignment. This means that when searching for an optimal alignment one can
independently deal with the sets $\{\varphi_i^T, \varphi_i^F, \varphi_{ip}^K, \varphi_{iq}^K, \varphi_{ir}^K, \varphi_{ps}^L, \varphi_{qt}^L, \varphi_{ru}^L\}$, where p, q, r are
the numbers of clauses containing variable i, and s, t, u are the positions of variable i
in these clauses. If an alignment contains neither $\varphi_i^T$ nor $\varphi_i^F$, it may contain
all of $\varphi_{ip}^K$, $\varphi_{iq}^K$, $\varphi_{ir}^K$ (only one similarity from $\Phi^L$ may be picked for each clause).
Similarities $\varphi_{ip}^K$, $\varphi_{iq}^K$, $\varphi_{ir}^K$ contribute $\frac{30}{(k+l)(k+l+1)}$ to the alignment size, while each of
$\varphi_i^T$, $\varphi_i^F$ contributes $\frac{40}{(k+l)(k+l+1)}$. Therefore any alignment which does not contain
assignments of value for all variables is suboptimal. This concludes the proof and
explains the introduction of the seemingly unnecessary segments $v'_i$ and $l$. The remaining
technical details have been left to the reader.
With the theorem above we have established that (provided P is not equal to
NP) all algorithms performing multiple alignment of structures have exponential
computational complexity with respect to the number of structures. This is due to the
fact that although every multiple alignment can be described with a set of pairwise
alignments, not every set of pairwise alignments induces a proper multiple alignment.
Conflicts that prevent inducing a multiple alignment may involve alignments of any
number of structures and thus cannot be efficiently resolved by performing clean-ups
on subsets of structures. The last theorem in this section formalizes this observation.
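The kind of conflict described above can be detected by taking the transitive closure of all pairwise equivalences and checking whether any resulting class contains two residues of the same structure. The sketch below uses a union-find structure for this purpose; the representation of residues as (structure, index) tuples and the function name are assumptions made for the example.

```python
def induces_multiple_alignment(equivalences):
    """equivalences: iterable of ((structure, residue), (structure, residue)) pairs.

    Returns False if the transitive closure puts two different residues of the
    same structure into one equivalence class, True otherwise.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in equivalences:
        parent[find(u)] = find(v)

    seen = set()
    for node in list(parent):
        key = (find(node), node[0])  # (equivalence class, structure name)
        if key in seen:
            return False
        seen.add(key)
    return True


# Each pairwise alignment below is valid on its own, yet together they force
# residues 1 and 2 of structure A into the same column.
conflict = [(("A", 1), ("B", 1)), (("B", 1), ("C", 1)), (("C", 1), ("A", 2))]
print(induces_multiple_alignment(conflict))
```

Detecting a conflict in this way is cheap; the hard part, as the theorem indicates, is choosing which pairwise equivalences to discard so that the remaining alignment is as large as possible.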
Let us assume that S is divided into two subsets, S1 and S2 , and we have already
calculated optimal multi-alignments of these sets. We will consider computing a
multiple alignment of S which contains alignments of S1 and S2 .
their solutions are merged to achieve a solution for the whole dataset. This step is
repeated recursively. OAMA would occur in the merging stage of such an approach.
This theorem does not require a proof nor a comment, since OAMA contains OSA.
Theorem 9 OAMA is NP-complete even if its input is extended with optimal align-
ments of all pairs of structures from S1 × S2 .
Due to lack of space we leave this theorem without proof; the proof is similar to
that of Theorem 7.
In this section we have shown that computing optimal multiple alignments adds
one more level of intractability to an already difficult problem of computing optimal
alignments of two structures. Conflicts that may occur between alignments of pairs
prevent them from being merged into a multiple alignment. Resolving these conflicts
is intractable by itself.
In this section we describe certain issues which may arise in comparison of protein
structures. The content of this section pertains mostly to application of local descrip-
tors to computing alignments, but may be treated as a collection of tips and tricks
to be used elsewhere. We will begin with a fundamental problem of setting correct
thresholds.
Usually, finding the best structural alignment is a problem of optimizing two variables:
alignment size and alignment quality. Root Mean Square distance (RMSD) [24, 25] is the
most popular quality measure due to its simplicity: the optimal superposition has a compact
mathematical solution. Other measures (e.g. MaxSub [47]) exist, but have not achieved
comparable popularity. All these methods, however, rely on superimposing aligned residues and assessing
628 P. Daniluk and B. Lesyng
the quality of the superposition based on distances between aligned residues. Structures
are therefore treated as rigid objects. Nevertheless, it is commonly accepted
that proteins are flexible to some extent. In Sect. 1 we have suggested that methods of
aligning protein structures should take such flexibility into account, and thus quality
measures overcoming the rigid-body limitation should be used. The simplest way is
to introduce explicit “hinges” which connect rigid fragments [52].
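The rigid-body limitation can be made concrete with a minimal Python sketch of the RMSD computation. It assumes that the two coordinate lists are already paired by the alignment and already superimposed (for instance by the procedure of [24, 25], which is not reproduced here); the function name `rmsd` and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between paired points; assumes the structures are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: two rigid fragments joined by a hinge.
fragment_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fragment_2 = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
# In the second structure only fragment_2 is displaced as a whole.
fragment_2_moved = [(x, y + 2.0, z) for x, y, z in fragment_2]

whole = rmsd(fragment_1 + fragment_2, fragment_1 + fragment_2_moved)
first_only = rmsd(fragment_1, fragment_1)
print(whole, first_only)
```

Each fragment is internally unchanged, yet the RMSD over the whole alignment is dominated by the displaced fragment. This is the behaviour that hinge-based and contact-based quality measures are meant to avoid.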
In this section we will present a solution used in the DEDAL method, which was
designed to allow for flexible rearrangements of loosely coupled substructures (e.g.
domains) and small local distortions, while penalizing deformations significantly
changing the arrangement of interactions which stabilize structures. As in the case
of local descriptors, where inter-residue contacts are used to define the structural
neighborhood of a chosen residue, contacts may be used to detect a network of
interactions responsible for the rigidity of a protein structure. One may imagine that
residues in contact are connected with springs, which have to be somewhat extended
or compressed, if these residues were to be superimposed onto their counterparts
in the other structure. The degree of deformation of these springs may be treated as an
indicator of structural similarity (Fig. 8).
Definition 12 An aligned contact is a pair $\langle (a^{(k)}, b^{(m)}), (a^{(l)}, b^{(n)}) \rangle$
such that $a^{(k)}$ is aligned to $b^{(m)}$, $a^{(l)}$ is aligned to $b^{(n)}$, and at least
one of the pairs $(a^{(k)}, a^{(l)})$, $(b^{(m)}, b^{(n)})$ is a contact.
Fig. 8 Similar structures (ASTRAL domains (a) d1d5fa_ and (b) d1nd7a_) comprise two differently
arranged subdomains. Properly aligned contacts are marked with green lines. Yellow lines denote
aligned contacts which are not preserved in the other structure. Red lines mark residue pairs not in
contact, which are aligned with residues in contact. In order to superimpose these structures it is
necessary to extend springs corresponding to yellow contacts to lengths of respective red lines
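The tension of an alignment ξ is a square mean of its local tensions, computed first for
each residue and then for the whole structure:

$$\mathrm{tens}(\xi) = \sqrt{\frac{\displaystyle\sum_{a^{(i)} \in \mathrm{Dom}(\xi)} \; \sum_{a^{(j)} \in T_{a^{(i)}}} \frac{\left[\mathrm{tens}\left(a^{(i)}, \xi(a^{(i)}), a^{(j)}, \xi(a^{(j)})\right)\right]^{2}}{|T_{a^{(j)}}|}}{|\mathrm{Dom}(\xi)|}}$$

where $a^{(j)} \in T_{a^{(i)}}$ if $\langle (a^{(i)}, \xi(a^{(i)})), (a^{(j)}, \xi(a^{(j)})) \rangle$
is an aligned contact in ξ.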
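The following Python sketch evaluates the tension formula for a toy alignment. Two ingredients are assumptions made only for illustration, since this passage does not specify them: a residue pair is treated as a contact when its distance is within a hypothetical cutoff of 6.5, and the tension of a single aligned contact is taken to be the relative difference between the two distances (the function `contact_tension`). A pair counts as an aligned contact when at least one of the two residue pairs is a contact.

```python
import math
from itertools import combinations


def contact_tension(d_a, d_b):
    """Hypothetical per-contact tension: relative deformation of the spring."""
    return abs(d_a - d_b) / min(d_a, d_b)


def alignment_tension(coords_a, coords_b, alignment, cutoff=6.5):
    """coords_*: residue id -> (x, y, z); alignment: residue of A -> residue of B."""
    if not alignment:
        return 0.0
    neighbours = {i: set() for i in alignment}   # the sets T_a(i)
    tension = {}
    for i, j in combinations(sorted(alignment), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[alignment[i]], coords_b[alignment[j]])
        if min(d_a, d_b) <= cutoff:              # at least one pair is a contact
            neighbours[i].add(j)
            neighbours[j].add(i)
            tension[frozenset((i, j))] = contact_tension(d_a, d_b)
    total = 0.0
    for i in alignment:                          # outer sum over Dom(xi)
        for j in neighbours[i]:                  # inner sum over T_a(i)
            total += tension[frozenset((i, j))] ** 2 / len(neighbours[j])
    return math.sqrt(total / len(alignment))


coords_a = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0), 2: (6.0, 0.0, 0.0)}
coords_b = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0), 2: (9.0, 0.0, 0.0)}
identity = {0: 0, 1: 1, 2: 2}

print(alignment_tension(coords_a, coords_a, identity))
print(alignment_tension(coords_a, coords_b, identity))
```

Because the contact relation built here is symmetric, dividing by the number of contacts of residue j or of residue i gives the same total. Residues without aligned contacts add nothing to the sum but still count towards |Dom(ξ)|.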
and verified when the crystal structures became available. NK-lysin (SCOP domain
d1nkla_) is composed of five α-helices arranged in the “folded leaf” architecture
[29]. The “swaposin” domain (d1qdma1) of aspartic proteinase prophytepsin has the
same architecture, but the helices are in a different order [27] (Fig. 9). Nevertheless,
despite the obvious similarity most of the structure comparison methods align the
Fig. 9 The Saposin domain of NK-lysin (SCOP domain d1nkla_) and the “swaposin” domain of
prophytepsin (d1qdma1). Despite differing topologies these two domains have the same architecture
and identical disulfide bonds. (a) Methods incapable of handling segment swaps wrongly align
cysteine residues (figure shows alignment computed by DALI). (b) DEDAL correctly identifies the
best superposition and the disulfide bond network. (c) Alignments shown in (a) (red) and (b) (green)
are plotted against local similarity of single segments of length 5 (yellow). It can be observed that
similarity of continuous segments is insufficient to discover the correct alignment
Theoretical and Computational Aspects … 631
helices in agreement with their order along the sequence, which results in a visually
poor superposition (Fig. 9a). The similarity of continuous segments commonly used
does not provide enough information concerning the arrangement of helices and at
the same time supports an alignment without swaps (Fig. 9c). It should be empha-
sized that, apart from the worse RMS distance, alignments without swaps incorrectly
match cysteine residues forming the disulfide bonds. FlexSnap [42] and DEDAL [10]
identify the similarity correctly (Fig. 9b).
GTPases
Guanine nucleotide-binding proteins (G proteins) are important cellular regulators.
They act as binary switches, and use the GTP-GDP-GTP cycle to flip between the
on and off states. GTPase domains they contain are responsible for the GTP/GDP
binding. The GTPase activity depends on the set of five conserved sequence motifs
[36]. An alternative circularly permuted GTPase structure (cpGTPase) [45] which
contains all five motifs in a different order also exists (Fig. 11a and b). Despite a
different topology the cpGTPase domains retain the GTP binding activity, and have
the same architecture as GTPases. Although the crucial motifs are highly conserved
and identifiable by sequence analysis [3], many structure comparison methods are
unable to correctly align residues which form the GTP/GDP binding site. CE [46]
and DALI [21] yield 36% accuracy, while FlexSnap [42] and Cα -match [4] have
90% accuracy (reference alignment contains residues responsible for GTP binding).
In contrast, DEDAL [10] yields an entirely accurate superposition in this region
(Fig. 11c and d).
Cyanovirin-N
Cyanovirin-N is a potent HIV-inactivating protein, which exists in both monomeric
and domain-swapped dimeric forms. Although the monomeric form is predominant
in solution, and was determined first [7], the metastable dimeric form is also present.
The dimeric form is stabilized in the crystalline state [51] and eventually its struc-
ture was also obtained by NMR [5]. For the dimeric form, it can be observed that
the X-ray (SCOP domain d1l5ba_) and NMR (d1l5ea_) structures exhibit a
slightly different arrangement of subdomains (Fig. 10a and b), and that the local con-
formations of all residues except for the hinge region (PRO51-ASN53, Fig. 10c) are
identical. Nevertheless, the similarity between the two structures cannot be easily
determined by the rigid-body techniques, which are capable of aligning only one
subdomain. Surprisingly, FlexSnap [42], although in principle capable of handling
conformational variability, gives only 50% accuracy with respect to the reference alignment.
6 Conclusions
Fig. 10 Conformation of the Cyanovirin-N dimeric form depends on the molecular environment.
(a) X-ray (d1l5ba_) and (b) NMR (d1l5ea_) structures have different conformations of the “hinge”
region (PRO51-ASN53) (c). To fully analyze the similarity of the two structures it is necessary to
abandon the rigid-body approach. The regions on both sides of the “hinge” have to be superimposed
separately. DEDAL accomplishes this by extending local similarities in both regions and effectively
defining the “hinge” as the boundary between them
Fig. 11 Topologies of (a) the Dynamin A GTPase (SCOP domain d1jwyb_) and (b) the cpGTPase
domain from the YjeQ protein (d1u0la2). Aligned SSEs are indicated by lighter colors. (c) DEDAL
superposition of the GTPase and the cpGTPase domains (yellow and blue, respectively). For clarity,
only the aligned parts of the structures are shown. (d) View of the binding site in the same superposition
showing residues participating in the GDP/GTP binding (red) and the GDP molecule. Despite
significant topological differences, DEDAL effectively handles all alignable SSEs and correctly
superimposes the active sites. The sequence identity of the superimposed regions is 24.2%
discrete problem into an easier continuous one with the aim of obtaining an approx-
imate solution. This technique might be of use in efficiently computing multiple
alignments (private communications, unpublished work).
As the number of known protein structures and high quality models increases,
computing biologically relevant alignments is becoming a serious option in the area
traditionally reserved to genome-wide sequence searches. It is generally accepted
that a sequence of residues implies a spatial structure, which in turn determines
atomic functional motions and other properties of a molecule. Therefore conclusions
inferred from the structure comparison are in general more reliable than ones based
on sequence alignments.
One should note that although causal relations between sequence, structure,
atomic motions and function are often discussed in the biological literature, such
relations still lack a formal, consistent mathematical framework. Nevertheless,
during the past few years, based on methodologies developed for complex
systems in economics and neurophysiology, a prototype of causal analysis for
biomolecular systems has been proposed.¹² In particular, applying the presented
methodology to trajectories obtained from molecular dynamics simulations can help
to elucidate the actual logic of their functioning. The development of such a formalism
for causal relations is one of the challenging tasks in structural biology and
bioinformatics.
References
1. Alexandrov, N.: SARFing the PDB. Protein Eng. 9(9), 727 (1996)
2. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.:
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. 25(17), 3389–402 (1997)
3. Anand, B., Verma, S.K., Prakash, B.: Structural stabilization of GTP-binding domains in circu-
larly permuted GTPases: implications for RNA binding. Nucleic Acids Res. 34(8), 2196–205
(2006)
4. Bachar, O., Fischer, D., Nussinov, R., Wolfson, H.: A computer vision based technique for 3-D
sequence-independent structural comparison of proteins. Protein Eng. 6(3), 279–88 (1993)
5. Barrientos, L.G., Louis, J.M., Botos, I., Mori, T., Han, Z., O’Keefe, B.R., Boyd, M.R., Wlo-
dawer, A., Gronenborn, A.M.: The domain-swapped dimer of cyanovirin-N is in a metastable
folded state: reconciliation of X-ray and NMR structures. Structure 10(5), 673–86 (2002)
6. Berbalk, C., Schwaiger, C.S., Lackner, P.: Accuracy analysis of multiple structure alignments.
Protein Sci. 18(10), 2027–35 (2009)
7. Bewley, C.A., Gustafson, K.R., Boyd, M.R., Covell, D.G., Bax, A., Clore, G.M., Gronenborn,
A.M.: Solution structure of cyanovirin-N, a potent HIV-inactivating protein. Nat. Struct. Biol.
5(7), 571–8 (1998)
8. Bystroff, C., Baker, D.: Prediction of local structure in proteins using a library of sequence-
structure motifs. J. Mol. Biol. 281(3), 565–77 (1998). https://doi.org/10.1006/jmbi.1998.1943
9. Daniluk, P., Lesyng, B.: DAMA: a novel method for aligning multiple protein structures. In:
Multi-Pole Approach to Structural Biology Conference. Warsaw, Poland (2011a)
10. Daniluk, P., Lesyng, B.: A novel method to compare protein structures using local descriptors.
BMC Bioinformatics 12(1), 344 (2011b). https://doi.org/10.1186/1471-2105-12-344
11. Daniluk, P., Dziubiński, M., Hallay-Suszek, M., Rakowski, F., Walewski, L., Lesyng, B.: From
experimental structural probability distributions to the theoretical causality analysis of molec-
ular changes. CAMES (In press) (2012)
12. Dobbins, S., Lesk, V., Sternberg, M.: Insights into protein flexibility: the relationship between
normal modes and conformational change upon protein–protein docking. Proc. National Acad.
Sci. 105(30), 10,390 (2008)
13. Dror, O., Benyamini, H., Nussinov, R., Wolfson, H.: MASS: multiple structural alignment by
secondary structures. Bioinformatics 19(Suppl 1), i95–104 (2003)
14. Elias, I.: Settling the intractability of multiple alignment. J. Comput. Biol. 13(7), 1323–39
(2006). https://doi.org/10.1089/cmb.2006.13.1323
15. Garey, M.R., Johnson, D.S.: Computers and intractability: a guide to the theory of NP-
completeness. A Series of Books in the Mathematical Sciences. W. H. Freeman, San Francisco
(1979)
16. Gerstein, M., Echols, N.: Exploring the range of protein flexibility, from a structural proteomics
perspective. Curr. Opin. Chem. Biol. 8(1), 14–19 (2004)
17. Gibrat, J.F., Madej, T., Bryant, S.H.: Surprising similarities in structure comparison. Curr. Opin.
Struct. Biol. 6(3), 377–85 (1996)
18. Grishin, N.V.: Fold change in evolution of protein structures. J. Struct. Biol. 134(2–3), 167–85
(2001)
19. Guerler, A., Knapp, E.W.: Novel protein folds and their nonsequential structural analogs. Pro-
tein Sci. 17(8), 1374–82 (2008)
20. Holm, L., Park, J.: DaliLite workbench for protein structure comparison. Bioinformatics 16(6),
566–7 (2000)
21. Holm, L., Sander, C.: Protein structure comparison by alignment of distance matrices. J. Mol.
Biol. 233(1), 123–38 (1993)
22. Ilyin, V.A., Abyzov, A., Leslin, C.M.: Structural alignment of proteins by a novel TOPOFIT
method, as a superimposition of common volumes at a topomax point. Protein Sci. 13(7),
1865–74 (2004)
23. Jung, J., Lee, B.: Protein structure alignment using environmental profiles. Protein Eng. 13(8),
535–43 (2000)
24. Kabsch, W.: A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. Sect.
A 32(5), 922–923 (1976)
25. Kabsch, W.: A discussion of the solution for the best rotation to relate two sets of vectors. Acta
Crystallogr. Sect. A 34(5), 827–828 (1978)
26. Kawabata, T., Nishikawa, K.: Protein structure comparison using the markov transition model
of evolution. Proteins 41(1), 108–22 (2000)
27. Kervinen, J., Tobin, G.J., Costa, J., Waugh, D.S., Wlodawer, A., Zdanov, A.: Crystal structure
of plant aspartic proteinase prophytepsin: inactivation and vacuolar targeting. EMBO J. 18(14),
3947–55 (1999)
28. Konagurthu, A.S., Whisstock, J.C., Stuckey, P.J., Lesk, A.M.: MUSTANG: a multiple structural
alignment algorithm. Proteins 64(3), 559–74 (2006). https://doi.org/10.1002/prot.20921
29. Liepinsh, E., Andersson, M., Ruysschaert, J.M., Otting, G.: Saposin fold revealed by the NMR
structure of NK-lysin. Nat. Struct. Biol. 4(10), 793–5 (1997)
30. Lindqvist, Y., Schneider, G.: Circular permutations of natural protein sequences: structural
evidence. Curr. Opin. Struct. Biol. 7(3), 422–7 (1997)
31. Mavridis, L., Ritchie, D.: 3D-blast: 3D protein structure alignment, comparison, and classifi-
cation using spherical polar fourier correlations. Pacific Symp. Biocomputing 2010, 281–292
(2010)
32. Mayr, G., Domingues, F.S., Lackner, P.: Comparative analysis of protein structure alignments.
BMC Struct. Biol. 7, 50 (2007)
33. Menke, M., Berger, B., Cowen, L.: Matt: local flexibility aids protein multiple structure align-
ment. PLoS Comput. Biol. 4(1), e10 (2008). https://doi.org/10.1371/journal.pcbi.0040010
34. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: Equation of state calcu-
lations by fast computing machines. J. Chem. Phys. 21(6), 1087 (1953)
35. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in
the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–53 (1970)
36. Niemann, H.H., Knetsch, M.L., Scherer, A., Manstein, D.J., Kull, F.J.: Crystal structure of a
dynamin GTPase domain in both nucleotide-free and GDP-bound forms. EMBO J. 20(21),
5813–21 (2001)
37. Orengo, C.A., Taylor, W.R.: SSAP: sequential structure alignment program for protein structure
comparison. Methods Enzymol 266, 617–35 (1996)
38. Pawlak, Z.: Rough Sets: theoretical aspects of reasoning about data. System theory, knowledge
engineering, and problem solving, Kluwer Academic Publishers, Theory and decision library
(1991)
39. Pearson, W., Lipman, D.: Improved tools for biological sequence comparison. Proc. National
Acad. Sci. 85(8), 2444 (1988)
40. Ponting, C.P., Russell, R.B.: Swaposins: circular permutations within genes encoding saposin
homologues. Trends Biochem Sci. 20(5), 179–80 (1995)
41. Rocha, J., Segura, J., Wilson, R.C., Dasgupta, S.: Flexible structural protein alignment by a
sequence of local transformations. Bioinformatics 25(13), 1625–31 (2009)
42. Salem, S., Zaki, M., Bystroff, C.: FlexSnap: flexible non-sequential protein structure alignment.
Algorithms for Mol. Biology 5(1), 12 (2010)
43. Shatsky, M., Nussinov, R., Wolfson, H.J.: FlexProt: alignment of flexible protein structures
without a predefinition of hinge regions. J. Comput. Biol. 11(1), 83–106 (2004a)
44. Shatsky, M., Nussinov, R., Wolfson, H.J.: A method for simultaneous alignment of multiple
protein structures. Proteins 56(1), 143–56 (2004b). https://doi.org/10.1002/prot.10628
45. Shin, D.H., Lou, Y., Jancarik, J., Yokota, H., Kim, R., Kim, S.H.: Crystal structure of YjeQ
from Thermotoga maritima contains a circularly permuted GTPase domain. Proc. Natl. Acad.
Sci. U S A 101(36), 13,198–13,203 (2004)
46. Shindyalov, I.N., Bourne, P.E.: Protein structure alignment by incremental combinatorial exten-
sion (CE) of the optimal path. Protein Eng. 11(9), 739–47 (1998)
47. Siew, N., Elofsson, A., Rychlewski, L., Fischer, D.: MaxSub: an automated measure for the
assessment of protein structure prediction quality. Bioinformatics 16(9), 776–785 (2000)
48. Swendsen, R.H., Wang, J.S.: Replica Monte Carlo simulation of spin glasses. Phys. Rev. Lett.
57(21), 2607–2609 (1986)
49. Vogel, C., Morea, V.: Duplication, divergence and formation of novel protein topologies. Bioes-
says 28(10), 973–8 (2006). https://doi.org/10.1002/bies.20474
50. Wohlers, I., Domingues, F.S., Klau, G.W.: Towards optimal alignment of protein structure
distance matrices. Bioinformatics 26(18), 2273–80 (2010)
51. Yang, F., Bewley, C.A., Louis, J.M., Gustafson, K.R., Boyd, M.R., Gronenborn, A.M., Clore,
G.M., Wlodawer, A.: Crystal structure of cyanovirin-N, a potent HIV-inactivating protein,
shows unexpected domain swapping. J. Mol. Biol. 288(3), 403–12 (1999)
52. Ye, Y., Godzik, A.: Flexible structure alignment by chaining aligned fragment pairs allowing
twists. Bioinformatics 19(Suppl 2), ii246–55 (2003)
53. Ye, Y., Godzik, A.: Multiple flexible structure alignment using partial order graphs. Bioinfor-
matics 21(10), 2362–9 (2005). https://doi.org/10.1093/bioinformatics/bti353
Fuzzy Oil Drop Model
Application—From Globular Proteins
to Amyloids
Abstract The fuzzy oil drop model asserts the presence of a monocentric
hydrophobic core in a protein, generated by the influence of water which directs
hydrophobic residues towards the center, while exposing hydrophilic molecules on
the surface. Applying the model to a range of proteins which vary in terms of structure
and function reveals globally accordant structures and locally discordant fragments
which disrupt the hydrophobic core and appear to mediate the protein’s biological
function. Solenoids provide an example of structural elements which diverge from
the fuzzy oil drop model by adopting a linear distribution of hydrophobicity. Such lin-
ear propagation, while unbounded in principle, is arrested by terminal “caps”, which
mediate contact with water and therefore prevent the solenoid from growing indef-
initely. Amyloids—a group of misfolding proteins—follow the same principles but
lack suitable “caps” and may propagate without bound. In light of the fuzzy oil drop
model, the factor most directly responsible for this phenomenon is anomalous inter-
action with the aqueous environment, where the expected monocentric distribution
of hydrophobicity is replaced by a distribution based on the intrinsic hydrophobicity
of each residue, thus preventing a hydrophobic core from emerging. In this work we
present a set of proteins which represent progressive departures from the fuzzy oil
drop model (i.e. from the theoretical distribution of hydrophobicity expressed by a
3D Gaussian). We also discuss the biological function and/or dysfunction of each
protein.
The fuzzy oil drop model, introduced in [1], enables us to study how proteins vary
with respect to size and biological role. This chapter discusses a broad spectrum
of proteins, from small globular molecules (represented by 1UCS, an antifreeze
protein [2]), through large globular examples (Exonuclease III from Escherichia coli;
PDB ID 1AKO [3]), dimeric proteins which exhibit enzymatic activity and can bind
ligands (Class II fructose-1,6-bisphosphate aldolase; PDB ID 1B57 [4]), solenoid-
containing proteins (another antifreeze protein, isoform 501 from spruce budworm;
PDB ID 1Z2F [5]), and amyloids (fibrils of the amyloid β-peptide (Aβ) Aβ1-40
with the Osaka mutation (E22Δ); PDB ID 2MVX [6]). Processing this heterogeneous
set of proteins with the fuzzy oil drop model reveals strong directionality: from
structures highly consistent with the theoretical core structure represented by a 3D
Gaussian, through local deviations related to biological function (ligand binding;
protein complexation), all the way to global discordance, evident e.g. in solenoids
and amyloids which expose a linear arrangement of alternating bands of high and low
hydrophobicity (propagating along their axis of elongation) in place of a monocentric
hydrophobic core. The key difference between solenoids and amyloids is that the
former group is equipped with special “stoppers” (or “caps”), preventing unchecked
growth of the solenoid structure. In amyloids no such stoppers are present.
A novel addition to the model as presented in [1] is the so-called relative distance
(RD) coefficient, given as:
$$RD = \frac{O/T}{O/T + O/R}$$
This value determines whether the observed (O) distribution more closely approx-
imates the theoretical (T) or the unified (R) boundary case. It is worth noting that RD
can also be computed for a different set of boundary distributions. Thus far we have
focused on the theoretical distribution (T), which perfectly matches the 3D Gaus-
sian, and the unified distribution (R), where each residue is assigned a hydrophobicity
of 1/N (N being the number of residues in the polypeptide chain). In some cases,
however, it is useful to replace the unified distribution with the so-called intrinsic
distribution (H) based on the individual intrinsic hydrophobicity of each residue.
This approach results in two distinct values of RD: one for the T-O-R variant and
another for the T-O-H variant. Notably, RD can also be computed for specific frag-
ments of the input chain, such as selected secondary folds or fragments which meet
some arbitrary criterion.
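The RD coefficient can be sketched in a few lines of Python. The passage does not define the terms O/T and O/R; the sketch assumes they denote the Kullback-Leibler divergence of the observed distribution from the respective reference distribution, which is how the fuzzy oil drop literature usually measures them. The function names, the toy profiles and the renormalization of fragments are likewise assumptions made for illustration, not the authors' implementation.

```python
import math


def divergence(observed, reference):
    """Kullback-Leibler divergence of `observed` from `reference` (assumed meaning of O/T)."""
    # Reference values are assumed to be strictly positive.
    return sum(o * math.log2(o / r) for o, r in zip(observed, reference) if o > 0)


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def relative_distance(observed, theoretical, boundary):
    """RD = O/T / (O/T + O/B), where B is the unified (R) or intrinsic (H) distribution."""
    o_t = divergence(observed, theoretical)
    o_b = divergence(observed, boundary)
    if o_t + o_b == 0:
        raise ValueError("observed, theoretical and boundary distributions coincide")
    return o_t / (o_t + o_b)


def fragment_rd(observed, theoretical, boundary, indices):
    """RD for a fragment: restrict each profile to `indices` and renormalize (assumption)."""
    def pick(profile):
        return normalize([profile[i] for i in indices])
    return relative_distance(pick(observed), pick(theoretical), pick(boundary))


# Toy five-residue profiles; the values are illustrative, not taken from any protein.
theoretical = [0.1, 0.2, 0.4, 0.2, 0.1]           # T: peaked, Gaussian-like
unified = [1 / 5] * 5                             # R: 1/N for every residue
observed = [0.12, 0.2, 0.36, 0.2, 0.12]           # O: close to T

rd_tor = relative_distance(observed, theoretical, unified)
print(round(rd_tor, 3))
```

Passing an intrinsic profile (H) instead of `unified` gives the T-O-H variant. With these definitions RD equals 0 when the observed profile coincides with T and 1 when it coincides with the boundary distribution.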
A low value of RD in the T-O-R variant indicates the presence of a well-formed
hydrophobic core, consistent with the Gaussian distribution. This type of distribu-
tion can be called “cooperative” since it relies on cooperation of individual residues
in adopting a common conformation. On the other hand, a high value of RD (T-O-
R) means that the protein lacks a prominent hydrophobic core. Regarding T-O-H,
high RD shows that the hydrophobicity distribution is dominated by the individual
properties of each residue. This is a “selfish” state where the placement of residues
is dictated by their interaction with close neighbors rather than by the general ten-
dency to produce a common core. Such distributions are commonly found in linear
structures, particularly amyloids (or solenoids) comprising sequences of identical (or
similar, from the point of view of hydrophobicity) fragments. This often leads to lin-
ear arrangement of alternating hydrophobicity maxima and minima, which propagate
along the fibril’s axis.
Figure 1 illustrates the RD parameter in both variants (T-O-R and T-O-H).
All further references to RD which do not specify a variant will indicate T-O-R.
Whenever T-O-H is considered instead, this will be clearly stated.
A thorough description of the fuzzy oil drop model can be found in [7].
Fig. 1 Distributions of hydrophobicity calculated for a sample protein and compared with reference
distributions (T, R and H). (a) theoretical distribution (T); (c) unified distribution (R); (b) observed
distribution (O), which is assessed as accordant with the theoretical distribution (RD = 0.26, as
shown on axis d). Substituting the intrinsic distribution (h) for the unified distribution results
in an RD value of 0.58, indicating that the structure in question is dominated by the intrinsic
hydrophobicity of each participating residue (axis e). For the sake of clarity, the presentation has
been restricted to a single dimension. The illustrated fragment is composed of residues 104–112
in transthyretin (1DVQ). (f) theoretical distribution for the selected fragment; (g) corresponding
observed distribution; (h) corresponding intrinsic distribution
642 M. Banach et al.
As stated above, we will discuss individual proteins in the order of increasing dis-
cordance versus the theoretical distribution, which is expressed by the 3D Gaussian.
distribution. In all cases, RD adopts values from the 0–1 range. Both RD values,
which lie far below 0.5, indicate that the Gaussian distribution dominates in this
molecule.
The molecule under consideration is an antifreeze protein which works by dis-
rupting the natural organization of water, required for an ice crystal to emerge. The
action is quite similar to de-icing pavement by scattering salt. The mere presence of
the protein triggers structural rearrangement in the aqueous medium. By adjusting
to the distribution of charge (polar groups) on the protein surface, water molecules
adopt an ordering which disfavors the formation of ice crystals. The effect is not
limited to the layer directly adjacent to the protein but likely propagates to further
layers and can be felt at some distance from the surface.
Exonuclease III from Escherichia coli (1AKO) provides an example of a large protein
which is nevertheless consistent with the fuzzy oil drop model [3]. This protein
has a chain length of 268 aa. Reviewers who comment on publications related to the
fuzzy oil drop model often remark that the theoretical and observed distributions can
remain consistent only for small proteins. The presented case contradicts this opinion.
Figure 4 illustrates both distributions, showing that they remain consistent in spite
of small local deviations. The corresponding RD value (T-O-R) is 0.441. The diagram
also reveals catalytic residues and shows that strongly polar residues (note that
catalytic reactions are based mainly on electrostatic interactions) are located in areas
where high hydrophobicity is expected. Due to the need to interact with the substrate
(in this case, DNA), catalytic residues are commonly found in binding pockets.
Consequently, their neighborhood is characterized by high expected hydrophobicity.
When such residues are omitted from calculations, the remainder of the protein typically
conforms to the theoretical hydrophobicity profile with greater accuracy (in this
case, RD = 0.419 for T-O-R). In contrast, the catalytic residues themselves diverge
from the model (RD = 0.715 for T-O-R). This phenomenon reflects the encoding of
information in the structure of the active site, where hydrophilic residues located
in close proximity to the hydrophobic core create suitable conditions for catalytic
reactions.
Fig. 4 T (blue), O (red) and H (green) distributions in exonuclease III from Escherichia coli (1AKO).
Catalytic residues are marked with orange stars
The beta-sandwich structure which forms part of the protein is characterized by
high RD for T-O-R (0.576). One of its constituent beta sheets is consistent with the
theoretical distribution (RD = 0.487 for the 53–59 fragment for T-O-R), while the
remaining sheet houses three out of five catalytic residues and remains divergent
from the model (RD = 0.681 for the beta sheet which includes the 75–79 fragment
for T-O-R). Such discordance of the “catalytic” beta sheet suggests a certain degree of
instability and capacity for structural rearrangements which may be required during
catalysis.
The entire molecule remains highly stable owing to the presence of a well-ordered
outer layer composed of helical folds (RD = 0.440 for T-O-R).
Figure 5 depicts the 3D structure of this protein. We can observe that the surface
is composed of hydrophilic residues, while the polar residues forming the active site
are housed in a pocket.
The presented protein includes a region dominated by positive electrostatic potential,
with numerous strongly conserved residues (identified in many endonucleases
from bacteria to man). This region participates in cleaving a phosphate group (via
nucleophilic attack), which also requires the presence of a metal ion. Our analysis
based on the fuzzy oil drop model is consistent with the above properties.
Fig. 5 3D structure of exonuclease III: (a) globular form of the protein; (b) hydrophilic surface (gray),
catalytic residues (orange), local exposure of hydrophobic residues (red). Note that the red residues
are located at some distance from the exposed surface
The presence of a ligand typically distorts the protein’s hydrophobic core structure
due to the need for a suitable binding cavity. In the presented case elimination of
residues responsible for interaction with the ligand lowers RD to 0.551 (T-O-R),
while the ligand binding fragment itself strongly diverges from the model (RD = 0.701,
T-O-R). Clearly, a protein which includes a binding cavity does not follow
the theoretical distribution with the same accuracy as proteins which lack such a
cavity. Comparing RD values for the entire molecule and for its remainder following
elimination of catalytic residues indicates that deviations from the theoretical model
are concentrated in areas where substrate binding occurs.
Similar conditions are observed in areas responsible for protein-protein interaction.
The interface, when analyzed on its own, has RD = 0.584 (T-O-R), while
the remainder of the molecule exhibits a lower value of RD (0.560, T-O-R). This
suggests that local deviations from the theoretical hydrophobicity distribution are
associated with external factors, such as the presence of a ligand or complexation
partner.
Comparing T and O profiles reveals discordances in areas responsible for lig-
and binding and polymerization. Of note is the 275–340 fragment, which houses
a catalytic residue, as well as residues which mediate contact with the ligand and
dimerization. A local excess of hydrophobicity, if present on the surface, typically
enables contact with another protein chain, while local hydrophobicity deficiencies
usually correspond to catalytic active sites. This is further visualized in Fig. 7.
In terms of its tertiary conformation, the monomeric form of 1B57 presents a cen-
trally located beta sheet surrounded by helical folds. The beta sheet itself exhibits low
RD (0.354 T-O-R), indicating high structural stability (under the assumption that a
prominent hydrophobic core stabilizes the protein’s tertiary conformation). An addi-
tional hairpin, comprising two separate beta folds, is characterized by RD 0.242
(T-O-R). The helices, considered as a single unit, diverge from the theoretical distri-
bution (RD 0.521 T-O-R), with two fragments (288–306 and 307–310) regarded
as particularly divergent. Eliminating these two short folds from the “sheath” which
surrounds the central beta sheet lowers its RD value to 0.482 (T-O-R). This means
that at least part of the sheath also contributes to structural stabilization by ensuring
Fig. 7 3D presentation of Class II fructose-1,6-bisphosphate aldolase in complex with
phosphoglycolohydroxamate. Red helix: unstable fragment at 288–310; orange spheres: catalytic
residues. Gray and blue colours: two chains
entropically advantageous contact with water. The two discordant helices are located
in close proximity to the catalytic residue (N286), which suggests that they form part
of the catalytic active site and undergo conformational changes during catalysis.
Our presentation of Class II fructose-1,6-bisphosphate aldolase in complex with
phosphoglycolohydroxamate is intended as an example of a molecule where various
functional factors (complexation, ligand binding) result in localized deviations from
the theoretical hydrophobic core structure.
With regard to the dimeric structure, the computed RD value of 0.662 (T-O-R)
indicates significant departure from the model, with the terminal fragments of each
chain (288-terminus) seen as particularly discordant (RD 0.691 (T-O-R)). Such
discordance suggests high flexibility of folds which bracket the active site. In contrast,
stabilization is provided by the interface helices at 61–70, 43–51 and 79–101, with
the corresponding RD value of 0.485 (T-O-R). More broadly, the following helical
folds taken together—47–53, 61–70, 79–83, 95–101, 112–134, 159–164, 270–279,
292–307 and 340–354—produce an RD of 0.487 (T-O-R), indicating their stabilizing
role.
When dealing with a complex molecule, it is often useful to consider each frag-
ment separately in order to determine whether it contributes to structural stability.
Locally unstable fragments encode information which is related to the protein’s func-
tional profile. Note that a perfectly accordant protein would be highly soluble but
incapable of any form of activity. Such conditions are approximated by antifreeze
proteins which perform their intended function merely by being present in the aque-
ous environment—such proteins benefit from excellent solubility and inability to
interact with any external molecules. Consequently, localized departures from the
3D Gaussian should be treated as a means of encoding information in a way which
enables the protein to recognize its intended ligand and undergo specific conforma-
tional changes—as is indeed the case with 1B57.
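The "3D Gaussian" mentioned above can be sketched as follows. The sketch assumes that the theoretical hydrophobicity T of each residue is a product of three one-dimensional Gaussians evaluated at the residue's position, in a coordinate frame centered on the molecule, and normalized to sum to 1. The function name, the input layout and the choice of sigma values are hypothetical.

```python
import math

def theoretical_hydrophobicity(points, sigmas):
    """Normalized 3D Gaussian evaluated at each point.

    points: list of (x, y, z) positions, assumed to be expressed in a frame
            whose origin is the center of the molecule.
    sigmas: (sigma_x, sigma_y, sigma_z), assumed to describe the size of the
            ellipsoid enclosing the molecule along each axis.
    """
    sx, sy, sz = sigmas
    raw = [
        math.exp(-x * x / (2 * sx * sx))
        * math.exp(-y * y / (2 * sy * sy))
        * math.exp(-z * z / (2 * sz * sz))
        for x, y, z in points
    ]
    total = sum(raw)
    return [v / total for v in raw]
```

In this picture a perfectly accordant protein has an observed distribution that follows T everywhere: highest at the center and decreasing towards the surface.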
fragment is 0.670 (T-O-R), while the C-terminal section is highly consistent with the
model (0.398 for the 111–121 fragment and 0.456 for the longer fragment at 103–121
T-O-R). The role of caps is to prevent unrestricted elongation of the solenoid and
formation of fibrillary structures. They perform their function by mediating entrop-
ically advantageous contact with water (Fig. 9). In 1Z2F a single C-terminal cap
appears to be present.
The RD value computed for the whole molecule is 0.713 (T-O-R), which, as
already stated, indicates that no concentration of hydrophobicity exists at the center
and that the protein is not stabilized by hydrophobic effects. Instead, the stabilizing
effect appears to be generated by a system of five disulfide bonds. A discussion of
the stabilizing role of disulfides in the context of the fuzzy oil drop model can be
found in [8].
The status of the solenoid part of 1Z2F, expressed by RD 0.760 (T-O-R) and RD 0.803
(T-O-H), suggests the absence of a unicentric hydrophobic core. Moreover, the very
high value of RD for T-O-H reveals a strong influence of the intrinsic hydrophobicity
of individual residues on the final structure of the solenoid in this molecule. The
resultant overall distribution of hydrophobicity in the solenoid appears to represent
linear, band-like propagation of low/high hydrophobicity. This peculiar
distribution of hydrophobicity in solenoids seems important from the point of view
of structuralization of water (Fig. 10). In 1UCS (an antifreeze protein), where the
Fig. 9 T (blue), O (red) and H (green) distributions for the N- and C-terminal fragments in 1Z2F
(an antifreeze protein)
entire surface consists of polar residues, water is naturally repelled by the surface. In
the case of a solenoid, however, interaction between the protein and water becomes
far more nuanced. Electrostatic effects are believed to result in the emergence of
aqueous “bands” (with differing structural properties) in the neighborhood of the
protein. Clearly, exposed hydrophobicity has a markedly different effect on water
than any hydrophilic residues present on the surface. Some reports even suggest that
water may “levitate” above hydrophobic patches [9]. Such radical alteration of the
natural ordering of water molecules may explain the observed action of solenoid-
containing antifreeze proteins [10, 11], which prevent water from freezing even at
subzero temperatures.
Under the fuzzy oil drop model amyloids are regarded as strongly discordant versus the
theoretical distribution of hydrophobicity. In place of a monocentric hydrophobic
core we observe linear propagation of alternating maxima and minima. These bands
propagate along the axis of the emerging fibril and typically result in identical residues
being found in close proximity to one another. Note that such a structure is not
optimal in terms of charge distribution—this proves that hydrophobic interactions
play a dominant role in determining the tertiary conformation of the amyloid.
All the above properties are exemplified by 2MVX—Aβ1-40 peptide with the
Osaka mutation (E22Δ) [6].
Our analysis concerns amyloid structures which emerge via complexation of 40-aa
polypeptides. The reference amyloid is an elongated fibril consisting of two identical
650 M. Banach et al.
subfibrils (Fig. 11a—chains A–E and F–J). Each subfibril can be further divided into
two distinct beta sheets. The amyloid as a whole is characterized by RD 0.591
(T-O-R), which indicates the lack of a prominent hydrophobic core.
Figure 12 provides a comparison of T and O profiles for the amyloid fibril.
As seen in Fig. 12, the T and O distributions for peptides located at the end of the
fibril differ somewhat from those found in the central part of the chain.
Given that the presented amyloid consists of two identical subfibrils, we have
singled out residues responsible for complexation of the opposite subunit. The status
of this interface section is described by RD 0.479 (T-O-R), whereas the remaining
section of the amyloid (minus the interface) gives RD 0.654 (T-O-R). This means
that the interface is hydrophobically optimized while the remainder of the structure
diverges from the theoretical distribution. This phenomenon may also explain the
moderate RD value calculated for the complex as a whole.
Analysis of a single chain, as illustrated by the profile in Fig. 13, reveals local
accordance with the theoretical distribution, along with certain fragments for which
the observed distribution appears to correlate negatively with theoretical values
(which is typical for amyloids). These fragments are further described in Table
1. More specifically, the fragments at 5–11, 11–15 and 21–27 seem to follow the
intrinsic hydrophobicity of each residue (high values of the H/O correlation coefficient,
with the T/O coefficient adopting negative values and RD remaining high for
both T-O-R and T-O-H). This shows that in amyloids the observed distribution is
not merely divergent from its theoretical counterpart, but—in some areas—a polar
opposite of T.
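The correlation coefficients referred to above can be computed per fragment as in the following sketch. It assumes that the coefficients are ordinary Pearson correlations between the per-residue T, O and H profiles, restricted to the residues of the fragment; the section does not state which estimator was used. The function names are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fragment_correlations(t, o, h, start, end):
    """H-T, T-O and H-O correlations for residues start..end (1-based, inclusive)."""
    sl = slice(start - 1, end)
    return {
        "H-T": pearson(h[sl], t[sl]),
        "T-O": pearson(t[sl], o[sl]),
        "H-O": pearson(h[sl], o[sl]),
    }
```

A fragment with a negative T-O coefficient and a high H-O coefficient matches the pattern described in the text: O runs opposite to T while tracking the intrinsic hydrophobicity of the residues.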
Values given in bold are the parameters which support the interpretation of a strong
influence of intrinsic hydrophobicity on the status of the amyloid fibril.
High values of RD in both variants (T-O-R and T-O-H) mean that the amyloid does
not produce a monocentric hydrophobic core. For certain fragments the observed dis-
tribution correlates negatively with the theoretical distribution. The location of such
fragments is highlighted by yellow lines in Fig. 13, and also in Fig. 11b. Figure 11c
reveals linear propagation of hydrophobic bands in place of a monocentric core. Such
propagation, occurring along the axis of elongation, is commonplace in amyloids and
distinguishes them from globular proteins—as noted in other publications [12, 13].
The status of the beta sheet (11–18) in chains B-D and G-I (except for terminal
peptides) exhibits strong divergence between T and O, with a very high value of the
Fig. 13 Theoretical (T, blue), observed (O, pink) and intrinsic (H, green) hydrophobicity
distribution. Areas where O diverges from T in favor of H are marked in pink
Table 1 RD parameters and correlation coefficients for fragments selected according to the profile
shown in Fig. 13, and in beta sheets. Edge chains (A, E, F and J) have been eliminated from
calculations in order to more accurately represent an unbounded fibril
Fragment                           RD (T-O-R)  RD (T-O-H)  Correlation coefficient
                                                           H-T     T-O     H-O
1–4                                0.413       0.188        0.586   0.691  0.801
5–11                               0.737       0.687       −0.303  −0.309  0.745
11–15                              0.730       0.721       −0.268  −0.375  0.867
16–21                              0.407       0.282        0.484   0.620  0.945
21–27                              0.668       0.813        0.124  −0.214  0.911
27–40                              0.480       0.429        0.583   0.649  0.761
Beta-sheet 11–18, chains B-D, G-I  0.640       0.490        0.111   0.090  0.935
H/O correlation coefficient. This indicates that the conformation of this fragment
is dominated by intrinsic hydrophobicity, even though RD (T-O-H) remains slightly
below 0.5 (note that the beta sheet in question also encompasses some residues which
do not belong to the “divergent” fragment identified in Fig. 13). If, in the course of
complexation, a hydrophobic core were to emerge, indefinite elongation of the fib-
ril would not be possible. The proteins discussed at the beginning of this chapter
possess hydrophobic cores and therefore adopt globular conformation, without the
risk of indefinite propagation in any direction. Similarly, the “caps” which termi-
nate solenoid fragments in antifreeze proteins prevent complexation of additional
peptides. The lack of similar structures in amyloids opens the door to unrestricted
elongation, resulting in a fibril where the distribution of hydrophobicity is governed
by intrinsic properties of each residue and no hydrophobic core may form. The linear
propagation of discordant fragments can be seen in Fig. 14.
3 Discussion
The presented spectrum of proteins is another example of how the fuzzy oil drop
model can be used to study the relation between the protein’s hydrophobic core and its
biological properties. Structures classified as “misfolding proteins” are represented
by amyloid β-peptide (Aβ) fibrils 1–40 (PDB ID: 2MVX).
A near-perfect match between the theoretical and observed distribution of
hydrophobicity (equivalent to a molecular surface composed entirely of hydrophilic
residues) is observed in certain antifreeze proteins. Such proteins attain their tertiary
conformation by directing all hydrophobic residues towards the center, where they
can be shielded from contact with water. Exposure of polar (or charged) residues on
the surface causes the surrounding water particles to adapt, and disrupts their natural
Fig. 14 Amyloid 2MVX. Residues shown in red are those identified as discordant according to the
profiles in Fig. 13, where the corresponding fragments are marked in pink
ever, the structure of water changes, the protein may adopt a conformation which is
dominated by intrinsic hydrophobicity, potentially favoring amyloid aggregation. As
already noted, the natural structuralization of water is unknown—however it should
be noted that from among the multitude of chemical factors which promote amy-
loidogenesis [16] none involve actual chemical reactions. Furthermore, shaking is a
known causative factor of amyloid transformation. While not chemical in nature, this
process may alter the structural properties of water in a way which allows amyloid
fibrils to form.
The search for an adequate model of the protein-water relation has a long history.
The fuzzy oil drop model is based on the oil drop model introduced by Kauzmann
[17, 18]. The role of hydrophobic interactions has been a central point of research,
particularly with respect to folding, unfolding and refolding phenomena [19–27].
The influence of the water environment has been widely discussed [28, 29]. Analysis
of structure in terms of packing, treated as the final effect of folding in a water
environment, introduced new aspects of the folding process [30–32]. General models
of protein folding have incorporated aspects based on hydrophobicity, particularly
in the context of hydrophobicity exposed on the surface of proteins [33–48]. The water
environment has also been treated as an important partner in the folding process
[49–53]. Many fundamental papers belong to the history of the protein-water relation
[54–60].
The fuzzy oil drop model described in this chapter enables a quantitative assessment
of the balance between the internal force field (inter-atomic interactions within the
protein molecule) and the external force field, the characteristics of which appear to
have a critical influence on the final form of the polypeptide chain [61].
This chapter does not discuss disease-related problems or medical treatment
techniques. A comprehensive review of the basic molecular level, as well as of medical
aspects and therapy, is given in [62, 63], which also cover the historical context of
research on the mechanism of amyloidosis. Self-assembly and misfolding processes
are critical for cellular activity and may lead to cellular devastation. The list of
neurodegenerative diseases has grown even longer since the discovery of defective
amyloid processing in preeclampsia [64]. In this context the search for effective therapy
is of high importance. An example of a proposal focused on inhibition of the fibrillation
process is given in [65], where the short polypeptide FVFLM is recognised as an inhibitor
of the fibril elongation of KLVFF. However, the “stop” mechanism which prevents
unlimited elongation of amyloid-like structures identified in biologically active proteins
[15] suggests instead the use of polypeptides with a high preference for helical structural
forms [66]. A helix, especially an amphipathic one, is able to attach to the hydrophobic
part of the amyloid with its opposite, hydrophilic side exposed to the water environment.
This arrangement allows water penetration, which precludes continuation of the
fibrillation process [67].
References
1. Roterman, I., Konieczny, L., Banach, M., Marchewka, D., Kalinowska, B., Baster, Z., Tomanek,
M., Piwowar, M.: Simulation of protein folding process. In: Liwo A. (ed) Computational
Methods To Study the Structure And Dynamics of Biomolecules and Biomolecular Processes,
pp. 599–638. Springer (2014)
2. Ko, T.P., Robinson, H., Gao, Y.G., Cheng, C.H., DeVries, A.L., Wang, A.H.: The refined crystal
structure of an eel pout type III antifreeze protein RD1 at 0.62-A resolution reveals structural
microheterogeneity of protein and solvation. Biophys. J. 84, 1228–1237 (2003)
3. Mol, C.D., Kuo, C.F., Thayer, M.M., Cunningham, R.P., Tainer, J.A.: Structure and function
of the multifunctional DNA-repair enzyme exonuclease III. Nature 374, 381–386 (1995)
4. Hall, D.R., Leonard, G.A., Reed, C.D., Watt, C.I., Berry, A., Hunter, W.N.: The crystal structure
of Escherichia coli class II fructose-1, 6-bisphosphate aldolase in complex with phosphogly-
colohydroxamate reveals details of mechanism and specificity. J. Mol. Biol. 287, 383–394
(1999)
5. Li, C., Guo, X., Jia, Z., Xia, B., Jin, C.: Solution structure of an antifreeze protein CfAFP-501
from Choristoneura fumiferana. J. Biomol. NMR 32(3), 251–256 (2005)
6. Schütz, A.K., Vagt, T., Huber, M., Ovchinnikova, O.Y., Cadalbert, R., Wall, J., Güntert, P.,
Böckmann, A., Glockshuber, R., Meier, B.H.: Atomic-resolution three-dimensional structure
of amyloid β fibrils bearing the Osaka mutation. Angew. Chem. Int. Ed. Engl. 54, 331–335
(2015)
7. Kalinowska, B., Banach, M., Konieczny, L., Roterman, I.: Application of divergence entropy to
characterize the structure of the hydrophobic core in DNA interacting proteins. Entropy 17(3),
1477–1507 (2015). https://doi.org/10.3390/e17031477
8. Banach, M., Kalinowska, B., Konieczny, L., Roterman, I.: Role of disulfide bonds in stabilizing
the conformation of selected enzymes—an approach based on divergence entropy applied to
the structure of hydrophobic core in proteins. Entropy 18(3), 67 (2016).
https://doi.org/10.3390/e18030067
9. Schutzius, T.M., Jung, S., Maitra, T., Graeber, G., Köhme, M., Poulikakos, D.: Spontaneous
droplet trampolining on rigid superhydrophobic surfaces. Nature 527(7576), 82–85 (2015).
https://doi.org/10.1038/nature15738
10. Modig, K., Qvist, J., Marshall, C.B., Davies, P.L., Halle, B.: High water mobility on the
ice-binding surface of a hyperactive antifreeze protein. Phys. Chem. Chem. Phys. 12(35),
10189–10197 (2010). https://doi.org/10.1039/c002970j
11. Miskowiec, A., Buck, Z.N., Hansen, F.Y., Kaiser, H., Taub, H., Tyagi, M., Diallo, S.O., Mamon-
tov, E., Herwig, K.W.: On the structure and dynamics of water associated with single-supported
zwitterionic and anionic membranes. J. Chem. Phys. 146(12), 125102 (2017).
https://doi.org/10.1063/1.4978677
12. Banach, M., Konieczny, L., Roterman, I.: The fuzzy oil drop model, based on hydrophobicity
density distribution, generalizes the influence of water environment on protein structure and
function. J. Theor. Biol. 359, 6–17 (2014)
13. Roterman, I., Banach, M., Konieczny, L.: Application of the fuzzy oil drop model describes
amyloid as a ribbonlike micelle. Entropy 19(4), 167 (2017). https://doi.org/10.3390/e19040167
14. Roterman, I., Banach, M., Kalinowska, B., Konieczny, L.: Influence of the aqueous environment
on protein structure—a plausible hypothesis concerning the mechanism of amyloidogenesis.
Entropy 18(10), 351 (2016)
15. Banach, M., Konieczny, L., Roterman, I.: Why do antifreeze proteins require a solenoid?
Biochimie 144, 74–84 (2018)
16. Serpell, L.C.: Alzheimer’s amyloid fibrils: structure and assembly. Biochim. Biophys. Acta
1502, 16–30 (2000)
17. Kuntz Jr., I.D., Kauzmann, W.: Hydration of proteins and polypeptides. Adv. Protein Chem.
28, 239–345 (1974)
18. Kauzmann, W.: Some factors in the interpretation of protein denaturation. Adv. Protein Chem.
14, 1–63 (1959)
19. Tanford, C.: How protein chemists learned about the hydrophobic factor. Protein Sci. 6(6),
1358–1366 (1997)
20. Tanford, C., Pain, R.H., Otchin, N.S.: Equilibrium and kinetics of the unfolding of lysozyme
(muramidase) by guanidine hydrochloride. J. Mol. Biol. 15(2), 489–504 (1966)
21. Kirshner, A.G., Tanford, C.: The dissociation of hemoglobin by inorganic salts. Biochemistry
3, 291–296 (1964)
22. Tanford, C.: Extension of the theory of linked functions to incorporate the effects of protein
hydration. J. Mol. Biol. 39(3), 539–544 (1969)
23. Tanford, C.: Protein denaturation. Adv. Protein Chem. 23, 121–282 (1968)
24. Tanford, C.: Formation of the native structure of proteins: inferences from the kinetics of
denaturation and renaturation. Ciba Found. Symp. 7, 125–146 (1972)
25. Nozaki, Y., Tanford, C.: The solubility of amino acids and two glycine peptides in aqueous
ethanol and dioxane solutions. Establishment of a hydrophobicity scale. J. Biol. Chem. 246(7),
2211–2217 (1971)
26. Tanford, C., Nozaki, Y., Reynolds, J.A., Makino, S.: Molecular characterization of proteins in
detergent solutions. Biochemistry 13(11), 2369–2376 (1974)
27. Tanford, C.: Protein-lipid interactions. Neurosci. Res. Program Bull. 11(3), 193–195 (1973)
28. Baldwin, R.L., Rose, G.D.: How the hydrophobic factor drives protein folding. Proc. Natl. Acad.
Sci. U S A 113(44), 12462–12466 (2016)
29. Baldwin, R.L.: Dynamic hydration shell restores Kauzmann’s 1959 explanation of how the
hydrophobic factor drives protein folding. Proc. Natl. Acad. Sci. U S A 111(36), 13052–13056
(2014)
30. Richardson, J.S., Richardson, D.C., Tweedy, N.B., Gernert, K.M., Quinn, T.P., Hecht, M.H.,
Erickson, B.W., Yan, Y., McClain, R.D., Donlan, M.E., et al.: Looking at proteins: represen-
tations, folding, packing, and design. Biophysical society national lecture, 1992. Biophys. J.
63(5), 1185–1209 (1992)
31. Richardson, J.S.: Introduction: protein motifs. FASEB J. 8(15), 1237–1239 (1994)
32. Richardson, J.S.: The protein surface is a moving target. Structure 12(6), 912–913 (2004)
33. Chothia, C.: Hydrophobic bonding and accessible surface area in proteins. Nature 248(446),
338–339 (1974)
34. Chothia, C.: Principles that determine the structure of proteins. Annu. Rev. Biochem. 53,
537–572 (1984)
35. Chothia, C., Janin, J.: Orthogonal packing of beta-pleated sheets in proteins. Biochemistry
21(17), 3955–3965 (1982)
36. Lesk, A.M., Chothia, C.: Solvent accessibility, protein surfaces, and protein folding. Biophys.
J. 32(1), 35–47 (1980)
37. Chothia, C.: The nature of the accessible and buried surfaces in proteins. J. Mol. Biol. 105(1),
1–12 (1976)
38. Janin, J., Miller, S., Chothia, C.: Surface, subunit interfaces and interior of oligomeric proteins.
J. Mol. Biol. 204(1), 155–164 (1988)
39. Miller, S., Janin, J., Lesk, A.M., Chothia, C.: Interior and surface of monomeric proteins. J.
Mol. Biol. 196(3), 641–656 (1987)
40. Miller, S., Lesk, A.M., Janin, J., Chothia, C.: The accessible surface area and stability of
oligomeric proteins. Nature 328(6133), 834–836 (1987)
41. Creighton, T.E., Chothia, C.: Protein structure. Selecting buried residues. Nature 339(6219),
14–15 (1989)
42. Gerstein, M., Chothia, C.: Packing at the protein-water interface. Proc. Natl. Acad. Sci. U S A
93(19), 10167–10172 (1996)
43. Gong, H., Porter, L.L., Rose, G.D.: Counting peptide-water hydrogen bonds in unfolded pro-
teins. Protein Sci. 20(2), 417–427 (2011)
44. Gong, H., Rose, G.D.: Assessing the solvent-dependent surface area of unfolded proteins using
an ensemble model. Proc. Natl. Acad. Sci. U S A 105(9), 3321–3326 (2008)
45. Fitzkee, N.C., Rose, G.D.: Sterics and solvation winnow accessible conformational space for
unfolded proteins. J. Mol. Biol. 353(4), 873–887 (2005)
46. Creamer, T.P., Srinivasan, R., Rose, G.D.: Modeling unfolded states of proteins and peptides.
II. Backbone solvent accessibility. Biochemistry 36(10), 2832–2835 (1997)
47. Rose, G.D., Wolfenden, R.: Hydrogen bonding, hydrophobicity, packing, and protein folding.
Annu. Rev. Biophys. Biomol. Struct. 22, 381–415 (1993)
48. Rose, G.D., Geselowitz, A.R., Lesser, G.J., Lee, R.H., Zehfus, M.H.: Hydrophobicity of amino
acid residues in globular proteins. Science 229(4716), 834–838 (1985)
49. Dill, K.A., Truskett, T.M., Vlachy, V., Hribar-Lee, B.: Modeling water, the hydrophobic effect,
and ion solvation. Annu. Rev. Biophys. Biomol. Struct. 34, 173–199 (2005)
50. Southall, N.T., Dill, K.A.: Potential of mean force between two hydrophobic solutes in water.
Biophys. Chem. 101–102, 295–307 (2002)
51. Chan, H.S., Dill, K.A.: Solvation: how to obtain microscopic energies from partitioning and
solvation experiments. Annu. Rev. Biophys. Biomol. Struct. 26, 425–459 (1997)
52. Alonso, D.O., Dill, K.A.: Solvent denaturation and stabilization of globular proteins. Biochem-
istry 30(24), 5974–5985 (1991)
53. Dill, K.A., Shortle, D.: Denatured states of proteins. Annu. Rev. Biochem. 60, 795–825 (1991)
54. Chan, H.S., Dill, K.A.: Origins of structure in globular proteins. Proc. Natl. Acad. Sci. U S A
87(16), 6388–6392 (1990)
55. Mobley, D.L., Bayly, C.I., Cooper, M.D., Shirts, M.R., Dill, K.A.: Correction to small molecule
hydration free energies in explicit solvent: an extensive test of fixed-charge atomistic simula-
tions. J. Chem. Theory Comput. 11(3), 1347 (2015)
56. Drechsel, N.J., Fennell, C.J., Dill, K.A., Villà-Freixa, J.: TRIFORCE: tessellated semianalytical
solvent exposed surface areas and derivatives. J. Chem. Theory Comput. 10(9), 4121–4132
(2014)
57. Cohen, P., Dill, K.A., Jaswal, S.S.: Modeling the solvation of nonpolar amino acids in guanidinium
chloride solutions. J. Phys. Chem. B 118(36), 10618–10623 (2014)
58. Rocklin, G.J., Mobley, D.L., Dill, K.A., Hünenberger, P.H.: Calculating the binding free ener-
gies of charged species based on explicit-solvent simulations employing lattice-sum methods:
an accurate correction scheme for electrostatic finite-size effects. J. Chem. Phys. 139(18),
184103 (2013)
59. Lukšič, M., Urbic, T., Hribar-Lee, B., Dill, K.A.: Simple model of hydrophobic hydration. J.
Phys. Chem. B 116(21), 6177–6186 (2012)
60. Fennell, C.J., Dill, K.A.: Physical modeling of aqueous solvation. J. Stat. Phys. 145(2), 209–226
(2011)
61. Schmit, J.D., Ghosh, K., Dill, K.: What drives amyloid molecules to assemble into oligomers
and fibrils? Biophys. J. 100(2), 450–458 (2011)
62. Chiti, F., Dobson, C.M.: Protein misfolding, functional amyloid, and human disease. Annu.
Rev. Biochem. 75, 333–366 (2006)
63. Chiti, F., Dobson, C.M.: Protein misfolding, amyloid formation, and human disease: a summary
of progress over the last decade. Annu. Rev. Biochem. 86, 27–68 (2017)
64. Buhimschi, I.A., Nayeri, U.A., Zhao, G., Shook, L.L., Pensalfini, A., Funai, E.F., Bernstein,
I.M., Glabe, C.G., Buhimschi, C.S.: Protein misfolding, congophilia, oligomerization, and
defective amyloid processing in preeclampsia. Sci. Transl. Med. 6(245), 245ra92 (2014)
65. Kouza, M., Banerji, A., Kolinski, A., Buhimschi, I.A., Kloczkowski, A.: Oligomerization of
FVFLM peptides and their ability to inhibit beta amyloid peptides aggregation: consideration
as a possible model. Phys. Chem. Chem. Phys. 19(4), 2990–2999 (2017)
66. Roterman, I., Banach, M., Konieczny, L.: Propagation of fibrillar structural forms in proteins
stopped by naturally occurring short polypeptide chain fragments. Pharmaceuticals 10(4), 89
(2017)
67. Roterman, I., Banach, M., Konieczny, L.: Towards the design of anti-amyloid short peptide
helices. Bioinformation 14(1), 1–7 (2018)
¹³C Chemical Shifts in Proteins: A Rich
Source of Encoded Structural
Information
J. A. Vila (B)
IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes,
950-5700 San Luis, Argentina
e-mail: [email protected]
J. A. Vila
Baker Laboratory of Chemistry and Chemical Biology, Cornell University,
Ithaca, NY 14853-1301, USA
Y. A. Arnautova
Molsoft L.L.C, 11199 Sorrento Valley Road, S209, San Diego,
CA 92121, USA
1 Introduction
Before a protein structure can be analyzed in light of its biological function it is nec-
essary to validate it, i.e., to have a clear understanding of its reliability in terms of both
the overall structure and of its details at per-residue level. However, an accurate and
fast validation of protein structures constitutes a long-standing problem in Nuclear
Magnetic Resonance (NMR) spectroscopy [1–4]. For this reason, investigators have
proposed a plethora of methods to determine the accuracy and reliability of protein
structures in recent years [5–12]. Despite this progress, there is a growing need for
more sophisticated, physics-based and fast structure-validation methods [1, 2, 6, 7,
11].
The ¹³Cα chemical shifts provide important information about conformations of
peptides and proteins in solution [13–39] and, therefore, can be used as an exquisitely
sensitive probe with which to assess the quality of protein models. We recently
developed a new, physics-based methodology [34] that makes use of observed and
computed ¹³Cα chemical shifts, the latter obtained at the density functional theory
(DFT) level [40], for an accurate validation of protein structures in solution and in
crystals [41]. The first step in the development of this new methodology involved
determining the factors that affect ¹³Cα shielding calculations, such as the
protonation/deprotonation state of distant ionizable groups, sequential nearest-neighbor
or covalent geometry effects (i.e., due to variations in the bond lengths and bond angles
of residues) and the sensitivity of the shielding/deshielding of ¹³Cα nuclei to changes in
side-chain conformation. Once all these factors affecting ¹³Cα shielding have been
properly identified and considered, a very important test is to determine the accuracy
and speed of the computation of the ¹³Cα shielding as a function of the size of the basis
set chosen and the DFT model adopted. These are important tests because DFT-based
quantum mechanical (QM) calculations are very CPU demanding, despite the
ever-increasing computational power available.
The new DFT-based method has been applied to study a number of problems,
such as unblocked statistical-coil tetrapeptides in aqueous solution [32], polyproline
II helix conformation in a proline-rich environment [31], the ¹³Cα and ¹³Cβ chemical
shifts of cysteines in disulfide-bonded cysteine [42] or determination of the fraction
of the tautomeric forms of histidine in proteins as a function of pH [43]. This new
strategy also provides a unified, self-consistent method to determine high-quality
protein structures, without relying on knowledge-based information [44]. Thus, a
β-sheet or an all α-helical protein structure can be accurately determined by simply
identifying a set of conformations which simultaneously satisfy a number of constraints,
namely ¹³Cα-dynamically-derived torsional angle constraints and Nuclear
Overhauser Effect (NOE) derived distance constraints [29, 44].
The currently used ¹³Cα chemical shift-based validation and determination protocol
[29, 33, 34, 44, 45] exploits the following features: (a) the assignment of chemical
shifts is a fundamental step in protein structure determination by NMR spectroscopy
[46], and no extra experimental work is needed; (b) in addition to the impact of the
covalent structure, ¹³Cα chemical shifts are modulated mainly by the intraresidue
backbone and side-chain dihedral angles [16, 17, 19–22, 27, 35, 39, 47], with
no significant influence of the amino acid sequence [48]; (c) ¹³Cα is ubiquitous in
proteins; and (d) ¹³Cα chemical shifts can be computed with high accuracy at the
QM level of theory.
This chapter is intended to be an overview of the authors’ contributions to the field of
protein structure determination and validation using, mainly, information decoded
from the ¹³Cα chemical shifts. Consequently, the chapter is organized as follows:
first, the methods used to compute the ¹³Cα chemical shifts and to analyze the results
are briefly described; second, the main factors affecting the computation of the ¹³Cα
chemical shifts are enumerated and discussed; third, the capabilities of the computed
¹³Cα chemical shifts, as a rich source of encoded structural information, are illustrated
by a series of applications that involve, but are not limited to, the determination of
protein structures; and finally, a new protein-structure validation server, CheShift-2
[49], with which NMR spectroscopists can assess the quality of their protein models
before they are deposited in the Protein Data Bank (PDB) [50], is presented. It is worth
noting that the theory and details behind alternative protein structure determination
and validation methods are not discussed here and, hence, the reader is referred
instead to an extensive collection of such methods [1, 5–12, 26, 51–61].
2 Methods
All the experimentally determined conformations, unless noted otherwise, were reg-
ularized, i.e., all residues were replaced by the standard Empirical Conformational
Energy Program for Peptides (ECEPP) [62] residues in which bond lengths and bond
angles are fixed (rigid-body geometry approximation) at the standard values [62] and
hydrogen atoms were added, if necessary.
Computations of the 13 Cα chemical shifts involve a series of approximations. For
each amino acid residue X in the protein sequence: (a) the 13 Cα shielding depends,
mainly, on its own backbone conformations [21, 27] and side-chain [19, 20, 35],
with no significant influence of either the amino acid sequence or the position of
the given residue in the sequence, except for residues preceding proline [48]; (b)
each amino acid residue X in the protein sequence can be treated as a terminally-
blocked tripeptide with the sequence Ac-GXG-NMe, with X in the conformation
of the protein structure; (c) the 13 Cα isotropic shielding values (σ) for each amino
acid residue X can be computed at the OB98/6-311 + G(2d,p) level of theory [28]
with the Gaussian 03 package [63]. The remaining residues in each tripeptide are
treated at the OB98/3-21G level of theory, i.e., by using the locally-dense basis set
approach [64]; (d) all ionizable residues can be considered neutral during the QM
calculations [45], unless noted otherwise; (e) no geometry optimization is necessary
because such optimization by ab initio (HF) or DFT methods has only a small effect
on the computed chemical shifts [19].
The computed 13 Cα shieldings (σsubst,th) are converted to 13 Cα chemical shifts (δth)
by employing the equation δth = σref − σsubst,th, where the indices denote a theoretical
(th) computation, the reference substance (ref), and the substance of interest (subst),
i.e., the 13 Cα shielding of a given amino acid residue X. The observed shielding value
of tetramethylsilane (TMS) in the gas phase [65], namely 188.1 ppm, was adopted
as an initial (see below) reference value. All the computed 13 Cα shielding (σsubst, th )
values are calculated using the Gauge-Invariant Atomic Orbital method at the DFT
level of theory as implemented in the GAUSSIAN 03/09 suite of programs (Frisch
et al., 2003). For all purposes, in this chapter, we have used only one exchange-
correlation functional, OB98, because it was shown [30] to be one of the most
accurate and fast functionals with which to reproduce the observed 13 Cα chemical
shifts of proteins in solution (see Sect. 3.2).
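As an illustration of the conversion described above, the following minimal Python sketch applies δth = σref − σsubst,th to a few shielding values. The function name and the per-residue shieldings are hypothetical and serve only as an example; the default reference is the observed gas-phase TMS shielding of 188.1 ppm quoted in the text.

```python
# Observed gas-phase TMS shielding quoted in the text (ppm).
TMS_SHIELDING_OBSERVED = 188.1


def shielding_to_shift(sigma_subst, sigma_ref=TMS_SHIELDING_OBSERVED):
    """Return delta_th = sigma_ref - sigma_subst,th (all values in ppm)."""
    return sigma_ref - sigma_subst


# Hypothetical computed 13C-alpha isotropic shieldings (ppm), for illustration only.
shieldings = {"ALA": 135.2, "VAL": 126.9}
shifts = {res: shielding_to_shift(s) for res, s in shieldings.items()}
```

Passing a different `sigma_ref` (for example an effective TMS value) changes every computed shift by the same constant, which is why the choice of reference matters for systematic errors.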
Determination of a proper TMS shielding value for each functional is crucial for
an accurate computation of the 13 Cα chemical shifts because it will enable us to
minimize the presence of systematic errors which might bias the chemical shifts-
based analysis. From this point of view the effective TMS value will provide the
most accurate approach to solve the problem because it will not require further
adjustments. Consequently, computation of an effective TMS value is central to our
calculations.
By adopting the observed TMS value of 188.1 ppm (Jameson and Jameson, 1987)
as a reference, it is possible to find, for any functional, the characteristic mean (x_o) and
standard deviation (σ) of the Normal (or Gaussian) fit of the frequency distribution of
the errors. For all functionals tested in our work the characteristic mean value (x_o)
appears displaced from its ideal value of 0.0 by a positive, or negative, amount, e.g.,
for OB98 x_o = +3.6 ppm was found. Further analysis [30] indicates that for any
of the 10 functionals tested a straightforward use of the observed TMS shielding
value (188.1 ppm) is not appropriate, if no further corrections are introduced. Hence,
for each functional and basis set chosen it is feasible to find an ‘effective’ TMS
shielding value for which the Normal (or Gaussian) fit shows a zero displacement,
i.e., an effective TMS value that gives x_o = 0.0. For example, use of OB98 with a
large [6-311 + G(2d,p)/3-21G] basis set leads to an effective TMS of 184.5 ppm, i.e.,
by subtracting 3.6 ppm from 188.1 ppm [30], that gives x_o = 0.0 ppm. Likewise,
use of a small (6-31G/3-21G) basis set leads to an effective TMS of 195.4 ppm.
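The effective-TMS correction can be sketched in a few lines of Python. This is only an illustration under two stated assumptions: the error for each residue is taken as (computed − observed), and the sample mean of the errors is used in place of the mean of a fitted Gaussian. The function name and the input values are hypothetical.

```python
import statistics


def effective_tms(computed_shifts, observed_shifts, sigma_ref=188.1):
    """Return (effective TMS shielding, mean error x_o), both in ppm.

    Assumption: error = computed - observed, so a positive mean error is
    removed by lowering the reference shielding by the same amount.
    """
    errors = [c - o for c, o in zip(computed_shifts, observed_shifts)]
    x_o = statistics.fmean(errors)  # stand-in for the mean of the Gaussian fit
    return sigma_ref - x_o, x_o


# Hypothetical data: computed shifts displaced by +3.6 ppm from the observed ones.
observed = [50.0, 55.0, 60.0]
computed = [o + 3.6 for o in observed]
sigma_eff, x_o = effective_tms(computed, observed)
```

With these inputs the sketch reproduces the arithmetic quoted in the text for OB98 with the large basis set, 188.1 − 3.6 = 184.5 ppm.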
The observed chemical shift for each residue i, 13 Cαobserved, i , represents contributions
from an ensemble of rapidly interconverting conformers that coexist in solution.
Then, an accurate comparison between the observed and computed 13 Cα chemical
shifts requires consideration of an ensemble of NMR-derived conformers, rather
than of a single conformation [41, 33]. Consequently, for each amino acid residue
in the sequence, i, the average of the chemical shifts calculated for the individual
residues in the ensemble of conformers representing the NMR structure, ⟨13 Cα⟩i,
is computed as:

\[ \langle {}^{13}C^{\alpha} \rangle_i = \frac{1}{\Omega} \sum_{k=1}^{\Omega} {}^{13}C^{\alpha}_{i,k}, \tag{1} \]

where 13 Cαi,k is the computed chemical shift for residue i in conformer k, Ω is the
number of conformers in the ensemble, and 1 ≤ i ≤ N, where N is the number of residues
in the sequence. Equation (1) was obtained through the following approximation: for each
residue i the quantity to be computed must, in principle, be
\(\langle {}^{13}C^{\alpha} \rangle_i = \sum_{k=1}^{\Omega} \lambda_k \, {}^{13}C^{\alpha}_{i,k}\),
where λk is the Boltzmann factor for conformer k, with \(\sum_{k=1}^{\Omega} \lambda_k \equiv 1\).
However, computation of the Boltzmann factors at the QM level of theory is not possible
with the existing computational facilities, because it would require computation of the
total energy at the QM level of theory for each of the conformers in the ensemble used to
represent the NMR structure. Therefore, the following approximation was used:
λk = 1/Ω [48]; in other words,
in this approximation each conformer contributes equally to the average chemical
shift obtained by fast conformational averaging. Whether a computation of a Boltz-
mann average, rather than the arithmetic average, would lead to a more accurate
representation of the 13 Cα chemical shifts needs further investigation.
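The averaging of Eq. (1), and the weighted form it approximates, can be sketched with NumPy as follows. The function name, the array layout (conformers along the first axis) and the numerical values are assumptions made for this example only.

```python
import numpy as np


def ensemble_average(shifts, weights=None):
    """Per-residue average of computed shifts over an ensemble of conformers.

    shifts  : array of shape (n_conformers, n_residues), in ppm
    weights : optional per-conformer weights (lambda_k); if omitted, every
              conformer contributes equally, lambda_k = 1/Omega, as in Eq. (1)
    """
    shifts = np.asarray(shifts, dtype=float)
    n_conf = shifts.shape[0]
    if weights is None:
        weights = np.full(n_conf, 1.0 / n_conf)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # enforce sum(lambda_k) = 1
    return weights @ shifts


# Hypothetical shifts for 2 conformers x 2 residues.
computed = [[50.0, 60.0], [52.0, 58.0]]
uniform = ensemble_average(computed)           # arithmetic average
weighted = ensemble_average(computed, [3, 1])  # illustrative unequal weights
```

The optional `weights` argument only shows where Boltzmann factors would enter if they were available; the text uses the equal-weight case.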
The ⟨13 Cα⟩i value obtained from Eq. (1) is used to compute the conformational-average
difference Δi between the observed and computed 13 Cα chemical shifts for
each amino acid residue i,

\[ \Delta_i = {}^{13}C^{\alpha}_{\mathrm{observed},i} - \langle {}^{13}C^{\alpha} \rangle_i \tag{2} \]

and, from these differences, the ca-rmsd,

\[ \text{ca-rmsd} = \left[ \frac{1}{N} \sum_{i=1}^{N} \Delta_i^{2} \right]^{1/2}, \tag{3} \]

which is a global property of the protein NMR structure given as the root-mean-square
of the differences between the experimental 13 Cα chemical shifts and the ⟨13 Cα⟩i
values for all the residues in the protein.
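Equations (2) and (3) translate directly into a short NumPy routine. The sketch below is illustrative only; the function name, the array layout and the numbers are hypothetical.

```python
import numpy as np


def ca_rmsd(observed, computed_ensemble):
    """Return (ca-rmsd, per-residue differences Delta_i), in ppm.

    observed          : array of shape (n_residues,)
    computed_ensemble : array of shape (n_conformers, n_residues)
    """
    observed = np.asarray(observed, dtype=float)
    # Eq. (1): equal-weight average over conformers for each residue.
    averaged = np.asarray(computed_ensemble, dtype=float).mean(axis=0)
    delta = observed - averaged                 # Eq. (2)
    rmsd = float(np.sqrt(np.mean(delta ** 2)))  # Eq. (3)
    return rmsd, delta


# Hypothetical data: 2 conformers x 2 residues.
rmsd, delta = ca_rmsd([51.0, 60.0], [[50.0, 60.0], [52.0, 58.0]])
```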
Fig. 1 Flow-chart of the 13 Cα -based protein structure determination protocol described in the
Methods section. Figure adapted from Vila et al. [44]. Copyright 2007 American Chemical Society
13 Cα and 13 Cβ conformational shifts [27]. The derived torsional constraints are only
for those amino acid residues in the sequence that pertain to a regular structure, i.e.,
to an α-helix or β-sheet. Consequently, these (ϕ,ψ)α,β torsional constraints (shown
in Fig. 1) are limited to, on average, ~50% of the amino acid residues in proteins
because the remaining ones populate non-regular structures.
Then, a clustering procedure, e.g., the Minimal Spanning Tree method [67], is
used to select a small sub-set of the total number of the VTF-derived conformations,
namely those possessing a maximum NOE-derived distance violation lower than
some arbitrary fixed value. For each of these conformations the 13 Cα chemical shifts
are computed as described in Sect. 2.1. Examination of the chemical shifts of all the
amino acids in the ensemble of conformations enables us to identify the amino acid
at each position in the sequence whose computed chemical shifts most closely match
the observed ones, among all these conformations. This identified set of individual
amino acid conformations corresponds to only one conformation of the whole chain:
the ‘theoretical minimal-rmsd model’ [33]. In this model, the 13 Cα chemical shift
of each residue individually best matched the experimental one, thereby providing
a new set of φ, ψ, and χ torsional angle constraints for all amino acid residues in
the sequence, i.e., not just for the amino acid residues in regular structures. Because
the chemical shifts are a multivalued function of the φ, ψ, and χ torsional angles,
the set of torsional angles derived from the ‘theoretical minimal-rmsd model’ does
not, necessarily, represent a unique solution to a given set of observed 13 Cα chemical
shifts values.
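The selection behind the ‘theoretical minimal-rmsd model’ can be illustrated as a per-residue search over the ensemble. The following sketch is not the authors' implementation; the function name and the data are hypothetical, and only the selection rule (for each residue, keep the conformer whose computed shift is closest to the observed one) is taken from the text.

```python
import numpy as np


def minimal_rmsd_model(observed, computed):
    """Pick, for each residue, the conformer that best matches the observed shift.

    observed : array of shape (n_residues,)
    computed : array of shape (n_conformers, n_residues)
    Returns (index of the best conformer per residue, rmsd of the composite model).
    """
    observed = np.asarray(observed, dtype=float)
    computed = np.asarray(computed, dtype=float)
    abs_err = np.abs(computed - observed)    # broadcast over conformers
    best_conformer = abs_err.argmin(axis=0)  # one conformer index per residue
    best_shift = computed[best_conformer, np.arange(computed.shape[1])]
    rmsd = float(np.sqrt(np.mean((best_shift - observed) ** 2)))
    return best_conformer, rmsd


# Hypothetical data: 3 conformers x 2 residues.
best, rmsd = minimal_rmsd_model([51.0, 58.0],
                                [[50.0, 60.0], [52.0, 58.0], [51.2, 55.0]])
```

In the protocol, the torsional angles (φ, ψ, χ) of the selected residue conformations would then serve as constraints; that step is not shown here.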
Step 2: Only one conformation among all the conformations produced in Step
1 is selected, for example, the conformation possessing the lowest rmsd between
the computed and observed 13 Cα chemical shifts. The selected conformation is used
as a starting one in a new conformational search with the Monte Carlo with Mini-
mization (MCM) method [68, 69]. The MCM search is carried out with two types of
constraints: the original set of NOE-derived distance constraints and the new set of φ,
ψ, χ torsional angles derived in Step 1. This time the conformational search is carried
out using a complete force-field including the internal potential energy described by
ECEPP/05 [70], the solvent free energy calculated by using a solvent-accessible surface
area model [71], and additional energy terms aimed at penalizing violations
of the distance and torsional angle constraints [72]. Convergence of the determination
protocol is monitored using the ca-rmsd between the computed and observed
13 Cα chemical shifts.
Step 3: If the computed ca-rmsd is lower than a certain, arbitrarily chosen, cutoff
value (ξ), then the procedure is ended. Otherwise, Step 2 is repeated using a new
set of (φ,ψ,χ) derived from the minimal-rmsd model of the previous step.
It is worth noting that after our physics-based protocol was published [44] an alter-
native knowledge-based method that makes use of 1 H, 13 Cα , 13 Cβ and 15 N chemical
shifts as restraints, was successfully applied to structure determination of several
proteins [53]. A blind test of computational methods, including several that also use
chemical shifts as restraints, aimed at fully automated determination of protein
structures, has been carried out recently [60].
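\[ \delta_i^{\mathrm{computed}}(\mathrm{pH}) = \frac{1}{\Omega} \sum_{k=1}^{\Omega} \left\{ \langle \rho_{i,k} \rangle \, \delta_{+,i,k} + \left( 1 - \langle \rho_{i,k} \rangle \right) \delta_{0,i,k} \right\} \tag{5} \]
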
where δ+,i,k and δ0,i,k are the computed 13 Cα chemical shifts, for the amino acid i in
conformation k, with fully charged and neutral side chains, respectively, Ω is the
number of conformers in the protein ensemble, and < ρi,k > the averaged degree of
charge, as given by Eq. (4).
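Equation (5) is a charge-weighted mixture of two computed shifts per conformer, averaged over the ensemble. A minimal sketch follows; the function name and the input values are hypothetical, and the averaged degree of charge ⟨ρi,k⟩ is taken as a given input because Eq. (4) is not evaluated here.

```python
import numpy as np


def ph_dependent_shift(rho, delta_charged, delta_neutral):
    """Eq. (5) for a single residue i.

    rho           : averaged degree of charge <rho_i,k> per conformer (0..1)
    delta_charged : computed shift with fully charged side chain, per conformer
    delta_neutral : computed shift with neutral side chain, per conformer
    """
    rho = np.asarray(rho, dtype=float)
    delta_charged = np.asarray(delta_charged, dtype=float)
    delta_neutral = np.asarray(delta_neutral, dtype=float)
    per_conformer = rho * delta_charged + (1.0 - rho) * delta_neutral
    return float(per_conformer.mean())  # (1/Omega) * sum over conformers


# Hypothetical values for one residue in an ensemble of 2 conformers.
shift = ph_dependent_shift([0.5, 1.0], [56.0, 57.0], [54.0, 55.0])
```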
The current methodology [33, 34] relies on a crucial observation: once residue con-
formations are established by their interactions with the rest of the protein the 13 Cα
shielding of each residue depends, mainly, on its backbone and side-chain confor-
mations, with no significant influence by the nature of the nearest-neighbor amino
acids, except for residues immediately preceding proline [48].
The above observation allows us to parallelize the 13 Cα shielding calculations in
proteins and, hence, to make them computationally feasible. Moreover, a given set of
accurately-determined amino acid residue conformations representing the accessible
conformational space for all the 20 naturally occurring amino acids and showing a
good distribution of side-chain conformations will constitute a reasonable ensemble
with which to carry out tests of the current methodology. The results of these tests
should be transferable to proteins of any class or size. Consequently, we used struc-
tures of three proteins solved by NMR and X-ray, namely PDB id 1D3Z, 2JVD and
1NS1 to evaluate the performance of different DFT functionals and basis sets, as
explained below.
DFT has become a method of choice for QM calculations of the electronic structure
and properties of many molecular and solid systems. Because the exact exchange-correlation
functional is unknown, a large number of approximations have been proposed
in the literature, making it essential to pursue more accurate and reliable approximate
functionals, a process which, on the other hand, depends on the application.
Selection of the most appropriate density functional model for a particular application
becomes one of the main problems of the DFT method. For this reason we decided
[28] to test several density functional models (namely B3LYP, OLYP, PBE1PBE,
OPBE, O3LYP, OPW91, OB98, BPW91, BPBE and B971). The benchmarking was
intended to find not only the most accurate functional with which to reproduce the
observed 13 Cα chemical shifts in solutions but also the fastest one, in terms of CPU
time, because speed of DFT calculations could severely limit their applicability to
proteins. The test was applied to 10 NMR-derived conformations of the 76-residue
α/β protein ubiquitin (PDB id 1D3Z).
Comparison of the observed and computed 13 Cα chemical shifts shows that there
are five functionals, namely OPW91, OB98, OPBE, OLYP, and O3LYP, which are
among the faster ones and, even more importantly, behave very similarly in their
ability to reproduce accurately the observed 13 Cα chemical shifts. In particular, we
observe that OB98 appears to be slightly better than any other of the five functionals
in terms of both the correlation coefficient, R, (or Pearson coefficient) between the
observed and the conformational-averaged 13 Cα chemical shifts and the standard
deviation of the computed conformational-averaged 13 Cα chemical shifts from a
linear regression. Consequently, we chose OB98 for all the applications [30].
We also compared the results obtained using OB98 with those obtained with
B3LYP, a very popular functional that has been used extensively in our group, and
elsewhere. The correlation between the averaged 13 Cα chemical shift values
obtained for the 10 conformations of 1D3Z with the OB98 and B3LYP functionals is
excellent [30], with a correlation coefficient R = 0.998 and a standard deviation
of 0.300 ppm. This test provides solid evidence that the results and conclusions
obtained using B3LYP do not need to be revised if the OB98 functional is adopted
[30].
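The two statistics used in these comparisons, the Pearson correlation coefficient R and the standard deviation from a linear regression, can be computed as in the sketch below. This is an illustration only: the function name and data are hypothetical, and the standard deviation is taken here as the residual standard deviation with n − 2 degrees of freedom, which is an assumption about the definition used.

```python
import numpy as np


def regression_stats(x, y):
    """Return (Pearson R, residual standard deviation) for a linear fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
    residuals = y - (slope * x + intercept)
    # Assumed definition: two fitted parameters, hence n - 2 degrees of freedom.
    sd = float(np.sqrt(np.sum(residuals ** 2) / (len(x) - 2)))
    return r, sd


# Hypothetical averaged shifts (ppm) from two functionals for four residues.
r, sd = regression_stats([52.1, 55.4, 58.9, 61.0], [52.3, 55.3, 59.2, 60.9])
```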
To study the dependence of the accuracy and speed of DFT calculations of the 13 Cα
chemical shifts in proteins on the size of the basis set used, six basis sets, viz., the
6-31G/3-21G, 6-31G(d)/3-21G, 6-311G(d,p)/3-21G, 6-311 + G(d,p)/3-21G, and
6-311 + G(2d,p)/3-21G locally-dense basis-set approximations, and the uniform
3-21G/3-21G set, were initially applied [28] to 10 NMR-derived conformations of ubiquitin
[54]. For each of these six basis sets, combined with the OB98 functional, the 13 Cα
shielding was computed for 760 amino acid residues by treating each amino acid X
in the sequence as a terminally-blocked tripeptide with the sequence Ac-GXG-NMe
in the conformation of the regularized experimental protein structure. Analysis of
the results [28], in terms of the agreement between the computed and observed 13 Cα
chemical shifts shows that the accuracy with which the observed 13 Cα chemical shifts
are reproduced by using either the small basis set (6-31G/3-21G) or the larger basis
set [6-311 + G(2d,p)/3-21G] is very similar, although, use of the small basis set leads
to a significant decrease in computational time.
The results also indicate that the 13 Cα chemical shifts computed with the large
[6-311 + G(2d,p)/3-21G] basis set can be reproduced accurately (within an average error
of ~0.4 ppm) and faster (by ~9 times) by using the small (6-31G/3-21G) basis set after
extrapolating it with \({}^{13}C^{\alpha} = -1.597 + 1.040 \times {}^{13}C^{\alpha}_{\mu}\),
where \({}^{13}C^{\alpha}_{\mu}\) is the value computed with the small basis set. Indeed, the
correlation between the averaged 13 Cα chemical shift values computed for the 32
conformations of 1NS1 with these two basis sets is excellent [28], with a correlation
coefficient R = 0.999 and a standard deviation of 0.284 ppm. Even more important,
an analysis of the magnitude of the errors and their distribution carried out for Val
and Arg hypersurfaces, constructed by calculating a grid of 6864 and 6794 points,
respectively, corresponding to different combinations of the φ, ψ, χ1, and χ2 (only
for Arg) torsional angles, indicates that ~70% of them are within ~0.6 ppm and that
the most populated regions of the Ramachandran map are not affected by errors
higher than ~1.0 ppm [28].
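The small-to-large basis set extrapolation is a single linear map, sketched below with the coefficients quoted in the text. The function name and the input value are hypothetical.

```python
def extrapolate_small_basis(shift_small):
    """Map a 13C-alpha shift (ppm) computed with the small 6-31G/3-21G basis set
    to an estimate of the large 6-311+G(2d,p)/3-21G basis set value, using the
    linear relation quoted in the text."""
    return -1.597 + 1.040 * shift_small


# Hypothetical small-basis-set shift of 55.0 ppm.
estimate = extrapolate_small_basis(55.0)
```

For the illustrative input above the estimate is 55.603 ppm, so the correction is on the order of the ~0.4 to 0.6 ppm errors discussed in the text and cannot be neglected.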
In conclusion, the described analysis enabled us to select the smaller basis set
(6-31G/3-21G) that provides accuracy similar to that of a ‘basis set limit’ [6-311 +
G(2d,p)/3-21G] to reproduce the computed chemical shifts, but at a significantly
lower computational cost [28].
Overall, except for the Pro effects, use of the Ac-G-X-G-NMe model peptide for
the computation of the 13 Cα chemical shifts of residue X is a good approximation
because the computed values are accurate within ±0.5 ppm for all residue-types, even
when neither the preceding nor the subsequent residue-type effects are taken into
account (see Fig. 2).
Experimental protein structures are often solved using force fields which allow vari-
ation of bond lengths and bond angles. However, it is known that QM calculations
are very sensitive to bond lengths and bond angles [16]. Therefore, we have explored
the dependence of the computed 13 Cα -chemical shifts on the bond lengths and bond
angles to establish whether a rigid- rather than non-rigid geometry approximation is
a more accurate representation with which to compute the chemical shifts.
Fig. 2 Histogram of the average, over all 20 conformers of the protein PDB id 2K87, second-order
differences: a ⟨(ΔX − ΔYX)⟩, arising from the nature of the sequentially
preceding residue-type (Yyy). ΔX and ΔYX are the differences between the observed chemical
shifts and those computed using the Ac-Gly-Xxx-Gly-NMe and Ac-Gly-Yyy-Xxx-Gly-NMe model
peptides, respectively; b ⟨(ΔX − ΔXY)⟩, for the differences arising from the nature of
the subsequent residue-type, i.e., with ΔXY computed with Ac-Gly-Xxx-Yyy-Gly-NMe. Figure
adapted from [48] (with permission of Springer)
For this test, the structure of ubiquitin deposited in the PDB (PDB id 1UBQ)
was chosen because it possesses non-regularized geometry and has been solved by
X-ray diffraction at 1.8 Å resolution [83]. We have also examined the corresponding
structure with regularized geometry, i.e., the one with all the residues replaced by the
standard ECEPP residue geometry [62], named here as 1UBQregular . Analysis of the
differences between the computed and observed 13 Cα chemical shifts for the 1UBQ
and 1UBQregular structures, leads to rmsd of 3.28 ppm and 2.38 ppm, respectively.
The better agreement obtained with 1UBQregular, rather than 1UBQ, is consistent
with the long-standing recognition that the bond lengths and bond angles of both X-ray
and NMR-derived structures are not as accurately defined as in studies of
small molecules [16], with which the ECEPP geometry [62] has been parameterized.
Further analysis of the agreement of the two ubiquitin structures with the deposited
electron density data [83] of 1UBQ, in terms of the R-factor, leads to 19.2 and 23.1%
for 1UBQ and 1UBQregular , respectively; while the all-heavy-atom rmsd between
these two structures is 0.142 Å [34].
Overall, the use of regularized geometry, i.e., ECEPP geometry, is an accurate
approximation with which to compute the 13 Cα chemical shifts in proteins and, hence,
is used in most of the applications discussed in this chapter.
Among the factors that affect 13 Cα -shielding, which are important for an accu-
rate computation of chemical shifts, is the sensitivity of 13 Cα nuclei to the shield-
ing/deshielding induced by changes in the protonation/deprotonation of distant ion-
izable groups [84–87]. However, these factors have not been taken into account
explicitly in current computations of 13 Cα chemical shifts in proteins at the QM level
of theory because, usually, the calculations are carried out in the gas phase, and the
ionizable residues are treated as neutral groups.
The question of whether the use of neutral, rather than charged, side chains is more
accurate for computation of the 13 Cα chemical shifts of ubiquitin, at a given fixed pH,
was investigated as follows [45]. For a given ionizable residue i in a conformation k,
first, the average charge distribution, ⟨ρi,k⟩, was computed by using Eq. (4), i.e., by
explicit consideration of the 2^ξ ionization states for every conformation [75], with ξ
being the number of ionizable groups in the molecule, namely 22; and second, the
13 Cα chemical shifts as a function of the pH, δi(pH), were computed by using Eq. (5).
This analysis was applied to 139 conformations of ubiquitin: 138 (10 conformations
from PDB id 1D3Z plus 128 conformations from PDB id 1XQQ) NMR-derived
conformations [54, 88], while the remaining one is an X-ray structure (PDB id
1UBQ) solved at 1.8 Å resolution [83].
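To give a sense of the size of this calculation, the sketch below enumerates ionization states for a small number of groups and evaluates the counts implied by the text. The 0/1 encoding of each ionizable group (neutral/charged) and all names are assumptions made for illustration; the weighting of the states by Eq. (4) is not shown.

```python
from itertools import product


def ionization_states(n_groups):
    """Iterate over every ionization pattern of n_groups ionizable groups.

    Each state is a tuple of 0 (neutral) or 1 (charged), one entry per group,
    so there are 2**n_groups states in total.
    """
    return product((0, 1), repeat=n_groups)


N_GROUPS_UBIQUITIN = 22   # number of ionizable groups quoted in the text
N_CONFORMATIONS = 139     # number of ubiquitin conformations analyzed

n_states = 2 ** N_GROUPS_UBIQUITIN                    # states per conformation
n_groups_total = N_GROUPS_UBIQUITIN * N_CONFORMATIONS  # ionizable groups overall

# Enumerate explicitly only for a small example (3 groups -> 8 states).
small_example = list(ionization_states(3))
```

Each conformation thus requires 4,194,304 ionization states, and the 22 groups in 139 conformations give 3058 ionizable groups in total.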
Additionally, an extra set of 50 randomly generated conformations for each amino
acid residue X, in the terminally-blocked tripeptide with the sequence Ac-GXG-
NMe, with X being Lysine (Lys), Ornithine (Orn), Diaminobutyric acid (Dab),
Glutamic acid (Glu) or Aspartic (Asp) acid, were also obtained. This set of ran-
domly generated conformations was used to determine: (i) the range of shield-
ing/deshielding of the 13 Cα nucleus of free acidic/basic amino acid residues in solu-
tion, in their fully charged and neutral forms, respectively; (ii) how these ranges
of shielding/deshielding variations compare with those derived from 3058 ionizable
groups of the 139 conformations of the protein ubiquitin; and (iii) how the computed
shielding/deshielding range of variations are influenced by the distance between the
charged side-chain group and the 13 Cα nucleus (for example, there are two chemical
bonds in Asp, rather than three in Glu, separating the deprotonated carboxyl group
from the 13 Cα nucleus). To examine an analogous effect for a basic side-chain group,
such as Lys, use was made of the non-natural amino acids Orn and Dab because, for
these amino acids, the protonated amino group is separated from the 13 Cα nucleus
by four and three chemical bonds, rather than by five in Lys.
The results of this study [45], based on the analysis of 139 conformations of
ubiquitin at pH 6.5, indicate that use of neutral, rather than charged, amino acids is
a significantly better approximation of the observed 13 Cα chemical shifts in solution
for the acidic groups, and a slightly better representation, though significantly less
expensive computationally, for the basic groups (see Fig. 3).
Additionally, our analysis of Lys, Orn and Dab revealed a significantly greater
deshielding of the 13 Cα nucleus (due to the deprotonation of the acidic groups)
than the shielding due to the protonation of the basic groups. The origin of such a
difference can be found in the distance between the ionizable groups and the 13 Cα
nucleus, which is shorter for the acidic than for the basic groups.
To what extent are the chemical shifts of the amino acid residues in a protein affected
by the side-chain orientation? The basis for such a query arises from the fact that the
three torsion angles φ, ψ and χ1 are not independent of each other over the whole
range because they involve a common N-Cα bond [89, 90]. To find an answer to this
question, the dependence of the 13 C chemical shifts on side-chain orientation was
investigated [35], at the DFT level of theory, for a two-strand antiparallel β-sheet model
peptide with the amino acid sequence Ac-A3 -X-A12 -NH2 where X represents any of
the 17 naturally-occurring amino acids considered here, i.e., not including alanine,
glycine and proline. Because the majority of β-sheets are twisted, rather than planar,
with a right-hand twist in the approximately ±30° range for the backbone dihedral
angles [91–94] conformational parameters for β-sheets may deviate from those for
planar pleated sheets and, hence, are difficult to model by using canonical values. The
fact that β-sheets in proteins appear as parallel or antiparallel strands, or a combination
of both, only exacerbates the modeling problem. For these reasons, the dihedral angles
adopted for the backbone were taken, and kept fixed, from the experimental structure
of an antiparallel β-sheet, specifically from the 16-residue segment (G41-G56) of the
B3 binding domain of protein G (PDB id 1P7E).
For the 17 naturally occurring amino acids considered, the analysis indicates that
there is: (a) good agreement between computed and observed 13 Cα and 13 Cβ chemical
shifts, i.e., with correlation coefficients, R, of 0.95 and 0.99, respectively; (b)
significant variability of the computed 13 Cα and 13 Cβ chemical shifts as a function of
χ1 for all 17 residues, except for Ser; and (c) a smaller, although still
significant, dependence of the computed 13 Cα chemical shifts on χξ (with ξ ≥ 2) for
11 out of 17 residues.
The above results obtained by Villegas et al. [35] for an antiparallel (16-residue
segment) β-sheet were later validated on a 76-residue α/β protein, i.e., by exploring
the effects of side-chain conformation on the computed 13 Cα chemical shifts [45].
This validation process involved an exhaustive conformational search, starting from
an arbitrarily selected conformation of the NMR-determined ubiquitin protein (PDB
id 1D3Z), in which only the torsional angles of the side chains were allowed to vary,
i.e., all backbone dihedral angles (φ, ψ, ω) were fixed at their corresponding observed
We have chosen three examples to illustrate how the structural information decoded
from the observed 13 C chemical shifts can be used in practice: (1) to determine the
fraction of the tautomeric forms of the imidazole ring of histidine (His) in proteins as
a function of pH, provided that the observed 13 Cγ and 13 Cδ2 chemical shifts and the
protein structure, or the fraction of H+ form are known; (2) to determine either all
α-helical or all β-sheet protein structures in solution; and (3) to assess the reliability
of NMR-determined protein models before they are published or deposited in the
PDB. Each of these applications is described in the following subsections.
In 1965 Mandel [96], in a pioneering NMR experiment, detected the imidazole (C2)
protons of histidine (His) residues in Ribonuclease A and in 1966, Bradbury and
Scheraga [97], were able to distinguish between the histidine residues of Ribonu-
clease A, i.e., they resolved the NMR-peaks of three out of four histidines of this
enzyme. Subsequently, use of NMR spectroscopy, X-ray crystallography and theo-
retical studies, based on QM calculations, have continuously evolved in their ability
to determine properties of the histidine residues in solution and in the solid state [43,
79, 98–116]. This persistent interest in His arises because this
residue is unique among all 20 naturally occurring amino acids: ~50% of all
enzymes use His in their active sites [117]. This is, mainly, because of the versatility
of the imidazole ring of His, which includes two neutral, chemically-distinct forms,
referred to as the Nδ1 -H and Nε2 -H tautomers, and a protonated form, the charged H+
form, with one form favored over the other two by the protein environment and pH. In
addition, His with a pK° of 6.6 [118] is the only ionizable residue that titrates around
neutral pH, allowing the non-protonated nitrogen of its imidazole ring to serve as an
effective ligand for metal binding [79], or to play a crucial role in the proton-transfer
process [103].
Fig. 4 Bar diagram of the average σo shielding values computed for each carbon of the imidazole
ring of His for each of the two tautomers: Nδ1 -H, Nε2 -H, and for the H+ form. The values were
averaged over ~35,000 conformations of histidine in the model tripeptide Ac-GHG-NMe. Grey,
black and white colors indicate the results obtained for the 13 Cγ , 13 Cδ2 and 13 Cε1 nuclei, respectively.
Figure adapted from Vila et al., 2011 (with permission of PNAS)
with Δε the single-valued first-order shielding difference computed for the Nε2 -H
tautomer (Δε ~ 31 ppm). The fraction of the Nδ1 -H tautomer, f δ, is obtained
straightforwardly as: f δ = 1 − ⟨ρ⟩ − f ε.
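The closing relation among the three populations can be sketched as follows. The function name and the numerical inputs are hypothetical; the fraction of the Nε2-H tautomer, f ε, is treated as a given input because the expression used to obtain it is not reproduced in this passage.

```python
def histidine_populations(rho, f_eps):
    """Return the populations of the three His imidazole forms.

    rho   : average degree of protonation <rho>, i.e., fraction of the H+ form
    f_eps : fraction of the N-epsilon2-H tautomer (taken as an input here)
    The N-delta1-H fraction follows from f_delta = 1 - <rho> - f_eps.
    """
    f_delta = 1.0 - rho - f_eps
    if min(rho, f_eps, f_delta) < -1e-12:
        raise ValueError("fractions must be non-negative and sum to 1")
    return {"H+": rho, "N_eps2_H": f_eps, "N_delta1_H": f_delta}


# Hypothetical values for a single histidine.
populations = histidine_populations(rho=0.5, f_eps=0.3)
```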
The above formulation was used to determine the tautomeric forms of His for each
of 8 selected proteins for which both the structure and the 13 Cδ2 and 13 Cγ chemical
shifts of the imidazole ring of His, are available. In each of these applications the
average degree of protonation < ρ > for all ionizable residues was computed by using
Eq. (4). The tautomeric forms of His are determined by using the expressions for f δ
and f ε given above [43]. Likewise, using the observed values, Δobs, obtained from
solid-state NMR for unblocked dipeptides, with the sequence His-Leu, His-Met,
Gly-His, Leu-His, His-Ala, His-Glu, Ala-His and His-Asp [99], we also determined
the tautomeric fractions of the imidazole ring of His for each of these 8 compounds.
Results obtained from the 8 proteins indicate that the protonated form is the
most populated one while the distribution of the tautomeric forms for the imidazole
ring varies significantly among different histidine residues in the same protein (see
Fig. 5a). Thus, His226 and His250 show a comparable degree of protonation, ⟨ρ⟩,
although their tautomeric distributions are very different (see Fig. 5a), which shows the
importance of the environment of the histidines in determining the tautomeric forms.
Let us explain the origin of this observation. On one hand, the Nδ1 nucleus of H250
is located only 2.9 Å from the carbonyl backbone oxygen of S248 (see Fig. 5b),
presumably forming a hydrogen-bond (green dots in Fig. 5b), while the Nε2 nucleus
is exposed to the solvent, but the imidazole ring is surrounded by fully protonated
R264 and R266 (data not shown), thereby lowering the probability that a proton
binds to Nε2 , in good agreement with the computed tautomeric distribution for H250
in Fig. 5a. On the other hand, the Nε2 nucleus of the imidazole ring of H226 is at
3.3 Å from a backbone carbonyl oxygen of W246 (see Fig. 5c), while the Nδ1 is at
3.1 Å from a backbone amino group of H226 (see Fig. 5c). As a result, a preference
of Nε2 -H over the Nδ1 -H tautomeric form for H226 is expected, in agreement with
the computed tautomeric fractions for this residue in Fig. 5a.
In addition, our results show that for ~70% of the neutral histidine-containing
dipeptides the method leads to fairly good agreement between the calculated and
the experimental tautomeric form. Co-existence of different tautomeric forms in the
same crystal structure may explain the disagreement obtained for the remaining 30%
of dipeptides.
In this section we illustrate, with two examples, how the structural information
encoded in the 13 Cα chemical shifts can be used to determine an ensemble of
conformations, provided that a set of NOE-derived distance constraints is available.
However, since the chemical shifts are sensitive to the dynamics of a protein on the
microsecond time scale [88], the question of whether a single rather than an ensemble
of conformations is a better representation of the NMR observables, such as the
chemical shifts, must be investigated first.
Fig. 6 Bar diagram of the rmsd (ppm) between the computed and observed 13 Cα chemical shifts
of ubiquitin. Black-filled bar (2.74 ppm) represents the results from the SAR structure. Grey-filled
bars represent the rmsd for each of the generated 5 new models; the horizontal black line represents
the ca-rmsd (2.36 ppm) computed from the ensemble of 5 new models. Figure adapted from [41]
(with permission of the International Union of Crystallography)
tionally labile, as indicated by both the ca-rmsd and the theoretical minimal-rmsd
model analyses, and this must be taken into account to predict the 13Cα chemical
shifts most accurately.
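The difference between the per-model rmsd and the ensemble ca-rmsd reported in Fig. 6 can be illustrated with a minimal sketch. It assumes that the ca-rmsd is obtained by first averaging the computed shift of each residue over all conformers and then taking the rmsd against the observed shifts; the function names and the numerical values are hypothetical and are not data from the chapter.

```python
import math


def rmsd(observed, computed):
    """Root-mean-square deviation (ppm) between two equal-length lists of shifts."""
    if len(observed) != len(computed) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((o - c) ** 2 for o, c in zip(observed, computed))
    return math.sqrt(squared / len(observed))


def ca_rmsd(observed, ensemble):
    """Conformationally averaged rmsd (assumed definition): average the
    computed shift of every residue over all conformers, then take the rmsd
    of those averages against the observed shifts."""
    n_conf = len(ensemble)
    averaged = [sum(conf[i] for conf in ensemble) / n_conf
                for i in range(len(observed))]
    return rmsd(observed, averaged)


# Toy values in ppm, invented for illustration only.
observed = [56.0, 58.0, 60.0]
ensemble = [
    [54.0, 59.0, 62.0],  # conformer 1
    [58.0, 57.0, 58.0],  # conformer 2
]
per_model = [rmsd(observed, conf) for conf in ensemble]
ensemble_value = ca_rmsd(observed, ensemble)
print(per_model, ensemble_value)
```

In this toy case each conformer alone deviates by about 1.73 ppm, while the averaged shifts reproduce the observed values exactly. This is the situation in which an ensemble represents the observables better than any single conformer.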
fold as monomers and do not contain disulfide bonds, is very valuable because such
determinations can provide important information for force-field development and
evaluation or improvement of search algorithms aimed at an efficient exploration of
the conformational space [123–126].
The results obtained indicate that an accurate all-β-sheet structure can be
determined by simply identifying a set of conformations that simultaneously satisfy a
set of constraints, including 13Cα-dynamically-derived torsional angle constraints for
all amino acid residues in the sequence and a fixed set of NOE-derived distance
constraints [29]. Among the thousands of conformations generated by the VTF approach,
i.e., during step 1 of the protein structure determination protocol shown in Fig. 1,
25 (see Fig. 7a) were selected by using a clustering procedure. This small
set of conformations was used to determine the theoretical minimal-rmsd model, which
provides a set of φ, ψ, and χ torsional angle constraints for all the residues
in the sequence, not just for those in α-helix or β-sheet regions. Using this set of
torsional angle constraints (φ, ψ, χ), combined with different numbers of NOE-derived
constraints, two sets of conformations of the BS2 peptide were determined in step
2 of the protocol. One set of 20 conformations (shown in Fig. 7b) was obtained by
using 118 NOE-derived distance constraints, while the other set of 10 conformations
(shown in Fig. 7c) was obtained by using 130 NOE-derived distance constraints.
Regardless of the number of NOE-derived distance constraints used, addition
of the 13Cα-derived torsional constraints led to noticeably lower ca-rmsd values (2.2 and
3.5 ppm for the sets of 20 and 10 conformations, respectively) than that of the 20
models obtained by Santiveri et al. [127], who used a full set of 130 NOE-derived
distance constraints but no 13Cα chemical shift information (4.6 ppm). In line with
this finding, graphical inspection of the results shown in Fig. 7b–c also indicated
that use of 13Cα-derived torsional constraints led to sets of conformations with less
side-chain torsional angle spreading, as can be seen by comparing Fig. 7b
and c with Fig. 7d, the latter obtained by Santiveri et al. [127]. In addition, the
correlation coefficient, R, between the observed and computed 13Cβ chemical shifts
was somewhat better for the two sets obtained using the 13Cα-based determination
protocol (shown in Fig. 1). Thus, R is 0.99 and 0.98 for the 20- and 10-conformation
sets, respectively, while R is 0.97 for the set of conformations derived by Santiveri
et al. [127].
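How a set of per-residue torsional constraints could be read off a small ensemble can be sketched as follows. The sketch assumes that the theoretical minimal-rmsd model is assembled residue by residue, by picking for each residue the conformer whose computed 13Cα shift is closest to the observed one and taking that conformer's (φ, ψ, χ) values for the residue. This is an illustrative assumption, not a statement of the published protocol; all names and numbers are hypothetical.

```python
def minimal_rmsd_model(observed, ensemble_shifts, ensemble_torsions):
    """For every residue, pick the conformer whose computed shift is closest
    to the observed shift and return (conformer index, torsions) per residue.

    ensemble_shifts[k][i]   -> computed shift (ppm) of residue i in conformer k
    ensemble_torsions[k][i] -> (phi, psi, chi) in degrees of residue i in conformer k
    """
    model = []
    for i, obs in enumerate(observed):
        best = min(range(len(ensemble_shifts)),
                   key=lambda k: abs(ensemble_shifts[k][i] - obs))
        model.append((best, ensemble_torsions[best][i]))
    return model


# Toy two-residue, two-conformer example (invented values).
observed = [56.0, 60.0]
ensemble_shifts = [
    [55.5, 63.0],  # conformer 0
    [58.0, 60.2],  # conformer 1
]
ensemble_torsions = [
    [(-60.0, -45.0, 180.0), (-120.0, 130.0, -60.0)],  # conformer 0
    [(-65.0, -40.0, 60.0), (-110.0, 125.0, 180.0)],   # conformer 1
]
model = minimal_rmsd_model(observed, ensemble_shifts, ensemble_torsions)
print(model)
```

Here residue 1 takes its angles from conformer 0 and residue 2 from conformer 1, so the resulting set of constraints need not correspond to any single member of the ensemble.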
Overall, analysis of the ca-rmsd, the NOE-derived distance violations, the 13Cβ
chemical shifts, and some stereochemical quality factors for these sets, as a
measure of the closeness with which the calculations reproduce the structure in solution,
indicates that our self-consistent physics-based method is able to produce a more
accurate set of conformations (shown in Fig. 7b and c) than that obtained with the
traditional methods [127] (shown in Fig. 7d). Our results also suggest that for a
flexible molecule in solution, like BS2, it may not be possible to determine a single
structure that would satisfy all the constraints simultaneously. This is a consequence
of the well-known fact that NMR parameters, such as the observed NOE-derived
distances and the 13Cα chemical shifts, correspond to a dynamic ensemble of
conformations and, therefore, may not be reproduced exactly by a limited set of static
structures [44, 128].
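The correlation coefficient R used above to compare observed and computed 13Cβ chemical shifts can be sketched as a plain Pearson coefficient. The choice of the Pearson form is an assumption, since the text does not name the estimator, and the data below are toy values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy shifts in ppm (invented): computed values track the observed ones closely.
observed = [30.1, 41.7, 33.0, 69.5, 38.2]
computed = [30.9, 40.8, 34.1, 68.0, 39.0]
r = pearson_r(observed, computed)
print(round(r, 3))
```

Because R measures only linear co-variation, it is insensitive to a constant offset between computed and observed shifts, which is one reason it is usually reported alongside an rmsd-type measure.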
682 J. A. Vila and Y. A. Arnautova
The solution NMR structures of both the full-length (residues 1–77) and truncated
(residues 1–46) forms of the YnzC protein (PDB id 2JVD) from Bacillus subtilis [133],
which is part of the small yneA SOS response operon that regulates cell division in
this organism [134], have been determined recently [135]. The corresponding X-ray
crystal structure (PDB id 3BHP) was solved by Kuzin et al. [133] at 2.0 Å
resolution. The unique two-helix monomeric structure of YnzC, with no disulfide bonds,
makes it an attractive subject for testing our physics-based methodology for protein
structure determination.
The goal of this application is two-fold. First, as a blind test, we attempted to
determine whether it is possible to obtain an ensemble of conformations for which
each individual conformer simultaneously satisfies the NOE-derived distance
constraints and the 13Cα-derived torsional constraints for the YnzC protein in solution
[136]. Although the solution NMR structure [135] of this protein had been solved at
the time of this blind test, the only information provided was a full set of both the
observed 13Cα chemical shifts and the NOE-derived distance constraints. In
particular, no information about the coordinates of the solved structures of the YnzC protein
[135] or the heteronuclear 15N-1H NOE data was provided at the time of the test.
Our second goal was to carry out a cross-validation test of high-quality sets of
conformations obtained for the YnzC protein in solution by using alternative
determination methods, namely, the solution NMR set of conformations (PDB id, 2JVD)
obtained by using NOE-derived distance constraints, dihedral-angle constraints and
hydrogen-bond constraints [135], and the 2.0-Å X-ray crystal structure (PDB id,
3BHP; Kuzin et al. [133]). For this second goal, several validation scores were used
[136], including: (i) Recall, Precision, F-measure (RPF) analysis [6]; (ii) several
global quality score indicators provided by Verify3D [10], ProsaII [137], Procheck
[8], and MolProbity [5]; (iii) the ca-rmsd and rmsd between observed 13Cα chemical
shifts and those computed at the DFT level; and (iv) the backbone rmsd between
these refined structures and the mathematical average coordinates of the ensemble
of NMR structures of YnzC(1–48) deposited in the PDB.
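The Recall, Precision and F-measure scores listed under (i) follow the usual information-retrieval definitions. The sketch below is a deliberately simplified, set-based illustration and not the RPF program of [6]: it assumes that the experimental data and the model can each be reduced to a set of close proton pairs, and every identifier in it is hypothetical.

```python
def rpf_scores(observed_pairs, model_pairs):
    """Recall, precision and F-measure for two collections of contact pairs.

    recall    = fraction of observed pairs that the model reproduces
    precision = fraction of model pairs that are supported by observation
    F-measure = harmonic mean of recall and precision
    """
    # Store each pair as a frozenset so that (a, b) and (b, a) are the same contact.
    observed = {frozenset(p) for p in observed_pairs}
    model = {frozenset(p) for p in model_pairs}
    shared = len(observed & model)
    recall = shared / len(observed) if observed else 0.0
    precision = shared / len(model) if model else 0.0
    total = recall + precision
    f_measure = 2.0 * recall * precision / total if total else 0.0
    return recall, precision, f_measure


# Toy contact lists; the proton labels are placeholders.
observed_pairs = [("HA1", "HN2"), ("HA1", "HB3"), ("HN2", "HA4"), ("HB3", "HA4")]
model_pairs = [("HN2", "HA1"), ("HA1", "HB3"), ("HN2", "HB3")]
recall, precision, f_measure = rpf_scores(observed_pairs, model_pairs)
print(recall, precision, f_measure)
```

A model can score high on recall while scoring low on precision (it explains the data but also predicts many contacts that are not observed), which is why the combined F-measure is reported.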
By carrying out a blind test we demonstrated [136] that an accurate all-α-helical set
of protein structures can be determined by simply identifying conformations that
simultaneously satisfy a set of constraints, including 13Cα-dynamically-derived
torsional angle constraints for all amino acid residues in the sequence and a fixed set
of 1022 NOE-derived distance constraints. The protein structure determination was
carried out as follows. After generation of thousands of conformations using the VTF
procedure (step 1), 10 of them (shown in Fig. 8b) were selected, namely those possessing
a maximum NOE-derived distance violation lower than some fixed cutoff value. Only
one of these 10 conformations was then selected and used as the starting
point of a conformational search (step 2) carried out with two
types of constraints: the original fixed set of NOE-derived distance constraints and
the set of φ, ψ, χ torsional angles derived from step 1. The resulting new set of 10
conformations is shown in Fig. 8c. Repetition of step 2, with a tighter tolerance
range for the torsional angle constraints than in the previous iteration, enabled us
Fig. 8 Results for the 77-residue YnzC protein from Bacillus subtilis. a Bar diagram indicating the
rmsd (ppm) between the computed and observed 13Cα chemical shifts for each of the 10
conformations from Set-NOE-CS (red bars), for the 20 conformations from 2JVD (yellow bars), and for each
of the three chains in the 2.0 Å crystal structure of YnzC protein, PDB id 3BHP, namely chain a, b
and c (black, cyan and green bars). Black (1.54 ppm) and red (1.38 ppm) horizontal lines show the
ca-rmsd values computed for the residues 1–46 of 2JVD and Set-NOE-CS, respectively; b Super-
position of 10 NMR-derived conformations of YnzC (represented by ribbon diagrams) obtained
after the VTF procedure, in Step 1 (see Flow-chart in Fig. 1); c Same as b after the conformational
search in Step 2; d Same as c after repeating the conformational search in Step 2 (Set-NOE-CS),
i.e., this time by using a new set of torsional angles (ϕ, ψ, χ) derived from the set of conformations
shown in panel (c); e superposition of 20 NMR-derived conformations (PDB id 2JVD) of YnzC
protein obtained by Aramini et al. [135]; and f Graphic representation of the X-ray determined
structure of YnzC protein (PDB id 3BHP); the asymmetric unit contains 3 similar, but not identical,
copies of the YnzC protein molecule, namely chain a, b and c. Figure a adapted from [136] (with
permission of PNAS)
to determine the final set of 10 conformations shown in Fig. 8d, i.e., the so-called
Set-NOE-CS.
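The selection logic described above (keep conformations whose largest NOE-derived distance violation is below a cutoff, then repeat the search with a tighter torsional tolerance) can be sketched as a simple filter. This is only an illustration under stated assumptions: distances are compared with upper bounds, torsional agreement is measured as a periodic angular difference, and the cutoff, the tolerances and the data layout are all hypothetical rather than the values used in [136].

```python
def max_noe_violation(distances, upper_bounds):
    """Largest amount (in Å) by which any distance exceeds its upper bound."""
    return max(max(0.0, d - u) for d, u in zip(distances, upper_bounds))


def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (periodic)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def satisfies_constraints(conf, upper_bounds, targets, noe_cutoff, torsion_tol):
    """True if the conformer meets both the NOE and the torsional criteria."""
    if max_noe_violation(conf["distances"], upper_bounds) >= noe_cutoff:
        return False
    return all(angle_difference(t, ref) <= torsion_tol
               for t, ref in zip(conf["torsions"], targets))


# Toy data: two distance constraints (Å) and two target torsions (degrees).
upper_bounds = [5.0, 4.0]
targets = [-60.0, 175.0]
conformers = [
    {"name": "a", "distances": [4.8, 4.3], "torsions": [-70.0, -178.0]},
    {"name": "b", "distances": [5.9, 3.5], "torsions": [-60.0, 175.0]},
    {"name": "c", "distances": [4.0, 3.9], "torsions": [-55.0, 170.0]},
]

# First pass with a loose torsional tolerance, second pass with a tighter one.
loose = [c["name"] for c in conformers
         if satisfies_constraints(c, upper_bounds, targets, 0.5, 30.0)]
tight = [c["name"] for c in conformers
         if satisfies_constraints(c, upper_bounds, targets, 0.5, 8.0)]
print(loose, tight)
```

Conformer "b" is rejected in both passes because of its NOE violation, while "a" survives only the loose torsional tolerance. Tightening the tolerance therefore narrows the accepted set without changing the distance constraints.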
The PDB is the most important archive of experimental protein structures solved
by X-ray crystallography and NMR spectroscopy. The large number of structures
deposited in the PDB constitutes an extraordinary source of information that has been,
and continues to be, used for a wide range of applications in structural drug design,
molecular modeling, force-field parameterization, molecular biology, etc.
Some deposited protein structures, showing a few or many flaws,
are formally withdrawn from the database and, hence, considered obsolete, even
though their coordinates remain available in the PDB. In most cases, a successor
structure replaces the obsolete one. The large number of obsolete
structures indicates that development of accurate validation protocols remains an
important task.
An ideal validation method should meet two requirements. First, it should be strong
rather than weak. A validation method is considered ‘strong’ if it is able to assess how
well a structure, or an ensemble of structures, predicts experimental data not used in
suggest that CheShift can provide a standard with which to evaluate the quality of
protein structures solved by either X-ray crystallography or NMR spectroscopy, if
the experimentally observed 13Cα chemical shifts are available.
of the protein, reflects the flexibility of these segments of the molecules in solution,
as is clearly shown by the CheShift-2 validation of 2K5Q (see Fig. 9b).
obsolete) having a wrong fold; while the other one, 2B95 (that replaced the obsolete
1TGQ in the PDB), showing a correct fold. This difference is a result of the oligomeric
state assumed during the protein-structure determination, namely a monomer for
1TGQ, and a homodimer for 2B95, as pointed out by Nabuurs et al. [11].
Validation of both protein ensembles, as a whole, shows that 2B95 is a slightly
better representation of the observed 13Cα chemical shifts, in terms of the ca-rmsd
[34], than 1TGQ, viz., ca-rmsd = 2.08 and 2.35 ppm for 2B95 and 1TGQ,
respectively. However, the ca-rmsd difference between these two ensembles (~0.30 ppm)
is not large enough to establish unambiguously that the 1TGQ ensemble needs
further refinement. In fact, a similar difference in terms of rmsd, i.e., within a range of
~0.30 ppm, was found among 5 new models of the protein ubiquitin (see grey bars
in Fig. 6), all of which fit X-ray diffraction data with R and Rfree factors similar to
those for the deposited X-ray structure, PDB id 1UBQ, solved at 1.8 Å resolution
[41]. These 5 new models can certainly be considered to be of comparable structural
quality. Consequently, variations in ca-rmsd of ~0.30 ppm cannot be used as a
universal criterion to determine unequivocally whether a protein, such as 1TGQ, needs further
refinement.
Analysis of the dynein light chain 2A protein illustrates that validation of a protein as
a whole (global validation), e.g., with the ca-rmsd, may not enable us to determine
unambiguously whether one protein model is of better quality than another model
of the same protein, while validation on a per-residue basis (local validation),
e.g., with the CheShift-2 server, does (see Fig. 10). To further test the ability of the
CheShift-2 server to detect small differences between protein models, a small set of
15 obsolete/successor pairs of proteins was also considered (see Supplementary Data
of [49]). The results indicate that the CheShift-2 server constitutes a fast and accurate
validation tool with which to determine, on a per-residue basis, the existence of local
flaws in protein models, even for conformations that differ in small details, as for the
obsolete and successor models of Membrane-bound Lytic Murein Transglycosylase
D (fragment LysM domain) (see Fig. 11).
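The contrast between global and local validation can be made concrete with a small sketch. It is not the CheShift-2 algorithm: the per-residue threshold is a hypothetical parameter, and the shifts are toy values chosen so that two models have the same global rmsd while only one of them contains a localized flaw.

```python
import math


def global_rmsd(observed, computed):
    """Single rmsd (ppm) over all residues: a global validation score."""
    squared = sum((o - c) ** 2 for o, c in zip(observed, computed))
    return math.sqrt(squared / len(observed))


def flag_residues(observed, computed, threshold):
    """1-based indices of residues whose absolute shift error exceeds
    the (hypothetical) threshold: a local, per-residue validation."""
    return [i + 1 for i, (o, c) in enumerate(zip(observed, computed))
            if abs(o - c) > threshold]


# Toy shifts in ppm (invented for illustration).
observed = [55.0, 55.0, 55.0, 55.0]
model_a = [56.0, 54.0, 56.0, 54.0]  # small error spread over every residue
model_b = [55.0, 55.0, 55.0, 57.0]  # one residue carries the whole error

rmsd_a = global_rmsd(observed, model_a)
rmsd_b = global_rmsd(observed, model_b)
flags_a = flag_residues(observed, model_a, threshold=1.5)
flags_b = flag_residues(observed, model_b, threshold=1.5)
print(rmsd_a, rmsd_b, flags_a, flags_b)
```

Both toy models have a global rmsd of 1.0 ppm, yet only the second is flagged, at residue 4. A global score alone cannot separate the two cases, whereas the per-residue view can.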
In general, pairs of obsolete and successor proteins present in the PDB can be used
as a benchmark set with which to test validation methods. These sets of
obsolete/successor pairs of proteins are very appealing because their members possess
different topologies and numbers of residues, and complete sets of 13Cα chemical
shifts are available for a large number of them from the Biological Magnetic Resonance
Data Bank (BMRB) [117].
In this chapter we have illustrated how the information encoded in the 13C chemical
shifts can be used for a range of applications, from protein
structure prediction to accurate detection of structural flaws, at the residue level, in
NMR-determined protein models.
Fig. 10 Two models of the dynein light chain 2A protein: a 1TGQ (obsolete) and b 2B95
(successor). Both models are shown as ribbons and colored according to CheShift-2. The BMRB accession
number, from which the observed 13Cα chemical shifts were obtained, is 6527. Figure adapted from
[49] (with permission of Oxford University Press)
The ability to detect and accurately characterize the mobility of the surface side
chains by computing 13Cα chemical shifts constitutes one of the strengths of the
current methodology. Hence, we are planning to focus our research on the development
of new physics-based algorithms for a fast and accurate determination and validation
of side-chain conformations, with the goal of improving the quality of NMR-determined
protein models. Since NMR spectroscopy provides chemical shifts for several other
nuclei besides 13Cα, the feasibility of their DFT computation and the benefits of including
the information encoded in these data in structure determination protocols are
currently under investigation in our group. In general, new developments in the field
of NMR spectroscopy are needed in order to develop protocols for high-throughput
NMR determination of high-quality protein structures in solution.
References
1. Bhattacharya, A., Tejero, R., Montelione, G.T.: Evaluating protein structures determined by
structural genomics consortia. Proteins 66, 778–795 (2007)
2. Billeter, M., Wagner, G., Wüthrich, K.: Solution NMR structure determination of proteins
revisited. J. Biomol. NMR 42, 155–158 (2008)
3. Williamson, M.P., Craven, C.J.: Automated protein structure calculation from NMR data. J.
Biomol. NMR 43, 131–143 (2009)
4. Williamson, M.P., Kikuchi, J., Asakura, T.: Application of 1H-NMR chemical-shifts to
measure the quality of protein structures. J. Mol. Biol. 247, 541–546 (1995)
5. Davis, I.W., Leaver-Fay, A., Chen, V.B., Block, J.N., Kapral, G.J., Wang, X., Murray, L.W.,
Arendall III, W.B., Snoeyink, J., Richardson, J.S., Richardson, D.C.: MolProbity: all-atom
contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35,
W375–W383 (2007)
6. Huang, Y.J., Powers, R., Montelione, G.T.: Protein NMR Recall, Precision, and F-measure
scores (RPF scores): Structure quality assessment measures based on information retrieval
statistics. J. Am. Chem. Soc. 127, 1665–1674 (2005)
7. Huang, Y.J., Tejero, R., Powers, R., Montelione, G.T.: A topology-constrained distance net-
work algorithm for protein structure determination from NOESY data. Proteins 62, 587–603
(2006)
8. Laskowski, R.A., MacArthur, M.W., Moss, D.S., Thornton, J.: PROCHECK—a program to
check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291 (1993)
9. Lovell, S.C., Davis, I.W., Arendall III, W.B., de Bakker, P.I.W., Word, J.M., Prisant, M.G.,
Richardson, J.S., Richardson, D.C.: Structure validation by Cα geometry: φ, ψ, and Cβ devi-
ation. Proteins 50, 437–450 (2003)
10. Lüthy, R., Bowie, J.U., Eisenberg, D.: Assessment of protein models with three-dimensional
profiles. Nature 356, 83–85 (1992)
11. Nabuurs, S.B., Spronk, C.A.E.M., Vuister, G.W., Vriend, G.: Traditional biomolecular structure
determination by NMR spectroscopy allows for major errors. PLoS Comput. Biol. 2, 71–79
(2006)
12. Vriend, G.: WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 8,
52–56 (1990)
13. Berjanskii, M., Wishart, D.S.: A simple method to predict protein flexibility using secondary
chemical shifts. J. Am. Chem. Soc. 127, 14970–14971 (2005)
14. Berjanskii, M., Wishart, D.S.: The RCI server: rapid and accurate calculation of protein
flexibility using chemical shifts. Nucleic Acids Res. 35, W531–W537 (2007)
15. Cornilescu, G., Delaglio, F., Bax, A.: Protein backbone angle restraints from searching a
database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)
16. de Dios, A.C., Pearson, J.G., Oldfield, E.: Chemical shifts in proteins: An ab initio study
of carbon-13 nuclear magnetic resonance chemical shielding in glycine, alanine, and valine
residues. J. Am. Chem. Soc. 115, 9768–9773 (1993)
17. de Dios, A.C., Pearson, J.G., Oldfield, E.: Secondary and tertiary structural effects on protein
NMR chemical shifts: An ab initio approach. Science 260, 1491–1496 (1993)
18. Frank, A., Möller, H.M., Exner, T.E.: Toward the quantum chemical calculation of NMR
chemical shifts of proteins. 2. Level of theory, basis set, and solvent model dependence. J.
Chem. Theory Comput. 8, 1480–1492 (2012)
19. Havlin, R.H., Le, H., Laws, D.D., de Dios, A.C., Oldfield, E.: An ab initio quantum chemical
investigation of carbon–13 NMR shielding tensors in glycine, alanine, valine, isoleucine,
serine, and threonine: Comparisons between helical and sheet tensors, and effects of χ1 on
shielding. J. Am. Chem. Soc. 119, 11951–11958 (1997)
20. Iwadate, M., Asakura, T., Williamson, M.P.: Cα and Cβ carbon-13 chemical shifts in proteins
from an empirical database. J. Biomol. NMR 13, 199–211 (1999)
21. Kuszewski, J., Qin, J., Gronenborn, A.M., Clore, M.: The impact of direct refinement against
13Cα and 13Cβ chemical shifts on protein structure determination by NMR. J. Magn. Reson.
Ser. B 106, 92–96 (1995)
22. Luginbühl, P., Szyperski, T., Wüthrich, K.: Statistical basis for the use of 13Cα chemical shift
in protein structure determination. J. Magn. Reson. 109, 229–233 (1995)
23. Meiler, J.: PROSHIFT: protein chemical shift prediction using artificial neural networks. J.
Biomol. NMR 26, 25–37 (2003)
24. Neal, S., Nip, A.M., Zhang, H., Wishart, D.S.: Rapid and accurate calculation of protein 1H,
13C and 15N chemical shifts. J. Biomol. NMR 26, 215–240 (2003)
25. Shen, Y., Bax, A.: Protein backbone chemical shifts predicted from searching a database for
torsional angle and sequence homology. J. Biomol. NMR 38, 289–302 (2007)
26. Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J.M., Liu, G., Eletsky, A., Wu, Y.,
Singarapu, K.K., Lemak, A., et al.: Consistent blind protein structure generation from NMR
chemical shift data. Proc. Natl. Acad. Sci. U. S. A. 105, 4685–4690 (2008)
27. Spera, S., Bax, A.: Empirical correlation between protein backbone conformation and Cα
and Cβ 13C nuclear magnetic resonance chemical shifts. J. Am. Chem. Soc. 113, 5490–5492
(1991)
28. Vila, J.A., Arnautova, Y.A., Martin, O.A., Scheraga, H.A.: Quantum-mechanics-derived 13Cα
chemical shift server (CheShift) for Protein Structure validation. Proc. Natl. Acad. Sci. U. S.
A 106, 16972–16977 (2009)
29. Vila, J.A., Arnautova, Y.A., Scheraga, H.A.: Use of 13Cα chemical shifts for accurate deter-
mination of β-sheet structures in solution. Proc. Natl. Acad. Sci. U. S. A. 105, 1891–1896
(2008)
30. Vila, J.A., Aramini, J.M., Rossi, P., Kuzin, A., Su, M., Seetharaman, J., Xiao, R., Tong, L.,
Montelione, G.T., Scheraga, H.A.: Quantum chemical 13Cα chemical shift calculations for
protein NMR structure determination, refinement, and validation. Proc. Natl. Acad. Sci. U.
S. A. 105, 14389–14394 (2008)
31. Vila, J.A., Baldoni, H.A., Ripoll, D.R., Ghosh, A., Scheraga, H.A.: Polyproline II helix
conformation in a proline-rich environment: a theoretical study. Biophys. J. 86, 731–742 (2004)
32. Vila, J.A., Baldoni, H.A., Ripoll, D.R., Scheraga, H.A.: Unblocked statistical-coil tetrapep-
tides in aqueous solution: quantum-chemical computation of the carbon-13 NMR chemical
shifts. J. Biomol. NMR 26, 113–130 (2003)
33. Vila, J.A., Villegas, M.E., Baldoni, H.A., Scheraga, H.A.: Predicting 13Cα chemical shifts
for validation of protein structures. J. Biomol. NMR 38, 221–235 (2007)
34. Vila, J.A., Scheraga, H.A.: Assessing the accuracy of protein structures by quantum mechan-
ical computations of 13Cα chemical shifts. Acc. Chem. Res. 42, 1545–1553 (2009)
35. Villegas, M.E., Vila, J.A., Scheraga, H.A.: Effects of side-chain orientation on the 13C chem-
ical shifts of antiparallel β-sheet model peptides. J. Biomol. NMR 37, 137–146 (2007)
36. Wishart, D., Bigam, C.G., Yao, J., Abildgaard, F., Dyson, H., Oldfield, E., Markley, J., Sykes,
B.: 1H, 13C and 15N chemical shift referencing in biomolecular NMR. J. Biomol. NMR 6,
135–140 (1995)
37. Wishart, D., Bigam, C.G., Holm, A., Hodges, R.S., Sykes, B.D.: 1H, 13C and 15N random
coil NMR chemical shifts of the common amino acids. I. Investigation of nearest-neighbor
effects. J. Biomol. NMR 5, 67–81 (1995)
38. Xu, X.-P., Case, D.A.: Probing multiple effects on 15N, 13Cα, 13Cβ and 13C’ chemical shifts
in peptides using density functional theory. Biopolymers 65, 408–423 (2002)
39. Xu, X.-P., Case, D.A.: Automated prediction of 15N, 13Cα, 13Cβ and 13C’ chemical shifts
in proteins using a density functional database. J. Biomol. NMR 21, 321–333 (2001)
40. Parr, R.G., Yang, W.: Density functional theory of atoms and molecules. Oxford University
Press, New York (1989)
41. Arnautova, Y.A., Vila, J.A., Martin, O.A., Scheraga, H.A.: What can we learn by computing
13Cα chemical shifts for X-ray protein models? Acta Crystallogr. D D65, 697–703 (2009)
42. Martin, O.A., Villegas, M.E., Vila, J.A., Scheraga, H.A.: Analysis of 13Cα and 13Cβ chemical
shifts of cysteine and cystine residues in proteins: A quantum chemical approach. J. Biomol.
NMR 46, 217–225 (2010)
43. Vila, J.A., Arnautova, Y.A., Vorobjev, Y.N., Scheraga, H.A.: Assessing the fractions of tautomeric
forms of the imidazole ring of histidine in proteins as a function of pH. Proc. Natl. Acad. Sci.
U. S. A. 108, 5602–5607 (2011)
44. Vila, J.A., Ripoll, D.R., Scheraga, H.A.: Use of 13Cα chemical shifts in protein structure
determination. J. Phys. Chem. B 111, 6577–6585 (2007)
45. Vila, J.A., Scheraga, H.A.: Factors affecting the use of 13Cα chemical shifts to determine,
refine, and validate protein structures. Proteins: Struct. Funct. Bioinf. 71, 641–654
(2008)
46. Wüthrich, K.: NMR of Proteins and Nucleic Acids. Wiley, New York, NY, U. S. A. (1986)
47. Sun, H., Sanders, L.K., Oldfield, E.: Carbon-13 NMR shielding in the twenty common amino
acids: comparisons with experimental results in proteins. J. Am. Chem. Soc. 124, 5486–5495
(2002)
48. Vila, J.A., Serrano, P., Wüthrich, K., Scheraga, H.A.: Sequential nearest-neighbor effects on
computed 13Cα chemical shifts. J. Biomol. NMR 48, 23–30 (2010)
49. Martin, O.A., Vila, J.A., Scheraga, H.A.: CheShift-2: graphic validation of protein structures.
Bioinformatics 28, 1538–1539 (2012)
50. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov,
I.N., Bourne, P.E.: The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)
51. Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W.,
Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T.,
Warren, G.L.: Crystallography and NMR system: a new software suite for macromolecular
structure determination. Acta Crystallogr D 54, 905–921 (1998)
52. Brünger, A.T.: Version 1.2 of the Crystallography and NMR system. Nat. Protoc. 2, 2728–2733
(2007)
53. Cavalli, A., Salvatella, X., Dobson, C.M., Vendruscolo, M.: Protein structure determination
from NMR chemical shifts. Proc. Natl. Acad. Sci. U.S.A. 104, 9615–9620 (2007)
54. Cornilescu, G., Marquardt, J.L., Ottiger, M., Bax, A.: Validation of protein structure from
anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc.
120, 6836–6837 (1998)
55. Frank, A., Onila, I., Möller, H.M., Exner, T.E.: Toward the quantum chemical calculation of
nuclear magnetic resonance chemical shifts of proteins. Proteins 79, 2189–2202 (2011)
56. Guerry, P., Herrmann, T.: Advances in automated NMR protein structure determination. Q.
Rev. Biophys. 44, 257–309 (2011)
57. Güntert, P.: Structure calculation of biological macromolecules from NMR data. Q. Rev.
Biophys. 31, 145–237 (1998)
58. Güntert, P.: Automated structure determination from NMR spectra. Eur. Biophys. J. 38,
129–143 (2009)
59. Güntert, P., Braun, W., Wüthrich, K.: Efficient computation of three-dimensional protein
structures in solution from nuclear magnetic resonance data using the program DIANA and the
supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517–530 (1991)
60. Rosato, A., Aramini, J.M., Arrowsmith, C., Bagaria, A., Baker, D., Cavalli, A., Doreleijers,
J.F., Eletsky, A., Giachetti, A., Guerry, P., et al.: Blind testing of routine, fully automated
determination of protein structures from NMR data. Structure 20, 227–236 (2012)
61. Rosato, A., Bagaria, A., Baker, D., Bardiaux, B., Cavalli, A., Doreleijers, J.F., Giachetti, A.,
Guerry, P., Güntert, P., Herrmann, T., et al.: CASD-NMR: critical assessment of automated
structure determination by NMR. Nat. Methods 6, 625–626 (2009)
62. Némethy, G., Gibson, K.D., Palmer, K.A., Yoon, C.N., Paterlini, G., Zagari, A., Rumsey, S.,
Scheraga, H.A.: Energy parameters in polypeptides. 10. Improved geometrical parameters
and nonbonded interactions for use in the ECEPP/3 algorithm, with application to
proline-containing peptides. J. Phys. Chem. 96, 6472–6484 (1992)
63. Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R.,
Zakrzewski, V.G., Montgomery Jr., J.A., Stratmann, R.E., Burant, J.C., et al.: Gaussian 03,
Revision E.01, Gaussian, Inc., Wallingford CT (2003)
64. Chesnut, D.B., Moore, K.D.: Locally dense basis-sets for chemical-shift calculations. J. Comp.
Chem. 10, 648–659 (1989)
65. Jameson, A.K., Jameson, C.J.: Gas-phase 13C chemical shifts in the zero-pressure limit:
Refinements to the absolute shielding scale for 13C. Chem. Phys. Lett. 134, 461–466 (1997)
66. Vásquez, M., Scheraga, H.A.: Variable-target-function and buildup procedures for the calcula-
tion of protein conformation—application to bovine pancreatic trypsin-inhibitor using limited
simulated nuclear magnetic-resonance data. J. Biomol. Struct. Dyn. 5, 757–784 (1988)
67. Kruskal Jr., J.B.: On the shortest spanning subtree of a graph and the traveling salesman
problem. Proc. Am. Math. Soc. 7, 48–50 (1956)
68. Li, Z., Scheraga, H.A.: Monte Carlo minimization approach to the multiple minima problem
in protein folding. Proc. Natl. Acad. Sci. U. S. A. 84, 6611–6615 (1987)
69. Li, Z., Scheraga, H.A.: Structure and free energy of complex thermodynamic systems. J.
Molec. Str. (Theochem) 179, 333–352 (1998)
70. Arnautova, Y.A., Jagielska, A., Scheraga, H.A.: A new force field (ECEPP05) for peptides,
proteins, and organic molecules. J. Phys. Chem. B 110, 5025–5044 (2006)
71. Vila, J., Williams, R.L., Vásquez, M., Scheraga, H.A.: Empirical solvation models can be used
to differentiate native from near-native conformations of bovine pancreatic trypsin inhibitor.
Proteins: Struct. Funct. Genet. 10, 199–218 (1991)
72. Ripoll, D.R., Ni, F.: Refinement of the thrombin-bound structure of a hirudin peptide by a
restrained electrostatically driven Monte Carlo method. Biopolymers 32, 359–365 (1992)
73. Vorobjev, Y.N., Scheraga, H.A.: A fast adaptive multigrid boundary element method for
macromolecule electrostatic computations in solvent. J. Comp. Chem. 18, 569–583 (1997)
74. Vorobjev, Y.N., Vila, J.A., Scheraga, H.A.: FAMBE-pH: a fast and accurate method to compute
the total solvation free energies of proteins. J. Phys. Chem. B 112, 11122–11136 (2008)
75. Ripoll, D.R., Vorobjev, Y.N., Liwo, A., Vila, J.A., Scheraga, H.A.: Coupling between folding
and ionization equilibria: Effects of pH on the conformational preferences of polypeptides. J.
Mol. Biol. 264, 770–783 (1996)
76. Vila, J.A., Ripoll, D.R., Arnautova, Y.A., Vorobjev, Y.N., Scheraga, H.A.: Coupling between
conformation and proton binding in proteins. Proteins 61, 56–68 (2005)
77. Sitkoff, D., Sharp, K.A., Honig, B.: Accurate calculation of hydration free energies using
macroscopic solvent models. J. Phys. Chem. 98, 1978–1988 (1994)
78. Barth, P., Alber, T., Harbury, P.B.: Accurate, conformation-dependent predictions of solvent
effects on protein ionization constants. Proc. Natl. Acad. Sci. U. S. A. 104, 4898–4903 (2007)
79. Hass, M.A.S., Hansen, D.F., Christensen, H.E.M., Led, J.J., Kay, L.E.: Characterization of
conformational exchange of a histidine side chain: protonation, rotamerization, and tautomer-
ization of His61 plastocyanin from Anabaena variabilis. J. Am. Chem. Soc. 130, 8460–8470
(2008)
80. Serrano, P., Johnson, M.A., Chatterjee, A., Neuman, B., Joseph, J.S., Buchmeier, M.J., Kuhn,
P., Wüthrich, K.: NMR structure of the nucleic acid-binding domain of the SARS coronavirus
nonstructural protein 3. J. Virol. 83, 12998–13008 (2009)
81. Schwarzinger, S., Kroon, G.J.A., Foss, T.R., Chung, J., Wright, P.E., Dyson, H.J.: Sequence-
dependent correction of random coil NMR chemical shifts. J. Am. Chem. Soc. 123, 2970–2978
(2001)
82. Wang, Y., Jardetzky, O.: Investigation of the neighboring residue effects on protein chemical
shifts. J. Am. Chem. Soc. 124, 14075–14084 (2002)
83. Vijay-Kumar, S., Bugg, C.E., Cook, W.J.: Structure of ubiquitin refined at 1.8 Å resolution.
J. Mol. Biol. 194, 531–544 (1987)
84. Quirt, A.R., Lyerla Jr., J.R., Peat, I.R., Cohen, J.S., Reynolds, W.F., Freedman, M.H.:
Carbon-13 nuclear magnetic resonance titration shifts in amino acids. J. Am. Chem. Soc. 96, 570–574
(1974)
85. Rabenstein, D.L., Sayer, T.L.: Carbon-13 shifts parameters for amines, carboxylic acids and
amino acids. J. Magn. Res. 24, 27–39 (1976)
86. Sayer, T.L., Rabenstein, D.L.: Nuclear magnetic resonance studies of the acid-base chemistry
of amino acids and peptides. III. Determination of the microscopic and macroscopic acid
dissociation constants of α, ω-diaminocarboxylic acids. Can. J. Chem. 54, 3392–3400 (1976)
87. Surprenant, H.L., Sarneski, J.E., Key, R.R., Byrd, J.T., Reilley, C.N.: Carbon-13 studies of
amino acids: chemical shifts, protonation shifts, microscopic protonation behavior. J. Magn.
Res. 40, 231–243 (1980)
88. Lindorff-Larsen, K., Best, R.B., Depristo, M.A., Dobson, C.M., Vendruscolo, M.: Simulta-
neous determination of protein structure and dynamics. Nature 433, 128–132 (2005)
89. Chakrabarti, P., Pal, D.: Main-chain conformational features at different conformations of the
side-chains in proteins. Protein Eng. 11, 631–647 (1998)
90. Dunbrack Jr., R.L., Karplus, M.: Conformational analysis of the backbone-dependent rotamer
preferences of protein sidechains. J. Mol. Biol. 230, 543–574 (1993)
91. Chothia, C., Levitt, M., Richardson, D.: Structure of proteins: packing of α-helices and β-
sheets. Proc. Natl. Acad. Sci. U. S. A. 74, 4130–4134 (1977)
92. Chou, K.-C., Pottle, M., Némethy, G., Ueda, Y., Scheraga, H.A.: Structure of β sheets. Origin
of the right handed twist and of the increased stability of antiparallel over parallel sheets. J.
Mol. Biol. 162, 89–112 (1982)
93. Chou, K.-C., Scheraga, H.A.: Origin of the right handed twist of β sheets of poly(L Val)
chains. Proc. Natl. Acad. Sci. USA 79, 7047–7051 (1982)
94. Creighton, T.E.: Proteins: Structure and Molecular Properties, pp. 186, 223. W.E. Freeman
and Company, New York (1984)
95. Karplus, M.: Contact electron-spin coupling of nuclear magnetic moments. J. Chem. Phys.
30, 11–15 (1959)
96. Mandel, M.: Proton Magnetic resonance spectra of some proteins: I. Ribonuclease, oxidized
ribonuclease, lysozyme, and cytochrome c. J. Biol Chem. 240, 1586–1592 (1965)
97. Bradbury, J.H., Scheraga, H.A.: Structural studies of ribonuclease. XXIV. The application
of nuclear magnetic resonance spectroscopy to distinguish between the histidine residues of
ribonuclease. J. Am. Chem. Soc. 88, 4240–4246 (1966)
98. Bachovchin, W.W.: 15 N NMR spectroscopy of hydrogen-bonding interactions in the active
site of serine proteases: evidence for a moving histidine mechanism. Biochemistry 25,
7751–7759 (1986)
99. Cheng, F., Sun, H., Zhang, Y., Mukkamala, D., Oldfield, E.: A solid state 13C NMR, crystal-
lographic, and quantum chemical investigation of chemical shifts and hydrogen bonding in
histidine dipeptides. J. Am. Chem. Soc. 127, 12544–12554 (2005)
100. Farr-Jones, S., Wong, W.Y.L., Gutheil, W.G., Bachovchin, W.W.: Direct observation of the tau-
tomeric forms of histidine in 15 N NMR spectra at low temperatures. Comments on intramolec-
ular hydrogen bonding on tautomeric equilibrium. J. Am. Chem. Soc. 115, 6813–6819 (1993)
101. Harbison, G., Herzfeld, J.: Griffin RGJ Nitrogen-15 chemical shifts tensors in L-histidine
hydrochloride monohydrate. J. Am. Chem. Soc. 103, 4752–4754 (1981)
13 C Chemical Shifts in Proteins: A Rich … 697
102. Hass, M.A.S., Yilmaz, A., Christensen, H.E.M., Led, J.J.: Histidine side-chain dynamics
and protonation monitored by 13C CPMG NMR relaxation dispersion. J. Biomol. NMR 44,
225–233 (2009)
103. Hu, F., Wenbin, L., Hong, M.: Mechanism of proton conduction and gating in influenza M2
proton channels from solid-state NMR. Science 330, 505–508 (2010)
104. Jensen, M.R., Has, M.A.S., Hansen, D.F., Led, J.J.: Investigating metal-binding in proteins
by nuclear magnetic resonance. Cell. Mol. Life Sci. 64, 1085–1104 (2007)
105. Markley, J.L.: Observation of histidine residues in proteins by means of nuclear magnetic
resonance spectroscopy. Acc. Chem. Res. 8, 70–80 (1974)
106. Meadows, D.H., Jardetzky, O., Epand, R.M., Ruterjans, H.H., Scheraga, H.A.: Proc. Natl.
Acad. Sci. U.S.A. 60, 766–772 (1968)
107. Pelton, J.G., Torchia, D.A., Meadow, N.D., Roseman, S.: Tautomeric states of the active-site
histidine of phosphorylated and unphosphorylated IIIGlc, a signal-transducing protein from
Escherichia coli, using two-dimensional heteronuclear NMR techniques ProtSci 2, 543–558
(1993)
108. Reynolds, W.F., Peat, I.R., Freedman, M.H., LyerlaJr, J.R.: Determination of the tautomeric
form of the imidazole ring of L-Histidine in basic solution by carbon-13 magnetic resonance
spectroscopy. J. Am. Chem. Soc. 95, 328–331 (1973)
109. Schuster, I.I., Roberts, J.D.: Nitrogen-15 nuclear magnetic resonance spectroscopy. Effects of
hydrogen bonding and protonation on nitrogen chemical shifts in imidazoles. J. Org. Chem.
44, 3864–3867 (1979)
110. Shimba, N., Serber, Z., Lewidge, R., Miller, S.M., Craik, C.S., Dotsch, V.: Quantitative iden-
tification of the protonation state of histidine in vitro and in vivo. Biochem 42, 9227–9234
(2003)
111. Shimba, N., Takahashi, H., Sakakura, M., Fuji, I., Shimada, I.: Determination of protonation
and deprotonation forms and tautomeric states of histidine residues in large proteins using
nitrogen-carbon J couplings in imidazole ring. J. Am. Chem. Soc. 120, 10988–10989 (1998)
112. Steiner, T.: L-Histidyl-L-alanine dehydrate. Acta. Cryst. C 52, 2554–2556 (1996)
113. Steiner, T., Koellner, G.: Coexistence of both histidines tautomers in the solid state and sta-
bilization of the unfavorable Nδ-H form by intramolecular hydrogen bonding: rystalline L-
His-Gly hemihydrates. Chem. Commun. 13, 1207–1208 (1997)
114. Strohmeier, M., Stueber, D., Grant, D.M.: Accurate 13C and 15 N chemical shift and 14 N
quadrupolar coupling constant calculations in amino acid crystals: Zwitterionic, hydrogen-
bonded systems. J. Phys. Chem. A 107, 7629–7642 (2003)
115. Sudmeier, J.L., Bradshaw, E.M., Coffman Haddad, K.E., Day, R.M., Thalhauser, C.J., Bullock,
P.A., Bachovchin, W.W.: Identification of histidine tautomers in proteins by 2D 1H/13Cδ2
one-bond correlated NMR. J. Am. Chem. Soc. 125, 8430–8431 (2003)
116. Wüthrich, K.: NMR in Biological Research: Peptides and Proteins. North-Holland, Amster-
dam (1976)
117. Ulrich, E.L., Akutsu, H., Doreleijers, J.F., Harano, Y., Ioannidis, Y.E., Lin, J., Livny, M.,
Mading, S., Maziuk, D., Miller, Z., Nakatani, E., Schulte, C.F., Tolmie, D.E., Wenger, R.K.,
Yao, H., Markley, J.L.: BioMagResBank nucleic. Acids Res. 36, D402–D408 (2008)
118. Demchuk, E., Wade, R.C.: Improving the continuum dielectric approach to calculating pKas
of ionizeable groups in proteins. J. Phys. Chem. 100, 17373–17387 (1996)
119. DePristo, M.A., de Bakker, P.I.W., Blundell, T.L.: Heterogeneity and inaccuracy in protein
structures solved by X-ray crystallography. Structure 12, 831–838 (2004)
120. Ringe, D., Petsko, G.A.: Study of protein dynamics by X-ray diffraction Methods in Emzy-
mology 131, 389–433 (1986)
121. Furnham, N., Blundell, T.L., DePristo, M.A., Terwilliger, T.C.: Is one solution good enough?
Nature Struct. Mol. Biol. 13, 184–185 (2006)
122. Wang, Y., Jardetzky, O.: Probability-based protein secondary structure identification using
combined NMR chemical-shift data. Prot Sci 11, 852–861 (2002)
123. Höfinger, S., Almeida, B., Hansmann, U.H.E.: Parallel tempering molecular dynamics folding
simulation of a signal peptide in explicit water. Proteins 68, 662–669 (2007)
698 J. A. Vila and Y. A. Arnautova
124. Jang, S., Kim, E., Pak, Y.: Free energy surfaces of miniproteins with a beta beta alpha motif:
replica exchange molecular dynamics simulation with an implicit solvation model. Proteins
62, 663–671 (2006)
125. Mohanty, S., Hansmann, U.H.E.: Folding of proteins with diverse folds. Biophy. J. 91,
3573–3578 (2006)
126. Zhou, R.: Free energy landscape of protein folding in water: Explicit versus implicit solvent.
Proteins 53, 148–161 (2003)
127. Santiveri, C.M., Santoro, J., Rico, M., Jiménez, M.A.: Factors involved in the stability of
isolated beta-sheets: turn sequence, beta-sheet twisting, and hydrophobic surface burial. Prot.
Sci. 13, 1134–1147 (2004)
128. Zhao, D., Jardetzky, O.: An assessment of the precision and accuracy of protein structures
determined by NMR–dependence on distance errors. J. Mol. Biol. 239, 601–607 (1994)
129. Korzhnev, D.M., Orekhov, V.Y., Arseniev, A.S.: Model-free approach beyond the borders of
its applicability. J. Mag. Res. 127, 184–191 (1997)
130. Palmer III, A.G.: NMR characterization of the dynamics of biomacromolecules. Chem. Rev.
104, 3623–3640 (2004)
131. Case, D.A., Darden, T.A., Cheatham, T.E., III, Simmerling, C.L., Wang, J., Duke, R.E., Luo,
R., Merz, K.M., Wang, B., Pearlman, D.A., et al.: AMBER 8 University of California, San
Francisco (2004)
132. Zhou, Y., Vitkup, D., Karplus, M.: Native proteins are surface-molten solids: Application of
the Lindemann criterion for the solid versus liquid state. J. Mol. Biol. 285, 1371–1375 (1999)
133. Kuzin, A.P., Su M., Seetharaman, J., Janjua, H., Cunningham, K., Maglaqui, M., Owens,
L.A., Zhao, L., Xiao, R., Baran, M.C., Acton, T.B., Rost, B., Montelione, G.T., Hunt, J.F.,
Tong, L.: Crystal structure of UPF0291 protein ynzC from Bacillus subtilis at resolution 2.0
A. (2008) Northeast Structural Genomics Consortium target SR384. https://doi.org/10.2210/
pdb3bhp/pdb
134. Kawai, Y., Moriya, S., Ogasawara, N.: Identification of a protein YneA, responsible for
cell division suppression during the SOS response in Bacillus subtilis. Mol. Microbiol. 47,
1113–1122 (2003)
135. Aramini, J.M., Sharma, S., Huang, Y.J., Swapna, G.V.T., Ho, C.K., Shetty, K., Cunningham,
K., Ma, L.-C., Zhao, L., Owens, L.A., Jiang, M., Xiao, R., Liu, J., Baran, M.C., Acton, T.B.,
Rost, B., Montelione, G.T.: Solution NMR structure of the SOS response protein YnzC from
Bacillus subtilis Proteins: Structure. Funct. Bioinformatics 72, 526–530 (2008)
136. Vila, J. A., Baldoni, H. A., Scheraga, H. A.: performance of density functional models to
reproduce observed 13Cα chemical shifts of proteins in solution. J. Comp. Chem. 38, 884–892
(2008b)
137. Sippl, M.J.: Recognition of errors in three-dimensional structures of proteins. Proteins 17,
355–362 (1993)
138. Kleywegt, G.J.: On vital aid: the why, what and how of validation Acta. Cryst, D 65, 134–139
(2009)
139. Sevcik, J., Dauter, Z., Lamzin, V.S., Wilson, K.S.: Ribonuclease from streptomyces aureofa-
ciens at atomic resolution. Acta Cryst D D52, 327–344 (1996)
Protein Secondary Structure Assignments and Their Usefulness for Dihedral Angle Prediction
E. Faraggi
Department of Physics, Indiana University Purdue University Indianapolis,
Indianapolis, IN 46202, USA
E. Faraggi
Department of Physics, Butler University, Indianapolis, IN 46208, USA
E. Faraggi · A. Kloczkowski
Battelle Center for Mathematical Medicine,
The Research Institute at Nationwide Children’s Hospital,
Columbus, OH 43215, USA
E. Faraggi (B)
Physics Division, Research and Information Systems,
LLC, Indianapolis, IN 46240, USA
A. Kloczkowski
Department of Pediatrics, The Ohio State University, Columbus, OH 43215, USA
1 Introduction
Proteins are the most important part of the biological machinery that carries out the
instructions contained in the genetic code. As such, proteins are responsible on some
level for all biological functions. For example, charge regulation, tissue building and
repair, and cellular transport are all achieved through proteins. Proteins perform
their functional duties through various interactions associated with their three-dimensional
structures. These structures are thought to be determined by the
amino acid sequence, which in turn is specifically encoded in the genetic material of
the host organism.
While the genetic code, and hence the amino acid sequences of proteins, can be obtained
relatively cheaply using automated whole-genome and whole-exome sequencing procedures,
the three-dimensional structures of proteins can currently be determined experimentally
only through labor-intensive and costly procedures. This creates a widening gap between
the number of proteins for which the sequence is known and the number of proteins for
which the three-dimensional structure is solved. Furthermore, for some proteins it is
difficult to obtain an experimental structure, either because isolating the protein is
difficult or because standard experimental procedures such as solvation and
crystallization are difficult to apply.
These considerations have led to extensive interest and activity in the field of protein
structure prediction. Protein structures are typically categorized into four levels of
increasing structural information. The first level, sometimes
called the primary structure, is the amino acid sequence of the protein. This is the
chemical structure of the protein. The second level is the so-called secondary structure,
which involves the patterns of hydrogen bonds corresponding to α-helices or
β-sheets along the amino acid sequence. These patterns are manifested as local
three-dimensional structures. The third level of structural information, the tertiary
structure, is associated with the packing of secondary structure elements into a
single-domain protein structure. In theory, tertiary structure can be deduced from the
dihedral angles. In practice, however, small errors in the coil regions create large errors
in the overall arrangement of secondary structure elements, and additional local
refinement is needed. The fourth level of structural classification is called the quaternary
structure and is associated with the packing of the tertiary structures of individual
protein chains into functioning multimeric biological assemblies of several chains.
This hierarchy enables proteins to bridge the size gap between individual atoms and
biological components.
Prediction of protein structure usually exploits this hierarchy as well. Secondary
structure predictions [1–26] are used to set initial conditions and act as constraints
in three-dimensional prediction schemes [27–34]. Recently it was shown that substituting
dihedral angles for secondary structure constraints in template-free modeling
of three-dimensional structure results in a 100% improvement in prediction
accuracy (twice the number of structures predicted to within 6 Å of the native structure)
[35]. Part of that approach to predicting dihedral angles uses secondary
structure predictions as input features. The present work shows how these secondary
structures were obtained and analyzes their usefulness, relative to other assignment
schemes, as input features for the prediction of the dihedral angle ψ.
In addition to structured proteins, there are proteins that are partially disordered
(containing both ordered and disordered regions) or fully disordered. These
types of proteins are not considered here.
[Figure 1: panel (a) plots frequency (0 to 1) against ψ (-200 to 200) for sheet, coil, and helix residues; panel (b) shows the structure example described in the caption; panel (c) plots θ (0 to 140) against ψ (-200 to 200).]
Fig. 1 a Distribution of the dihedral angle ψ for all three secondary structures. b Example of a
deformation in the three-dimensional structure associated with a beta-sheet; the dihedral angle ψ
there would be in the range of values covered by alpha-helix structures. c The distribution of the
curvature angle (θ) along the backbone
Another approach we can take is to reassign the secondary structure for residues
with odd angles to the opposite structure classes. That is, reassign sheet residues
with odd angles into the helix class and reassign helix residues with odd angles
into the sheet class. Since for the distribution of ψ the locations of the odd angles
approximately correspond to the opposite structure class, this modification may help
in the prediction of quantities such as dihedral angles where predicted secondary
structures are used as input. As we shall see later, these reassignments actually do
not change the accuracy of secondary structure prediction.
The following nomenclature will be used to designate the types of secondary
structure assignment. A1 designates the original assignment by DSSP.
A2 and A5 designate the modified assignment in which helix and sheet
are interchanged for residues with odd angles. A3 and A4 designate the
reassignment of residues with odd angles into coil. For A2 and A4 we perform the
modification on the DSSP assignments, while for A3 and A5 we perform the odd-angle
reassignments on the consensus secondary structure as discussed by Wei et al. [38].
We use A6 to designate the easy classification of the DSSP assignment as
discussed above.
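The two kinds of reassignment can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the predicate `has_odd_psi` and its ψ thresholds are hypothetical placeholders, because the precise definition of an odd angle is not restated here, and the special φ-angle handling is not covered. Only the swap-versus-coil logic follows the nomenclature above.

```python
# Three-state labels: 'H' helix, 'E' sheet, 'C' coil.

def has_odd_psi(label, psi):
    """Hypothetical test for an 'odd' psi angle (degrees).

    Assumed thresholds, for illustration only: a helix residue is odd if
    its psi lies in a sheet-like range, and a sheet residue is odd if its
    psi lies in a helix-like range. Coil residues are never odd.
    """
    if label == 'H':
        return psi > 90.0 or psi < -150.0
    if label == 'E':
        return -100.0 <= psi <= 30.0
    return False

def reassign_swap(labels, psis):
    """A2 (on DSSP) or A5 (on consensus): interchange helix and sheet."""
    swap = {'H': 'E', 'E': 'H'}
    return [swap[l] if has_odd_psi(l, p) else l for l, p in zip(labels, psis)]

def reassign_coil(labels, psis):
    """A4 (on DSSP) or A3 (on consensus): move odd-angle residues to coil."""
    return ['C' if has_odd_psi(l, p) else l for l, p in zip(labels, psis)]

# Toy input: an initial three-state assignment and made-up psi angles.
initial = ['H', 'H', 'E', 'E', 'C']
psi = [-45.0, 140.0, 130.0, -40.0, 60.0]
swapped = reassign_swap(initial, psi)
to_coil = reassign_coil(initial, psi)
print(swapped)
print(to_coil)
```

Passing a DSSP-derived list as `initial` corresponds to A2 and A4; passing a consensus assignment would correspond to A5 and A3.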
We have used several types of neural networks to analyze this problem. In the first part
of the analysis we used a neural network to predict the secondary structure assignment
according to the classifications described above. Note that at this step
we are interested in a comparative analysis between the different methods and not in
the overall accuracy. Results for the overall accuracy of the SPINE-X server will be
given later. The general form of the neural networks used to predict the secondary
structure was described in detail earlier [35, 40]; here we give only a brief overview.
The input layer of the network is composed of an n-residue input window
in which each residue has the following descriptors. First, twenty values from the
position-specific scoring matrix obtained from the PSI-BLAST program [41], normalized
by 9.0. Second, seven physical parameters describing the physicochemical properties of the
residue: a steric parameter (graph shape index), hydrophobicity, volume, polarizability,
isoelectric point, helix probability, and sheet probability. These parameters were
identified by Meiler et al. [42] and have proven useful in protein structure prediction
[18, 35, 40, 43, 44]. They were linearly normalized such that their values vary
between -1 and 1. Third, we constructed our own mutation profile by taking aligned
sequences from the PSI-BLAST NR dataset with bit values between 20 and 60;
this profile also includes the probabilities of gaps at particular residues. Finally,
we calculated a sequence complexity parameter [45] for window sizes of 5, 11, 21, and
31 around a given residue. Thus, for a given residue we have 21 × 52 = 1092 input
features. Three-state secondary structure probabilities were predicted for the central
residue of a 21-residue input window.
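The feature count can be checked with a few lines of arithmetic. The breakdown below is a sketch: the 20 PSSM values, the 7 physical parameters, and the 4 complexity window sizes are stated in the text, while the size of the custom mutation profile (assumed here to be 20 values plus one gap probability) is an inference chosen so that the per-residue total matches the stated 52. The dictionary keys are illustrative names only.

```python
# Per-residue descriptors; key names are illustrative, not from the original.
features_per_residue = {
    "pssm": 20,                 # PSI-BLAST PSSM values, normalized by 9.0
    "physical_parameters": 7,   # steric, hydrophobicity, volume, ...
    "mutation_profile": 20,     # assumed: one value per amino acid type
    "gap_probability": 1,       # assumed: one gap probability per residue
    "sequence_complexity": 4,   # window sizes 5, 11, 21, 31
}

window_size = 21
per_residue = sum(features_per_residue.values())
total_inputs = window_size * per_residue
print(per_residue, total_inputs)
```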
Two hidden layers, each with 71 nodes, were used for a preprocessing network.
Probability predictions from this network were further refined using a filter network
with a single hidden layer of 51 nodes. In addition, guiding weights were used
to control the dependence on sequence separation as described in Refs. [35, 40].
Training and testing for all neural networks considered here were performed on the
SPINE dataset [18].
Predicted secondary structure assignments are then used as input for ψ angle
prediction. We chose the ψ angle since its variation is a good discriminator between
the helix and sheet configurations. In general the networks used to predict ψ angles
were composed of two hidden layers, each with 51 nodes. The input to the neural
network varies as we wish to study the dependence on the various assignments, and
how these assignments complement other input features such as physical parameters
and Position Specific Scoring Matrix (PSSM).
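As a rough illustration of the network shape just described, the following sketch runs a forward pass through two hidden layers of 51 nodes using the bipolar activation f(x) = tanh(αx) with α = 0.2 that is specified later in this section. Everything else is an assumption made for the example: the random weight initialization, the bias handling, the single tanh output node, and the three-value input. Guiding weights and training are omitted.

```python
import math
import random

ALPHA = 0.2  # slope of the bipolar activation f(x) = tanh(alpha * x)

def make_layer(n_in, n_out, rng):
    """Random weights for illustration; each row is n_in weights plus a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def apply_layer(layer, inputs):
    """Weighted sum plus bias for each node, passed through tanh(alpha * x)."""
    outputs = []
    for row in layer:
        s = row[-1] + sum(w * x for w, x in zip(row[:-1], inputs))
        outputs.append(math.tanh(ALPHA * s))
    return outputs

def forward(layers, inputs):
    for layer in layers:
        inputs = apply_layer(layer, inputs)
    return inputs

rng = random.Random(0)
n_inputs = 3  # e.g. three secondary structure probabilities for one residue
layers = [
    make_layer(n_inputs, 51, rng),  # first hidden layer
    make_layer(51, 51, rng),        # second hidden layer
    make_layer(51, 1, rng),         # assumed single output node
]
output = forward(layers, [0.1, 0.2, 0.7])
print(output)
```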
We would like to isolate the effect of the secondary structure reassignment on
the accuracy of angle prediction. In the first case we use only the three
predicted probabilities for the secondary structure assignment in a single-residue
window along the chain. However, additional information is probably available from
a larger window. Information about residues in harder-to-predict
odd-angle regions can also be contained in the probabilities of the secondary
structures. Hence, we also study the accuracies for a window size of 21 residues.
Vacant positions in the windows around residues near the termini of the protein
chain are explicitly excluded from the computation by limiting the range of the
input window. We use a bipolar activation function given by f(x) = tanh(αx), with
α = 0.2, a momentum of 0.4, and the back-propagation method with relatively slow
learning (learning rate 0.001) to optimize the weights. To determine the quality
of the prediction we use the mean absolute error (MAE) in degrees, Pearson's
correlation coefficient (pc), and the probability that the predicted and native angles
are separated by less than 10% (Q10p). We use ten-fold cross-validation [35] to
estimate the accuracy over the set. To test for possible overfitting we take secondary
structure predictions based on the weights trained for the first cross-validation fold
and compare angle prediction between the folds.
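The three quality measures can be sketched as follows. This is a hedged illustration rather than the authors' code. Two assumptions are made and labeled in the comments: angular errors are measured along the shorter arc of the circle, and the 10% cutoff in Q10p is taken as 10% of the full 360° range (36°). The Pearson coefficient is computed on the raw angle values. All function names are illustrative.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees.
    Assumption: errors are measured on the circle, so 170 and -175 differ by 15."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def mean_absolute_error(pred, native):
    return sum(angular_diff(p, n) for p, n in zip(pred, native)) / len(pred)

def pearson(pred, native):
    """Pearson correlation coefficient of the raw angle values."""
    n = len(pred)
    mp, mn = sum(pred) / n, sum(native) / n
    cov = sum((p - mp) * (q - mn) for p, q in zip(pred, native))
    vp = sum((p - mp) ** 2 for p in pred)
    vn = sum((q - mn) ** 2 for q in native)
    return cov / math.sqrt(vp * vn)

def q10p(pred, native, full_range=360.0):
    """Percentage of residues with error below 10% of full_range.
    Assumption: full_range is 360 degrees, giving a 36 degree cutoff."""
    cutoff = 0.10 * full_range
    hits = sum(1 for p, n in zip(pred, native) if angular_diff(p, n) < cutoff)
    return 100.0 * hits / len(pred)

# Made-up predicted and native psi angles for four residues.
pred = [-40.0, 130.0, 170.0, -60.0]
native = [-45.0, 140.0, -175.0, 100.0]
mae = mean_absolute_error(pred, native)
q10 = q10p(pred, native)
pc = pearson(pred, native)
print(mae, q10, pc)
```

With these toy values the per-residue errors are 5°, 10°, 15°, and 160°, so three of the four residues fall under the assumed 36° cutoff.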
3 Results
We start with the accuracy of predicting the secondary structure assignments. Table 1
gives the accuracy for the six different classifications A1 through A6. For each
classification type we give the overall accuracy and the accuracy split according
to native and predicted secondary structure types. For a given secondary structure
type we also show its density. We first notice that the overall accuracy is the same
for A1 and A2. As we shall see later, A2 is still useful for improving the dihedral
angle prediction. We also note that the accuracy is noticeably better for the consensus
assignments A3 and A5. Overall the relationship between the predicted and native
densities is similar between the three assignment methods. It is interesting to note
also that the improvement in prediction accuracy for method A3 comes mostly from
improved identification of structured segments: helix, then sheet, and lastly coil. For
the A6 method we find that the secondary structure prediction accuracy is improved.
This is a known effect and is due to the relative ease of this assignment scheme.
As we shall see later, this improved accuracy does not translate into an improved
accuracy for the prediction of derived quantities.
Table 1 Fraction of correctly predicted secondary structure residues and densities of structure
types
            A1(a)  ρ1     A2(b)  ρ2     A3(c)  ρ3     A4(d)  ρ4     A5(e)  ρ5     A6(f)  ρ6
All         0.800         0.800         0.812         0.803         0.809         0.821
Sheet (n)   0.729  0.232  0.723  0.222  0.736  0.236  0.719  0.220  0.734  0.239  0.734  0.221
Coil (n)    0.800  0.388  0.810  0.394  0.806  0.371  0.810  0.403  0.806  0.364  0.838  0.439
Helix (n)   0.842  0.380  0.834  0.385  0.863  0.393  0.846  0.377  0.856  0.398  0.857  0.340
Sheet (p)   0.788  0.215  0.779  0.206  0.794  0.218  0.785  0.202  0.785  0.223  0.795  0.204
Coil (p)    0.746  0.416  0.751  0.424  0.749  0.399  0.757  0.431  0.745  0.393  0.792  0.464
Helix (p)   0.867  0.369  0.867  0.370  0.888  0.383  0.868  0.367  0.888  0.384  0.878  0.332
Ten-fold cross-validated prediction accuracy for three-state secondary structure as a function of
the native and predicted secondary structure types. Also given is the density, ρ, of the specific
secondary structure type.
(a) A1: Original DSSP assignment.
(b) A2: Switch between helix and sheet for residues with special dihedral ψ angle and switch to
coil assignment for special φ angles.
(c) A3: First apply consensus assignment [38], then shift all special dihedral angles to coil.
(d) A4: Switch all residues with special angles to coil assignment.
(e) A5: Apply consensus assignment, then switch helix/sheet assignment for residues with special
ψ angles.
(f) A6: Easy assignment from original DSSP classification, with only alpha-helix assigned as
helix, extended sheet as sheet, and all others assigned as coil.
(n) Native. (p) Predicted
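As a consistency check on Table 1, the overall accuracy of a three-state classifier equals the sum over native classes of the per-class accuracy weighted by the native density of that class. The sketch below applies this identity to the native sheet, coil, and helix rows for A1, A3, and A6; the numbers are copied from Table 1, and the variable names are illustrative.

```python
# (accuracy, density) for native sheet, coil, and helix, copied from Table 1.
table1_native = {
    "A1": [(0.729, 0.232), (0.800, 0.388), (0.842, 0.380)],
    "A3": [(0.736, 0.236), (0.806, 0.371), (0.863, 0.393)],
    "A6": [(0.734, 0.221), (0.838, 0.439), (0.857, 0.340)],
}
reported_overall = {"A1": 0.800, "A3": 0.812, "A6": 0.821}

def weighted_accuracy(rows):
    """Density-weighted sum of per-class accuracies."""
    return sum(acc * rho for acc, rho in rows)

for name, rows in table1_native.items():
    print(name, round(weighted_accuracy(rows), 4), reported_overall[name])
```

The weighted sums agree with the reported "All" row to within the rounding of the three-decimal table entries.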
Overall, it seems that A3 works best for the prediction accuracy of secondary structure
assignments. However, we should consider the effects these classifications and
their prediction have on the prediction of derived quantities. Here we analyze what
happens to the prediction of the dihedral angle ψ as we use the different classifications
mentioned above.
First, we wish to isolate the local reassignment as much as possible by taking
a window of a single residue and using only the predicted secondary structures as
inputs. In Table 2 we give the prediction accuracy of ψ in terms of Q10p, MAE,
and pc. It seems that on the single-residue level A2 gives an improved prediction for
ψ, even though A2, unlike the other assignment methods, gives no improvement for
secondary structure prediction. We see that overall the results of reassignment remain
relatively constant, with A2 slightly ahead of the rest. In terms of specific secondary
structure states, we see that all reassignment methods produce better results for the
native sheet and coil conformations, a challenging task that is important for tertiary
structure prediction [35].
Table 2 Prediction accuracy for the ψ dihedral angle using predicted secondary structure three
state assignment vector for the central residue (input window of one residue)
           A1                    A2                    A3                    A4                    A5
All        (70.1, 49.0, 0.637)   (70.4, 48.5, 0.640)   (70.5, 48.3, 0.643)   (69.7, 49.8, 0.632)   (70.3, 48.5, 0.638)
Sheet (n)  (86.5, 25.7, 0.019)   (87.0, 24.9, 0.163)   (86.7, 25.5, 0.006)   (86.5, 25.7, 0.013)   (86.4, 25.9, 0.053)
Coil (n)   (45.5, 80.4, 0.144)   (45.4, 80.7, 0.141)   (46.0, 79.0, 0.168)   (45.4, 80.7, 0.139)   (45.8, 79.3, 0.165)
Helix (n)  (85.1, 31.2, 0.310)   (85.7, 30.1, 0.317)   (85.5, 30.8, 0.312)   (84.2, 32.8, 0.328)   (85.6, 30.8, 0.309)
Sheet (p)  (83.2, 31.1, 0.005)   (85.7, 27.8, 0.005)   (85.0, 28.5, 0.008)   (85.1, 28.8, −0.002)  (83.6, 30.6, −0.001)
Coil (p)   (43.9, 86.2, −0.002)  (44.1, 85.8, 0.001)   (43.6, 86.8, 0.001)   (43.9, 86.5, 0.002)   (43.7, 86.6, 0.006)
Helix (p)  (91.1, 18.6, −0.010)  (91.3, 18.3, −0.009)  (90.4, 19.2, −0.006)  (91.3, 18.4, −0.005)  (89.8, 20.1, −0.003)
Ten-fold cross-validated prediction accuracies for the ψ dihedral angle as a function of the native
and predicted secondary structure types. For each assignment and secondary structure type we give
a three-vector of the form (Q10p, MAE, correlation). The assignments A1 through A5 are as
given in Table 1. (n) Native. (p) Predicted
Table 3 Prediction accuracy for the ψ dihedral angle using predicted secondary structure three
state probability vector for a 21 residues window
           A1                    A2                    A3                    A4                    A5
All        (74.1, 42.1, 0.651)   (74.5, 41.8, 0.653)   (74.6, 41.8, 0.651)   (74.4, 42.1, 0.645)   (74.6, 41.9, 0.645)
Sheet (n)  (84.2, 29.5, 0.031)   (85.1, 28.6, 0.149)   (85.1, 28.8, 0.105)   (84.4, 30.1, 0.064)   (84.4, 30.1, 0.091)
Coil (n)   (50.3, 72.7, 0.256)   (50.5, 72.7, 0.250)   (51.6, 71.5, 0.257)   (51.1, 72.2, 0.254)   (51.8, 71.3, 0.261)
Helix (n)  (92.3, 18.4, 0.250)   (92.4, 18.2, 0.263)   (91.7, 19.3, 0.276)   (92.2, 18.5, 0.258)   (91.8, 19.1, 0.254)
Sheet (p)  (83.4, 30.8, 0.107)   (85.9, 27.5, 0.059)   (85.2, 28.3, 0.043)   (85.3, 28.6, 0.053)   (84.0, 30.2, 0.095)
Coil (p)   (53.0, 70.7, 0.285)   (52.9, 71.1, 0.272)   (52.4, 72.5, 0.252)   (54.2, 69.7, 0.287)   (52.8, 71.8, 0.257)
Helix (p)  (91.9, 17.3, 0.243)   (92.2, 17.1, 0.236)   (91.9, 17.3, 0.271)   (92.1, 17.1, 0.231)   (91.3, 18.2, 0.260)
Ten-fold cross-validated prediction accuracies for the ψ dihedral angle as a function of the native
and predicted secondary structure types. For each assignment and secondary structure type we give
a three-vector of the form (Q10p, MAE, correlation). The assignments A1 through A5 are as
given in Table 1. (n) Native. (p) Predicted
Table 4 Prediction accuracy for the ψ dihedral angle using PSSM, physical parameters and predicted
secondary structure probabilities for a 21 residue window using the indicated assignment
method A1 through A6
           A1                   A2                   A3                   A4                   A5                   A6
All        (78.5, 36.7, 0.698)  (78.8, 36.4, 0.701)  (78.6, 36.7, 0.697)  (78.4, 36.9, 0.695)  (78.3, 37.3, 0.693)  (78.1, 37.4, 0.693)
Sheet (n)  (86.7, 27.2, 0.170)  (87.0, 26.8, 0.207)  (86.7, 27.4, 0.188)  (86.6, 27.5, 0.190)  (86.5, 27.7, 0.178)  (87.0, 26.8, 0.176)
Coil (n)   (60.5, 59.6, 0.393)  (60.6, 59.6, 0.393)  (60.8, 59.4, 0.395)  (60.4, 59.8, 0.390)  (60.6, 59.6, 0.392)  (60.3, 60.0, 0.387)
Helix (n)  (91.9, 19.3, 0.247)  (92.2, 18.5, 0.253)  (91.7, 19.4, 0.257)  (91.7, 19.4, 0.253)  (91.4, 20.2, 0.252)  (90.9, 20.9, 0.257)
Sheet (p)  (85.4, 29.0, 0.248)  (87.1, 26.5, 0.175)  (86.4, 27.2, 0.145)  (86.6, 27.4, 0.201)  (85.4, 28.8, 0.198)  (86.2, 28.1, 0.252)
Coil (p)   (62.4, 58.9, 0.401)  (62.4, 58.9, 0.400)  (61.3, 60.5, 0.384)  (62.6, 58.7, 0.406)  (61.3, 60.7, 0.381)  (63.1, 58.3, 0.424)
Helix (p)  (92.1, 17.1, 0.292)  (92.4, 16.9, 0.302)  (92.1, 17.3, 0.291)  (92.2, 17.1, 0.273)  (91.5, 18.2, 0.308)  (94.2, 13.9, 0.239)
Ten-fold cross-validated prediction accuracies for the ψ dihedral angle as a function of the native
and predicted secondary structure types. For each assignment and secondary structure type we give
a three-vector of the form (Q10p, MAE, correlation). The assignments A1 through A6 are as
given in Table 1. (n) Native. (p) Predicted
[Figure 2: plot of MAE (vertical axis, ticks from 0.199 to 0.209) against cross-validation fold (horizontal axis, 1 to 10).]
Fig. 2 Mean absolute error of the dihedral angle ψ as a function of the different folds in a ten-fold
cross-validation. In this case the position specific scoring matrix, physical parameters, and predicted
secondary structure probabilities are used as inputs. Predicted secondary structure probabilities are
taken from the weights tested on the first fold. Hence, depending on the amount of over-training for
the secondary structure, the first fold would be at a disadvantage and would generally have a higher
mean absolute error. As is seen from the plot, in this case the amount of over-training is small, as the
ψ accuracy for the first fold is comparable to the accuracies from the other folds
For the assignment method A6, which produced a significantly better secondary
structure prediction, we find that the accuracy of the ψ angle predicted from these values
is significantly worse than for the rest of the assignment methods. Both the improved
secondary structure prediction and the diminished accuracy of the ψ angle prediction
result from the relatively coarser discrimination between secondary structure states.
4 Discussion
In general, one should not be surprised that the effect of reassignment over the entire
dataset is small, because only a small set of residues had its secondary
structure reassigned. It is instructive to look at the effect of a modification specifically
for those residues that are modified.
In Table 5, we consider specifically those residues for which the DSSP assignment
was modified. For method A3, 7% of the residues in the total database of 583,935
residues have a reassigned secondary structure. The
accuracy of secondary structure prediction for these reassigned residues is 49% for
the A1 assignment and 65% for the A3 assignment. This improved secondary structure
prediction translates into better dihedral angle prediction: the MAE is reduced
from 51.6° to 50.9°, while the correlation increases from 0.545 to 0.551. In terms of
peak prediction [35], the accuracy improves from 78.5% to 79.3% correctly predicted
peaks.
Of all the residues in the database, 1.5% had their secondary structure assignment
modified by method A2. For this set of residues, the secondary structure prediction
accuracy is 46% using method A1, 44% using method A2, and 67% using method A3. Thus,
for this set, method A2 reduces the secondary structure prediction accuracy. A plausible
reason is that the reassigned residues occur within larger blocks of secondary structure
types. Reassigning their secondary structure types creates isolated secondary structure
regions, which are harder to predict. It is interesting to note that over the set of residues
reassigned by method A3 the secondary structure accuracy from A2 is 51%, better
than that of method A1. While not significantly improving the accuracy of secondary
structure prediction, method A2 produced the most significant effect on the
accuracy of ψ prediction. For the set modified by method A2, the MAE for ψ using
method A1 is 99.5° and the correlation is 0.194. Using method A3, the dihedral angle
prediction accuracy improves to an MAE of 96.7° and a correlation of 0.211. Using
method A2, the accuracy improves further, to an MAE of 94.0° and a correlation
of 0.232. In terms of peak prediction, the improvement is from 50.4 to 54.6%.
The hypothesis of equal distribution of error can be rejected at greater than a 99%
confidence level according to a t-test. It is also interesting to note that the
accuracy of method A2 over the set of residues modified by method A3 is also
significantly better, with an MAE of 50.3° and a correlation of 0.558. Judging
by these accuracy parameters, it is evident that both the set of residues modified by
A2 and the set modified by A3 are difficult to predict.
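The two accuracy measures quoted above can be illustrated with a minimal sketch. It assumes that the MAE is taken over the smallest angular difference (so that 175° and −170° differ by 15°, not 345°) and that the correlation is a plain Pearson coefficient; neither convention is stated in this section, and the function names and ψ values below are hypothetical.

```python
import math

def angular_difference(pred_deg, true_deg):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_absolute_error(pred, true):
    """Mean absolute error of predicted angles, respecting periodicity (assumed convention)."""
    return sum(angular_difference(p, t) for p, t in zip(pred, true)) / len(pred)

def pearson_correlation(x, y):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical psi angles (degrees) for four residues, for illustration only
psi_true = [150.0, -40.0, 130.0, -170.0]
psi_pred = [140.0, -60.0, 120.0, 175.0]

mae = mean_absolute_error(psi_pred, psi_true)
corr = pearson_correlation(psi_pred, psi_true)
print(f"MAE = {mae:.2f} degrees, correlation = {corr:.3f}")
```

The last residue shows why periodicity matters: a naive difference would report an error of 345° for a prediction that is only 15° away from the true angle.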
5 Conclusions
References
1. Garnier, J., Osguthorpe, D.J., Robson, B.: Analysis of the accuracy and implications of simple
methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120(1), 97–
120 (1978)
2. Gibrat, J.-F., Garnier, J., Robson, B.: Further developments of protein secondary structure
prediction using information theory: new parameters and consideration of residue pairs. J.
Mol. Biol. 198(3), 425–443 (1987)
3. Holley, L.H., Karplus, M.: Protein secondary structure prediction with a neural
network. Proc. National Acad. Sci. 86(1), 152–156 (1989)
4. Kneller, D.G., Cohen, F.E., Langridge, R.: Improvements in protein secondary structure pre-
diction by an enhanced neural network. J. Mol. Biol. 214(1), 171–182 (1990)
5. Sikorski, A.: Prediction of protein secondary structure by neural networks: Encoding short and
long range patterns of amino acid packing. Acta. Biochim. Pol., 39(4), (1992)
6. Rost, B., Sander, C.: Prediction of protein secondary structure at better than 70% accuracy. J.
Mol. Biol. 232(2), 584–599 (1993)
7. Rost, B., Sander, C., Schneider, R.: Phd-an automatic mail server for protein secondary structure
prediction. Comput. Appl. Biosci.: CABIOS 10(1), 53–60 (1994)
8. Garnier, J., Gibrat, J.-F., Robson, B.: Gor method for predicting protein secondary structure
from amino acid sequence. Methods Enzymol. 266, 540 (1996)
9. Frishman, D., Argos, P.: Seventy-five percent accuracy in protein secondary structure predic-
tion. Proteins-Struct. Funct. Genet. 27(3), 329–335 (1997)
10. Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., Barton, G.J.: Jpred: a consensus secondary
structure prediction server. Bioinformatics 14(10), 892–893 (1998)
11. Jones, D.T.: Protein secondary structure prediction based on position-specific scoring matrices.
J. Mol. Biol. 292, 195–202 (1999)
12. Cuff, J.A., Barton, G.J.: Application of multiple sequence alignment profiles to
improve protein secondary structure prediction. Proteins: Struct. Funct. Bioinformatics 40(3),
502–511 (2000)
13. Hua, S., Sun, Z.: A novel method of protein secondary structure prediction with high segment
overlap measure: support vector machine approach. J. Mol. Biol. 308(2), 397–408 (2001)
14. Kloczkowski, A., Ting, K.-L., Jernigan, R.L., Garnier, J.: Protein secondary structure prediction
based on the gor algorithm with multiple sequence alignments. Polymer 43, 441–449 (2002)
15. Kloczkowski, A., Ting, K.-L., Jernigan, R.L., Garnier, J.: Combining the gor v algorithm with
evolutionary information for protein secondary structure prediction from amino acid sequence.
Proteins: Struct. Funct. Gen. 49, 154–166 (2002)
16. Kolinski, A.: Protein modeling and structure prediction with a reduced representation. Acta
Biochim. Pol.-English Edition- 51, 349–372 (2004)
17. Cheng, H., Sen, T.Z., Kloczkowski, A., Margaritis, D., Jernigan, R.L.: Prediction of protein
secondary structure by mining fragments database. Polymer 46, 4314–4321 (2005)
18. Dor, O., Zhou, Y.: Achieving 80% ten-fold cross-validated accuracy for secondary structure
prediction by large-scale training. Proteins 66, 838–845 (2007)
19. Homaeian, L., Kurgan, L.A., Ruan, J., Cios, K.J., Chen, K.: Prediction of protein secondary
structure content for the twilight zone sequences. Proteins: Struct. Funct. Bioinformatics 69(3),
486–498 (2007)
20. Kurgan, L., Cios, K., Zhang, H., Zhang, T., Chen, K., Shen, S., Ruan, J.: Sequence-based
methods for real value predictions of protein structure. Curr. Bioinformatics 3(3), 183–196
(2008)
21. Kurgan, L., Cios, K., Chen, K.: Scpred: accurate prediction of protein structural class for
sequences of twilight-zone similarity with predicting sequences. BMC Bioinformatics 9(1),
226 (2008)
22. Cole, C., Barber, J.D., Barton, G.J.: The jpred 3 secondary structure prediction server. Nucleic
Acids Res. 36, W197–W201 (2008)
23. Kountouris, P., Hirst, J.D.: Prediction of backbone dihedral angles and protein secondary struc-
ture using support vector machines. BMC Bioinformatics 10(1), 437 (2009)
24. Faraggi, E., Zhang, T., Yang, Y., Kurgan, L., Zhou, Y.: Spine X: improving protein secondary
structure prediction by multistep learning coupled with prediction of solvent accessible surface
area and backbone torsion angles. J. Comp. Chem. 33, 259–267 (2012)
25. Sen, T.Z., Jernigan, R.L., Garnier, J., Kloczkowski, A.: Gor v server for protein secondary
structure prediction. Bioinformatics 21(11), 2787–2788 (2005)
26. Kouza, M., Faraggi, E., Kolinski, A., Kloczkowski, A.: The gor method of protein secondary
structure prediction and its application as a protein aggregation prediction tool. In: Prediction
of Protein Secondary Structure, pp. 7–24. Springer (2017)
27. Rost, B.: TOPITS: threading one-dimensional predictions into three-dimensional structures.
In: Third international conference on intelligent systems for molecular biology, pp. 314–321.
AAAI Press (1995)
28. Rost, B., Sander, C.: Protein fold recognition by prediction-based threading. J. Mol. Biol. 270,
471–480 (1997)
29. Kihara, D., Lu, H., Kolinski, A., Skolnick, J.: Touchstone: an ab initio protein structure
prediction method that uses threading-based tertiary restraints. Proc. National Acad. Sci. 98(18),
10125–10130 (2001)
30. Przybylski, D., Rost, B.: Improving fold recognition without folds. J. Mol. Biol. 341, 255–269
(2004)
31. Cheng, J., Baldi, P.: A machine learning information retrieval approach to protein fold recog-
nition. Bioinformatics 22, 1456–1463 (2006)
32. Qiu, J., Elber, R.: SSALN: an alignment algorithm using structure-dependent substitution
matrices and gap penalties learned from structurally aligned protein pairs. Proteins 62, 881–
891 (2006)
33. Liu, S., Zhang, C., Liang, S., Zhou, Y.: Fold recognition by concurrent use of solvent accessi-
bility and residue depth. Proteins 68, 636–645 (2007)
34. Blaszczyk, M., Kurcinski, M., Kouza, M., Wieteska, L., Debinski, A., Kolinski, A., Kmiecik,
S.: Modeling of protein-peptide interactions using the cabs-dock web server for binding site
search and flexible docking. Methods 93, 72–83 (2016)
35. Faraggi, E., Yang, Y., Zhang, S., Zhou, Y.: Predicting continuous local structure and the effect of
its substitution for secondary structure in fragment-free protein structure prediction. Structure
17, 1515–1527 (2009)
36. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: Pattern recognition of
hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983)
37. Heinig, M., Frishman, D.: Stride: a web server for secondary structure assignment from known
atomic coordinates of proteins. Nucleic Acids Res. 32(suppl_2), W500–W502 (2004)
38. Zhang, W., Dunker, A.K., Zhou, Y.: Assessing secondary-structure assignment of protein struc-
tures by using pairwise sequence-alignment benchmarks. Proteins 71, 61–67 (2008)
39. Sayle, R.A., Milner-White, E.J.: Rasmol: biomolecular graphics for all. Trends
Biochem. Sci. 20(9), 374–376 (1995)
40. Faraggi, E., Xue, B., Zhou, Y.: Improving the prediction accuracy of residue solvent acces-
sibility and real-value backbone torsion angles of proteins by fast guided-learning through a
two-layer neural network. Proteins 74, 857–871 (2009)
41. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.:
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. 25, 3389–3402 (1997)
42. Meiler, J., Muller, M., Zeidler, A., Schmaschke, F.: Generation and evaluation of dimension
reduced amino acid parameter representations by artificial neural networks. J. Mol. Model. 7,
360–369 (2001)
43. Dor, O., Zhou, Y.: Real-SPINE: an integrated system of neural networks for real-value predic-
tion of protein structural properties. Proteins 68, 76–81 (2007)
44. Xue, B., Dor, O., Faraggi, E., Zhou, Y.: Real value prediction of backbone torsion angles.
Proteins 72, 427–433 (2008)
45. Wootton, J.C.: Statistic of local complexity in amino acid sequences and sequence databases.
Comput. Chem. 17, 149–163 (1993)
Part V
Applications of Molecular Quantum Mechanics
When Water Plays an Active Role in Electronic Structure. Insights from
First-Principles Molecular Dynamics Simulations of Biological Systems
G. La Penna (corresponding author)
Institute for Chemistry of Organo-Metallic Compounds,
National Research Council of Italy, via Madonna del Piano 10,
50019 Sesto Fiorentino (Firenze), Italy
e-mail: [email protected]
O. Andreussi
Department of Physics, University of North Texas, Denton, TX 76203, USA
e-mail: [email protected]
1 Introduction
harvesting of solar energy and its incorporation into chemical compounds rich in
electrons (reductant species), is performed within a protein architecture involving
also lipids, water, many ions, including transition metal ions, and other non-protein
cofactors.
Many atomic details that are missing in the crystal structure (like hydrogen atoms,
disordered water molecules and organic cofactors) can be recovered by resorting to
interatomic interactions represented via empirical equations, including also a
significant portion of the water molecules or membrane components that are in the assembly
environment. One of the first computer programs to perform this essential task was
named “Assisted Model Building with Energy Refinement” (AMBER) [3, 4]. This
approach has a long history, documented in the first chapter of this book and in a
recent contribution [5] by one of the main pioneers in the field, Harold Scheraga. The
visionary approach of describing biological macromolecules in terms of elementary
microscopic interactions was recognized by the scientific community
with the 2013 Nobel Prize in Chemistry. For a concise summary of the
pathway from the beginning to the present day, with a perspective view, see the commentary
on the Nobel Prize published by Tamar Schlick [6]; the complete Nobel lectures
are available on the Nobel Prize website.
The many scales of atomic interactions able to capture most of the driving force
for the folding of proteins into assemblies like photosystem II are described in
the other chapters of this book.
Going back to our example (photosystem II), other details emerge if we select
particular elements in the structure, like Mn, Fe, Mg and the cofactors that interact
with these ions (righthand side of Fig. 1). For instance: why are Mn and Fe present
in the assembly? These elements can change the number of electrons in their
respective environments. This occurs in chemical reactions: the electrons move and the
charge distribution changes when molecules are close enough in space. The change
in charge distribution is accompanied by movements of mobile charge carriers that,
to a first approximation, do not explicitly change the sharing of the electrons.
However, among these mobile ions, in biological systems the protons (or better, the proton
distribution in the water or protein environment) are the most readily available.
Fig. 1 X-ray structure of dimeric photosystem II in cyanobacteria, PDB 3BZ2 (left) and 3BZ1
(right) [7]. The monomer on the righthand side is represented according to the PDB secondary
structure (cofactors are not represented). The monomer on the lefthand side is represented only in
its cofactor content (no proteins). The content of bound ions is indicated in the nearby regions.
PLM indicates the approximate region occupied by the phospholipid membrane. Volumes of the
different objects are arbitrary. Different colors indicate different macromolecules in the assembly.
This and the following figures are drawn with the VMD package [8]
To make the movement possible, proteins recruit centers that can easily change the
number of electrons in their vicinity. This is the basis of the catalysis
operated by enzymes and of the electron transport operated by some proteins [9].
Photosystem II, for instance, is an assembly where the electron transport within
several metalloporphyrins is coupled with the catalysis of water oxidation
on one side (the site containing Mn and Ca, known as the oxygen-evolving center) and
quinone reduction on the other side (the bundle of α-helices at the rightmost interface
of the assembly with the phospholipid membrane).
The aim of this chapter is to emphasize one of the most advanced tools to understand
the role of those details that subtly affect the structure and strongly affect
the chemical reactivity of the system. These details can hardly be represented with
the tools developed for the protein portion, and deserve a special treatment that
will be introduced in this chapter. The method that has reached the best compromise
between accuracy and computational affordability is density-functional theory
(DFT) for the description of the ground-state electronic structure; therefore, only
this approach will be described in more detail.
This chapter is practical in nature; the theoretical background is well described
in many books and reviews included as selected references. Our aim is to provide
indications on how to build models for affordable DFT calculations, what to check
during the calculations, which results are reliable and which are severely limited by
the models behind the results. At the end of this chapter, the reader should be able to
take a part of a large model built on the basis of empirical potentials (possibly a set of
configurations selected according to a given statistical ensemble) and to investigate
the behaviour of the selected small portion. Carefully tailoring the focused region
allows the investigator to explicitly account for the behaviour of the electrons in the
system, which is key to modeling chemical reactions.
In terms of modeling, the description of electrons in extended systems like portions
of proteins in contact with transition metal ions is a real challenge. This is
because the description of the quantum nature of many electrons interacting with
atomic cores and among themselves requires a huge number of variables. In this
respect the development of density functional theory has helped considerably.
Nevertheless, the number of degrees of freedom needed for
electrons greatly limits the number of atoms that may be included in a model. A
strategy based on an incremental refinement from coarse-grained empirical models to
such limited samples of atoms including electrons must be devised. The goal of this
chapter is to describe how to perform this refinement in practice and in a robust way.
2 A Simple Example
A single transition metal ion interacting with an organic ligand, representing a protein
region, and a few water molecules, representing the solvent, illustrates some of the
features that will be the subject of this chapter. Therefore, we introduce a simple
example showing how ions and water interact in a biological environment: a single
Cu²⁺ ion bound to a peptide that represents a region in the N-terminus of the human
prion protein.
Let us, to begin with, ignore the details of how the atomic forces are modelled.
Let us assume we have a hamiltonian describing both atomic cores and valence
electrons. Under the assumption that the valence electrons are in the ground state (i.e.
there is no effect from excited electronic states), forces are uniquely defined and can
be computed by using the Hellmann–Feynman theorem (see Ref. [10] and references
therein for a recent simple formulation of these and related approximations). In
short, let us assume we can describe within classical mechanics the motion
of atomic cores driven by the forces due to the valence electrons distributed around
the cores.
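For reference, the force on atomic core I given by the Hellmann–Feynman theorem can be written as below. The symbols R_I (position of core I), Ψ_0 (normalized ground-state wavefunction) and E_0 (ground-state energy) are notation introduced here for illustration, not taken from the chapter, and the identity assumes an exact or fully variational ground state.

```latex
\mathbf{F}_I \;=\; -\frac{\partial E_0}{\partial \mathbf{R}_I}
\;=\; -\left\langle \Psi_0 \left| \frac{\partial \hat{H}}{\partial \mathbf{R}_I} \right| \Psi_0 \right\rangle ,
\qquad \hat{H}\,\Psi_0 = E_0\,\Psi_0 , \quad \langle \Psi_0 | \Psi_0 \rangle = 1 .
```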
Now, let us assume we have a computer that is able to simulate the motion of these
atomic cores (the core electrons are frozen as in the corresponding isolated atoms).
In practice, we have an initial set of positions for the atomic cores, we assign to each
atomic core a reasonable velocity, and, according to the forces, we move the atomic
cores using Newton's equations while keeping the electrons in the ground state. This is the
so-called molecular dynamics (MD) simulation method.
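The propagation step described above can be sketched with the velocity Verlet integrator, a common choice in MD codes. This is a one-dimensional toy under stated assumptions: a harmonic force stands in for the ground-state electronic forces, and the function names, units and parameter values are hypothetical.

```python
def velocity_verlet(pos, vel, mass, force_fn, dt, n_steps):
    """Propagate one particle in 1D with the velocity Verlet scheme."""
    force = force_fn(pos)
    trajectory = [(pos, vel)]
    for _ in range(n_steps):
        vel_half = vel + 0.5 * dt * force / mass   # half-step velocity update
        pos = pos + dt * vel_half                  # full-step position update
        # In a first-principles simulation, this is the point where the
        # electronic ground state would be recomputed for the new positions.
        force = force_fn(pos)
        vel = vel_half + 0.5 * dt * force / mass   # second half-step velocity update
        trajectory.append((pos, vel))
    return trajectory

SPRING_CONSTANT = 1.0  # arbitrary units, toy model only
MASS = 1.0

def harmonic_force(x):
    """Toy force standing in for the forces from the electronic structure."""
    return -SPRING_CONSTANT * x

def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_CONSTANT * x * x

traj = velocity_verlet(pos=1.0, vel=0.0, mass=MASS,
                       force_fn=harmonic_force, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print(f"energy drift over the run: {abs(e_end - e_start):.2e}")
```

Monitoring the total energy along the trajectory, as in the last lines, is a simple check that the time step is small enough for the forces in use.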
In certain cases, the presence of a single transition metal ion in the proximity of
a peptide allows unusual proton extractions away from the peptide. The process is
the rearrangement of electrons from the peptide to the ion, i.e. the ligand (a Lewis
base) binds to the ion (the Lewis acid). This local rearrangement cooperates with the
transfer of protons from the ligand to the solvent. In other words, a slight reshuffling
of positive holes (metal ions and protons) occurs to reach a stable state for the peptide
in its environment.
In video V1, a single short trajectory representing this process is displayed. The
details will be described later in this chapter, but first we let the reader visualize
the cooperation between the formation of Cu–N bonds (Cu is the orange sphere, N is in
blue) and the release of the proton (white sphere), initially bound to one N atom,
to the water molecules around the peptide. The H₃O⁺ ion is transiently formed and
then the positive charge is shared with the bulk water. We recommend that
the reader watch the video several times, trying to isolate the chemical
transformation that occurs.
The result of this process is the transformation of a peptide weakly interacting
with a copper ion into a more stable complex. The process is definitely assisted by
water molecules in the vicinity of the solute: with no water molecules, the state of the
extracted proton can hardly be modelled.
Algorithms for simulating statistical mechanics and the development of high
performance computers (HPC) will, in the future, allow the simulation of the time
evolution of a huge number of degrees of freedom in samples of large size. On the
other hand, many properties of condensed and fluid matter can be described in terms
of a few variables, thanks to the averaging over time and space of many
microscopic variables. In the end, a few collective variables can be used to represent
the environment of a more detailed system.
We are now at a stage where the two descriptions are complementary. Computer
simulations allow one to observe the statistical averaging in action and to understand how
an average property emerges from the apparent disorder of the microscopic variables,
namely the particles’ positions and velocities.
From this cooperation, two large classes of methods emerged:
• A fine description of one part of the system, coupled with coarser and more
approximate descriptions of the environment. This approach is inspired by the concept of
an isolated molecule perturbed, at different scales, by its environment, an approach
widely used, for instance, in spectroscopy.
• An approximate description (but not too approximate) of one piece of the system,
including an essential part of its environment. This approach is inspired by
crystallography, where the unit cell, sometimes including many molecules and
replicated in the three directions of space, is assumed to be a good representation
of the same molecules in the liquid state of a water solution or in a more-or-less
hydrated crowded state (like the living cell).
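For the second class of methods, the replication of the cell in the three directions of space is usually handled through the minimum-image convention: each particle interacts with the nearest periodic copy of every other particle. The sketch below assumes a cubic cell; the function name and the coordinates are hypothetical.

```python
def minimum_image_distance(a, b, box_length):
    """Distance between points a and b in a cubic periodic cell of side box_length."""
    squared = 0.0
    for coord_a, coord_b in zip(a, b):
        d = coord_a - coord_b
        # Shift the separation by whole cell lengths so that it lies
        # within half a cell length of zero (nearest periodic image).
        d -= box_length * round(d / box_length)
        squared += d * d
    return squared ** 0.5

# Two atoms near opposite faces of a 10 Å cell are close through the boundary.
atom_1 = (0.5, 0.0, 0.0)
atom_2 = (9.5, 0.0, 0.0)
print(minimum_image_distance(atom_1, atom_2, box_length=10.0))
```

Without the convention the two atoms would appear 9 Å apart, while their nearest periodic images are separated by 1 Å.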
One example of the interplay between the statistical nature (the nature of a polar
and protic liquid) and the molecular nature (the interface between the solute and the
liquid bulk environment) is given by the behaviour of peptides and proteins in water
solution.
Even though water has been investigated in depth for a long time and from every
possible perspective, the peculiar role of this solvent is still an arena for experimental,
theoretical and modeling techniques. One example is the entire issue of Acc. Chem.
Res. dedicated to water in January 2012 [11]. A second example is the recent
discovery of a new form of ice (number XVII!) [12], which shows the amazing extent
of structural disorder of water even in the solid state [13]. The difficulty of modeling
water in confined regions and close to interfaces described at a molecular level is
witnessed by the slow convergence towards a unified and simple approach describing
protein hydration within a mean–field approach (see the contributions to this book).
Many classes of methods are widely used to model biological macromolecules
in water, i.e. the environment most commonly used in experimental studies in vitro.
Water is also the environment where the metal ions, in different forms, live, at least for
a given time, in the intra- and extra-cellular compartments. In the following sections
the methods are briefly described together with simple examples.
In the 1980s the age of simulations began in the field of liquids [14]. Simulations were
devised as virtual laboratories in which to test theories for intermolecular and, later,
interatomic interactions. Since then, computers have been routinely used as microscopes to
look at atoms in molecules embedded in their phases, depending on thermodynamic
(macroscopic) parameters projected onto atomic (microscopic) properties (positions
and velocities). Over the years, the experimental techniques able to probe
similar atomic scales (diffraction, spectroscopy, micromanipulation, and cryo-electron
microscopy) have been used together with computational models and theoretical
equations to complete the understanding of observations in different phases.
There is no doubt that empirical models for interatomic forces between water
molecules and between water molecules and peptides continuously provide
information of extreme importance for understanding biological macromolecules. As a
simple example, even crude models of water greatly improved the understanding of
the behaviour of water around proteins, up to critical conditions that can be
experimentally monitored [15]. The shift of these conditions induced by the nature of
water compared to other solvents is among the reasons for its strong relationship
with life (proteins, nucleic acids and polysaccharides) [16, 17].
As an ongoing evolution of this description of atomic forces, we mention
the development of polarizable empirical force-fields, which are in theory also able to
mimic strong interactions between water molecules and charged groups in proteins, or metal
ions in competition for the same charged groups [18–21]. These models are not yet
fully reproducible and/or transferable, because of the many parameters involved, but
they will eventually converge to a big step forward in modeling “soft” biological
systems of large size.
Most of the refined models for proteins immersed in water environments and
represented at a molecular level are built on the basis of empirical models. In practice,
a sample of 216 water molecules based on the Monte Carlo chain at T = 300 K
and P = 1 bar [22] is included in every package for molecular dynamics
simulations based on empirical force-fields. Therefore, there is a chain effect where
the coarser model provides configurations to the more refined model. This
common procedure on the one hand decreases the time needed to adapt the configuration to
the improved (and usually computationally more expensive) model, but on the other
hand introduces a bias coming from the approximations of the coarser (and
more approximate) models.
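To give a feeling for the size of such a water sample, the side of a cubic cell holding 216 molecules can be estimated from the liquid density. The density of 0.997 g/cm³ used below is an assumed value for liquid water near room conditions, not a number taken from Ref. [22], and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
WATER_MOLAR_MASS = 18.015       # g/mol
ASSUMED_DENSITY = 0.997         # g/cm^3, assumed value for liquid water

def cubic_box_length(n_molecules, molar_mass=WATER_MOLAR_MASS, density=ASSUMED_DENSITY):
    """Side (in angstrom) of a cubic cell holding n_molecules at the given density."""
    mass_g = n_molecules * molar_mass / AVOGADRO
    volume_cm3 = mass_g / density
    volume_A3 = volume_cm3 * 1.0e24          # 1 cm^3 = 1e24 cubic angstrom
    return volume_A3 ** (1.0 / 3.0)

box = cubic_box_length(216)
print(f"216 water molecules fill a cubic cell of side {box:.2f} angstrom")
```

Under this assumption the cell side comes out at about 18.6 Å, which shows how small the pre-equilibrated sample is compared with a solvated protein.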
The aim of polarizable force-fields for describing strong polar interactions, eventually
involving charge transfer between two sites, is a description of the first stage of the
formation of chemical bonds. The difficulty of representing this event with
empirical models has been partially circumvented by density functional theory [23]
(DFT, hereafter) and by the huge improvements in computer hardware and in algorithm
development for linear algebra. In the 1990s the success of DFT applications increased
its popularity, reaching a good compromise between the size of the systems and
the accuracy of the models. Later, first-principles simulations began when the
accuracy of atomic forces reached a high-quality level.
The calculation of electronic properties for molecules in solution is still a very active
area. One of the reasons for this intense activity is the need to understand the many
spectroscopic data routinely collected for molecules dissolved in liquid samples. The
theoretical and computational models for these observables require, in most
cases, a quantum mechanical description of the degrees of freedom that are perturbed
by the electromagnetic field. The response to the perturbation is strongly affected
by the environment in which the observable is measured (see also Sect. 17). The
assumption that the many degrees of freedom describing the state of the environment
can be averaged into a continuum or a mean–field is at the heart of the physics
of condensed matter [24]. The representation of nature in terms of a few laws and
a few variables is actually one of the major achievements of mankind. The
availability of computers handling many rules and many variables should not
bring us too far from this great achievement. Rather, calculations teach us how to
derive simple rules and a few variables from the complicated manifold of the model
statistics.
The Schrödinger equation for the system together with its environment reads
\[ \hat{H}\,\Psi(r_s, r_e) = E\,\Psi(r_s, r_e), \tag{1} \]
where r_s and r_e are the arrays containing all the 3-dimensional microscopic
variables (positions and momenta of atoms and electrons) in the system s and in the
environment e, respectively. The operator Ĥ is the hamiltonian of the system and
Ψ is the complex function describing the possible states of the system. The lowest
energy eigenvalue E and the corresponding lowest-energy Ψ function, solutions of the
equation above, entirely describe the ground state of the ensemble of particles. The
basis of every mean–field theory is to split the hamiltonian into several parts that can
be summed up:
\[ \hat{H} = \hat{H}_s(r_s) + \hat{H}_e(r_e) + \hat{H}_{int}(r_s, r_e), \tag{2} \]
where Ĥ_int is the separable part containing the coupling between system and
environment. Assuming that the r_e variables span all the relevant values (the
range Ω_e) for each single value of r_s, it is possible to average r_e over its entire
space Ω_e for each single value of r_s, obtaining an effective hamiltonian that does not
depend on r_e, but rather on its averaged effect:
\[ \hat{H}_{eff}(r_s, R_e) = \hat{H}_s(r_s) + \hat{H}_e(R_e) + \int_{\Omega_e} \hat{H}_{int}(r_s, r_e)\, dr_e . \tag{3} \]
The symbol R_e summarizes the set of parameters describing the way the environment
is averaged.
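The averaging in Eq. (3) can be made concrete with a one-dimensional toy model. The sketch below integrates a hypothetical coupling term over the environment variable with the trapezoidal rule, so that the result depends only on the system variable. The quadratic coupling, the range Ω_e = [−1, 1] and all names are assumptions made for illustration. The integral is left unweighted, following Eq. (3) as written; a physically motivated average would normally also carry a statistical weight for each environment configuration.

```python
def effective_coupling(h_int, r_s, omega_e, n_points=2001):
    """Integrate h_int(r_s, r_e) over r_e in omega_e = (low, high), trapezoidal rule."""
    low, high = omega_e
    step = (high - low) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        r_e = low + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # end points count half
        total += weight * h_int(r_s, r_e)
    return total * step

def toy_coupling(r_s, r_e, k=1.0):
    """Hypothetical system-environment coupling: a harmonic spring between r_s and r_e."""
    return 0.5 * k * (r_s - r_e) ** 2

# After the integration the coupling is a function of the system variable only.
for r_s in (0.0, 0.5, 1.0):
    value = effective_coupling(toy_coupling, r_s, omega_e=(-1.0, 1.0))
    print(f"r_s = {r_s:.1f}: averaged coupling = {value:.4f}")
```

For this toy coupling the integral is known analytically (r_s² + 1/3), which makes the numerical result easy to check.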
The most effective approximation for the environment is the continuum polariz-
able medium, separated from the solute molecule by a thin layer of empty space. The
averaging is therefore not explicitly operated, rather it comes from electrostatics of
dielectric materials [24]. The polarization of the interface between the solute and the
solvent is therefore described via boundary charge elements, i.e. the discrete small
charges distributed on elementary surface elements describing the solvent accessible
solute surface. The shape of this interface is usually described numerically because,
with the exception of mesoscopic approximations, the shape of the solute molecule
cannot be approximated as a single sphere or a single analytically manageable solid
object.
The quantum properties of the solute can be described via a Schrödinger equation (SE)
where the hamiltonian includes electrostatic interactions between electrons and a
continuum polarizable environment. The greatest achievement is a set of continuum
models used in quantum mechanics (QM) [25]. The most popular form is the polarizable
continuum model (PCM) method, implemented in many QM packages [26]. The method can
be used with all the different ways of solving the SE, not being limited to the ground
state like DFT (see below). The different ways to describe the solute cavity, the
approximations made in the solution of the SE and the different ways to compute energy
gradients (to perform energy minimization or geometry optimization) are still improving,
and different approaches are still competing for the best performance and accuracy. See
Refs. [27, 28] for a recent debate on this subject.
In order to visualize the effect of a polarizable model for the environment, a simple
model can be devised. Let us extract a C-terminal lysine residue from a small protein,
the 1–16 segment of the amyloid-β (Aβ) peptide. Indeed, this terminal residue is also very
important for the propensity of Aβ to aggregate, a challenging arena for computer
simulations. The possibility for the C-terminus of a peptide to form salt-bridges
with N-terminal partners is influenced by the alternative possibility of forming
intramolecular salt-bridges or hydrogen bonds. In the case of Lys, the protonation state at
pH∼7 is expected to be ruled by the pKa of the two groups that can carry a proton at
pH values in a range of 3 units of pH, i.e. the ammonium group of the sidechain (Nζ)
and the carboxylic group of the C-terminus (C). The N-terminus of Lys is, in this
example, blocked by the peptide bond with the rest of the protein chain. In our model
this blocking is operated by an acyl group, and the small fragment will be indicated
as Ac-Lys. Between pH∼2 and 9 it is expected that the C-terminus is predominantly
a carboxylate group (deprotonated) and Nζ is in the ammonium form (protonated).
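The expected protonation states can be checked with the Henderson–Hasselbalch relation. The pKa values below (2.2 for the carboxylic group and 10.5 for the sidechain ammonium group) are textbook-order values for lysine, assumed here for illustration only; they are not taken from this chapter, and the actual values in Ac-Lys or in a peptide may differ.

```python
def protonated_fraction(pH, pKa):
    """Fraction of a titratable group carrying its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

PKA_CARBOXYL = 2.2    # assumed value for the C-terminal carboxylic group
PKA_AMMONIUM = 10.5   # assumed value for the sidechain N-zeta ammonium group

pH = 7.0
carboxyl_protonated = protonated_fraction(pH, PKA_CARBOXYL)
ammonium_protonated = protonated_fraction(pH, PKA_AMMONIUM)
print(f"pH {pH}: COOH fraction = {carboxyl_protonated:.2e}, "
      f"NH3+ fraction = {ammonium_protonated:.4f}")
```

With these assumed values, at pH 7 the carboxylic group is almost completely deprotonated and the ammonium group almost completely protonated, i.e. the fragment is a zwitterion in solution.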
If we now minimize the energy of this Ac-Lys fragment in vacuum, within a model at
the density functional theory level of approximation, we obtain
the structure displayed in the left panel of Fig. 2. The proton is rapidly transferred
from the ammonium group in the sidechain to the carboxylate group, the neutral Lys
sidechain is hydrogen bonded to the neutralized C-terminus with two dihedral angles
in the gauche state, and the whole amino acid adopts a compact structure.
The addition of the simplest level of polarizable continuum for the water solution
consists of adding to the solute, described at the same DFT level, a homogeneous
dielectric with relative permittivity εr = 78 beyond the solvent accessible surface
of the molecule. This is performed, in the PCM method, for every configuration
iteratively built during the energy minimization process. The minimization of
energy with this model produces the expected protonation state and a single gauche
dihedral angle in the conformation of the Lys sidechain (right panel in Fig. 2).
Fig. 2 Minimal energy configurations obtained for Ac-Lys in vacuum (left) and in the
polarizable continuum model (PCM, right), obtained at the DFT level. A localized basis set
(6-31+G(d,p)), the B3LYP hybrid exchange-correlation functional and a dielectric permittivity
of 78 for the solvent were used with the Gaussian09 package [29]
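The size of the screening introduced by a dielectric with εr = 78 can be estimated with a crude point-charge picture. The sketch below is not the PCM procedure, which polarizes the cavity surface self-consistently with the DFT electron density; it only compares the Coulomb attraction between two unit charges of opposite sign in vacuum and in a homogeneous dielectric. The 3 Å separation and the function name are hypothetical.

```python
COULOMB_CONSTANT = 332.06  # kcal/mol * angstrom / e^2

def coulomb_energy(q1, q2, distance, eps_r=1.0):
    """Coulomb energy (kcal/mol) of two point charges in a homogeneous dielectric."""
    return COULOMB_CONSTANT * q1 * q2 / (eps_r * distance)

separation = 3.0  # angstrom, hypothetical ammonium-carboxylate distance
e_vacuum = coulomb_energy(+1.0, -1.0, separation, eps_r=1.0)
e_water = coulomb_energy(+1.0, -1.0, separation, eps_r=78.0)
print(f"vacuum: {e_vacuum:.1f} kcal/mol, dielectric: {e_water:.2f} kcal/mol")
```

The attraction drops by a factor of 78, from more than 100 kcal/mol to between 1 and 2 kcal/mol, which is consistent with the charged groups staying separated once the continuum is added.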
This exercise, which we shall continue by increasing the resolution of the solute
environment, shows that at low temperature the Ac-Lys fragment keeps its charged
groups separated in space, with no stress on the chain mechanics, thanks to the solvent
polarizability. The polarizability is that of liquid water at room conditions, so there is
a conceptual gap between the temperature of the solute and that of its environment.
Proper vibrational and entropic corrections can partially bridge this gap.
In many more complicated molecular systems, when liquid water is the solvent, the
problem of sampling the relevant molecular configurations (even in terms of distribution
among several enumerable energy minima) becomes more challenging. Among
the many sources of problems, those of particular interest here are summarized below.
Besides the effect on the electronic structure due to the polarization of liquid water
as the reaction to the electric field induced by the solute, water molecules can form
specific interactions with certain chemical groups in the solute, like hydrogen bonds.
When the hydrogen bond network between water molecules changes compared to
bulk liquid water because of the presence of the solute, then the structure of water
is perturbed. The mere replacement of a portion of bulk water due to the presence
of the solute exerts, therefore, an effect on the hydrogen bond structure of water,
thus changing also the polar nature of the solvent. This effect is summarized in
the so-called “hydrophobic” effect [30, 31]. Most of the theoretical and empirical
descriptions of such effects are given in other contributions to this book.
A second important effect is that water is a protic solvent. Water molecules
exchange protons among them and with the solute, thus providing a buffer of discrete
charges eventually interacting with mobile charges in the solute. A detailed descrip-
tion of this effect, indeed the movement of a positive hole in a cloud of electrons
When Water Plays an Active Role in Electronic Structure … 725
(thus resembling the movement of charge defects within metals) becomes mandatory
when a possible flux of such mobile charges is the property of interest. This is an
important issue in oxidoreductive chemistry.
The discrete nature of part of the environment close to the quantum portion of the
system can be included for certain atomic shells. This provides an elegant combina-
tion of quantum mechanics with the molecular mechanics of the environment, and
eventually with the continuum of its solvent at a longer spatial scale. The idea is to
describe at a quantum-mechanical level the portion of the system where electrons are
important (the QM portion), at a molecular mechanics level (described via a set of
empirical parameters) the portion of the system acting as a constrained mean–field
(where electronic variables are averages corresponding to each position of the nuclei,
the MM portion), and finally the farther portion of the system as a continuum (where
both electronic and nuclear variables are averaged).
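The three-level partition described above can be sketched as a simple distance-based assignment. This is only an illustration: the function name, the spherical criterion and the two radii are assumptions made here, and real QM/MM set-ups usually select the QM portion by chemical criteria rather than by distance alone.

```python
import math

def assign_region(atom, qm_center, r_qm=4.0, r_mm=12.0):
    """Label an atom by its distance (in Å) from the centre of the reactive site.

    r_qm and r_mm are hypothetical radii chosen only for illustration.
    """
    d = math.dist(atom, qm_center)
    if d <= r_qm:
        return "QM"          # electrons treated explicitly
    if d <= r_mm:
        return "MM"          # empirical force field, electrons averaged
    return "continuum"       # both electronic and nuclear variables averaged

center = (0.0, 0.0, 0.0)
for position in [(1.0, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0)]:
    print(position, assign_region(position, center))
```

Atoms labelled "continuum" would not be represented explicitly in an actual calculation; their average effect is carried by the dielectric.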
The first level is known as the quantum-mechanics/molecular-mechanics approach
(QM/MM) and is nowadays a sort of routine method for addressing the catalysis in
enzymes, also in the presence of a sample of water molecules [32]. The addition of
an infinite continuum is relatively recent and is becoming attractive for computa-
tional spectroscopy of large assemblies in water solution (see for instance the GLOB
method in Ref. [33]).
In solid state physics, the approach of density functional theory (DFT) within periodic
boundary conditions became a routine tool also for studying lower symmetry states,
like solid-liquid interfaces, phase transitions, fractures, samples under stress. Since
disorder is in most of the cases related to finite temperature, this latter must be taken
into account. The best review of these methods is the comprehensive book by Marx
and Hutter [34].
Density functional theory soon became a computationally affordable approach to
describe the electronic structure of extended systems in the ground state using the
same set-up developed over the years for the solid state. In terms of modeling,
the attribute “extended” refers to systems containing a large number of atoms that
are not significantly affected by the boundaries of the system. The sample, therefore,
is repeated periodically in the three directions of space, and it is the unitary cell
of an infinite crystal. When this unitary cell is large compared to the unitary cell
of a real crystal, then the concept of super–cell is used. The interactions and the
associated thermal fluctuations within the super–cell are an approximation of those
in the infinite sample. For instance, fluctuations with wavelength larger than the
cell sides are neglected. The accuracy of the results depends heavily on the cell
size, shape and dimensions, and the reliability of the information obtained by such
models concerning the interactions, fluctuations and motions of interest must be
critically assessed. There is still a large debate on the dependence of models on
their boundaries, both in the case of mean–field and periodic boundary conditions.
Basically, in both cases the accuracy depends on how the properties of interest depend
on long–range interactions, also mediated by the solvent, between the mostly involved
particles.
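The periodic repetition of the super–cell is commonly handled with the minimum-image convention, in which each particle interacts with the nearest periodic copy of every other particle. The following is a minimal sketch for an orthorhombic cell; the function name and the sample coordinates are illustrative assumptions, not values from the simulations discussed in this chapter.

```python
import math

def minimum_image_distance(a, b, cell):
    """Distance between points a and b in an orthorhombic periodic cell.

    a, b : (x, y, z) coordinates; cell : (Lx, Ly, Lz) side lengths.
    Each displacement component is wrapped to the nearest periodic image.
    """
    d2 = 0.0
    for ai, bi, side in zip(a, b, cell):
        delta = bi - ai
        delta -= side * round(delta / side)  # shift by whole cell lengths
        d2 += delta * delta
    return math.sqrt(d2)

cell = (1.79, 1.35, 1.34)  # nm, illustrative cell sides
print(minimum_image_distance((0.1, 0.1, 0.1), (1.7, 0.1, 0.1), cell))
```

In this example the two points are closer through the cell boundary than through the interior, which is why any distance larger than half a cell side cannot be sampled independently.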
The basis of DFT (see also next chapter in this book) is that in the ground state
there is a one-to-one correspondence between a given electron density ρ(r) and a
given electrostatic external potential V (R) acting on the sample of electrons. In the
specific case of a large sample of atoms, the vector R spans all the positions of atomic
cores interacting with the electrons via electrostatic forces. The unique minimum of
the energy of the system with respect to electron density identifies the one-to-one
correspondence [23, 35]:
\[
U(R) = \min_{\rho} E(\rho, R) \tag{4}
\]
where E is the energy functional of ρ once the positions R are given. The existence
of a unique potential energy U allows the derivation of forces F_I = −∂U/∂R_I for
each core I, evaluated at the ρ function minimizing the E functional. All the
complications of the quantum mechanics are thrown into the functional form E(ρ).
The representation of electron density is done in the Kohn-Sham form: the density is
obtained by a single determinant built from one-electron states, called Kohn-Sham
(KS) states, and indicated with ψi , with i running over the possible occupations (that
depend on the available number of electrons):
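\[
E(\{\psi_i\}, R) = \sum_i f_i \int_V \psi_i^*(r) \left( -\frac{1}{2} \nabla^2 \right) \psi_i(r)\, dr
+ \frac{1}{2} \int_V \int_V \frac{\rho(r)\rho(r')}{|r - r'|}\, dr\, dr' \tag{5}
\]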
the original local density approximation (LDA [23]), where the functional depends
on the electron density in a single point. Starting from this basic approximation, that
provides very poor descriptions of electron density in molecules, the non-local prop-
erties of the functional were modeled by expansions of the electron density around
the single point, providing the first generalized gradient approximations (GGA, see
Ref. [38] and references therein). The Hartree-Fock solution to the SE provides the
exact exchange functional:
\[
E_x^{HF} = \frac{1}{2} \sum_{i,j} \int_V \int_V \psi_i^*(r_1) \psi_j^*(r_2) \frac{1}{r_{1,2}} \psi_i(r_2) \psi_j(r_1)\, dr_1\, dr_2, \tag{6}
\]
where the sum over i and j runs over occupied states and the two position vectors (r1
and r2 ) refer to the two electrons. Despite being exact for the exchange contribution,
the lack of an equally accurate description of the correlation effects has led to the
development of hybrid approaches: a fraction of the exact exchange is combined with
the GGA approximations, with coefficients derived by fitting experimental results
or accurate calculations performed with non-DFT methods. By relying on a specific
parameterization, hybrid approaches are able to exploit the cancellation of errors in
the description of correlation and exchange contributions. Even if the hybrid scheme
is the most accurate description of the exchange functional, it is computationally
expensive, because of the matrix elements above (also involving an extension of the
KS set over zero-filling states) that must be computed. In practice the evaluation of the
exact exchange is about ten times slower than pure GGA approximations. The latter
can also include different corrections and in our examples the PBE approximation will
be used because it provides the best compromise between computational performance
and accuracy for calculations of atomic forces and their statistical effects, especially
for water at room conditions [39]. Most of the corrections for pure water then come
from the quantum nature of the hydrogen atom [40].
Despite the presence of r1 and r2 in Eq. 6 and one-electron KS states in Eq. 5, we
must warn the reader that electrons are entirely dematerialized in quantum mechanics.
All variables and indices related to single particles are used to conveniently “represent”
the quantum nature of many interacting electrons. In many books and articles
concerning the DFT method, this concept is indicated as non-locality. The concept
is, for instance, put in particular evidence when the simple Lennard-Jones approxi-
mation for interacting atoms is derived from DFT (see for instance the JuNoLo code
development and references therein [41]). Another example is the theory developed
to describe electron correlation in the ground state, also known as Hubbard-U the-
ory [42]. In non-DFT textbooks, the same concept arises from the combinations of
complicated determinants in the frame of multi-reference and configuration inter-
action methods (also called post-Hartree Fock methods, see the chapter about iron
porphyrins in this book).
The KS states ψi representing the electron density are in turn represented as linear
combinations of simpler orthogonal functions. Quantum chemists prefer to expand
the one-electron KS states in atomic contributions (or manageable representations
of them, like several gaussian functions). This representation greatly helps in under-
standing the chemical structure of the electron density in the energy minimum, and
especially for isolated molecules (both in the gas phase or in mean–field represen-
tations of their environment). Solid state physicists prefer to use plane-waves con-
sistent with the unitary super–cell used as the sample. The advantage of this second
approach is that the representation does not depend on the variables R I , letting the
representation of ρ and the variables identifying the external potential determining
ρ, uncoupled. This allows a rigorous total energy conservation when the equations of
motion are solved using forces and velocities of each atomic core I : the existence of
invariants in dynamical methods is very useful in practice, to check for errors in code
implementation, writing equations, etc. The disadvantage is that a large super–cell
may contain a huge number of plane-waves. However, in the last 20 years, especially
because of the engineering of Fourier transform algorithms, the sizes of the afford-
able samples made this DFT approach the best performing tool for investigating also
small samples of biological models cut out of larger empirical models.
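The growth of the plane-wave basis with the super–cell size can be estimated by counting the reciprocal-lattice points that fall inside the sphere defined by the kinetic energy cut-off. The sketch below uses the standard free-electron estimate N ≈ V k³/(6π²), with k² equal to the cut-off in rydberg when lengths are in bohr. The function name and the sample cell are assumptions made for illustration, and actual codes may count plane waves somewhat differently.

```python
import math

BOHR_PER_NM = 1.0 / 0.0529177210903  # number of bohr radii in one nanometre

def estimate_plane_waves(cell_nm, cutoff_ry):
    """Rough number of plane waves with kinetic energy below the cut-off.

    cell_nm   : (Lx, Ly, Lz) sides of an orthorhombic cell in nm
    cutoff_ry : kinetic energy cut-off in rydberg
    """
    volume = math.prod(side * BOHR_PER_NM for side in cell_nm)  # bohr^3
    k_cut = math.sqrt(cutoff_ry)                                # bohr^-1
    return volume * k_cut ** 3 / (6.0 * math.pi ** 2)

# Illustrative values: a nanometre-sized cell and a 30 Ry cut-off
print(round(estimate_plane_waves((1.79, 1.35, 1.34), 30.0)))
```

The estimate scales linearly with the cell volume and with the 3/2 power of the cut-off, so doubling every cell side multiplies the basis size by eight.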
The E functional written in Eq. 5 is a complicated potential energy functional of
a complicated, yet manageable, representation of the electron density and of atomic
core positions. The idea was then to use this potential energy in a lagrangian including
the kinetic energy of atomic cores (assumed as classical point masses) and some
artificial masses representing the KS states. This method is an extended lagrangian
method and it is known as Car-Parrinello molecular dynamics method [43]. Since
the KS states must be orthogonal at any time, an additional term representing this
constraint can be included:
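\[
\mathcal{L} = \mu \sum_i \int_V |\dot{\psi}_i(r)|^2\, dr
+ \frac{1}{2} \sum_I M_I \dot{R}_I^2 - E_{KS}(\{\psi_i\}, R)
+ \sum_{i,j} \Lambda_{i,j} \left( \int_V \psi_i^*(r) \psi_j(r)\, dr - \delta_{i,j} \right) \tag{7}
\]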
those within different atomic cores, where the electron density change in space is
less steep (energy cut-off in the range of 30 Ry) [46]. The real-space resolution of
the electron density became about 10 pm, i.e. that corresponding to the larger energy
cut-off.
At the end of the story, a limited set of words and numbers summarizes the type
of DFT model one is using:
1. the type of exchange functional;
2. the two energy cut-offs for the plane-wave basis-set (the two spatial resolutions
of the electron density);
3. the type of pseudo-potential used for modeling the atomic cores (related to the
above parameter).
As we have seen in the previous sections, two complementary approaches have been
developed to characterize solvated systems: implicit continuum models, coupled
to high-level quantum-mechanical calculations, usually performed on static isolated
systems; and explicit, fully atomistic, QM or QM/MM simulations, in periodic boundary
conditions and using molecular dynamics as a way to sample statistical configurations
at finite temperature. In fact, reformulations of continuum solvation have
been recently proposed in the literature [47–54] to allow the seamless coupling of
the two approaches.
The starting point of this new class of continuum approaches is the definition,
thanks to Fattebert and Gygi (FG) [47, 48], of the electrostatic free energy functional
of the system embedded in a polarizable continuum. The energy functional E of Eq. 4
becomes now a free energy functional F, because of the averaging of solvent variables
implicit in the dielectric function ε:
\[
F(\phi, \rho, R) = \int \left[ \rho(r)\phi(r) + \sum_I z_I \delta(r - R_I)\phi(r) - \frac{\varepsilon(\rho, R; r)}{8\pi} |\nabla\phi(r)|^2 \right] dr \tag{8}
\]
where φ is the electrostatic potential in the simulation cell, z I are the atomic (pseudo)
charges, and the key ingredient is represented by the dielectric function, ε (ρ, R; r),
which is assumed to vary smoothly from a value of 1 (vacuum) in the region where
the QM degrees of freedom are present, to the solvent bulk value of ε0 outside of the
QM system. The above expression makes it easy to derive all the important equations
related to the QM/continuum interaction by simply exploiting a rigorous and elegant
variational approach. In order to find the equilibrium minimum-energy state of the
system, the functional derivatives with respect to the different fields entering the free
energy functional must vanish. In particular, by imposing a vanishing derivative with
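respect to the electrostatic potential, the generalized Poisson equation
\[
\nabla \cdot \left[ \varepsilon(\rho, R; r) \nabla\phi(r) \right] = -4\pi \left[ \rho(r) + \sum_I z_I \delta(r - R_I) \right] \tag{9}
\]
is obtained, which links the QM charge densities and the smooth dielectric function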
with the electrostatic potential. Similarly, to optimize the electronic or ionic degrees
of freedom, functional derivatives of Eq. 8 can be computed analytically to provide the
right descent directions that will automatically include the presence of the dielectric
embedding. The solute electron density ρ is then represented in terms of one-electron
states as in the usual KS approach (see above). The KS potential used in DFT to
optimize the electronic density will be given by
\[
\frac{\delta F}{\delta\rho}(r) = \phi(r) - \frac{1}{8\pi} |\nabla\phi(r)|^2 \frac{\delta\varepsilon}{\delta\rho}(r), \tag{10}
\]
Fig. 3 Self-consistent continuum solvation of an acetamide cation. Notice that the additional proton
is bound to carbonyl oxygen, as expected by the higher basicity of amide O compared to amide N. a
The self-consistent boundary is built in terms of isosurfaces of the electronic density, the transparent
surface corresponds to a value of 0.01 a.u. b The dielectric screening of the environment is effec-
tively modelled through an induced polarization density, the transparent red and blue isosurfaces
correspond to a value of plus and minus 0.001 a.u., respectively. c Value of the electronic density (in
black) and of the dielectric permittivity (in red) as a function of position along the axis visualized
in panel a. d Value of the polarization charge as a function of position along the axis visualized in
panel a
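The idea of a dielectric function that switches smoothly from 1 inside the solute to the bulk value outside, driven by the local electron density, can be illustrated with a short sketch. The switching form and the two density thresholds below are assumptions chosen only for illustration; they are not the parameterization of the FG or SCCS models, for which the original references should be consulted.

```python
import math

def smooth_permittivity(rho, eps_bulk=78.0, rho_min=1e-4, rho_max=5e-3):
    """Illustrative dielectric function of the local electron density (a.u.).

    Returns 1 (vacuum) where rho >= rho_max, eps_bulk where rho <= rho_min,
    and interpolates ln(eps) smoothly in ln(rho) in between.
    rho_min and rho_max are hypothetical thresholds.
    """
    if rho >= rho_max:
        return 1.0
    if rho <= rho_min:
        return eps_bulk
    # x goes from 0 on the solute side to 1 on the solvent side
    x = (math.log(rho_max) - math.log(rho)) / (math.log(rho_max) - math.log(rho_min))
    # smooth step with vanishing slope at both ends
    step = x - math.sin(2.0 * math.pi * x) / (2.0 * math.pi)
    return math.exp(math.log(eps_bulk) * step)

for rho in (1e-2, 3e-3, 1e-3, 3e-4, 1e-5):
    print(rho, round(smooth_permittivity(rho), 2))
```

Because the switching has zero slope at both thresholds, the derivative of ε with respect to ρ that enters Eq. 10 is continuous across the interface region.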
the energy penalty involved with the creation of the continuum/vacuum interface
inside the embedding medium. The cavitation energy functional was introduced,
similarly to what was done by Cococcioni et al. for the enthalpy functional [55], by
exploiting the concept of quantum surface of a QM system: similarly to the dielectric
continuum, also in this case the embedding energy is expressed as a functional of a
smooth interface function defined in terms of the QM degrees of freedom, namely
\[
G_{cav} = \int |\nabla s(r)|\, dr
\]
where the interface function, s (r) now goes from a value of 1 inside the QM region,
to a value of 0 inside the environment region.
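The integral of the gradient modulus of a smooth interface function measures the area of the interface. A minimal numerical check, assuming a spherical interface with a hyperbolic-tangent profile (an illustrative choice, with hypothetical radius, width and grid), is sketched below using central finite differences on a cubic grid.

```python
import math

def interface(x, y, z, radius=3.0, width=0.4):
    """Smooth interface function: about 1 inside the sphere, about 0 outside."""
    r = math.sqrt(x * x + y * y + z * z)
    return 0.5 * (1.0 - math.tanh((r - radius) / width))

def quantum_surface(n=40, box=10.0):
    """Approximate the integral of |grad s| over a cubic box on an n^3 grid."""
    h = box / n
    pts = [-0.5 * box + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for x in pts:
        for y in pts:
            for z in pts:
                # central finite differences for the three gradient components
                gx = (interface(x + h, y, z) - interface(x - h, y, z)) / (2.0 * h)
                gy = (interface(x, y + h, z) - interface(x, y - h, z)) / (2.0 * h)
                gz = (interface(x, y, z + h) - interface(x, y, z - h)) / (2.0 * h)
                total += math.sqrt(gx * gx + gy * gy + gz * gz)
    return total * h ** 3

area = quantum_surface()
print(area, 4.0 * math.pi * 3.0 ** 2)  # grid estimate and area of the sharp sphere
```

The grid estimate should be close to the area 4πR² of the sharp sphere, with a small difference due to the finite width of the interface and to the grid spacing.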
In order to extend the capability and the accuracy of the model, following similar
approaches developed within the PCM framework, Andreussi et al. [54] substantially
revised the FG models, by improving the definition of the dielectric function, by
combining it with the enthalpy functional of Cococcioni et al. [55], and by carefully
parameterizing and testing the model on a reliable set of experimental data. The
resulting self-consistent continuum solvation (SCCS) approach proved to be close
to chemical accuracy in reproducing aqueous solvation energies of small organic
compounds [54]. Moreover, the largest deviations from the experimental results are
observed for strongly interacting functional groups, such as acidic or basic groups
or hydrogen bonding groups, for which the continuum approximation introduced by
the model is, in fact, expected to break down.
SCCS was later extended and tested on charged systems in solution [56]. With
respect to high level quantum-chemistry calculations, when using a super-cell approach
one needs to take proper care of periodic boundary conditions, which can intro-
duce significant artefacts when used to model charged systems. Among some of
the most common approaches used to correct artefacts due to periodic boundary
conditions for isolated systems in vacuum, Makov-Payne, Martyna-Tuckerman and
point-countercharge methods were extended to include the presence of a contin-
uum dielectric embedding as defined in the FG or SCCS models [57]. Results on
charged systems show accuracies that are comparable to state-of-the-art continuum
approaches, but a reparametrization of the model specific for anions was shown
to be required [56]. This is probably a consequence of the poor description of the
hydrogen bond in continuum solvation together with the charge asymmetry in water
solvation, where negative and positive compounds are solvated via the hydrogen
atoms or oxygen lone pairs, respectively.
The elegant formulation of the FG derived continuum approaches allows the easy
coupling of this embedding strategy with most of the available techniques to compute
spectroscopic properties in solution, which provide key results to compare theory
and experiments. In particular, similarly to what was done in the PCM framework,
optical spectroscopies in solution can be computed [58] by exploiting linear response
approaches and assuming that the solvent dielectric screening during a fast process,
such as an electronic excitation, is the high-frequency optical one, ε∞ . Couplings
with vibrational, magnetic or core-level spectroscopies in solution also require minor
modifications with respect to the same calculations in vacuum and are the object of
on-going research.
In addition to the smooth self-consistent definition of the solvent interface pre-
sented above, smooth atom-centered definitions have been proposed in the literature
[52, 59], leading to a more tunable model that can better adapt to specific applications.
For example, the possibility to specify different cavity parameters for different
atoms or portions of the QM system makes it possible to treat, in the same simulation cell,
neutral, positively and negatively charged compounds, which require different interface
parameters.
O–H distances, values for the dihedral angles, etc.) the final result is the same as that of the
minimal energy in vacuo model represented with a localized basis-set (Fig. 2, left
panel).
It is interesting to notice that the carboxylate neutralization with a proton donated
by the ammonium group of the Lys sidechain may occur via a different mechanism.
If the super–cell is smaller (for instance with 0.5 nm of minimal distance between
nearest images) there is a lower energy pathway that drives a proton towards the car-
boxylate. The ammonium group of the nearest image comes close to the carboxylate
via the rigid rotation of the all-trans configuration. This pathway occurs also at T =
50 K, with no need to increase the temperature to overcome local energy barriers. The
final configuration is, in this case, the carboxylic group of an all-trans Lys interact-
ing, via hydrogen bond, with the amino group of the nearest image Lys sidechain.
Since there is one molecule per super–cell, this means that again the Lys molecule
is amino-carboxylic. This result is an artefact of the small empty space provided
around the solute. Nevertheless, it is an indication that the molecule tends to stay in
a neutral state and, if there is a proton closer than that provided by the solute itself, it
is easily obtained from the environment when available.
In order to include the effects of the water solvent, the same initial configuration
is now merged into the same super–cell filled with water molecules. The configurations
of the water molecules are extracted from the Monte Carlo simulation of an empirical
model of water, the rigid TIP3P water molecule [22]. In the merging process, the
water molecules with oxygen atom closer than 1.5 Å to any atom in the Ac-Lys solute
are removed. This set-up results in a system composed of 29 atoms of the Ac-Lys
solute and 105 water molecules in a super–cell of 1.79 × 1.35 × 1.34 nm³.
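The merging step described above can be sketched as a simple distance filter. The function name and the data layout (oxygen stored first in each water molecule) are assumptions made for this illustration, and periodic images are ignored for brevity.

```python
import math

def remove_overlapping_waters(solute, waters, cutoff=1.5):
    """Drop water molecules whose oxygen is closer than `cutoff` (Å) to a solute atom.

    solute : list of (x, y, z) atomic positions in Å
    waters : list of molecules, each a list of (x, y, z) with the oxygen first
    """
    kept = []
    for molecule in waters:
        oxygen = molecule[0]
        if all(math.dist(oxygen, atom) >= cutoff for atom in solute):
            kept.append(molecule)
    return kept

# Toy example: one solute atom and two water molecules
solute = [(0.0, 0.0, 0.0)]
waters = [
    [(1.0, 0.0, 0.0), (1.9, 0.0, 0.0), (0.8, 0.9, 0.0)],  # oxygen too close
    [(3.0, 0.0, 0.0), (3.9, 0.0, 0.0), (2.8, 0.9, 0.0)],  # kept
]
print(len(remove_overlapping_waters(solute, waters)))
```

Only the oxygen position is tested, as in the text, so a retained molecule may still have a hydrogen atom close to the solute; the subsequent low-temperature stage of the simulation relaxes such contacts.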
This system is then heated up to T = 300 K. One important effect that is introduced
in this simulation is the mechanism of proton transfer within water molecules in the
liquid state and at room conditions. The proton transfer from the ammonium group to
a more basic group in the environment, including activated water molecules in the
solvent, can occur via a fast rearrangement of X-H chemical bonds (the Grotthuss
mechanism, see Ref. [66] and references therein). This kind of mechanism can be
observed even in short simulations of about 1 ps (see below). The video V2 collects,
within 60 s, 1800 configurations of the hydrated Ac-Lys fragment (it is encoded with
30 frames per second). In the table below a summary of the frames is reported.
Video time since start (s)    Simulation stage            Simulation time (ps)
0–5                           T = 50 K                    0.18
5–10                          T = 100 K                   0.18
10–15                         T = 200 K                   0.18
15–27                         T = 300 K                   0.42
27–40                         T = 300 K, d0 = 5.5 Å       0.48
40–47                         T = 300 K, d0 = 4.5 Å       0.24
47–60                         T = 300 K, d0 = 3.5 Å       0.48
In the first 15 s of the video, the heating up to 200 K of the system, it can be
observed that the water molecules do not change significantly their positions, but
rather they start changing their orientation. Although one would expect water to be
a crystal at 200 K, the simulation, which started from a liquid sample, is not long
enough to freeze the sample. Hydrogen atoms, which are lighter than oxygen, attempt
different hydrogen bonds with nearby oxygen atoms. The dynamics of hydrogen
bonds becomes apparent at T = 300 K (about 20 s), when several “bonds” are drawn
just because atoms become closer than 1.6 Å. At this stage also hydrogen bonds
between the ammonium H and water O become visible, as well as those around the
carboxylate. Once the system entered this dynamic regime, an external potential was
added in order to sample configurations with progressively lower Nζ -C distances.
This method is called umbrella sampling (US) and it is related to the generalized
ensembles described in many other chapters of this book. In this case the US consists
of adding a harmonic potential U_US = (K/2)(d − d0)² with K = 200 kcal mol⁻¹ Å⁻²
and a mobile equilibrium distance, d0, for that distance. Between 20 and 40 s, the
US does not alter the room temperature dynamics, with the Lys sidechain sampling
gauche states. It is at a time of about 50 s that the close distance (d0 is now 3.5 Å)
starts affecting the solute. It can be observed that the C–C bonds are stretched to
distances larger than the visual cut-off (1.6 Å), witnessing the violence exerted by
the electrostatic interactions between the ammonium and the carboxylate groups to
the aminoacid scaffold. Nevertheless, the conclusion is the same as that of the mean–field
model: the proton cannot be transferred from the ammonium to the carboxylate, even
though in some cases the H–O bond becomes visible. The hydrogen bonds between
the two charged groups and water molecules are more likely than the hydrogen
bonds of type N–H···O, and the water molecules around the two groups do not seem to
be perturbed by this compact state. If one follows the left-hand portion of the images
(the methyl group of the acyl fragment) it can be observed how the water molecules
are more rigidly oriented along the whole minute of the video. This hydrophobic
interaction is not affected by the US potential and by the closure of the Lys sidechain.
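The harmonic bias used in the umbrella sampling stages can be written in a few lines. The sketch below evaluates the bias energy and the corresponding force along the Nζ–C distance for the three values of d0 used in the simulation; the function names and the sample distance of 5.0 Å are illustrative assumptions.

```python
def umbrella_energy(d, d0, k=200.0):
    """Harmonic bias U = (K/2)(d - d0)^2 in kcal/mol, with d and d0 in Å."""
    return 0.5 * k * (d - d0) ** 2

def umbrella_force(d, d0, k=200.0):
    """Force along the distance, -dU/dd, in kcal mol^-1 Å^-1."""
    return -k * (d - d0)

# Bias felt at a hypothetical instantaneous distance of 5.0 Å
for d0 in (5.5, 4.5, 3.5):
    print(d0, umbrella_energy(5.0, d0), umbrella_force(5.0, d0))
```

With K = 200 kcal mol⁻¹ Å⁻², a displacement of 1.5 Å from d0 already costs 225 kcal/mol, which gives a sense of how strongly the bias pulls the two charged groups together when d0 is reduced to 3.5 Å.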
The water rotations and translations observed in the second half of the video,
both without and with the US potential, are consistent with the experimental radial
distribution function for O–H pairs belonging to different molecules.
In both cases the structural change in the water solvent is small, thus indicating
that the Ac-Lys fragment is soluble in water. The bulk structure of water is not
dramatically affected by the Ac-Lys solute, with the cavity formed by the solute not
perturbing the water liquid sample around it. When the US is introduced, the more
significant change in the Ow–Hw RDF compared to Ow–Ow (right and left panels
in Fig. 4, respectively), indicates that the structure of water is not changing in the
average relative position of centers of water molecules (the oxygen atoms), rather in
the orientation of O–H bonds in water molecules. The more pronounced structure
of Ow–Hw RDF (right panel) corresponds to a partial rotational freezing of O–H
bonds in the region close to the charged groups that are kept close in the US region
(thin line in Fig. 4), i.e. when the Ac–Lys molecule is forced towards more compact
configurations.
[Figure 4 plot: two panels of g(r) versus r (Å); curves labelled CP-MD, exp. and US]
Fig. 4 Radial distribution functions for O–O pairs (left panel) and H–O pairs (right panel), with
H and O atoms belonging to water molecules. Solid line is for the T = 300 K stage, while the
thin line is for the stage where the US external potential is present and d0 = 3.5 Å. Points are the
experimental data reported in Ref. [39]
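A radial distribution function of the kind shown in Fig. 4 can be accumulated from the atomic positions with a histogram of minimum-image distances. The following is a minimal sketch for a single species in an orthorhombic cell; the function name and the random test configuration are assumptions, and a cross function such as Ow–Hw would require looping over two distinct sets of atoms.

```python
import math
import random

def radial_distribution(points, cell, r_max, n_bins):
    """g(r) for identical particles in an orthorhombic periodic cell.

    Uses the minimum-image convention, so r_max should not exceed half
    the shortest cell side.
    """
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b, side in zip(points[i], points[j], cell):
                delta = b - a
                delta -= side * round(delta / side)
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    volume = cell[0] * cell[1] * cell[2]
    pair_density = n * (n - 1) / 2.0 / volume  # pairs per unit volume
    g = []
    for b, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * ((b + 1) ** 3 - b ** 3) * dr ** 3
        g.append(c / (pair_density * shell))
    return g

# Sanity check on uncorrelated random points, for which g(r) should be about 1
random.seed(0)
cell = (10.0, 10.0, 10.0)
points = [tuple(random.uniform(0.0, side) for side in cell) for _ in range(400)]
g = radial_distribution(points, cell, r_max=5.0, n_bins=25)
print([round(value, 2) for value in g[10:15]])
```

For a real trajectory the histogram would be averaged over many configurations; with a sample as small as the one discussed here, the statistical noise at short distances is significant.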
However, the main result of this short simulation is that since both the polar groups
(the ammonium and carboxylate) are interacting with water almost like other water
molecules replaced by them, the propensity for exchanging one of the protons among
the two groups, observed in the vacuum, is removed. Even when the H–O distance
approaches values in the range of 1.5 Å (the last 10 s of the video), the H–O bond is
not formed.
In Fig. 5, the behaviour of the potential energy over the entire simulation
(including the heating up of the system from zero to 300 K) is displayed. The initial
decrease of energy is an effective energy minimization due to the low temperature
imposed to the system built as the sum of the minimal energy solute and a sample of
liquid water made of rigid “empirical” molecules. The atoms adapt their positions
to the chemical bonds now available in the quantum model, slightly rearranging the
interatomic distances according to the ground state electron density. The comparison
between this part of the time evolution and the inset that displays the behaviour of
energy for the same model in the vacuum at low temperature gives an estimate of the
energy contribution due to solvent rearrangement compared to the contribution of a
competitive solute change. In the solvent the energy decrease is about 3000 kJ/mol
with no significant changes in the solute structure. In the vacuum, the drop of energy
is 200 kJ/mol with a significant structural and electronic reshuffling of the solute:
this becomes more compact and the proton migrates between two groups. Even if
the number of atoms in the solvent (315) is large compared to that in the solute (29),
the amount of water molecules is small enough to be compared with a single shell
of water molecules around an aminoacid. The energy involved in the adjustment of
a mechanical model of a few water molecules interacting with a single aminoacid
to its quantum nature is about ten times that involved in a deep change of electronic
structure and consequent molecular structure in the isolated solute.
Continuing with the observation of Fig. 5, once a decent minimum of the energy
is achieved (at T = 100 K), the thermal fluctuations start to increase the energy,
transferring energy among the different degrees of freedom in the super–cell. These
include the water molecules, the major part of this thermal bath. As a significant
improvement compared to mean–field models, water molecules explicitly react to
movement of bonds in the solute. The horizontal lines represent the average potential
energy increase U = (RT/2) N_deg from the energy minimum (the zero reference in the
figure), due to the population of N_deg = 3N_at − 6 independent degrees of freedom in
the super–cell (N_at being the number of atoms in the sample). The increase of energy due
to thermal fluctuations reaches approximately the allowed average level, the better
when the temperature is large enough to facilitate equipartition.
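The equipartition estimate above is easy to evaluate for the sample described in the text (29 solute atoms and 105 water molecules). The sketch below assumes harmonic degrees of freedom, each contributing RT/2 of potential energy on average; the function name is illustrative.

```python
R_GAS = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def thermal_potential_energy(n_atoms, temperature):
    """Average potential energy above the minimum, (RT/2) * N_deg, in kJ/mol."""
    n_deg = 3 * n_atoms - 6
    return 0.5 * R_GAS * temperature * n_deg

n_atoms = 29 + 3 * 105  # Ac-Lys solute plus 105 water molecules
for temperature in (50, 100, 200, 300):
    print(temperature, round(thermal_potential_energy(n_atoms, temperature), 1))
```

At T = 300 K this gives roughly 1280 kJ/mol for the 344 atoms of the super–cell, which sets the scale against which the energy changes discussed in the text can be compared.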
Again, it is interesting to notice that the energy decrease operated by the addition
of the external potential (the US region) is huge for a quantum chemist, at least
about 200 kJ/mol. Notice that once the external potential is added, the total energy
is not conserved. Since in the simulation an external thermal bath is added to the
system to keep the temperature of the super–cell constant [67], the work done by
the external potential is incorporated into the potential energy of the super–cell. The
potential energy change of about −200 kJ/mol is the balance between even larger
quantities that cannot easily be separated from the sum. One is the energy decrease
due to the formation of an intramolecular hydrogen bond between the ammonium
and the carboxylate group. The other is the energy increase due to slight reorientation
of O–H bonds within the surrounding water molecules. The size of the sum of
the two quantities shows that even if the sample is very small, the involvement of
many degrees of freedom in the structural changes that accompany the external bias
spreads the single collective variable into a large mechanical work, where the word
“mechanical” is used here just to indicate that there is no breaking and formation of
covalent bonds.
Indeed, the energy decrease observed in the vacuum with the proton transfer is
about 200 kJ/mol. This quantity for the 29 atoms of the solute is huge and there was
no way to disperse the same energy among the limited set of degrees of freedom of
the isolated molecule. The water solvent is therefore an excellent buffer for energy,
also when alternative formation of localized bonds is in theory possible.
The experiment reported above is not sufficient to access free energy changes,
because of the limited statistics acquired. Even perturbation methods cannot be
safely applied because the external US potential is too strong to be managed with
perturbation equations typical of umbrella sampling and reweighting techniques.
Nevertheless, most of the methods recently developed for computing free energy
changes are based on the same approach briefly described above: an external poten-
tial is added to the system, allowing the biasing of configurations over a range of
values for a convenient collective variable. In the case above, the collective variable
was the distance between a possible donor of protons (the ammonium group) and a
possible acceptor (the carboxylate group). But a wide range of collective variables
based on complicated combinations of microscopic variables (atomic coordinates)
can be exploited. The estimate of the free energy at room temperature as a function
of the collective variables makes it possible to identify stable states, intermediates and transition
states in rugged energy landscapes. Within a quantum mechanical description,
this becomes a tool for understanding a hypothetical reaction mechanism, once this
has been translated in a set of collective variables. Although many numerical methods
have been developed for this purpose [68–70], only a few tentative applications are
reported in the context of first-principles calculations, because of the many technical
problems. Among these, we mention the effects of transient excited states (forces not
well defined) and changes of spin configuration (that is encoded in the f i populations
in Eq. 5, parameters that must be kept fixed along the simulation). In many cases, the
marks of these kind of problems are strong oscillations and drifts in the fake electron
kinetic energy, a quantity that must be kept small during the simulation in order to
guarantee a ground state definition and a proper separation among electronic and
atomic degrees of freedom.
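The biasing approach described above can be sketched as follows, assuming a harmonic external potential acting on a distance collective variable. The sketch also shows why reweighting becomes unreliable when the bias is strong: the weights exp(+U_bias/k_B T) needed to remove the bias are then dominated by a few configurations. All names, force constants and the synthetic Gaussian samples are hypothetical illustrations, not the settings or data of the simulations discussed in the text.

```python
import numpy as np

KT = 8.314462618e-3 * 300.0  # k_B*T in kJ/mol at 300 K


def harmonic_bias(d, d0, k):
    """Bias energy U = 0.5*k*(d - d0)**2 on a distance collective variable."""
    return 0.5 * k * (np.asarray(d) - d0) ** 2


def bias_forces(r_a, r_b, d0, k):
    """Forces on atoms a and b due to the bias (minus the gradient of U)."""
    diff = np.asarray(r_a, dtype=float) - np.asarray(r_b, dtype=float)
    d = np.linalg.norm(diff)
    f_a = -k * (d - d0) * diff / d
    return f_a, -f_a


def effective_sample_size(bias_energies, kt=KT):
    """Kish effective sample size of the unbiasing weights exp(+U/kT)."""
    x = np.asarray(bias_energies) / kt
    w = np.exp(x - x.max())  # shift the exponent for numerical stability
    return float(w.sum() ** 2 / (w ** 2).sum())


def synthetic_ess(k_bias, k_system=50.0, d0=3.0, n=1000, seed=0):
    """Effective sample size for synthetic distances (in Å) drawn from a
    Gaussian whose width is set by the bias plus an assumed harmonic system."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(KT / (k_bias + k_system))
    d = rng.normal(d0, sigma, size=n)
    return effective_sample_size(harmonic_bias(d, d0, k_bias))


ess_weak = synthetic_ess(k_bias=1.0)       # bias much softer than the system
ess_strong = synthetic_ess(k_bias=5000.0)  # bias dominating the sampling
print(f"effective samples: weak bias {ess_weak:.0f}, strong bias {ess_strong:.0f}")
```

In this toy model a weak bias leaves nearly all of the 1000 samples statistically useful, while a strong bias reduces the effective number of samples considerably.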
In most cases concerning chemical reactions, the presence of an explicit buffer
of solvent is mandatory. One reason is scientific: the solvent, especially water, is
the means for coupling together portions of molecules that are far apart in space. It
can also be observed in video V2 that at a certain time one water molecule is bridging
the two charged groups. This coupling can be enhanced in the model via a collective
variable containing the two groups, while the explicit solvent, with its reaction to the
change of the collective variable, introduces the necessary means for the coupling to
be effective. Another reason is technical: the solvent provides the explicit electrons
for transient electron pairs that allow the smooth rearrangement of electrons in a
ground state for the assembly (solute+solvent). This explicit electron bath, in addition
to the atomic bath compensating for strong oscillations of kinetic energy (the thermal
bath), makes feasible the exploitation of complicated reaction mechanisms that
in a mean-field solvent appear forbidden.
The description of a solute merged into a protic liquid like water, provided by DFT
and the dynamic algorithms sampling the statistics at room conditions emerging from
this model, are opening a new perspective in the frame of biological molecules.
The picture of the water mobility sketched by the exercise of Ac-Lys above holds
also when other kinds of positive holes dynamically attract electrons provided by
electronegative atoms, both on the protein and in the water solvent (the oxygen
atoms). This occurs when metal ions are present in the water solution in contact with
the protein. Among these ions, some are more or less spherical charges (like the
usual assumption in empirical models for Na+ or Ca2+ ), others provide geometrical
constraints to the electron donation (the coordination chemistry of transition metal
ions like Zn, Cu, Fe, all abundant in biological environments).
Such directional interaction occurs in the N-terminal disordered region of the prion
protein, where a segment containing the amino acid sequence HGGG is repeated
several times. The amide protons of two G residues are given to water and
replaced with a single Cu2+ ion. A neutral Cu[HGGG] complex was first observed
by NMR in water solution at pH 7.4 (phosphate buffer, almost neutral) [71] and it
was then isolated as a crystal from the same solution [72]. The same kind of metal
ion-induced peptide neutralization occurs in many other systems of biological
interest [73], thus showing that the reshuffling of positive holes is a common process,
extremely relevant in changing the behaviour of the protein backbone when slight
changes in the protein environment occur.
So we come back to our initial example (see Sect. 2), the single Cu2+ ion interacting
with the peptide representing the HGGG repeat in the N-terminus of the human prion
protein. Now we have an idea of how the glue (the valence electrons), filling the dark
space in the video images, is represented in the computer.
In order to monitor the chain of microscopic events behind the transfer of a proton
from the peptide to water, models of the HGGG peptide in contact with Cu ions have
been investigated in detail within the frame of the computational approach described
above for Ac-Lys [74, 75]. The idea was to deposit the Cu ion within the peptide
already templated as in the Cu[HGGG] crystal. The mechanism by which the amide
H atoms are extracted by the water solvent was then observed in all the microscopic
details.
In video V1 this process is displayed. The video lasts about 60 s and covers 1.9 ps
of the simulation at T = 50 K of the complex with 25 water molecules in an
orthorhombic super-cell. At the beginning (a minimum energy structure of the DFT
model) one water molecule is located along the axis of the square-planar
coordination of Cu2+. The copper ion is bonded by one imidazole Nδ (His), two
backbone amide N atoms (Gly 2, Gly 3) and the amide O atom (Gly 3) (see left
panel in Fig. 6). Since the amide groups are protonated and the amide N lone pair is
delocalized over the carbonyl group of the peptide bond, its propensity for bonding
Cu is low. Nevertheless, the initial configuration is an energy minimum and the N–Cu
bonds are formed, as shown by the first few seconds of the video (and by the
time evolution of other electron parameters, data not shown here). After 10 s, the
axial water molecule moves from Cu towards the H atoms of Gly 3 (anti to the His
sidechain), forming a hydrogen bond. At the same time, the N(Gly 3)-Cu distance
becomes shorter (the bond is drawn in the images when the distance between Cu
and a ligand is smaller than 2 Å). At about 30 s, the same water molecule moves
away from the complex carrying the proton of the amide group. Once the H3O+ ion
moves into the layer of water molecules, the proton is passed to another molecule,
thus representing a snapshot of the Grotthuss mechanism for dissipating the excess
proton introduced in the water layer by the Cu ligand.
Fig. 6 The initial (left) and final (right) configurations for the Cu2+[HGGG] complex in a
super-cell with 25 water molecules. The simulation was performed at T = 50 K for 1.9 ps. The
elongated bonds at the top-right of the right panel indicate the location of the proton initially
attached to N(Gly 3). Cu is displayed as an orange sphere
The proton of Gly 2 points away from the plane of the Cu ligands, interacting,
especially at the end of the video, with several water molecules. Once the ligand is
wrapped around Cu because of the two aligned Cu–N bonds, there is no chance to
avoid the further proton donation to water.
In many cases, copper ions interact with proteins via His sidechains. Like the prion
protein, the amyloid-β (Aβ) peptide, which is the major component of the amyloid
fibrils observed in Alzheimer’s disease, is characterized by a relatively high content of
His residues in the N-terminal region. Moreover, two of these His residues are adjacent
in the sequence, thus increasing their potential role in Cu binding. In experiments, all
three His residues (6, 13 and 14) are affected by Cu addition and binding [76], but
the possibility of a dynamic exchange of ligand atoms around Cu strongly affects the
conformational sampling of the peptide: the better defined the Cu binding, the
better defined the ligand conformation, because of the many constraints due to the
Cu coordination. The Aβ peptide is intrinsically disordered [77] and the interaction
with one or more metal ions changes the population of conformers, thus modifying
the propensity for mutual interactions between peptides [78]. Moreover, metal ions
can form bridges between two or more peptides. This kind of effect is common to all
disordered ligands and a large number of disordered protein regions are to date known
to interact with metal ions. The type and fluxionality of metal binding determine the
assembling of peptides and ions together, with the consequent extrusion of water
from the peptide shell.
Several models for the Cu binding to the region 1–16 of Aβ were investigated [79].
Since Cu+ is better investigated in monomeric complexes with Aβ, that complex
allows an easier characterization.
The models make it possible to understand under which conditions a single Cu can
bind three His sidechains at the same time. In oxidation state I, there is no such
condition: starting from a high His crowding (Fig. 7, left panel), the structure with the
Fig. 7 Initial (left) and final (right) states for one of the models of the Cu+-Aβ(1–16) complex.
The colour scheme is the same as in Fig. 6. The atoms within 2 Å from Cu (in orange) are
emphasized in green. Orange bonds emphasize the His residues, while green bonds emphasize
Asp 1 and Asp 7, activating the His sidechains
Video time since start (s)   Simulation stage   Simulation time (ps)
0–5                          T = 50 K           0.18
5–11                         T = 100 K          0.22
11–17                        T = 200 K          0.22
17–26                        T = 300 K          0.32
The 3-His coordination (first 20 s) is a low energy state for the solute. To show that
the mechanical stress of the peptide is able to break a Cu–N bond it is necessary to
reach the temperature of 300 K. Moreover, in smaller models, containing separated
Ac-His-His-NHmet and Ac-His-NHmet fragments replacing the Aβ(1–16) chain, the
mechanical stress is larger in the His-His segment and the coordination of Cu+ by His
6 and His 13 is more stable than that of the His 13-His 14 segment. Therefore, with
the entire 1–16 fragment the situation is completely different from that in the truncated
models. Finally, no water molecules approach Cu and the radial distribution of Ow–
Cu pairs is very similar to that observed for square-planar complexes of Cu2+ , where
the interaction with axial water molecules is weak. The chance to have a ligand like
water in the plane from which His is extracted (the righthand side of the images in
the video) is low, but this can be observed only with a model including explicit water
molecules and the possibility to form new chemical bonds.
A close inspection of video V3 shows that when a stable Cu+ digonal complex is
formed, water molecules are not allowed to enter the Cu coordination sphere.
This is not only an impression: the reduced state of Cu in this His-Cu-His coordination
is more hydrophobic than the oxidized state of Cu. The reactivity of Cu in this type of
coordination can be further probed by using the dynamic methods described above
combined with external forces, as sketched in the example of the Ac-Lys molecule.
Again, we first built a configuration for Cu in the oxidized state (Cu2+) bound to the
amyloid peptide. The bound state was chosen so as to have many potential
ligand atoms at close distance from Cu, where these potential ligand atoms include
the His sidechains of the peptide (here represented by the short sequences
DAGGGHD and Ac-HH-NCH3). These two short peptides are segments representing
the 16 residues DAEFRHDSGYEVHHDK in the N-terminus of the Aβ peptide
mentioned above: this is a truncation of the 1–16 Cu-binding sequence, but not too
drastic. It is known that Cu ions are bound to these residues and that the chemical
properties of Cu depend on its coordination to the peptide. The important point that
we want to emphasize here is that dynamic methods like those described in this
chapter make it possible to include, among the potential ligand atoms, the water
solvent and, possibly, other ions and molecules dissolved in it. In the following example
the Cu ion is partially released by the peptide, with the release induced by an external
force acting on the Cu coordination number, decreasing it from 4 to 2. This bias is
useful to move the ion from one coordination environment to a different
one, where the biological ligand may be reorganized during the process. The
valence left free on Cu is dynamically captured by the chemical species that are able
to fill with their electrons the positive hole provided by Cu, but the external force
prevents a stable transfer of electrons into the hole. In our modeling study, at the end
of this release, the oxidation state of Cu is changed by adding one electron. Thus the
Cu-Aβ complex is reduced, in a configuration that is suitable for reduction because
it is close to a low-valence state of Cu. In this new reduced state, the ion is readsorbed
by the peptide, again by acting externally on the coordination number, this time
increased from 2 to 4. Then, once a high-valence state of reduced Cu is achieved, the
ion is again oxidized, and the coordination number relaxes to its more natural state,
i.e. 4–5.
A schematic picture of the path that is performed by externally acting on the
coordination number is displayed in Fig. 8. Here a possible behaviour of the free energy
of the Cu-Aβ complex is displayed as a function of a reaction coordinate. The
reaction coordinate is not a thermodynamically measurable variable, because we measure,
when this is possible, an average of the coordinate. Therefore, the displayed free
energy is not a measurable work. Nevertheless, these quantities help in understanding
the chemistry of the atomic assembly as a function of its structure, the latter
manipulated via a suitable handle. Since many properties of an ion depend on the number
of ligand atoms in its surroundings, we choose the coordination number as the reaction
coordinate to handle. As recalled above, it is possible to compute for a model the
free energy as a function of a chosen reaction coordinate [70], but in practice this is
not easy, because of the extremely large number of manipulated pathways required
to achieve statistical convergence. However, this is the future and we describe in the
following a single manipulation, with some insights that can be obtained by analysing
a set of these manipulations.
In Fig. 8, red points display the configurations sampled in the first oxidized state
(configurations 1–3 in red); blue points display the configurations sampled in the
following reduced state (4–6); green points are configurations obtained after the
second oxidation (7–8). The parabolic shape of the free energy for the oxidized
(Cu(II)) and reduced (Cu(I)) states, respectively, is just the ideal representation of the
simplest approximation around the points of stability when the respective numbers
of electrons are deposited on the molecule.
The video V4 displays one of the pathways following the external change of the
coordination number and oxidation state, i.e. one pathway from point 1 to point 8 in
the schematic frame displayed in Fig. 8. The actual behaviour of the complex and its
solution environment depends on the initially chosen Cu-Aβ structure and on the rate
of the change in coordination number. Once a large number of pathways is collected,
the bias due to the initial conditions and to the way the pathway is performed becomes
irrelevant: any other bias would produce, on average, the same effect. In practice,
the statistical weight of each sampled configuration and of the quantities averaged
over the sampled configurations slowly converges with the number of pathways and
the time length of each pathway. However, a first view of the chemical properties can be
obtained assuming that each of the sampled configurations has the same weight.
[Figure: ΔE (V) as a function of CN (atoms)]
In Fig. 9, the reduction potential of Cu(II)-Aβ is reported for several sampled
configurations (16 pathways). The reference of the reduction potential is the reduction
potential for the proton in pure water, computed with similar approximations. For
the details of these calculations, see Refs. [80, 81]. The colour of each point refers to
Fig. 8, thus a red point is for one of points 1–3 obtained in the trajectory, etc. There are
3 × 16 red points, 3 × 16 blue points and 2 × 16 green points. The black points are
transient points obtained during equilibration and/or driving the coordination number.
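A minimal sketch of this bookkeeping is given below. It assumes that both the Cu(II) reduction and the reference proton reduction are one-electron processes with free energy changes expressed in eV, so that the relation ΔG = −nFE gives a potential difference in volts that is numerically the opposite of the energy difference. The function name and all numerical values are placeholders introduced here and are not data from the calculations reported in the text; only the counts of points per pathway come from the text.

```python
from collections import defaultdict
from statistics import fmean


def relative_reduction_potential(dg_red_ev: float, dg_ref_ev: float) -> float:
    """E - E_ref in volts for one-electron reductions (energies in eV)."""
    return -(dg_red_ev - dg_ref_ev)


# Number of sampled points per colour: 3, 3 and 2 points in each of 16 pathways.
n_pathways = 16
points_per_pathway = {"red": 3, "blue": 3, "green": 2}
n_points = {c: n * n_pathways for c, n in points_per_pathway.items()}

# Placeholder values only (NOT results from the chapter): (colour, dG in eV).
dg_ref = -4.4
samples = [("red", -4.1), ("red", -3.9), ("blue", -3.2),
           ("blue", -3.0), ("green", -4.0)]

# Equal-weight average of the relative potential for each group of points.
by_colour = defaultdict(list)
for colour, dg in samples:
    by_colour[colour].append(relative_reduction_potential(dg, dg_ref))
mean_potential = {c: fmean(v) for c, v in by_colour.items()}

print(n_points, sum(n_points.values()))
print(mean_potential)
```

The equal-weight mean corresponds to the first approximation mentioned above, in which every sampled configuration is given the same statistical weight.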
The video V4 collects, within 1 min and 20 s, 2010 configurations of the fragment
described above (it is encoded with 25 frames per second). The total simulated time is
2.41 ps. CN0 is the equilibrium coordination number of Cu imposed by the external
bias, which is a harmonic function of the coordination number CN. In the table
below a summary of the frames is reported.
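A harmonic bias on the coordination number can be sketched as follows. Since a bias needs a differentiable coordination number, the sketch assumes a smooth rational switching function of the Cu-ligand distance, with the 2.5 Å cut-off quoted in the text as reference distance. The functional form, the exponents and the force constant are assumptions made here for illustration; the text does not specify the form actually used.

```python
import numpy as np


def coordination_number(r_metal, r_ligands, r0=2.5, n=6, m=12):
    """Smooth coordination number: sum over ligand atoms of
    s(r) = (1 - (r/r0)**n) / (1 - (r/r0)**m), with distances in Å."""
    r_ligands = np.asarray(r_ligands, dtype=float)
    x = np.linalg.norm(r_ligands - np.asarray(r_metal, dtype=float), axis=1) / r0
    at_r0 = np.isclose(x, 1.0)         # s(r) is 0/0 at r = r0; the limit is n/m
    x_safe = np.where(at_r0, 0.5, x)   # dummy value, replaced below
    s = (1.0 - x_safe ** n) / (1.0 - x_safe ** m)
    return float(np.where(at_r0, n / m, s).sum())


def hard_count(r_metal, r_ligands, cutoff=2.5):
    """Number of ligand atoms closer than the cut-off (as used for display)."""
    r_ligands = np.asarray(r_ligands, dtype=float)
    d = np.linalg.norm(r_ligands - np.asarray(r_metal, dtype=float), axis=1)
    return int((d < cutoff).sum())


def cn_bias(cn, cn0, k=10.0):
    """Harmonic bias energy 0.5*k*(CN - CN0)**2 (k is a hypothetical value)."""
    return 0.5 * k * (cn - cn0) ** 2


cu = [0.0, 0.0, 0.0]
ligands = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
cn = coordination_number(cu, ligands)
print(f"smooth CN = {cn:.2f}, hard count = {hard_count(cu, ligands)}")
print(f"bias energy towards CN0 = 4: {cn_bias(cn, 4.0):.2f}")
```

The smooth definition counts each ligand atom with a weight between 0 and 1, so that changing CN0 in steps gradually pushes ligand atoms towards or away from the ion.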
On the left-hand side there is most of the peptide, while on the right-hand side there
is most of the water environment. The Cu ion is displayed as an orange sphere, while
the atoms that are closer than 2.5 Å to it (this is the cut-off distance used to evaluate
the coordination number) are displayed as green spheres. During the first 10 s, the
equilibration of the initial high-valence state of the ion occurs. At about 5 s, the
temperature of 300 K is achieved and after approximately another 5 s we see that one of
the three His sidechains (His 14) initially bound to Cu is released. This occurs because
there are too many ligand atoms around Cu and His 14 is the most stressed part of
the peptide forced to stay around Cu at the beginning. The stress is concentrated
in the repulsion between the sidechains of His 13 and His 14.
The reduction of the coordination number by the external force is then attempted. This
process lasts until approximately 27 s. The configuration achieved at this point does
not have the externally forced coordination number of 2, because this is unfavourable
for the oxidized state of the complex: this is why the approximate free energy for point
3 is larger than for point 2. However, the coordination number is 3 and Cu is bound
Video time since start (s)   Simulation stage      Simulation time (ps)   Point in Fig. 8
0–5                          T = 50 K              0.15
5–10                         T = 150 K             0.15
10–15                        T = 300 K, CN0 = 6    0.15                   1
15–18                        T = 300 K, CN0 = 5    0.10
18–21                        T = 300 K, CN0 = 4    0.10                   2
21–24                        T = 300 K, CN0 = 3    0.10
24–27                        T = 300 K, CN0 = 2    0.10                   3
30–35                        T = 50 K              0.15                   4
35–40                        T = 150 K             0.15
40–45                        T = 300 K, CN0 = 2    0.15                   5
45–48                        T = 300 K, CN0 = 3    0.10
48–51                        T = 300 K, CN0 = 4    0.10
51–54                        T = 300 K, CN0 = 5    0.10                   6
54–59                        T = 50 K              0.15                   7
59–64                        T = 150 K             0.15
64–80                        T = 300 K, CN0 = 5    0.51                   8
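As a consistency check, the stages listed in the table can be encoded as a small schedule and compared with the figures quoted in the text (2010 configurations at 25 frames per second and a total simulated time of 2.41 ps). The tuple layout is a hypothetical choice made here for illustration.

```python
# Each stage: (video start in s, video end in s, simulated time in ps,
#              point in Fig. 8 or None), transcribed from the table.
stages = [
    (0, 5, 0.15, None), (5, 10, 0.15, None), (10, 15, 0.15, 1),
    (15, 18, 0.10, None), (18, 21, 0.10, 2), (21, 24, 0.10, None),
    (24, 27, 0.10, 3),
    (30, 35, 0.15, 4), (35, 40, 0.15, None), (40, 45, 0.15, 5),
    (45, 48, 0.10, None), (48, 51, 0.10, None), (51, 54, 0.10, 6),
    (54, 59, 0.15, 7), (59, 64, 0.15, None), (64, 80, 0.51, 8),
]

total_ps = sum(stage[2] for stage in stages)  # total simulated time
video_seconds = 2010 / 25                     # frames / frames per second
points = [stage[3] for stage in stages if stage[3] is not None]

print(f"total simulated time: {total_ps:.2f} ps")
print(f"video length: {video_seconds:.1f} s")
print(f"points of Fig. 8 visited: {points}")
```

The simulated times listed in the table add up to the total quoted in the text, and the number of frames corresponds to a video of about 80 s, i.e. 1 min and 20 s.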
tively high. This is due both to the low chances for the oxidized state to have the free
valences of Cu occupied by ligand atoms, including O of water molecules, and to
the high stability of the reduced state, with Cu(I) nicely pinched by two N atoms of
either His sidechains or the N-terminus (Asp 1). On the other hand, when Cu has a large
coordination number, Cu becomes a reductant. The latter condition is assisted by the
water molecules coming from the solvent: in those cases where this is not possible,
the weak binding of Cu to the peptide would imply an oxidant activity.
The active role of water molecules in giving electrons to molecules containing
cations with variable oxidation states is displayed by a catalyst that mimics the
oxygen evolving center in photosystem II [82]. This is a molecule containing several
cobalt-oxygen cores assembled together into a large anion. The cores, once oxidized,
stimulate different water molecules to come close together and to restore, to different
extents, the electrons lost by the metal ions in the large anion. In these conditions,
the oxygen atoms that are close in space are forced to eject the bound hydrogens
and to share their electrons in O-O bonds. The electron transfer, coupled to proton
release from metal-bound water molecules and to the structure of the large anion,
has been recently modelled with dynamic methods like those described above [83].
porphyrins). However, here we mention only the importance, in excited states, of the
requirement to extend the hypotheses of effective one-electron states (the Kohn-Sham
approximation) and of the two-variables description at the basis of density-functional
theory. Going beyond these approximations requires a full theory for many-body
interactions within a second-quantization theory. Some of the derived formalisms
that are more promising for large molecular systems are summarized in Ref. [85].
18 Perspectives
References
1. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N.,
Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28(1), 235–242 (2000). https://doi.
org/10.1093/nar/28.1.235, https://www.rcsb.org
2. Lawson, C.L., Patwardhan, A., Baker, M.L., Hryc, C., Garcia, E.S., Hudson, B.P., Lagerstedt,
I., Ludtke, S.J., Pintilie, G., Sala, R., Westbrook, J.D., Berman, H.M., Kleywegt, G.J., Chiu,
W.: EM databank unified data resource for 3d EM. Nucleic Acids Res. 44(D1), D396–D403
(2016). https://doi.org/10.1093/nar/gkv1126, https://www.emdatabank.org
3. Weiner, P.K., Kollman, P.A.: Amber: Assisted model building with energy refinement. A general
program for modeling molecules and their interactions. J. Comp. Chem. 2(3), 287–303 (1981).
https://doi.org/10.1002/jcc.540020311
4. Case, D.A., Cheatham, T.E., Darden, T., Gohlke, H., Luo, R., Merz, K.M., Onufriev, A., Sim-
merling, C., Wang, B., Woods, R.J.: The AMBER biomolecular simulation programs. J. Com-
put. Chem. 26(16), 1668–1688 (2005). https://doi.org/10.1002/jcc.20290
5. Scheraga, H.A.: My 65 years in protein chemistry. Quart. Rev. Biophys. 48(2), 117–177 (2015).
https://doi.org/10.1017/S0033583514000134
6. Schlick, T.: The 2013 nobel prize in chemistry celebrates computations in chemistry and
biology. SIAM News 46(10) (2013). https://www.biomath.nyu.edu/index/papdir/fullengths/
Nobel13.pdf
7. Guskov, A., Kern, J., Gabdulkhakov, A., Broser, M., Zouni, A., Saenger, W.: Cyanobacterial
photosystem II at 2.9 Å resolution and the role of quinones, lipids, channels and chloride. Nat.
Struct. Mol. Biol. 16, 334 (2009). https://doi.org/10.1038/NSMB.1559
8. Humphrey, W., Dalke, A., Schulten, K.: VMD visual molecular dynamics. J. Molec. Graph-
ics 14(1), 33–38 (1996). https://doi.org/10.1016/0263-7855(96)00018-5, https://www.ks.uiuc.
edu/Research/vmd
9. Bertini, I., Gray, H.B., Stiefel, E.I., Valentine, J.S. (eds.): Biological Inorganic Chemistry:
Structure and Reactivity. University Science Books, Sausalito, CA (2007)
10. Morante, S., Rossi, G.C.: A novel proof of the DFT formula for the interatomic force field of
molecular dynamics. Ann. Phys. 377(Supplement C), 71–76 (2017). https://doi.org/10.1016/
j.aop.2016.12.011
11. Bryant, R.G., Johnson, M.A., Rossky, P.J.: Water. Acc. Chem. Res. 45(1), 1–2 (2012). https://
doi.org/10.1021/ar2003286
12. Del Rosso, L., Celli, M., Ulivi, L.: New porous water ice metastable at atmospheric pressure
obtained by emptying a hydrogen-filled ice. Nat. Commun. 7, 13394 (2016). https://doi.org/
10.1038/ncomms13394
13. Bartels-Rausch, T., Bergeron, V., Cartwright, J.H.E., Escribano, R., Finney, J.L., Grothe, H.,
Gutiérrez, P.J., Haapala, J., Kuhs, W.F., Pettersson, J.B.C., Price, S.D., Sainz-Díaz, C.I., Stokes,
D.J., Strazzulla, G., Thomson, E.S., Trinks, H., Uras-Aytemiz, N.: Ice structures, patterns, and
processes: A view across the icefields. Rev. Mod. Phys. 84(2), 885–944 (2012). https://doi.org/
10.1103/RevModPhys.84.885
14. Allen, M.P., Tildesley, D.J.: Computer Simulation of Liquids. Clarendon Press, Oxford, UK
(1989)
15. Mazza, M.G., Stokely, K., Pagnotta, S.E., Bruni, F., Stanley, H.E., Franzese, G.: More than one
dynamic crossover in protein hydration water. Proc. Nat. Acad. Sci. U.S.A. 108(50), 19873–
19878 (2011). https://doi.org/10.1073/pnas.1104299108
16. Ball, P.: H2 O: A Biography. Weidenfeld & Nicolson, London (1999)
17. Ben-Naim, A.: Molecular Theory of Water and Aqueous Solutions—Part I: Understanding
Water. World Scientific, Singapore (2009). https://doi.org/10.1142/7136
18. Lamoureux, G., Roux, B.: Modeling induced polarization with classical drude oscillators:
Theory and molecular dynamics simulation algorithm. J. Chem. Phys. 119(6), 3025–3039
(2003). https://doi.org/10.1063/1.1589749
19. Jiang, W., Hardy, D., Phillips, J., MacKerell, A., Schulten, K., Roux, B.: High-performance
scalable molecular dynamics simulations of a polarizable force field based on classical drude
oscillators in NAMD. J. Phys. Chem. Lett. 2, 87–92 (2011). https://doi.org/10.1021/jz101461d
20. Ponder, J.W., Wu, C., Ren, P., Pande, V.S., Chodera, J.D., Schnieders, M.J., Haque, I., Mobley,
D.L., Lambrecht, D.S., Di Stasio, R.A., Head-Gordon, M., Clark, G.N.I., Johnson, M.E., Head-
Gordon, T.: Current status of the Amoeba polarizable force field. J. Phys. Chem. B 114(8),
2549–2564 (2010). https://doi.org/10.1021/jp910674d
21. Senftle, T.P., Hong, S., Islam, M.M., Kylasa, S.B., Zheng, Y., Shin, Y.K., Junkermeier,
C., Engel-Herbert, R., Janik, M.J., Aktulga, H.M., Verstraelen, T., Grama, A., van Duin,
A.C.T.: The ReaxFF reactive force-field: Development, applications and future directions. npj
Comput. Mater. 2, 15011 (2016). https://doi.org/10.1038/npjcompumats.2015.11
22. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L.: Comparison of
simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
https://doi.org/10.1063/1.445869
23. Parr, R.G., Yang, W.: Density Functional Theory of Atoms and Molecules. Oxford University
Press, New York (1989)
24. Landau, L., Lifchitz, E.: Physique Statistique. MIR, Moscow, URSS (1984)
25. Mennucci, B., Cammi, R. (eds.): Continuum Solvation Models in Chemical Physics: From
Theory to Applications. Wiley, Hoboken (2008). https://doi.org/10.1002/9780470515235
26. Tomasi, J., Mennucci, B., Cammi, R.: Quantum mechanical continuum solvation models.
Chem. Rev. 105, 2999–3093 (2005). https://doi.org/10.1021/cr9904009
27. Klamt, A., Mennucci, B., Tomasi, J., Barone, V., Curutchet, C., Orozco, M., Luque, F.J.: On the
performance of continuum solvation methods. a comment on “universal approaches to solvation
modeling”. Acc. Chem. Res. 42(4), 489–492 (2009). https://doi.org/10.1021/ar800187p
28. Cramer, C.J., Truhlar, D.G.: Reply to comment on "a universal approach to solvation modeling".
Acc. Chem. Res. 42(4), 493–497 (2009). https://doi.org/10.1021/ar900004j
29. Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R.,
Scalmani, G., Barone, V., Mennucci, B., Petersson, G.A., Nakatsuji, H., Caricato, M., Li, X.,
Hratchian, H.P., Izmaylov, A.F., Bloino, J., Zheng, G., Sonnenberg, J.L., Hada, M., Ehara, M.,
Toyota, K., Fukuda, R., Hasegawa, J., Ishida, M., Nakajima, T., Honda, Y., Kitao, O., Nakai, H.,
Vreven, T., Montgomery Jr., J.A., Peralta, J.E., Ogliaro, F., Bearpark, M., Heyd, J.J., Brothers,
E., Kudin, K.N., Staroverov, V.N., Keith, T., Kobayashi, R., Normand, J., Raghavachari, K.,
Rendell, A., Burant, J.C., Iyengar, S.S., Tomasi, J., Cossi, M., Rega, N., Millam, J.M., Klene,
M., Knox, J.E., Cross, J.B., Bakken, V., Adamo, C., Jaramillo, J., Gomperts, R., Stratmann, R.E.,
Yazyev, O., Austin, A.J., Cammi, R., Pomelli, C., Ochterski, J.W., Martin, R.L., Morokuma,
K., Zakrzewski, V.G., Voth, G.A., Salvador, P., Dannenberg, J.J., Dapprich, S., Daniels, A.D.,
Farkas, O., Foresman, J.B., Ortiz, J.V., Cioslowski, J., Fox, D.J.: Gaussian 09, Revision C.01.
Gaussian Inc., Wallingford, CT, USA (2010)
30. Muller, N.: Search for a realistic view of hydrophobic effects. Acc. Chem. Res. 23(1), 23–28
(1990). https://doi.org/10.1021/ar00169a005
31. Ben-Naim, A.: Molecular Theory of Water and Aqueous Solutions Part II: The Role of Water
in Protein Folding, Self-assembly and Molecular Recognition. World Scientific, Singapore
(2011). https://doi.org/10.1142/8154
32. Senn, H.M., Thiel, W.: QM/MM methods for biomolecular systems. Angew. Chem. Int. Ed.
48, 1198–1229 (2009). https://doi.org/10.1002/anie.200802019
33. Barone, V., Improta, R., Rega, N.: Quantum mechanical computations and spectroscopy: From
small rigid molecules in the gas phase to large flexible molecules in solution. Acc. Chem. Res.
41(5), 605–616 (2008). https://doi.org/10.1021/ar7002144
34. Marx, D., Hutter, J.: Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods.
Cambridge University Press, Cambridge (2009)
35. Pastore, G., Smargiassi, E., Buda, F.: Theory of ab initio molecular dynamics calculations.
Phys. Rev. A 44, 6334–6347 (1991). https://doi.org/10.1103/PhysRevA.44.6334
36. Perdew, J.P., Burke, K., Ernzerhof, M.: Generalized gradient approximation made simple. Phys.
Rev. Lett. 77, 3865–3868 (1996). https://doi.org/10.1103/PhysRevLett.77.3865
37. Becke, A.D.: Density-functional thermochemistry. iii. the role of exact exchange. J. Chem.
Phys. 98, 5648–5652 (1993). https://doi.org/10.1063/1.464913
38. Perdew, J.P., Ruzsinszky, A., Csonka, G.I., Vydrov, O.A., Scuseria, G.E., Constantin, L.A.,
Zhou, X., Burke, K.: Restoring the density-gradient expansion for exchange in solids and
surfaces. Phys. Rev. Lett. 100(13), 136406 (2008). https://doi.org/10.1103/PhysRevLett.100.
136406
39. Schwegler, E., Grossman, J., Gygi, F., Galli, G.: Towards an assessment of the accuracy of
density functional theory for first-principles simulations of water II. J. Chem. Phys. 121, 5400
(2004). https://doi.org/10.1063/1.1782074
40. Schwegler, E., Sharma, M., Gygi, F., Galli, G.: Melting of ice under pressure. Proc. Nat. Acad.
Sci. U.S.A. 105(39), 14779–14783 (2008). https://doi.org/10.1073/pnas.0808137105
41. Lazić, P., Atodiresei, N., Alaei, M., Caciuc, V., Blügel, S., Brako, R.: Junolo - Jülich nonlocal
code for parallel post-processing evaluation of VdW-DF correlation energy. Comput. Phys.
Commun. 181(2), 371–379 (2010). https://doi.org/10.1016/j.cpc.2009.09.016
42. Kulik, H.J., Cococcioni, M., Scherlis, D.A., Marzari, N.: Density functional theory in transition-
metal chemistry: A self-consistent hubbard U approach. Phys. Rev. Lett. 97(10), 103001 (2006).
https://doi.org/10.1103/PhysRevLett.97.103001
43. Car, R., Parrinello, M.: Unified approach for molecular dynamics and density-functional theory.
Phys. Rev. Lett. 55, 2471–2474 (1985). https://doi.org/10.1103/PhysRevLett.55.2471
44. Wolf, D., Keblinski, P., Phillpot, S.R., Eggebrecht, J.: Exact method for the simulation of
coulombic systems spherically truncated, pairwise r-1 summation. J. Chem. Phys. 110, 8254–
8282 (1999). https://doi.org/10.1063/1.478738
45. Vanderbilt, D.: Soft self-consistent pseudopotentials in a generalized eigenvalue formalism.
Phys. Rev. B 41, 7892–7895 (1990). https://doi.org/10.1103/PhysRevB.41.7892
46. Giannozzi, P., De Angelis, F., Car, R.: First-principle molecular dynamics with ultrasoft
pseudopotentials: Parallel implementation and application to extended bioinorganic systems. J.
Chem. Phys. 120, 5903–5915 (2004). https://doi.org/10.1063/1.1652017
47. Fattebert, J.L., Gygi, F.: Density functional theory for efficient ab initio molecular dynamics
simulations in solution. J. Comput. Chem. 23(6), 662–666 (2002). https://doi.org/10.1002/jcc.
10069
48. Fattebert, J.L., Gygi, F.: First-principles molecular dynamics simulations in a continuum sol-
vent. Int. J. Quantum Chem. 93(2), 139–147 (2003). https://doi.org/10.1002/qua.10548
49. Petrosyan, S.A., Rigos, A.A., Arias, T.A.: Joint density-functional theory: Ab initio study of
Cr2O3 surface chemistry in solution. J. Phys. Chem. B 109(32), 15436–15444 (2005). https://
doi.org/10.1021/jp044822k
50. Scherlis, D.A., Fattebert, J.L., Gygi, F., Cococcioni, M., Marzari, N.: A unified electrostatic and
cavitation model for first-principles molecular dynamics in solution. J. Chem. Phys. 124(7),
74103 (2006). https://doi.org/10.1063/1.2168456
51. Dabo, I., Cancès, E., Li, Y., Marzari, N.: Towards first-principles electrochemistry. arXiv
preprint arXiv:0901.0096 (2008)
52. Sanchez, V.M., Sued, M., Scherlis, D.A.: First-principles molecular dynamics simulations at
solid-liquid interfaces with a continuum solvent. J. Chem. Phys. 131(17), 174108 (2009).
https://doi.org/10.1063/1.3254385
53. Dziedzic, J., Helal, H.H., Skylaris, C.K., Mostofi, A.A., Payne, M.C.: Minimal parameter
implicit solvent model for ab initio electronic-structure calculations. Europhys. Lett. 95(4),
43001 (2011). https://doi.org/10.1209/0295-5075/95/43001
54. Andreussi, O., Dabo, I., Marzari, N.: Revised self-consistent continuum solvation in electronic-
structure calculations. J. Chem. Phys. 136(6), 064102 (2012). https://doi.org/10.1063/1.
3676407
55. Cococcioni, M., Mauri, F., Ceder, G., Marzari, N.: Electronic-enthalpy functional for finite
systems under pressure. Phys. Rev. Lett. 94(14), 145501 (2005). https://doi.org/10.1103/
PhysRevLett.94.145501
56. Dupont, C., Andreussi, O., Marzari, N.: Self-consistent continuum solvation (sccs): The case of
charged systems. J. Chem. Phys. 139(21), 214110 (2013). https://doi.org/10.1063/1.4832475
752 G. La Penna and O. Andreussi
57. Andreussi, O., Marzari, N.: Electrostatics of solvated systems in periodic boundary conditions.
Phys. Rev. B 90(24), 245101 (2014). https://doi.org/10.1103/PhysRevB.90.245101
58. Timrov, I., Andreussi, O., Biancardi, A., Marzari, N., Baroni, S.: Self-consistent continuum
solvation for optical absorption of complex molecular systems in solution. J. Chem. Phys.
142(3), 034111 (2015). https://doi.org/10.1063/1.4905604
59. Fisicaro, G., Genovese, L., Andreussi, O., Mandal, S., Nair, N., Marzari, N., Goedecker, S.:
Soft-sphere continuum solvation in electronic-structure calculations. J. Chem. Theory Comput.
13(8), 3829 (2017). https://doi.org/10.1021/acs.jctc.7b00375
60. Letchworth-Weaver, K., Arias, T.A.: Joint density functional theory of the electrode-electrolyte
interface: Application to fixed electrode potentials, interfacial capacitances, and potentials of
zero charge. Phys. Rev. B 86(7), 075140 (2012). https://doi.org/10.1103/PhysRevB.86.075140
61. Fortunelli, A., Goddard, W.A., Sha, Y., Yu, T.H., Sementa, L., Barcaro, G., Andreussi, O.:
Dramatic increase in the oxygen reduction reaction for platinum cathodes from tuning the
solvent dielectric constant. Angewandte Chem. Int. Ed. 53(26), 6669–6672 (2014). https://doi.
org/10.1002/anie.201403264
62. Hamada, I., Sugino, O., Bonnet, N., Otani, M.: Improved modeling of electrified interfaces
using the effective screening medium method. Phys. Rev. B 88(15), 155427 (2013). https://
doi.org/10.1103/PhysRevB.88.155427
63. Montemore, M.M., Andreussi, O., Medlin, J.W.: Hydrocarbon adsorption in an aqueous envi-
ronment: A computational study of alkyls on Cu(111). J. Chem. Phys. 145(7), 074702 (2016).
https://doi.org/10.1063/1.4961027
64. Sementa, L., Andreussi, O., Goddard III, W.A., Fortunelli, A.: Catalytic activity of Pt38 in
the oxygen reduction reaction from first-principles simulations. Catal. Sci. Technol. 6(18),
6901–6909 (2016). https://doi.org/10.1039/C6CY00750C
65. Onsager, L.: Electric moments of molecules in liquids. J. Am. Chem. Soc. 58(8), 1486–1493
(1936). https://doi.org/10.1021/ja01299a050
66. Knight, C., Voth, G.A.: The curious case of the hydrated proton. Acc. Chem. Res. 45(1),
101–109 (2012). https://doi.org/10.1021/ar200140h
67. Nosé, S.: A molecular dynamics method for simulations in the canonical ensemble. Molec.
Phys. 52, 255–268 (1984). https://doi.org/10.1080/00268978400101201
68. Frenkel, D., Smit, B.: Understanding Molecular Simulation. Academic Press, San Diego (1996)
69. Wales, D.J.: Energy Landscapes. Cambridge University Press, Cambridge, UK (2003)
70. Laio, A., Gervasio, F.L.: Metadynamics: A method to simulate rare events and reconstruct the
free energy in biophysics, chemistry and material science. Rep. Prog. Phys. 71(12), 126601
(2008). https://doi.org/10.1088/0034-4885/71/12/126601
71. Łuczkowski, M., Kozłowski, H., Stawikowski, M., Rolka, K., Gaggelli, E., Valensin, D.,
Valensin, G.: Is the monomeric prion octapeptide repeat PHGGGWGQ a specific ligand for
Cu2+ ions? J. Chem. Soc., Dalton Trans. 2002, 2269–2274 (2002). https://doi.org/10.1039/
B201040M
72. Burns, C.S., Aronoff-Spencer, E., Dunham, C.M., Lario, P., Avdievich, N.I., Antholine, W.E.,
Olmstead, M.M., Vrielink, A., Gerfen, G.J., Peisach, J., Scott, W.G., Millhauser, G.L.: Molec-
ular features of the copper binding sites in the octarepeat domain of the prion protein. Bio-
chemistry 41, 3991–4001 (2002)
73. Miura, T., Suzuki, K., Kohata, N., Takeuchi, H.: Metal binding modes of Alzheimer’s amyloid
β-peptide in insoluble aggregates and soluble complexes. Biochemistry 39(23), 7024–7031
(2000). https://doi.org/10.1021/bi0002479
74. Furlan, S., La Penna, G., Guerrieri, F., Morante, S., Rossi, G.: Ab initio simulations of Cu
binding sites on the N-terminal region of the prion protein. J. Biol. Inorg. Chem. 12, 571–583
(2007). https://doi.org/10.1007/s00775-007-0218-x
75. Furlan, S., La Penna, G.: Metal ions and protons compete for ligand atoms in disordered
peptides: Examples from computer simulations of copper binding to the prion tandem repeat.
Coord. Chem. Rev. 256, 2234–2244 (2012). https://doi.org/10.1016/j.ccr.2012.03.036
76. Hureau, C., Balland, V., Coppel, Y., Solari, P.L., Fonda, E., Faller, P.: Importance of dynamical
processes in the coordination chemistry and redox conversion of copper amyloid-β complexes.
J. Biol. Inorg. Chem. 14, 995–1000 (2009). https://doi.org/10.1007/s00775-009-0570-0
When Water Plays an Active Role in Electronic Structure … 753
77. Furlan, S., La Penna, G., Perico, A.: Modeling the free energy of polypeptides in different
environments. Macromolecules 41, 2938–2948 (2008). https://doi.org/10.1021/ma7022155
78. Miller, Y., Ma, B., Nussinov, R.: Zinc ions promote Alzheimer Aβ aggregation via population
shift of polymorphic states. Proc. Nat. Acad. Sci. U.S.A. 107(21), 9490–9495 (2010). https://
doi.org/10.1073/pnas.0913114107
79. Furlan, S., Hureau, C., Faller, P., La Penna, G.: Modeling the Cu+ binding in the 1–16 region
of the amyloid-β peptide involved in Alzheimer’s disease. J. Phys. Chem. B 114, 15119–15133
(2010). https://doi.org/10.1021/jp102928h
80. La Penna, G., Hureau, C., Andreussi, O., Faller, P.: Identifying, by first-principles simulations,
Cu[amyloid-β] species making Fenton-type reactions in Alzheimer’s disease. J. Phys. Chem.
B 117, 16455–16467 (2013). https://doi.org/10.1021/jp410046w
81. La Penna, G., Hureau, C., Faller, P.: A Cu-amyloid β complex activating Fenton chemistry
in Alzheimer’s disease: Learning with multiple first-principles simulations. AIP Conf. Proc.
1618(1), 112–114 (2014). https://doi.org/10.1063/1.4897690
82. Kanan, M.W., Nocera, D.G.: In situ formation of an oxygen-evolving catalyst in neutral water
containing phosphate and Co2+ . Science 321(5892), 1072–1075 (2008). https://doi.org/10.
1126/science.1162018
83. Mattioli, G., Giannozzi, P., Amore Bonapasta, A., Guidoni, L.: Reaction pathways for oxygen
evolution promoted by cobalt catalyst. J. Am. Chem. Soc. 135(41), 15353–15363 (2013).
https://doi.org/10.1021/ja401797v
84. Persico, M., Granucci, G.: Continuum Solvation Models in Chemical Physics: From Theory
to Applications. In: Wiley, H. (ed.), Chapter Photochemistry in condensed phase. https://doi.
org/10.1002/9780470515235
85. Mosca Conte, A., Violante, C., Missori, M., Bechstedt, F., Teodonio, L., Ippoliti, E., Carloni,
P., Guidoni, L., Pulci, O.: Theoretical optical spectroscopy of complex systems. J. Electron
Spectrosc. Relat. Phenom. 189(S), 46–55 (2013). https://doi.org/10.1016/j.elspec.2013.02.002
86. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D.,
Chiarotti, G.L., Cococcioni, M., Dabo, I., Dal Corso, A., de Gironcoli, S., Fabris, S., Fratesi,
G., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L.,
Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C.,
Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.:
Quantum Espresso: A modular and open-source software project for quantum simulations of
materials. J. Phys. Condens. Matter 21, 395502 (2009). https://doi.org/10.1088/0953-8984/
21/39/395502, https://www.quantum-espresso.org
87. Giannozzi, P., Andreussi, O., Brumme, T., Bunau, O., Buongiorno Nardelli, M., Calandra, M.,
Car, R., Cavazzoni, C., Ceresoli, D., Cococcioni, M., Colonna, N., Carnimeo, I., Dal Corso, A.,
de Gironcoli, S., Delugas, P., Di Stasio Jr, R.A., Ferretti, A., Floris, A., Fratesi, G., Fugallo, G.,
Gebauer, R., Gerstmann, U., Giustino, F., Gorni, T., Jia, J., Kawamura, M., Ko, H.Y., Kokalj,
A., Kücükbenli, E., Lazzeri, M., Marsili, M., Marzari, N., Mauri, F., Nguyen, N.L., Nguyen,
H.V., Otero-de-la-Roza, A., Paulatto, L., Poncé, S., Rocca, D., Sabatini, R., Santra, B., Schlipf,
M., Seitsonen, A.P., Smogunov, A., Timrov, I., Thonhauser, T., Umari, P., Vast, N., Wu, X.,
Baroni, S.: Advanced capabilities for materials modelling with Quantum Espresso. J. Phys.
Condens. Matter 29(46), 465901 (2017). https://doi.org/10.1088/1361-648X/aa8f79
Electronic Properties of Iron Sites
and Their Active Forms in
Porphyrin-Type Architectures
1 Introduction
M. Radoń (B)
Academic Computer Center CYFRONET AGH, Nawojki 11, 30-950 Kraków, Poland
Present Address:
M. Radoń
Faculty of Chemistry, Jagiellonian University in Krakow,
Gronostajowa 2, 30-387 Kraków, Poland
e-mail: [email protected]
E. Broclawik
Jerzy Haber Institute of Catalysis, Polish Academy of Sciences,
Niezapominajek 8, 30-239 Kraków, Poland
e-mail: [email protected]
© Springer Nature Switzerland AG 2019
A. Liwo (ed.), Computational Methods to Study the Structure and Dynamics
of Biomolecules and Biomolecular Processes, Springer Series on Bio-
and Neurosystems 8, https://doi.org/10.1007/978-3-319-95843-9_23
treating either on QC in general [73, 125, 183] or on DFT [83]. Subsequently, recent
advances in quantum chemical calculations for various heme models are reviewed.
The achievements of correlated ab initio methods in the description of these biologically
relevant iron complexes are highlighted, as well as the relation of these methods to DFT
calculations.
2 Methodology
Ĥ Ψ = EΨ (1)
where Ĥ is the electronic Hamiltonian (the energy operator) of the molecular system.
The problem is then to find the energy E and the (many-electron) wave function Ψ
for the ground state and often also for a certain number of electronically excited
states. Equation (1) is obtained from the basic principles of quantum mechanics
within the so-called Born-Oppenheimer approximation, meaning roughly that one is
interested in obtaining the electronic structure for fixed positions of the nuclei, i.e.,
for a given molecular structure. The energy and wave function obtained from (1)
therefore depend on the nuclear coordinates, and one is often interested in finding
the stationary points of the energy as a function of geometry, because the minima
represent stable geometries of a given molecular system, while the saddle points give
transition states for possible chemical reactions. Although written in a very simple
way, Eq. (1) poses a tremendously difficult many-body problem, even for atoms and
very small molecules, let alone for relatively large systems such as the porphyrin-based
complexes considered in this chapter. Many approximations, the so-called QC methods,
have therefore been devised to provide an approximate energy and wave function
(from which any molecular property can, in principle, be calculated) that are
accurate enough for the purposes of chemistry and biology, while still being
computationally tractable for large molecular systems.
The mother of all these methods is the mean-field approximation to (1), also called
the independent-particle model or Hartree-Fock (HF) theory. In this approximation
the many-electron wave function (Ψ) is given by a Slater determinant (antisymmetrized
product) of one-electron functions, the so-called molecular (spin)orbitals. The optimal
form of the molecular orbitals for a given system can be obtained by applying the
variational principle of quantum mechanics, from which it follows that these variationally
optimal orbitals (φi) are solutions of the so-called Fock equations, i.e., an eigenvalue
problem of the Fock operator (F̂),
$$\hat{F}\,\varphi_i = \varepsilon_i\,\varphi_i \qquad (2)$$
with eigenvalues (εi) known as orbital energies. The Fock operator is an effective
one-particle energy operator that includes not only the kinetic energy and the electrostatic
energy of an electron in the field of the nuclei, but also its average interaction with
the other electrons in the molecular system. This averaging of electronic repulsion,
a consequence of the single-determinant approximation to the wave function, is a
key feature and (as we shall see below) a serious drawback of this method. The
Fock equations (2) are usually solved by expanding the molecular orbitals in a
predefined basis set of atom-centered functions, the so-called atomic orbitals (AO).
The Fock equations are thereby reduced to a generalized eigenvalue problem of
a (symmetric) Fock matrix. However, since the occupied molecular orbitals must be
known in order to generate the Fock matrix, this eigenvalue problem has to be solved
iteratively, until convergence is achieved, in a self-consistent field (SCF) procedure.
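The SCF loop described above can be illustrated with a minimal closed-shell sketch. It is not the implementation used in any particular program: the function names (`build_fock`, `run_scf`, `toy_two_center_integrals`) are hypothetical, and the integral values are illustrative numbers for a symmetric two-center, two-electron model (of the magnitude met in minimal-basis H2 examples), not data from this chapter.

```python
import numpy as np
from scipy.linalg import eigh


def build_fock(h_core, eri, density):
    """Closed-shell Fock matrix F = h + J - K/2 for a given density matrix.

    eri[p, q, r, s] holds the two-electron integral (pq|rs).
    """
    coulomb = np.einsum("pqrs,rs->pq", eri, density)
    exchange = np.einsum("prqs,rs->pq", eri, density)
    return h_core + coulomb - 0.5 * exchange


def run_scf(h_core, overlap, eri, n_occ, tol=1e-10, max_iter=100):
    """Solve F C = S C eps repeatedly until the density matrix stops changing."""
    density = np.zeros_like(h_core)  # first cycle uses the bare core Hamiltonian
    for _ in range(max_iter):
        fock = build_fock(h_core, eri, density)
        orbital_energies, coeffs = eigh(fock, overlap)  # generalized eigenproblem
        occupied = coeffs[:, :n_occ]
        new_density = 2.0 * occupied @ occupied.T  # two electrons per orbital
        converged = np.linalg.norm(new_density - density) < tol
        density = new_density
        if converged:
            break
    else:
        raise RuntimeError("SCF did not converge")
    fock = build_fock(h_core, eri, density)
    energy = 0.5 * np.sum(density * (h_core + fock))  # electronic energy
    return energy, orbital_energies, coeffs


def toy_two_center_integrals():
    """Illustrative (assumed) integrals for two equivalent basis functions."""
    s12 = 0.66
    overlap = np.array([[1.0, s12], [s12, 1.0]])
    h_core = np.array([[-1.12, -0.96], [-0.96, -1.12]])
    eri = np.zeros((2, 2, 2, 2))
    for p, q, r, s in np.ndindex(2, 2, 2, 2):
        n_second = p + q + r + s  # how many indices refer to the second center
        if n_second in (0, 4):
            eri[p, q, r, s] = 0.77  # (11|11) = (22|22)
        elif n_second in (1, 3):
            eri[p, q, r, s] = 0.44  # (12|11) and equivalents
        elif p == q:
            eri[p, q, r, s] = 0.57  # (11|22)
        else:
            eri[p, q, r, s] = 0.30  # (12|12) and equivalents
    return h_core, overlap, eri


h_core, overlap, eri = toy_two_center_integrals()
energy, orbital_energies, coeffs = run_scf(h_core, overlap, eri, n_occ=1)
print(energy, orbital_energies)
```

In this symmetric model the occupied orbital is fixed by symmetry, so the loop converges in two cycles; for a general molecule many cycles (and usually some convergence acceleration) are needed.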
As mentioned above, the main limitation of the HF approach comes from the oversimplified
structure of the wave function (i.e., a single Slater determinant), which reflects
the average treatment of electron-electron repulsion in this method. In other words,
the electrons in the HF approximation behave very much like independent particles.
Oddly enough, they are not fully independent: two electrons with the same
spin cannot occupy the same position in space, because the wave function would vanish
in such a case. This is because the Slater determinant wave function is (correctly)
antisymmetric with respect to the exchange of two electrons. It may be shown that around
the position of a reference electron there is a deficiency in the conditional probability of
finding electrons with the same spin, as compared to the (unconditional) one-electron
probability; this depletion is known as the exchange (or Fermi) hole. Consequently, two
same-spin electrons (↑↑ or ↓↓) show a lower repulsion energy than two electrons
with opposite spins (↑↓) occupying the same pair of molecular orbitals. This effect,
a simple consequence of the (one-half) electronic spin and the Pauli exclusion
principle, is known as electron exchange or Fermi correlation.
The exchange (Fermi correlation) should be clearly distinguished from the other
type of electron correlation, the Coulomb correlation (herein called simply the electron
correlation), which is missing from a single-determinant wave function
and is thus not included in the HF method. Since the electrons are charged particles, they
should avoid each other (regardless of their spins), especially at small distances, where
they repel most strongly. Such correlation of their positions cannot be described
by a single-determinant wave function, whereas for a correct (i.e., correlated) wave
function there should appear a deficiency in the conditional probability of finding other
electrons around the position of a reference electron, regardless of their spin. This
depletion in the conditional probability due to the Coulombic repulsion is defined as the
correlation hole. Therefore, electron correlation (like exchange) always reduces
the interelectron repulsion and thus makes a negative contribution to the total energy.
This contribution, called the correlation energy, can be formally defined as the
difference between the exact energy (in the nonrelativistic approximation) and the HF energy
obtained in the limit of a complete one-particle basis set:
$$E_{\mathrm{corr}} = E_{\mathrm{exact}} - E_{\mathrm{HF}}^{(\mathrm{limit})}. \qquad (3)$$
Although the correlation energy is usually a small fraction of the total energy, it
makes up an important (at times even dominant) part of the energy differences
relevant in chemistry and biology (bonding energies, atomization energies, reaction
energies and barriers, excitation energies, ionization potentials, etc.). In fact, the
exchange and correlation energy is sometimes even called “nature’s glue that keeps atoms
bound in molecules” [121]. The exchange and correlation energy must thus be properly
included in all quantum chemical studies attempting to provide a quantitative description.
However, while even simple HF theory correctly captures the exchange, it
is much more difficult to account for electron correlation. This can be accomplished
either by using a correlated wave function in ab initio quantum chemistry (discussed
in Sect. 2.2) or by employing an approximate exchange-correlation functional in
density functional theory (DFT, discussed in Sect. 2.3). A third approach to electron
correlation, not discussed in this chapter, is quantum Monte Carlo (QMC) [15]. The
philosophy of QMC methods comes down to improving the description of quantum
systems by performing a stochastic search. At the moment, QMC methods are not
as mature as DFT and ab initio methods, and in the bioinorganic area they are far less
popular than these approaches. Nevertheless, the QMC approach is considered promising for
transition metal systems [16, 84], and more advanced QMC simulations for
systems of biological importance may be expected in the near future.
The nonrelativistic approximation has been assumed so far, which may be partly justified
by the fact that for the properties of the first-row transition metals of interest in this
chapter, relativistic effects are often far less important than the uncertainties due to
the approximate description of electron correlation. However, these effects become more
important for heavier elements (e.g., Mo and W sites in some enzymes) and, for the
sake of consistency, they should preferably also be included for the first-row transition
metals [159].
Although very rigorous four-component relativistic methods are available, most
practical calculations mentioned in this chapter rely on simpler two-component
approximations, e.g., methods based on the Douglas-Kroll-Hess (DKH) transformation [10],
in which the relativistic effects naturally fall into two categories: scalar
effects and spin-orbit coupling. As a rule of thumb, scalar effects are far more
important than spin-orbit coupling for calculations of potential energy surfaces to within
chemical accuracy, but the latter becomes crucial for calculations of certain
molecular properties (EPR parameters, magnetic moments, etc.). Instead of being
treated explicitly, scalar relativistic effects can alternatively be described by means
of effective core potentials (ECPs) [17]. The methods of dealing with electron
correlation discussed in the next two sections are relevant for both relativistic and
nonrelativistic calculations.
In order to account for electron correlation, ab initio (Latin: first principles) methods
employ a wave function that is a linear combination of many electronic configurations
(determinants), not just a single configuration (i.e., one Slater determinant) as in
HF theory. The main advantage of this approach is its accuracy and systematic
improvability, since methods of increasing accuracy form a series of well-defined
approximations that (ultimately) converge to the exact solution of the electronic
Schrödinger equation (1). However, with a correlated wave function, high accuracy
can be obtained only at great computational cost (in terms of both
computational time and required resources, such as memory and disk space).
Moreover, this cost increases very rapidly with the size of the problem. For this
reason it is often simply not feasible to perform ab initio calculations at the desired
level of theory for many chemically and biologically interesting systems.
If the wave function is dominated by a single electron configuration, the HF
orbitals provide a good starting point for constructing a correlated wave function in
post-Hartree-Fock methods, also known as single-reference methods. In this approach
the correlated wave function is obtained by supplementing the HF configuration with
other electronic configurations generated from the HF orbitals by virtual excitations
of a certain number of electrons out of the occupied into the unoccupied orbitals
(single excitations i → a, double excitations i, j → a, b, etc.)¹:
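$$\Psi = \Psi_0 + \sum_{i}^{\mathrm{occ.}}\sum_{a}^{\mathrm{virt.}} C_i^a\,\Psi_i^a + \sum_{i,j}^{\mathrm{occ.}}\sum_{a,b}^{\mathrm{virt.}} C_{ij}^{ab}\,\Psi_{ij}^{ab} + \sum_{i,j,k}^{\mathrm{occ.}}\sum_{a,b,c}^{\mathrm{virt.}} C_{ijk}^{abc}\,\Psi_{ijk}^{abc} + \ldots \qquad (4)$$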
1 One should be aware that these formally excited configurations merely serve to describe electron
correlation and have no direct connection here with electronically excited states.
correlation is typically connected with this short-range effect, it also gives rise to
long-range intermolecular dispersion forces, which are in a sense rooted in correlation effects.
A completely opposite picture arises in a situation where several electronic
configurations contribute to the wave function with comparable weights. In such a case
the single-determinant approximation may become qualitatively wrong. This type
of electron correlation, which is due to the near-degeneracy of several electronic
configurations, has been termed static or nondynamical correlation. In contrast to
dynamical correlation, static correlation has a large effect on the molecular orbitals, which
should thus preferably be optimized not for a single (HF) configuration, but rather
for a multiconfigurational wave function. This is the underlying idea of
multiconfigurational/multireference methods (discussed below), which are therefore by design
best suited to treat cases with strong nondynamical correlation.
In many real cases it is clearly not possible to classify the electron
correlation effects unambiguously as “purely dynamical” or “evidently nondynamical”.
Dynamical correlation is a universal phenomenon, occurring in all multi-electron systems
(both atoms and molecules). In contrast, nondynamical correlation is system-specific,
which means that it occurs only in selected systems (or only at certain geometries),
and always together with the ubiquitous dynamical correlation. For rare gas atoms
and for closed-shell, well-behaved molecules near their
equilibrium geometry, the electron correlation is almost purely dynamical. In contrast,
the description of transition metal complexes often suffers from effects which are
described, in a broad sense, as “nondynamical correlation”. This means, in practice,
that low-order post-HF calculations, especially CI or perturbational Møller-Plesset
methods, may fail to provide a good description there. Multiconfigurational methods
clearly provide a better and more consistent approach to treating these effects. On the
other hand, higher-order post-HF methods can still be very useful in many cases.
For instance, several accurate ab initio benchmarks discussed further in this chapter
were obtained with the CCSD(T) method, i.e., a coupled cluster (CC) approach with
full (iterative) treatment of single and double excitations and approximate
(noniterative) treatment of triple excitations appearing in the cluster operator T̂, which
defines the correlated wave function via the exponential ansatz [9]:
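For orientation, the exponential ansatz can be sketched in its standard textbook form. The notation below is an assumption made here rather than the chapter's own: Ψ0 denotes the HF reference determinant, T̂1, T̂2, ... generate all single, double, ... excitations, and the amplitudes t play a role analogous to the coefficients C in Eq. (4). In CCSD the cluster operator is truncated after T̂2.

```latex
\Psi_{\mathrm{CC}} = e^{\hat{T}}\,\Psi_0 , \qquad
\hat{T} = \hat{T}_1 + \hat{T}_2 + \hat{T}_3 + \ldots , \qquad
\hat{T}_1 \Psi_0 = \sum_{i}^{\mathrm{occ.}}\sum_{a}^{\mathrm{virt.}} t_i^a\,\Psi_i^a
```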
this method cannot be (so far) applied even to the simplest iron porphyrin species,
being applicable only to their small mimics [107, 180].
The most important example of nondynamical correlation occurring in transition
metal species is left-right correlation. This type of correlation is characteristic of any
electron pair involved in a covalent bond. The electrons paired in a bonding orbital
tend to separate partially into spatially different regions in order to reduce
their interelectron repulsion. If one electron of the pair is closer to the nucleus
on the left-hand side of the bond, the second electron is more likely to be found near the
nucleus on the right-hand side. The motions of the electrons are thus partially correlated
along a chemical bond, which can give rise to a long-range correlation effect when
the bond is stretched (and ultimately dissociated). The left-right correlation thus
becomes particularly critical in the dissociation limit, where it ensures a physically
correct form of the wave function.
The role of this correlation can be explained intuitively using the example of the hydrogen
molecule (H–H), which, by homolytic breaking of the σ bond, dissociates into two
neutral hydrogen atoms (H· + H·). However, if the dissociation is modeled with
(spin-restricted) HF theory, it leads to a wave function which can be read as
an equal mixture (superposition) of both neutral (H· + H·) and ionic (H⁺ + H⁻, H⁻ + H⁺)
products; the latter two structures should not be present in the dissociation limit,
and their appearance gives rise to an erroneously high energy. A qualitatively
correct description can be provided by a multiconfigurational wave function of the
form:
$$\Psi_{\mathrm{H_2}} = C_1\,(\sigma)^2(\sigma^*)^0 + C_2\,(\sigma)^0(\sigma^*)^2, \qquad (5)$$
which involves not only the configuration with two electrons paired in the bonding
orbital (σ), which is the one used in HF, but also a second configuration with the
two electrons occupying the antibonding orbital (σ∗). If the two coefficients C1, C2
in (5) are regarded as variational parameters, near the equilibrium geometry one
finds C1 ≈ 1 and C2 ≈ 0 (so the HF description is qualitatively correct),
while in the dissociation limit C1 = −C2 = 2^(−1/2) (so both configurations play an
important role and the HF description is qualitatively wrong). Neglect of the second
configuration in the wave function (i.e., neglect of left-right correlation) leads, within
the (spin-restricted) HF method, to a very large error of 150 kcal/mol in the
dissociation energy of H2. Although in this simple (textbook) example nondynamical
correlation becomes important only near the dissociation limit, in the case of transition
metal complexes analogous correlation effects may be pronounced already at the
equilibrium geometry (vide infra).
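The behavior of the coefficients in Eq. (5) can be reproduced with a small numerical sketch that diagonalizes a 2×2 configuration-interaction matrix. The function name and the numerical energies are hypothetical; the only physical assumption is that the coupling between the two configurations is positive, as it is for an exchange-type integral.

```python
import numpy as np


def two_configuration_ci(e_bonding, e_antibonding, coupling):
    """Lowest root of the 2x2 CI problem in the basis {(σ)²(σ*)⁰, (σ)⁰(σ*)²}.

    e_bonding, e_antibonding: energies of the two configurations (diagonal).
    coupling: matrix element between them (off-diagonal, assumed positive).
    Returns (energy, C1, C2) with the sign convention C1 > 0.
    """
    hamiltonian = np.array([[e_bonding, coupling],
                            [coupling, e_antibonding]])
    energies, vectors = np.linalg.eigh(hamiltonian)
    c1, c2 = vectors[:, 0]  # eigenvector of the lowest eigenvalue
    if c1 < 0:
        c1, c2 = -c1, -c2
    return energies[0], c1, c2


# Large gap between the configurations (near equilibrium): C1 ≈ 1, C2 ≈ 0.
print(two_configuration_ci(-1.8, 0.5, 0.1))
# Degenerate configurations (dissociation limit): C1 = -C2 = 2**-0.5.
print(two_configuration_ci(-1.0, -1.0, 0.2))
```

As the diagonal energies approach each other, the weight of the second configuration grows continuously, which is the near-degeneracy mechanism behind nondynamical correlation.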
As may be seen already from the simple example above, a multiconfigurational
wave function can be constructed as a linear combination of all electronic
configurations arising from a given set of active molecular orbitals. The configurations are
obtained by distributing the available active electrons in all possible ways among the
active orbitals. One can then simultaneously optimize the shapes of the molecular
orbitals and the coefficients of the various electronic configurations. This idea is
realized in the complete active space self-consistent field (CASSCF) method by Roos [146].
Orbitals which are not important in the description of static correlation are not put in
the active space, so they are either doubly occupied (inactive orbitals) or virtual
(secondary orbitals) in all the configurations considered. Since the number of electronic
configurations grows rapidly with the number of active orbitals, no more than 16 orbitals
can be active in practice. A modification of the CASSCF method was later proposed as the
restricted active space (RASSCF) method [93]. Here, the active space is divided into three
subspaces: RAS1, RAS2, and RAS3. The middle one, RAS2, plays exactly the same
role as the complete active space in CASSCF calculations. The orbitals in RAS1 are
mostly doubly occupied, and only a limited number of electrons (often two or four)
can be excited out of this set into RAS2 and RAS3. Likewise, the orbitals in RAS3
are almost empty and can be occupied by at most a limited number of electrons
from RAS1 and RAS2. These restrictions serve to eliminate by hand a multitude
of high-energy, less important configurations (which would anyway obtain coefficients
close to zero if they were formally included in the CASSCF wave function),
thus keeping the problem computationally tractable even for relatively large active
spaces.
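The rapid growth of the configuration space with the number of active orbitals can be made concrete with a short counting sketch. The function names are hypothetical; the determinant count is a product of binomial coefficients, and the number of spin-adapted configurations (CSFs) is taken from Weyl's dimension formula, which is a standard result and not quoted in the chapter.

```python
from math import comb


def n_determinants(n_orbitals, n_alpha, n_beta):
    """Slater determinants obtained by distributing alpha and beta electrons."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


def n_csfs(n_orbitals, n_electrons, two_s):
    """Spin-adapted configurations (Weyl's formula); two_s is 2S (0 = singlet)."""
    if (n_electrons - two_s) % 2:
        raise ValueError("n_electrons and two_s must have the same parity")
    a = (n_electrons - two_s) // 2
    b = (n_electrons + two_s) // 2 + 1
    return (two_s + 1) * comb(n_orbitals + 1, a) * comb(n_orbitals + 1, b) // (n_orbitals + 1)


# Singlet active spaces with as many electrons as orbitals.
for n in (2, 6, 10, 14, 16):
    print(n, n_determinants(n, n // 2, n // 2), n_csfs(n, n, 0))
```

For two electrons in two orbitals this gives 4 determinants and 3 singlet CSFs, while 16 electrons in 16 orbitals already give 165,636,900 determinants and 34,763,300 singlet CSFs, which illustrates why the active space cannot be extended much further in a conventional calculation.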
CASSCF or RASSCF calculations (with suitably chosen active orbitals)
serve to capture nondynamical correlation, but they cover only a very small part
of the dynamical correlation. Therefore, the missing correlation effects are included
in subsequent calculations, most typically with multireference second-order
perturbation theory (MRPT2). For this purpose, the Lund group developed the CASPT2
method [5] to include dynamical correlation on top of a CASSCF wave function;
this approach was recently generalized into the RASPT2 method [93], which operates with
a RASSCF-type wave function. A different MRPT2 approach was proposed by Hirao
et al. under the name multireference Møller-Plesset (MRMP2) method [64]. This
method differs from CASPT2 in the choice of the zero-order Hamiltonian and in
the way the first-order correction to the wave function is constructed. However,
for heme and heme-related complexes there is much more experience
with CASPT2 than with the MRMP2 approach [24]. An alternative to these MRPT2
methods is the multireference configuration interaction (MRCI) approach. Due to their
very significant cost, MRCI calculations are in practice limited to MRCISD (with
single and double excitations only), and even this method is too expensive to be
applied to heme models. For such models, only the simplified difference-dedicated CI
(DDCI) methods can be applied, which rely on neglecting those excitations that
are not expected to considerably affect the energy difference between the electronic
states considered [96, 97, 100].
It is vital to note that while CASSCF/RASSCF calculations already provide
a qualitatively correct description of the electronic structure at the correlated level
(the natural orbitals and leading configurations in the CI expansion), the
energetics cannot be trusted until they are corrected for the missing dynamical correlation
effects (most typically by means of CASPT2/RASPT2). Before this step is done, the
CASSCF/RASSCF energetics are usually meaningless.
Concerning correlated ab initio calculations (both with post-HF and multireference
methods), it must be remembered that a large basis set of atomic orbitals is
usually required in order to obtain meaningful results. This is due to the slow
convergence of the correlation energy with respect to the one-particle basis set [62].
Typically, a basis set as large as polarized quadruple-ζ has to be used on the metal and
a triple-ζ one on the ligands (particularly in the first coordination sphere). Additionally,
the energies are often extrapolated to the infinite basis set limit based on results
obtained with two (or more) basis sets of different quality [62]. Special, systematically
convergent basis sets have been devised for a balanced description of the correlation
energy in ab initio calculations, such as the Dunning-type correlation-consistent basis
sets (cc-pVnZ, n = D, T, Q, etc.) [8, 36] or the atomic natural orbital (ANO) basis sets
developed by the Lund group [130, 148].
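A two-point extrapolation of the kind mentioned above can be sketched as follows. The chapter does not state which formula is used; the sketch assumes the commonly used inverse-power form E(X) = E_CBS + A·X^(−3) for the correlation energy, with X the cardinal number of the basis set (3 for triple-ζ, 4 for quadruple-ζ). The function name and the sample energies are hypothetical.

```python
def extrapolate_cbs(energy_small, x_small, energy_large, x_large, power=3.0):
    """Two-point estimate of E_CBS, assuming E(X) = E_CBS + A * X**(-power)."""
    w_small = x_small ** power
    w_large = x_large ** power
    return (w_large * energy_large - w_small * energy_small) / (w_large - w_small)


# Made-up correlation energies (hartree) for triple- and quadruple-zeta sets.
e_tz, e_qz = -0.4889, -0.4953
print(extrapolate_cbs(e_tz, 3, e_qz, 4))
```

Eliminating the unknown prefactor A from the two energies gives the closed-form expression used in the function; the HF part of the energy converges differently and is usually handled separately.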
For a long time, an important bottleneck in ab initio calculations with large basis sets
was the evaluation and processing of two-electron repulsion integrals,
whose number grows with the fourth power of the basis set size. Fortunately, this
problem has been largely mitigated by the development of techniques such as Cholesky
decomposition (CD) [7] and resolution of identity (RI) [193], which avoid explicit use
of the two-electron integrals. The CD and RI approaches, by allowing the use
of much larger basis sets than was previously possible, have already opened a new
route in ab initio calculations for transition metal complexes. The CD approach was
used in most of the CASSCF/CASPT2 calculations mentioned in this chapter.
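The idea behind the Cholesky approach can be illustrated with a generic pivoted (incomplete) Cholesky factorization of a symmetric positive semidefinite matrix. This is a sketch of the numerical technique only, not of its implementation in any quantum chemistry package: the random low-rank matrix below merely stands in for the two-electron integral matrix (pq|rs) indexed by orbital pairs pq and rs, and the function name is hypothetical.

```python
import numpy as np


def pivoted_cholesky(matrix, tol=1e-8):
    """Return L (n x k) with matrix ≈ L @ L.T for a symmetric PSD matrix.

    Columns are added until the largest remaining diagonal element of the
    residual matrix drops to tol or below, so k can be much smaller than n.
    """
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    residual_diag = np.diag(matrix).copy()
    vectors = []
    while len(vectors) < n:
        pivot = int(np.argmax(residual_diag))
        if residual_diag[pivot] <= tol:
            break
        column = matrix[:, pivot].copy()
        for vec in vectors:  # subtract contributions of earlier vectors
            column -= vec * vec[pivot]
        new_vec = column / np.sqrt(residual_diag[pivot])
        vectors.append(new_vec)
        residual_diag -= new_vec ** 2
    if not vectors:
        return np.zeros((n, 0))
    return np.column_stack(vectors)


rng = np.random.default_rng(0)
factor = rng.normal(size=(20, 3))
model = factor @ factor.T  # 20 x 20 matrix of rank 3
chol = pivoted_cholesky(model)
print(chol.shape, np.max(np.abs(model - chol @ chol.T)))
```

Only the pivot columns of the matrix are ever needed, which is why the full set of integrals does not have to be computed or stored when the effective rank is small.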
We anticipate that even more substantial improvements may be expected in the
near future due to developments in local approaches to electron correlation [57, 158]
(allowing efficient treatment of electron correlation in spatially extended systems) as
well as the advent of explicitly correlated approaches, such as CCSD(T)-F12 [1, 82].
Here, the term “explicitly correlated” means that the wave function employed in these
methods depends explicitly on the interelectron distance, in contrast to the traditional
approach, in which this dependence is implicit, i.e., achieved solely through mixing
of many electronic configurations as in Eq. (4). Due to their construction, explicitly
correlated methods can describe dynamical correlation very efficiently even with
moderate-sized basis sets [101]. However, explicitly correlated methods are not yet
widely used in the field of bioinorganic chemistry. Although the CCSD(T)-F12
method has already been applied to simple transition metal systems [69], most of the
calculations for heme and heme-related systems still employ traditional approaches.
Some barriers stem from the (so far) single-reference character of explicitly correlated
approaches (e.g., the CCSD(T)-F12 variant of CCSD(T)), whereas many interesting
bioinorganic problems in fact require multireference methods.
The accuracy and reliability of multireference calculations rely heavily on the underlying
active space (in CASSCF or RASSCF calculations), which should capture all
relevant effects of nondynamical correlation. Ideally, all molecular orbitals originating
from the valence shells should be made active, but this is obviously not possible
for chemically interesting systems (except for very small molecules). Actually, it is
also not obligatory, since typically just a few of them give rise to nondynamical
correlation effects. The orbitals important for the description of nondynamical correlation
are not necessarily the frontier orbitals (HOMO, LUMO, HOMO−1, LUMO+1,
etc.) obtained from a single-configurational treatment. In fact, what matters is the
character of the orbitals, not their orbital energy. This raises the question of the
(somewhat arbitrary) active space selection. With growing computational experience, some
rules were provided for choosing the appropriate active space in transition metal
species [126, 127, 147].
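To see why a full-valence active space quickly becomes impractical, one can count the Slater determinants spanned by an active space of n electrons in m orbitals. The sketch below counts determinants with M_S = S, which is a product of two binomial coefficients; the number of spin-adapted configuration state functions is smaller but shows the same steep growth. The function name and the example sizes are illustrative choices, not taken from the calculations discussed in this chapter.

```python
from math import comb


def cas_determinants(n_electrons, n_orbitals, multiplicity=1):
    """Number of Slater determinants with M_S = S in a CAS(n, m) space."""
    n_unpaired = multiplicity - 1
    if n_unpaired > n_electrons or (n_electrons - n_unpaired) % 2:
        raise ValueError("multiplicity incompatible with electron count")
    n_alpha = (n_electrons + n_unpaired) // 2
    n_beta = n_electrons - n_alpha
    if n_alpha > n_orbitals:
        raise ValueError("too many electrons for this number of orbitals")
    # Alpha and beta electrons are distributed over the orbitals independently.
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


for n, m in [(2, 2), (10, 10), (16, 15)]:
    print(f"CAS({n},{m}) singlet: {cas_determinants(n, m)} determinants")
```

Going from ten to fifteen active orbitals already raises the count from tens of thousands to tens of millions, which is why the active space has to be restricted to the orbitals that actually carry nondynamical correlation.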
A general principle is to make active all molecular orbitals with significant metal
nd character. Thus, for any covalent metal–ligand bond this rule prompts one to make
active both the bonding and the antibonding molecular orbital describing the bond. This is
because an important left-right correlation effect is typically connected with covalent
metal–ligand bonding. As a rule of thumb, the more pronounced the covalent character
(i.e., the larger the mixing of metal nd with the corresponding ligand orbital),
the more important the nondynamical correlation effect is expected to be [127].
Figure 1 shows contour plots of the (bonding and antibonding) molecular orbitals
involved in typical covalent metal–ligand interactions relevant for the iron porphyrin
systems considered in this work. The contour plots refer to natural orbitals
(eigenvectors of the one-particle density matrix) obtained from CASSCF calculations on
the respective species. Panels (a) and (b) show the orbitals of the σ and π components
of the Fe=O bond in iron-oxo porphyrin (one of the models discussed in Sect. 3.3).
The σ bond originates from an end-on interaction between the Fe $3d_{z^2}$ and O $2p_z$
orbitals; the π bond originates from a side-on interaction between the Fe $3d_{xz}$ and
O $2p_x$ orbitals. Panel (c) shows the bonding and the antibonding orbital involved in the
tetradentate σ Fe−N bonding between the Fe atom and the N atoms of the porphyrin ring. This
type of bonding is found in all metal porphyrins as well as in complexes with other
similar macrocyclic ligands (e.g., salen, corrole, corrin, corrolazine). Even though
more complicated than the σ bonding shown in (a), the tetradentate bonding in (c)
is classified as σ too, since it arises from an end-on overlap of Fe $3d_{xy}$ with the
ligand orbital of the respective symmetry (localized mostly on the four N atoms of
the porphyrin ring).
Fig. 1 Pairs of bonding and antibonding orbitals involved in the description of covalent
metal–ligand bonds: (a) the σ component of the Fe–O bond in the ferryl group; (b) one of
the two π components of the Fe–O bond; (c) the tetradentate σ bond between Fe and the
porphyrin nitrogens. The figure gives contour plots of natural orbitals obtained from
CASSCF calculations

The occurrence of left-right correlation implies that the actual occupation number of the
formally doubly occupied bonding orbital is lower than two; in a similar way, if
the antibonding orbital is formally vacant, its actual occupation number is somewhat
larger than zero. This happens because the multiconfigurational wave function
contains (among many others) the configurations in which the electron pair has been
transferred from the bonding to the antibonding orbital (like the wave function for the H2
molecule in Eq. 5). It must be mentioned that the same term “left-right correlation”
can be used to describe nondynamical correlation effects connected to various bonding
situations, including the three different ones shown in Fig. 1. For the tetradentate
σ Fe–N$_\mathrm{porphyrin}$ bond shown in (c) there is, clearly, no “left-hand” or
“right-hand” side of the bond, but the term “left-right correlation” is used in a general
sense (topological rather than strictly geometrical), indicating merely that there are two
ends of the bond considered: the one closer to the metal (inward) and the one closer to
the ligand (outward). It must be stressed that left-right correlation in metal–ligand
bonds is pronounced already in the equilibrium geometry. The actual bond distance
is often imposed by the structure of the ligand or results from a compromise between
metal–ligand and ligand–ligand interactions achieved in the equilibrium geometry.
In complexes with porphyrin-like ligands, the distance between the metal and
the coordinating atoms (N atoms in porphyrins) is determined to a large extent by the
geometry of the macrocycle. Thus, because of the small size of the 3d orbitals of
first-row transition metals, the overlap between the metal and the ligand orbitals is often
far from optimal even for the equilibrium structure. Hence, in the sense of electronic
structure, the metal–ligand bonds may be described as “partially broken” already in
the equilibrium geometry!
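The link between left-right correlation and fractional occupation numbers can be made concrete with the simplest possible model. The sketch below assumes a two-configuration wave function for one electron pair, built from a doubly occupied bonding configuration and a doubly occupied antibonding configuration with coefficients c1 and c2 (an H2-like model, not one of the CASSCF wave functions of this chapter). The function names and coefficient values are hypothetical.

```python
def natural_occupations(c_bonding, c_antibonding):
    """Occupation numbers for Psi = c1 |bonding^2> + c2 |antibonding^2>.

    The two configurations differ by a double excitation, so the one-particle
    density matrix is diagonal in these orbitals, with elements 2*c1**2 and
    2*c2**2 after normalization.
    """
    norm = c_bonding ** 2 + c_antibonding ** 2
    if norm == 0.0:
        raise ValueError("at least one coefficient must be nonzero")
    n_bonding = 2.0 * c_bonding ** 2 / norm
    n_antibonding = 2.0 * c_antibonding ** 2 / norm
    return n_bonding, n_antibonding


def effective_bond_order(n_bonding, n_antibonding):
    """Half the difference of bonding and antibonding occupations."""
    return (n_bonding - n_antibonding) / 2.0


# Hypothetical coefficients: an increasing weight of the doubly excited
# configuration mimics an increasingly "partially broken" bond.
for c2 in (0.0, -0.2, -0.5):
    nb, na = natural_occupations(1.0, c2)
    print(f"c2 = {c2:+.1f}: n_bond = {nb:.3f}, n_anti = {na:.3f}, "
          f"bond order = {effective_bond_order(nb, na):.3f}")
```

In this model the two occupation numbers always sum to two, and the deviation of the bonding occupation from two is a direct measure of the weight of the doubly excited configuration.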
Unless the metal–ligand bonding is very covalent, the bonding orbital is dominated
by a contribution from the ligand and the antibonding one by a contribution from
the metal. This situation is illustrated by the pair of orbitals ($\sigma_{xy}$,
$\sigma^*_{xy}$) shown in panel (c) of Fig. 1. In such a case it is customary to denote
the antibonding orbital simply as metal nd and relate it to the respective nd orbital
appearing in crystal field theory (CFT) considerations. For instance, the $\sigma^*_{xy}$
orbital in panel (c) of Fig. 1 could be (and often is) denoted simply as the
“Fe $3d_{xy}$” orbital, in agreement with its principal iron-3d character. However, if the
bond is more covalent (i.e., the mixing of metal and ligand contributions is more
pronounced, as for the orbitals depicted in panels (a) and (b)), this classification is
neither valid nor useful. It is then necessary to think of both the bonding and the
antibonding orbital as containing significant metal nd character. This is also the
situation found in Fe–O2 and Fe–NO complexes, with particularly covalent metal–ligand
interactions, which are discussed in Sect. 3.2.
The active spaces for transition metal species should also account for the double-shell
effect, i.e., a strong radial correlation in the valence nd shell (an effect especially
important for first-row transition metals due to the small radial extent of their 3d
orbitals). This effect can be tackled by including in the active space an (n + 1)d-type
orbital for each occupied (not otherwise correlated) nd orbital [5, 145]. This requires
extending the active space with up to five extra vacant orbitals. For instance, at the
orientation of the porphyrin ring given above, the Fe $3d_{x^2-y^2}$ orbital is essentially
nonbonding (i.e., not involved in covalent metal–ligand interactions) and is doubly
or singly occupied (at least in low-lying electronic states). One should thus include
in the active space a correlating Fe $4d_{x^2-y^2}$ orbital, which has the same shape as
$3d_{x^2-y^2}$, but a larger radial extent. In contrast, for transition metal complexes it
is usually not necessary to make active the virtual orbitals with (n + 1)s and (n + 1)p
character (i.e., 4s and 4p for the first-row metals), even though these orbitals can be
crucial to properly describe atoms and small molecules in the gas phase. In
coordination compounds these orbitals are strongly destabilized by the ligand field;
because their orbital energy is too high, they typically do not need to be included in the
active space.
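$$\rho(\mathbf{r}) = \left\langle \Psi \middle| \sum_{i=1}^{N} \delta(\mathbf{r} - \mathbf{r}_i) \middle| \Psi \right\rangle \qquad (6)$$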
which depends only on the three spatial coordinates (irrespective of N, the number
of electrons) and directly corresponds to a well-defined physical property. In their
pioneering work, Hohenberg and Kohn proved that, in principle, a density-dependent
energy functional E[ρ] can be used to provide the exact ground-state energy of any
multielectron system, taking care of all electron correlation effects. If this functional
were known, one could perform virtually exact quantum calculations for any molecule of
interest without the need to use a (very complicated) correlated wave function. However,
the precise form of this energy functional is not known explicitly and all
DFT calculations are necessarily based on approximations to it.
In practice, nearly all DFT calculations are based on the Kohn-Sham (KS) method,
in which the unknown energy functional is partitioned into four terms:
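$$E[\rho] = T_s[\rho] + \int \mathrm{d}^3r\, \rho(\mathbf{r})\, v(\mathbf{r}) + J[\rho] + E_{xc}[\rho]. \qquad (7)$$
$$T_s[\rho] = \sum_i^{\mathrm{occ}} \left\langle \phi_i \middle| -\tfrac{1}{2}\nabla^2 \middle| \phi_i \right\rangle, \quad \text{where} \quad \rho(\mathbf{r}) = \sum_i^{\mathrm{occ}} |\phi_i(\mathbf{r})|^2 .$$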
A simple integral in the second term of (7) serves to express the nuclei–electron
attraction, while the third term is a Coulomb interaction of electron density with
itself
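$$J[\rho] = \frac{1}{2} \int \mathrm{d}^3r \int \mathrm{d}^3r'\, \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}. \qquad (8)$$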
where the eigenvalues ($\varepsilon_i^{\mathrm{KS}}$) are the KS orbital energies.
Equation (9) is very similar to the analogous one in HF theory (Eq. 2) and both problems
are solved using a similar methodology (i.e., by the SCF procedure after expanding the
molecular orbitals in a given AO basis set, vide supra). The difference between the HF and
KS theories lies in the form of the effective one-electron Hamiltonian ($\hat{F}$ in HF
vs. $\hat{F}_{\mathrm{KS}}$ in KS theory). The KS operator is a sum of the kinetic energy
operator and an effective one-electron potential
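$$v_{\mathrm{KS}}(\mathbf{r}) = v(\mathbf{r}) + \int \mathrm{d}^3r'\, \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} + v_{xc}(\mathbf{r}), \qquad (10)$$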
consisting of the external potential v(r) (due to the nuclei), the Coulomb interaction with
the electron density, and a third term called the exchange–correlation potential, which is
the functional derivative of the $E_{xc}$ functional with respect to the density. This term
takes into account all exchange and correlation effects to the extent that they are
(approximately) included in $E_{xc}[\rho]$.
It must be stressed that, despite a formal similarity, the Hartree-Fock and Kohn-Sham
theories are physically very different: in HF the exchange is treated exactly
while correlation is entirely neglected; the KS method includes both effects, albeit
approximately. However, the formal similarity of the KS method to HF implies that
DFT calculations, although by construction covering correlation effects, are still
computationally robust and show a favorable scaling with the system size. Moreover,
within the KS formulation the chemically attractive concept of the molecular orbital
survives in DFT. The KS orbitals are self-consistent with $\hat{F}_{\mathrm{KS}}$ (which
covers electron correlation); being thus effectively correlated, they are regarded as
superior to HF orbitals.
Such simple functionals, known as the local density approximation (LDA), can be modeled
by referring to the physics of a homogeneous electron gas. More complicated generalized
gradient approximation (GGA) functionals employ an integrand function
depending not only on the density at a given point, but also on the gradient of the density,
in order to account for the inhomogeneity of the electron gas:
$$E_{x,c}^{\mathrm{GGA}}[\rho] = \int \mathrm{d}^3r\; e_{x,c}^{\mathrm{GGA}}\big(\rho(\mathbf{r}), |\nabla\rho(\mathbf{r})|\big). \qquad (12)$$
The so-called meta-GGA functionals take an even more complicated form, in which
the integrand function depends not only on ρ and |∇ρ|, but also on the Laplacian of
the electron density ($\nabla^2\rho$) or the kinetic energy density.
The integrand for LDA exchange, $e_x^{\mathrm{LDA}}$, is so simple that it can be deduced
from first principles (by considering the scaling relations for the exchange energy), and
the integrand for LDA correlation, $e_c^{\mathrm{LDA}}$, can be parametrized (nearly
exactly) based on Monte Carlo simulations of a homogeneous electron gas. In contrast, the
precise form of the integrand functions for GGA and meta-GGA functionals is not known
and cannot be derived in a systematic way from first principles. Nonetheless,
scaling relations and other known properties of the exact functional provide many clues
in this regard, facilitating the judicious construction of physically reasonable
approximations to $E_x$ and $E_c$ [119]. The approximate functionals created in this way
are normally abbreviated from the names of the authors and the year of publication. For
instance, the B or B88 symbol refers to the exchange functional given by Becke in 1988 and
P86 to the correlation functional created by Perdew in 1986; the complete
exchange-correlation functional that combines both these parts is labeled BP86. The symbol O
(e.g., in the OLYP functional) refers to the exchange functional by Handy and Cohen,
who called it an “optimized exchange”.
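As a small worked example of a local functional, the sketch below evaluates the LDA exchange energy for the exact 1s density of the hydrogen atom by radial quadrature. It assumes the standard spin-unpolarized (Dirac) form of the integrand, $e_x^{\mathrm{LDA}} = -C_x \rho^{4/3}$ with $C_x = \tfrac{3}{4}(3/\pi)^{1/3}$, in atomic units. The grid parameters and function names are arbitrary illustrative choices.

```python
import math

# Dirac exchange constant of the spin-unpolarized LDA (atomic units).
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def hydrogen_density(r):
    """Exact 1s electron density of the hydrogen atom, rho = exp(-2r) / pi."""
    return math.exp(-2.0 * r) / math.pi


def lda_exchange_energy(density, r_max=30.0, n_points=30000):
    """E_x = -C_X * integral of rho**(4/3), for a spherical density.

    Uses the trapezoidal rule on a uniform radial grid with the volume
    element 4*pi*r**2 dr.
    """
    step = r_max / n_points
    total = 0.0
    for k in range(n_points + 1):
        r = k * step
        weight = 0.5 if k in (0, n_points) else 1.0
        total += weight * 4.0 * math.pi * r * r * density(r) ** (4.0 / 3.0)
    return -C_X * total * step


print(f"LDA exchange energy of H(1s): {lda_exchange_energy(hydrogen_density):.4f} hartree")
```

For this density the integral can also be done analytically, giving about −0.213 hartree. The exact exchange energy of this one-electron system is −5/16 hartree (it must cancel the Coulomb self-repulsion J[ρ]), so the example also shows how much a purely local integrand can miss in a strongly inhomogeneous density.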
The functionals mentioned so far are all local in the sense that the integrand of the
exchange and correlation energy depends on the electron density, its gradient, and
possibly higher derivatives only at a given point. In contrast, the exact expression for
the exchange energy, known from HF theory, cannot be written in such a way. The exact
exchange may thus be regarded as a nonlocal functional of the electron density. Mixing
Two factors determine the overall quality of a given quantum chemical calculation:
not only the accuracy of the applied computational method, but also the adequacy
of the molecular model used. The choice of a reliable (i.e., sufficiently large) model
may thus be very important for an accurate description of enzymatic active sites.
However, as mentioned above, the computational cost of correlated ab initio methods
mostly prevents their application to large systems. While DFT methods can nowadays be
applied to very large models (consisting of up to several hundred atoms),
CASSCF/CASPT2 calculations can be performed for mononuclear complexes with
up to ∼50 atoms, and CCSD(T) calculations are feasible only for models about half that
size (∼15 to 25 atoms). This means that the CASSCF/CASPT2 method is suitable for studying
iron complexes with the porphyrin group, but the CCSD(T) method can only be applied
to small mimics of the heme systems, in which the porphyrin is truncated to amidine
or other N-donor chelating ligands (see, e.g., Ref. [107]). Although these models are
clearly oversimplified for direct comparison with experiment, they are still useful for
benchmarking DFT or CASPT2 against CCSD(T) [189].
The use of relatively small models (indispensable for performing efficient ab
initio calculations) may be partly justified by the fact that the electronic properties of
transition metal centers in enzymatic active sites depend predominantly on the nearest
neighborhood of the metal and to a smaller extent on the distal groups (not denying their
overall importance, see below). Thus, the basic features of the electronic structure
and properties of active sites in heme proteins can be modeled by iron–porphyrin
complexes with the axial ligand(s) reflecting the iron ligation state in the active
site. This choice presents an example of a minimal cluster approach to enzyme
modeling (extendable to larger clusters when necessary) [168]. In the simplest, yet
still valuable models, the porphyrin may be considered without the side substituents,
i.e., as a porphin ligand, abbreviated as P; the missing side chains often do
not exert a direct effect on the electronic structure of heme [74]. However, even if
truncating them is a routine approximation, one should be aware that these groups
may cause steric hindrance by which the protein can indirectly modulate the properties
of heme. Moreover, the propionate side chains of protoheme IX are known in some cases
to modulate the electronic structure and actively participate in electron delivery to
the iron center [55]. The axial ligands may also be truncated: e.g., the cysteine ligand
in cytochrome P450 is often modeled as a thiolate, SCH$_3^-$ or even SH$^-$; the histidine
ligand in myoglobin can be modeled as imidazole (Im), substituted imidazole or
even an ammonia molecule (NH$_3$) [90, 138]. Many examples of such model heme
complexes appear in Sect. 3. The geometry of the complexes is usually optimized at
the DFT level and used in subsequent ab initio calculations. Symmetry is often imposed
on the structures of the model complexes to accelerate the calculations and simplify
their interpretation.
The calculations for the model complexes in the gas phase, even if clearly valuable,
of course miss important effects due to the protein environment. If a distal
amino acid group has a direct influence on the studied property (e.g., by forming
a hydrogen bond), it should preferably be included in the cluster model (cf.
Sect. 3.2). A different effect is the polarity of the enzyme bulk, which can be simulated
by means of continuum solvation models (like PCM or COSMO, implemented in
many QC programs). An important parameter in these models is the effective value
of the dielectric constant (ε) of the environment, corresponding to the interior of an
enzyme. Although ε = 5.7 is a frequent choice for enzymes, their “solvation effect”
is known to converge rather quickly with the size of the molecular model, meaning that
the actual value of ε is often not so important [168]. However, the effects of the protein
environment can be most accurately treated in the QM/MM approach. In QM/MM
the most relevant part of the system (corresponding to the cluster or model complex
in the previous approach) is described at the quantum mechanics (QM) level; the QM
part is surrounded by the environment described at the molecular mechanics (MM) level.
A great advantage of QM/MM is that it accounts for polarization of the QM wave function
by the electrostatic field from the MM part. Moreover, this approach can describe
the effect of the enzyme on the structure of the active site. In most of the QM/MM
calculations performed, “QM” actually stands for the DFT (B3LYP) method, but there are
also a few studies reported in Sect. 3 in which ab initio methods were used as the
QM layer. More about various approaches to the modeling of metalloenzymes and their
reactions can be found in the next chapter.
3 Case Studies
A unique feature of many transition metal sites is their spin state isomerism: they can
adopt several spin states (with different numbers of unpaired electrons in the metal
nd orbitals) lying close in energy. This is apparent especially for the first-row (3d)
transition metal species and the iron ones in particular. Both ferrous (Fe(II), $3d^6$)
and ferric (Fe(III), $3d^5$) complexes can exist in three different spin states: the
low-spin (singlet for Fe(II) or doublet for Fe(III)), the intermediate-spin (triplet or
quartet) and the high-spin (quintet or sextet). Likewise, two (or more) spin states can
also be expected in the case of high-valent ferryl species, i.e., those containing an
iron-oxo (Fe=O) group with a formal +IV oxidation state on Fe ($3d^4$).
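The spin states listed above follow from simple electron counting, which can be sketched as below. The function enumerates the possible numbers of unpaired electrons for a d^n configuration in five d orbitals and converts each to the total spin S and the multiplicity 2S + 1. The function name and the output format are illustrative.

```python
def spin_states(n_d_electrons, n_orbitals=5):
    """List (unpaired electrons, S, multiplicity) for a d^n configuration."""
    if not 0 <= n_d_electrons <= 2 * n_orbitals:
        raise ValueError("electron count out of range for the d shell")
    # At most one unpaired electron per orbital; beyond half filling,
    # additional electrons must pair up.
    max_unpaired = min(n_d_electrons, 2 * n_orbitals - n_d_electrons)
    states = []
    for unpaired in range(max_unpaired, -1, -2):
        states.append((unpaired, unpaired / 2, unpaired + 1))
    return states


for label, n in [("Fe(II), d6", 6), ("Fe(III), d5", 5), ("Fe(IV), d4", 4)]:
    multiplicities = [mult for _, _, mult in spin_states(n)]
    print(label, "-> multiplicities", multiplicities)
```

For d6 this gives the quintet, triplet and singlet; for d5 the sextet, quartet and doublet, in agreement with the high-, intermediate- and low-spin states named in the text.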
Which of the possible spin states is actually the ground state for a given complex
results from a delicate balance between two counteracting factors: the splitting
of the nd orbitals by their interaction with the ligands (larger splitting favors electron
pairing) and the exchange interaction between same-spin electrons (which reduces the
electron–electron repulsion if more electrons are unpaired). In “typical molecules”
(e.g., organic or simple inorganic species), close to their equilibrium geometry, the
ground state usually contains a minimal number of unpaired electrons: either no
unpaired electrons (and a singlet ground state) for closed-shell molecules or one
unpaired electron (and a doublet ground state) for free radicals. This is because the
energy splitting between their highest-lying occupied and lowest-lying virtual molecular
orbitals is sufficiently large. However, in transition metal species the molecular
orbitals with predominant metal nd character are often close to degeneracy. In such
a case it may be energetically preferable to promote a certain number of electrons
from lower- to higher-energy orbital(s), and thus to increase the number of unpaired,
same-spin electrons in the system (because the higher spin state benefits from larger
exchange stabilization, i.e., smaller electron–electron repulsion, cf. Sect. 2.1).
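The exchange argument can be made semi-quantitative by counting pairs of same-spin electrons, since each such pair contributes one stabilizing exchange interaction. The sketch below does this count for the d electrons only and treats all exchange integrals as equal in magnitude, which is a deliberate simplification; the function name is illustrative.

```python
from math import comb


def same_spin_pairs(n_electrons, n_unpaired):
    """Number of electron pairs with parallel spins (alpha-alpha + beta-beta)."""
    if n_unpaired > n_electrons or (n_electrons - n_unpaired) % 2:
        raise ValueError("inconsistent electron and unpaired-electron counts")
    n_alpha = (n_electrons + n_unpaired) // 2
    n_beta = n_electrons - n_alpha
    return comb(n_alpha, 2) + comb(n_beta, 2)


# Fe(II), d6: high-spin (4 unpaired), intermediate-spin (2), low-spin (0).
for name, unpaired in [("HS", 4), ("IS", 2), ("LS", 0)]:
    print(name, same_spin_pairs(6, unpaired))
```

For a d6 ion the count falls from 10 in the high-spin state to 7 and 6 in the intermediate- and low-spin states, which is the exchange stabilization that has to be weighed against the orbital splitting.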
As a rule of thumb, a high-spin ground state is expected for metal complexes
with relatively small splitting of the nd orbitals (weak ligand fields). In contrast, in
complexes with significant splitting of the metal nd orbitals (stronger ligand fields)
an intermediate- or low-spin state has lower energy, and occasionally becomes the
ground state. Iron sites in non-heme proteins (with amino acid ligands only) usually
have a high-spin (HS) ground state, while the intermediate-spin (IS) and low-spin
(LS) states lie too high in energy to be accessible at ambient temperatures. In
contrast, the porphyrin ligand gives rise to a larger splitting of the Fe 3d orbitals, by
strongly destabilizing the one pointing directly at the four N atoms of the porphyrin
($3d_{xy}$). As such, the IS and LS states are stabilized (with respect to the HS state) and
placed relatively close in energy, which is believed to be crucial for many biological
functions of iron porphyrin systems [74, 172, 197].
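Whether a higher-lying spin state is "accessible at ambient temperatures" can be estimated from Boltzmann populations. The sketch below is a minimal model that includes only the electronic energy gap and the spin degeneracy 2S + 1 of each state; vibrational and other entropic contributions are ignored. The energies used in the example are hypothetical and chosen only to show the trend.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def boltzmann_populations(states, temperature=298.15):
    """Fractional populations of spin states.

    states: dict mapping a label to (relative energy in kcal/mol,
    spin multiplicity). Only electronic degeneracy is counted.
    """
    weights = {
        label: mult * math.exp(-energy / (R_KCAL * temperature))
        for label, (energy, mult) in states.items()
    }
    partition = sum(weights.values())
    return {label: w / partition for label, w in weights.items()}


# Hypothetical gaps: a well-separated excited state vs. a near-degenerate one.
far = boltzmann_populations({"HS": (0.0, 5), "LS": (10.0, 1)})
near = boltzmann_populations({"LS": (0.0, 2), "HS": (0.5, 6)})
print("10 kcal/mol gap:", far)
print("0.5 kcal/mol gap:", near)
```

With a gap of 10 kcal/mol the upper state is essentially unpopulated at room temperature, whereas for gaps well below 1 kcal/mol both states are populated and the larger spin degeneracy can even favor the state that lies higher in energy.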
As the relative energy and even the ordering of the spin states depend strongly on the
coordination environment, the ground spin state may change in the course
of biochemical reactions, which frequently proceed with attachment or release of a ligand
from the metal coordination sphere. In fact, a number of biologically relevant
transformations involve a change of spin state on the metal. These processes pro-
ceed by crossing from the energy surface of one spin state (the ground spin state
of the reactants) to another one (the ground spin state of the products). Therefore
their reaction energies (thermodynamics) as well as activation energies (kinetics) can
be dominated by relative energy of the two involved spin states [132]. Shaik et al.
recently outlined the importance of exchange stabilization of the transition state for
many reactions proceeding on d-electron metal sites—which can considerably favor
the high-spin channel (the concept of exchange-enhanced reactivity) [166]. All these
arguments show that spin state energetics is a very important issue for understanding
the properties and reactivity of transition metal species.
Unfortunately, spin state energetics for many interesting systems (including some
heme species) cannot be directly obtained from experiment. The same obviously
holds true for the relative energies of spin states along reaction pathways. Therefore,
much effort has been put into theoretical calculations of spin state energetics.
However, although the qualitative principles governing the relative energies of different
spin states (i.e., competition between a tendency to occupy the lower-lying MOs
and a tendency to maximize the number of exchange interactions) may seem intuitive,
quantitative computational prediction of the energy splitting between spin states
turns out to be surprisingly challenging. Notable difficulties are met both in DFT
calculations, in which spin state energetics is functional-dependent (highly variable
from one functional to another) and thus often inconclusive, and in correlated ab
initio methods, where a very high level of theory and a flexible basis set have to be
applied in order to obtain meaningful results [31, 42, 60]. Iron sites in heme and
heme-like coordination environments are no exception to this general rule. In fact, the
experience gathered for these systems clearly indicates that they may belong to the most
difficult problems for computational treatment.
As we shall see below, a remarkable example of these challenges is already found in a basic
motif of many heme enzymes, an iron(II) porphyrin. The four-coordinated ferrous
porphyrin (with either a tetraphenylporphin or an octaethylporphin ring, but with no axial
ligands) has an experimentally established triplet (i.e., IS) ground state [27, 45, 81,
103]. In contrast, five-coordinated complexes in which iron(II) is axially ligated
by an N-donor imidazole ligand have a quintet (i.e., HS) ground state [66, 67],
like the ferrous sites in deoxymyoglobin/deoxyhemoglobin (where Fe is axially
coordinated by the imidazole ring of the proximal histidine) and their functional
models [39, 44, 80, 98]. However, if iron(II) is coordinated by two such imidazole
(or histidine) ligands, the ground state changes to a singlet (i.e., LS), as in the
six-coordinated heme sites involved in electron transfer processes [173]. The LS ground
state is also characteristic of six-coordinated complexes obtained by coordination of
CO, NO, or O2 diatomic molecules to the ferrous heme groups (see Sect. 3.2). By and
large, in ferrous heme groups all three possible spin states of Fe(II) can become
the ground state, depending on the number (and character) of axial ligands.
As discussed in Sect. 2.4, the ferrous heme sites invoked here can be modeled
as FeP (four-coordinated Fe), FeP(Im) (five-coordinated Fe) and FeP(Im)$_2$
(six-coordinated Fe) complexes. Figure 2 shows the principal electronic configurations for
the relevant spin states of the FeP and FeP(Im) models, which have been extensively studied
by means of DFT and CASSCF/CASPT2 calculations (see below). The orbital
occupancies shown in this figure stem not only from theoretical calculations, but
may also be extracted from the interpretation of Mössbauer, magnetic resonance (EPR,
ENDOR) or Raman spectra (see references above). In the case of FeP, neither experimental
results nor theoretical calculations are fully conclusive with regard to the precise
identity of the lowest triplet state: it can be either $^3E_g$ or $^3A_{2g}$ depending on
the method used; both these triplet states are shown in Fig. 2 (see footnote 2).
Figure 2 shows that the degeneracy of the five Fe 3d orbitals is removed
by the ligands. In FeP, due to its high $D_{4h}$ symmetry, the degeneracy of Fe $3d_{xz}$
and $3d_{yz}$ is retained, but it is removed in the less symmetric FeP(Im). The Fe
$3d_{xy}$ orbital points directly at the porphyrin N atoms, therefore it is destabilized
most strongly in both systems. Analogously, the Fe $3d_{z^2}$ orbital is destabilized by
the axial imidazole in FeP(Im). In fact, going back to Sect. 2.2.2, one should rather say
that these two orbitals are involved in covalent, σ-type bonding interactions with the P
and Im nitrogens, but since the mixing of metal- and ligand-based contributions is not
very large in this case, it is customary to denote the antibonding orbitals as Fe $3d_{xy}$
or $3d_{z^2}$. In contrast, Fe $3d_{x^2-y^2}$ is an essentially nonbonding, iron-centered
orbital. The same essentially holds true for the remaining two Fe $3d_{xz,yz}$ orbitals,
although they have appropriate symmetry for some mixing with vacant porphyrin π orbitals.
Although the ground spin state can be identified experimentally for these ferrous
porphyrin systems (based on their Fe−N bond lengths from the crystal structures,
from the interpretation of magnetic properties and spectra, etc.), little is known about
the energetics of their excited spin states. Such information is available for the ferric
heme site in cytochrome P450, where the LS (doublet) and the HS (sextet) states of Fe(III)
lie so close in energy that their spin equilibrium is observed [174]. The ferric site
of P450 is an iron(III) porphyrin complex with an axially coordinated cysteine (Cys),
which corresponds to the FeP(SH) model used in many theoretical calculations. In the
resting state a water molecule is additionally coordinated to Fe (as the sixth ligand)
and the LS state is slightly favored (with a low-lying HS state). In contrast, if this
water is removed (e.g., by a P450 substrate bound in the distal pocket), the ferric
complex becomes HS (with a low-lying LS state). Spin state energetics can be obtained
experimentally for spin-crossover systems (like the one just invoked), where two
(or more) spin states lie so close in energy that their relative populations can be varied
2 The reader should also notice that the LS state is not shown for FeP in Fig. 2. The LS
state discussed below for this complex is a closed-shell singlet,
$(d_{x^2-y^2})^2(d_{xz},d_{yz})^4$, which has an electronic structure analogous to the
singlet state in five-coordinated [FeP(Im)] and six-coordinated heme species.
However, this is not the lowest singlet state of FeP, the latter being instead an open-shell
singlet, $(d_{x^2-y^2})^2(d_{z^2})^2(d_{xz},d_{yz})^2$ [12, 136].
Fig. 2 The scheme of orbital occupancies in the principal configuration for the S = 1, 2
spin states of FeP (a) and the S = 0, 1, 2 spin states of FeP(Im) (b), along with the
orientation of these models in the coordinate system (c). Symmetry labels are given for the
electronic states in accord with the symmetry group of the models ($D_{4h}$ for FeP, $C_s$
for FeP(Im)). The splitting of the d orbitals is shown only schematically, not reflecting
their actual orbital energies
states being compared [117]. Consequently, the free energy of the HS state with
respect to the LS state is most typically lowered by a few kcal/mol as compared to
purely electronic energy difference. If important for accurate comparison of theory
with experiment, these vibrational corrections to free energy can be easily and cred-
ibly modeled based on DFT-computed frequencies [117, 138]. However, it is much
more challenging and usually more important to obtain a correct prediction of the
electronic energy difference.
Table 1 summarizes purely electronic spin promotion energies for ferrous [FeP,
FeP(Im)] and ferric [FeP(SH)] heme models obtained with various computational
methods. This table gives electronic energy differences between the ground state and
the other two spin states for each of the considered complexes, computed for optimum
geometries of the spin states (i.e., adiabatic energies). A number of papers investigated
the spin state energetics of the heme models at DFT level; Table 1 includes only
the most recent results [136, 189], which agree well with the older ones [85, 90,
151, 180, 187]. The most evident conclusion drawn from the DFT results shown in
Table 1 is that they are strongly functional-dependent, with the exchange part of
a functional seeming to matter the most. In summary, the non-hybrid functionals (e.g.,
BP86, PBE) favor the low- and intermediate-spin states with respect to the high-spin
state, and the hybrid functionals (B3LYP*, B3LYP, PBE0) behave in quite the opposite
way. A more detailed analysis (not shown) would reveal that the lowest triplet state
found by non-hybrid functionals is $^3E_g$, while the hybrid functionals (like the ab
initio calculations, more on which below) point to $^3A_{2g}$. Nonetheless, both triplet
states of FeP are in all cases close to degeneracy. Interestingly, the OLYP functional
behaves differently from classical non-hybrid functionals (BP86), yielding
results often closer to those of hybrid functionals. This outstanding performance is
attributed to the exchange part of this functional, OPTX [58]. It is also useful to notice
that TPSS, a meta-GGA functional, behaves for this type of problem very much like
an ordinary GGA (BP86) [137]. It is also quite intriguing that (at least for ferrous
systems) the HS–LS and the HS–IS energy differences are more sensitive to the
choice of functional than the IS–LS one. It seems that transitions involving an electron
promotion from the Fe $3d_{x^2-y^2}$ (nonbonding) to the Fe $3d_{xy}$ (with antibonding
Fe–N$_\mathrm{porphyrin}$ character) orbital are more sensitive to the choice of exchange
functional than other types of spin transitions, like those between the IS and LS states in
the ferrous complexes [136, 138].

Table 1 Relative spin state energetics (kcal/mol) for selected heme models with respect to
their experimental ground states: $^3$IS (triplet) for Fe$^{\mathrm{II}}$P, $^5$HS (quintet)
for Fe$^{\mathrm{II}}$P(Im), and $^6$HS (sextet) for Fe$^{\mathrm{III}}$P(SH), calculated at
the DFT level (with various choices of exchange-correlation functional) and at the ab initio
(CASPT2) level
Fe$^{\mathrm{II}}$P: $^3$IS→$^5$HS, $^3$IS→$^1$LS | Fe$^{\mathrm{II}}$P(Im): $^5$HS→$^3$IS, $^5$HS→$^1$LS | Fe$^{\mathrm{III}}$P(SH): $^6$HS→$^2$LS, $^6$HS→$^4$IS
Even if for FeP the true (triplet) ground state is correctly recovered in all DFT
calculations (except M06), the spin promotion energies differ considerably from one
functional to another, and (due to lack of experimental IS→HS promotion energy)
it is virtually impossible to tell which functional performs the best. For FeP(Im)
only some functionals recover the experimental quintet ground state (after accounting
for thermodynamics, these are the functionals yielding a ⁵HS→³IS energy larger
than ∼−2 kcal/mol). For FeP(SH) the correct result is quasi-degeneracy of the ⁶HS
and ²LS spin states, which is properly recovered by the hybrid functionals (PBE0,
B3LYP, B3LYP*) and OLYP. Since DFT calculations of spin state energetics are
considerably functional-dependent, and therefore not conclusive, while experiment
does not provide sufficient information, there has been a strong drive to perform ab
initio calculations for the heme models [24, 25, 127, 136, 138, 189, 190].
Along this line, the second part of Table 1 shows selected CASSCF/CASPT2
results for the considered heme models [136, 189]. The choice of active space in these
calculations conformed to standard rules for transition metal species (cf. Sect. 2.2.2).³
CASPT2 calculations correctly indicated the HS ground state in case of FeP(Im) and
FeP(SH). Surprisingly, however, they failed to predict the correct (i.e., IS) ground
state for FeP, favoring the HS one by about 5 kcal/mol (∼7 kcal/mol when taking
ZPVE into account) [136]. Although this result may seem not particularly appealing,
it represents a substantial improvement over all preceding ab initio calculations of
the IS–HS energy gap in FeP. For comparison, in 1998–1999 Choe et al. obtained the
quintet state favored over the triplet state by 8.5 and 19.6 kcal/mol, based on their
MRMP2 [24] and CASPT2 [25] calculations, respectively. Given that their calculations
unambiguously pointed to the HS ground state, these authors even suggested
that this could be the actual spin state of FeP while the experiments are misinterpreted
[25]. However, in 2003 Pierloot [127] identified a main source of error in the
previous calculations: the σ Fe–Nporphyrin bonding orbital had not been made active
(which is necessary to describe nondynamical correlation effects associated with
the iron–porphyrin covalent bonding, see Sect. 2.2.2). Based on the “correct” active
space (8in11), she estimated the ⁵A₁g–³A₂g gap at 10.1 kcal/mol [127]. This result
was later refined by using a newer form of the zero-order Hamiltonian in CASPT2
(the so-called IPEA-shifted Hamiltonian, which addressed some imbalance of the
original formulation) and a larger basis set [136], providing the number quoted in
Table 1. The large role played by the σ Fe–Nporphyrin orbital in the active space
illustrates well that spin state energetics in transition metal species has very much to
do with nondynamical (left–right) correlation connected to covalent metal–ligand
bonds. In fact, the spin states which differ in the occupation of the antibonding
metal–ligand orbitals (here: σ* Fe–Nporphyrin, i.e., Fe 3dxy) contain a different amount
of left–right correlation. This is because the antibonding orbital, once it becomes
singly occupied in the HS state, can no longer serve as a correlating orbital for the
electrons paired in the bonding orbital, as it does in the IS/LS state.

³ Specifically, the active space for FeP was obtained by distributing the six d electrons of Fe(II)
in its 3d orbitals with an added double shell (4d), plus a doubly occupied σ Fe–Nporphyrin orbital
to account for covalency of the iron–N(porphyrin) bonding. This selection led to an active space of
8 electrons in 11 orbitals (8in11) for FeP. In the case of FeP(Im), this active space was augmented
with a doubly occupied σ Fe–NIm orbital to account for covalency of the Fe–N(imidazole) bond, thus
yielding a total active space of 10 electrons in 12 orbitals (10in12) [136]. For the ferric model, the
active space was obtained by distributing the five d electrons of Fe(III) in its 3d orbitals with an added
double shell (4d) and three doubly occupied orbitals: σ Fe–Nporphyrin (which plays the same role as
in the ferrous complexes), together with σ Fe–S and π Fe–S (describing covalency of the Fe–SH bond),
yielding a total active space of 11 electrons in 13 active orbitals (11in13) [189].
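The electron and orbital counts quoted in footnote 3 can be checked with a short tally. The sketch below is only bookkeeping: the dictionary layout and the function name `active_space` are illustrative choices, not part of any quantum chemistry package, and the orbital groups are taken directly from the footnote.

```python
# Each entry maps an orbital group to (active electrons, active orbitals).
# The grouping follows footnote 3; names and layout are illustrative only.
FEP = {
    "Fe 3d": (6, 5),                  # six d electrons of Fe(II)
    "Fe 4d (double shell)": (0, 5),   # empty correlating shell
    "sigma Fe-N(porphyrin)": (2, 1),  # doubly occupied bonding orbital
}
FEP_IM = {**FEP, "sigma Fe-N(Im)": (2, 1)}
FEP_SH = {
    "Fe 3d": (5, 5),                  # five d electrons of Fe(III)
    "Fe 4d (double shell)": (0, 5),
    "sigma Fe-N(porphyrin)": (2, 1),
    "sigma Fe-S": (2, 1),
    "pi Fe-S": (2, 1),
}


def active_space(groups):
    """Return (electrons, orbitals) summed over all orbital groups."""
    electrons = sum(n_el for n_el, _ in groups.values())
    orbitals = sum(n_orb for _, n_orb in groups.values())
    return electrons, orbitals


for name, groups in (("FeP", FEP), ("FeP(Im)", FEP_IM), ("FeP(SH)", FEP_SH)):
    n_el, n_orb = active_space(groups)
    print(f"{name}: ({n_el}in{n_orb})")
```

The tally reproduces the (8in11), (10in12) and (11in13) labels. It says nothing about whether a given space is well balanced, which still has to be judged from the CASSCF occupation numbers.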
One might ask whether the CASPT2 error on the IS–HS energy gap in FeP is
specific to this system or carries over to the other heme models as well.
This question was addressed by the subsequent paper from the Leuven group [189],
in which the authors compared behavior of CASPT2 (and DFT) calculations with
very accurate coupled cluster results for small heme-like models shown in Fig. 3: the
first two roughly mimicking FeP(SH) and the third one—FeP(Im). As explained in
Sect. 2.4, the CCSD(T) method is too expensive to be applied to real iron porphyrin
systems, but as the small models capture the main features of iron coordination in the
heme sites, they are good to validate performance of computational methods with
respect to the CCSD(T) calculations. Based on the comparison for these small models,
Vancoillie et al. concluded that the CASPT2 calculations are in close agreement
with CCSD(T) calculations for the ferric heme models, thus presumably they can
be trusted. However, as for FeP, CASPT2 seems to overstabilize the HS state
in the ferrous model by about 4–6 kcal/mol [189]. This effect is also similar to
the CASPT2 behavior for the LS→HS promotion in a spin crossover Fe(salen)NO
complex [138]. Although salen [i.e., N,N′-ethylenebis(salicylimine)] is a non-heme
ligand, this macrocycle binds iron in a mode quite similar to porphyrin (with its two
O and two N atoms instead of four N atoms). Moreover, as in the ferrous heme
complexes, the ²LS→⁴HS spin conversion in Fe(salen)NO comes down to an electron
promotion from the nonbonding (3dx²−y²) to the antibonding (3dxy) Fe 3d orbital (cf.
the ³A₂g→⁵A₁g transition in FeP or the ³A→⁵A transition in FeP(Im), as shown in Fig. 2).
Thus, given the similarity of the electronic structure, the ²LS→⁴HS spin promotion
in the Fe(salen)NO complex can be regarded as very similar to analogous transitions in
heme models with the axial nitrosyl ligand (discussed in Sect. 3.2). From spin-crossover
experiments, a purely electronic energy difference of 2.2 kcal/mol between
the LS (S = 1/2) and the HS (S = 3/2) state of Fe(salen)NO can be obtained [138],
whereas CASPT2 gives −4.9 kcal/mol, predicted with a basis set and active space
comparable to those for the heme systems [138].

Fig. 3 Small models of ferric (a, b) and ferrous (c) heme groups with chelating amidine ligands
(CN₂H₃⁻, C₃N₂H₅⁻) as mimics of the porphyrin ring, which were studied by Harvey and Olah [107]
at the CCSD(T) level and subsequently by Vancoillie et al. [189] at the CASPT2 level
Even though the CASPT2 spin state energetics was found somewhat deficient for
the ferrous heme systems, this is by no means general: this error is not present for the
ferric heme models or for many other transition metal species, for which CASPT2
performs very well [128, 129]. In fact, across the small ferrous and ferric models
studied in [189], CASPT2 provided a better accuracy than any of the tested DFT
methods. Furthermore, it was pointed out that even heavily parametrized Minnesota
functionals (M06, M06-L) [203–205] did not improve over traditional functionals,
like B3LYP and OLYP, which (across the studied systems) led to more systematic
behavior with respect to CCSD(T) [189]. In fact, the M06-L functional is quite
accurate for the ferrous heme but (like M06) it does not behave correctly for
the ferric heme: the LS state is placed nearly 15 kcal/mol above the HS state, whereas
experiment indicates that both spin states should be close to degeneracy. In
addition, the aforementioned CASPT2 error for the spin crossover Fe(salen)NO
complex should be compared with equally large (or even larger) errors of common
density functionals. Actually, only the OPTX-based functionals (e.g., OLYP) seem
to deal with this difficult case, providing the LS–HS gap close to experiment [29,
138].
In the most recent paper from the Leuven group, the FeP system was re-examined
in more detail [190]. It was shown that the triplet ground state (³A₂g) can be correctly
recovered in CASPT2 calculations for FeP by augmenting the active space
with iron semicore orbitals, i.e., 3s and 3p (providing the last CASPT2 number in
Table 1). Intershell correlation between the (3s,3p) and 3d orbitals of first-row
transition metals has long been recognized as important, but in most studies this
effect was treated only at the CASPT2 level, i.e., as a purely dynamical correlation. The
result quoted above, however, suggests that it cannot be properly dealt with by
CASPT2 alone, meaning that the semicore 3s,3p orbitals should preferably be included in
the active space. However, making these four extra orbitals (3s, 3px, 3py, 3pz) active means
a significant enlargement of the active space, which (for species more complicated
than FeP) may quickly become too large for performing CASSCF/CASPT2 calculations.
Fortunately, Pierloot et al. have also demonstrated that the effect of (3s,3p)–3d
correlation in FeP can be adequately treated at the less expensive RASSCF/RASPT2
level by keeping the 3s,3p orbitals in the RAS1 subspace with up to double excitations
allowed [190]. Thus the development of the RASSCF/RASPT2 approach has brought
an optimistic forecast that highly accurate (properly correlated) multireference calculations
can be carried out not only for the small FeP, but also for more complicated heme
models including axial ligands.
In the cited work [190] the effects of spin-orbit coupling (SOC) and the effective
magnetic moment of FeP at the equilibrium structure of ³A₂g were also investigated
within a sum-over-states approach [188] for a manifold of low-lying ligand field
states. Including SOC led to a mixture of ³A₂g (68%), ³Eg (13%), and ⁵A₁g (18%)
as the ground state at the equilibrium structure. The coupling with the orbitally
degenerate triplet state (³Eg) and the quintet state (⁵A₁g) increases the magnetic
moment substantially with respect to the spin-only value for the triplet state
(2√2 μB ≈ 2.83 μB), yielding μeff = 4.43 μB. This value is in excellent agreement
with experimental estimates of 4.4–4.7 μB [35, 179], supporting the high quality
of the underlying electronic structure and energetics. Upon considering SOC, the
electronic state corresponding to the ⁵A₁g structure remains predominantly quintet
and has μeff = 5.57 μB. While the magnetic moments of the electronic states
change considerably, the energies of both SOC eigenstates are only slightly changed
compared to the spin-orbit-free states. The final estimate for the adiabatic ⁵HS→³IS
gap (including the nondynamical effects attributed to Fe 3s3p electrons and the SOC)
becomes −1.8 kcal/mol [190], which agrees with the experimental state ordering,
demonstrating that the spin state energetics can be predicted by CASPT2/RASPT2
with chemical accuracy.
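The spin-only reference value quoted above follows from the textbook expression μ = g√(S(S+1)) μB. The sketch below evaluates it for the triplet and quintet, assuming g = 2 exactly (the free-electron value is closer to 2.0023); the function name `spin_only_moment` is an illustrative choice. It does not reproduce the SOC-corrected values from Ref. [190], which require the sum-over-states treatment described in the text.

```python
import math


def spin_only_moment(S, g=2.0):
    """Spin-only effective magnetic moment in Bohr magnetons: g * sqrt(S(S+1)).

    g = 2.0 is an assumption made here for simplicity.
    """
    return g * math.sqrt(S * (S + 1.0))


# Spin-only values for the two states of FeP discussed in the text.
for label, S in (("triplet (S = 1)", 1.0), ("quintet (S = 2)", 2.0)):
    print(f"{label}: {spin_only_moment(S):.2f} mu_B")
```

For S = 1 this gives 2√2 ≈ 2.83 μB, the value quoted in the text. The computed μeff of 4.43 μB lies well above it, which is the signature of the orbital and quintet admixture introduced by SOC.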
In conclusion, calculation of spin state energetics in heme systems remains a very
challenging problem for QC. At present, no theoretical method applicable to heme
models can be fully trusted in this regard. The CASSCF/CASPT2 calculations with
standard choice of the active space may (in some cases) overstabilize the HS state;
enlargement of the active space with the Fe 3s,3p semicore orbitals was suggested as
a solution. On the other hand, the CCSD(T) calculations, though highly accurate, are
applicable only to small mimics. By contrast, DFT methods are very sensitive to the
arbitrary choice of exchange–correlation potential and no single functional can be
pointed out to perform uniformly the best. The hybrid functionals (B3LYP, B3LYP*)
and the OPTX-based one (e.g., OLYP) perform reasonably well as compared to
CCSD(T) benchmarks, although it is difficult to predict the proper ratio of exact
exchange admixture in the hybrid functionals. The long story of FeP perhaps
teaches that multireference ab initio calculations have large potential,
even though they may go astray if an inadequate (too small) active space is used.
We note that a number of new methods are currently being developed which may
improve the description of spin state energetics in transition metal systems. For instance,
the NEVPT2 method, with an entirely different construction of the zero-order Hamiltonian,
has been suggested as a possible alternative to CASPT2 (RASPT2) [6]. In the DFT
domain one should note promising developments too, for instance the appearance of
double-hybrid functionals (cf. Sect. 2.3, [200]), localized semiempirical corrections
to hybrid functionals [68], and DFT + U methods [2, 86, 155].
Binding of diatomic molecules (XO = O2 , CO, and NO) to iron sites in heme proteins
is important for respiration, sensing and regulatory processes. The ferrous heme
sites in myoglobin (Mb) and hemoglobin (Hb) are employed by all vertebrates for
storage (Mb) and transport (Hb) of molecular oxygen (O2) [109]. This function is
inhibited by poisonous carbon monoxide (CO), which binds to ferrous heme much
more strongly than O2 (and practically irreversibly). Nitric oxide (NO) can be poisonous
as well, especially at high concentrations, while at low concentrations it plays an important
biological role, being involved in intercellular signaling, smooth muscle relaxation,
and other regulatory functions [185]. Importantly, sensing of NO (relevant
for most of its biological roles) comes down to its binding to a ferrous site in soluble
guanylate cyclase (sGC) [185], which initiates subsequent cleavage of the axial Fe–
Nhistidine bond (at trans-to-NO axial position) thus leading to an allosteric transition
of the sensing protein [18]. There is also much interest in bonding of O2 to a heme
site in oxygenases, like cytochrome P450 [33], heme oxygenase (HO) [94], and nitric
oxide synthase (NOS) [149]. In these catalytic cycles the O2 coordination (and its
activation, prerequisite to enter the subsequent reactions) is preceded by a reduction
of the iron from the initial ferric to ferrous state [33, 94, 149]. The binding of
O2 and NO molecules by native (i.e., not reduced) ferric sites, such as in the resting
state of P450, deserves no less interest [38, 178, 182]; nevertheless, for the
sake of brevity the discussion below will be restricted mostly to ligand binding
to the ferrous sites.
For efficient discrimination between CO and O2, the oxygen carriers (Mb/Hb) rely
on the different affinities of these ligands for the ferrous site, because otherwise (in
terms of shape, size, polarity, and diffusion rate) these ligands are very similar and
cannot be efficiently discriminated [109, 177]. In the above terms, NO is also very
similar to CO and O2. But NO, having a single unpaired electron, is more reactive and
prone to engage in specific interactions, which may seriously affect its mobility and
lifetime in biological systems. The strikingly different magnetic properties (CO is
diamagnetic, whereas NO and O2 are paramagnetic) are not expected to affect permeation
of these ligands towards the binding site because magnetic interactions do not play a
major role in this process. Given the arguments above, it is interesting and important for
QC to reliably reproduce the binding energies of the three XO ligands to heme by means of
electronic structure calculations. Moreover, it is equally crucial to quantify the
role of weaker interactions of the XO ligands with distal residues in the protein [11]. As
we shall see below, even if the latter interactions are rather well understood, it is still
very challenging to calculate the XO bonding energies to heme in good agreement
with experimental data.
Structural features of the XO bonding to ferrous heme are well known and have
already been thoroughly discussed in the literature [26, 85, 151, 152]. All three XO
ligands coordinate to Fe(II) in an end-on manner (CO and NO via their X atom). The
CO molecule coordinates linearly (the optimal Fe−C−O angle is ∼180°), while O2
and NO prefer bent coordination (with the Fe−O−O angle ∼120° and the Fe−N−O
angle ∼140°). The structures of the heme–XO complexes, known from the crystal
structures of proteins [13, 102, 122–124, 160, 161] and their functional models [26,
28, 72], are well reproduced by DFT [11, 91, 136, 142, 152] and DFT/MM [20, 150,
171] calculations. Some functionals (e.g., BP86) were claimed to perform quantitatively
better than others in reproducing the experimental structures [142], but it
is noteworthy that various DFT methods predict very similar and reliable
structures for the heme–XO models [2].⁴
Although it is relatively easy to obtain reliable structures of the heme–XO com-
plexes from DFT calculations, the same is not true about the energetics of the ligand
binding. As will be shown below, the Fe–XO bond dissociation energies are strongly
functional-dependent [136] and thus not really conclusive [11, 152, 169, 181]. Two
main reasons were pointed out to rationalize the origin of these difficulties [136].
First, nondynamical correlation and dispersion effects [169] play a major role in the
formation of the weak yet strongly covalent Fe−XO bond, and the description of both
effects within DFT is questionable. Second, the bonding of all three ligands is accompanied
by a change of the spin state on the iron: from the HS quintet to the LS state, either
singlet (diamagnetic) in the Fe–O2 and Fe–CO complexes or doublet in the Fe–NO
complexes. As discussed in the previous section, reliable theoretical description of spin
state energetics poses a big challenge for QC. The same problems arise for the XO binding
energies to heme, to which the energy of spin conversion on iron contributes a significant
part.
Focusing on the advances in quantum-chemical description of the heme–XO bond-
ing, one should not forget about the effects of distal protein residues on the ligand
bonding. Historically, these “protein effects” were recognized prior to any quantum
calculations: by inspection of the enzymes' crystal structures and from experiments
comparing the binding properties of the wild-type enzyme (Mb) with those of its mutants
(where certain amino acid residues were selectively changed) and of simple heme
compounds (chelated protoheme) [109, 110, 176]. Subsequent DFT and DFT/MM
calculations [11, 20, 150] confirmed (and slightly corrected) the mechanism inferred
from earlier experiments. According to present knowledge, in the case of myoglobin
the predominant effect is caused by a distal histidine (His64, shown in Fig. 4). In
deoxymyoglobin this histidine pulls a water molecule into the distal pocket, thus
binding of the XO ligands first requires displacing this water. This costs around
1–2 kcal/mol and gives rise to an inhibition effect of this size for all three ligands.
However, the distal histidine also stabilizes the adsorbed XO molecules as
compared to free heme. The CO ligand is only very weakly stabilized in the protein
and the inhibitory effect dominates. It was even postulated earlier that CO may be
⁴ A minor exception to this rule, reported for structures obtained from some hybrid functionals
for the Fe–O2 complexes, is discussed in Ref. [136].

Fig. 4 View of the active site of oxymyoglobin (heme–O2 complex) from Physeter catodon (PDB
code: 1MBO), showing the distal histidine (His64). The dashed lines indicate a possibility of
hydrogen bonding (the hydrogen atom is not shown in the X-ray structure) between the distal His64
and the adsorbed O2 ligand bearing partial superoxide character due to participation of the
Weiss-type resonance structure, Fe(III)–O2⁻ (see Sect. 3.2.2)

ment can be due to either limitations of the theoretical calculations (e.g., too small
models, not considering entropic effects properly) or to erroneous or misinterpreted
experiments. Therefore, it was proposed to estimate protein effect on the O2 binding
as an average of the available theoretical and experimental results, i.e., to assume
a value of ∼6 kcal/mol for the protein effect [136]. Effects qualitatively similar to those
for myoglobin are expected to play a role for hemoglobin as well [161].
In brief, the predominant protein effect shows up in changing the ligand binding
energies as compared to free heme due to hydrogen bonding with distal histidine.
Consequently, the heme–O2 complexes acquire a significant extra stabilization in the
protein environment (∼6 kcal/mol), the CO bonding is inhibited by ∼1 kcal/mol,
while the NO bonding energy is almost unaffected by the protein environment as
compared to protoheme. Moreover, the protein also has some effect on the molecular
structures of the Fe–XO complexes, most pronounced on the labile degrees of free-
dom (rotation of the XO group, orientation of the histidine imidazole) which are very
sensitive to weak interactions; nevertheless, it does not change the general features
of XO coordination to heme.
A comparative study of XO bonding to heme models was carried out by Radoń and
Pierloot [136] at DFT level (with several exchange–correlation functionals) and with
CASPT2 method. In this study, the heme group was modeled as porphin ring (P) with
axially coordinated imidazole ligand (Im), in accord with the FeP(Im)(XO) models
(X=C, N, O). In addition to six-coordinated complexes, five-coordinated FeP(XO)
complexes (i.e., without the axial Im) were also studied (Fig. 5). Geometries of the
complexes were optimized at the DFT level (PBE0 and BP86 functionals) and used in
subsequent CASSCF/CASPT2 calculations. At both levels of theory the largest
basis set used corresponded to polarized quadruple-ζ quality on Fe and polarized
triple-ζ quality on the ligands.⁵
The bond dissociation energies (BDEs), i.e., energies of the reaction

    heme–XO → heme + XO,    (14)

where “heme” stands for either FeP or FeP(Im), were calculated assuming the ground
spin state for all molecules: high-spin for FeP(Im), intermediate-spin for FeP, and
low-spin for all the heme–XO complexes. The computed BDEs were corrected for the
difference in zero-point vibrational energies between the products and the reactant in
(14) as well as for basis set superposition error (BSSE). Despite the use of a large basis set,
the BSSE correction at the CASPT2 level was still found to be considerable, 7–9 kcal/mol,
much larger than at the DFT level. In addition, since the iron spin state changes upon
ligand binding, the binding energies with respect to the low-spin state (not the actual
ground state) of the respective heme were also calculated. The relation between
the binding energy with respect to either the ground state ($\Delta E_{\mathrm{BDE}}$) or the
singlet state ($\Delta E_{\mathrm{BDE}}^{(0)}$) of heme is given by:

    \Delta E_{\mathrm{BDE}} = \Delta E_{\mathrm{BDE}}^{(0)} - \Delta E_{\mathrm{sp}},    (15)

where $\Delta E_{\mathrm{sp}}$ is the adiabatic spin-pairing energy, i.e., the energy difference between the
low-spin and the actual ground state of the respective heme model (see Table 1 in
Sect. 3.1). The $\Delta E_{\mathrm{BDE}}^{(0)}$ contribution can be interpreted as the “intrinsic” binding energy
with respect to heme promoted to the spin state with the most similar electronic structure
to that in the heme–XO adduct [136].

⁵ In CASPT2, due to its large computational cost, higher polarization functions were removed for
the H and C atoms of the P and Im ligands, keeping the fully polarized triple-ζ quality only in the
first coordination sphere of Fe and on the XO ligand.

Fig. 5 Structures of the ferrous heme models (a, e) and of their complexes with the XO ligands
(b–d, f–h)
The active space for the CASSCF/CASPT2 calculations chosen in Ref. [136] conformed
to general rules for transition metal compounds (outlined in Sect. 2.2.2).
The spin-pairing energy ($\Delta E_{\mathrm{sp}}$) in FeP and FeP(Im) was obtained using the same
active spaces as described in Sect. 3.1. In the FeP(XO) and FeP(Im)(XO) complexes,
new covalent metal–ligand bonding is found (i.e., Fe–XO), giving rise to additional
nondynamical correlation effects which can be accounted for by extending the active
space with the appropriate valence orbitals on XO: σ, π, and π*. However, including
all of the above orbitals on top of the active spaces of the parent heme complexes turned
out to be impossible, not only because of the unacceptable computational cost of CASPT2
calculations with such a large active space, but also because of unfavorable orbital
rotations experienced already during the CASSCF step (where orbitals with occupation
numbers close to either two or zero tended to rotate out of the active space in favor
of Fe 3s,3p core orbitals or antibonding ligand-centered orbitals). Therefore, after
many test calculations, some of these less important orbitals were removed
from the active space (made either inactive or virtual), which made it possible to find
tractable and computationally stable active spaces for the description of heme–XO
complexes [136].⁶
⁶ First, since the occupation of the Fe 3dxy orbital is always small, the corresponding double-shell
Fe 4dxy orbital was removed from the active space. For XO=CO the Fe 3dz² orbital was also practically
unoccupied, thus the corresponding Fe 4dz² orbital was not active either. In contrast, all five CO
orbitals (σ, π, and π*) appeared necessary, which led to an active space of 14 electrons in 14
orbitals (14in14) for FeP(CO) and 16 electrons in 15 orbitals for FeP(Im)(CO). For XO=NO or O2, the
explicit σ orbital describing σ-donation became less important, thus only the π and π* orbitals of
NO and O2 were
This example illustrates that although the general principles governing the choice of
active orbitals (Sect. 2.2.2) are rather clear and quite intuitive, it is often neither trivial
nor straightforward to find a well-balanced active space for larger complexes.
The active space constructed for each heme–XO species was used to calculate the
bonding energy $\Delta E_{\mathrm{BDE}}^{(0)}$ by subtracting the CASPT2 energy of the respective complex
from the sum of CASPT2 energies of the isolated XO and the proper heme species
(in the closed-shell singlet state). Subsequently, the $\Delta E_{\mathrm{sp}}$ term was subtracted to
yield the bonding energy $\Delta E_{\mathrm{BDE}}$ by Eq. (15).
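The two-step procedure just described can be sketched as follows. All total energies in the example are hypothetical placeholders in hartree, chosen only to exercise the arithmetic; the function names are illustrative, and the zero-point and BSSE corrections mentioned in the text are deliberately left out.

```python
HARTREE_TO_KCAL = 627.509  # approximate conversion factor, kcal/mol per hartree


def intrinsic_bde(e_heme_ls, e_xo, e_complex):
    """Binding energy with respect to low-spin (singlet) heme, in kcal/mol."""
    return (e_heme_ls + e_xo - e_complex) * HARTREE_TO_KCAL


def spin_pairing(e_heme_ls, e_heme_gs):
    """Adiabatic spin-pairing energy: low-spin minus ground-state heme, kcal/mol."""
    return (e_heme_ls - e_heme_gs) * HARTREE_TO_KCAL


def bde(e_heme_ls, e_heme_gs, e_xo, e_complex):
    """Eq. (15): intrinsic binding energy minus the spin-pairing energy."""
    return intrinsic_bde(e_heme_ls, e_xo, e_complex) - spin_pairing(e_heme_ls, e_heme_gs)


# Hypothetical total energies (hartree), not taken from Ref. [136].
E_HEME_LS = -100.000   # heme in the closed-shell singlet state
E_HEME_GS = -100.010   # heme in its actual ground spin state
E_XO = -50.000         # isolated XO ligand
E_COMPLEX = -150.050   # low-spin heme-XO complex

print(f"intrinsic BDE: {intrinsic_bde(E_HEME_LS, E_XO, E_COMPLEX):.2f} kcal/mol")
print(f"spin pairing:  {spin_pairing(E_HEME_LS, E_HEME_GS):.2f} kcal/mol")
print(f"BDE, Eq. (15): {bde(E_HEME_LS, E_HEME_GS, E_XO, E_COMPLEX):.2f} kcal/mol")
```

Splitting the calculation this way keeps the part that is sensitive to spin state energetics (the spin-pairing term) separate from the intrinsic metal–ligand bonding term, which is how the text later analyses the errors of individual methods.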
Table 2 shows the ligand binding energies calculated in Ref. [136] versus their
experimental estimations. The experimental BDE in gas phase is directly available
only for the FeP(NO) complex [23]. By contrast, the experimental BDEs given for
the FeP(Im)(XO) complexes are those estimated earlier by Blomberg et al. [11]
from kinetic (dissociation barriers) and thermodynamical (equilibrium constants)
data for either chelated protoheme or myoglobin [109, 176]; the latter are consistently
corrected for the absence of the protein environment in the present computational
model (vide supra).
Even a first look at Table 2 shows that the BDEs calculated at the DFT level are
very sensitive to the exchange-correlation functional, particularly to its exchange
part. For all three ligands, the hybrid functionals (PBE0, B3LYP, B3LYP*) give
much lower bonding energies than the nonhybrid functionals (BP86, PBE, OLYP).
The BDEs from hybrid DFT methods clearly correlate with the amount of exact
exchange included in various functionals; in general the BDEs from PBE0 (25%)
are smaller than the BDEs from B3LYP (20%), which are smaller than the BDEs from
B3LYP* (15%). This simple trend holds for all complexes except FeP(O2). Classical
non-hybrid functionals (e.g., BP86) strongly overbind in all cases, which is
rather typical behavior for them. By contrast, the hybrid functionals
(except B3LYP*) greatly underbind all three ligands. The rather poor performance
of the famous B3LYP functional is partially corrected by its reparametrized version,
B3LYP*. While the binding energies are clearly improved for CO and O2 , B3LYP*
still underbinds NO, both in FeP(NO) and in FeP(Im)(NO). In sum, the best DFT
BDEs were obtained with OLYP and B3LYP* functionals, but even these two func-
tionals have still noticeable difficulties: OLYP in reproducing the Fe–O2 BDE, while
B3LYP* in reproducing the Fe–NO BDE. In case of OLYP and O2 , the discrepancy
might, in fact, arise from limited accuracy of the experimental data (i.e., problems in
estimation of the protein effect for O2 , vide infra). Nonetheless, in case of B3LYP*
the problems appearing for both FeP(NO) and FeP(Im)(NO) indicate a failure of this
functional in providing the correct Fe–NO BDE.
made active. On the other hand, in both the oxyheme and nitrosylheme complexes the Fe 3dz² orbital
was at least partially occupied, thus the corresponding Fe 4dz² double-shell orbital was found
important and added directly for the FeP(NO) and FeP(O2) complexes, which led to active spaces of
(13in14) and (14in14), respectively. By contrast, for FeP(Im)(NO) and FeP(Im)(O2), adding it on top
of Im σ turned out to be unfeasible. Thus, here the effect of Fe 4dz² on the binding energy
$\Delta E_{\mathrm{BDE}}^{(0)}$ had to be estimated from separate calculations with Fe 4dz² either active or not,
but without Im σ active, and used as a mere correction to the results obtained with Im σ active and
Fe 4dz² virtual [i.e., employing (15in14) for FeP(Im)(NO) or (16in14) for FeP(Im)(O2)].
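The active-space labels quoted in footnote 6 can be reproduced by simple counting, starting from the parent spaces (8in11) and (10in12). The sketch below is bookkeeping only. The electron counts assigned to the ligand orbitals (a doubly occupied σ lone pair on CO, four π electrons on each ligand, and zero, one or two π* electrons on CO, NO and O2) are the usual free-ligand occupations and are an assumption of this example, as are all names used in the code.

```python
# Parent active spaces: (active electrons, active orbitals).
PARENT = {"FeP": (8, 11), "FeP(Im)": (10, 12)}

# Ligand valence orbitals made active: (electrons, orbitals) per orbital set.
# Free-ligand occupations are assumed here for the electron count.
LIGAND = {
    "CO": {"sigma": (2, 1), "pi": (4, 2), "pi*": (0, 2)},
    "NO": {"pi": (4, 2), "pi*": (1, 2)},
    "O2": {"pi": (4, 2), "pi*": (2, 2)},
}


def xo_active_space(heme, xo, drop_4dz2):
    """Count electrons and orbitals for a heme-XO active space.

    Fe 4dxy is always removed; Fe 4dz2 is removed when drop_4dz2 is True.
    Both are empty correlating orbitals, so only the orbital count changes.
    """
    n_el, n_orb = PARENT[heme]
    n_orb -= 1                      # Fe 4dxy removed in all complexes
    if drop_4dz2:
        n_orb -= 1                  # Fe 4dz2 made virtual
    for add_el, add_orb in LIGAND[xo].values():
        n_el += add_el
        n_orb += add_orb
    return n_el, n_orb


CASES = [
    ("FeP", "CO", True),
    ("FeP(Im)", "CO", True),
    ("FeP", "NO", False),
    ("FeP", "O2", False),
    ("FeP(Im)", "NO", True),
    ("FeP(Im)", "O2", True),
]
for heme, xo, drop in CASES:
    n_el, n_orb = xo_active_space(heme, xo, drop)
    print(f"{heme}({xo}): ({n_el}in{n_orb})")
```

With these assumptions the counts match the labels in the footnote: (14in14) and 16 electrons in 15 orbitals for the CO complexes, (13in14) and (14in14) for FeP(NO) and FeP(O2), and (15in14) and (16in14) for the six-coordinated NO and O2 complexes.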
By contrast, the CASPT2 method provides satisfactory results for all three ligands and
for both five- and six-coordinated complexes. While in the case of FeP(NO) the CASPT2
BDEs are too large by 2–5 kcal/mol, the CASPT2 results for the six-coordinated
complexes are very close to the experimental estimates (and systematically somewhat
smaller). The overall good performance of CASPT2 is achieved because this
multireference method (with the present, balanced choice of active spaces) provides
a correct description of the heme–XO bonding in which static correlation plays a great
role. A particularly good (nearly quantitative) agreement is obtained for CO and NO
bonding, but this might be (in part) due to error cancelation (vide infra).
A more in-depth discussion provided in Ref. [136] reveals the role of both contributions,
namely $\Delta E_{\mathrm{BDE}}^{(0)}$ and $\Delta E_{\mathrm{sp}}$, in determining the actual value of the BDE for various
computational methods. In brief, the hybrid DFT methods lead to much lower $\Delta E_{\mathrm{BDE}}^{(0)}$
(i.e., weaker bonding) than the nonhybrid ones (i.e., stronger bonding), consistently
for both five- and six-coordinated complexes. In contrast, the behavior of the spin-pairing
energy ($\Delta E_{\mathrm{sp}}$) is different for the two types of complexes (in line with the discussion in
Sect. 3.1): for FeP the spin-pairing energy is nearly constant for various function-
the triplet, quintet, and septet spin states (characterizing the asymptotic limits with
isolated quintet FeP(Im) and triplet O2 species as dissociation products). From the
energy of the short-distance minimum (Fe–O2 distance of 1.8 Å) on the singlet curve,
the authors estimated the Fe–O2 BDE as 14.9 kcal/mol.⁷ However, the energy curves
for the higher spin states (triplet, septet) generated additional minima at longer Fe–O2
distances, corresponding to weakly bound deoxyheme···O2 complexes. These van
der Waals minima were not observed in the previous DFT calculations, presumably
because DFT does not describe the dispersion forces properly [22]. The existence of
the long-distance minima was pointed out as important for the kinetic reversibility
of the heme–O2 binding.
The Pauling model is consistent with the diamagnetism of oxyheme and explains the
bent geometry of the Fe–O–O fragment. Nonetheless, based on spectroscopic data
and chemical properties of synthetic Fe–O2 complexes, Weiss argued [194] that the
iron in oxyheme is most likely oxidized to Fe(III), while O2 is reduced to a superoxide
form ($\mathrm{O_2^-}$). The resonance structure proposed by Weiss to better describe the electronic
properties of oxyheme was thus
7 It should be noted that this number does not include a BSSE correction, which, in view of Ref. [136], is
expected to reduce the BDE considerably.
where the two unpaired electrons—one on the low-spin Fe(III) and the other
on superoxide—couple antiferromagnetically to yield the global singlet state. Yet
another formulation of oxyheme as
a realistic oxymyoglobin model. The QM region in the latter calculations was limited
to FeP(Im)(O2) and the distal histidine (modeled as imidazole), whereas the
rest of the protein was simulated by point charges obtained from DFT:B3LYP/MM
calculations. The DFT/MM calculations were also used to optimize the structures for
subsequent use in the CASSCF/MM energy calculation. The (14in12) active space
was used, similar to the active spaces used in the other studies of oxyheme complexes
[77, 142, 198] and to the one described in the previous section [136], but
without the $\sigma_{\mathrm{Fe-N_{Im}}}$ orbital and with only two Fe double-shell orbitals.
Figure 6 shows the three key orbitals describing the Fe–O2 bond in oxy-Mb,
reprinted from the study of Chen et al. [20]. The middle one (labeled φ4 in Fig. 6) is
a bonding orbital corresponding to the overlap of the O2 $\pi^*$ orbital (i.e., the one lying in the
Fe–O–O plane) with Fe $3d_{z^2}$. This molecular orbital thus describes a σ bonding between
the Fe and the proximal O atom. The corresponding antibonding orbital ($\sigma^*_{\mathrm{Fe-O}}$) is
not shown in Fig. 6 for clarity, although both orbitals were active in CASSCF. The
other two orbitals (labeled φ3 and φ8 in Fig. 6) are the bonding and the antibonding
combinations arising from the interaction of O2 $\pi^*_{\perp}$ (i.e., the one perpendicular to the
Fe–O–O plane) with Fe $3d_{yz}$. These two orbitals thus describe a π bonding between
the iron and the O2 fragment. The CASSCF wave function of the oxy-Mb model is
dominated (∼90%) by the two main configurations, both having the $\sigma_{\mathrm{Fe-O}}$ orbital
doubly occupied (and the corresponding antibonding $\sigma^*_{\mathrm{Fe-O}}$ vacant), but differing in the
occupation of the π(Fe–O2) and π*(Fe–O2) orbitals:
$$\Psi_{\text{oxy-Mb}} = C_1\,\ldots(\pi_{\mathrm{Fe-O_2}})^2(\pi^*_{\mathrm{Fe-O_2}})^0 - C_2\,\ldots(\pi_{\mathrm{Fe-O_2}})^0(\pi^*_{\mathrm{Fe-O_2}})^2 \tag{16}$$
(where the three dots represent the closed-shell part of the wave function, common to
the two configurations). We note that a qualitatively similar, two-configurational
description of oxyheme was also established in the earlier CASSCF studies [77,
198], prior to the cited work by Chen et al.; however, it was not always interpreted
correctly. In particular, although the wave function in (16) is dominated by a closed-shell
configuration (|C1| > |C2|), it does not describe a closed-shell electronic
structure, and one must not automatically view this wave function as conforming to
the Pauling model (cf. Refs. [76, 77]).
Being well aware of these interpretational caveats, Chen et al. transformed the
two-configurational wave function of (16) into a generalized valence bond (GVB)-
type wave function with two new orbitals, obtained as combinations of π (Fe–O2 ) and
$$\Psi_{\text{oxy-Mb}} = \frac{1}{2\sqrt{1+S^2}}\,\ldots(d_{yz})^{\uparrow}(\pi^*_{\perp})^{\downarrow} \tag{17}$$
is more like an oxygen-based orbital in oxy-Mb (simultaneously, the $\sigma^*_{\mathrm{Fe-O}}$ antibonding
orbital is dominated by Fe $3d_{z^2}$). Consequently, the character of the σ component of
the Fe–O2 bond changes from typically covalent in the gas-phase model to nearly
dative in oxy-Mb [20].
By considering not only the π (Fe–O2 ) and π ∗ (Fe–O2 ) orbitals, but also the
doubly-occupied σFe−O orbital involving a combination of Fe- and O2 -centered
orbitals (cf. Fig. 6), Chen et al. refined the GVB description of Fe–O2 bonding.
The authors obtained a more complicated GVB-type wave function with the three
leading terms, which were identified as the three (Pauling, Weiss, and McClure) res-
onance structures. Nevertheless, the Weiss structure was still identified as the most
important one for oxy-Mb, with a noticeable admixture of the McClure structure and
only a small contribution of the Pauling structure [20]. Chen et al. also focused on
the charge on the O2 fragment. This charge was found to vary from −0.2e in the gas
phase (CASSCF) to about −0.5e in oxy-Mb (CASSCF/MM), but never reaching the
value of −1e, which might be (naively) expected for the Weiss-type bonding. Quite
similar fractional negative charges were also found earlier in DFT and DFT/MM
calculations. This effect was attributed to the σ component of the Fe–O2 bonding,
which results in partial back-donation of electrons from $\mathrm{O_2^-}$ to $\mathrm{Fe^{III}}$, thereby reducing
the negative charge on O2 [20].
The CASSCF description of the Fe–O2 bonding thus turned out to be quite close
to the earlier DFT suggestions (vide infra). Chen et al. have also demonstrated a good
correspondence between natural orbitals from DFT and from CASSCF, as well as
between natural spin orbitals from DFT and GVB-transformed CASSCF orbitals [20].
The similarities do not seem accidental. In fact, both computational approaches
point to essentially the same bonding picture in the case of oxyheme complexes, although
pinpointing this similarity requires a proper “reading” of the multiconfigurational wave
function in a VB-type language [162].
Fig. 8 Contour plots of spin densities obtained from B3LYP (hybrid DFT) and BP86 (nonhybrid
DFT) methods for FeP(NO), Fe(salen)NO, and FeP(NH3)(NO) complexes (all in the S = 1/2 spin
state). The red/green color indicates excess spin-up/spin-down density. The annotated values are
Mulliken spin populations on Fe and NO fragments. Based on Refs. [29, 138]
[Plot for Fig. 9: y axis, NO spin population (0 to 1); points labeled B3LYP, B3LYP*, OLYP, BP86, and CASSCF for Fe(P)NO, Fe(P)(NH3)NO, and Fe(salen)NO, with the S = 3/2 points marked]
Fig. 9 CASSCF and DFT Mulliken spin populations on NO and Fe for FeP(NO), FeP(NH3 )(NO),
and Fe(salen)NO. The points for S = 1/2 complexes gather along the x + y = 1 line indicating that
both populations sum up to one unpaired electron; in contrast, the points for S = 3/2 complexes lie
slightly below the x + y = 3 line since a part of the iron spin population leaks to the macrocycle
via a covalent σ Fe–(macrocycle) bonding. Adapted with permission from [138]. Copyright (2010)
American Chemical Society
species agree well with interpretations of EPR and MCD spectra for similar (i.e., five-
and six-coordinated) heme-nitrosyl species [138]. In contrast, the hybrid functionals
(B3LYP, B3LYP*) point to excessive spin polarization, which is more pronounced
for B3LYP than for B3LYP* since the former contains more exact exchange than the
latter. This can be regarded as the way in which the hybrid functionals try to simulate nondynamical
(left-right) correlation [138].
The work cited above [138] provided a detailed description of several {FeNO}7
complexes at the multiconfigurational level. In addition to the CASSCF studies on the
heme models [FeP(NO), FeP(NH3)(NO)] and the Fe(salen)NO complex discussed
so far, two other experimentally characterized {FeNO}7 species were discussed,
namely [Fe(T)NO]− [where T is tris(carbamoylmethyl)amine, a “tripodal ligand”]
and [Fe(H2O)5NO]2+ (a complex obtained in the “brown-ring” reaction) [140, 191].
For all these complexes, the active spaces were chosen according to the standard rules
(Sect. 2.2.2) and composed of the Fe 3d, double-shell 4d, and NO $\pi^*$ orbitals, supplemented
with up to two σ-type orbitals to describe the covalent bonding with the equatorial and
axial ligands.
Figure 10 shows, using FeP(NO) as an example, the key molecular orbitals
involved in the description of the Fe–NO bonding in the studied {FeNO}7 complexes.
As is typical of heme complexes, one of the Fe 3d orbitals ($d_{x^2-y^2}$) essentially does
not interact with the ligand orbitals, while another ($d_{xy}$) is strongly destabilized
by the equatorial ligands. The remaining three Fe 3d orbitals ($d_{z^2}$, $d_{xz}$, $d_{yz}$)
are involved in grossly covalent interactions with the NO $\pi^*_{x,y}$ orbitals, leading to two
bonding orbitals $(d, \pi^*_{x,y})_b$, two antibonding orbitals $(d, \pi^*_{x,y})_a$, and one orbital in
the middle with a nonbonding character, $(d, \pi^*_x)_n$. A qualitatively similar bonding
picture was established for all studied complexes, with minor differences, e.g., in
the shape of the $(d, \pi^*_x)_n$ nonbonding orbital, which depends slightly on the ligands and
the spin state [138] (see footnote 8). Apart from the two-component π-type bonding between Fe
and NO, there is an additional σ-type bonding, described by a weakly covalent (predominantly
dative) interaction of the nitrogen lone-pair orbital with the Fe $3d_{z^2}$. To
account for the slight covalent character of this interaction, the occupied $\sigma_{\mathrm{Fe-N_{axial}}}$
orbital, with predominant nitrogen lone-pair character (not shown in Fig. 10), was
included in the active space in Ref. [138].
Figure 10 also gives the principal electronic configurations appearing in the
CASSCF calculations for the LS (S = 1/2) and the HS (S = 3/2) states of the studied
complexes. These configurations cover about 75–80% of the CASSCF wave function
for the LS state and only ∼60% for the HS state. The remaining 20–40%
of the wave function is distributed over many configurations, none of them reaching
a contribution larger than 10%. Among the most important ones are the doubly excited
configurations with an electron pair transferred from one of the bonding $(d, \pi^*_{x,y})_b$
orbitals to one of the antibonding $(d, \pi^*_{x,y})_a$ orbitals. The large role played by these
configurations is reflected in the natural occupation numbers of the involved orbitals
(only about 1.7–1.8e for the bonding orbital and 0.2–0.3e for the antibonding one)
8 The difference is most pronounced for the [Fe(H2O)5NO]2+ complex, in which, due to the linear
Fe–N–O coordination, the character of the nonbonding orbital is changed to pure Fe $3d_{z^2}$.
the first one with Fe(III)–NO− character and the other two with Fe(II)–NO0 character
(see footnote 9). All three quartet VB-type resonance structures given in (18) involve a pairing
between the singly occupied orbitals with Fe 3d and NO $\pi^*$ character. The antiferromagnetic
coupling of HS Fe(III)/Fe(II) with NO−/NO0 explains the origin of the
significant spin polarization in the HS state (with majority spin density on Fe and
minority spin density on NO, cf. Fig. 9). It must be stressed that the spin polarization
in the quartet state cannot be understood by referring only to the principal electronic
configuration (cf. Fig. 10), since it has just three unpaired electrons on Fe and no
singly occupied orbitals on NO. As for the Fe–O2 species (see above), the antiferromagnetic
coupling arises from the admixture of other configurations and becomes
more evident after transforming the wave function to localized orbitals.
The analogous analysis of the CASSCF wave function for the LS (doublet) state of
the studied complexes produced even more configurations with comparable weights
than found for the quartet states; most of these configurations had either Fe(III)–NO−
or Fe(II)–NO0 character. As for the HS state, some of these configurations
were found to describe the antiferromagnetic coupling (bond pair) of the
Fe(III) (SFe = 3/2) with the NO− (SNO = 1) fragments or the Fe(II) (SFe = 1) with
the NO0 (SNO = 1/2) fragments. However, other configurations described a (local)
singlet state of Fe(II) and an unpaired electron localized on NO0 [138]. To deal with
the large number of configurations, the collective weights of all configurations belonging
to a given resonance structure (e.g., all the configurations with Fe(III)–NO− character)
were calculated, thereby yielding the weights of the various participating resonance
structures. In addition to the two already mentioned (Fe(III)–NO−, Fe(II)–NO0), two
other resonance structures were also identified (Fe(I)–NO+, Fe(IV)–NO2−), albeit
with very small weights. The summary of this analysis is depicted as a histogram
plot in Fig. 11. It turned out (rather surprisingly) that all studied {FeNO}7 complexes
are best described as roughly equal mixtures of the Fe(III)–NO− and Fe(II)–NO0 resonance
structures. This conclusion qualitatively agrees with the Mössbauer spectra of
a number of known {FeNO}7 (S = 3/2) complexes, where the iron isomer shift
lies consistently between the values characteristic of the Fe(II) and Fe(III) states [144].
This wave function composition depends only slightly on the iron ligation; among the
complexes with various ligands, only the “brown-ring” complex [Fe(H2O)5NO]2+
appears to have a predominant Fe(II)–NO0 character (though still with considerable
participation of the Fe(III)–NO− structure). Moreover, the contribution of the previously
suggested Fe(I)–NO+ structure [14, 49, 134] is very small, just a few percent. In
other words, the “Fe(I)–NO+” description should be regarded as merely a formal
one, whereas in fact strong d → π* backdonation repopulates the “empty” NO π*
9 The assignment of oxidation states for a given VB-type structure comes down to counting the
number of electrons in the Fe 3d and NO π* fragment orbitals.
Fig. 11 Histogram of the wave function composition of the studied {FeNO}7 complexes in terms of VB-type resonance structures (y axis: contribution, %; legend: Fe(IV)–NO2−, Fe(III)–NO−, Fe(II)–NO0, Fe(I)–NO+, rest). Adapted with permission from [138]. Copyright (2010) American Chemical Society
orbitals (of a hypothetical NO+), placing the effective iron oxidation number between
II and III.
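The bookkeeping behind such an analysis can be illustrated with a minimal Python sketch. It assumes that the wave function has already been expanded in orthonormal configurations built from localized fragment orbitals, so that the weight of a configuration is the square of its CI coefficient. The function names and the coefficients are hypothetical and are not taken from Ref. [138]. Each configuration is assigned to a resonance structure by counting the electrons in the Fe 3d and NO π* fragment orbitals (seven in total for an {FeNO}7 unit), and the weights are then summed per structure.

```python
from collections import defaultdict

# Total number of electrons in the Fe 3d + NO pi* fragment orbitals of {FeNO}7.
N_FENO = 7

# Formal label keyed by the number of electrons in the NO pi* fragment orbitals.
LABELS = {
    0: "Fe(I)-NO+",
    1: "Fe(II)-NO0",
    2: "Fe(III)-NO-",
    3: "Fe(IV)-NO2-",
}


def classify(n_d, n_pi_star):
    """Assign a resonance-structure label from fragment electron counts."""
    if n_d + n_pi_star != N_FENO:
        raise ValueError("electron counts must sum to 7 for an {FeNO}7 unit")
    return LABELS.get(n_pi_star, "rest")


def resonance_weights(configurations):
    """Sum squared CI coefficients per resonance structure and normalize.

    `configurations` is an iterable of (ci_coefficient, n_d, n_pi_star).
    """
    weights = defaultdict(float)
    for coeff, n_d, n_pi_star in configurations:
        weights[classify(n_d, n_pi_star)] += coeff ** 2
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical coefficients, for illustration only.
EXAMPLE = [(0.6, 5, 2), (0.5, 6, 1), (-0.4, 6, 1), (0.3, 5, 2), (0.2, 7, 0)]

if __name__ == "__main__":
    for label, weight in sorted(resonance_weights(EXAMPLE).items()):
        print(f"{label:12s} {100 * weight:5.1f} %")
```

With real data, the electron counts would come from the occupation of the localized Fe 3d and NO π* orbitals in each configuration; the renormalization step only matters when the expansion is truncated.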
The authors of Ref. [138] also pointed out that the large differences between the spin
densities of the studied complexes are not reflected in changes of the effective Fe and
NO oxidation states. For instance, the effective oxidation states of the Fe and NO groups
are sensitive neither to coordination of the axial ligand [FePNO → FeP(NH3)NO] nor
to the change of the spin state, although these modifications change the spin density
distribution drastically. The latter fact is fully understandable, since the doublet →
quartet transition rests simply on a spin promotion on Fe ($d_{x^2-y^2} \to d_{xy}$, cf. Fig. 10),
with little participation of NO, which correlates well with the very similar N–O distances
and NO stretching frequencies in both spin states. Based on the summarized
results, it must be noted that the common practice of taking spin densities as a measure
of oxidation states may be unjustified for the studied {FeNO}7 complexes, even if
the spin densities from modern ab initio methods can nowadays be trusted. Finally,
we note that the presented approach to the assignment of the Fe and NO oxidation
states in {FeNO}7 complexes may be generalized to other metal–nitrosyl complexes.
For instance, an approach analogous to that of Ref. [138] was used recently by Wieghardt
and coworkers for Tp*M(NO) species (where M = Co, Ni) [184].
alkyl sulfides), often with remarkably high regio- and stereoselectivity [111, 164].
It is estimated that ∼75% of drugs in clinical use are metabolized by P450, as are
steroids, carcinogens, and many other xenobiotics [89]. Apart from their great biological
importance, the interest of the scientific community in P450 enzymes and the
related iron porphyrin complexes also stems from their unique catalytic properties
(e.g., a propensity to activate inert C–H bonds in aliphatic systems), which may
provide inspiration for designing new types of catalysts (synthetic complexes or
enzyme mutants) capable of performing organic transformations that are currently
considered very difficult or impossible.
During the catalytic cycle, the active site of a P450 enzyme is converted (in several
steps) from the resting (ferric) state into a very reactive, short-lived intermediate
that is capable of transferring the oxo group to an organic compound. This active
species is usually assumed to be a ferryl [iron(IV)-oxo] porphyrin π radical cation,
$(\mathrm{Fe^{IV}O})(\mathrm{P^{\bullet+}})$, which is known in the literature as Compound I (Cpd I). Formation
of P450 Cpd I has never been observed under catalytic turnover conditions (where
the last observable species is a precursor of Cpd I, the so-called Cpd 0 [32]), although
an iron(IV)-oxo porphyrin cation radical species can be generated (via a peroxide
shunt pathway) and detected spectroscopically by mixing an enzyme with a proper
oxidant under stopped-flow conditions [37, 79, 175]. The failure to observe P450 Cpd I in
action constitutes indirect evidence for its high oxidative reactivity (which precludes
its accumulation during the catalytic cycle). On the other hand, synthetic
iron(IV)-oxo porphyrin cation radical complexes were, paradoxically, characterized
as sluggish oxidants in the hydroxylation of C–H bonds [113]. Moreover, the Cpd I
species of a chloroperoxidase (CPO) enzyme (with an active site very similar to that of
P450s), which is more stable than Cpd I of P450 enzymes and was obtained in
higher yields, turned out to be much less reactive towards C–H oxidation than
might be expected given the high reactivity of the P450 enzymes in analogous reactions
[143, 202]. These ambiguous kinetic results, together with the elusive character
of the active intermediate, led some authors to suppose that Cpd I might not be
the actual active species of P450 enzymes. Instead, a ferric hydroperoxy species [19]
or a perferryl [i.e., iron(V)-oxo] electromer of Cpd I (more on which below) [115,
167, 192] has been proposed. Whereas the first possibility has been essentially ruled
out by experimental results [32] and theoretical calculations [164], the second one
is still intriguing, and to some extent supported by recent calculations (vide infra).
Only very recently, Rittle and Green obtained a P450 Cpd I (from the CYP119 enzyme)
in much higher yields than in all previous studies with the aid of a rapid freeze-quench
technique [143]. They showed that the spectral properties (UV/Vis, EPR, Mössbauer)
of the captured active species are consistent with its ferryl porphyrin cation radical
formulation [$(\mathrm{Fe^{IV}O})(\mathrm{P^{\bullet+}})$]. Moreover, the authors were able to demonstrate that
the chemically generated Cpd I can hydroxylate aliphatic C–H bonds with high
efficiency, as required for the active species of cytochrome P450 [143].
Due to the elusive character of Cpd I and the difficulties in obtaining experimental data
(especially before the cited work by Rittle and Green), much knowledge about the
physical and chemical properties of this active species was derived from theoretical
calculations [163, 164]. The calculations were extremely helpful already in identifying
the electronic structure of the active oxidant species as an iron(IV) porphyrin
radical cation. Moreover, the calculations contributed fundamentally to formulating
the mechanism of catalytic oxidation for different types of organic substrates. A number
of authors currently attempt to apply DFT calculations on P450 models (more or
less directly) in predictive analyses of drug metabolism [87, 89, 154]. Although the
theoretical modeling of enzymatic activity is certainly a fascinating subject, it is beyond the
scope of this contribution and cannot be covered here (comprehensive reviews can be
found elsewhere [112, 164, 165]). Instead, this section focuses on the description
of the electronic properties of Cpd I and of the related model complexes with high-valent
iron. The primary purpose here is to summarize recent advances on this front and to
highlight new and important conclusions obtained from ab initio methods.
In the active site of P450 or CPO enzymes, the iron-porphyrin system is axially
coordinated by a cysteine ligand trans to the oxo group. The Cpd I species can
thus be modeled as Fe(=O)P(SH) or Fe(=O)P(SCH3) complexes, in which the axial
cysteine is truncated to SH−/SCH3− (see, e.g., [104, 153]). More extensive models
with a better representation of the cysteine and the porphyrin side chains were also
used in DFT and DFT/MM calculations (see, e.g., [156, 157]). However, the basic
features of the Cpd I electronic structure are qualitatively reproduced by
the simplified models, albeit with a noticeable effect of the enzyme environment on the
electronic structure (vide infra).
In DFT calculations, the ground state of Cpd I appears as a triradicaloid species:
two unpaired electrons (residing in the quasi-degenerate $\pi^*_{\mathrm{Fe=O}}$ orbitals) are coupled to
a local triplet state on the ferryl group, whereas the remaining unpaired electron is
delocalized on the ligands and is only weakly coupled with the FeO triplet [163,
164]. The character of the ligand radical turned out to depend so strongly on the
environment that Cpd I was called a chameleon species [105, 106]. For the small
models in the gas phase, the unpaired electron is described by a molecular orbital that is
a combination of the cysteine-based σS and the porphyrin a2u orbitals. However, after
extending the models to account for the enzyme environment, the singly occupied
ligand orbital changes to mostly porphyrin a2u with a small σS admixture. This results
in a noticeable transfer of spin density from the thiolate sulfur to the porphyrin
ring and, simultaneously, an increase of the negative charge on sulfur. A qualitatively similar
effect has been found not only in QM/MM calculations for realistic enzyme models
[156], but also for simple model complexes [Fe(=O)P(SH)] with two ammonia
molecules added in the vicinity of the cysteine group (to model the S···H−N hydrogen
bonding interactions occurring in the enzyme environment), or even with the
model complex embedded in a continuous solvation medium (with a dielectric constant
ε = 5.7 or larger) [104, 105, 135]. In any case, hydrogen bonding and/or a polar
environment reduces the electron-donor properties of the cysteine ligand and thus enhances the
stability of the porphyrin π radical species compared with the situation in the gas phase.
Several alternative electronic states of Cpd I were also identified in the DFT studies.
These states showed a similar local triplet on the ferryl group, but varied in the
character of the ligand radical, which was located either in the a1u porphyrin π orbital or
in the πS cysteine-based orbital (where πS is a nonbonding orbital of the sulfur atom
perpendicular to the Fe–S axis, in contrast to the parallel σS one) [106, 135].
As already mentioned, the local triplet state on FeO (S1 = 1) is only weakly
coupled with the ligand radical (S2 = 1/2), producing a pair of close-lying states:
with the total quartet (S = 1 + 1/2 = 3/2) or doublet (S = 1 − 1/2 = 1/2) spin,
corresponding to either parallel or antiparallel alignment of the S1 , S2 spins. This
magnetic coupling can be described phenomenologically by an effective Heisenberg–
Dirac–van Vleck spin Hamiltonian
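The principal electronic configuration for the S = 3/2 state (assuming the a2u
radical) is a single Slater determinant:

$${}^{4}A_{2u} = \ldots(\pi^*_{xz})^{\uparrow}(\pi^*_{yz})^{\uparrow}(a_{2u})^{\uparrow} = \ldots(\pi^*_{xz})^{\alpha}(\pi^*_{yz})^{\alpha}(a_{2u})^{\alpha}. \tag{21}$$

However, the principal configuration for the S = 1/2 state (again, assuming the a2u
radical) is a combination of the three Slater determinants given below:

$$\begin{aligned}
{}^{2}A_{2u} = \ldots(\pi^*_{xz})^{\uparrow}(\pi^*_{yz})^{\uparrow}(a_{2u})^{\downarrow}
 = {} & \sqrt{\tfrac{2}{3}}\,\ldots(\pi^*_{xz})^{\alpha}(\pi^*_{yz})^{\alpha}(a_{2u})^{\beta} \\
 & - \tfrac{1}{\sqrt{6}}\,\ldots(\pi^*_{xz})^{\alpha}(\pi^*_{yz})^{\beta}(a_{2u})^{\alpha}
   - \tfrac{1}{\sqrt{6}}\,\ldots(\pi^*_{xz})^{\beta}(\pi^*_{yz})^{\alpha}(a_{2u})^{\alpha}
\end{aligned} \tag{22}$$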
composed of the two $\pi^*_{\mathrm{Fe=O}}$ orbitals ($\pi^*_{xz,yz}$) and the porphyrin a2u, followed by
difference-dedicated CI (DDCI2) (see footnote 10). This CASSCF-DDCI approach was applied to
[Fe(=O)P(X)]+ cations (where P was porphin or tetraphenylporphin and the axial
ligand X was H2O or absent), similar to the model Cpd I-like complexes studied experimentally,
and to a model of the Cpd I species in P450cam. The latter model (with an
extended cysteine ligand) was considered both in the gas phase and embedded in the
electrostatic charges taken from the preceding DFT:B3LYP/MM study [156] to simulate
the enzyme environment. The work focused on the theoretical prediction of
spectroscopic (EPR, ENDOR, Mössbauer) parameters of Cpd I that would allow
a proper interpretation of the present and prospective experimental results for this
elusive compound. Selected conclusions from this study will be discussed below.
Independently, the authors of this chapter applied the CASSCF/CASPT2 method
to the Fe(=O)P(SH) model in the gas phase [135], with considerably more active orbitals
on the ferryl group (the bonding $\pi_{xz,yz}$ and the antibonding $\pi^*_{xz,yz}$; the bonding $\sigma_{z^2}$ and
antibonding $\sigma^*_{z^2}$) as well as on the porphin ring (a2u, a1u, eg) and on the thiolate ligand
(σS, πS). This study pointed out the importance of the sulfur-centered active orbitals (σS,
πS), since in the gas phase (prior to including dynamical correlation) the sulfur-based
triradicaloid states ($^{2,4}\Sigma_S$, $^{2,4}\Pi_S$) had much lower energy at the CASSCF level than the
porphyrin-based states ($^{2,4}A_{2u}$). Moreover, the A2u and ΣS configurations mixed
strongly with each other at the CASSCF level, necessitating the use of the state-averaged (SA)
approach (to capture both simultaneously) and giving rise to considerable computational
difficulties. As a remedy, the multi-state (MS) CASPT2 approach was used, in which
the states from SA-CASSCF were allowed to re-mix in response to dynamical correlation.
Finally, the ground state with mixed porphyrin–sulfur cation radical character
(a2u–σS) was obtained, rather similar in character to the ground state obtained
from DFT (B3LYP) calculations. In the next relevant paper, Thiel and coworkers [4]
performed QM/MM calculations for P450cam with the CASSCF-DDCI+Q method,
but with a more extensive active space than in the previous study from the same
group (Ref. [157]). In addition to Cpd I, its precursor (Cpd 0) and a hydroxy intermediate
in catalytic camphor hydroxylation were also studied. The active space for Cpd I was
composed of the relevant orbitals on the ferryl group ($\pi_{xz,yz}$, $\pi^*_{xz,yz}$, $\sigma_{z^2}$, $\sigma^*_{z^2}$) and on
the porphyrin ring (a1u, a2u), along with the pair of $\sigma_{xy}$, $\sigma^*_{xy}$ orbitals (to describe the covalency
of the Fe–N(porphyrin) bonding), and two “double-shell”-like orbitals for $3d_{xz,yz}$.
Unfortunately, this (13in12) active space led to convergence only for the quartet, but
not for the doublet, states of Cpd I.
10 However, the active orbitals in this study were not obtained self-consistently at the CASSCF level
but were taken from restricted open-shell DFT (BP86) calculations.
Besides answering many specific questions, a common goal of the three aforementioned
ab initio studies [4, 135, 157] was to validate the picture of the electronic structure
stemming from DFT calculations and, in particular, the description of the magnetic
coupling. Clearly, due to the single-configurational Kohn–Sham wave function, the DFT
method runs into problems for the antiferromagnetically coupled doublet state (22),
which are reflected in the calculated spin density distributions and in the spin populations
on the ferryl group and on the ligands. While DFT calculations predict a spin
population of ∼+2 on FeO and ∼−1 on the ligands (porphyrin and cysteine counted
together), the ab initio calculations point to spin populations with much smaller absolute
values. In fact, since three different determinants contribute to the correct (spin-adapted)
wave function in Eq. (22), the correct spin population for the ferryl group
should be close to +4/3 (i.e., 2/3 for each $\pi^*_{xz,yz}$ orbital), and −1/3 for the porphyrin
and cysteine ligands [22]. Indeed, similar spin populations are obtained in ab
initio calculations that correctly take into account the multiconfigurational character of
the doublet state [22, 135, 157]. Due to this clear deficiency of the single-determinantal
description, it was suggested that spin-unrestricted DFT calculations should not be
used to calculate spin-dependent properties of Cpd I and the related systems, at least
not without applying an appropriate spin-projection procedure [4, 157].
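The +4/3 and −1/3 values can be checked numerically. The sketch below is an illustration only: it treats the three singly occupied orbitals (π*xz, π*yz, a2u) as three distinguishable spin-1/2 sites, an assumption that ignores the spatial part of the wave function, and builds the spin function of Eq. (22) with the coefficients √(2/3), −1/√6, and −1/√6. It then evaluates ⟨S²⟩ and the site spin populations 2⟨Sz⟩. The single determinant ααβ, which mimics the single-determinantal picture, is included for comparison. All function names are chosen here for illustration.

```python
import numpy as np

# Spin-1/2 operators in the (alpha, beta) basis.
SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)
ALPHA = np.array([1, 0], dtype=complex)
BETA = np.array([0, 1], dtype=complex)


def site_operator(op, site, n_sites=3):
    """Embed a one-site operator at position `site` of an n-site spin system."""
    result = np.array([[1.0 + 0j]])
    for i in range(n_sites):
        result = np.kron(result, op if i == site else ID)
    return result


def product_state(spins):
    """Product spin function, e.g. 'aab' = alpha(1) alpha(2) beta(3)."""
    vec = np.array([1.0 + 0j])
    for s in spins:
        vec = np.kron(vec, ALPHA if s == "a" else BETA)
    return vec


def s_squared(psi):
    """Expectation value of the total S^2 operator."""
    total = [sum(site_operator(op, i) for i in range(3)) for op in (SX, SY, SZ)]
    s2 = sum(t @ t for t in total)
    return float(np.vdot(psi, s2 @ psi).real)


def spin_populations(psi):
    """Site spin populations 2<Sz> for the sites (pi*_xz, pi*_yz, a2u)."""
    return [float(2 * np.vdot(psi, site_operator(SZ, i) @ psi).real)
            for i in range(3)]


# Spin-adapted doublet of Eq. (22) and the single determinant (alpha, alpha, beta).
doublet = (np.sqrt(2 / 3) * product_state("aab")
           - product_state("aba") / np.sqrt(6)
           - product_state("baa") / np.sqrt(6))
single_det = product_state("aab")

if __name__ == "__main__":
    for name, psi in (("spin-adapted doublet", doublet),
                      ("single determinant", single_det)):
        pops = spin_populations(psi)
        print(name, "<S^2> =", round(s_squared(psi), 3),
              "FeO:", round(pops[0] + pops[1], 3), "ligand:", round(pops[2], 3))
```

For the spin-adapted function this gives ⟨S²⟩ = 3/4 with populations of 2/3, 2/3, and −1/3, whereas the single determinant gives ⟨S²⟩ = 7/4 with populations of +1, +1, and −1, i.e., the +2/−1 pattern quoted above for DFT.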
In spite of these problems, the DFT calculations yield rather realistic energetics
relevant to the description of the magnetic coupling in Cpd I. In the mentioned
CASSCF-DDCI/MM study, Neese and Thiel et al. found that this level of theory
gives relative energies of the S = 3/2 and S = 1/2 spin states of Cpd
I very similar to those obtained in the previous B3LYP/MM calculations [157]. Both methods
predict a small antiferromagnetic coupling in P450cam Cpd I (J < 0) and give very
similar values of the doublet–quartet splitting. This state ordering is in agreement
with experimental data for CPO Cpd I. In contrast, for the six-coordinated model
complex [Fe(=O)(TPP•+)(H2O)]+ both B3LYP and DDCI point to a quartet ground
state (J > 0), also in agreement with experimental data. The exchange coupling is
relatively small in both situations, and the EPR spectra of the Cpd I species should
be dominated by a relatively large zero-field splitting (ZFS), which is rooted in the
electronic structure of the ferryl group [157]. It was concluded that, taking spin–orbit
interaction into account, Cpd I should have three close-lying Kramers doublets
(arising from mixing between the S = 1/2 and S = 3/2 spin states), all of which are
populated at room temperature and may potentially contribute to the reactivity [157].
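The relation between the sign of J and the state ordering can be made explicit with a short sketch. The sign and scale convention of the spin Hamiltonian is an assumption here: the code uses H = −2J S1·S2, for which J < 0 corresponds to antiferromagnetic coupling, as in the text; with another convention the prefactors change. The function names and the value of J are illustrative only.

```python
from fractions import Fraction


def total_spins(s1, s2):
    """Allowed total spins |s1 - s2|, ..., s1 + s2 (in steps of 1)."""
    spins = []
    s = abs(s1 - s2)
    while s <= s1 + s2:
        spins.append(s)
        s += 1
    return spins


def hdvv_energy(s, s1, s2, j):
    """Energy of the state with total spin s for H = -2 J S1.S2 (assumed convention)."""
    return -j * (s * (s + 1) - s1 * (s1 + 1) - s2 * (s2 + 1))


S_FEO = Fraction(1)       # local triplet on the ferryl group
S_RAD = Fraction(1, 2)    # ligand radical
J = -0.1                  # hypothetical value in kcal/mol (antiferromagnetic)

# Energies of the doublet and quartet states.
levels = {s: float(hdvv_energy(s, S_FEO, S_RAD, J))
          for s in total_spins(S_FEO, S_RAD)}

# Quartet minus doublet energy; equals -3J in this convention.
gap = levels[Fraction(3, 2)] - levels[Fraction(1, 2)]

# Each half-integer spin S contributes (2S + 1) / 2 Kramers doublets.
kramers_doublets = sum(int(2 * s + 1) // 2 for s in levels)

if __name__ == "__main__":
    for s, e in levels.items():
        print(f"S = {s}: E = {e:+.3f} kcal/mol")
    print("E(quartet) - E(doublet) =", round(gap, 3), "kcal/mol")
    print("Kramers doublets:", kramers_doublets)
```

In this convention a negative J places the doublet below the quartet, and the two spin states together comprise the three Kramers doublets mentioned above, before any zero-field splitting or spin–orbit mixing is considered.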
The difference in the sign of J between enzymes and thiolate-ligated complexes
(J < 0), on the one hand, and simple model complexes without a thiolate ligand (J > 0), on the other, is
attributed to the occurrence of an additional, weakly bonding interaction in the former
species, between one of the singly occupied ferryl orbitals ($\pi^*_{xz}$) and the ligand-radical
orbital (a2u–σS) [195]. Without this effect, the quartet configuration (21)
would always have lower energy than the corresponding doublet one (22), since the
parallel spin alignment in the quartet state benefits from a larger exchange stabilization
(see Sect. 2). As argued by Weiss et al., this bonding stabilization of the
doublet state can only be effective if the symmetry is lowered (to Cs or C1) and
if the ligand radical is at least partially delocalized toward the ferryl group [195].
The presence of a soft sulfur atom provides the necessary delocalization of spin
density, which is absent from the non-thiolated Cpd I analogues. Green used natural magnetic
orbitals to provide computational validation of this model on the basis of DFT
calculations [47, 48]. We found, however, that the nonhybrid DFT methods (BLYP,
BP86) point to a much larger (and presumably overestimated) bonding effect than the
hybrid functional (B3LYP), which is in line with the tendency of the former functionals
to overestimate covalent bonding [121]. Consequently, the nonhybrid DFT methods
yield a much larger splitting between the S = 1/2 and S = 3/2 electronic states with
the same radical character (i.e., a2u–σS or πS in Ref. [135]). J values as large
as ∼−2 kcal/mol can be obtained with these functionals, in contrast to J values on
the order of −0.1 kcal/mol obtained from hybrid DFT (the latter being more consistent
with experimental data). Neese and Thiel et al. noted that the considered energy
lowering of the doublet state is essentially a multiconfigurational effect, being
described in CASSCF by mixing of the antiferromagnetically coupled doublet configuration
(22) with other doublet configurations [4].
The most recent theoretical studies of Cpd I [21] or the related, Cpd I-like mod-
els [139] are largely focused on the energetics of the variety of their electromeric
states (i.e., electronic states with different iron spin or different charge distribution).
This problem is illustrated in Fig. 12. Starting from the “traditional” triradicaloid
electronic structure of iron(IV)-oxo porphyrin radical cation (top–left part of the
scheme), one may consider that: (a) the iron(IV)-oxo group is promoted to its high-
spin state (S = 2), yielding pentaradicaloid states (shown in the top–right part of
Fig. 12) or (b) the open-shell porphyrin radical cation accepts one electron from the iron,
becoming a closed-shell porphyrin, whereas the iron is oxidized to the perferryl state
[Fe(V)-oxo] (the bottom part of Fig. 12). With its Fe d3 configuration, the hypothetical
perferryl electromer of Cpd I may exist in the low- and high-spin states.
Moreover, the tri- and pentaradicaloid states may have a radical either in a2u or a1u
(or, in case of thiolate-ligated systems, alternatively in σS or πS ). In addition, for each
type of radical the ferryl multiplet and the ligand radical doublet may couple either
ferro- or antiferromagnetically, thus yielding in each case a pair of close-lying spin
states: quartet/doublet for the triradicaloids, sextet/quartet for the pentaradicaloids.
Thus, there are plenty of possible iron(IV) and iron(V) electronic states with different
spin to be considered for Cpd I and Cpd I-related complexes. Radoń et al. pointed
out [139] that a reliable theoretical description of these electromeric states will have
to deal with at least two tricky issues: spin state energetics of the iron (see Sect. 3.1)
and the question of ligand noninnocence (see below) [42].
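The spin multiplicities listed above follow from the usual rule for coupling two spins, S = |S1 − S2|, …, S1 + S2. A minimal Python sketch of this bookkeeping is given below. It is an illustration only: the function and dictionary names are hypothetical, and the local spins (S = 1 or S = 2 on the ferryl unit coupled to an S = 1/2 ligand radical, and S = 1/2 or S = 3/2 for a d3 perferryl center with a closed-shell porphyrin) are taken from the description in the text.

```python
from fractions import Fraction

def coupled_spins(s_metal, s_ligand):
    """Return all total spins S = |s1 - s2|, ..., s1 + s2 (in steps of 1)."""
    s1, s2 = Fraction(s_metal), Fraction(s_ligand)
    total = abs(s1 - s2)
    spins = []
    while total <= s1 + s2:
        spins.append(total)
        total += 1
    return spins

def multiplicity(s):
    """Spin multiplicity 2S + 1 as an integer."""
    return int(2 * s + 1)

half = Fraction(1, 2)
# (local spin on iron-oxo unit, spin on ligand); labels are illustrative
electromers = {
    "triradicaloid (Fe(IV), S_Fe = 1)": (Fraction(1), half),
    "pentaradicaloid (Fe(IV), S_Fe = 2)": (Fraction(2), half),
    "perferryl, low spin (Fe(V), d3)": (half, Fraction(0)),
    "perferryl, high spin (Fe(V), d3)": (Fraction(3, 2), Fraction(0)),
}

for name, (s_fe, s_lig) in electromers.items():
    mults = sorted(multiplicity(s) for s in coupled_spins(s_fe, s_lig))
    print(f"{name}: multiplicities {mults}")
```

Each radical type (a2u, a1u, σS, πS) gives rise to its own pair of coupled states, so the number of distinct states to be considered multiplies accordingly.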
The perferryl versus ferryl-radical electromery here is, indeed, an example
of a more general issue of ligand innocence/noninnocence in transition metal
complexes [42, 133]. In this view, the porphyrin is “noninnocent” in the tri-
/pentaradicaloid states, but it would be “innocent” in the (hypothetical) perferryl
state. Although true Fe(V) complexes are exceedingly rare [34], some macrocycles—
notably TAML [108] and possibly also corrole [59] —can stabilize this high oxidation
state. A natural question thus arises whether porphyrin can do the same without being
immediately oxidized. On the basis of laser flash photolysis (LFP) experiments it
was speculated that the perferryl electromer of a Cpd I-like complex is initially formed
during photochemically induced oxidation and that it is stable enough to be “seen”
vertical energies at the common structure of the triradicaloid state (obtained from
DFT/MM optimization of each enzyme model).
Both mentioned studies employed large active spaces to describe the Fe atom
(with double-shell effect), the covalent Fe = O and Fe–Nporphyrin bonding, as well as
the ligand radical on porphyrin (or, optionally, on sulfur in Ref. [21]). However, by
making active not only the porphyrin HOMOs (a2u and a1u) but also the (degenerate)
LUMO (eg), Radoń et al. managed to describe all the states considered with
a common active space (15in16). In contrast, Chen et al. used slightly different
active spaces for different states, adapting them to the orbital occupancies of the state
being calculated in state-specific calculations. The electronic states with
porphyrin-/sulfur-radical character were described with an active space (13in13) containing
only one of the relevant ligand orbitals (a2u , a1u , σS , or πS ), i.e., the one being approx-
imately singly occupied in the electronic state considered, whereas for providing the
energies of the perferryl states the authors used a smaller active space (11in12), with
none of these ligand orbitals included. Such a procedure was (in part) validated by
performing selected calculations with an active space (15in14) containing both a2u
and a1u orbitals. Nonetheless, despite these notable differences in the computational
methodology and different choices of the model systems, both studies actually
arrived at quite similar conclusions in regard to stability of the various electromeric
states.
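To give a feeling for the size of the active spaces quoted above, the number of spin-adapted configurations (CSFs) for n electrons in m orbitals can be counted with Weyl's dimension formula. The sketch below is purely illustrative and is not taken from the cited studies: it ignores spatial symmetry, assumes a doublet (S = 1/2) state for each of the odd-electron spaces, and the function name `csf_count` is hypothetical. The (n in m) labels follow the notation of the text.

```python
from math import comb

def csf_count(n_elec, n_orb, two_s):
    """Weyl's formula: number of CSFs with total spin S = two_s / 2
    for n_elec electrons in n_orb orbitals (no spatial symmetry)."""
    if (n_elec - two_s) % 2 or n_elec < two_s or n_elec > 2 * n_orb:
        return 0
    a = (n_elec - two_s) // 2        # n/2 - S
    b = (n_elec + two_s) // 2 + 1    # n/2 + S + 1
    return (two_s + 1) * comb(n_orb + 1, a) * comb(n_orb + 1, b) // (n_orb + 1)

# Active spaces mentioned in the text, evaluated for a doublet state (2S = 1)
for n_elec, n_orb in [(11, 12), (13, 13), (15, 14), (15, 16)]:
    print(f"({n_elec}in{n_orb}), doublet: {csf_count(n_elec, n_orb, 1)} CSFs")
```

Counts of this kind grow steeply with the number of active orbitals, which is the practical reason why complete active spaces cannot be enlarged indefinitely.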
Perhaps the most intriguing result from [21, 139] is that the perferryl states were
found at surprisingly low energies in both sets of calculations. For the enzyme
models, the LS FeV state was found only at 6–7 kcal/mol above the a2u -based FeIV
triradicaloid, and the HS FeV state at even lower relative energy (∼2 kcal/mol).
Considering the adiabatic energies, Radoń et al. found the analogous perferryl states
even below the triradicaloid states for both Fe(=O)P+ and Fe(=O)P(Cl). However,
this is the situation in the gas phase; when the effect of a polarizable
medium is considered, these states are shifted up in energy by a few kcal/mol.11 Nonetheless,
even in a polar medium the perferryl states are still low-lying (below 10 kcal/mol
from the triradicaloid states). These results are surprising given that in all previous
DFT calculations the perferryl states of Cpd I were found very high in energy (16–
26 kcal/mol already in gas phase, and presumably even higher in polar medium /
protein environment) [3, 106]. However, all DFT studies so far were based on the B3LYP
functional, whereas Radoń et al. found that the relative energies of the perferryl
(FeV)(P) and ferryl-radical (FeIV)(P•+) states are considerably functional dependent.
The hybrid functionals (B3LYP, B3LYP*) place the perferryl states at high energy,
while the nonhybrid ones (BP86, OLYP) at much lower energy, in accord with the ab
initio predictions. This issue is intriguing and potentially important because
most DFT calculations for Cpd I systems and their reactivity are based solely on
the B3LYP functional.
11 The medium effect ranges from 2–4 kcal/mol for the five-coordinate Fe(=O)P+ to 4–8 kcal/mol
for the six-coordinate Fe(=O)P(Cl) complex, with only a slight dependence on the dielectric constant
(ε = 5.7 and 78 were tested) and almost no dependence on the exchange-correlation functional
or on the specific solvent model (PCM, COSMO) used in the calculations. This suggests that the
effect is rooted simply in the larger electrostatic stabilization of the iron(IV)-oxo porphyrin cation
radical states as compared to the iron(V)-oxo states with a closed-shell porphyrin.
In both mentioned papers many cross-checks were carried out to test whether the
perferryl states are not artificially lowered in energy [21, 139]. Suspecting initially
that the original active space (15in16) might not be extensive enough for a balanced
description of iron-to-porphyrin charge transfer, the authors of Ref. [139] tested the
effect of enlarging the active space on the porphyrin ring. Their suspicion was based
on previous experience with copper corroles [131], for which there is a
copper-to-corrole charge transfer analogous to the iron-to-porphyrin charge transfer here. To
test the effect of enlarging the active space on porphyrin, the restricted active space
approach (RASSCF/RASPT2) was used since the resulting active spaces were too
large to be handled in generic CASSCF/CASPT2. The final active space proposed for
RASSCF/RASPT2 calculations was based on as many as 28 active orbitals, including
16 π, π* orbitals on the porphyrin ring.12 However, this substantial enlargement of
the active space led to an even slightly larger stabilization of the FeV states, confirming
that they indeed lie low in energy, as the CASPT2 calculations already suggested.
Moreover, Radoń et al. also carried out benchmark CCSD(T) calculations for a small
model complex Fe(=O)(L2)+ (where L = C3N2H5−, analogous to the small models shown
in Fig. 3), which can be considered a mimic of Fe(=O)P+. Interestingly, the relative
energies of the perferryl and ferryl-radical states for this model are very similar at
RASPT2 and CCSD(T) levels, suggesting that the perferryl–ferryl gaps for larger
models are also correctly reproduced at CASPT2 or RASPT2 level.
If, according to ab initio methods, the perferryl states lie just a few kcal/mol
above the triradicaloid states, they should be thermally accessible at room temperature
and they might, indeed, appear in the mentioned LFP experiments (vide infra).
It is also possible that the metastable perferryl state can somehow contribute to reac-
tivity of Cpd I, for instance through the previously suggested mechanism [59, 167],
in which the perferryl electromer of Cpd I is initially formed (by a heterolytic FeO–
O bond cleavage in Cpd 0, the Cpd I precursor) and it quickly oxidizes an organic
substrate (due to small activation energy on this pathway), before it can isomerize to
a more stable ferryl–radical form of Cpd I. We note that recent DFT calculations
by Isobe and coworkers seem to support the possibility of this intriguing scenario
in hydroxylation of aliphatic C–H bonds [70, 71].
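The notion of thermal accessibility used above can be made semi-quantitative with a Boltzmann factor. The following sketch is an illustration rather than part of the cited studies: it estimates the equilibrium population of a state lying ΔE above a reference state, optionally weighted by the spin degeneracies 2S + 1, and neglects vibrational and entropic contributions. The function name and the sample gaps (chosen within the range of values quoted in the text) are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def relative_population(delta_e, temperature=298.15, g_upper=1, g_lower=1):
    """Population of the upper state relative to the lower one,
    (g_upper / g_lower) * exp(-delta_e / RT), with delta_e in kcal/mol."""
    return (g_upper / g_lower) * math.exp(-delta_e / (R_KCAL * temperature))

# Illustrative energy gaps in kcal/mol (not results of any calculation)
for gap in (2.0, 6.0, 10.0):
    ratio = relative_population(gap)
    print(f"dE = {gap:4.1f} kcal/mol -> N_upper/N_lower = {ratio:.2e}")
```

Because RT is only about 0.6 kcal/mol at room temperature, the equilibrium population falls off quickly with the gap. The relative energies therefore need to be known to within about 1 kcal/mol before such estimates become meaningful.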
In other respects (i.e., apart from predicting the existence of low-lying perferryl
states), the two mentioned ab initio studies generally support the view of the Cpd I
electronic structure established earlier on the basis of DFT and DFT/MM calculations
(with the hybrid B3LYP functional). Among the triradicaloid states, many possibilities
were considered by Chen et al. The sulfur-based radicaloids (σS, πS) were
found very high in energy for both enzyme models because they are destabilized
in the protein environment. The a1u porphyrin-based radicaloid states were found
18–19 kcal/mol higher than the corresponding a2u-based radicaloids. This result falls
between the DFT/MM result (12 kcal/mol) [3] and the previous gas-phase CASPT2
estimate (∼23 to 25 kcal/mol) [135], but it is much higher than predicted by Altun et
al. in their CASSCF-DDCI+Q/MM study [4]. It was found that in correlated calculations
a reasonable splitting between the two types of states is recovered only after
including dynamical correlation (e.g., in the CASPT2 step) [22]. Moreover, Radoń et
al. noticed that for the model complexes the gap between the a1u- and a2u-based
radicaloids is significantly affected by the quality of the active space. The standard one
(with only four frontier orbitals on the porphyrin: a1u, a2u, and degenerate eg) cannot
provide a realistic splitting at the CASPT2 level, but RASPT2 calculations with the
larger active space (vide supra) resolved this problem. Concerning the triradicaloid
2,4A2u or 2,4A1u states, the standard active space also points to the doublet-below-quartet
state ordering, which is not correct for these model complexes without the
axial thiolate (see Ref. [139]). Again, this problem is corrected in the RASPT2
calculations based on the extended active space.
12 In these calculations only 6 ferryl-based orbitals (πxz,yz, π*xz,yz, σz², σ*z²) and all (remaining)
singly occupied orbitals (e.g., a2u in the tri-/pentaradicaloids) were placed in the RAS2 subspace
(see Sect. 2.2), whereas the other active orbitals were placed in RAS1 (if nearly doubly occupied)
or in RAS3 (if nearly empty).
Finally, both cited studies also considered the relative energy of the pentaradicaloid
FeIV states (with the HS state on the iron) as compared to the triradicaloid FeIV
states (with the IS state on the iron). Here, however, the CASPT2 or RASPT2
spin-state energetics is probably biased in favor of the HS state, as was found for the
ferrous complexes discussed in Sect. 3.1. Such behavior is indeed suggested by
comparison with the CCSD(T) calculations for the small mimicking complex (vide supra)
carried out by Radoń et al. [139]. Therefore, the present CASPT2 (or RASPT2)
calculations most likely place these pentaradicaloid states of Cpd I too low in energy. Taking
all this into account, Chen and Shaik suggested that the previous DFT (B3LYP) gap
of ∼12 kcal/mol between the tri- and pentaradicaloid states (see Ref. [3]) may in fact be a good
estimate [22]. Although the pentaradicaloid states have higher energy than the
triradicaloid ones at the equilibrium geometry of Cpd I, they are considerably stabilized
along the reaction pathway for C–H hydroxylation via exchange interactions [3].
This effect may potentially give rise to exchange-enhanced reactivity of Cpd I,
provided that the pentaradicaloid state is not too high in energy at the beginning of
the reaction [166]. A final resolution of this issue will require more reliable calculations
of spin-state energetics, which are not currently possible (see Sect. 3.1).
In sum, correlated ab initio calculations for Cpd I revealed a number of low-lying
electromeric states, with different chemical character, some of which were missed in
previous DFT calculations. The pentaradicaloid and perferryl states may potentially
contribute to multi-state reactivity of Cpd I, and this issue certainly deserves careful
investigation in the near future.
4 Concluding Remarks
References
1. Adler, T.B., Knizia, G., Werner, H.J.: A simple and efficient CCSD(T)-F12 approximation. J.
Chem. Phys. 127(22), 221106 (2007). https://doi.org/10.1063/1.2817618
2. Ali, M.E., Sanyal, B., Oppeneer, P.M.: Electronic structure, spin-states, and spin-crossover
reaction of heme-related Fe-porphyrins: a theoretical perspective. J. Phys. Chem. B 116(20),
5849–5859 (2012). https://doi.org/10.1021/jp3021563
3. Altun, A., Shaik, S., Thiel, W.: What is the active species of cytochrome P450 during camphor
hydroxylation? QM/MM studies of different electronic states of compound I and of reduced
and oxidized iron-oxo intermediates. J. Am. Chem. Soc. 129(29), 8978–8987 (2007). https://
doi.org/10.1021/ja066847y
4. Altun, A., Kumar, D., Neese, F., Thiel, W.: Multireference ab initio quantum mechan-
ics/molecular mechanics study on intermediates in the catalytic cycle of cytochrome P450cam.
J. Phys. Chem. A 112, 12904–12910 (2008). https://doi.org/10.1021/jp802092w
5. Andersson, K., Malmqvist, P.Å., Roos, B.O.: Second-order perturbation theory with a
complete active space self-consistent field reference function. J. Chem. Phys. 96(2), 1218–1226 (1992)
6. Angeli, C., Borini, S., Cavallini, A., Cestari, M., Cimiraglia, R., Ferrighi, L., Sparta, M.:
Developments in the N-electron valence state perturbation theory. Int. J. Quantum. Chem.
106(3), 686–691 (2006). https://doi.org/10.1002/qua.20831
7. Aquilante, F., Malmqvist, P.Å., Pedersen, T.B., Ghosh, A., Roos, B.O.: Cholesky
decomposition-based multiconfiguration second-order perturbation theory (CD-CASPT2):
application to the spin-state energetics of CoIII (diiminato)(NPh). J. Chem. Theory Comput.
4(5), 694–702 (2008). https://doi.org/10.1021/ct700263h
8. Balabanov, N.B., Peterson, K.A.: Systematically convergent basis sets for transition metals.
I. All-electron correlation consistent basis sets for the 3d elements Sc–Zn. J. Chem. Phys.
123, 064107 (2005). https://doi.org/10.1063/1.1998907
9. Bartlett, R.J., Musial, M.: Coupled-cluster theory in quantum chemistry. Rev. Mod. Phys. 79,
291–352 (2007). https://doi.org/10.1103/RevModPhys.79.291
10. Barysz, M.: Two-component relativistic theories. In: Barysz, M., Ishikawa, Y. (eds.)
Relativistic Methods for Chemists, Challenges and Advances in Computational Chemistry and
Physics, vol. 10, pp. 165–190. Springer, The Netherlands (2010).
https://doi.org/10.1007/978-1-4020-9975-5_4
11. Blomberg, L.M., Blomberg, M.R., Siegbahn, P.E.: A theoretical study of the binding of O2 ,
NO and CO to heme proteins. J. Inorg. Biochem. 99, 949–958 (2005). https://doi.org/10.
1016/j.jinorgbio.2005.02.014
12. Blomberg, M.R., Johansson, A.J., Siegbahn, P.E.: O–O bond cleavage in dinuclear peroxo
complexes of iron porphyrins: a quantum chemical study. Inorg. Chem. 46, 7992–7997 (2007)
13. Brucker, E., Olson, J., Ikeda-Saito, M., Phillips Jr., G.: Nitric oxide myoglobin: crystal
structure and analysis of ligand geometry. Proteins 30, 352–356 (1998).
https://doi.org/10.1002/(SICI)1097-0134(19980301)30:4<352::AID-PROT2>3.0.CO;2-L
14. Burlamacchi, L., Martini, G., Tiezzi, E.: Electron spin resonance of iron-nitric oxide com-
plexes. Iron-nitrosyl-halide compounds. Inorg. Chem. 8(9), 2021–2025 (1969). https://doi.
org/10.1021/ic50079a047
15. Caffarel, M.: Quantum Monte Carlo in chemistry. In: Engquist, B. (ed.) Encyclopedia of
Applied and Computational Mathematics. Springer, Berlin (2011)
16. Caffarel, M., Daudey, J.P., Heully, J.L., Ramírez-Solís, A.: Towards accurate all-electron
quantum Monte Carlo calculations of transition-metal systems: spectroscopy of the copper
atom. J. Chem. Phys. 123, 094102 (2005). https://doi.org/10.1063/1.2011393
17. Cao, X., Dolg, M.: Relativistic pseudopotentials. In: Barysz, M., Ishikawa, Y. (eds.) Rela-
tivistic Methods for Chemists, Challenges and Advances in Computational Chemistry and
Physics, vol. 10, pp. 215–277. Springer, The Netherlands (2010). https://doi.org/10.1007/
978-1-4020-9975-5_6
18. Capece, L., Estrin, D.A., Marti, M.A.: Dynamical characterization of the heme NO oxygen
binding (HNOX) domain. Insight into soluble guanylate cyclase allosteric transition. Bio-
chemistry 47(36), 9416–9427 (2008). https://doi.org/10.1021/bi800682k
19. Chandrasena, R.E.P., Vatsis, K.P., Coon, M.J., Hollenberg, P.F., Newcomb, M.: Hydroxylation
by the hydroperoxy-iron species in cytochrome P450 enzymes. J. Am. Chem. Soc. 126(1),
115–126 (2004). https://doi.org/10.1021/ja038237t
20. Chen, H., Ikeda-Saito, M., Shaik, S.: Nature of the Fe-O2 bonding in oxy-myoglobin: effect
of the protein. J. Am. Chem. Soc. 130(44), 14778–14790 (2008). https://doi.org/10.1021/
ja805434m
21. Chen, H., Song, J., Lai, W., Wu, W., Shaik, S.: Multiple low-lying states for compound I of
P450cam and chloroperoxidase revealed from multireference ab initio QM/MM calculations.
J. Chem. Theory Comput. 6(3), 940–953 (2010). https://doi.org/10.1021/ct9006234
22. Chen, H., Lai, W., Shaik, S.: Multireference and multiconfiguration ab initio methods in heme-
related systems: what have we learned so far? J. Phys. Chem. B 115(8), 1727–1742 (2011).
https://doi.org/10.1021/jp110016u
23. Chen, O., Groh, S., Liechty, A., Ridge, D.P.: Binding of nitric oxide to iron(II) porphyrins:
radiative association, blackbody infrared radiative dissociation, and gas-phase association
equilibrium. J. Am. Chem. Soc. 121, 11910–11911 (1999). https://doi.org/10.1021/ja991477h
24. Choe, Y.K., Hashimoto, T., Nakano, H., Hirao, K.: Theoretical study of the electronic ground
state of iron(II) porphine. Chem. Phys. Lett. 295, 380–388 (1998)
25. Choe, Y.K., Nakajima, T., Hirao, K., Lindh, R.: Theoretical study of the electronic ground
state of iron(II) porphine. J. Chem. Phys. 111(9), 3837–3845 (1999). https://doi.org/10.1063/
1.479687
26. Collman, J.P.: Functional analogs of heme protein active sites. Inorg. Chem. 36(23), 5145–
5155 (1997). https://doi.org/10.1021/ic971037w
27. Collman, J.P., Hoard, J.L., Kim, N., Lang, G., Reed, C.A.: Synthesis, stereochemistry, and
structure-related properties of α, β, γ , δ-tetraphenylporphinatoiron(II). J. Am. Chem. Soc.
97, 2676–2681 (1975). https://doi.org/10.1021/ja00843a015
28. Collman, J.P., Brauman, J.I., Iverson, B.L., Sessler, J.L., Morris, R.M., Gibson, Q.H.: O2
and CO binding to iron(II) porphyrins: a comparison of the “picket fence” and “pocket”
porphyrins. J. Am. Chem. Soc. 105, 3052–3064 (1983)
29. Conradie, J., Ghosh, A.: DFT calculations on the spin-crossover complex Fe(salen)(NO): a
quest for the best functional. J. Phys. Chem. B 111, 12621–12624 (2007). https://doi.org/
10.1021/jp074480t
30. Conradie, J., Quarless, D., Hsu, H.F., Harrop, T., Lippard, S., Koch, S., Ghosh, A.: Electronic
structure and FeNO conformation of nonheme iron-thiolate-NO complexes: an experimental
and DFT study. J. Am. Chem. Soc. 129(34), 10446–10456 (2007). https://doi.org/10.1021/
jp076979t
31. Cramer, C.J., Truhlar, D.G.: Density functional theory for transition metals and transition
metal chemistry. Phys. Chem. Chem. Phys. 11, 10757–10816 (2009). https://doi.org/10.
1039/b907148b
32. Davydov, R., Makris, T.M., Kofman, V., Werst, D.E., Sligar, S.G., Hoffman, B.M.: Hydrox-
ylation of camphor by reduced oxy-cytochrome P450cam: mechanistic implications of EPR
and ENDOR studies of catalytic intermediates in native and mutant enzymes. J. Am. Chem.
Soc. 123(7), 1403–1415 (2001). https://doi.org/10.1021/ja003583l
33. Denisov, I.G., Makris, T.M., Sligar, S.G., Schlichting, I.: Structure and chemistry of
cytochrome P450. Chem. Rev. 105, 2253–2278 (2005). https://doi.org/10.1021/cr0307143
34. Dey, A., Ghosh, A.: “True” iron(V) and iron(VI) porphyrins: a first theoretical exploration. J.
Am. Chem. Soc. 124(13), 3206–3207 (2002). https://doi.org/10.1021/ja012402s
35. Dolphin, D., Sams, J.R., Tsin, T.B., Wong, K.L.: Synthesis and Moessbauer spectra of
octaethylporphyrin ferrous complexes. J. Am. Chem. Soc. 98, 6970–6975 (1976). https://
doi.org/10.1021/ja00438a037
36. Dunning, T.H.: Gaussian basis sets for use in correlated molecular calculations. I. The atoms
boron through neon and hydrogen. J. Chem. Phys. 90(2), 1007–1023 (1989). https://doi.org/
10.1063/1.456153
37. Egawa, T., Shimada, H., Ishimura, Y.: Evidence for compound I formation in the reaction
of cytochrome-P450cam with m-chloroperbenzoic acid. Biochem. Biophys. Res. Commun.
201(3), 1464–1469 (1994). https://doi.org/10.1006/bbrc.1994.1868
38. van Eldik, R.: Fascinating inorganic/bioinorganic reaction mechanisms. Coord. Chem. Revs.
251(13–14), 1649–1662 (2007). https://doi.org/10.1016/j.ccr.2007.02.004. (37th Interna-
tional Conference on Coordination Chemistry, Cape Town, South Africa)
39. Ellison, M., Schulz, C., Scheidt, W.: Structure of the deoxymyoglobin model [Fe(TPP)(2-
MeHIm)] reveals unusual porphyrin core distortions. Inorg. Chem. 41(8), 2173–2181 (2002).
https://doi.org/10.1021/ic020012g
40. Enemark, J., Feltham, R.: Principles of structure, bonding, and reactivity for metal nitro-
syl complexes. Coord. Chem. Revs. 13(4), 339–406 (1974). https://doi.org/10.1016/S0010-
8545(00)80259-3
41. Frenking, G., Fröhlich, N.: The nature of the bonding in transition-metal compounds. Chem.
Rev. 100, 717–774 (2000)
42. Ghosh, A.: Transition metal spin state energetics and noninnocent systems: challenges for
DFT in the bioinorganic area. J. Biol. Inorg. Chem. 11, 712–724 (2006)
43. Goddard III, W.A., Olafson, B.D.: Ozone model for bonding of an O2 to heme in oxyhe-
moglobin. Proc. Nat. Acad. Sci. 72, 2335–2339 (1975)
44. Goff, H., La Mar, G.N.: High-spin ferrous porphyrin complexes as models for deoxymyo-
globin and -hemoglobin: a proton nuclear magnetic resonance study. J. Am. Chem. Soc. 99,
6599–6606 (1977). https://doi.org/10.1021/ja00462a022
45. Goff, H., La Mar, G.N., Reed, C.A.: Nuclear magnetic resonance investigation of magnetic
and electronic properties of “intermediate spin” ferrous porphyrin complexes. J. Am. Chem.
Soc. 99, 3641–3646 (1977). https://doi.org/10.1021/ja00453a022
46. Goodrich, L.E., Paulat, F., Praneeth, V.K.K., Lehnert, N.: Electronic structure of heme-
nitrosyls and its significance for nitric oxide reactivity, sensing, transport, and toxicity in bio-
logical systems. Inorg. Chem. 49(14), 6293–6316 (2010). https://doi.org/10.1021/ic902304a
47. Green, M.T.: Evidence for sulphur-based radicals in thiolate compound I intermediates. J.
Am. Chem. Soc. 121, 7939–7940 (1999)
48. Green, M.T.: The structure and spin coupling of catalase compound I: a study of noncovalent
effects. J. Am. Chem. Soc. 123(37), 9218–9219 (2001). https://doi.org/10.1021/ja010105h
49. Griffith, W.P., Lewis, J., Wilkinson, G.: Some nitric oxide complexes of iron and copper. J.
Chem. Soc. 1958, 3993–3998 (1958). https://doi.org/10.1039/JR9580003993
50. Grimme, S.: Accurate description of van der Waals complexes by density functional theory
including empirical corrections. J. Comp. Chem. 25(12), 1463–1473 (2004). https://doi.org/
10.1002/jcc.20078
51. Grimme, S.: Semiempirical hybrid density functional with perturbative second-order
correlation. J. Chem. Phys. 124, 034108 (2006)
52. Grimme, S., Antony, J., Schwabe, T., Mück-Lichtenfeld, C.: Density functional theory
with dispersion corrections for supramolecular structures, aggregates, and complexes of
(bio)organic molecules. Org. Biomol. Chem. 5, 741–758 (2007). https://doi.org/10.1039/
b615319b
53. Grimme, S., Antony, J., Ehrlich, S., Krieg, H.: A consistent and accurate ab initio parametriza-
tion of density functional dispersion correction (DFT-D) for the 94 elements H–Pu. J. Chem.
Phys. 132(15), 154104 (2010). https://doi.org/10.1063/1.3382344
54. Groves, J.: Models and mechanisms of cytochrome P450 action. In: Ortiz de Montellano,
P. (ed.) Cytochrome P450: Structure, Mechanism and Biochemistry, pp. 1–43. Kluwer Aca-
demic/Plenum Publishers, Dordrecht (2005). https://doi.org/10.1007/0-387-27447-2_1
55. Guallar, V., Olsen, B.: The role of the heme propionates in heme biochemistry. J. Inorg.
Biochem. 100(4), 755–760 (2006). https://doi.org/10.1016/j.jinorgbio.2006.01.019
56. Gütlich, P., Goodwin, H.A.: Spin crossover-an overall perspective. In: Gütlich, P., Goodwin,
H. (eds.) Spin Crossover in Transition Metal Compounds I, Topics in Current Chemistry, vol.
233, pp. 1–47. Springer, Berlin (2004). https://doi.org/10.1007/b13527
57. Hampel, C., Werner, H.J.: Local treatment of electron correlation in coupled cluster theory.
J. Chem. Phys. 104(16), 6286–6297 (1996). https://doi.org/10.1063/1.471289
58. Handy, N.C., Cohen, A.J.: Left-right correlation energy. Mol. Phys. 99(5), 403–412 (2001)
59. Harischandra, D., Zhang, R., Newcomb, M.: Photochemical generation of a highly reactive
iron-oxo intermediate. A true iron(V)-oxo species? J. Am. Chem. Soc. 127(40),
13776–13777 (2005)
60. Harvey, J.N.: On the accuracy of density functional theory in transition metal chemistry.
Annu. Rep. Prog. Chem. Sect. C: Phys. Chem. 102, 203–226 (2006). https://doi.org/10.1039/
b419105f
61. Harvey, J.N.: The coupled-cluster description of electronic structure: perspectives for bioinor-
ganic chemistry. J. Biol. Inorg. Chem. 16, 831–839 (2011). https://doi.org/10.1007/s00775-
011-0786-7
62. Helgaker, T., Klopper, W., Koch, H., Noga, J.: Basis-set convergence of correlated calculations
on water. J. Chem. Phys. 106, 9639–9646 (1997). https://doi.org/10.1063/1.473863
63. Henderson, T.M., Janesko, B.G., Scuseria, G.E.: Range separation and local hybridization in
density functional theory. J. Phys. Chem. A 112(49), 12530–12542 (2008). https://doi.org/
10.1021/jp806573k
64. Hirao, K.: Multireference Møller-Plesset method. Chem. Phys. Lett. 190(3–4), 374–380
(1992). https://doi.org/10.1016/0009-2614(92)85354-D
65. Hopmann, K.H., Conradie, J., Ghosh, A.: Broken-symmetry DFT spin densities of iron
nitrosyls, including Roussin’s red and black salts: striking differences between pure and hybrid
functionals. J. Phys. Chem. B 113(30), 10540–10547 (2009). https://doi.org/10.1021/jp904135h
66. Hu, C., Roth, A., Ellison, M., An, J., Ellis, C., Schulz, C., Scheidt, W.: Electronic configuration
assignment and the importance of low-lying excited states in high-spin imidazole-ligated
iron(II) porphyrinates. J. Am. Chem. Soc. 127(15), 5675–5688 (2005). https://doi.org/10.
1021/ja044077p
67. Hu, C., An, J., Noll, B.C., Schulz, C.E., Scheidt, W.R.: Electronic configuration of high-spin
imidazole-ligated iron(II) octaethylporphyrinates. Inorg. Chem. 45(10), 4177–4185 (2006).
https://doi.org/10.1021/ic052194v
68. Hughes, T.F., Friesner, R.A.: Correcting systematic errors in DFT spin-splitting energetics for
transition metal complexes. J. Chem. Theory Comput. 7(1), 19–32 (2011). https://doi.org/10.
1021/ct100359x
69. Hughes, T.F., Harvey, J.N., Friesner, R.A.: A B3LYP-DBLOC empirical correction scheme
for ligand removal enthalpies of transition metal complexes: parameterization against exper-
imental and CCSD(T)-F12 heats of formation. Phys. Chem. Chem. Phys. 14, 7724–7738
(2012). https://doi.org/10.1039/c2cp40220c
70. Isobe, H., Yamanaka, S., Okumura, M., Yamaguchi, K., Shimada, J.: Unique structural and
electronic features of perferryl-oxo oxidant in cytochrome P450. J. Phys. Chem. B 115(36),
10730–10738 (2011). https://doi.org/10.1021/jp206004y
71. Isobe, H., Yamaguchi, K., Okumura, M., Shimada, J.: Role of perferryl-oxo oxidant in alkane
hydroxylation catalyzed by cytochrome P450: a hybrid density functional study. J. Phys.
Chem. B 116(16), 4713–4730 (2012). https://doi.org/10.1021/jp211184y
72. Jameson, G.B., Rodley, G.A., Robinson, W.T., Gagne, R.R., Reed, C., Collman,
J.P.: Structure of a dioxygen adduct of (1-methylimidazole)-meso-tetrakis(α,α,α,α-o-
pivalamidophenyl)porphinatoiron(II). An iron dioxygen model for the heme component of
oxymyoglobin. Inorg. Chem. 17(4), 850–857 (1978). https://doi.org/10.1021/ic50182a012
73. Jensen, F.: Introduction to Computational Chemistry, 2nd edn. Wiley, New York (2007)
74. Jensen, K.P., Ryde, U.: Comparison of the chemical properties of iron and cobalt porphyrins
and corrins. ChemBioChem 4, 413–424 (2003). https://doi.org/10.1002/cbic.200200449
75. Jensen, K.P., Ryde, U.: How O2 binds to heme: reasons for rapid binding and spin inversion.
J. Biol. Chem. 279, 14561–14569 (2004)
76. Jensen, K.P., Roos, B., Ryde, U.: Erratum to “O2 -binding to heme: electronic structure and
spectrum of oxyheme, studied by multiconfigurational methods”. J. Inorg. Biochem. 99, 978
(2005). https://doi.org/10.1016/j.jinorgbio.2005.02.013
77. Jensen, K.P., Roos, B., Ryde, U.: O2 -binding to heme: electronic structure and spectrum of
oxyheme, studied by multiconfigurational methods. J. Inorg. Biochem. 99(1), 45–54 (2005).
https://doi.org/10.1016/j.jinorgbio.2004.11.008
78. Jiang, W., DeYonker, N.J., Wilson, A.K.: Multireference character for 3d transition-metal-
containing molecules. J. Chem. Theory Comput. 8, 460–468 (2011)
79. Kellner, D.G., Hung, S.C., Weiss, K.E., Sligar, S.G.: Kinetic characterization of compound I
formation in the thermostable cytochrome P450 CYP119. J. Biol. Chem. 277(12), 9641–9644
(2002)
80. Kent, T.A., Spartalian, K., Lang, G.: High magnetic field Mössbauer studies of deoxymyo-
globin, deoxyhemoglobin, and synthetic analogues: theoretical interpretations. J. Chem. Phys.
71(12), 4899–4908 (1979). https://doi.org/10.1063/1.438303
81. Kitagawa, T., Teraoka, J.: The resonance Raman spectra of intermediate-spin ferrous por-
phyrin. Chem. Phys. Lett. 63, 443–446 (1979). https://doi.org/10.1016/0009-2614(79)80685-
5
82. Knizia, G., Adler, T.B., Werner, H.J.: Simplified CCSD(T)-F12 methods: theory and
benchmarks. J. Chem. Phys. 130(5), 054104 (2009). https://doi.org/10.1063/1.3054300
83. Koch, W., Holthausen, M.C.: A Chemist’s Guide to Density Functional Theory, 2nd edn.
Wiley-VCH, Verlag GmbH, Weinheim (2001)
84. Koseki, J., Maezono, R., Tachikawa, M., Towler, M.D., Needs, R.J.: Quantum Monte Carlo
study of porphyrin transition metal complexes. J. Chem. Phys. 129(8), 085103 (2008). https://
doi.org/10.1063/1.2966003
85. Kozlowski, P.M., Spiro, T.G., Zgierski, M.Z.: DFT study of structure and vibrations in
low-lying spin states of five-coordinated deoxyheme model. J. Phys. Chem. B 104(45),
10659–10666 (2000). https://doi.org/10.1021/jp001463u
86. Kulik, H.J., Cococcioni, M., Scherlis, D.A., Marzari, N.: Density functional theory in
transition metal chemistry: a self-consistent Hubbard U approach. Phys. Rev. Lett. 97,
103001 (2006)
87. Lee, J.Y., Kang, N.S., Kang, Y.K.: Binding free energies of inhibitors to iron porphyrin
complex as a model for cytochrome P450. Biopolymers 97, 219–228 (2012). https://doi.org/
10.1002/bip.22009
88. Lee, T.J., Taylor, P.R.: A diagnostic for determining the quality of single-reference electron
correlation methods. Int. J. Quantum Chem. 36(S23), 199–207 (1989)
89. Li, D., Wang, Y., Han, K.: Recent density functional theory model calculations of drug
metabolism by cytochrome P450. Coord. Chem. Revs. 256(1112), 1137–1150 (2012). https://
doi.org/10.1016/j.ccr.2012.01.016
90. Liao, M.S., Scheiner, S.: Electronic structure and bonding in metal porphyrins, metal=Fe Co,
Ni. Cu. Zn. J. Chem. Phys. 117(1), 205–219 (2002). https://doi.org/10.1063/1.1480872
91. Liao, M.S., Huang, M.J., Watts, J.D.: Iron porphyrins with different imidazole ligands. A
theoretical comparative study. J. Phys. Chem. A 114(35), 9554–9569 (2010). https://doi.org/
10.1021/jp1052216
92. Lupinetti, A.J., Fau, S., Frenking, G., Strauss, S.H.: Theoretical analysis of the bonding
between CO and positively charged atoms. J. Phys. Chem. A 101, 9551–9559 (1997)
818 M. Radoń and E. Broclawik
93. Malmqvist, P.Å., Pierloot, K., Shahi, A.R.M., Cramer, C.J., Gagliardi, L.: The restricted
active space followed by second-order perturbation theory method: theory and application to
the study of CuO2 and Cu2 O2 systems. J. Chem. Phys. 128(204), 109 (2008). https://doi.org/
10.1063/1.2920188
94. Matsui, T., Unno, M., Ikeda-Saito, M.: Heme oxygenase reveals its strategy for catalyzing
three successive oxygenation reactions. Acc. Chem. Res. 43(2), 240–247 (2010). https://doi.
org/10.1021/ar9001685. (pMID: 19827796)
95. McClure, D.S.: Electronic structure of transition-metal complex ions. Radiation Res. Suppl.
2, 218–242 (1960)
96. Miralles, J., Daudey, J.P., Caballol, R.: Variational calculation of small energy differences.
The singlet-triplet gap in [Cu2 Cl6 ]2− . Chem. Phys. Lett. 198(6), 555–562 (1992). https://doi.
org/10.1016/0009-2614(92)85030-E
97. Miralles, J., Castell, O., Caballol, R., Malrieu, J.P.: Specific CI calculation of energy differ-
ences: transition energies and bond energies. Chem. Phys. 172(1), 33–43 (1993). https://doi.
org/10.1016/0301-0104(93)80104-H
98. Momenteau, M., Scheidt, W.R., Eigenbrot, C.W., Reed, C.A.: A deoxymyoglobin model with
a sterically unhindered axial imidazole. J. Am. Chem. Soc. 110, 1207–1215 (1988). https://
doi.org/10.1021/ja00212a032
99. Nakatsuji, H., Hasegawa, J., Ueda, H., Hada, M.: Ground and excited states of oxyheme:
SAC/SAC-CI study. Chem. Phys. Lett. 250(34), 379–386 (1996). https://doi.org/10.1016/
0009-2614(96)00033-4
100. Neese, F.: A spectroscopy oriented configuration interaction procedure. J. Chem. Phys.
119(18), 9428–9443 (2003). https://doi.org/10.1063/1.1615956
101. Neese, F., Valeev, E.F.: Revisiting the atomic natural orbital approach for basis sets: robust
systematic basis sets for explicitly correlated and conventional correlated ab initio methods?
J. Chem. Theory Comput. 7, 33–43 (2011). https://doi.org/10.1021/ct100396y
102. Norvell, J., Nunes, A., Schoenborn, B.: Neutron diffraction analysis of myoglobin: structure
of the carbon monoxide derivative. Science 190(4214), 568–570 (1975). https://doi.org/10.
1126/science.1188354
103. Obara, S., Kashiwagi, H.: Ab initio MO studies of electronic states and Mössbauer spectra
of high-, intermediate-, and low-spin Fe(II)-porphyrin complexes. J. Chem. Phys. 77, 3155
(1982). https://doi.org/10.1063/1.444239
104. Ogliaro, F., Cohen, S., Filatov, M., Harris, N., Shaik, S.: The high-valent compound of
cytochrome P450: the nature of the fe-s bond and the role of the thiolate ligand as an internal
electron donor. Angew Chem. Int. Ed. 39(21), 3851–3855 (2000a)
105. Ogliaro, F., Cohen, S., de Viser, S.P., Shaik, S.: Medium polarization and hydrogen bonding
effects on compound I of cytochrome P450: what kind of radical is it really? J. Am. Chem.
Soc. 122, 12,892–12,893 (2000b)
106. Ogliaro, F., de Visser, S.P., Groves, J.T., Shaik, S.: Chameleon states: high-valent metal-oxo
species of cytochrome P450 and its ruthenium analogue. Angew Chem. Int. Ed. 40, 2874–2878
(2001). 10.1002/1521-3773(20010803)40:15<2874::AID-ANIE2874>3.0.CO;2-9
107. Olah, J., Harvey, J.: NO bonding to heme groups: DFT and correlated ab initio calculations.
J. Phys. Chem. A 113, 7338–7345 (2009). https://doi.org/10.1021/jp811316n
108. de Oliveira, F.T., Chanda, A., Banerjee, D., Shan, X., Mondal, S., Lawrence Que, J., Bom-
inaa, E.L., Münck, E., Collins, T.J.: Chemical and spectroscopic evidence for an Fe(V)-oxo
complex. Science 315, 835–838 (2007). https://doi.org/10.1126/science.1133417
109. Olson, J.C., Phillips, G.N.: Myoglobin discriminates between O2 , NO and CO by electrostatic
interactions with the bound ligand. J. Biol. Inorg. Chem. 2, 544–552 (1997)
110. Olson, J.S., Mathews, A.J., Rohlfs, R.J., Springer, B.A., Egeberg, K.D., Sligar, S.G., Tame,
J., Renaud, J.P., Nagai, K.: The role of the distal histidine in myoglobin and haemoglobin.
Nature 336(6196), 265–266 (1988). https://doi.org/10.1038/336265a0
111. Ortiz de Montellano, P., James, J., De Voss, J.: Substrate oxidation by cytochrome P450
enzymes. In: Ortiz de Montellano, P. (ed.) Cytochrome P450: Structure, Mechanism and
Biochemistry, pp. 183–245. Kluwer Academic/Plenum Publishers, Dordrecht (2005). https://
doi.org/10.1007/0-387-27447-2_6
Electronic Properties of Iron Sites … 819
112. Ortiz de Montellano, P.R.: Hydrocarbon hydroxylation by cytochrome P450 enzymes. Chem.
Rev. 110, 932–948 (2010). https://doi.org/10.1021/cr9002193
113. Pan, Z., Zhang, R., Newcomb, M.: Kinetic studies of reactions of iron(IV)-oxo porphyrin
radical cations with organic reductants. J. Inorg. Biochem. 100(4), 524–532 (2006). https://
doi.org/10.1016/j.jinorgbio.2005.12.022
114. Pan, Z., Zhang, R., Fung, L.W.M., Newcomb, M.: Photochemical production of a highly
reactive porphyrin-iron-oxo species. Inorg. Chem. 46(5), 1517–1519 (2007). https://doi.org/
10.1021/ic061972w
115. Pan, Z., Wang, Q., Sheng, X., Horner, J.H., Newcomb, M.: Highly reactive porphyrin-iron-
oxo derivatives produced by photolyses of metastable porphyrin-iron(IV) diperchlorates. J.
Am. Chem. Soc. 131(7), 2621–2628 (2009). https://doi.org/10.1021/ja807847q
116. Pauling, L., Coryell, C.D.: The magnetic properties and structure of hemoglobin, oxyhe-
moglobin and carbonmonoxyhemoglobin. Proc. Nat. Acad. Sci. 22, 210–216 (1936)
117. Paulsen, H., Trautwein, A.X.: Density functional theory calculations for spin crossover com-
plexes. Top. Curr. Chem. 235, 197–219 (2004). https://doi.org/10.1007/b95428
118. Perdew, J.P.: The functional zoo. In: Geerlings, P., DeProft, F., Langenaeker, W. (eds.) Density
Functional Theory: A Bridge Between Chemistry and Physics, pp. 87–109. Vrije Universiteit
Brussel Press, Brussels (1999)
119. Perdew, J.P., Kurth, S.: Density functionals for non-relativistic coulomb systems in the new
century. In: Fiolhais C, Nogueira F, Marques M (eds) A Primer in Density Functional Theory,
Lecture Notes in Physics, vol. 620, pp. 1–55, Chap 1. Springer, Berlin (2003). https://doi.org/
10.1007/3-540-37072-2_1
120. Perdew, J.P., Ernzerhof, M., Burke, K.: Rationale for mixing exact exchange with density
functional approximations. J. Chem. Phys. 105(22), 9982–9985 (1996). https://doi.org/10.
1063/1.472933
121. Perdew, J.P., Ruzsinszky, A., Constantin, L.A., Sun, J., Csonka, G.I.: Some fundamental issues
in ground-state density functional theory: a guide for the perplexed. J. Chem. Theory Comput.
5, 902–908 (2009). https://doi.org/10.1021/ct800531s
122. Phillips, S.E.: Structure and refinement of oxymyoglobin at 1.6 Å resolution. J. Mol. Biol.
142(4), 531–554 (1980). https://doi.org/10.1016/0022-2836(80)90262-4
123. Phillips, S.E.V.: Structure of oxymyoglobin. Nature 273(5659), 247–248 (1978)
124. Phillips, S.E.V., Schoenborn, B.P.: Neutron diffraction reveals oxygen-histidine hydrogen
bond in oxymyoglobin. Nature 292, 81–82 (1981)
125. Piela, L.: Ideas of Quantum Chemistry. Elsevier, polish edition (2006). Idee Chemii Kwan-
towej, PWN, 2005
126. Pierloot, K.: Nondynamic correlation effects in transition metal coordination compounds. In:
Cundari, T.R. (ed.) Computational Organometallic Chemistry. Marcel Dekker Inc., New York
(2001)
127. Pierloot, K.: The CASPT2 method in inorganic electronic spectroscopy: from ionic transition
metal to covalent actinide complexes. Mol. Phys. 101(13), 2083–2094 (2003)
128. Pierloot, K., Vancoillie, S.: Relative energy of the high-(5 T2g ) and low-(1 A1g ) spin states of
[Fe(H2O)6 ]2+ , [Fe(NH3 )6 ]2+ , and [Fe(bpy)3 ]2+ : CASPT2 versus density functional theory.
J. Chem. Phys. 125(124), 303 (2006). https://doi.org/10.1063/1.2353829
129. Pierloot, K., Vancoillie, S.: Relative energy of the high-(5 T2g ) and low-(1 A1g ) spin states of
the ferrous complexes [Fe(L)(NHS4 )]: CASPT2 versus density functional theory. J. Chem.
Phys. 128(034), 104 (2008)
130. Pierloot, K., Dumez, B., Widmark, P.O., Roos, B.: Density matrix averaged atomic natural
orbital (ANO) basis sets for correlated molecular wave functions. IV. Medium size basis sets
for the atoms H-Kr. Theor. Chim. Acta. 90, 87–114 (1995)
131. Pierloot, K., Zhao, H., Vancoillie, S.: Copper corroles: the question of non-innocence. Inorg.
Chem. 49, 10,316–10,329 (2010). https://doi.org/10.1021/ic100866z
132. Poli, R., Harvey, J.N.: Spin forbidden chemical reactions of transition metal compounds. New
ideas and new computational challenges. Chem. Soc. Rev. 32, 1–8 (2003)
820 M. Radoń and E. Broclawik
133. Popescu, D.L., Chanda, A., Stadler, M., de Oliveira, F.T., Ryabov, A.D., Münck, E., Bominaar,
E.L., Collins, T.J.: High-valent first-row transition-metal complexes of tetraamido (4N) and
diamidodialkoxido or diamidophenolato (2N/2O) ligands: synthesis, structure, and magneto-
chemistry. Coord. Chem. Revs. 252, 2050–2071 (2008)
134. Praneeth, V., Neese, F., Lehnert, N.: Spin density distribution in five- and six-coordinate
iron(II)-porphyrin NO complexes evidenced by magnetic circular dichroism spectroscopy.
Inorg. Chem. 44, 2570–2572 (2005)
135. Radoń, M., Broclawik, E.: Peculiarities of the electronic structure of cytochrome P450 com-
pound I: CASPT2 and DFT modeling. J. Chem. Theory Comput. 3(3), 728–734 (2007). https://
doi.org/10.1021/ct600363a
136. Radoń, M., Pierloot, K.: Binding of CO, NO, and O2 to heme by density functional and
multireference ab initio calculations. J. Phys. Chem. A 112(46), 11,824–11,832 (2008). https://
doi.org/10.1021/jp806075b
137. Radoń, M., Srebro, M., Broclawik, E.: Conformational stability and spin states of cobalt(II)
acetylacetonate: CASPT2 and DFT study. J. Chem. Theory Comput. 5(5), 1237–1244 (2009).
https://doi.org/10.1021/ct800571y
138. Radoń, M., Broclawik, E., Pierloot, K.: Electronic structure of selected FeNO7 complexes in
heme and non-heme architectures: A density functional and multireference ab initio study. J.
Phys. Chem. B 114(3), 1518–1528 (2010). https://doi.org/10.1021/jp910220r
139. Radoń, M., Broclawik, E., Pierloot, K.: DFT and Ab Initio study of iron-oxo porphyrins: may
they have a stable iron(V)-oxo electromer? J. Chem. Theory Comput. 7, 898–908 (2011).
https://doi.org/10.1021/ct1006168
140. Ray, M., Golombek, A.P., Hendrich, M.P., Yap, G.P.A., Liable-Sands, L.M., Rheingold, A.L.,
Borovik, A.S.: Structure and magnetic properties of trigonal bipyramidal iron nitrosyl com-
plexes. Inorg. Chem. 38, 3110–3115 (1999)
141. Reiher, M., Salomon, O., Hess, B.A.: Reparameterization of hybrid functionals based on
energy differences of states of different multiplicity. Theor. Chem. Acc. 107(1), 48–55 (2001).
https://doi.org/10.1007/s00214-001-0300-3
142. Ribas-Ariño, J., Novoa, J.J.: The mechanism for the reversible oxygen addition to heme.
A theoretical CASPT2 study. Chem. Commun. 2007, 3160–3162 (2007). https://doi.org/10.
1039/b704871h
143. Rittle, J., Green, M.T.: Cytochrome P450 compound I: Capture, characterisation, and C–
H bond activation kinetics. Science 330, 933–937 (2010). https://doi.org/10.1126/science.
1193478
144. Rodriguez, J.H., Xia, Y.M., Debrunner, P.G.: Mössbauer spectroscopy of the spin coupled
Fe2+ -FeNO7 centers of nitrosyl derivatives of deoxy hemerythrin and density functional
theory of the FeNO7 (S = 3/2) motif. J. Am. Chem. Soc. 121(34), 7846–7863 (1999). https://
doi.org/10.1021/ja990129c
145. Roos, B.O.: Multiconfigurational self consistent field theory. In: Roos, B.O., Widmark, P.O.
(eds.) European Summerschool in Quantum Chemistry, vol. 2, pp. 287–360. Lund University,
Lund (2003)
146. Roos, B.O., Taylor, P.R., Siegbahn, P.E.M.: A complete active space SCF method (CASSCF)
using a density matrix formulated super-CI approach. Chem. Phys. 48(2), 157–173 (1980)
147. Roos, B.O., Andersson, K., Fulscher, M., Malmqvist, P.Å., Serrano-Andres, L., Pierloot,
K., Merchan, M.: Multiconfigurational perturbation theory: applications in electronic spec-
troscopy. In: Prigogine, I., Rice, S.A. (eds.) Advances in Chemical Physics: New Methods in
Computational Quantum Mechanics, vol. 93, pp. 219–331. Wiley, New York (1996)
148. Roos, B.O., Lindh, R., Malmqvist, P.Å., Veryazov, V., Widmark, P.O.: New relativistic ANO
basis sets for transition metal atoms. J. Phys. Chem. A 109, 6575–6579 (2005)
149. Rosen, G.M., Tsai, P., Pou, S.: Mechanism of free-radical generation by nitric oxide synthase.
Chem. Rev. 102(4), 1191–1200 (2002). https://doi.org/10.1021/cr010187s
150. Rovira, C.: Role of the His64 residue on the properties of the Fe-CO and Fe-O2 bonds
in myoglobin. A CHARMM/DFT study. J. Mol. Struc. (Theochem) 632, 309–321 (2003).
https://doi.org/10.1016/S0166-1280(03)00308-7
Electronic Properties of Iron Sites … 821
151. Rovira, C., Kunc, K., Hutter, J., Ballone, P., Parrinello, M.: Equilibrium geometries and
electronic structure of iron-porphyrin complexes: A density functional study. J. Phys. Chem.
A 101(47), 8914–8925 (1997). https://doi.org/10.1021/jp9722
152. Rovira, C., Kunc, K., Hutter, J., Ballone, P., Parrinello, M.: A comparative study of O2 , CO,
and NO binding to iron-porphyrin. Int. J. Quantum. Chem. 69(1), 31–35 (1998)
153. Rydberg, P., Sigfridsson, E., Ryde, U.: On the role of the axial ligand in heme proteins: a
theoretical study. J. Biol. Inorg. Chem. 9, 203–223 (2004). https://doi.org/10.1007/s00775-
003-0515-y
154. Rydberg, P., Gloriam, D.E., Olsen, L.: The SMARTCyp cytochrome P450 metabolism pre-
diction server. Bioinformatics 26, 2988–2989 (2010). https://doi.org/10.1093/bioinformatics/
btq584
155. Scherlis, D.A., Cococcioni, M., Sit, P., Marzari, N.: Simulation of heme using DFT + U: a
step toward accurate spin-state energetics. J. Phys. Chem. B 111, 7384–7391 (2007). https://
doi.org/10.1021/jp070549l
156. Schöneboom, J.C., Lin, H., Reuter, N., Thiel, W., Cohen, S., Ogliaro, F., Shaik, S.: The
elusive oxidant species of cytochrome P450 enzymes: characterisation by combined quantum
mechanical/molecular mechanical (QM/MM) calculations. J. Am. Chem. Soc. 124, 8142–
8151 (2002). https://doi.org/10.1021/ja026279w
157. Schöneboom, J.C., Neese, F., Thiel, W.: Toward identification of the compound I reactive
intermediate in cytochrome P450 chemistry: a QM/MM study of its EPR and Mössbauer
parameters. J. Am. Chem. Soc. 127(16), 5840–5853 (2005)
158. Schütz, M., Werner, H.J.: Low-order scaling local electron correlation methods. IV. Linear
scaling local coupled-cluster (LCCSD). J. Chem. Phys. 114(2), 661–681 (2001). https://doi.
org/10.1063/1.1330207
159. Schwarz, W.H.E.: An introduction to relativistic quantum chemistry. In: Barysz, M., Ishikawa,
Y. (eds.) Relativistic Methods for Chemists, Challenges and Advances in Computational
Chemistry and Physics, vol. 10, pp. 1–62. Springer, The Netherlands (2010). https://doi.org/
10.1007/978-1-4020-9975-5_1
160. Shaanan, B.: The ironoxygen bond in human oxyhaemoglobin. Nature 296, 683–684 (1982).
https://doi.org/10.1038/296683a0
161. Shaanan, B.: Structure of human oxyhaemoglobin at 2.1 Å resolution. J. Mol. Biol. 171(1),
31–59 (1983). https://doi.org/10.1016/S0022-2836(83)80313-1
162. Shaik, S., Chen, H.: Lessons on O2 and NO bonding to heme from ab initio multirefer-
ence/multiconfiguration and DFT calculations. J. Biol. Inorg. Chem. 16, 841–855 (2011).
https://doi.org/10.1007/s00775-011-0763-1
163. Shaik, S., De Visser, S.: Computational approaches to cytochrome P450 function. In: Ortiz de
Montellano, P. (ed.) Cytochrome P450: Structure, Mechanism and Biochemistry, pp. 45–
85. Kluwer Academic/Plenum Publishers, Dordrecht (2005). https://doi.org/10.1007/0-387-
27447-2_2
164. Shaik, S., Kumar, D., de Visser, S.P., Altun, A., Thiel, W.: Theoretical perspective on the struc-
ture and mechanism of cytochrome P450 enzymes. Chem. Rev. 105(6), 2279–2328 (2005)
165. Shaik, S., Cohen, S., Wang, Y., Chen, H., Kumar, D., Thiel, W.: P450 enzymes: their structure,
reactivity, and selectivity-modeled by QM/MM calculations. Chem. Rev. 110(2), 949–1017
(2010)
166. Shaik, S., Chen, H., Janardanan, D.: Exchange-enhanced reactivity in bond activation by
metaloxo enzymes and synthetic reagents. Nat. Chem. 3, 19–27 (2011). https://doi.org/10.
1038/nchem.943
167. Sheng, X., Horner, J.H., Newcomb, M.: Spectra and kinetic studies of the compound I deriva-
tive of cytochrome P450 119. J. Am. Chem. Soc. 130(40), 13,310–13,320 (2008). https://doi.
org/10.1021/ja802652b
168. Siegbahn, P.E.M., Himo, F.: The quantum chemical cluster approach for modeling enzyme
reactions. Wiley Interdisc Rev: Comput Mol Sci 1, 323–336 (2011)
169. Siegbahn, P.E.M., Blomberg, M.R.A., Chen, S.L.: Significant van der Waals effects in tran-
sition metal complexes. J. Chem. Theory Comput. 6, 2040–2044 (2010). https://doi.org/10.
1021/ct100213e
822 M. Radoń and E. Broclawik
170. Sigfridson, E., Ryde, U.: On the significance of hydrogen bonds for the discrimination between
CO and O2 by myoglobin. J. Biol. Inorg. Chem. 4(1), 99–110 (1999)
171. Sigfridson, E., Ryde, U.: Theoretical study of the discrimination between O2 and CO by
myoglobin. J. Inorg. Biochem. 91(1), 101–115 (2002)
172. Sigfridsson, E., Ryde, U.: The importance of porphyrin distortions for the ferrochelatase
reaction. J. Biol. Inorg. Chem. 8, 273–282 (2003)
173. Sigfridsson, E., Olsson, M.H.M., Ryde, U.: A comparison of the inner-sphere reorganization
energies of cytochromes, iron-sulfur clusters, and blue copper proteins. J. Phys. Chem. B
105(23), 5546–5552 (2001). https://doi.org/10.1021/jp0037403
174. Sligar, S.G.: Coupling of spin, substrate, and redox equilibriums in cytochrome P450. Bio-
chemistry 15(24), 5399–5406 (1976)
175. Spolitak, T., Dawson, J.H., Ballou, D.P.: Reaction of ferric cytochrome P450cam with
peracids: kinetic characterization of intermediates on the reaction pathway. J. Biol. Chem.
280, 20,300–20,309 (2005). https://doi.org/10.1074/jbc.M501761200
176. Springer, B.A., Egeberg, K.D., Slighar, S.G., Rohlfs, R.J., Mathews, A.J., Olson, J.C.:
Discrimination between oxygen and carbon monoxide and inhibition of autooxydation by
mioglobin. J. Biol. Chem. 264(6), 3057–3060 (1989)
177. Springer, B.A., Sligar, S.G., Olson, J.S., Phillips, G.N.J.: Mechanisms of ligand recognition
in myoglobin. Chem. Rev. 94(3), 699–714 (1994). https://doi.org/10.1021/cr00027a007
178. Stawoska, I., Orzel, Ł., Łabuz, P., Stochel, G., van Eldik, R.: Application of high pressure
laser flash photolysis in studies on selected hemoprotein reactions. Biochim. Biophys. Acta
1784(11), 1481–1492 (2008). https://doi.org/10.1016/j.bbapap.2008.08.006
179. Strauss, S.H., Silver, M.E., Long, K.M., Thompson, R.G., Hudgens, R.A.,
Spartalian, K., Ibers, J.A.: Comparison of the molecular and electronic struc-
tures of (2,3,7,8,12,13,17,18-octaethylporphyrinato)iron(II) and (trans-7,8-dihydro-
2,3,7,8,12,13,17,18-octaethylporphyrinato)iron(II). J. Am. Chem. Soc. 107(14), 4207–4215
(1985). https://doi.org/10.1021/ja00300a021
180. Strickland, N., Harvey, J.N.: Spin-forbidden ligand binding to the ferrous-heme group: Ab
initio and DFT studies. J. Phys. Chem. B 111, 841–852 (2007)
181. Strickland, N., Mulholland, A.J., Harvey, J.N.: The Fe-CO bond energy in myoglobin: A
QM/MM study of the effect of tertiary structure. Biophys. J. 90, 27–29 (2006). https://doi.
org/10.1529/biophysj.105.078097
182. Sun, X., Wang, H., Feng, D.: Binding properties of CO, NO, and O2 to P450 heme: a density
functional study. Chin. J. Phys. Chem. 20, 552–556 (2007). https://doi.org/10.1088/1674-
0068/20/05/552-556
183. Szabo, A., Ostlund, N.S.: Modern quantum chemistry. In: Introduction to Advanced Electronic
Structure Theory. Dover Publications Inc, New York (1989)
184. Tomson, N.C., Crimmin, M.R., Petrenko, T., Rosebrugh, L.E., Sproules, S., Boyd, W.C.,
Bergman, R.G., DeBeer, S., Toste, F.D., Wieghardt, K.: A step beyond the feltham-enemark
notation: spectroscopic and correlated ab initio computational support for an antiferromag-
netically coupled M(II)-(NO)− description of Tp*M(NO) (M = Co, Ni). J. Am. Chem. Soc.
133(46), 18,785–18,801 (2011). https://doi.org/10.1021/ja206042k
185. Traylor, T.G., Sharma, V.S.: Why no? Biochemistry 31(11), 2847–2849 (1992). https://doi.
org/10.1021/bi00126a001
186. Turner, J.W., Schultz, F.A.: Coupled electron-transfer and spin-exchange reactions. Coord.
Chem. Revs. 219, 81–97 (2001). https://doi.org/10.1016/S0010-8545(01)00322-8
187. Ugalde, J.M., Dunietz, B., Dreuw, A., Head-Gordon, M., Boyd, R.J.: The spin dependence of
the spatial size of Fe(II) and of the structure of Fe(II)-porphyrins. J. Phys. Chem. A 108(21),
4653–4657 (2004). https://doi.org/10.1021/jp0489119
188. Vancoillie, S., Malmqvist, P.Å., Pierloot, K.: Calculation of EPR g tensors for transition-metal
complexes based on multiconfigurational perturbation theory (CASPT2). ChemPhysChem
8(12), 1803–1815 (2007)
189. Vancoillie, S., Zhao, H., Radoń, M., Pierloot, K.: Performance of CASPT2 and DFT for relative
spin-state energetics of heme models. J. Chem. Theory Comput. 6(2), 576–582 (2010). https://
doi.org/10.1021/ct900567c
Electronic Properties of Iron Sites … 823
190. Vancoillie, S., Zhao, H., Tran, V.T., Hendrickx, M.F.A., Pierloot, K.: Multiconfigurational
second-order perturbation theory restricted active space (RASPT2) studies on mononuclear
first-row transition-metal systems. J. Chem. Theory Comput. 7, 3961–3977 (2011). https://
doi.org/10.1021/ct200597h
191. Wanat, A., Schneppensieper, T., Stochel, G., van Eldik, R., Bill, E., Wieghardt, K.: Kinetics,
mechanism, and spectroscopy of the reversible binding of nitric oxide to aquated iron(II). An
undergraduate text book reaction revisited. Inorg. Chem. 41, 4–10 (2002). https://doi.org/10.
1021/ic010628q
192. Wang, Q., Sheng, X., Horner, J.H., Newcomb, M.: Quantitative production of compound I
from a cytochrome P450 enzyme at low temperatures. kinetics, activation parameters, and
kinetic isotope effects for oxidation of benzyl alcohol. J. Am. Chem. Soc. 131(30), 10629–
10636 (2009). https://doi.org/10.1021/ja9031105
193. Weigend, F., Häser, M., Patzelt, H., Ahlrichs, R.: Ri-mp2: Optimized auxiliary basis sets and
demonstration of efficiency. Chem. Phys. Lett. 294, 143–152 (1998)
194. Weiss, J.J.: Nature of the ironoxygen bond in oxyhaemoglobin. Nature 202, 83–84 (1964).
https://doi.org/10.1038/202083b0
195. Weiss, R., Mandon, D., Wolter, T., Trautwein, A.X., Müther, M., Bill, E., Gold, A., Jayaraj,
K., Terner, J.: Delocalization over the heme and the axial ligands of one of the two oxidizing
equivalents stored above the ferric state in the peroxidase and catalase compound-i interme-
diates: indirect participation of the proximal axial ligand of iron in the oxidation reactions
catalyzed by heme-based peroxidases and catalases? J. Biol. Inorg. Chem. 1(4), 377–383
(1996). https://doi.org/10.1007/s007750050069
196. Westcott, B.L., Enemark, J.L.: Transition metal nitrosyls. In: Solomon, E.I., Lever, A.B.P.
(eds.) Inorganic Electronic Structure and Spectroscopy, vol. 2, pp. 403–450. Wiley, New York
(1999)
197. Williams, R.: Metallo-enzyme catalysis: the entatic state. J. Mol. Catal. A 30, 1–26 (1985).
https://doi.org/10.1016/0304-5102(85)80013-4
198. Yamamoto, S., Kashiwagi, H.: CASSCF study on the Fe-O2 bond in a dioxygen heme complex.
Chem. Phys. Lett. 161(1), 85–89 (1989)
199. Yamamoto, S., Teraoka, J., Kashiwagi, H.: Ab initio RHF and CASSCF studies on Fe–O bond
in high-valent iron-oxoporphyrins. J. Chem. Phys. 88, 303–312 (1988)
200. Ye, S., Neese, F.: Accurate modeling of spin-state energetics in spin-crossover systems with
modern density functional theory. Inorg. Chem. 49(3), 772–774 (2010). https://doi.org/10.
1021/ic902365a
201. Zhang, R., Newcomb, M.: Laser flash photolysis generation of high-valent transition metal-
oxo species: insights from kinetic studies in real time. Acc. Chem. Res. 41(3), 468–477 (2008).
https://doi.org/10.1021/ar700175k
202. Zhang, R., Nagraj, N., Lansakara-P, D.S.P., Hager, L.P., Newcomb, M.: Kinetics of two-
electron oxidations by the compound I derivative of chloroperoxidase, a model for cytochrome
P450 oxidants. Org. Lett. 8(13), 2731–2734 (2006). https://doi.org/10.1021/ol060762k
203. Zhao, Y., Truhlar, D.G.: Density functional for spectroscopy: No long-range self-interaction
error, good performance for rydberg and charge-transfer states, and better performance on
average than B3LYP for ground states. J. Phys. Chem. A 110(49), 13,126–13,130 (2006a).
https://doi.org/10.1021/jp066479k. (pMID: 17149824)
204. Zhao, Y., Truhlar, D.G.: A new local density functional for main-group thermochemistry,
transition metal bonding, thermochemical kinetics, and noncovalent interactions. J. Chem.
Phys. 125(194), 101 (2006b). https://doi.org/10.1063/1.2370993
205. Zhao, Y., Truhlar, D.G.: The M06 suite of density functionals for main group thermochemistry,
thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two
new functionals and systematic testing of four M06-class functionals and 12 other functionals.
Theor. Chem. Acc. 120, 215–241 (2008). https://doi.org/10.1007/s00214-007-0310-x
Bioinorganic Reaction
Mechanisms—Quantum Chemistry
Approach
1 Introduction
In the majority of cases where enzymes binding transition metal ions in their active
sites are recruited as catalysts, the enzymatic reaction involves redox steps that are
inconceivable without the participation of specialized cofactors. The latter can usually
adopt several oxidation states and stabilize various reactive forms of substrates during
the catalytic cycle, and hence provide a low-energy path for otherwise chemically
demanding transformations. This type of catalysis poses serious challenges to
research on such systems. The reactions considered here are usually multi-step
processes that involve short-lived and often highly reactive intermediates, which
frequently can decay along several pathways. This makes studies of bioinorganic
catalytic mechanisms highly demanding and requires the use of various research
techniques. One of them is computational quantum chemistry (QC), whose unique feature
is that it can describe all species along the catalytic cycle on an equal footing,
including transition states and short-lived intermediates [3], which are frequently
out of reach of contemporary experimental techniques. In this chapter we discuss
several instructive examples illustrating how QC can be applied to study reaction
mechanisms of metalloenzymes, emphasizing the new insights that were obtained thanks
to the computations and showing the limitations of the current approach. The examples
are taken from our research work on mononuclear nonheme iron enzymes.
2 Methodology
The usual starting point for computational studies of an enzymatic reaction mechanism
is a crystal structure, preferably solved for an enzyme-substrate/product complex.
When a structure of this type is available, construction of a QC active site model
becomes relatively straightforward; however, several more or less arbitrary decisions
must still be made by the researcher. The (S)-2-hydroxypropylphosphonic acid epoxidase
(HppE) depicted in Fig. 1 may serve as an illustrative example [23]. In the active
site of HppE a single metal ion, Fe(II), is coordinated by three protein ligands (two
histidines and one glutamate) and by the organic substrate, HPP. In the immediate
vicinity of the first coordination shell there are several polar residues (displayed
in ball-and-stick representation for the X-ray structure) and hydrophobic ones, the
latter close to the methyl group of HPP.
Fig. 1 Active site region of the X-ray structure of HppE-Fe(II)-HPP (PDB: 1ZZ8 [23]) and its model
(with bound O2) used in QC investigations of the catalytic reaction mechanism [32]. Asterisks mark
atoms with fixed coordinates
A whole enzyme or an extended active site region is hardly tractable by QC, and
therefore the model must be considerably reduced, unless one chooses to use a hybrid
QM/MM method, in which the active site of the protein is described by a QC method and
the remaining part of the system by molecular mechanics [39]. The first limitation
concerns the size and composition of the QC model of the active site, which is usually
a compromise between completeness and computational cost, the latter growing very
fast with the model size. In our example, a minimal model would include only the
Fe(II) center and its first coordination shell with properly truncated protein residues
(e.g. with the cut bonds saturated with hydrogen atoms). Minimal models were used
routinely a decade ago, when computer power was rather modest [42, 43], and they may
still be of practical value today. For example, they are useful in preliminary
screening of prospective reaction paths or as a reference for larger models when
attempting to identify catalytic effects due to the second (or more distant) shells.
In particular, polar residues forming hydrogen bonds (H-bonds) with the first-shell
ligands may be important, as they can modulate not only the mobility of the ligands
but also their proton affinity and the redox potential of the metal or ligands. Thus,
it is strongly recommended that these polar groups be explicitly included in the QC
model, as was indeed done in the study on HppE. Here whole side chains were retained
in the model for Asn135, Asn197 and Tyr105, whereas for Lys23 and Arg97 the fragments
were truncated at the carbon neighbouring the basic group. Moreover, a water molecule
H-bonding to Arg97 and HPP was included in the QC model. Hydrophobic residues from the
second coordination shell would be included in a still larger (and more complete)
model [17, 45], even though they are not expected to affect qualitative features of
the reaction mechanism, owing to the weak nature of their non-bonded interactions.
Since X-ray protein structures are very rarely of sufficient resolution to reveal
positions of hydrogen atoms, the latter have to be added manually to the
selected model. This is a relatively straightforward step for hydrocarbon fragments
and for amide, amine and guanidinium groups. On the other hand, for alcohol and phenol
OH groups one has to decide on the value of the H–O–C–C dihedral angles in the
initial model, usually chosen so that the H-bond network is optimal, i.e.
the maximum number of H-bonds is obtained. Histidine residues that are not coordinated
to the metal are the most difficult case. First, the pKa of a histidine side
chain (free amino acid) is close to 7 and thus the group can be either neutral or
positively charged. Second, if the group is neutral, the single nitrogen-bound proton
can be placed on either of the two ring nitrogen atoms. To resolve these issues
one usually relies on experimental data concerning the possible catalytic role of a
given His (proton acceptor/donor) and/or looks for H-bond partners in the immediate
surroundings of a given His side chain and makes a qualified guess. The histidines
bound to the metal ion are usually assumed to be electro-neutral (with a single
N-bound proton). In an alternative, and presumably less biased, approach one would use
a Poisson–Boltzmann titration method to determine the protonation states of protein
residues [1].
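The ambiguity in the histidine charge state can be made concrete with the Henderson–Hasselbalch relation. The sketch below is only an illustration: the function name and the pKa values scanned are assumptions, and the pKa of a histidine buried in a protein can differ considerably from that of the free amino acid.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable base present in its protonated form
    (Henderson-Hasselbalch relation, ideal dilute solution)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative (assumed) side-chain pKa values, evaluated at pH 7
for pka in (6.0, 6.5, 7.0):
    frac = protonated_fraction(pka, ph=7.0)
    print(f"pKa {pka:.1f}: {frac:.2f} cationic HisH+, {1.0 - frac:.2f} neutral His")
```

When the pKa is within about one unit of the pH, both charge states are appreciably populated, which is why the choice has to be made case by case.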
In the actual protein the side chains are covalently bound to the backbone (main
chain), which is usually omitted in the active site model. The anchoring role of the
backbone is introduced into the model via constraints imposed on selected peripheral
atoms [35]. In our example, hydrogen atoms introduced to saturate cut bonds and
their bonding partners were constrained to mimic the anchoring role of the backbone
(Fig. 1). For the majority of the side chains this choice corresponds to fixing in space
the Cβ carbons and the hydrogens replacing Cα (one of the backbone atoms), i.e. the
model assumes a perfectly rigid backbone. This approximation is not necessary if a
QM/MM method is applied to the whole enzyme-substrate complex.
In cases when X-ray structures are not available for enzyme-substrate complexes,
appropriate macromolecular models of such species are usually built on the basis of
existing (fragmentary) structural data, aided with the use of empirical or semiempir-
ical force fields and docking or molecular dynamics simulation methods, which are
covered elsewhere in this book.
Once constructed, the active site model is used to explore the potential energy
surface (PES) with methods presented in the following subsections.
Active site models typically include 50–300 atoms and are used to explore
potential energy surfaces, which usually involves numerous repetitions of geometry
optimizations and frequency calculations. Thus the sheer amount of computation
to be done already puts severe limitations on the QC methods applicable to the problem.
With the presently available computer power the methods of density functional theory
(DFT) offer the best compromise between accuracy and computational demand. DFT
methodology has been briefly introduced in the preceding chapter and shown to be
modest in its computing resource requirements and to perform reasonably
well.
Geometry optimization of the initial model yields a single structure of the closest
stable chemical species, whereas the other intermediates and the transition states
joining them need to be found manually (automatic procedures for PES exploration do
exist, yet they are practical only for small systems [30]). Based on previous
experience, existing chemical knowledge and mechanistic hypotheses, one usually assumes
Fig. 2 Active site model with approximate reaction coordinate marked in green, and an energy
profile obtained in a relaxed energy scan (in red; the point marked in green depicts explicitly
optimized transition state)
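The relaxed energy scan mentioned in the caption of Fig. 2 can be illustrated on a toy two-dimensional surface. This is a minimal sketch under stated assumptions: the model potential, the choice of x as the approximate reaction coordinate, and all function names are hypothetical and unrelated to the HppE model. At each fixed value of x the remaining coordinate y is relaxed, and the highest point of the resulting profile serves as a starting guess for an explicit transition state optimization.

```python
def energy(x, y):
    """Toy 2D PES (arbitrary units): double well along x coupled to a spectator mode y."""
    return (x * x - 1.0) ** 2 + 2.0 * (y - 0.3 * x * x) ** 2


def d_energy_dy(x, y):
    """Analytic derivative of the toy PES with respect to y."""
    return 4.0 * (y - 0.3 * x * x)


def relax_y(x, y0=0.0, step=0.1, tol=1e-10, max_iter=1000):
    """Minimise the energy with respect to y while x is kept frozen (steepest descent)."""
    y = y0
    for _ in range(max_iter):
        grad = d_energy_dy(x, y)
        if abs(grad) < tol:
            break
        y -= step * grad
    return y


def relaxed_scan(x_start=-1.0, x_end=1.0, n_points=21):
    """Step x from one minimum to the other, relaxing y at every scan point."""
    profile = []
    y = 0.0
    for i in range(n_points):
        x = x_start + (x_end - x_start) * i / (n_points - 1)
        y = relax_y(x, y0=y)  # start from the previous relaxed geometry
        profile.append((x, y, energy(x, y)))
    return profile


profile = relaxed_scan()
x_ts, y_ts, e_ts = max(profile, key=lambda p: p[2])
print(f"TS guess at x = {x_ts:.2f}, y = {y_ts:.3f}, E = {e_ts:.3f}")
```

In a real study the maximum of the scan is only an approximation to the saddle point, which is why the caption distinguishes it from the explicitly optimized transition state.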
This (partly manual) scenario for potential energy surface exploration is continued
until all reaction steps and conceivable side reactions are covered. One must admit,
however, that the completeness of such a PES exploration relies heavily on the chemical
knowledge, intuition and experience of the researcher. One should also note
here that once the active site model becomes larger than ca. 100 atoms, the PES
becomes complicated by the presence of multiple minima, and hence many transition
paths may be found. This problem is well recognised in the field of macromolecular
simulations [14] and, for example, methods of transition path sampling were designed
to provide a full multi-pathway picture of the transition (here chemical reaction) [13].
Unfortunately, such methods require generation of system trajectories of lengths
practically not available at the moment for DFT methods applied to active site models
of metalloenzymes. However, in the field of QM/MM, methods that take into account
appropriate sampling of the configurational space of the MM part of the system were
proposed [25, 26, 36], and as their computational demand is roughly of the same
order as for static QM/MM calculations, they should gain popularity soon.
The procedure described above works smoothly provided all critical points (minima
and saddle points) lie on the same adiabatic PES. However, it is not uncommon
in the reactions of metalloenzymes that either the spin of the system changes
in the course of the reaction or that two diabatic surfaces of the same spin form
a sharp crossing due to very distinct local symmetry of the occupied orbitals. In such
cases, instead of a TS one looks for a minimum energy crossing point (MECP), i.e.
a minimum energy point along the crossing seam (Fig. 3). As the energy separation
between the two diabatic surfaces is very sensitive to the choice of basis set, it is
recommended that for MECP optimization the part of the system that is the locus
of electronic structure changes is described with an extended, triple-ζ quality basis
set. It is worth noting that automatic procedures searching for MECPs are
available [21], though for PES:s with the same spin a manual search, similar
to that presented in Fig. 3, may still be necessary.
Fig. 3 Example of two diabatic potential energy surfaces with their crossing seam and a minimum
energy crossing point (MECP)
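The idea of an MECP search can be sketched on two model diabatic surfaces. The example below is an assumption-laden toy: the two harmonic surfaces, the step size and the function names are hypothetical, and the composite gradient (one term that closes the energy gap along the gradient-difference vector, one term that lowers the energy within the seam) is written in the spirit of the approach cited as [21], not copied from any program.

```python
def e1(x, y):
    return 0.5 * (x + 1.0) ** 2 + 0.5 * y ** 2


def g1(x, y):
    return (x + 1.0, y)


def e2(x, y):
    return 0.5 * (x - 1.0) ** 2 + 0.5 * (y - 1.0) ** 2 - 0.5


def g2(x, y):
    return (x - 1.0, y - 1.0)


def find_mecp(x, y, step=0.1, tol=1e-10, max_iter=10000):
    """Steepest descent on a composite gradient: gap * (g1 - g2) + projected g1."""
    for _ in range(max_iter):
        ga, gb = g1(x, y), g2(x, y)
        dx, dy = ga[0] - gb[0], ga[1] - gb[1]      # gradient-difference vector
        norm2 = dx * dx + dy * dy
        gap = e1(x, y) - e2(x, y)
        # Component of g1 perpendicular to the gradient difference (motion within the seam)
        proj = (ga[0] * dx + ga[1] * dy) / norm2
        px, py = ga[0] - proj * dx, ga[1] - proj * dy
        fx, fy = gap * dx + px, gap * dy + py
        if fx * fx + fy * fy < tol * tol:
            break
        x -= step * fx
        y -= step * fy
    return x, y


x_c, y_c = find_mecp(0.0, 0.0)
print(f"MECP at ({x_c:.3f}, {y_c:.3f}), E1 = {e1(x_c, y_c):.3f}, E2 = {e2(x_c, y_c):.3f}")
```

For these two surfaces the seam is the line y = −2x, and the lowest point on it can be verified by hand to be (−0.2, 0.4) with an energy of 0.4 on both surfaces.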
The total electronic energy for the optimized structures (intermediates, TS:s and
MECP:s) is subsequently computed with a triple-ζ basis set for all atoms; this energy
should be corrected with several other energy contributions to yield final energies. The
first additional term accounts for the energy of zero-point vibrations and it is calculated
from a full frequency analysis. The entropy (and the free energy) contributions are
usually not computed for models with constrained atoms as the entropy is mostly
affected by low frequency modes and these are, in turn, very sensitive to the presence
of constraints.
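The zero-point term follows directly from the harmonic frequencies, ZPE = ½ Σ hcν̃. A minimal sketch is given below; the function name and the convention that imaginary modes are passed as negative numbers are assumptions, chosen because many programs print them that way.

```python
CM1_TO_KCAL_PER_MOL = 2.859144e-3  # molar energy equivalent of 1 cm^-1, in kcal/mol


def zero_point_energy(frequencies_cm1):
    """Harmonic ZPE in kcal/mol: half the sum of the real vibrational wavenumbers.

    Imaginary modes (supplied here as negative numbers) are skipped, so the
    reaction-coordinate mode of a transition state does not contribute.
    """
    real_modes = [nu for nu in frequencies_cm1 if nu > 0.0]
    return 0.5 * sum(real_modes) * CM1_TO_KCAL_PER_MOL


# A single C-H stretch near 3000 cm^-1 carries roughly 4.3 kcal/mol of ZPE
print(f"{zero_point_energy([3000.0]):.2f} kcal/mol")
```

Because a stretching mode of this size is lost when the C–H bond becomes the reaction coordinate, the zero-point correction typically lowers barriers for hydrogen abstraction by a few kcal/mol.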
The purpose of the next energy contribution is to cover the electrostatic effects
exerted by the part of the protein not explicitly included in the QC model. To this end,
a polarizable continuum model (PCM) is routinely used, in which the surroundings of
the QC model are described as a continuous dielectric, characterized by an appropriate
dielectric constant, usually assumed to be equal to 4 (mimicking the hydrophobic interior
of the protein). The last energy correction is introduced to correct for the deficiency
of DFT methods in describing the van der Waals interactions. This energy correction
may be computed at various levels, yet even the simplest empirical formula usually
gives satisfactory results [18].
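The bookkeeping behind the final energies can be sketched as a simple sum of the terms listed above. The class, its field names and every number in the example are hypothetical placeholders, not values from the HppE study; the sketch only shows how the corrected energies are assembled and converted into relative energies.

```python
from dataclasses import dataclass

HARTREE_TO_KCAL_PER_MOL = 627.5095


@dataclass
class StationaryPoint:
    label: str
    e_elec: float  # electronic energy with the large basis set (hartree)
    zpe: float     # zero-point vibrational energy (hartree)
    d_solv: float  # continuum (PCM-type) electrostatic correction (hartree)
    e_disp: float  # empirical dispersion correction (hartree)

    def final_energy(self) -> float:
        return self.e_elec + self.zpe + self.d_solv + self.e_disp


def relative_energies(points, reference_label):
    """Final energies relative to the chosen reference structure, in kcal/mol."""
    ref = next(p for p in points if p.label == reference_label).final_energy()
    return {p.label: (p.final_energy() - ref) * HARTREE_TO_KCAL_PER_MOL for p in points}


# Hypothetical numbers, for illustration only
points = [
    StationaryPoint("reactant", -2500.000000, 0.450000, -0.020000, -0.080000),
    StationaryPoint("TS", -2499.970000, 0.445000, -0.022000, -0.082000),
]
rel = relative_energies(points, "reactant")
print({label: round(value, 2) for label, value in rel.items()})
```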
Once the final energies of all intermediates, TS:s and MECP:s are computed, a
diagram presenting a profile of potential energy along the reaction coordinate can
be constructed. The diagram shown in Fig. 4 illustrates the example sketched briefly
in a preceding section: here the point labelled ⁵TS5 (+6.5 kcal/mol), joining
intermediates ⁵10 and ⁵11, corresponds to the transition state for the C–H bond
cleavage, whose search was described above (the superscript is the spin multiplicity).
The analysis of this diagram allows for identification of the most likely mechanism
(black solid line) leading to the product (⁵14), as this path involves the lowest activation
barriers. However, a variant of the mechanism proceeding through MECP_OXO, ⁵10,
⁵TS5, ⁵11 and MECP_HS cannot be excluded because the barrier connected with
MECP_OXO is negligible. On the path marked with the solid line, the ⁵8 → ⁵TS4
→ ⁵9 step is the rate-limiting one (with the highest energy barrier), whereas in the
alternative path the ⁵10 → ⁵TS5 → ⁵11 step is rate-limiting. For the other
alternative scenarios (marked in grey) the barriers are at least 4.6 kcal/mol higher
than on the preferred path, which indicates that these alternative steps are less probable.
Fig. 4 Reaction energy profile for the HppE catalysed formation of fosfomycin
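The step-by-step reading of an energy profile described above can be written down in a few lines. The sketch uses the same simple criterion as the text (the step with the highest barrier, measured from the preceding minimum, is rate-limiting); the function names and the example profile are hypothetical and do not reproduce the values of Fig. 4.

```python
def step_barriers(path):
    """Barriers of elementary steps along a path.

    `path` is a list of (label, kind, energy) tuples ordered along the reaction,
    with kind equal to "min" or "ts"; each barrier is measured from the minimum
    that immediately precedes the transition state.
    """
    barriers = []
    last_min = None
    for label, kind, energy in path:
        if kind == "min":
            last_min = (label, energy)
        elif kind == "ts" and last_min is not None:
            barriers.append((last_min[0], label, energy - last_min[1]))
    return barriers


def rate_limiting_step(path):
    """Return the (minimum, TS, barrier) triple with the highest barrier."""
    return max(step_barriers(path), key=lambda step: step[2])


# Hypothetical profile in kcal/mol
example = [
    ("I1", "min", 0.0), ("TS1", "ts", 12.0),
    ("I2", "min", -5.0), ("TS2", "ts", 9.0),
    ("I3", "min", -20.0),
]
print(rate_limiting_step(example))
```

Note that TS1 is the highest point of the example profile, yet the second step has the larger barrier because it starts from a lower-lying intermediate.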
3 Case Studies
release, though the O–O bond is preserved in the resulting peracid intermediate.
Subsequent O–O bond heterolysis leads to the oxoferryl species. Thus, at least three
chemically distinct reaction paths can be envisioned for the process and the picture
is further complicated by the fact that several spin states can be involved here.
With the aim to provide some insights into this intricate process, QC studies on the
reaction mechanism were done with the use of an active site model for clavaminic acid
synthase (CAS, a representative α-KDD), depicted in Fig. 7 [5]. The organic substrate
was not included in the model since the O2 activation mechanism is supposed to be
independent of its identity (vide supra).
In the E-Fe(II)-α-KG complex the metal ion is in a high-spin (quintet) electronic
configuration with four singly (equispin) and one doubly occupied Fe 3d orbitals.
When this species starts to interact with triplet O2 (two unpaired electrons), coupling
of the electron spins on these two centres may lead to complexes with a triplet, quintet
or septet total spin state. Interestingly, for the E-Fe(II)-α-KG-O2 complex all three
states are computed to lie within a 6 kcal/mol energy range, with the triplet being the
ground state and the quintet being the highest one (Fig. 8). As can be seen in the
figure, the reaction energy profiles follow different scenarios in the three spin states,
with the septet exhibiting the simplest course, with a single barrier, which corresponds
to mechanism B from Fig. 6. Such behaviour of the system in the septet spin state
indicates that the bicyclic structure with high-spin Fe(III) and superoxo ketal is not
stable; similarly, the Fe(II)-peracid complex cannot form in the septet state either. The
latter observation is not surprising since a septet state of the Fe(II)-peracid complex
would imply high-spin Fe(II) coupled to a triplet peracid (or other, similarly unstable
valence states).
Fig. 7 The active site model for CAS, i.e. a representative α-KDD, used in the study on dioxygen
activation [5]. Reprinted with permission from Ref. [5]
Fig. 8 Reaction energy profile for activation of O2 by α-KDD. Black solid line: the lowest activation
energy quintet path proceeding through Fe(II)-peracid complex; blue dotted line: the septet single-
step path; green line: the critical parts of the triplet path
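The three total spin states named above follow from the usual rule for coupling two spins, S = |S_a − S_b|, ..., S_a + S_b in integer steps. A small sketch (the function name is an assumption) that works directly with multiplicities 2S + 1:

```python
from fractions import Fraction


def coupled_multiplicities(mult_a: int, mult_b: int) -> list:
    """Spin multiplicities 2S+1 that result from coupling two centres
    with multiplicities mult_a and mult_b."""
    s_a = Fraction(mult_a - 1, 2)
    s_b = Fraction(mult_b - 1, 2)
    s = abs(s_a - s_b)
    result = []
    while s <= s_a + s_b:
        result.append(int(2 * s + 1))
        s += 1
    return result


# High-spin Fe(II) (quintet, S = 2) with triplet O2 (S = 1)
print(coupled_multiplicities(5, 3))  # [3, 5, 7]
```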
In the triplet state, however, the bicyclic species with superoxo ketal, similar to
that proposed in the mechanism A from Fig. 6, is a stable structure, though it lies
at a considerably elevated (+24.0 kcal/mol) energy level. In this complex, interme-
diate spin (IS) Fe(III) (with one empty, three singly and one doubly occupied Fe
3d orbitals) couples antiferromagnetically (opposite spins) with the superoxo-ketal,
and the empty 3d orbital participates in a strong coordination bond with the two
negatively charged oxygens of the substrate. This bond is one of the factors stabiliz-
ing the bicyclic structure. The consecutive step, whereby CO2 is released and triplet
Fe(II)-peracid complex is formed, proceeds through a high energy transition state
(32.8 kcal/mol), which definitely precludes the triplet spin state from participation
in O2 activation by α-KDD.
From the QC results it follows that the oxidative decarboxylation proceeds on
the quintet PES, according to the mechanism C from Fig. 6. Thus, the attack of Fe-
bound oxygen on the keto group of the ketoacid triggers decarboxylation leading to
a high spin Fe(II)-peracid intermediate. This step involves a barrier of 16.1 kcal/mol,
considerably lower than the barrier on the triplet PES and somewhat smaller than
the barrier in the septet state. The quintet spin state seems optimal for oxidative
decarboxylation since first, the process can proceed through a stable Fe(II)-peracid
intermediate which means that only the C–C bond needs to be cleaved when oxygen
attacks the keto group, and secondly because quintet is a ground spin state for the
Fe(II)-peracid complex featuring a high-spin Fe(II). In contrast, in the septet state
the O–O bond needs to be cleaved simultaneously with the C–C bond as the peracid
complex is not stable in this spin state, whereas in the triplet state iron must adopt a
high-energy IS electronic configuration when the bicyclic intermediate forms.
The Fe(II)-peracid complex is very short-lived as the barrier to its decay amounts
to only 5.7 kcal/mol. This barrier is connected with the first one-electron transfer
from the iron to the O–O bond occurring during the O–O bond cleavage whereas the
transfer of the second electron is practically spontaneous. Cleavage of the O–O bond
completes the oxygen activation stage of the catalytic cycle yielding the reactive
Fe(IV)=O species. As can be noticed in Fig. 8, quintet is the ground electronic state
also for the oxoferryl species, with triplet and septet lying considerably higher.
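To relate the barriers quoted above to time scales, one can use the Eyring equation of transition state theory, k = (k_B T / h) exp(−ΔG‡ / RT). The sketch below rests on a loud assumption: the computed barriers contain no entropy term, so using them in place of ΔG‡ (and taking a transmission coefficient of one) gives order-of-magnitude estimates only. The function name and the temperature are likewise assumptions.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """First-order rate constant (s^-1) from an activation (free) energy in kcal/mol."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


for label, barrier in (("peracid decay", 5.7), ("quintet path", 16.1), ("triplet path", 32.8)):
    k = eyring_rate(barrier)
    print(f"{label}: barrier {barrier} kcal/mol, k ~ {k:.1e} s^-1, lifetime ~ {1.0 / k:.1e} s")
```

With these assumptions a 5.7 kcal/mol barrier corresponds to a lifetime on the nanosecond scale, whereas a barrier of 32.8 kcal/mol implies a process far too slow for enzymatic turnover.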
In conclusion, the QC study on the O2 activation by α-KDD revealed that the
three proposed mechanisms (Fig. 6) are realized on three different PES:s differing in
total spin; this dependence of the mechanism on spin can be understood by taking into
account the electronic structure requirements imposed by a particular spin state. The lowest
activation energy path is located on the quintet PES and it proceeds through oxidative
decarboxylation yielding the Fe(II)-peracid intermediate, which subsequently undergoes
an easy O–O bond heterolysis leading to the reactive Fe(IV)=O species.
Fig. 9 The reaction of aromatic ring hydroxylation coupled to side chain migration catalysed by
HPPD
Fig. 10 Active site model used in the recent study on the catalytic reaction mechanism of HPPD.
Second shell residues other than Gln309 are drawn in a simplified way
σ-complex forms and the following C–C bond cleavage is a homolytic process yielding
carboxymethyl and semiquinone radicals coordinated to Fe(II). Rebound of the
radicals completes the reaction. In contrast, when the big active site model was
employed, either within a QC or a QM/MM model, the mechanism simplified to a
single-step process (lower branch in Fig. 11). The key difference between these two
mechanisms is the electronic structure of the σ-complex, which seems to be decisive
for the migration path. Thus, in a radical σ-complex/Fe(III) species the C–C cleavage
is a homolytic reaction, whereas in the cation σ-complex/Fe(II) intermediate the
C–C cleavage is formally a heterolytic process coupled to formation of the new C–C
bond, i.e. a single-step 1,2-migration.
The fact that different mechanisms were obtained with the two models highlights
the important role played in this case by the second shell residues, which were
missing in the small model. In this respect, Gln309 is the most important as it forms two
H-bonds with the first-shell (carboxylate) ligands (Fig. 10), and these bonds
substantially strengthen when the iron ion is reduced from Fe(IV) to Fe(II). As a result, the
electrophilic attack of the Fe(IV)=O on the ring is a two-electron process yielding
Fig. 11 Two distinct reaction mechanisms for the hydroxylation/side chain migration step of the
HPPD catalytic cycle. In red energies for the radical two-step mechanism found with the small
active site model; in green energies for the single-step mechanism supported by the recent study
employing the big model. Energies (in kcal/mol) are relative values with respect to the corresponding
Fe(IV)=O species
Fig. 13 HppE reaction energy profile for the mechanism C (with the C–H cleavage via 5 TS5) and
initial steps of the mechanisms A and B
The composition of the HppE active site model was described in Sect. 2.1. As
the external electrons are supposedly delivered to the HppE active site together with
protons, these steps can be described as H-atom uptakes, and their energy can be
reliably computed since the total electric charge of the active site remains unchanged.
To this end, one only needs to calculate a donor-H bond energy for a suitable external
electron donor, which in our case is a fully reduced flavin coenzyme (FMN) that was
used in experimental work on HppE.
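The energy of an H-atom uptake step described above amounts to a difference of two X–H bond energies. The sketch below states this explicitly; the function names and the numbers in the example are hypothetical, and all energies must be given in the same unit and at the same level of theory.

```python
def bond_dissociation_energy(e_xh, e_x, e_h):
    """Energy of X-H -> X + H."""
    return e_x + e_h - e_xh


def h_atom_uptake_energy(e_site, e_site_h, e_donor_h, e_donor):
    """Reaction energy of: site + donor-H -> site-H + donor."""
    return (e_site_h + e_donor) - (e_site + e_donor_h)


# Hypothetical energies in kcal/mol, with the free H atom taken as zero
e_h = 0.0
e_site, e_site_h = 0.0, -85.0    # site-H bond energy of 85
e_donor, e_donor_h = 0.0, -75.0  # donor-H bond energy of 75

delta = h_atom_uptake_energy(e_site, e_site_h, e_donor_h, e_donor)
check = (bond_dissociation_energy(e_donor_h, e_donor, e_h)
         - bond_dissociation_energy(e_site_h, e_site, e_h))
print(delta, check)  # both equal BDE(donor-H) - BDE(site-H)
```

The energy of the free hydrogen atom cancels in the difference, which is why only the donor-H bond energy of the external reductant is needed in addition to the active site energies.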
It follows from the computed reaction energy profile for the initial steps of the
three mechanisms (Fig. 13) that first, the barriers encountered in mechanisms A
and B are considerably higher than in mechanism C, and second, that only in
mechanism C does the energy of consecutive intermediates drop monotonically, whereas
in mechanisms A and B some of the initial steps are energetically uphill. The
activation energies of the initial “chemical” steps are 26.5, 33.8 and 12.5 kcal/mol
for mechanisms A, B and C, respectively (TS1, TS3 and TS4), which forms a
strong argument in favour of mechanism C. It is assumed that proton and electron
uptake steps are faster than the first chemical steps of mechanisms A and B, i.e.
the effective barrier to electron/proton transfer is lower than 26.5 kcal/mol, which
seems reasonable.
The detailed reaction energy profile (relative energies with respect to ⁵8) obtained
for mechanism C is presented in Fig. 4, whereas the corresponding reaction
diagram is shown in Fig. 14. In mechanism C, two electrons and two protons
are delivered to the active site prior to any bond cleavage step. In the resulting
intermediate ⁵8 a high-spin Fe(II) is coordinated by a hydroperoxo ligand and the
substrate protonated on the alcoholate oxygen (other positions for the added proton
were considered, yet they had higher energies). Heterolytic O–OH bond cleavage
is connected with protonation of the leaving OH by the substrate’s OH group, and
it leads to a highly reactive Fe(III)-O• species (⁵9), which is an excited state form
of the more common Fe(IV)=O intermediate (⁵10). Either of these reactive species
elicits cleavage of the C1–H bond, which is exposed towards the oxyl/oxo group.
In the native substrate with S-configuration at C2, the C2–H bond points in the
direction opposite to the Fe–O bond and hence is not accessible for the reaction.
The C–H bond cleavage by ⁵9 is a barrierless process, as is the internal conversion
via MECP_OXO to the ground state form, i.e. ⁵10. The latter cleaves the C–H bond
with a barrier of 19.6 kcal/mol (⁵TS5), and this step yields the intermediate spin (IS)
Fe(III)-OH / carbon radical species ⁵11 that decays via MECP_HS to the ground
state with a high-spin Fe(III) (⁵12). In the following step, the Fe(III)-bound OH
group is protonated at the expense of the phosphonic group of the substrate (⁵12
→ ⁵TS8 → ⁵16) and then the only remaining chemical step is closing the epoxide
ring. Concerning the stereochemistry of this step, HppE catalyses the conversion
to the cis-epoxide, i.e. fosfomycin (⁵14), yet in some cases with an accompanying
trans-epoxide (⁵17) byproduct. However, the active site model employed in the QC
study predicts that the transition state (⁵TS10) leading to the less hindered, and thus
more stable, trans-epoxide lies 3.8 kcal/mol below ⁵TS9, which leads to fosfomycin. In other
words, the model predicts that the trans-epoxide would be the major product.
This discrepancy can be easily understood if one recalls that the QC active site model
does not include any hydrophobic residues forming a niche for the methyl group of
the substrate (truncation of the model was a necessary simplification for this study
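What a 3.8 kcal/mol difference between two competing transition states means for the product distribution can be estimated from a Boltzmann factor. The sketch assumes kinetic control, a common precursor for both ring-closure channels, and that the computed energy difference can stand in for the free energy difference; the function name and the temperature are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def branching_ratio(delta_delta_e: float, temperature: float = 298.15) -> float:
    """Ratio of rates (favoured : disfavoured) for two competing transition states
    separated by delta_delta_e kcal/mol."""
    return math.exp(delta_delta_e / (R_KCAL * temperature))


ratio = branching_ratio(3.8)
print(f"ratio ~ {ratio:.0f} : 1, i.e. {100.0 * ratio / (1.0 + ratio):.1f}% of the favoured product")
```

Under these assumptions the truncated model would predict the trans-epoxide almost exclusively, which makes the disagreement with experiment a qualitative one rather than a matter of a few percent.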
Extradiol dioxygenases are typically found in soil bacteria capable of using aromatic
compounds as a carbon source. They catalyse oxidative cleavage of a catechol ring
leading to acyclic 2-hydroxymuconaldehyde acid product (see Fig. 15) [10]. The
catechol substrate binds to the active site Fe(II) ion in a bidentate mode with one
oxygen left protonated (a) [47]. Subsequent binding of O2 leads eventually to species
b with a hydroperoxo bridge between the ring and ferrous ion. For the following steps
two different mechanisms were proposed. In one scenario (upper branch in Fig. 15),
the O–O bond is cleaved homolytically with the aid of the metal ion, which provides one
electron to reduce the OH radical to HO− (c). In the next step the oxyl radical attacks
the ring yielding an epoxide radical (d), which in a few subsequent fast steps
is transformed to the product complex f. This mechanism is supported by several
QC studies [6, 16, 44] and an X-ray structure for a species with the O–O bond cleaved,
analogous to c [27]. The second mechanism assumes that the O–O bond cleavage is a
heterolytic process coupled to ring expansion leading in one step from b to a lactone
intermediate e [31]. Hydrolysis of the lactone completes the reaction. Notably, this
mechanism assumes that the metal ion does not change its oxidation state between
species b and f. An indirect argument in support of this mechanism was obtained in
a study where a mechanistic probe was used. The probe substrate has a -CH2-OH
group instead of the -OOH present in the intermediate b, and it was reported to be
transformed to 2-tropolone, i.e. a 7-membered ring derivative of a species analogous
to e.
With the aim to test if the proposed mechanism would be energetically viable
for the mechanistic probe, a QC study was undertaken with the use of an active site
model for extradiol dioxygenase [8]. Moreover, as the mechanism assumes that the
metal ion is merely a Lewis acid, a much simpler model was considered, i.e. a model
Fig. 15 Two mechanisms proposed for ring cleavage reaction catalysed by extradiol dioxygenases.
Reprinted with permission from Ref. [8]
where the whole active site is replaced by a single molecule of formic acid. The two
models are presented in Fig. 16. In the reduced model the formic acid is placed so
that it donates its acidic hydrogen to form an H-bond with the leaving OH group of the
probe substrate. The second oxygen accepts an H-bond from the ring-bound OH. Such
an arrangement enables the formic acid to shuttle a proton from O5 to O1 during the
reaction. For the actual active site model various protonation patterns were probed,
and the one that gave the lowest activation energy is presented in Fig. 16. Moreover,
for comparative purposes a hypothetical Fe(III) oxidation state was also considered
for the E-Fe-substrate complex, as Fe(III) is a stronger Lewis acid than Fe(II).
The major results of the study are summarized in Fig. 17, where the
relative energies and key bond lengths are reported for the critical stationary points.
Two general conclusions can be drawn from the analysis of the figure. First, that
Fig. 16 Models used in the study on the alkenyl migration mechanism. Reprinted with permission
from Ref. [8]
Fig. 17 Reaction mechanisms obtained for the alkenyl migration with three different models.
Reprinted with permission from Ref. [8]
in this example formic acid is already quite a reasonable model for the Fe(III) form
of the active site. Second, that even for the probe substrate the redox activity of the
metal ion is a key catalytic factor. More specifically, the critical bond lengths for
the transition states in panels a and b are very similar and the chemical structure of
the organic product is the same in both cases. In parallel, the computed activation
energies are also close to each other, though very high. Importantly, in both cases the
transition state is for the heterolytic alkenyl migration mechanism that we set out
to test (no spin polarization along the cleaved C–O bond). On the other hand, for
a model with the native Fe(II) oxidation state of the metal a different mechanism
with a lower activation energy was found. As shown in Fig. 17c, in this case the
C–O bond cleavage is a homolytic reaction uncoupled from the ring expansion. Just
as in the radical mechanism proposed for the native substrate (Fig. 15), the leaving
OH radical is one-electron reduced by the ferrous ion, whereas the -CH2 radical
attacks the nearby atom of the aromatic ring. Importantly, the calculated barrier is
significantly lower than that for the (hypothetical) Fe(III)-bound model, where the
proposed mechanism is the heterolytic alkenyl migration. In light of these facts, and
taking into account the enormous height of the barrier, it is safe to conclude that the
heterolytic ring cleavage mechanism is very unlikely for extradiol dioxygenases.
In summary, the study on the mechanism of ring expansion for a mechanistic
probe for extradiol dioxygenases showed that significant insights into the catalytic factors
governing an enzymatic reaction can be obtained by deliberately constructing and testing
a range of QC models differing in size and composition. First, a minimal model, where a
molecule of formic acid was used in place of the whole active site, turned out to give
geometries and energies similar to those of the active site model. Such a small model could
be used in benchmark energy computations with, for example, the CCSD(T) method.
Second, an active site model with a non-native Fe(III) state of the metal cofactor was
tested and compared to the model with the normal Fe(II) cofactor. Comparison of
the results allowed us to rule out the heterolytic ring expansion mechanism.
4 Concluding Remarks
In this chapter we showed how DFT methods applied to active site models can be
used to test mechanistic hypotheses for metalloenzymes. As exemplified by the case
studies summarised here, for such systems one often needs to test several plausible
spin states or various reactive species that can be formed in the course of a redox
reaction. In some cases the second shell residues need to be included in the model
as their omission may even lead to a completely altered reaction mechanism. On the
other hand, in some other instances the whole active site model may be replaced
by a single molecular fragment and still yield valuable information. In yet another
case, the active site model needs to be rather large to reproduce the stereospecificity
of the enzyme. One should also note here that the accuracy of DFT methods
is inevitably limited, and computed reaction energies and barriers can sometimes
be burdened by an error of up to ca. 10 kcal/mol. However, the aim of
mechanistic studies is usually not to reproduce an experimental value of a barrier
height or reaction energy, but rather to suggest the most likely mechanism. Thus,
if two mechanisms differ in their rate-limiting barriers by more than 10 kcal/mol the
lower barrier mechanism is to be selected as the most likely and the other is ruled
out. When the difference is below 5–6 kcal/mol, which is a typical magnitude of
error of DFT methods, one can attempt to construct a reduced model and use it
in, e.g. CCSD(T) benchmark calculations, though this procedure may not always
succeed. With all these caveats, we believe that DFT modelling of enzymatic reaction
mechanisms will continue to be a valuable complement to experimental techniques.
The experimental identification and characterization of transition states is out of
reach.
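The rule of thumb stated above can be condensed into a small helper. The thresholds are those quoted in the text (10 kcal/mol and the upper end of the 5–6 kcal/mol range); the function name, the wording of the verdicts and the treatment of the intermediate range, on which the text is silent, are assumptions.

```python
def compare_mechanisms(barrier_a: float, barrier_b: float,
                       decisive: float = 10.0, uncertain: float = 6.0) -> str:
    """Classify the difference between two rate-limiting barriers (kcal/mol)."""
    gap = abs(barrier_a - barrier_b)
    if gap > decisive:
        return "lower-barrier mechanism preferred, other ruled out"
    if gap < uncertain:
        return "within typical DFT error, benchmark on a reduced model advisable"
    return "intermediate case, lower barrier only tentatively favoured"


# HppE initial chemical steps quoted earlier: A 26.5, B 33.8, C 12.5 kcal/mol
print(compare_mechanisms(12.5, 26.5))
print(compare_mechanisms(12.5, 33.8))
# A 3.8 kcal/mol difference falls within the typical error range
print(compare_mechanisms(0.0, 3.8))
```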
References
17. Georgieva, P., Himo, F.: Quantum chemical modeling of enzymatic reactions: the case of
histone lysine methyltransferase. J. Comput. Chem. 31(8), 1707–1714 (2010).
https://doi.org/10.1002/jcc.21458
18. Grimme, S.: Semiempirical GGA-type density functional constructed with a long-range
dispersion correction. J. Comput. Chem. 27, 1787–1799 (2006)
19. Hammes-Schiffer, S.: Theory of proton-coupled electron transfer in energy conversion
processes. Acc. Chem. Res. 42, 1881–1889 (2009)
20. Hanauske-Abel, H.M., Günzler, V.: A stereochemical concept for the catalytic mechanism of
prolylhydroxylase: applicability to classification and design of inhibitors. J. Theor. Biol. 94(2),
421–455 (1982)
21. Harvey, J.N., Aschi, M., Schwarz, H., Koch, W.: The singlet and triplet states of phenyl cation. A
hybrid approach for locating minimum energy crossing points between non-interacting
potential energy surfaces. Theor. Chem. Acc. 99, 95–99 (1998)
22. Hausinger, R.P.: Fe(II)/α-ketoglutarate-dependent hydroxylases and related enzymes. Crit. Rev.
Biochem. Mol. Biol. 39(1), 21–68 (2004). https://doi.org/10.1080/10409230490440541
23. Higgins, L.J., Yan, F., Liu, P., Liu, H., Drennan, C.L.: Structural insight into antibiotic
fosfomycin biosynthesis by a mononuclear iron enzyme. Nature 437(7060), 838–844 (2005).
https://doi.org/10.1038/nature03924
24. Holm, R.H., Kennepohl, P., Solomon, E.I.: Structural and functional aspects of metal sites in
biology. Chem. Rev. 96, 2239–2314 (1996). https://doi.org/10.1021/cr9500390
25. Hu, H., Lu, Z., Parks, J., Burger, S., Yang, W.: Quantum mechanics/molecular mechanics
minimum free-energy path for accurate reaction energetics in solution and enzymes: sequential
sampling and optimization on the potential of mean force surface. J. Chem. Phys. 128,
034105 (2008)
26. Kawatsu, T., Lundberg, M., Morokuma, K.: Protein free energy corrections in ONIOM QM:MM
modeling: A case study for isopenicillin N synthase (IPNS). J. Chem. Theory Comput. 7,
390–401 (2011)
27. Kovaleva, E.G., Lipscomb, J.D.: Intermediate in the O–O bond cleavage reaction of an extradiol
dioxygenase. Biochemistry 47, 11168–11170 (2008)
28. Lee, C., Yang, W., Parr, R.G.: Development of the Colle-Salvetti correlation energy formula
into a functional of the electron density. Phys. Rev. B 37, 785–789 (1988)
29. Liu, P., Murakami, K., Seki, T., He, X., Yeung, S.M., Kuzuyama, T., Seto, H., Liu, H.: Protein
purification and function assignment of the epoxidase catalyzing the formation of fosfomycin.
J. Am. Chem. Soc. 123(19), 4619–4620 (2001)
30. Maeda, S., Ohno, K., Morokuma, K.: Exploring multiple potential energy surfaces:
photochemistry of small carbonyl compounds. Adv. Phys. Chem., Article ID 268124, 13 pages
(2012)
31. Mendel, S., Arndt, A., Bugg, T.D.H.: Acid-base catalysis in the extradiol catechol dioxygenase
reaction mechanism: site-directed mutagenesis of His-115 and His-179 in Escherichia coli
2,3-dihydroxyphenylpropionate 1,2-dioxygenase (MhpB). Biochemistry 43(42), 13390–13396
(2004). https://doi.org/10.1021/bi048518t
32. Miłaczewska, A., Broclawik, E., Borowski, T.L.: On the catalytic mechanism of (S)-2-
hydroxypropylphosphonic acid epoxidase (HppE): a hybrid DFT study. Chem. Eur. J. (2012).
https://doi.org/10.1002/chem.201202825
33. Moran, G.R.: 4-Hydroxyphenylpyruvate dioxygenase. Arch. Biochem. Biophys. 433(1), 117–
128 (2005). https://doi.org/10.1016/j.abb.2004.08.015
34. Ng, S.S., Kavanagh, K.L., McDonough, M.A., Butler, D., Pilka, E.S., Lienard, B.M.R., Bray,
J.E., Savitsky, P., Gileadi, O., von Delft, F., Rose, N.R., Offer, J., Scheinost, J.C., Borowski,
T., Sundstrom, M., Schofield, C.J., Oppermann, U.: Crystal structures of histone demethylase
JMJD2A reveal basis for substrate specificity. Nature 448(7149), 87–91 (2007).
https://doi.org/10.1038/nature05971
35. Pelmenschikov, V., Blomberg, M., Siegbahn, P.E.: A theoretical study of the mechanism for
peptide hydrolysis by thermolysin. J. Biol. Inorg. Chem. 7, 284–298 (2002)
36. Rod, T., Ryde, U.: Accurate QM/MM free energy calculation of enzyme reactions: Methylation
by catechol O-methyltransferase. J. Chem. Theory Comput. 1, 1240–1251 (2005)
37. Schenk, G., Mitić, N., Gahan, L.R., Ollis, D.L., McGeary, R.P., Guddat, L.W.: Binuclear
metallohydrolases: Complex mechanistic strategies for a simple chemical reaction. Acc. Chem.
Res. (2012). https://doi.org/10.1021/ar300067g
38. Schofield, C., Zhang, Z.: Structural and mechanistic studies on 2-oxoglutarate-dependent
oxygenases and related enzymes. Curr. Opin. Struct. Biol. 9(6), 722–731 (1999)
39. Senn, H., Thiel, W.: QM/MM methods for biological systems. Top. Curr. Chem. 268, 173–290
(2007)
40. Sheppard, D., Terrell, R., Henkelman, G.: Optimization methods for finding minimum energy
paths. J. Chem. Phys. 128, 134106 (2008)
41. Siegbahn, P.E.M.: Modeling aspects of mechanisms for reactions catalyzed by metalloenzymes.
J. Comput. Chem. 22, 1634–1645 (2001)
42. Siegbahn, P.E.M.: Mechanisms of metalloenzymes studied by quantum chemical methods. Q.
Rev. Biophys. 36, 91–145 (2003)
43. Siegbahn, P.E.M., Borowski, T.: Modeling enzymatic reactions involving transition metals.
Acc. Chem. Res. 39(10), 729–738 (2006). https://doi.org/10.1021/ar050123u
44. Siegbahn, P.E.M., Haeffner, F.: Mechanism for catechol ring-cleavage by non-heme iron
extradiol dioxygenases. J. Am. Chem. Soc. 126(29), 8919–8932 (2004).
https://doi.org/10.1021/ja0493805
45. Siegbahn, P.E.M., Himo, F.: Recent developments of the quantum chemical cluster approach
for modeling enzyme reactions. J. Biol. Inorg. Chem. 14(5), 643–651 (2009).
https://doi.org/10.1007/s00775-009-0511-y
46. Trewick, S.C., Henshaw, T.F., Hausinger, R.P., Lindahl, T., Sedgwick, B.: Oxidative
demethylation by Escherichia coli AlkB directly reverts DNA base damage. Nature 419(6903), 174–178
(2002). https://doi.org/10.1038/nature00908
47. Vaillancourt, F.H., Barbosa, C.J., Spiro, T.G., Bolin, J.T., Blades, M.W., Turner, R.F.B.,
Eltis, L.D.: Definitive evidence for monoanionic binding of 2,3-dihydroxybiphenyl to
2,3-dihydroxybiphenyl 1,2-dioxygenase from UV resonance Raman spectroscopy, UV/Vis
absorption spectroscopy, and crystallography. J. Am. Chem. Soc. 124(11), 2485–2496 (2002).
https://doi.org/10.1021/ja0174682
48. Wójcik, A., Broclawik, E., Siegbahn, P.E.M., Lundberg, M., Moran, G., Borowski, T.: Role
of substrate positioning in the catalytic reaction of 4-hydroxyphenylpyruvate dioxygenase -
a QM/MM study. J. Am. Chem. Soc. 136(41), 14472–14485 (2014).
https://doi.org/10.1021/ja506378u
49. Ye, S., Riplinger, C., Hansen, A., Krebs, C., Bollinger, J.M., Neese, F.: Electronic structure
analysis of the oxygen-activation mechanism by Fe(II)- and α-ketoglutarate (αkg)-dependent
dioxygenases. Chem. Eur. J. 18(21), 6555–6567 (2012). https://doi.org/10.1002/chem.201102829
50. Zhou, J., Kelly, W.L., Bachmann, B.O., Gunsior, M., Townsend, C.A., Solomon, E.I.: Spectroscopic
studies of substrate interactions with clavaminate synthase 2, a multifunctional α-KG-dependent
non-heme iron enzyme: Correlation with mechanisms and reactivities. J. Am.
Chem. Soc. 123, 7388–7398 (2001)
Index