One can use pair-wise forces to assess specific protein-ligand interactions, or to just debug a system, i.e. by checking for unrealistically high forces. Yet, in most cases, the response to a mechanical or allosteric signal becomes visible by comparing forces for different states. These states can be a system with and without applied external force, or the apo and holo configuration of an allosteric protein; here we call these states reference (ref) and perturbed (pert) for simplicity. Parts under mechanical strain become visible by subtracting forces of both states for each pair of interacting atoms. Changes in force, ΔF, are then given by the equation:
To achieve a sufficient signal to noise ratio, FDA will often require exhaustive sampling of the conformational space. This is best done by calculating a set of independent trajectories, as MD simulations are prone to being trapped in local minima. The signal to noise ratio is estimated by comparing ΔF to differences in force observed for systems in the same state, ΔFnoise, where ΔFnoise = Fref - Fref'. Due to random coordinate fluctuations on the ps-ns timescale (Figure 1C), we found convergence of forces to require simulation timescales of 10 to several hundred nano seconds [4–7] (Figure 1D), depending on the flexibility of the system. Moreover, normalization by the standard error between independent trajectories, ε, may help to improve data quality. We previously defined the normalized change in force, Δf
ij
as:
Visualization and statistical analysis
To be able to map changes in force onto the atoms of a protein structure, e.g. as a color gradient, a projection of the pair-wise space into the atom wise space is required. Such a projection is given by the simple column sum:
ΔF
j
can be seen as the mechanical coupling of an atom j with respect to all other atoms. We use absolute values in the summation to avoid forces from canceling each other out. The ΔF
j
can be visualized by simply writing them as b-factors into a PDB file, Figure 2A+B. Alternatively, changes in pair-wise forces can be mapped as a network on a protein structure, Figure 2C. Pair-wise forces can be seen as edges connecting atoms, and the force between these atoms is the edge weight. The network of interactions under strain can then be visualized by drawing an edge between each atom for which |ΔF
ij
| > cutoff. Hereby it is often insightful to study the contributions of individual groups, such as side-chains or backbone atoms [4, 6], what is easily achieved by restricting the summation to atoms within a certain group.
More details are obtained by statistical analysis on force trajectories, e.g. using principal component analysis (PCA). We previously identified a network of correlated fluctuations governing the allosteric function of MetJ by performing PCA on force trajectories [5]. In this case, it can be advantageous to look at forces between residue pairs instead of atom pairs. This will significantly decrease the memory requirements of the calculations.
Finally, FDA automatically provides detailed ligand-protein interaction profiles, as we have demonstrated for MetJ [5]. The interaction of a ligand with a protein structure depends on the complex interplay of individual atoms, charges play a crucial role. Intuitively, this can be seen as a pair-wise interaction between the ligand and the individual protein residues. FDA thus naturally provides a detailed description of the interaction pattern.
Usage and tools
To ease the analysis of force trajectories, the FDA implementation comes with a tool called g_fdatools to process or convert force trajectories. g_fdatools provides options to sum up force trajectories, to average forces over a distinct time interval and to optionally calculate the corresponding variances. To allow for normalization and estimation of the sampling quality, the standard error ε between average forces obtained from individual runs can be calculated. Due to the vast number of pair-wise atomic interactions, which in the worst case grows quadratically with system size, it is often advantageous to instead analyze forces between residue pairs. g_fdatools allows to calculate residue averaged pair-wise forces
from atomic force trajectories in two ways.
The first and often sufficient way is to simply sum up pair-wise forces between every two residues u, v as defined in Equation 4.
The resulting forces are not entirely correct, as we do not work with vectorized forces, but they are a good approximation. The second way is to calculate force vectors between residues using the coordinate trajectories. Here, prior to summation, exact atomic force vectors are restored by multiplying the force F
ij
with the coordinate vector
between atoms i, j:
In this case, x, y, z components and the norm of the force vectors between each residue pair will be returned. From our experience, a coordinate output frequency of 10 ps is sufficient to precisely restore the force vectors. In both cases, it is possible to restrict averaging to certain interaction types and atom groups, e.g. only Coulomb interactions between pairs of side-chains.
For further analysis we provide an R package called FDAtools that allows to import and analyze pair-wise forces in R [10]. The package provides several methods for data import, supporting full and sparse matrix representations. In all cases the output format is a simple matrix, which can easily be used for statistical analyses. Various functions to visualize the force distribution by mapping pair-wise forces onto PDB structures as described above are provided. Networks of residue pairs showing correlated changes in forces, as e.g. revealed by PCA, can easily be created with FDAtools as well. We hereto provide wrapper functions that allow to easily apply the R PCA implementation (prcomp) on FDA data. Finally, we provide a tool to convert binary force matrices into an ASCII representation, allowing users to easily import the data into the software package of their choice.
A step-by-step tutorial guiding the interested user through simulation setup, data-processing and analysis is available on the FDA project page hosted on Google Code http://code.google.com/p/force-distribution-analysis/.
Limitations and complementary methods
FDA is perfectly fitted to elucidate mechanisms rendering proteins mechanically stable, or to understand slow conformational transitions. In cases of high intrinsic conformational flexibility or when larger conformational transitions occur during the sampling of forces, force averages will not converge, and FDA will fail to provide meaningful results. Such situations may occur for simulations at temperatures close to the melting point, for intrinsically disordered proteins, or molecules showing conformational transitions on the ps to ns time scale. Applying external strain to a protein and observing the induced changes by FDA as described in [4, 6] is only feasible for mechanically stable and rigid proteins. For proteins with low mechanical stability, other mechanical perturbations such as e.g. ligand binding [5] can be characterized, or a time-resolved analysis of forces is required, which is the focus of ongoing work.
We here want to emphasize that FDA is not a replacement but a complement for existing, coordinate based methods. It provides an additional layer of information, namely how internal stress distribution will lead to conformational or entropic changes. By combining, for example, conventional trajectory analysis, normal mode analysis and FDA, we could shed light on dynamic allostery in MetJ, a transcription factor [5].