Normal mode calculation is based on the harmonic approximation of the potential energy function, *V*, around a minimum energy conformation, Equation 1, where *r* is the distance between atoms, *R* is the equilibrium distance between atoms, *u* is the difference from equilibrium distance between atoms, *i* and *j* refer to the atom number, and *α* and *β* refer to the direction of the motion.

\mathbf{V}\left({\mathbf{r}}_{n}\right)=\frac{1}{2}\underset{\alpha =x,y,z}{\sum _{i}^{N}}\underset{\beta =x,y,z}{\sum _{j}^{N}}{\left(\right)close="|">\frac{{\partial}^{2}\mathbf{V}}{\partial {r}_{\mathrm{i\alpha}}\partial {r}_{\mathrm{j\beta}}}}_{}\n \n R\n \n

(1)

This approximation allows an analytic solution of the equations of motion by diagonalising the mass-weighted Hessian matrix, **D**, (the mass-weighted second derivatives of the potential energy matrix), Equation 2, where {D}_{\mathrm{i\alpha},\mathrm{j\beta}}={\left(\right)close="|">{\partial}^{2}\mathbf{V}/\partial {r}_{\mathrm{i\alpha}}\sqrt{{m}_{i}}\partial {r}_{\mathrm{j\beta}}\sqrt{{m}_{j}}}_{}\n \n R\n \n and *m* is the mass.

{\mathbf{e}}^{-1}\mathbf{D}\mathbf{e}=\text{diag}\left[{\omega}_{1\to n}^{2}\right]

(2)

The eigenvectors of this matrix, *e*, are the normal modes, and the eigenvalues are the squares of the associated frequencies, *ω*. The protein movement can then be represented as a superposition of these normal modes, fluctuating around a minimum energy conformation. The normal modes responsible for most of the amplitude of the atomic displacement are associated with the lowest frequencies.

In order to avoid time-consuming energy minimisations, a single-parameter Hookean potential can be used, which is shown to yield low-frequency normal modes as accurate as those obtained with more detailed, empirical, force fields [10]. The spring constant of the Hookean potential, *k*, is generally assumed to be the same for all interacting pairs within an arbitrary cut-off, *R*_{
c
}, beyond which interactions are not taken into account.

{V}_{\mathit{\text{ij}}}=\left\{\begin{array}{cc}\frac{{k}_{\mathit{\text{ij}}}}{2}{\left({r}_{\mathit{\text{ij}}}-{R}_{\mathit{\text{ij}}}\right)}^{2}& \phantom{\rule{2.77626pt}{0ex}}{R}_{\mathit{\text{ij}}}^{2}\le {R}_{c}^{2}\\ 0& \phantom{\rule{2.77626pt}{0ex}}{R}_{\mathit{\text{ij}}}^{2}>{R}_{c}^{2}\end{array}\right.

(3)

*Δ* *Δ*PT toolbox has a default cut-off of 12 Å and a Hookean potential of 1 kcal mol ^{-1} Å ^{-2}, these can be changed with the relevant flags (-c and -r respectively) in the GENENMM program. This approximation implies that the reference structure represents the minimum energy conformation. As default, all atom masses are set to the same fixed value in the kinetic energy term, 1 Da, as this approximation was shown to have little influence on the low-frequency modes; however, if desired the true atomic masses can be used (add -mass flag) or, if the model is based on the C_{
α
} atoms, only the residue mass can be assigned (add -res flag with -ca flag), Figure 1.

The GENENMM program also allows elastic network models with a varying spring constant, either with an empirical power decay on the interaction (-an flag), with the Hinsen exponential spring constant (-hine) [12], with Hinsen fitted spring constants (-hin) [42], or with individually set values between residues (-f *file*). GNMPROD also allows the production of the one-dimensional Gaussian network model instead of the three-dimensional elastic network model^{a}.

The resulting Hessian can be either fully diagonalised using the DIAGSTD program (not recommended for many more than 1000 sites - although in reality a system this size will only take around 10 minutes to solve on a desktop PC - run serial on an AMD Phenom™II 3.2 GHz Quad Core) or diagonalised using the rotation-translation-block (RTB) approach, DIAGRTB program. The RTB approach groups several atoms into a single point, which is generally achieved by division into residue blocks, or multiple residue blocks. The rigid-body rotations and translations of these ‘super’-sites are used as the new co-ordinate system instead of Cartesian co-ordinates [6]. When a small number of residues per block are used, the approximation has very little effect on the low frequency modes; although the frequencies do increase predictably due to internal block stiffening [8]. Using this approximation, it becomes possible to treat very large proteins, or protein complexes, in an all-atom level of description in reasonable computing time. DIAGRTB can be set to block into groups by a number of residues (-r *n*), block into the protein secondary structure (-str SECO), or block into custom domains (-str DOMN). The lowest frequency modes mainly depend on the overall shape of the system; they can be captured at extremely high levels of coarse-graining [50] or by using low-resolution structural data [51].

For comparison with atomistic simulations, the COVAR program allows calculation of a mass weighted covariance matrix, **F**, from trajectories generated with Gromacs, Amber, or DL_POLY, Equation 4, where **x** is the atomic position matrix and **m** is the mass matrix.

\mathbf{F}=\u3008{\mathbf{m}}^{1/2}\left(\mathbf{x}-\u3008\mathbf{x}\u3009\right){\left({\mathbf{m}}^{1/2}\left(\mathbf{x}-\u3008\mathbf{x}\u3009\right)\right)}^{T}\u3009

(4)

COVAR also corrects the displacements by removing the centre of mass motion and rigid body rotations; this produces more accurate results as the motion is not dominated by the rigid body motions. These displacements can be expanded into normal modes, principle component analysis, Equation 5, where *Q* is the eigenvalue matrix.

{\mathbf{e}}^{-1}\mathbf{F}\mathbf{e}=\u3008{\mathbf{Q}}^{2}\u3009

(5)

As we are again approximating the full motion to harmonic style motions, the solution is governed by harmonic oscillatory statistical mechanics. This means that for each eigenvector Equation 6 must hold true [52], where *k* is the Boltzmann constant, *T* is the temperature, and *v* is the normal mode number.

\u3008{Q}_{v}^{2}\u3009=\frac{\mathit{\text{kT}}}{{\omega}_{v}^{2}}

(6)

The COVAR program also plots the trajectory frames onto the lowest frequency eigenvectors, Figure 2. If the inbuilt principle component analysis tools in Gromacs or Amber are preferred, GroAMED can convert the default outputs respectively for use of the other toolbox programs.

The FREQ/EN program calculates the mode frequencies, the free energy, and the entropy from the calculated eigenvalues. The free energy and entropy are calculated using the full solution, Equations 7 and 8, and the Schlitter approximation [53] for comparison with other programs^{b}, where *G* is the free energy, *S* is the entropy, and \hslash is the reduced Plank constant.

{G}_{v}=-\mathit{\text{kT}}ln\left(\frac{1}{1-\text{exp}\left(-\frac{\hslash {\omega}_{v}}{\mathit{\text{kT}}}\right)}\right)

(7)

{S}_{v}=k\left(\frac{\frac{\hslash {\omega}_{v}}{\mathit{\text{kT}}}}{\text{exp}\left(\frac{\hslash {\omega}_{v}}{\mathit{\text{kT}}}\right)-1}-ln\left(1-\text{exp}\left(-\frac{\hslash {\omega}_{v}}{\mathit{\text{kT}}}\right)\right)\right)

(8)

The RMS/COLL program calculates the root mean squared displacements of all the atoms for each of the selected modes along with the collectivity, *κ*, of the modes, Equation 9 [54], where *α* is the collectivity constant selected so that \sum _{i}^{N}\alpha \left|{\mathbf{e}}_{i}^{2}\left(v\right)\right|=1 and *N* is the number of atoms.

{\kappa}_{v}=\frac{1}{N}\text{exp}\left(-\sum _{i}^{N}\alpha \left|\underset{i}{\overset{2}{\mathbf{e}}}\left(v\right)\right|\text{log}\left(\alpha \left|\underset{i}{\overset{2}{\mathbf{e}}}\left(v\right)\right|\right)\right)

(9)

The degree of collectivity indicates the fraction of atoms that are significantly affected by a given mode. For modes involving all the atoms, the degree of collectivity tends to be one, whereas for localised motions the degree of collectivity approaches zero (actually 1/*N*). The first 25-50 low-frequency normal modes tend to have a collectivity of above 0.4 meaning a significant part of the protein is involved in each mode. Low collectivity in the lowest frequency modes is indicative of extended parts of the system, either N- or C- termini or large unstructured loops. These loops cannot be modelled in a meaningful way as they intrinsically adopt multiple conformations, can appear to be invisible in one crystal form but visible in a different crystal form [55], or can even appear ordered due to crystal packing [56].

It is common practise to remove these extended parts prior to the normal mode computation. If a RTB approximation is used, there is some advantage to blocking and representing large unstructured loops by one block so they are included but do not dominate the motion.

The RMS/COLL program also calculates the B-factors, *B*, Equation 10 [52], from the mean square displacements of the first 25 lowest frequency modes (this can be changed with the -e *n* option), ignoring the six rigid block rotational and translation modes (starting from the seventh mode, -s 7, is the default). The B-factors should be calculated with the same mass weighting options as GenENMM. Correlations to crystallographic B-factors are typically found to be greater than 0.5-0.6 [15], and can even be greater than 0.8 [1]^{c}.

{B}_{i}=\frac{8\mathit{\text{kT}}{\pi}^{2}}{3{m}_{i}}\sum _{v}\frac{\left|{\mathbf{e}}_{i}^{2}\left(v\right)\right|}{{\omega}_{v}^{2}}

(10)

Adjusting the cutoff value can slightly improve such correlations, and if possible it is recommended that the correlation between the shape of the predicted and the experimental values is iterated upon when setting the cutoff value if no other information is available. The comparison between the shape of the computed and observed crystallographic B-factors provides a measure of how well the protein’s flexibility in its crystal environment is described by the normal modes. This motion tends to echo, but is more restricted (by crystal packing) than the motion in solution.

The CROSCOR program calculates the cross-correlation, *C*, of atoms over the first 25 modes (although this can be changed with the -b *n* and -e *n* flags), Equation 11. The cross-correlation shows which atoms tend to move in the same direction with a correlated motion in the modes, Figure 3(a). A value of 1 implies perfectly correlated motion and -1 perfectly anti-correlated motion. As the numerator is calculated as the dot product between the two vectors, as is a common manner of calculation, the correlation is dependant on the angle of the motion, i.e. fluctuations of the same period and phase but with a difference in orientation of 90° will give a value of 0. Thus, the cross-correlation is useful for identifying which atoms make up a group with correlated motions; however, a spherical breathing mode is difficult to identify from the cross-correlations because they are positive for atoms on the same side, negative for atoms on opposite sides, and 0 for atoms at 90° [57].

{C}_{\mathit{\text{ij}}}=\sum _{v}\left(\frac{{\mathbf{e}}_{i}\left(v\right)\xb7{\mathbf{e}}_{j}\left(v\right)}{{\left({\left|{\mathbf{e}}_{i}\left(v\right)\right|}^{2}{\left|{\mathbf{e}}_{j}\left(v\right)\right|}^{2}\right)}^{0.5}}\right)

(11)

The extent of this motion, \u3008\delta {r}_{\mathit{\text{ij}}}^{2}\u3009, can be calculated with the MOVEING program. This calculates the change in the distance between atoms between the equilibrium value and the value after applying the eigenvectors, Equation 12, Figure 3(b).

\phantom{\rule{-14.0pt}{0ex}}\u3008\delta {r}_{\mathit{\text{ij}}}^{2}\u3009=\sum _{v}\left(\frac{{\left({\mathbf{x}}_{i}+{\mathbf{e}}_{i}\left(v\right)-\left({\mathbf{x}}_{j}+{\mathbf{e}}_{j}\left(v\right)\right)\right)}^{2}-{\left({\mathbf{x}}_{i}-{\mathbf{x}}_{j}\right)}^{2}}{{\omega}_{v}^{2}}\right)

(12)

The OVERLAP program calculates the overlap of the atomic motion between eigenvectors, *v*_{1} and *v*_{2}, Equation 13. There can be different eigenvectors for the same NMA, eigenvectors produced with ENM to atomistic NMA, NMA eigenvector to difference in two crystal structures, or any combination thereof. A values of 1 indicates that the motions are identical whereas a value of 0 indicates that the motions are completely different.

{I}_{{v}_{1}{v}_{2}}=\left(\frac{\sum _{i}\left|{\mathbf{e}}_{i}\left({v}_{1}\right)\xb7{\mathbf{e}}_{i}\left({v}_{2}\right)\right|}{{\left(\sum _{i}{\left|{\mathbf{e}}_{i}\left({v}_{1}\right)\right|}^{2}\sum _{i}{\left|{\mathbf{e}}_{i}\left({v}_{2}\right)\right|}^{2}\right)}^{0.5}}\right)

(13)