Automated NMR relaxation dispersion data analysis using NESSY

Background Proteins are dynamic molecules with motions ranging from picoseconds to longer than seconds. Many protein functions, however, appear to occur on the micro to millisecond timescale and therefore there has been intense research of the importance of these motions in catalysis and molecular interactions. Nuclear Magnetic Resonance (NMR) relaxation dispersion experiments are used to measure motion of discrete nuclei within the micro to millisecond timescale. Information about conformational/chemical exchange, populations of exchanging states and chemical shift differences are extracted from these experiments. To ensure these parameters are correctly extracted, accurate and careful analysis of these experiments is necessary. Results The software introduced in this article is designed for the automatic analysis of relaxation dispersion data and the extraction of the parameters mentioned above. It is written in Python for multi platform use and highest performance. Experimental data can be fitted to different models using the Levenberg-Marquardt minimization algorithm and different statistical tests can be used to select the best model. To demonstrate the functionality of this program, synthetic data as well as NMR data were analyzed. Analysis of these data including the generation of plots and color coded structures can be performed with minimal user intervention and using standard procedures that are included in the program. Conclusions NESSY is easy to use open source software to analyze NMR relaxation data. The robustness and standard procedures are demonstrated in this article.


Background
Proteins are responsible for many different and important biochemical functions, such as ligand binding, signaling or catalysis [1][2][3][4]. Many of these events occur on the μs-ms timescale making the detection and interpretation of these motions critical for understanding their biological significance. Advances in solution Nuclear Magnetic Resonance (NMR) of biomolecules have enabled the measurement of protein dynamics on different timescales. Relaxation experiments measuring the two relaxation rates R 1 (the longitudinal relaxation rate) and R 2 (the transverse relaxation rate) as well as the steady state heteronuclear NOE are used to detect motion within ps-ns [5]. However, detection of these motions is limited by the molecular tumbling of the molecule, which is on the order of ns. Recent developments in 15 N/ 13 C relaxation-compensated Carr-Purcell-Meiboom-Gill (CPMG) and R 1ρ rotating-frame relaxation dispersion experiments [6][7][8][9] have enabled measurement of protein dynamics on the μs-ms timescale. From these experiments the exchange contribution to transverse relaxation (R ex ), exchange constant (k ex ), chemical shift differences between exchanging states (δω) and the population of individual states can be extracted. Consequently, relaxation dispersion experiments are commonly used to investigate changes in dynamics caused by ligand binding [10], to discover limiting events during enzyme catalysis [8], to follow protein folding and intermediate states as well as to detect functionally important hidden states [11]. Furthermore, it has been proposed that a detailed understanding of the dynamical behavior of a protein at atomic resolution may help identify novel drug target sites [12].
Here, we present a novel software (NESSY) that is designed for simple and automated analysis of relaxation dispersion data. Data can be fitted to two states and at one or more field strengths. Model selection protocols are included for the selection of the best models and Monte Carlo simulations are used to estimate errors of extracted parameters. Furthermore, publication quality 2D and 3D plots and color coded structures are easily created.

Programming
The software NESSY is written in Python http://www. python.org for multi platform compatibility and uses wx.Python http://www.wxpython.org to build the graphical user interface ( Figure 1). The program was tested on Windows (Vista and 7) and Linux (Ubuntu 9.10 to 11.04) and can be downloaded for free from http:// nessy.biochem.unimelb.edu.au either as compiled binaries (Linux, Mac or Windows format, including all Python modules) or as source code. The software is open source and general public licensed (GPL v3).
Theory NESSY performs automatic relaxation dispersion data analysis using NMR peak lists as inputs. In a first step, the effective transverse relaxation rate R 2 eff is extracted from a series of CPMG relaxation dispersion experiments according to: where T CPMG is the constant CPMG time, I(0) is the intensity of the peak in the reference spectrum and I (v CPMG ) is the intensity of the peak at the CPMG frequency, v CPMG . Then, data are fitted to different models, which can be selected individually. These models are divided on the basis of no exchange (model 1), two-site fast (model 2) and two-site slow (model 3) exchange.

No exchange: model 1
Model 1 describes residues, which are not involved in exchange processes: where R 0 2 is the effective transverse relaxation rate at infinite v CPMG . Two-state exchange: model 2 and 3

Equation (3) describes exchange between two states A and B
Where k a-b and k b-a are the forward and backward exchange rates, respectively, between the states A and B. Model 2 describes fast-limit exchange (k ex >> δω) and is fitted to [7,13]: and p a and p b are the populations of the two state models (p a is the major conformation, p a + p b = 1), k ex is the chemical/conformational exchange (k ex = k a-b + k b-a ) constant and δω is the chemical shift difference between states. In eq. (4) only R 0 2 , k ex and F can be extracted, as p b and, δω cannot be uniquely determined. Model 3 describes slow-limit exchange (k ex << δω) according to the Richard-Carver equation [14]: where D ± = 1 2 ±1 + +2δω 2 ( 2 +ξ 2 ) 1 / 2 NESSY also automatically calculates the exchange contribution to transverse relaxation, R ex , where for fast exchange, R ex = F/k ex , and slow exchange, R ex = p a p bk ex /(1 + (k ex /δω) 2 ).

Optimization and Model Selection
Exchange parameters are optimized by using the Levenberg-Marquardt algorithm to minimize the target function where R n,eff 2 is the value at each v CPMG at data point n, R n,eff ,calc 2 is the calculated rate constant. Error σ R eff 2 of R 2 eff was assumed to be constant within each residue and calculated according to the definition of pooled variance (eq. 7) [15]. σ R eff 2 is calculated using N dup replicated experiments for v CPMG values, such that n j replicates were obtained at (v CPMG ) j . The effective transverse relaxation rate of a given peak R eff 2 in these n j replicates has the standard deviation s j : Model selection is performed using AICc (Akaike information criteria with second order correction for small sample size) by default, but other tests, such as AIC or F-test are included as well [16]. Uncertainties are estimated using 500 (default value, but user controlled) Monte Carlo simulations.

Synthetic data
To demonstrate validity of fit and model selection performed by NESSY, synthetic data was produced using the in-built Synthetic Data Creator. Data were created for models 2 and 3 by introducing an error of 2, 5, 8 and 10% (see Table 1) as experimental NMR relaxation dispersion data usually have smaller errors than 5%. Data was analyzed using NESSY and fitted to the models described above.

Synthetic data
To demonstrate the validity of the coding and model selection, synthetic data containing 2, 5, 8 and 10% errors for models 2 and 3 were created using the inbuilt tool in NESSY. This synthetic data was fitted to models 1, 2 and 3 by minimizing the χ 2 function (eq. 6) and the model selection was performed by comparing AICc values. Errors for the extracted dynamics parameters were estimated by 500 Monte Carlo simulations. In all cases, irrespective of the noise, the correct model was chosen by AICc for each data set (Tables 1 and 2).
Notably, for the synthetic-data model 2 the χ 2 values calculated for model 3 were always at least as good as those for model 2, indicating the need for model selection by statistical methods such as AICc. For the synthetic-data model 3, both χ 2 and AICc selected model 3. During the calculation, NESSY creates plots for each fit to a model. For example, a comparison of fits to models 1, 2 and 3 for the synthetic data is shown in Figure 2 (3D plots are generated using the NESSY plotting tool).
The parameters k ex , p b and δω that were extracted for model 3 matched the initial parameters used to generate the synthetic data within the order of the introduced error ( Table 1). As expected, introducing larger errors into the synthetic data sets produced larger errors for the extracted parameters (calculated by Monte Carlo Simulations).

Fitting relaxation dispersion experiments of PCTX1 at 600 MHz
NESSY was tested using NMR data acquired at 600 MHz for PCTX1, a 40-residue spider toxin [19]. Data for two resonances showing frequency dependent R ex were fitted to models 1 to 3. Trp 24 best fitted to model 3 (two-state, slow-limit exchange) and Phe 30 to model 2 (two-state, fast-limit exchange). An overview of fits to model 1 to 3 as well as the best fit is shown in Figure 3.
Fitting of the slow-limit exchanging signal of Trp 24 extracts a k ex of 1029.3 ± 194 1/s. Although the fit shows slow-limit exchange, only one resonance was present in the spectrum. This is in accordance to the extracted population of the minor conformation (p b ) of 0.844 ± 0.002% and a chemical shift difference (δω) for the two conformations of 2884 ± 76 rad/s. As residue Phe 30 is in fast exchange, only k ex and F were extracted. The extracted conformation exchange rate (k ex ) for this residue was 3750.2 ± 162 1/s.
To best illustrate calculated motion, NESSY creates PyMol macros for color and width-coded structures. Such structures are created for model selection, k ex and R ex . Figure 4 shows color coding for both model selection and k ex of PCTX1. Models are visualized in different colors per residue, while extracted dynamics parameters (k ex and R ex ) are colored from yellow to red and also represented by thicker lines for increasing values. Note that in Figure 4 only two residues were analyzed and therefore, only two residues are colored.

Global fit to synthetic data
The dynamics parameters, k ex , p b and δω, are highly interdependent and data from a single static field is typically insufficient for their accurate determination. Consequently acquisition of relaxation dispersion data at  two or more fields is necessary for the robust and accurate estimation of these parameters [20]. NESSY is able to perform a global fit by sharing parameters k ex , p b and δω, but taking the field dependency of δω into account.
To demonstrate this function, synthetic data was created for data virtually recorded at 800 and 600 MHz. For 800 MHz data, synthetic data (5% error) previously described was used. Additionally, synthetic data was generated at 600 MHz (δω 600 = 600/800 δω 800 ). Both data sets were simultaneously fitted to models 1 to 3 (Table 3). Model selection was performed using AICc and errors of fit for the parameters were estimated by 500 Monte Carlo simulations per residue and individual experiments.
For the synthetic data for model 2 at two different spectrometer frequencies, fits (χ 2 ) generally improved with increasing number of parameters. Using AICc model selection NESSY clearly chooses the correct model (model 2). For the synthetic data for model 3, the correctly chosen model according to AICc was model 3 (Table 3). A summary of extracted parameters is given in Table 4 and fits to models 2 and 3 for both synthetic data sets (each at two spectrometer frequencies) are shown in Figure 5.

Cluster Analysis
Under certain circumstances, motion in different parts of a macromolecule may be concerted. Therefore, multiple residues/nuclei may experience the same exchange. In such cases, it is useful to cluster these residues and fit them simultaneously. Cluster analysis of data recorded at one or more static fields is included in NESSY, where single values of k ex and populations are determined for all residues. Individual parameters, δω and (eq. 4) are obtained for each residue. To demonstrate this function, synthetic data for 4 residues experiencing k ex of 306.15 1/s and a minor population (p b ) of 0.072 were created using model 3 (2 state slowlimit exchange) consisting of 5% noise. The transverse relaxation rate, R 0 2 , as well as δω were different for each residue. The data was clustered and simultaneously fitted to models 2 or 3 ( Figure 6A). The selected model by AICc was model 3 and the extracted exchange constant k ex was 286.69 ± 66.5 1/s and population p b was 0.075 ± 0.013. Furthermore, the extracted values for δω and R 0 2 correlated to their initial values ( Figure 6B, C).

Discussion
The software presented in this article permits the automated data analysis of relaxation dispersion experiments. The user has the opportunity to choose to fit to the entire data set or individually selected residues. In addition, data from two or more magnetic fields can be globally fitted. During global fits, field dependency of δω is taken into account. Multiple residues can be grouped and analyzed simultaneously using built-in cluster analysis (individual or global fit). Maximum flexibility of data entry has been included. For example, peak lists containing peak intensities that are created by any other software can be imported; protein sequences are read either from PDB files, retrieved from the internet using UniProt identifier http://www. expasy.ch or can be added manually; CPMG frequencies are read directly from Bruker VD (variable delay) lists. Furthermore, NESSY is linked to the Bruker Protein Dynamic Center (PDC, starting with version 1.1, http:// www.bruker-biospin.com, in collaboration with Dr. Klaus-Peter Neidig) so that projects can be exported in PDC and directly read into NESSY for extended data analysis. In addition, NMRView [21] tables can be imported directly. During the calculation, NESSY produces plots and CSV files (text files compatible with Excel and OpenOffice) of the R eff 2 profiles and the individual fits for each calculated residue as well as the extracted values (such as R 2 , k ex and p b ) and statistics (c 2 and AICc) for each model. In addition, color coded structures for selected models, k ex and R ex are created (structures are drawn using Pymol). Furthermore, NESSY offers tools to create custom made 2D and 3D plots in different formats and styles suitable for publication (see Figures 2 to 6).
Synthetic data as well as experimental NMR CPMG relaxation dispersion data were analyzed using NESSY. The quality of fits of the synthetic data was assessed by comparing original values to those extracted from data analysis (Table 1 and 2, Figure 2). For choosing the best fitting model NESSY evaluates the need and benefit for describing the experimental data with more parameters by using AICc, AIC or F-test. By default, model selection is performed using AICc to avoid over fitting, as more parameters (more complex models) may give better fits. The advantage of using AICc (and AIC) compared to F-test is that data do not have to be normally distributed. As relaxation dispersion experiments usually consist of 15 to 20 data sets, normal distribution of data cannot be assumed. For the synthetic data the correct model was consistently chosen and the extracted values were statistically similar to the initial values. To demonstrate usability of NESSY for NMR data, two signals of PCTX1 were analyzed (Figures 3 and 4). One signal experienced slow-limit and the other fast-limit chemical exchange. In the case of slow exchange, the minor populated conformation was present at 0.8%, which is in accordance with the NMR spectra, where only one peak is visible.
To be able to unambiguously distinguish between fast and slow exchange and to extract populations and shift differences, experiments should be collected at two or more different magnetic fields [20]. NESSY supports global fitting at multiple field strengths, while taking the field dependency of δω into account. Synthetic data for models 2 and 3 at two different static frequencies (600 and 800 MHz) were created and analyzed. Data were fitted to models 1 to 3 and in each case the correct model was chosen for both global fits (Tables 3 and 4, Figure 5).

Conclusions
In this article, the main features of NESSY have been presented, such as curve fitting, model selection and data presentation of relaxation dispersion experiments. Nevertheless, NESSY is not limited to these functions. As relaxation dispersion experiments enable the extraction of populations of individual states, NESSY offers tools to calculate the free energy (ΔG) between states. For data that fit best to fast exchange models, the populations of each state cannot be extracted, as k ex and δω cannot be uniquely determined. An integrated function in NESSY is to calculate the populations for residues with known chemical shift differences, such as observed in ligand binding experiments. If a series of experiments are recorded over a temperature range, the temperature dependence of ΔG can be used to extract entropy (ΔS) and enthalpy (ΔH) changes using van't Hoff analysis. NESSY supports automatic van't Hoff analysis of both linear and non-linear models [22]. Furthermore, NESSY can calculate activation energy barriers using transition state theory and the Eyring equation and generate energy landscape plots.
Taken together, we present user-friendly software for NMR relaxation dispersion ( 15 N/ 13 C) data analysis that requires minimal user intervention. In addition, NESSY can be used to analyze biophysical experiments, such as van't Hoff and transition state theory analysis and to create publication quality 2D and 3D plots. Due to its flexibility, users can choose between different relaxation dispersion models and statistical tests for model selection. Results generated by NESSY are Figure 4 Color coded structures created by PyMOL macros by NESSY. PCTX1 structure is color coded according to selected model (A) and k ex (B). Note only 2 residues were analyzed and therefore, the majority of the structure is colored in black (no data). Different models are colored differently; increasing values are colored from yellow to red and line width is increased for increasing values of k ex .