Residual Dipolar Couplings (RDCs) have emerged within the last two decades as a powerful source of data that can be acquired by Nuclear Magnetic Resonance (NMR) spectroscopy. RDCs can be used for several purposes, but the primary impetus for their use is the study of the structure and dynamics of biomolecules in solution [1]. This stems from their ability to provide structural information at atomic resolution while remaining sensitive to motions on time scales ranging from picoseconds to milliseconds [2–5]. RDCs have been used in studies of carbohydrates [6–10], nucleic acids [11–16], proteins [17–24] and small molecules [25, 26]. Their utility has also been demonstrated in various applications, including investigations of protein backbone structure [23, 27, 28], development of powerful assignment strategies [29, 30], and the simultaneous examination of structure and dynamics of target molecules [31–33]. In summary, RDCs can be used as informative, accurate, and economical probes of structure and internal dynamics for both routine and challenging macromolecules [12, 31, 34–36].

Historically, the use of RDCs has been limited by two main factors: sample preparation and data analysis. The introduction of a variety of alignment media [37–39], combined with advances in instrumentation [40] and data acquisition, has mitigated the experimental limitations in obtaining RDCs. The major challenge in using RDC data in recent years has been disentangling the various components that the data encapsulate. This task is particularly challenging because an individual RDC datum reports on the overall tumbling and preferred orientation of a molecule, as well as the relative orientation of each individual interaction vector within the alignment frame. Therefore, the main limiting factor in the full utilization of RDC data has been a lack of powerful yet user-friendly RDC analysis tools capable of extracting the pertinent information embedded within this complex source of data.

Nearly all of the currently existing NMR data-analysis software packages, such as Xplor-NIH [41], CNS [42], CYANA [43], DYANA [44] or MSpin [45], have been modified to include RDC data as additional restraints in their analyses. RDC data have also been incorporated into some popular molecular dynamics simulation packages such as Amber [46] and GROMACS [47]. Despite these adaptations, structural refinement of biological macromolecules from RDC data continues to be a non-trivial task. The proper use of RDC data is further hindered by an iterative process that normally consists of three distinct steps. During the first step, an initial structure is evaluated for fitness to RDC data [48, 49]. During the second step, structural refinement software is deployed for refinement of an initial structure that may be several angstroms away from the native structure (as measured over the backbone atoms). Related to this step, various mechanisms have been introduced [48–54] to estimate order parameters or order tensors that prime the search mechanisms of the refinement tools. Finally, a third step often consists of visual inspection of the refined structure using programs such as Molmol [55], Pymol [56] or VMD [57]. This entire process of structure refinement from RDC data may be manually repeated until convergence to an optimal structure. However, a number of pragmatic and theoretical limitations are normally encountered during the refinement of macromolecular structures from RDC data. The pragmatic limitations include tasks such as converting file formats and transferring results from one analysis program to another, which are tedious but important. A second category of challenges concerns selecting the optimal mechanism of structure refinement using RDC restraints. Examples include selecting the most representative order tensor(s) during the refinement process, selecting the region(s) that should be subjected to a refinement procedure, and determining how aggressive the refinement process should be (for example, the temperature scheme of annealing).

Here we report advances in the REDCAT [48] software package, which address several of the aforementioned hindrances in an effort to promote and expedite more effective analyses of RDC data. This latest version of REDCAT incorporates several new features, including combined analyses, a flexible selection mechanism, importing/exporting functions, an improved core computational engine, and the release of its source code under GNU open-source licensing. In addition, interfaces have been developed that allow for direct interaction of REDCAT with VMD [57] and Xplor-NIH [41]. In this report, we describe each new feature and its utility in detail. We also present the results obtained in testing these features with respect to structure refinement and validation using computed and experimental RDC data. The latest software package is available for download from http://ifestos.cse.sc.edu.

### Theory

Theoretical and experimental aspects of RDCs have been extensively presented in the literature. However, in order to facilitate a more informed discussion, here we include a very brief overview of RDC theory as it relates to the presented work.

Residual Dipolar Couplings (RDCs) are derived from the interaction of two magnetic dipoles, when in the presence of the external magnetic field of an NMR instrument [35, 58]. This interaction yields information regarding the average orientation of two nuclei relative to the magnetic field (Equation 1).

In Equation 1, *RDC*^{ij} is the RDC between nuclei *i* and *j*, *μ*_{0} is the magnetic permeability of free space, *h* is Planck’s constant, *γ*_{i} and *γ*_{j} are the nuclear specific gyromagnetic ratios for atoms of type *i* and *j*, *r*_{ij} is the distance between nuclei *i* and *j* (in units of Angstrom), and *θ(t)* is the time dependent angle between *B*_{0} and the vector adjoining nuclei *i* and *j*. REDCAT utilizes an expanded form of Equation 1, shown in Equation 2, and its vector notation (refer to Equation 3).
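The quantities defined for Equation 1 can be combined numerically. The following is a minimal sketch, not REDCAT code: it assumes the common form in which the prefactor is -*μ*_{0}*γ*_{i}*γ*_{j}*h*/(2π*r*_{ij})^{3} and the angular term is the average of (3cos^{2}*θ* - 1)/2. The sign convention, the function names, and the listed gyromagnetic ratios are illustrative assumptions, and the time average is approximated by a plain mean over sampled angles.

```python
import math

# Physical constants in SI units.
MU_0 = 4.0e-7 * math.pi      # magnetic permeability of free space (approx.), T*m/A
PLANCK_H = 6.62607015e-34    # Planck's constant, J*s

# Gyromagnetic ratios in rad/(s*T); illustrative literature values (assumption).
GAMMA_1H = 2.6752218744e8
GAMMA_15N = -2.7116e7


def dipolar_prefactor_hz(gamma_i, gamma_j, r_angstrom):
    """Coupling prefactor in Hz for two nuclei r_angstrom apart.

    Assumed form: -mu0 * gamma_i * gamma_j * h / (2 * pi * r)^3.
    """
    r_m = r_angstrom * 1.0e-10  # convert Angstrom to metre
    return -MU_0 * gamma_i * gamma_j * PLANCK_H / (2.0 * math.pi * r_m) ** 3


def averaged_rdc_hz(gamma_i, gamma_j, r_angstrom, thetas_rad):
    """Scale the prefactor by the mean of (3 cos^2(theta) - 1) / 2.

    thetas_rad is a non-empty sequence of sampled angles (radians) between
    B0 and the internuclear vector; the mean stands in for the time average.
    """
    angular = [(3.0 * math.cos(t) ** 2 - 1.0) / 2.0 for t in thetas_rad]
    mean_angular = sum(angular) / len(angular)
    return dipolar_prefactor_hz(gamma_i, gamma_j, r_angstrom) * mean_angular
```

Under these assumptions, `dipolar_prefactor_hz(GAMMA_1H, GAMMA_15N, 1.0)` evaluates to roughly 24 kHz, and a vector held at the magic angle contributes essentially zero coupling.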

In Equation 2, *D*_{max} is the maximum observable RDC value for a pair of nuclei *i* and *j*, when separated by 1.0 Å; *x*, *y* and *z* represent the normalized coordinates of the vector adjoining nuclei *i* and *j*; and *s*_{kl} denotes the individual elements of an order tensor matrix. Reformulation of RDCs, as shown in Equation 3, provides a computationally friendlier form of the RDC interaction. In this equation, *S* refers to the *Saupe* order tensor matrix [48, 51, 59] and *v* represents the normalized interacting vector.
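The equivalence of the expanded form (Equation 2) and the vector form (Equation 3) can be checked numerically. The sketch below is illustrative only and is not taken from REDCAT: the function names are hypothetical, `d_max` stands for the maximum observable RDC at 1.0 Å, and the division by the cube of the distance `r` (in Å) is an assumption that follows from that definition.

```python
import numpy as np


def rdc_vector_form(S, v, d_max, r=1.0):
    """RDC as a quadratic form: (d_max / r^3) * v S v^T (assumed Equation 3 form)."""
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)          # normalize the interacting vector
    return d_max / r**3 * (v @ S @ v)


def rdc_expanded_form(S, v, d_max, r=1.0):
    """Same quantity written out over the order tensor elements s_kl."""
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    x, y, z = v / np.linalg.norm(v)    # normalized coordinates x, y, z
    total = (S[0, 0] * x * x + S[1, 1] * y * y + S[2, 2] * z * z
             + 2.0 * S[0, 1] * x * y
             + 2.0 * S[0, 2] * x * z
             + 2.0 * S[1, 2] * y * z)
    return d_max / r**3 * total
```

For any symmetric matrix `S` the two functions return the same value. The vector form is easier to apply to many interaction vectors at once.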

Available RDC data from multiple sites on a protein can be combined into a single linear algebraic representation, shown in Equation 4. This *Ax=b* representation of RDCs enables the use of Singular Value Decomposition (SVD) [48, 51, 60, 61] to easily obtain the optimal order tensor matrix. In Equation 4, the matrix *A* is computed from the coordinates of the interacting vectors, *x* corresponds to the vector representation of an order tensor, and *b* corresponds to the observed values of the RDC data. Furthermore, in this equation, the traceless property of the order tensor is utilized to calculate *S*_{zz} from *S*_{xx} and *S*_{yy}, in order to reduce the six variables of the order tensor vector to five. Elimination of the *S*_{zz} term is the reason for the appearance of the *z*^{2} term in the first two columns of the *A* matrix in Equation 4. Other modifications of the system of equations shown in Equation 4, with their corresponding adaptations of SVD, have also been introduced in order to accommodate conformational rotation of side chain methyl and phenyl groups [62, 63].
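The *Ax=b* formulation can be illustrated with a short NumPy sketch. This is a minimal example under stated assumptions, not REDCAT's implementation: the function names are hypothetical, every vector is assumed to share one `d_max` and one distance `r`, and the column order (*S*_{xx}, *S*_{yy}, *S*_{xy}, *S*_{xz}, *S*_{yz}) is chosen for illustration. Each row of *A* uses the traceless substitution *S*_{zz} = -(*S*_{xx} + *S*_{yy}), which places the *z*^{2} term in the first two columns.

```python
import numpy as np


def build_design_matrix(vectors, d_max, r=1.0):
    """Rows are (x^2 - z^2, y^2 - z^2, 2xy, 2xz, 2yz), scaled by d_max / r^3."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # normalize each vector
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    A = np.column_stack([x * x - z * z, y * y - z * z,
                         2.0 * x * y, 2.0 * x * z, 2.0 * y * z])
    return (d_max / r**3) * A


def solve_order_tensor(vectors, rdcs, d_max, r=1.0):
    """Least-squares order tensor from RDCs via an SVD pseudo-inverse."""
    A = build_design_matrix(vectors, d_max, r)
    b = np.asarray(rdcs, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Discard numerically zero singular values before inverting.
    tol = s.max() * max(A.shape) * np.finfo(float).eps
    s_inv = np.zeros_like(s)
    keep = s > tol
    s_inv[keep] = 1.0 / s[keep]
    sxx, syy, sxy, sxz, syz = Vt.T @ (s_inv * (U.T @ b))
    # Restore S_zz from the traceless property.
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, -sxx - syy]])
```

At least five linearly independent interaction vectors are needed for a unique solution. With fewer, the pseudo-inverse returns the minimum-norm tensor consistent with the data.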