Kissing loop interaction in adenine riboswitch: insights from umbrella sampling simulations

Riboswitches are cis-acting regulatory RNA elements prevalently located in the leader sequences of bacterial mRNA. An adenine sensing riboswitch cis-regulates adeninosine deaminase gene (add) in Vibrio vulnificus. The structural mechanism regulating its conformational changes upon ligand binding mostly remains to be elucidated. In this open framework it has been suggested that the ligand stabilizes the interaction of the distal"kissing loop"complex. Using accurate full-atom molecular dynamics with explicit solvent in combination with enhanced sampling techniques and advanced analysis methods it could be possible to provide a more detailed perspective on the formation of these tertiary contacts. In this work, we used umbrella sampling simulations to study the thermodynamics of the kissing loop complex in the presence and in the absence of the cognate ligand. We enforced the breaking/formation of the loop-loop interaction restraining the distance between the two loops. We also assessed the convergence of the results by using two alternative initialization protocols. A structural analysis was performed using a novel approach to analyze base contacts. Our simulations qualitatively indicated that the ligand could stabilize the kissing loop complex. We also compared with previously published simulation studies. Kissing complex stabilization given by the ligand was compatible with available experimental data. However, the dependence of its value on the initialization protocol of the umbrella sampling simulations posed some questions on the quantitative interpretation of the results and called for better converged enhanced sampling simulations.


Introduction
Riboswitches are portions of ribonucleic acid (RNA) able to regulate gene expression in bacteria and plants at several levels. They bind their sensed ligands without the need for protein factors. To regulate their target gene, riboswitches can either act on transcription, on translation, or, more rarely, as interfering, antisense or self-splicing RNAs [1]. More precisely, riboswitches are cis-acting RNA elements prevalently located in the leader sequences of bacterial mRNA [2] that regulate the expression of the same gene from which they have been transcribed. They are composed of an aptamer domain that binds the effector ligand, and of an expression platform, usually located downstream of the aptamer, that transduces the ligand-induced conformational switch into the gene expression regulation [3,4]. Riboswitches are classified according to the nature of the sensed ligand [1]. Among them, the purine-sensing riboswitches emerge as important model systems for exploring various aspects of RNA structure and function [5] because of their structural simplicity and relatively small size.
Within the purine family the add adenine-sensing riboswitch (A-riboswitch) is one of the most characterized. Found in the mRNA 5'-untraslated region, it cis-regulates the adenosine deaminase gene in Vibrio vulnificus acting rho-independently at the translational level [6]. Its regulatory activity depends on the availability of the ligand: in the presence of adenine the riboswitch is in the ON state, and the protein synthesis is permitted, whereas in the absence of the ligand the riboswitch folds into the OFF state blocking the translation initiation ( Figure 1). The ligand-bound structure of its aptamer [7,8] is a junction of three stems (P1, P2, P3) with the ligand completely encapsulated into the structure (Figure 2). There are three structurally important regions: the binding pocket, the P1-stem and the loop-loop tertiary interaction between L2 and L3, usually called "kissing loops". The latter includes two inter-loop Watson-Crick (WC) base pairs [9,10]. The ligand-dependent structural mechanism inducing the switch between the ON-and the OFF-state in the A-riboswitch mostly remains to be elucidated. The role of the ligand in the structural organization of the aptamer has been investigated using structure-based fluorescence spectroscopy [11], multidimensional NMR techniques [12] and single-molecule experiments [13]. These investigations however lack both the atomistic details and the distinct energetic contributions associated to ligand binding. In this open framework, in particular, it has been suggested experimentally that the ligand stabilizes the interaction of the distal kissing complex [14]. At the same time a stable kissing interaction seems to contribute to the ligand binding energy stabilizing the complex in a cooperative fashion [5,11,12]. Also in silico techniques have been used obtaining an accurate description of the system from a structural point of view [15][16][17][18]. In a few cases a computational approach has been employed to provide a thermodynamic characterization of the system [19][20][21]. In particular, Allner et al. [20] computed the free-energy profile corresponding to the formation of the kissing complex using molecular dynamics (MD) simulations in explicit solvent, both in the presence and in the absence of the ligand, using the CHARMM36 force field [22,23]. MD does not require experimental inputs and can in principle be used in a predictive fashion. However, accuracy of atomistic force fields is still debated and it is thus very important to compare results obtained employing different sets of parameters.
In this paper we use atomistic MD with the latest variant of the Amber force field [24] in combination with enhanced sampling techniques [25] to provide a more detailed perspective on the formation of the kissing loop complex. The combined approach allows this contribution to be dissected from the other ligand-aptamer interactions and the impact of the ligand on the stability of the loop-loop interaction to be quantified. We reproduce exactly the same protocol that has been used by Allner et al. [20] in order to perform a fair comparison between the two force-fields on this particular system. Effects of the initialization protocol on the results of umbrella sampling simulations are also discussed in detail.

Methods
In this work, we used umbrella sampling (US) simulations [26] to study the thermodynamics of the kissing loop in the presence and in the absence of the cognate ligand. We enforced the breaking/formation of the looploop interaction steering the distance between the two loops and then used the resulting structures as starting conformations for US with multiple restraints [27]. Simulations were carried out with the Gromacs 4.6.3 program package [28] combined with the PLUMED 2.0 plug-in [29]. All the simulation parameters are discussed in detail in the following subsections.

System set up
All simulations reported hereafter were performed on two systems: the add aptamer domain complexed with adenine (Holo form) and without adenine bound (Apo form). In both cases we used the X-ray structure solved by Serganov et al. [PDB:1Y26] [7]. The ligand was removed to simulate the unbound state. MD simulations were performed using the Amber99 force field [30] refined with the parmbsc0 a/g corrections [31] and the latest c torsional parameters [24]. The general Amber force field [32] was used to parametrize the ligand. Partial atomic charges were assigned using the restricted electrostatic potential fit method [33] based on an electronic structure calculation at the HF/6-31G* level of theory performed with Gaussian03 [34]. The electrostatic interactions were calculated using the particle-mesh Ewald method [35] and bond-lengths were constrained with LINCS [36]. The systems were set-up following exactly the protocol described in Allnér et al. [20]: aptamers were solvated in a rhombic dodecahedron having 8 nm as box vector lenghts, with a Mg 2+ -H2O solution using approximately 11000 TIP3 molecules [37], and a recent parametrization for divalent cations [38]. The 5 crystallographic Mg 2+ were initially kept at their respective position, whereas the additional 30 ions added to neutralize the system ([Mg 2+ ] = 0.18 M) were randomly placed. A steepest descent minimization (150 steps) was performed followed by 200 ps of MD at constant temperature (298 K, using stochastic velocity rescaling [39]) and pressure (1 atm, using the Berendsen barostat [40]) with positional restraints on both RNA and ions so as to equilibrate water. This procedure was repeated first removing the constraints on the ions and then removing all the remaining constraints. Finally, 12 ns unrestrained simulations at constant volume were performed for each system.

Umbrella sampling
In order to compute the thermodynamic stability of the loop-loop interaction we employed US simulations with multiple harmonic restraints. The distance between the center of mass (CoM) of the backbone atoms of the two loops (L2: bases 20-26; L3: bases 48-54; Figure 2B) was used as a collective variable (CV). We will refer hereafter to this distance as L. 44 uniformly spaced reference values were taken in the range spanning from 12.5 to 34 Å, and restraints with stiffness k = 20 (kcal/mol)/Å 2 were employed. In the production phase of the US simulations each of the 44 windows was run for 5 ns. A very important issue in US simulation is the generation of the starting conformations. We here performed two independent US sampling simulations, using starting conformations generated with two different protocols (hereafter referred to as forward and backward). To generate the starting points for the forward US simulations we employed the same protocol used by Allnér et al. [20]. Namely, we ran a series of 44 short (0.25 ns) simulations with a stiffer restraint (k = 40 (kcal/mol)/Å 2 ) keeping the CV at the 44 reference values, where each simulation was initialized from the last frame of the previous one. In this way, before each US window starting structure was sampled, we let the system equilibrate.
The reference values were iterated allowing an increasing distance between the loops. The backward US simulation was initialized with an equivalent procedure but iterating the restraints in the opposite order, i.e. starting from the structure with undocked loops. In principle, if US simulations are converged, the result should be independent of the initialization procedure.

Analysis methods
The data were analyzed using the last 4 ns of each window. The potential of mean force (PMF) profiles were constructed using the weighted histogram analysis method (WHAM) [27] implemented by Grossfield [41] taking the CV values distribution resulting from the US simulations. This implementation of WHAM allows to compute errors with a bootstrapping procedure that assumes uncorrelated samples. To avoid artifacts due to possible correlations we instead adopted a blocking procedure. Namely, we split the final 4 ns of each trajectory in four blocks of 1 ns each and performed the WHAM calculation using only a single block from each simulations. The four resulting profiles are aligned at their CV starting value (12.1 Å for the forward profiles, 34.4 Å for the backward profiles) and error at each point is computed as the standard deviation among the four profiles divided by √ 4 . To define the number of stacking interactions and the number of base-pair contacts a local coordinate system was constructed in the centre of each six-membered rings, with the x axis pointing towards the C2 atom and the z axis orthogonal to the ring plane. The pairing and stacking relationship between two bases j and k is based on the vector r jk , i.e. the position of ring center k relative to the coordinate system constructed on base j. The criteria for determining the canonical WC base pairs are: 1) the base pair must be AU or GC; 2) The relative position of the bases is compatible with the geometry of a WC interaction. The latter condition is considered satisfied when the product of the Gaussian function N (r jk ; μ, σ ) × N (r kj ; μ, σ ) > 10 −8 . Mean µ and covariance s were obtained from the empirical distribution of WC pairs in the crystal structure of the large ribosomal subunit [42]. The criteria for determining the noncanonical base pairs are: 1) the ellipsoidal distance D jk ≡ x 2 jk /25 + y 2 jk /25 + z 2 jk /9 < √ 2.5, and D kj < √ 2.5; 2) |z jk | and |z kj | < 2Å; 3) it is not a WC pair. The criteria for determining the stacking base pairs are: 1) the ellipsoidal distance D jk < √ 2.5, and D kj < √ 2.5; 2) |z jk | and x 2 jk + y 2 jk < 5Å) x 2 jk + y 2 jk < 5Å and x 2 kj + y 2 kj < 5Å. This procedure yields similar results compared to the MCannotate software [43] and was shown to be useful for characterizing both structural and dynamical properties of RNA molecules [44]. The software used to perform this structural analysis is available online (http://github. com/srnas/barnaba).

Forward process
The analysis of the Holo forward and Apo forward US trajectories allowed the PMF for the disruption of the kissing complex to be computed. The resulting profiles are plotted in Figure 3 for both Holo and Apo systems. The PMF shows a minimum at L ≈ 12.5Å , corresponding to the initial structure. The free energy change upon disruption of the kissing complex for the Holo structure is G = 52 ± 2 kcal/mol. For the Apo structure the stability of the complex is largely reduced to G = 35 ± 3 kcal/mol. The stabilization of the kissing complex provided by the ligand can thus be estimated as G = 17 ± 3 kcal/mol. To understand which are the interactions that are relevant for the kissing complex formation we analyze inter-loop pairings and inter-and intra-loop stacking interactions for each of the restrained simulations ( Figure 4). For both the Holo and the Apo forms, at a L ≥ 16Å , only the two inter-loop WC base pairs (G25-C49, G26-C48) peculiar of the loop-loop interaction are still formed. On the contrary, all the non-canonical base pairs are disrupted. In the Apo structure the inter-loop WC pairings were irreversibly lost at L > 23Å , whereas in the Holo structure they are at least partially maintained until L ≈ 30Å . We also analyzed the rupture of stacking interactions, distinguishing intra-loop and interloop contacts. Inter-loop stacking behaves in a manner qualitatively similar to the inter-loop WC pairings, going to zero at a distance L ≈ 23Å (Apo) and 30 Å (Holo). On the contrary, the intra-loop stacking interactions are still present when the kissing loop is disrupted, indicating that the internal structure of the two loops is preserved during undocking. It can be observed that in the Holo simulation the number of intra-strand stacking slightly decreases (≈ 5) for 29Å < ∼ L < ∼ 30Å because of the distortion in the structure induced by the one of the two inter-loop WC pairings. After this residual interaction is lost all the intraloop stacking contacts are recovered.

Backward process
In order to better assess the convergence of the free energy landscape for kissing complex formation, we also reconstructed the PMF profiles of the Holo and Apo structure from US simulations initialized with the backward process (Figure 2). The forward and backward profiles were aligned at L = 34Å , since at that distance the starting structure of the backward process is equal to the final structure of the forward one. The free-energy change upon docking is estimated taking the difference between the minimum value of the PMF (Holo: L ≈ 19Å ; Apo: L ≈ 16Å ) and its value for the undocked structure ( L = 34Å ): for the Holo G = −3.2 ± 0.9 kcal/mol, for the Apo G = −5.9 ± 0.6 kcal/mol. Albeit negative, these numbers are too small and not compatible with the ones found in the forward process. This is a clear signature of hysteresis in the pulling procedure that strongly biases the initial starting points of the US simulation. The reason for this discrepancy can be better understood by performing a structural analysis of the interactions on the different US windows. As it can be seen in Figure 4, in the backward process the native WC base pairs are not reformed. In general, a few contacts are formed between the two loops but they are not enough to stabilize the kissing complex. To be sure that this is a systematic effect we also tried a few alternative settings for backward simulations. Results are presented in the Appendix.

Discussion and Conclusions
Our calculations provide quantitative and atomistic details on the mechanism of kissing loop breaking and formation in the add riboswitch aptamer domain. The results can be compared with those recently obtained by Allnér et al. [20] on the same system using the CHARMM36 force field [22,23]. In particular the freeenergy computed with the forward process has been obtained with an identical protocol so as to allow a fair comparison between the force fields. In our work the estimated stability of the kissing loop complex is G = 52 ± 2 kcal/mol (Holo) and G = 35 ± 3 kcal/mol (Apo), so that upon ligand binding G = 17 kcal/mol > 0. On the contrary, Allnér et al. reported G = −10 kcal/mol < 0. The sign of ΔΔG indicates whether the ligand binding and the formation of the kissing loop complex are cooperative (positive) or anticooperative (negative). Results obtained with the two force fields thus interestingly lead to two opposite pictures.
Recent experiments probed the differential ligand affinity in aptamers with mutations hindering the formation of the kissing complex [14]. The change in affinity indicates a cooperativity between ligand binding and kissing complex formation. This stabilization has been estimated to be G ≈ 6 kcal/mol. This number should be interpreted with caution since it is based on the assumption that the mutated aptamer mimics a ligandbound state that is accessible to the wild type aptamer [14]. Results obtained with Amber force field are in qualitative agreement with this picture.
Recently, the thermodynamics of other stand-alone kissing complexes have also been studied using different biophysical techniques [45,46]. In these two experimental works the stability of the loop-loop interactions were found to be in the range 8 < ∼ G < ∼ 14 kcal/mol. Stability depends on the exact sequence and set of intra-loop interactions, but is always on the order of ten kcal/mol. The estimated stability of the kissing loop complex in Figure 3 Potential of mean force for kissing-complex formation Potential of mean force (PMF) as a function of the distance between the centers of mass of the L2 and L3 loops. Results for Holo and Apo forms are shown as obtained from two independent umbrella sampling simulations using different protocols to obtain the initial structures (forward, Fwd, and backward, Bwd, see main text for definition). Fwd and Bwd profiles are aligned at the maximum distance. our calculation, namely G = 52 ± 2 kcal/mol and G = 35 ± 3 kcal/mol for Holo and Apo respectively, is thus much larger than expected. Results obtained with CHARMM indicate a lower ΔG for both the systems [20], in better agreement with experimental results, even if still overestimated. Our result could be affected by the known overestimation of stacking interactions in the Amber force field [47]. Additionally, we would like to point out that the overestimation found in our calculations could also be a consequence of difficult convergence in the US simulations. To test if the US simulation are effectively converged, we tried to recover the profiles from simulations initialized with the backward process, with a procedure inspired by two directional pulling in steered MD [48][49][50]. The discrepancy between forward and backward process is an index of high dependence of the PMF on the initialization procedure and poses some questions on the actual convergence of the US simulations. Similarly to steered MD, one can expect that simulations in the forward and backward process are respectively overestimating and underestimating the kissing complex stability. Optimal results can be obtained in steered MD by combining simulations performed with both protocols [49,50]. We stress here that even if the forward simulations apparently recover the qualitative behavior of the general accepted model, they cannot be trusted for a quantitative estimation of the free-energy change. The fact that the backward process cannot reach the native docked state is a signature of a barrier in an orthogonal degree of freedom that is not properly sampled. A possible candidate is the barrier related to the desolvation of the loops, required to form the correct interstrand interactions. Additionally, we observe that pulling on the distance between the two loops does not necessarily induce the entropic reduction required upon docking. These issues are expected to affect both forward and backward pulling. Our simulation could not give an estimate of the additional barriers, but we can assume that these issues equally affects the Holo and the Apo systems. Thus, the converged ΔΔG upon ligand binding should be somewhere in between results from the forward and backward simulations. Thus we can expect the ΔΔG to be in a wide range between -2.7 kcal/mol and 17 kcal/mol, which is in qualitative agreement with already mentioned experiments [14]. The convergence problem is not related to the US method itself but to the difficulty of describing such a complex docking event using a single distance as a CV. This variable is not sufficient to drive the system through the appropriate transition states. This is likely due to the existence of additional barriers on hidden degrees of freedom (e.g. solvation). We believe that in order to reliably quantify the ΔΔG for this system with US or other biased sampling methods one should employ more complex CVs which are closer to the actual reaction coordinate.
In conclusion, in this work we addressed the formation of the kissing loop complex in the A-riboswitch aptamer by means of accurate molecular dynamics simulations in explicit solvent combined with enhanced sampling techniques in presence or absence of the cognate ligand. Results are compatible with experiments and suggest that the ligand stabilizes the kissing loop formation. However, our results also spot some weakness of the umbrella sampling method and call for calculations performed with more advanced techniques, which will be the subject of future investigations.

Appendix
In order to assess the backward procedure for the US method, we repeated it for the Apo form using a softer restraint (k = 20 (kcal/mol)/Å 2 ). We performed two additional simulations: A Starting from the final snapshot of the forward procedure explained above we perform a backward procedure with the softer restraint.
B We repeated both the forward and the backward procedures using the softer restraint.
Structural analysis is shown in Figure 5, where it can be appreciated that only the simulations with protocol A (in light blue) were able to correctly form the native WC pairs. Although the restraint stiffness could affect the result, we believe that here the differences are mostly due to the stochastic nature of MD.
Using the snapshots from protocol A, we performed another US simulation. The resulting PMF profiles are shown in Figure 6. Also in this case the free-energy landscape is incompatible with the one obtained from the forward protocol (compare with Figure 3), indicating convergence issues in the US calculation. Figure 6 Potential of mean force with alternative backward protocol Potential of mean force (PMF) as a function of the distance between the centers of mass of the L2 and L3 loops. Results for Apo form are shown in red (Bwd (k = 20)) as obtained from the control umbrella sampling simulations discussed in the Appendix, and aligned with the other two Apo profiles described in the Methods section (Fwd in orange, Bwd (k = 40) in black) for comparison.