Tractable RNA–ligand interaction kinetics

Background The binding of small ligands to RNA elements can cause substantial changes in the RNA structure. This constitutes an important, fast-acting mechanism of ligand-controlled transcriptional and translational gene regulation implemented by a wide variety of riboswitches. The associated refolding processes often cannot be explained by thermodynamic effects alone. Instead, they are governed by the kinetics of RNA folding. While the computational analysis of RNA folding can make use of well-established models of the thermodynamics of RNA structures formation, RNA–RNA interaction, and RNA–ligand interaction, kinetic effects pose fundamentally more challenging problems due to the enormous size of the conformation space. The analysis of the combined process of ligand binding and structure formation even for small RNAs is plagued by intractably large state spaces. Moreover, the interaction is concentration-dependent and thus is intrinsically non-linear. This precludes the direct transfer of the strategies previously used for the analysis of RNA folding kinetics. Results In our novel, computationally tractable approach to RNA–ligand kinetics, we overcome the two main difficulties by applying a gradient-based coarse graining to RNA–ligand systems and solving the process in a pseudo-first order approximation. The latter is well-justified for the most common case of ligand excess in RNA–ligand systems. We present the approach rigorously and discuss the parametrization of the model based on empirical data. The method supports the kinetic study of RNA–ligand systems, in particular at different ligand concentrations. As an example, we apply our approach to analyze the concentration dependence of the ligand response of the rationally designed, artificial theophylline riboswitch RS3. Conclusion This work demonstrates the tractability of the computational analysis of RNA–ligand interaction. Naturally, the model will profit as more accurate measurements of folding and binding parameters become available. Due to this work, computational analysis is available to support tasks like the design of riboswitches; our analysis of RS3 suggests strong co-transcriptional effects for this riboswitch. The method used in this study is available online, cf. Section “Availability of data and materials”.

The design of tailored riboswitches for specific applications and advanced control logic is therefore an attractive endeavor in synthetic biology [1]. A riboswitch can be understood as the composition of its aptamer and its actuator domain. It senses the ligand by binding it to a binding pocket of the aptamer domain; this influences the conformations of the actuator domain and thereby leads to a measurable response to ligand binding, e. g. by terminating transcription (OFF-switch) or suppressing the terminator hairpin (ON-switch).
The computational design of artificial riboswitches requires a sufficiently accurate model of the ligand binding process and the structural response of the RNA to ligand binding. The equilibrium thermodynamics of RNA-ligand binding has been studied for RNA-RNA interactions, e. g. in [2,3], and for small molecule binding in RNA-ligand [4]. As in the case of single molecule RNA folding, purely thermodynamic models are sometimes insufficient because they disregard the dynamics of the process. This can cause dramatic mis-predictions. Various approaches have analyzed the kinetics of single molecule RNA folding [5][6][7][8]. For tractability, the continuous process is decomposed into elementary steps, simplified based on heuristic assumptions, and/or approximated by a coarse-grained process.
Wolfinger et al. [7] present a coarse-graining approach to approximate RNA folding. Following e. g. [6], they analyze the folding process on the energy landscape of conformations, i. e. secondary structures, R i of an RNA R. Conformation change is modeled by elementary moves (base pair insertion or deletion) endowed with reaction rates that follow the Arrhenius rule and thus depend on the energy barrier between the source and target conformations. In the approximation of RNA secondary structures, activation energies for opening/closing of single base pairs are approximately constant. The energy barrier thus effectively depends only on the energy difference between source and target [7]. This defines a Markov Process on the state space of all secondary structures, which is too large to make it possible to analyze it by diagonalizing the corresponding rate matrix. To effectively reduce the state space, [7] combine states into basins that consist of all conformations that are connected to the same local minimum by their gradient walk on the energy landscape. Since gradient walks connect states to their lowest energy neighbors, they correspond to the fastest folding paths from a state into a local minimum. This provides the rationale for approximating the full process by the macroprocess on gradient basin macrostates, which are assumed to be equilibrated. Consequently, the rates between the macrostates are canonically derived as weighted sums of microrates of the original process. By employing this heuristic approach, the size of the conformation space of smaller RNA molecules of up to a hundred nucleotides is typically reduced to just a few thousand macrostates. For example, in our analysis of the 81 nucleotide long riboswitch RS3 [1], 11.4 millions of secondary structures are mapped to 1133 macrostates. The macroprocess is finally solved by diagonalization. In our approach we re-use ideas of this coarse-graining, which also allows us to re-use several tools for single-molecule RNA kinetics (RNAsubopt [9], barriers [10], treekin [7]).

RNA-ligand interaction model
We are going to describe a reaction system of the RNA R (given by its sequence of nucleotides {A, C, G, U}) with the ligand L at the level of RNA and binding complex conformations, such that we can study the kinetics of association, dissociation, and conformation changes. For simplicity, we assume that there is only a single ligand conformation (also denoted L). In the same way as a single RNA molecule transitions between various conformations during the folding process, the complex of RNA and ligand adopts different conformations LR i . Importantly, only a subset of the RNA conformations binds the ligand. The part of the total state space that corresponds to the RNAligand complexes is therefore isomorphic to a subset of the state space of the free RNA molecule. Thus, our system of consideration consists of the reactions for all i, j ∈ {1, . . . , N}, i = j. According to the rate laws for elementary reactions, the rates of each of these reactions depend on specific rate constants and the concentrations of the molecules. The reactions 1 and 4 only have nonzero rate constants, if the RNA conformations R i and R j are related by an elementary move such as the insertion or deletion of a base pair. Moreover, L and R i can interact only for the subset of the R i that form an appropriate binding pocket; otherwise, the complex LR i is deemed unstable and thus excluded from the model. Since RNA conformations correspond to RNA secondary structures, the energies of monomer states can be calculated from the Turner energy model [11]. For dimer states, we add the aptamer-ligand-specific binding energy. For the exemplary studied riboswitch RS3, this energy can be derived from the empirical dissociation constant [1]. Finally, we derive the rate constants as Metropolis rates with appropriate pre-exponential factors that can be estimated from empirical rates. Note that the rates of base pair opening k − and closing k + are directly related by the energy change G due to the closing. Concretely, k − /k + = exp( G/RT) for G < 0. Experimental values are available for the zippering rate, which corresponds to the rate of closing the last hairpin in a helix. A careful analysis in [12] yields a value in the range 4.7 · 10 7 to 10 9 s −1 roughly consistent with earlier estimates [13][14][15]. In principle, a kinetic constant can be derived for the closing of first base pair in a loop from a worm-like chain model [12,16]; following earlier work on RNA kinetic models [7], we use here a single kinetic parameter k + for all base pairs. An empirical rate of one specific theophylline aptamer association was reported as 600 M −1 s −1 [17], which may serve as rough estimate for comparable systems. Note that [17] measured the macroscopic apparent rate that depends on the rate of dimerization as well as the rate of refolding into structures with theophylline binding pocket.
While the Reactions 1, 3, and 4 are of first order, the second order association in Reaction 2 introduces nonlinearity into the system. Assuming ligand excess, which is a very plausible assumption for small molecular ligands, however, it is possible to devise a pseudo-first order approximation of the system.
Even if the reaction equations above appropriately model the RNA-ligand interaction, this system is still computationally intractable for typical riboswitch sizes. As a remedy, we construct a coarse-grained process based on the separate gradient-basins for the monomer and dimer-states. The monomer states with suitable binding pocket are connected to dimer states, cf. Fig. 1. Importantly, there is no direct mapping from monomer macrostates to dimer macrostates of our coarse-grained system because conformations without binding pockets are absent from the "dimer world". The upper basin in the "monomer world" of Fig. 1 is subdivided into two basins in the dimer world; conversely, the middle and lower Fig. 1 Relation between the monomer energy landscape (above) and the dimer energy landscape * (below). We obtain the landscape of the dimers from the landscape of the monomers by constraining the structures to contain the binding pocket. Blue circles indicate structures with binding pocket, while the remaining structures are shown as green squares. Notably, the assignment to gradient-basins regularly differs for corresponding structures in both landscapes, if gradient neighbors (solid arrow transitions) of the monomer world have no binding pocket such that non-gradient neighbors (dashed arrow transitions) of the monomer world correspond to gradient neighbors in the dimer world. Filled circles and squares mark local minima monomer basins correspond to a single basin of the dimer world.

Contributions
We start by elaborating the general macroprocess of RNA-ligand interaction, based on gradient basin macrostates, and derive the corresponding rate constants. This contributes an original description of coarse-grained interaction processes; it is the first fundamental prerequisite for our tractable RNA-ligand kinetics approach. Furthermore, we leverage that a wide spectrum of biological RNA-ligand systems operate under strong ligand excess, justifying the pseudo-first order approximation. On these grounds, we establish the first analytical approach for RNA-ligand interaction kinetics. Based on solving the master equation of the interaction process, this enables the computation of time-dependent macrostate probabilities. Finally, we study the kinetics of the artificially designed riboswitch RS3 [1] interacting with theophylline. We analyze the system at different concentrations and present results that strongly suggest co-transcriptional effects.

Methods
We consider the fixed interaction system of the RNA R and the ligand L. Let X denote the set of all monomer microstates, X = {R i | i = 1, . . . , N}; in our setting, the R i are the secondary structures of a given RNA sequence. The subset X + ⊆ X comprises the conformations that can bind the ligand. Here X + contains all states with a specific binding pocket. Furthermore, define X * as the set of dimer A dimer microstate LR i ∈ X * has the energy E(R i ) + θ, where θ < 0 denotes the binding energy of R and L. The inverse temperature is b = 1 RT , where T is the absolute temperature and R is the universal gas constant. For a set S ⊆ X ∪X * of microstates, let Z[ S ] := x∈S exp(−bE(x)) denote the partition function of S. The probability of a microstate x in S is then given by Let x, y ∈ X ∪ X * be microstates. The microrate constant from x to y is denoted k(x → y) (or k(y ← x)). On microstates x, y, we define the symmetric neighborhood relation N such that x N y holds if and only if x and y have a distance of exactly one elementary move.
For x N y, let be the activation energy of the transition x → y as defined by the Metropolis rule. Accordingly, the microrate constants for distinct states x, y ∈ X ∪ X * , where x N y, are defined as denotes the reaction-specific pre-exponential factor. For our purposes, we assume that this factor depends only on the type of reaction and the factors for conformation change in monomers and dimers are equal. Thus, we distinguish the factors A a for association, A d for dissociation, and A R for conformation changes of the RNA secondary structure. As we will show later, A a = A d due to detailed balance. Denote the powerset of a set S by P(S). A monomer (or dimer) macrostate is a set of monomer (or dimer) microstates, i. e. an element of P(X) (P(X * )). We denote the (macro)rate constant from macrostate α to β by r(α → β) (or r(β ← α)). Macrorate constants are defined by summing over microrate constants and their respective state probabilities, i. e.
We emphasize that we use the term macrostates freely to denote general sets of microstates. Only when we introduce specific partitions of the microstates into macrostates, it makes sense to distinguish represented macrostates of our specific coarse-grained system from other sets of microstates.

Macrostate kinetics of RNA-ligand interaction
For a microstate x ∈ X + , we denote its corresponding dimer microstate (after binding to L) by Lx, i. e. for x = R i , Lx = LR i . This notation is raised to sets of microstates by defining Lα := {Lx | x ∈ α}. Lemma 1 below asserts that the rate constants between dimer microstates and macrostates can be computed exactly like rate constants of monomer states.
Proof The individual claims follow easily from the def- Finally, The microrate constant from monomer to dimer states is constant, whereas the back rate depends on the binding energy θ. Lemma 2 (Association and dissociation microrate constants) For x ∈ X + , the rate of association is k( All other rates between monomer and dimer microstates are 0.
Proof By Metropolis rule, for x ∈ X + , and, additionally, Analogously, for the inverse microrate, The association (dissociation) microrates due to Lemma 2 induce corresponding macrorates, which additionally depend on the probability of the associable (dissociable) microstates in the source macrostate, respectively (Lemma 3).

A tractable model under ligand excess
For our coarse-grained RNA-ligand interaction process, we partition the monomer microstates X and the dimer microstates X * into sets of macrostates and * , respectively. For the theoretical discussion, we require only that and * are partitions of the respective sets X and X * . Later, in our application, we are going to define macrostates as gradient basins (within their respective component).
We denote the monomer macrostates (in ) by α 1 , . . . , α n and the dimer macrostates (in * ) by β 1 , . . . , β m . Since-by model assumption-the ligand is in large excess, the change of the ligand concentration [ L] is essentially negligible in relation to the change of RNA concentrations. Formally, we assume d/dt[ L] = 0, i. e. at all times [ L] = l 0 , for the initial ligand concentration l 0 . The change of RNA monomer and RNA-ligand dimer concentrations over time is described by a system of ordinary differential equations (ODEs) corresponding to Reactions (1)- (4).
Following the first-order rate laws, Reaction (1) causes In contrast to these simple first-order transitions, the state changes due to Reaction 2 follow second-order rate laws contributing the n · m flows r( ≤ n and 1 ≤ j ≤ m). Without the assumption d/dt[ L] = 0, the rate would depend on two variable concentrations, causing the system to be non-linear. However, by our assumption, the concentration [ L] is constant.
The system of ODEs. The change of concentrations is now described by summing over single contributions: for j = 1, . . . , m. We set γ := (α 1 , . . . , α n , β 1 , . . . , β m ) T and define the (n + m) × (n + m)-matrix R(l 0 ). Then the entire coarsegrained system under ligand excess can be expressed by is constructed from four submatrices: A is an n×n-matrix with entries a ij = r(α i ← α j ) for

Computing RNA-ligand kinetics
The described ODE system can be solved analytically building on existing software. The entire computation pipeline consists of five major steps: 1. enumeration of the RNA's structure space 2. computation of the gradient basins and corresponding rates for (a) the monomer landscape (b) the dimer landscape 3. computation of the rates between the monomer and dimer basins 4. construction of the full rate matrix R(l 0 )

integration of the linear ODE system
Since an exhaustive enumeration of the structure space is infeasible even for short RNAs, Step 1 generates only a selected part of all possible secondary structures of the input RNA. For this work, we consider only structures up to a certain energy above the minimum free energy of the sequence as computed by RNAsubopt [9]. To further reduce the number of structures, only structures without any isolated base pairs are generated.
Often, the restriction to low energy structures excludes important microstates of relatively high energy such as the open RNA chain from the model. Simply adding such a structure to the system is insufficient without also including transitional structures that connect it to the remaining states. The solution for this work was to develop a heuristic algorithm to partially explore an energy landscape around a given structure of interest, flooding neighbored basins if their local minimum has a lower energy than the current one, until one reaches structures within the already explored energy band. This approach seems to be more adequate in the context of a gradient basin coarse graining than a direct path heuristic (e. g. findPath [6]).

In
Step 2a, we compute the gradient basins and rates for the monomer landscape from the list of input structures using barriers [7] (with minh heuristic). For Step 2b, a list of all input structures that contain the binding pocket is generated with RNAsubopt's constraint folding mode. This enables one to enumerate dimer structures up to a higher energy than possible for the entire landscape, ensuring the dimer world is connected. As shown in Lemma 1, the transition rates in the constrained dimer landscape are independent of the ligand's binding energy and thus can be computed exactly like those of the monomer landscape. In Step 3, the transition rates between monomer and dimer macrostates are computed based on Lemma 3 using the mapping of the monomer and dimer structures to their respective basins. For this purpose we modified barriers to output this information.
Step 4 yields the full rate matrix R(l 0 ) for one set of preexponential factors and a certain ligand concentration l 0 by combining the previously computed rate constants. We emphasize that we can easily compute R(l 0 ) for different values of l 0 , A a and A R without repeating the previous, more time-consuming computation steps.
Finally, in Step 5 the system of ODEs is solved directly using the closed form c(t) = exp(tR(l 0 )) · c(0), where c(t) is the vector of macrostate concentrations at time t. The exponential exp(tR(l 0 )) is obtained by diagonalizing R(l 0 ) numerically using the tool treekin [7], which performs this computation efficiently.

Parameters from empirical measurements
The binding energy θ can be derived from an empirically measured dissociation constant K A d of the aptamer; e. g. in the case of theophylline, [18] measure a K A d of 0.32 μM for the theophylline aptamer of RS3. From the macroscopic measurement, we derive the binding energy as where T A = 298K is the temperature of the measurement, R is the gas constant, and Pr[ "pocket" | A, T A ] denotes the equilibrium probability of the binding pocket in the aptamer at temperature T A as calculated in the Turner energy model (cf. [1], which neglect the probability). This relation allows calculating the effective dissociation constant at temperature T R of a theophylline riboswitch like RS3 that contains the aptamer, due to the inverse relation Given the apparent association rate A m a (which we assume to equal the macroscopic pre-exponential factor of dimerization), one can bound the microscopic preexponential factor A a . If we assume that refolding is much slower than dimerization, then A m a is a product of the microrate and the equilibrium probability of the binding pocket. Conversely, if we assume the refolding to be much faster, than A m a directly measures the dimerization microrate. Thus, A m a ≤ A a ≤ A m a · Pr[ "pocket" | aptamer ] −1 . In the case of theophylline, Pr[ "binding pocket" | aptamer ] ≈ 1 and consequently, A a ≈ A m a . Finally, the pre-exponential factor for dissociation A d equals A a . This is a consequence of detailed balance of the dimerization reaction, i. e.

Empirical results
We apply our system to demonstrate the effect of changes in ligand concentrations to the interaction of the designed ON-switch RS3 from [1] with the ligand theophylline. Using our prototypical software, we precompute the macroprocess for RS3 including rate constants in several hours. As noted above, this yields 1133 macrostates in total, 22 of which are dimer states representing RNA molecules bound to the ligand. From the technical point of view, our system is described by 1133 coupled differential equations for the states α 1 , . . . , α 1111 through β 1 , . . . , β 22 . Subsequently, we compute kinetics for each combination of concentrations and pre-exponential factors within seconds (on a Core i5-750 @ 4 × 2.67GHz). Figure 2 summarizes our results; each subfigure plots the probabilities of prominent monomer and dimer states over time. In addition, the minimum energy secondary structures is shown for the most important macrostates. It serves as a suitable representative of the macrostate's ensemble of structures. In this sense it provides a useful, coarse-grained picture of the most likely refolding paths. We set the pre-exponential factors to the estimations A R = 10 6 s −1 and A a = 600 M −1 s −1 as described before. This allows interpreting the time and ligand concentrations in concrete units and relates the speed of folding and dimerization. Figure 2a-c show the results for ligand concentrations 10 4 M, 10 5 M, and 10 6 M. In the RS3 riboswitch, the aptamer domain is fused to a rho-independent terminator at the 3'-end. Thus, during transcription the aptamer is available shortly before the strong terminator stem can be formed and then dominates the entire structure ensemble. Therefore, we study a partially transcribed riboswitch RS3 that is shortened by the 3'-half of the terminator stem and the 3' poly-U stretch. The kinetics of the shortened riboswitch are shown for concentrations of 10 −7 M, 10 −6 M, and 10 −3 M in respective Fig. 2d-f. Note that the time scales for interaction of RS3 with theophylline are in accordance with the computed dissociation constant K RS3 d , which implies that the monomer and dimer concentrations are balanced at about 10 4 M ligand concentration. This extreme concentration suggests that the riboswitch would be non-functional without further, probably co-transcriptional, effects. This is a plausible hypothesis since RS3 was designed to regulate at the transcriptional level.
The estimated rates are derived from a small number of empirical measurements at different conditions, such as ion concentrations (100 mM NaCl in [12], 5 mM MgCl 2 and 0.5 M NaCl in [18], no Mg 2+ and 100 mM NaCl in [17]), temperatures, and actual sequences; hence they are not directly comparable. Nevertheless, they provide reasonable ball park estimates, because we observed that the qualitative behavior of the system is robust against variations of these parameters by several orders of magnitude.

Conclusions
Several refinements of the model remain for future research. Most importantly, the assumption of only a single binding motif is rather stringent. In general, one would like to support multiple binding sites with different Note that since subfigures a-c are based on exactly the same landscapes, they share the same macrostates (e. g., mon1 in a and mon1 in b are equal). As well, this holds among subfigures d-f. However, across the two groups of subfigures, macrostates are not comparable (e. g., mon1 of a = mon1 of d), since the landscapes differ binding energies. Our model can be naturally generalized to such scenarios by introducing multiple "dimer worlds" corresponding to different binding motifs. Furthermore, some ligands, such as Mg 2+ have multiple binding sites. The current implementation of the Arrhenius approximation of the RNA folding kinetics, finally, is quite simplistic, using only a single kinetic prefactor for all structural rearrangements. A refined model would presumably distinguishing constants for nucleation, stack extension, base pair sliding, and loop pinching. Moreover, in particular transcriptional riboswitches, which operate temporally coupled with the progressive transcription of the RNA, will be influenced by this kinetic interplay. It is well known that RNA chains commonly change their optimal structure while growing during transcription. Consequently, the RNAs refold during the process of transcription [8]. The framework presented here can be extended to co-transcriptional interaction analysis. However, this will require additional experimental measurements to calibrate the parameters of the model to properly relate the different "reaction" speeds. In particular, this entails accurate measurements of the thermodynamic parameters for the ligand binding and of the kinetic prefactors of folding and dimerization as well as the speed of transcription.
Abbreviations ODE; Ordinary differential equation