Computational prediction of hinge axes in proteins

Background: A protein's function is determined by the wide range of motions exhibited by its 3D structure. However, current experimental techniques are not able to reliably provide the level of detail required for elucidating the exact mechanisms of protein motion essential for effective drug screening and design. Computational tools are instrumental in the study of the underlying structure-function relationship. We focus on a special class of proteins called "hinge proteins" which exhibit a motion that can be interpreted as a rotation of one domain relative to another.
Results: This work proposes a computational approach that uses the geometric structure of a single conformation to predict the feasible motions of the protein and is founded in recent work from rigidity theory, an area of mathematics that studies flexibility properties of general structures. Given a single conformational state, our analysis predicts a relative axis of motion between two specified domains. We analyze a dataset of 19 structures known to exhibit this hinge-like behavior. For 15, the predicted axis is consistent with a motion to a second, known conformation. We present a detailed case study for three proteins whose dynamics have been well-studied in the literature: calmodulin, the LAO binding protein and the Bence-Jones protein.
Conclusions: Our results show that incorporating rigidity-theoretic analyses can lead to effective computational methods for understanding hinge motions in macromolecules. This initial investigation is the first step towards a new tool for probing the structure-dynamics relationship in proteins.


Background
Proteins play a significant role in virtually all biological processes. These macromolecules are composed of sequences of amino acids folded into 3D shapes of varying size and complexity. The structures of many proteins have been determined experimentally and are easily accessible [1]. The key to protein function, however, is the wide range of motions exhibited by the molecules, from local vibrational fluctuations to larger global movements significantly altering the conformational state [2]. The motions of biological interest occur on the timescales of picoseconds to nanoseconds, which makes their study challenging. Only a few experimental techniques, such as NMR and single-molecule FRET, are capable of probing dynamics at this level [3][4][5][6]. However, these techniques are not able to reliably provide the level of detail required for elucidating the exact mechanisms of protein motion and the underlying structure-function relationship, essential for effective drug screening and design. Theoretical models and computational tools are instrumental for gaining better mechanistic understanding and predictive power.
In this work, we demonstrate the applicability of rigidity theory, an area of mathematics that studies the flexibility properties of general structures, to the analysis of protein dynamics. In particular, we focus on a set of proteins that exhibit "hinge" behavior, a rotational movement of one domain of the protein relative to another (see Figure 1). Recent work [7] presented an approach to identifying revolute (allowing a single rotational motion) and prismatic (allowing a single translational motion) joints in Computer Aided Design structures; hinge proteins exhibit behavior very similar to that allowed by revolute joints. By analyzing a protein's geometric structure from this perspective, we can predict the relative axis of motion for a given pair of domains, providing a quantitative description of the molecule's range of motion. The success of our approach demonstrates that rigidity theory is a powerful tool which can be used to understand the geometric properties determining the dynamics of macromolecules. To the best of our knowledge, this is the first computational method to predict such an axis based on a single conformational state.

Related work
Computational methods for predicting hinges in proteins generally focus on determining which residues comprise the "hinge" joint, expected to allow flexibility that results in a motion of two larger domains. The most closely related approaches include Stonehinge [8], HingeProt [9], and DynDom [10]. Both Stonehinge and HingeProt rely on analysis of rigidity and flexibility properties of the protein by using elastic network models; Stonehinge additionally incorporates the same underlying rigidity theory as KINARI [11] to find a cluster decomposition. These methods seek to pinpoint the location of the "hinge" joint; while this is done from a single conformation as input, a predicted axis of motion is not part of the output. The approach of DynDom does identify an axis of motion, but requires two conformations as input.

Contributions
We present a computational approach for predicting the type of motion allowed by a protein; as input, we require a single structure with two domains identified for which relative motion should be studied. Our analysis models the protein as a geometric structure studied in rigidity theory and predicts the relative axis of motion. We use KINARI [11] to perform initial rigidity analysis, resulting in a decomposition of the structure into rigid regions, or "clusters." This reduces the complexity of the protein, allowing subsequent computational analysis for predicting the axis of motion. We take steric hindrance, a molecular property not modeled by the theory, into account by incorporating Rosetta energy calculations [12] when sampling conformations near the native state. We evaluate our approach on 19 structures of proteins known to exhibit hinge-like motions and verify that the predicted axis of motion is consistent with a second conformation for 15 of them. To illustrate our results, we present a case study of three of the proteins from our dataset: calmodulin, the LAO binding protein and the Bence-Jones protein.

Methodology
Our approach is based on results from infinitesimal rigidity theory. We begin with a brief overview of the relevant theoretical concepts, then present our analysis pipeline. For a more thorough treatment of classical rigidity theory, see [13]; for further explanation of the theory behind identifying revolute and prismatic joints, see [7].

Preliminaries
Rigidity theory considers geometric constraint structures, such as the classical bar-and-joint framework. A bar-and-joint framework consists of universal joints whose motion is constrained by fixed-length bars and can be expressed by an algebraic system of quadratic distance equations. See Figure 2 for example bar-and-joint frameworks with 4 joints.

Infinitesimal rigidity theory
Infinitesimal rigidity theory studies the first-order behavior of the system of quadratic distance equations, and a rigidity matrix encodes the corresponding linear constraints. The null space of the rigidity matrix determines the infinitesimal motion space (for brevity, we will omit "infinitesimal" for the remainder of this paper), which assigns a velocity vector to each joint such that the bars maintain their lengths infinitesimally; refer to Figure 3. Since the trivial instantaneous rigid body motions (in the plane, translation in the x- and y-directions and rotation about the origin) are always contained in the motion space, additional "pinning" rows are often added to the rigidity matrix so that the null space contains only internal motions.
Assuming the framework is not in a singular position, the dimension of the motion space after pinning defines the number of degrees of freedom available to the framework; this is equivalent to the minimum number of bars whose addition would stabilize the framework. For example, the 4-bar mechanism in Figure 2(a) has 1 degree of freedom, as the addition of a single bar creates a rigid structure (Figure 2(b)). In 2D, generic rigidity of a bar-and-joint structure is characterized by a graph-theoretic property proven by Laman [14]; however, in 3D, no analogous result is known. (Intuitively, the term "generic" indicates that the structure is not in a "special position"; the technical definition of genericity is outside the scope of this paper.) For 3D body-bar-hinge frameworks, composed of rigid bodies with fixed-length bars or hinges between them, a similar graph-theoretic characterization is given by Tay [15,16]. A bar imposes a distance constraint between two points on the respective bodies, and a hinge allows only a rotational degree of freedom.
The KINARI software that we use models a protein as a body-bar-hinge structure by assigning bars or hinges to chemical interactions computed to be present in the protein; for example, a covalent bond is modeled as a hinge, allowing only the dihedral angle to vary [11]. The infinitesimal rigidity theory of body-bar-hinge structures is analogous to that of bar-and-joint structures: a rigidity matrix encodes the first-order behavior of the constraints, and its null space gives the motion space.
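To make the rigidity matrix, pinning rows and degree-of-freedom count concrete, the following is a minimal sketch for the simpler planar bar-and-joint case (not the body-bar-hinge matrix used by KINARI). The function names, the joint coordinates and the choice of pinned coordinates are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rigidity_matrix(points, bars):
    """One row per bar (i, j): p_i - p_j in joint i's columns, p_j - p_i in joint j's."""
    pts = np.asarray(points, dtype=float)
    n, dim = pts.shape
    R = np.zeros((len(bars), n * dim))
    for row, (i, j) in enumerate(bars):
        diff = pts[i] - pts[j]
        R[row, i * dim:(i + 1) * dim] = diff
        R[row, j * dim:(j + 1) * dim] = -diff
    return R

def pin_rows(n_joints, dim, pinned_coords):
    """Rows that force the listed (joint, axis) velocity coordinates to zero."""
    P = np.zeros((len(pinned_coords), n_joints * dim))
    for row, (joint, axis) in enumerate(pinned_coords):
        P[row, joint * dim + axis] = 1.0
    return P

def internal_dof(points, bars, pinned_coords):
    """Dimension of the null space of the pinned rigidity matrix."""
    pts = np.asarray(points, dtype=float)
    n, dim = pts.shape
    M = np.vstack([rigidity_matrix(pts, bars), pin_rows(n, dim, pinned_coords)])
    return n * dim - np.linalg.matrix_rank(M)

# Illustrative (assumed) coordinates for a planar 4-bar mechanism.
points = [(0.0, 0.0), (2.0, 0.0), (2.3, 1.1), (0.2, 1.4)]
four_bar = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Pin the bar between joints 0 and 1: fix joint 0 fully and joint 1's y-velocity,
# which removes the three trivial planar motions.
pins = [(0, 0), (0, 1), (1, 1)]

dof_open = internal_dof(points, four_bar, pins)
dof_braced = internal_dof(points, four_bar + [(0, 2)], pins)
print(dof_open, dof_braced)
```

With these coordinates the unbraced mechanism has one internal degree of freedom and the added diagonal bar removes it, matching the 4-bar example discussed above.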

Instantaneous motions of body-bar-hinge frameworks
Since we use the motion space of a body-bar-hinge framework in our analysis, we now provide a few more details about instantaneous motions in 3D relevant to this work.
By Chasles' Theorem, every rigid body motion in 3D can be described by a screw motion: a rotation and translation along a screw axis. One can imagine a screw motion as being analogous to traveling along an alpha helix, with the screw axis defined by the helix's direction and placement. As a consequence, every instantaneous rigid body motion can be described by a twist: an instantaneous rotation and translation along a twist axis.
[Figure caption: Infinitesimal rigidity theory of a bar-and-joint framework in the plane: pinning the red bar, the black velocity vector maintains the length of the green bar infinitesimally. It is tangent to the dotted circle whose radius is defined by the bar's length.]
We represent a twist with a 6-vector (ω_1, ω_2, ω_3, v_1, v_2, v_3), which can be further decomposed into two vectors of length 3: ω = (ω_1, ω_2, ω_3) and v = (v_1, v_2, v_3). The vector ω is the angular velocity, giving the direction of the twist axis and speed of rotation about it (via its magnitude). The remaining translational speed and position of the twist axis in 3D can be decoded from the vector v. When 〈ω, v〉 = 0, where 〈 〉 denotes the dot product, the twist satisfies a relation called the Plücker relation and corresponds to an instantaneous motion that is either a pure rotation or a pure translation. Given a twist (ω, v) for a rigid body, we can compute the instantaneous velocity p' for a point p on the body according to the following formula (see, e.g., [17], page 43):
p' = ω × p + v
For a body-bar-hinge framework with n bodies, a motion of the whole structure assigns a twist to each body and can be described by a vector of length 6n. The motion space for the framework, which is a vector space of dimension d, may be described by a set of d basis vectors b_1, ..., b_d; a motion vector s in the space can be expressed as a linear combination of the basis vectors:
s = c_1 b_1 + ... + c_d b_d
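As an illustration of how a twist acts on a point and how the axis can be decoded from (ω, v), here is a small sketch. It uses the velocity rule p' = ω × p + v and standard screw-theory identities for the axis point and pitch; the function names are assumptions, and the decomposition requires ω to be non-zero.

```python
import numpy as np

def point_velocity(omega, v, p):
    """Instantaneous velocity of point p under the twist (omega, v)."""
    return np.cross(omega, p) + v

def satisfies_pluecker(omega, v, tol=1e-9):
    """True when <omega, v> = 0, i.e. a pure rotation or a pure translation."""
    return abs(np.dot(omega, v)) < tol

def twist_axis(omega, v):
    """Return (unit direction, point on the axis closest to the origin, pitch).

    Assumes omega is non-zero; a pure translation has no finite axis.
    """
    w2 = np.dot(omega, omega)
    direction = omega / np.sqrt(w2)
    point = np.cross(omega, v) / w2
    pitch = np.dot(omega, v) / w2  # translation along the axis per unit rotation
    return direction, point, pitch

# Example: unit-speed rotation about the line through q parallel to the z-axis.
omega = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = np.cross(q, omega)  # chosen so that points on the axis have zero velocity

direction, point, pitch = twist_axis(omega, v)
print(point_velocity(omega, v, q), direction, point, pitch)
```

In this example the point q lies on the axis, so its velocity vanishes, and the recovered axis passes through q with zero pitch.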

Approach
For the analysis, we picked a set of 19 structures of hinge proteins previously analyzed by Stonehinge [8]. Table 1 provides a summary of the dataset. Figure 4 presents an overview of our approach for analyzing each structure.

Rigid cluster decomposition with KINARI
We use the KINARI-Web application [11] to model each protein structure as an initial set of bodies (generally one per atom) with constraints between them (determined by inter-atomic chemical interactions). Depending on the nature of the interactions, KINARI represents them as bars or hinges allowing certain degrees of freedom; these choices are adjustable parameters and can thus be modified. Once set, the software analyzes the rigidity of the structure and produces a cluster decomposition that reduces the complexity of the initial body-bar-hinge model, where bodies in the original model are grouped into larger rigid clusters.
We only adjust the parameter for hydrogen bonds, which are calculated by KINARI based on the geometry of the structure and are assigned an energy value denoting their strength (the smaller the energy, the stronger the bond). By default, all hydrogen bonds, including the weakest ones, are modeled as hinges with a single degree of freedom. However, this representation may overly restrict the motion of the structure by producing very large rigid clusters where the domains of interest are grouped into the same cluster. In this case, we adjust the hydrogen bond energy cutoff parameter to remove the weakest hydrogen bonds from the model (preserving the default modeling of a hydrogen bond as a hinge). The rigidity analysis is then performed again to produce a new cluster decomposition. This process is repeated until an energy value is found that produces a cluster decomposition with the two domains in distinct clusters. We proceed with the analysis using the corresponding body-bar-hinge (BBH) framework output by KINARI: each cluster is itself a rigid body, connected to other clusters with bars and hinges. The clusters are labeled by size, with Cluster 0 containing the largest number of atoms. To maintain consistency, we will refer to rigid bodies as "clusters" for the remainder of this paper.
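The cutoff adjustment described above is essentially a search loop, which can be sketched as follows. This is only an outline under stated assumptions: `decompose` is a hypothetical stand-in for one KINARI run at a given hydrogen bond energy cutoff (it is not a KINARI API), the candidate cutoffs are supplied by the user, and "two domains in distinct clusters" is interpreted here as no cluster containing atoms from both domains.

```python
def domains_separated(cluster_of, domain_a, domain_b):
    """True if no rigid cluster contains atoms from both domains."""
    clusters_a = {cluster_of[atom] for atom in domain_a}
    clusters_b = {cluster_of[atom] for atom in domain_b}
    return clusters_a.isdisjoint(clusters_b)

def find_cutoff(cutoffs, decompose, domain_a, domain_b):
    """Try cutoffs in order; return the first (cutoff, clustering) that separates the domains."""
    for cutoff in cutoffs:
        cluster_of = decompose(cutoff)  # hypothetical: maps atom id -> cluster id
        if domains_separated(cluster_of, domain_a, domain_b):
            return cutoff, cluster_of
    return None

def toy_decompose(cutoff):
    """Toy stand-in for a rigidity analysis: the domains split once the cutoff is at most -2.0."""
    if cutoff <= -2.0:
        return {"a1": 0, "a2": 0, "b1": 1, "b2": 1}
    return {"a1": 0, "a2": 0, "b1": 0, "b2": 0}

# Progressively stricter cutoffs remove more of the weakest hydrogen bonds.
result = find_cutoff([0.0, -1.0, -2.0, -3.0], toy_decompose, ["a1", "a2"], ["b1", "b2"])
print(result)
```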

Motion space calculation
We choose two clusters to represent the domains whose relative motion we are studying. We pin one and refer to it as the pinned cluster; the other is called the moving cluster. Let n be the number of clusters and d the number of degrees of freedom of the BBH framework output by KINARI. We create the rigidity matrix for the BBH framework, adding the appropriate rows to eliminate motion of the pinned cluster, and compute its null space, or motion space. The motion space is output as a set of d basis vectors b_i, where i = 1, ..., d. This is used to generate 1000 samples, where each sample is created by the following process:
1. Randomly generate d weight coefficients between 0 and 1: c_i for i = 1, ..., d.
2. Compute the resulting linear combination of basis vectors, s = c_1 b_1 + ... + c_d b_d, which assigns a twist (ω_i, v_i) to each cluster i.
3. For each cluster i, let {p_1, ..., p_{n_i}} be the set of positions of the n_i atoms found in the cluster. For each atom position p_j, compute p'_j = ω_i × p_j + v_i and move the atom in the direction p'_j to compute its new position.
4. Output s (twists for each cluster) and a PDB with the updated positions.
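The sampling steps above can be sketched as follows. This is an illustrative outline rather than the Java program used in the study: the array layout (one 6-vector twist per cluster, concatenated), the `step` size used to move atoms along their velocities, and the function name are assumptions, and writing the PDB file is omitted.

```python
import numpy as np

def generate_sample(basis, cluster_atoms, step, rng):
    """Draw one motion from the motion space and displace every atom.

    basis: (d, 6n) array of motion-space basis vectors.
    cluster_atoms: list of n arrays, each (n_i, 3), with atom positions per cluster.
    step: assumed small displacement scale (not specified in the text).
    """
    d = basis.shape[0]
    c = rng.uniform(0.0, 1.0, size=d)      # step 1: random weights in [0, 1)
    s = c @ basis                          # step 2: linear combination of basis vectors
    twists = s.reshape(-1, 6)              # one (omega, v) twist per cluster
    moved = []
    for twist, atoms in zip(twists, cluster_atoms):
        omega, v = twist[:3], twist[3:]
        velocity = np.cross(omega, atoms) + v  # step 3: p' = omega x p + v for each atom
        moved.append(atoms + step * velocity)
    return s, moved                        # step 4: twists and updated positions

# Toy framework: cluster 0 is pinned (zero twist); cluster 1 rotates about the z-axis.
basis = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=float)
cluster_atoms = [np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]),
                 np.array([[1.0, 0.0, 0.0]])]
rng = np.random.default_rng(0)
s, moved = generate_sample(basis, cluster_atoms, step=0.01, rng=rng)
print(s, moved[1])
```

In the toy run the pinned cluster's atoms stay in place, while the single atom of the moving cluster is displaced along the y-direction, tangent to its circle of rotation.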

Steric hindrance
Rigidity theory does not consider collisions (see Figure 2 (a)), as it is based on a system of linear equations; collisions would require inequality constraints. However, due to the close packing of atoms in a protein, steric hindrance plays a significant role in the allowable conformations near the native state. We generate the samples by moving each atom an "infinitesimal" distance using the computed motion space, but the direction of many of these motions may be biologically infeasible. Therefore, we use PyRosetta [12,18] to calculate the energy of each generated PDB structure and determine how favorable the motion is. Note that we do not relax the structure, but instead use the computed score to select the most appropriate set of motions for further analysis.
To illustrate the treatment of a typical protein from our dataset, we present the results for calcium-free calmodulin [PDB:1CFD] in Figure 5 with each sample represented as a (very thin) horizontal bar. The total energy score is shown in Figure 5(a), and the van der Waals repulsion term (denoted fa_rep in the Rosetta scoring function) [12] is shown in Figure 5(b); the 1000 samples are sorted in descending order by the total energy score. We found the fa_rep term to be the only term to vary significantly across the samples. This behavior is consistent with our hypothesis that collisions between the atoms increase the van der Waals repulsion between them, confirming that steric hindrance must be taken into consideration. We therefore restrict further analysis to the most biologically feasible samples, working with the lowest 5% in terms of total energy scores.
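The selection of the lowest-energy samples can be sketched as below. The sketch assumes the total energy scores have already been computed (for example with PyRosetta) and stored in an array; the function name and the rounding of the 5% count are assumptions.

```python
import numpy as np

def lowest_energy_indices(scores, fraction=0.05):
    """Indices of the samples with the lowest total energy, best first."""
    scores = np.asarray(scores, dtype=float)
    keep = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores, kind="stable")[:keep]

# Toy scores for 1000 samples: sample k has score 999 - k, so later samples are better.
scores = np.arange(1000)[::-1]
selected = lowest_energy_indices(scores)
print(len(selected), selected[0])
```

For 1000 samples this keeps 50, and only the twists that generated those samples would be passed on to the subsequent analysis.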

Twist analysis and aggregated data
For each twist (ω, v), we compute the angle a (in degrees) between the two 3-vectors ω and v using the dot product 〈ω, v〉; a can take on values ranging from 0° to 180°. Recall that the twist is a pure rotation or translation if 〈ω, v〉 = 0: if v is the zero vector, then the twist is a pure rotation about a line through the origin with direction ω; if ω is the zero vector, then the twist is a pure translation in the direction of v. Therefore, we refer to the computed angle a as the twist purity; a value of a close to 90° corresponds to a dot product close to 0. Values further from 90° correspond to more general screw motions (with both rotational and translational components).
We aggregate data over the twists used to generate the lowest energy samples, and compute the mean twist and the mean twist purity for the moving cluster. From the mean twist, we extract the twist axis, referring to it as the average axis of motion. This axis, as well as the corresponding twist purity, give a quantitative description of the motion of the moving cluster relative to the pinned cluster.
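A minimal sketch of this aggregation for the moving cluster is given below. It assumes the mean twist is a componentwise average of the 6-vectors and that the axis is extracted with the standard screw-theory formulas (direction ω/|ω|, closest point to the origin ω × v / |ω|²); these choices and the function names are assumptions, and both ω and v are taken to be non-zero.

```python
import numpy as np

def twist_purity(omega, v):
    """Angle in degrees between omega and v; 90 means the Pluecker relation holds."""
    cos_a = np.dot(omega, v) / (np.linalg.norm(omega) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def aggregate_twists(twists):
    """Mean twist, mean twist purity and the axis of the mean twist."""
    twists = np.asarray(twists, dtype=float)
    mean_twist = twists.mean(axis=0)  # assumed: componentwise average
    mean_purity = float(np.mean([twist_purity(t[:3], t[3:]) for t in twists]))
    omega, v = mean_twist[:3], mean_twist[3:]
    w2 = np.dot(omega, omega)
    return {
        "mean_twist": mean_twist,
        "mean_purity": mean_purity,
        "axis_direction": omega / np.sqrt(w2),
        "axis_point": np.cross(omega, v) / w2,
    }

# Toy twists of the moving cluster: rotations about a z-parallel axis through (1, 0, 0),
# two of them with a small translational component along the axis.
twists = [[0, 0, 1, 0, -1, 0.0],
          [0, 0, 1, 0, -1, 0.2],
          [0, 0, 1, 0, -1, -0.2]]
summary = aggregate_twists(twists)
print(summary["mean_purity"], summary["axis_direction"], summary["axis_point"])
```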

Results and discussion
Refer to Tables 1 and 2 for a summary of our results. For each structure in the dataset, we evaluate the validity of our predictions by manually comparing with a second conformation. We first align the two conformations on the pinned cluster using PyMOL, then generate a Jmol script to display the average axis of motion. By visually comparing the conformations in the 3D viewer, we determine if the computed axis is consistent with a motion that allows a feasible pathway for the moving cluster's two positions. For 15 of the structures, the computed axis was consistent with a motion between the analyzed structure and a second conformation; these are listed as bold entries in the tables.
The computational complexity of the entire approach is O(n^3), dominated by the calculation of the null space (Mathematica). In practice, though, the linear-time sample generation program (written in Java) is the most time-consuming step; depending on the number of clusters in the model, processing time ranged from less than 30 minutes up to 5 hours on a MacBook Pro with a 2.6 GHz Intel Core i7 processor and 8GB of memory. However, the focus of this study was not on execution time; future analysis will rely on an optimized codebase and precise timing experiments.

Case studies
We now present detailed studies of our analysis on three proteins: calmodulin, the Lysine-Arginine-Ornithine (LAO) binding protein and the Bence-Jones protein. We chose these proteins as they are well-documented in the literature as undergoing conformational changes through hinge-like motions. Studies using NMR [19], x-ray crystallography [20], MD simulations [21], and algorithms that combine information from normal modes, experimental thermal factors, bond constraint networks, energetics, and sequence [22] all agree on the mechanism and measurement of hinge motions in calmodulin. The motions of both the LAO binding and Bence-Jones proteins were studied in [23], and the structure of the LAO crystal was analyzed in detail in [24].

Calmodulin
Calmodulin is a multifunctional, calcium-binding, intermediate messenger protein, which is expressed in all eukaryotic cells. Metabolism, apoptosis, muscle contraction and memory are only a few of the many crucial processes mediated by the protein [25]. Calmodulin contains 4 calcium binding sites, with a pair in each of the EF-hand globular domains found at the N- and C-termini; these are connected by a helix with a "weak" center around residue 78. This helix plays a key role in conformational changes in calmodulin: (1) tightening when binding calcium, and (2) unraveling for subsequent peptide binding. We analyze both conformational changes and, as we discuss below, the predicted axes of motion agree, computed to be roughly in the same direction as this helix.
The first conformational change is triggered when calcium binds to calcium-free (apo) calmodulin and causes the central helix to straighten, correlated with a relative rotation of the two globular domains. The resulting conformation is Ca2+-bound calmodulin in its open state [PDB:1CLL], depicted in Figure 6. We ran our analysis by choosing to pin Cluster 0 (red), computing the relative axis of motion for Cluster 1 (green, containing residues 100-115, and chosen to represent the globular domain at the C-terminus); the blue line depicts the average axis of motion. The calcium-free structure [PDB:1CFD] is shown faded, with the two structures aligned on the central helix residues (65-77) of the pinned cluster (shown in red). Motion of the moving green cluster rotating about the computed axis is consistent with the conformational change; our analysis assigns a twist purity value of about 95, indicating a motion that is almost a pure rotation. A subsequent conformational change occurs when open Ca2+-bound calmodulin binds a peptide; the two globular domains "wrap" around the peptide, leading to the closed state [PDB:2BBM] shown in Figure 7. We pin Cluster 4 (red, containing residues 64-75) and compute the relative motion of Cluster 7 (green, containing residues 101-113) to be consistent with the previous analysis. The axis of motion (blue line) is consistent with a twisting motion that would correspond to this "wrapping" motion and unraveling of the central helix. Indeed, our analysis assigns a twist purity value of around 150, indicating a more

LAO binding protein
The Lysine-Arginine-Ornithine (LAO) binding protein is a bacterial periplasmic protein that assists the arginine transport system by interacting with membrane-bound receptors. The open [PDB:2LAO] and closed [PDB:1LST] structures, determined by x-ray crystallography, indicate a rotation of two "lobes" relative to each other [24]. We show the results of our analysis on the open structure in Figure 8; Cluster 0 (red) is pinned, and the average axis of motion computed for Cluster 1 (green) is shown in blue. The closed structure is shown faded, aligned on residues 1-88; the computed axis is consistent with the rotation, and our analysis computes a twist purity value of around 92.

Bence-Jones protein
Bence-Jones proteins are the "light chains of immunoglobulins" and "subunits of antibodies" produced by neoplastic (early stage tumor) white blood cells. The presence of the Bence-Jones protein in urine is often an indication of multiple myeloma or bone marrow cancer [26]. Different conformations, determined by x-ray crystallography, indicate a rotation of one subdomain relative to the other [27]. We show our analysis of the closed conformation ([PDB:4BJL], chain A) in Figure 9. We pin Cluster 0 (red) and compute the average axis of motion

Axis of motion analysis
Our results demonstrate the potential that rigidity-theoretic analysis has for predicting protein motion, establishing the initial groundwork for future studies. While the use of Jmol to visually validate the predicted axis is intuitive, it can be subjective and highlights the need for a robust computational method that quantifies the validity of the computed data.
[Figure caption: The computed axis of rotation (blue) for the moving cluster (green) relative to the pinned cluster (red) is not consistent with an expected pathway of motion.]
The remaining structures, for which the computed axis is not consistent with the second conformation, require further investigation, as the twist data we aggregate over correspond to motions that maintain the geometric modeling of chemical interactions while minimizing steric hindrance. We hypothesize that an inconsistent axis may be due to:
• an infeasible axis produced by the averaging of feasible twists;
• a feasible, but "unexpected" pathway between the two conformations, such as the unfolding of an alpha helix (a potential explanation for the closed conformation of adenylate kinase, Figure 11);
• or, a feasible motion to a conformation that has not been experimentally determined.

Conclusions
Using rigidity theory, we developed a computational approach for predicting an axis of motion for two domains of a protein, requiring only a single conformation as input. We evaluated our approach on a dataset of 19 protein structures, verifying a consistent axis of motion for 15 of them, and presented a detailed discussion of proteins whose motions are well-documented: calmodulin, the LAO binding protein and the Bence-Jones protein.
Our results show that rigidity theory can be applied to analyze proteins and accurately predict information that may elucidate conformational changes tied to protein function. To the best of our knowledge, calculation of twists from a single conformation has not been done before; however, it would be interesting to compare with standard optimization techniques (such as simulated annealing) by seeking twists whose resulting conformation minimizes energy.
This initial investigation represents the first step to a more comprehensive study. We wish to find the minimum number of samples to generate, as this is the most time-consuming step; a sample set of 100 for calmodulin [PDB:1CFD] seemed to exhibit the same behavior as the sample set of 1000. Since the current approach required close interaction with KINARI to produce an appropriate cluster decomposition, we plan to automate this part of the process in the future, enabling an evaluation of the method on a larger dataset. We ultimately expect to develop a web tool that will allow users to analyze a single structure by uploading or choosing a PDB file. Finally, we seek to develop a computational measure for evaluating the validity of our results instead of visually comparing two conformations using Jmol.

Structures and figures
All protein structures were obtained from the RCSB Protein Data Bank (using the indicated PDB IDs) [1]. Figures were generated by adapting the output of KINARI [11] and its associated Jmol scripts [28]. Alignment of structures was performed using PyMOL [29].