Characterization of the cofactor-binding site in the SPOUT-fold methyltransferases by computational docking of S-adenosylmethionine to three crystal structures

Background There are several evolutionarily unrelated and structurally dissimilar superfamilies of S-adenosylmethionine (AdoMet)-dependent methyltransferases (MTases). A new superfamily (SPOUT) has been recently characterized on a sequence level and three structures of its members (1gz0, 1ipa, and 1k3r) have been solved. However, none of these structures include the cofactor or the substrate. Due to the strong evolutionary divergence and the paucity of experimental information, no confident predictions of protein-ligand and protein-substrate interactions could be made, which hampered the study of sequence-structure-function relationships in the SPOUT superfamily. Results We used the computational docking program AutoDock to identify the AdoMet-binding site on the surface of three MTase structures. We analyzed the sequence divergence in two distinct lineages of the SPOUT superfamily in the context of surface features and preferred cofactor binding mode to propose specific function for the conserved residues. Conclusion Our docking analysis has confidently predicted the common AdoMet-binding site in three remotely related proteins structures. In the vicinity of the cofactor-binding site, subfamily-conserved grooves were identified on the protein surface, suggesting location of the target-binding/catalytic site. Functionally important residues were inferred and a general reaction mechanism, involving conformational change of a glycine-rich loop, was proposed.


Background
S-adenosyl-L-methionine (AdoMet or SAM) is the most commonly used donor of methyl groups in cellular alkylation reactions and is second only to ATP in the variety of reactions it serves as a cofactor (review: [1]). The AdoMet methyl group is bound to a charged sulfur atom (Figure 1), which thermodynamically destabilizes the molecule and makes it very reactive. The ∆G°' in the reac-tion of hydrolysis: AdoMet + homocysteine (Hcy) to S-adenosylhomocysteine (AdoHcy) + methionine is -17 kcal/ mol [2]. RNA methylation is particularly diverse, with over 20 different methylated nucleosides identified in virtually all types of RNA molecules (review: [3]). The most abundant is methylation of 2'-hydroxyl groups of ribose. Among all, nearly 100 different posttranscriptional RNA modifications, 2'-O-methylation is second only to pseudouridine formation.
The reactions of methyl transfer are catalyzed by AdoMetdependent methyltransferases (MTases), which act on substrates as varied as nucleic acids, proteins, lipids, and small molecules (comprehensive review: [4]). Most of the known MTases, whose structures were solved by X-ray crystallography or NMR (currently over 30 structures in the Protein Data Bank) belong to a large superfamily related to Rossmann-fold proteins [5,6]. The "classical" Rossmann-fold proteins (RFP), which bind NAD(P), and the Rossmann-fold MTases (RFM), which bind AdoMet, use structurally equivalent and evolutionarily conserved cofactor-binding site and they interact with the adenosine and ribose moieties of their ligands in a very similar manner. In RFM, AdoMet assumes an extended conformation ( Figure 1a). Nearly all RFM and RFP exhibit analogous hydrophobic packing against the adenine rings and RFM and NAD-binding RFP coordinate one or both of the adenosine ribose hydroxyls by Asp/Glu (in NADP one of the ribose hydroxyls is phosphorylated and no such bonding can occur in NADP-binding RFP). The methionine moiety of AdoMet has no counterpart in NAD(P) and is bound in a unique way by RFM: in motif I, another conserved Asp/ Glu residue coordinates the amino group of methionine by a water-mediated contact, while the glycine-rich region forms a loop (G-loop) with some residues in "disallowed" region of the Ramachandran plot, which accommodates the "sidechain" of AdoMet [6,7] There are several groups of AdoMet-dependent MTases, which neither share the RFM/RFP fold nor are structurally or evolutionarily related to one another. Because of their independent evolutionary origin, they should be classified as "superfamilies", regardless of the relatively scarce number of well-characterized representatives. The activation domain of methionine synthase (MetH) [8] and the B12 biosynthetic enzyme CbiF [9] are single examples of structurally characterized representatives of superfamilies with alternative folds that can support AdoMet-dependent methyl transfer reactions (review: [10]). In MetH, AdoMet assumes an extended conformation (Figure 1b), which is distinct from that observed in RFM. The adenine ring is stacked between two tyrosines, but the polar protein-ligand interactions include interactions with conserved Arg residues [8], which is distinct from RFM. In the CbiF structure, AdoHcy (a product of hydrolysis of AdoMet) assumes a folded conformation (Figure 1c), its adenine moiety is not enclosed by hydrophobic amino acids, while the ribose hydroxyls and amino and carboxylate groups of homocysteine interact with main chain NH and CO groups rather than with Asp/Glu [9]. In the recently solved structures of SET-superfamily members histone:lysine N-MTase Set7/9 [11] and Rubisco:lysine N-MTase [12] AdoHcy also assumes a folded conformation (Figure 1d). Its ribose hydroxyls do not make hydrogen bonds with the protein and the amino and carboxyl groups of homocysteine make several contacts with the protein backbone and only one with the side chain (of a conserved Asn residue). The N6 and N7 atoms of adenine are hydrogen-bonded to the main chain amide and carbonyl groups, however the hydrophobic adenine-binding pocket is not present in all members of the SET superfamily. It is obvious that while the AdoMet-binding site is partially conserved within the individual MTase superfamilies, there is little resemblance between both the cofactor conformation and protein-ligand interactions in unrelated enzymes.
Recently, another superfamily of AdoMet-dependent MTases has been defined based on bioinformatics analyses and dubbed SPOUT for the two major lineages: SpoU and TrmD [13]. All experimentally characterized members of this superfamily act on RNA: the SpoU relatives are 2'-O-ribose MTases [14] and orthologs of TrmD are tRNA:m 1 G MTases [15]. Three structures of SPOUT-superfamily members have been reported: 23S rRNA:G2251 2'-O-MTase RlmB [16](1gz0 in PDB), hypothetical RNA 2'-O-MTase with unknown specificity [17](1ipa in PDB) and hypothetical RNA-binding protein MT1 [18]. All these proteins comprise two domains, which exhibit different spatial arrangements. The smaller domain is not conserved and exhibits structural similarity to various unrelated RNA-binding proteins. The large, conserved domain (the actual SPOUT domain) exhibits a novel and unusual fold with a deep knot.
Anatharaman et al [13] studied the sequence conservation in the SPOUT superfamily and hypothesized that the AdoMet-binding site corresponds to a glycine-rich loop (G-loop), localized in the C-terminal part of the SPOUT domain. Subsequent determination of the aforementioned crystal structures revealed that the moderately conserved region localized C-terminally to the G-loop corresponds to a topological knot [16][17][18]. Unfortunately, none of the published structures of SPOUT MTases include the cofactor. In the MAD electron density map of RlmB, a segment of unassigned density was observed in the knotted region, which was interpreted as a noncovalently bound small molecule [16]. Nevertheless, the authors were not able to unequivocally identify this molecule in the native structure, nor were they convinced to model it as AdoMet or AdoHcy using the data obtained from cocrystallization experiments. Hence, the precise localization of the cofactor-binding site and the mode of protein-ligand interactions in the SPOUT superfamily remain obscure.

Identification of the AdoMet-binding site
To aid the experimental analysis of the SPOUT MTases in the absence of appropriate co-crystal structures, we decided to investigate the binding of AdoMet to the three available crystal structures (1ipa, 1gz0, and 1k3r) using the computational docking program AutoDock 3.05 [19]. In this approach, the surface of the protein is explored to identify fields that are energetically most favorable for interaction with the flexible ligand. According to the crystallographic analyses, the biologically relevant form is a homodimer, hence we used MTase dimers as targets in all docking simulations. Since the asymmetric unit of 1gz0 contained 8 monomers, of which two were partially disordered, we selected a representative dimer, comprising chains A and F.
Initially, we disregarded all previous predictions of the AdoMet-binding site and carried out the search for all potential AdoMet-binding sites for a whole-molecule grid. The preliminary docking analysis was carried out using the Genetic Algorithm (GA) and Lamarckian Genetic Algorithm (LGA) methods (see Methods for details). Each docking experiment consisted of a series of 100 simulations, each producing a docking solution. The solutions were first sorted in terms of the similarity of the structures. Each set of solutions having a root-mean-square (rms) deviation of all atoms of less than 0.5 Å in pairwise comparisons was designated as a cluster of solutions. Only the lowest-energy solution of each cluster was retained. The experiment was repeated for all three structures (1ipa, 1gz0, and 1k3r). Figure 2 shows the results of the preliminary docking analysis using the LGA algorithm, obtained for the 1gz0 structure with the grid encompassing the whole dimer. The distributions of solutions obtained with the GA and LGA methods for all three structures were qualitatively similar with respect to distribution of ligand conformations (data not shown). Nevertheless, the energy values of GA solutions were significantly higher than those of LGA (best solutions for 1gz0, 1ipa, and 1k3r [kcal/mol]: -6.64, -7.58, and -9.61 for GA vs -9.09, -9.73, and -11.38 for LGA), suggesting that the pseudo-Solis-Wets local search procedure was very efficient in finding local energetic minima. These results revealed that SPOUT-superfamily members exhibit strong preference to bind AdoMet in the same location, in a deep, negatively charged groove on the protein surface formed by three loops in the knotted Cterminal region. The lowest-energy preliminary docking solutions were always found in this region, regardless of the structure and the algorithm used, although the conformational details and orientation of the ligand with respect to the groove did vary; the most common lowest-energy conformation is indicated in Figure 2. Other docking solutions mapped to the interface between the catalytic domains of the MTase dimer ( Figure 2), but they all exhibited significantly less favorable energies of proteinligand interactions (-7 kcal/mol and higher), suggesting that they are less likely to correspond to physiologically relevant ligand-binding sites. These results confirmed the earlier sequence-based, low-resolution predictions of the cofactor-binding site [13,16] and allowed us to carry out a refined docking analysis focused on a defined structure fragment. To refine the prediction of the AdoMet binding mode in 1ipa, 1k3r and 1gz0, we calculated new affinity grids for each of these structures. Positions of the best-scoring ligands obtained in the initial calculations were chosen as the center of grids of dimensions 20 Å × 20 Å × 20 Å, with grid points separated by 0.375 Å and the docking simulations were repeated as described in Methods. Since we aimed at identification of the true energetic minimum, only the LGA method was used. Briefly, in the refined protocol, the solutions obtained from subsequent runs were first sorted in terms of the similarity of the structures to identify clusters and the analysis was carried out until the total number of clusters for a given docking experiment reached 50. Figure 3 shows the populations of 20 lowest-energy docking solutions for all three structures, as obtained in the course of the LGA search. The values of the estimated energy of protein-ligand interactions for the best docking solutions are summarized in Table 2; the ligand-binding pockets of 1gz0, 1ipa, and 1k3r are shown in Figure 4. According to the results of our docking simulations, AdoMet binds to all SPOUT MTases in the same folded conformation, similar to that observed for AdoHcy in unrelated CbiF and SET MTases, and different from the extended conformation typical for the RFM superfamily ( Figure 1). Nevertheless, in all "top 20" solutions for each SPOUT MTase, the ribose moiety of AdoMet is in the C2'-endo conformation, similarly to the RFM structures, while in CbiF it is C3'-endo (1cbf) and in the SET superfamily C1'exo (Set79; 1mt6) or C2'-exo (Rubisco LSMT; 1mlv). In nearly all low-energy docking solutions, the cofactor adopts an anti conformation about the glycosidic bond (typical for all MTase structures solved to date ( Figure 1; review: [6]), which maximizes the burial of the adenine moiety ( Figure 3).

Conformation and interactions of AdoMet in the cofactorbinding pocket of SPOUT MTases
Despite similar conformations, the protein-ligand interactions in the CbiF, SET and SPOUT superfamilies are different. The key interactions between the three SPOUT MTases and AdoMet, present in the lowest-energy docking solution and in the majority of sub-optimal solutions, are listed in Table 3a. It is striking that despite the structural variability and extensive sequence divergence, many protein-ligand interactions were found to be conserved in all three cases. The adenine ring of AdoMet lies between the backbone of invariant Gly (G201 in 1gz0) and a variable (usually small) residue (A175 in 1gz0). The docking models explain why G201 is invariant -the adenine ring stacks against the C-α atom of Gly and any side-chain at this position would block the narrow groove. On the other hand, the other face of the adenine ring stacks against the sidechain of A175 (or its counterpart in the two other struc-tures), which allows for different substitutions, depending on the space available to accommodate the side-chain in a way that it does not clash with the cofactor. The N6 and N7 groups of adenine are exposed to the solvent and not make any conserved interactions with the protein. N1 and N3 are usually surrounded by hydrophobic sidechains (N1 is solvent-exposed only in 1 k3r) and do not form recurring hydrogen bonds. It seems that adenine moiety of the cofactor is recognized by SPOUT MTases mainly based on the shape complementarity, rather than any specific interactions.
Unlike the adenine ring, the ribose and methionine moiety of AdoMet seem to make conserved interactions with the protein, although nearly all of them are to the backbone carbonyl and amide groups rather than to sidechains. This agrees with the observation that there are no truly invariant residues in the cofactor-binding site of all SPOUT MTases [ [13]; our unpublished data]. In all topscoring docked solutions, the ribose 2'-hydroxyl invariably hydrogen-bonds to the backbone amide group of M202 in 1gz0 or a homologous (variable) residue in the other structures. The 3'-hydroxyl hydrogen-bonds to the backbone carbonyl of a semi-conserved Thr residue in 1ipa and 1k3r (but not in 1gz0) and to the backbone carbonyl of a nearly invariant G196 in 1gz0 (G218 in 1k3r; in 1ipa, the H-bond with G215 is present in many solutions, but not in the top-scoring one). In all three structures, the α-carboxyl group of methionine hydrogenbonds to the backbone amide of an aliphatic residue, whose side-chain forms a floor of the cofactor-binding pocket (I216 in 1gz0). In 1gz0 and 1ipa, the backbone carbonyl of the same residue binds also the α-amino group of methionine. Instead, the α-amino group of the methionyl moiety in the 1k3r docked complex is hydrogen-bonded to the side-chains of N234 and Q239.

Spatial relationship of the AdoMet binding site and the predicted catalytic site
In the course of the analysis of sequence conservation in the entire SPOUT superfamily, including representatives with unknown structure, we noticed that the C-terminal motifs implicated in cofactor-binding, are generically conserved among all superfamily members, while the N-terminal motifs show subfamily-specific patterns of conservation (JMB and JD, unpublished data). The same is true for close homologs of the three structures analyzed in this work, suggesting possible correlation with binding of different substrates or/and different mechanisms of catalysis in 1 gz0 and 1ipa on the one hand, and 1k3r on the other. Similar differential conservation of cofactor-binding and substrate-binding/catalytic regions is known to be correlated with functional differences in the RFM superfamily of MTases [6]. The availability of the crystal structures along with the results of our docking simulations allowed to analyze the evolutionary conservation and variability in the structural context and thereby infer sequence-structure-function relationships in the SPOUT superfamily.
In all three structures of SPOUT superfamily members, two grooves (one deeper, one more shallow) can be iden-tified on the protein surface, separated by a barrier ( Figure  3). The deep groove corresponds to the predicted cofactorbinding site, while the shallow groove is lined up with subfamily-specific residues, which are not conserved on the superfamily level. The barrier is formed by a G-loop (196-GAEGEG-201 in 1gz0, 215-GPEHEG-220 in 1ipa, 218-GGPYKG-223 in 1k3r). In all low-energy docking solutions, the methyl group of the cofactor is invariably 20 best docking solutions obtained during the refined (local) docking procedure, mapped onto the surface of the SPOUT superfamily members (monomers) colored by sequence conservation. The protein surface is colored according to the relative sequence conservation among its orthologs (blue -strongly conserved, to cyan -moderately conserved, to red -variable). The orthologous sets for each of the three structures are non-overlapping. a) 1gz0 (53 orthologous sequences), b) 1ipa (54 sequences), c) 1k3r (30 sequences). d) the SPOUT domain of 1gz0, colored by sequence conservation computed for all the three aligned families. Notably, when all three families are considered, the invariant and nearly-invariant residues disappear from the predicted target-binding groove, while a few of them still remain in the cofactor-binding groove (including Gly predicted to interact directly with AdoMet). The invariant side chain at the "bottom" of each monomer comes from the Arg residue, which interacts with the "barrier" structure of the second monomer in the dimer (shown in detail in Figure 4).  solutions obtained for a) 1gz0, b) 1ipa, c) 1k3r. AdoMet and selected important residues are shown in the wireframe representation and labeled. The label for the invariant Arg side chain, provided by the second monomer, is boxed. The rest of the protein is shown in a schematic representation (brown helices, green strands and purple loops). Selected sidechains are colored according to their physicochemical properties (Arg and Lys -blue; Glu -red; Thr and Sergreen; aliphatic (Pro, Val, Leu) -gray; Gly -cyan). For the residues that bind the cofactor and for the cofactor itself, the following color scheme is used: C -white, O -red, N -blue, S -yellow). Predicted hydrogen bonds are shown as green broken lines.
directed towards the barrier, and the shallow groove behind it. In the light of our docking simulation, this arrangement of protein surface features and residue conservation patterns strongly suggest that the shallow groove corresponds to the target-binding/catalytic site, which evolved to interact with different substrates. We hypothesize that the unbound structures (1gz0, 1ipa, and 1k3r) represent a "closed", catalytically inactive conformation, and that a structural change is required to open the barrier between the two grooves and thereby allow the substrate to carry out an enzyme-assisted nucleophilic attack on the methyl group of AdoMet. It seems that a moderate conformational change of the G-loop would be sufficient to bring the substrate close to the methyl group donor. It is tempting to speculate that such change could be induced by substrate binding. Nevertheless, modeling of such conformational transitions is at the brink of the available methodology and at the same time beyond the scope of the present study. Moreover, the target specificity is known only for the RlmB MTase/1gz0 (2'-hydroxyl of G2251 in 23S rRNA) [20], while it remains to be determined for the other two proteins (1ipa and 1k3r), therefore we did not attempt to dock the substrate to the predicted catalytic pocket.
The list of conserved residues mapping to the vicinity of the predicted catalytic pocket (Figure 4) is shown in Table  3b. On the one hand, it is evident that 1gz0 and 1ipa have very similar active sites, which confirms earlier prediction that 1ipa is an RNA:ribose 2'-O-MTase [17]. On the other hand, the predicted active site of 1k3r is different, which suggests this protein and its orthologs may be involved in a different type of methylation.
It is noteworthy that without the knowledge of the ligandbinding site, confident predictions of the catalytic sites in the SPOUT superfamily could not be made. For the conserved residues in the 1k3r structure no functional predictions have been made even though its relationship to SPOUT MTases has been unambiguously identified [18].    N129  K27  E198  E217  P220  S224  S243  T243  N226  N245  R245  T246  S228  S247  E247  R114 (2nd subunit) R135 (2nd subunit) R33 (2nd subunit) In the 1ipa structure, three conserved residues (N129, E217, and N245) were hypothesized to "form the AdoMet-binding site and the catalytic site". Our analysis suggests that they all involved in target binding and catalysis and not in AdoMet binding. Michel et al [16] analyzed the conserved residues in the light of their structure (1gz0) and suggested that, by analogy to the classis AdoMetbinding site, E198 could bind the ribose moiety, while N108, R114, S224, and N226 could participate in recognition of the adenine ring or the α-amino and α-carboxyl group of the methionine. According to our model, all these residues may be important for substrate-binding and catalysis rather than AdoMet-binding. The presented docking solutions and predicted functionally important residues will facilitate experimental studies and aid the elucidation of the catalytic mechanism of the SPOUT-superfamily MTases

Conclusions
To explore potential cofactor-binding sites on the surface of the SPOUT-superfamily MTases, and to establish possible spatial relationships between the methyl group donor and the predicted catalytic sites of E. coli RlmB (1gz0), hypothetical MTase from T. thermophilus (1ipa), and M. thermoautotrophicus MT1 protein (1k3r), we simulated docking of AdoMet using AutoDock. The results show unequivocally that the preferred cofactor-binding site is in the groove in the knotted region and suggest a preferable ligand conformation, with its methyl group directed towards the putative catalytic site. Hence the computational docking procedure has not only identified the generic cofactor-binding cleft, but also generated proteinligand complexes in a biologically relevant conformation. It is intriguing, but probably purely coincidental, that the cofactor-binding site of unrelated MTases from the SET superfamily is also localized on a knot.
We consider it significant that the three independent docking simulations identified a homologous region of the protein structure as the most likely cofactor-binding site, even in the absence of significant sequence similarity between the structures of 1ipa and 1gz0 and the structure of 1k3r. Our results reinforce the sequence-based prediction of Anantharaman et al. [13] that the cofactor-binding site is localized in the C-terminal part of the SPOUT domain. Our analysis has identified conserved protein-ligand contacts, which may be typical for the entire SPOUT superfamily and suggested which of the subfamily-specific residues may be important for binding of specific substrates and catalysis of particular variants of the methyltransfer reaction. The results of our study will be useful in designing future experiments to analyze the proposed interactions between the SPOUT MTases, AdoMet and the substrates.

Preparation of the ligand and target molecules for docking
Structures of Protein Data Bank (PDB) entries 1gz0, 1ipa and 1k3r were used for docking simulations of AdoMet. The protein targets and the ligand were prepared for docking using Autodock 3.05 [19,21] and AutoDockTools 1.1alpha [22]. All "heteroatoms", including water molecules and ions were removed from the original files. The positions of polar hydrogens and charges were assigned using the Kollman algorithm [23]. Atomic solvation parameters and fragmental volumes were determined using the Addsol program. AutoTors was used to define AdoMet torsion angles. Flexible torsions were enabled on all bonds except the adenine ring. The ribose C1 atom was chosen as the ligand root. The carbon atoms of the adenine ring were designated as aromatic, the other carbons were designated as aliphatic. This affects both the torsions definition stage and subsequent force fields definition stage. After picking the root, the AutoTors procedure builds up the tree describing degrees of freedom for all atoms in the ligand molecule. The tree is then traversed to expand all branches defining existing torsions. Leaving out the rigid adenine bonds we obtained six flexible torsions total. Polar hydrogen charges of the Gasteiger-type [24] were assigned and the non-polar hydrogens were merged with the carbons. The protein sidechains were not allowed to change their conformation in any docking simulations described here.

Docking simulations
In the preliminary, global docking experiment, mass-centered grid maps were generated with 0.375 Å spacing by the AutoGrid program for the whole protein target. Lennard-Jones parameters 12-10 and 12-6 (supplied with the program package) were used for modeling H-bonds and Var der Waals interactions, respectively. The distancedependent dielectric permittivity of Mehler and Solmajer [25] was used for the calculations of the electrostatic grid maps. The Genetic algorithm (GA) and Lamarckian genetic algorithm with the pseudo-Solis and Wets modification (LGA/pSW) methods were used with default parameters (Table 1). Random starting positions on the entire protein surface, random orientations and torsions were used for all structures. For all simulations the populations in the genetic algorithm was 50. Each simulation comprised 2,5 × 10 6 energy evaluations. Each docking experiment consisted of a series of 100 simulations.
In the refined, local experiment, the 20 Å × 20 Å × 20 Å grid was generated, with the center corresponding to the lowest-energy solution from the global search. Only the LGA/pSW method was used. The analysis was carried out until the total number of clusters for a given docking experiment reached 50. Other parameters of the docking simulations were identical to that of the global search (see above and Table 1).

Clustering of the docking solutions
Docked conformations of the ligand were sorted in the order of increasing binding energy. The lowest-energy solution was used as a reference, and all other conformations were binned using a rms (root-mean-square) deviation threshold of 0.5 Å. For the whole-molecule docking procedure, clustering was carried out without taking any ligand conformation as a reference.

Sequence and structure analysis
Orthologs of the SPOUT MTases 1gz0, 1ipa, and 1k3r were identified and aligned using PSI-BLAST [26]. Multiple sequence alignments were edited manually to maximize the sequence conservation and to minimize the number of insertions and deletions in helices and strands in the reference structures. For closely related sequences (>90% identity) only single representatives were retained. The final alignments included only the orthologs of each of the structurally characterized protein (1gz0-53, 1ipa-54, and 1k3r-30) and were mutually exclusive (i.e. no sequence appeared in more than one alignment). The alignment of all sequences was guided by structural superposition of the SPOUT domain of 1gz0, 1ipa, and 1k3r. For each residue in the reference structure, the % sequence identity was calculated from the corresponding column in the alignment, and translated into temperature factors (using a reverse linear relationship, with 100% identity corresponding to the temperature factor of 00.00 and 10 % or less to 99.99). Protein surface analysis, including mapping the distribution of residue conservation and electrostatic potential, was carried out using SwissP-DB-Viewer [27].