RPocket: an intuitive database of RNA pocket topology information with RNA-ligand data resources

Abstract

Background

RNA regulates a variety of biological functions by interacting with other molecules. The ligand often binds in the RNA pocket to trigger structural changes or functions. Thus, it is essential to explore and visualize the RNA pocket to elucidate the structural and recognition mechanism for the RNA-ligand complex formation.

Results

In this work, we developed a user-friendly bioinformatics tool, RPocket. This database provides the geometrical size, centroid, shape, and secondary structure elements of RNA pockets, together with RNA-ligand interaction information and functional sites. We extracted 240 RNA pockets from 94 non-redundant RNA-ligand complex structures. We developed RPDescriptor to calculate the pocket geometrical properties quantitatively. The geometrical information was then subjected to RNA-ligand binding analysis by incorporating the sequence, secondary structure, and geometrical combinations. This new approach takes advantage of both the atom-level precision of the structure and the nucleotide-level tertiary interactions. The results show that the higher-level topological pattern improves tertiary structure prediction. We also proposed a potential mechanism for RNA-ligand complex formation. The electrostatic interactions are responsible for long-range recognition, while the van der Waals and hydrophobic contacts are responsible for short-range binding and optimization. These interaction pairs can be considered as distance constraints to guide complex structural modeling and drug design.

Conclusion

RPocket database would facilitate RNA-ligand engineering to regulate the complex formation for biological or medical applications. RPocket is available at http://zhaoserver.com.cn/RPocket/RPocket.html.

Background

RNA regulates a variety of biological functions by interacting with other molecules. It is currently recognized that more than 70% of the human genome is transcribed into non-coding RNAs [1]. In contrast, only 1.5% of the human genome encodes proteins, and only 0.05% of the human genome has been identified as encoding proteins targeted for drug development. Humans probably produce more than 15,000 long non-coding RNAs [1]. Thus, even a small fraction of these non-coding RNAs may eventually prove to be disease-related drug targets. For example, the binding of acetylpromazine to HIV TAR RNA can inhibit the Tat-TAR interaction [2]. Besides, riboflavin exhibits antibacterial properties by targeting the flavin mononucleotide riboswitch [3]. Similarly, a recent study shows that nucleotide analog inhibitors bind the viral RNA-dependent RNA polymerase, an essential molecule in the pathogenesis of COVID-19 [4]. Thus, it is believed that RNA is widely involved in various regulatory processes.

At present, some experimental methods can determine RNA-ligand structures. Unfortunately, flexible RNA molecules are difficult to crystallize well and to resolve by X-ray crystallography. Besides, electron microscopy is expensive and time-consuming. Because of these technical limitations, few experimental RNA-ligand structures are available (572 structures on February 19, 2020). Some computational methods can predict RNA and RNA-ligand structures by homologous fragment modeling [5,6,7,8,9,10,11,12], molecular dynamics simulation [13,14,15,16], or docking [17,18,19]. However, it is still challenging to predict highly accurate RNA-ligand structures due to the limited understanding of the structural principles of RNA-ligand binding.

Several existing RNA-related databases and tools provide sequence, structure, or interaction information (Additional file 1: Table 1). For example, (1) the structure databases (the PDB, NAD, PDB-Ligand, and R-bind) provide tertiary structure information of RNA-ligand complexes and the structure and physicochemical properties of ligands [20,21,22,23]; (2) the RNA-ligand experimental databases (the NALDB, SMMRNA, and KDBI) provide chemical reaction information and kinetic data for RNA-ligand formation [24,25,26]; (3) RNA docking datasets and tools (the RRDB, HNADOCK, DrugScoreRNA, and LigandRNA) provide docking algorithms, scoring functions, and docking benchmarks [17, 27,28,29]; (4) RNA pocket detection tools (3V, Caver, and PocketFinder) identify RNA pockets and their sizes [30,31,32]. However, the information available in these databases cannot be used directly in RNA-ligand studies, and well-analyzed RNA pockets and binding sites remain scarce. Thus, a comprehensive and updated RNA pocket database is urgently needed, especially one targeting the pockets in RNA for drug development.

Here, we performed a systematic analysis of 240 pockets from 94 non-redundant RNA-ligand complex structures. We first analyzed the characteristic patterns of secondary structure for all the identified RNA pockets. Then, we introduced RPDescriptor to calculate the pocket topology property quantitatively. Moreover, we performed a statistical analysis of the RNA-ligand interaction features. Our results suggest that some charged interaction pairs might provide the long-range steering force to bring the RNA and ligand together. Then, the short-range interactions optimize and stabilize the binding. The different scales of structural topology characteristics may improve the RNA structure prediction and RNA-related drug design. We also developed one user-friendly bioinformatics tool, RPocket, to facilitate ligand design or RNA engineering to regulate the complex formation for biological or medical applications.

Construction and content

To give biologists better access to RNA pocket information, we established a user-friendly online database: RPocket. RPocket contains information on 240 pockets from 94 non-redundant RNA-ligand complex structures. The workflow for constructing the RPocket database is shown in Fig. 1.

Fig. 1

The workflow of the RPocket database construction. A A total of 269 RNA-ligand structures were used for analysis. B To acquire the non-redundant dataset, we performed the sequence alignment for the 269 structures using the CD-Hit server. We used two identity cutoffs (0.80 and 0.95) to get a looser and a stricter non-redundant dataset: RBL75 and RBL94 (75 and 94 clusters for the 0.80 and 0.95 sequence identity cutoffs). C Interaction information and ligand-binding sites were identified using LigPlot+ and a distance-based calculation. D The functional motifs were identified by the RegRNA program. RegRNA identifies RNA motifs by integrating regulatory RNA motifs from the published literature and RNA motif databases. E The pockets were detected by the 3V server using the rolling probe method. F RNA pocket shape distribution and classification were generated using RPDescriptor. G The ligand functional groups, hydrogen bond and non-bond interactions, secondary structure patterns, and pocket topology information were calculated and provided in the RPocket server

(A) The PDB structural files and sequence FASTA files of 1448 RNAs deposited before February 19, 2020 were extracted using the REST API advanced search interface of the Protein Data Bank [33]. Here, we only considered single-strand RNA molecules with ligands (298 entries remained). Then, we removed the short (fewer than ten nucleotides) and highly complex (more than 500 nucleotides) RNAs. If an RNA has several NMR models, the first structural model is selected. After this screening step, 269 RNA-ligand structures remained.

(B) To acquire the non-redundant dataset, we performed the sequence alignment for the 269 structures using the CD-Hit server [34]. We have used two identity cutoffs (0.80 and 0.95) to get relatively loose and more strict non-redundant datasets: RBL75 and RBL94 (75 and 94 clusters for 0.80 and 0.95 sequence identity cutoffs) [8, 34, 35]. We performed the RMSD calculations to reflect the divergence between the representative and other structures in each cluster [36]. All the representative structures in two non-redundant datasets and the RMSD between representative and class members can be downloaded on the website. For example, one cluster in the non-redundant dataset has 24 RNA-ligand complexes. The representative structure is guanine riboswitch (PDB code: 3FO4). We calculated the RMSDs between 3FO4 and all other RNA-ligand complexes. The RMSDs of 0.30 ± 0.19 Å show that the RNAs in the cluster are highly similar (Additional file 1: Fig. 1). Here, we analyzed the 94 representatives in the RBL94 to obtain the RNA-ligand structural principles.
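The RMSD comparison between a cluster representative and its members can be illustrated with a short sketch. The text does not state how the structures were superimposed, so the Kabsch optimal-superposition step below is an assumption, as are the function name `kabsch_rmsd` and the requirement that both inputs list the same atoms in the same order.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD (in the input units, e.g. Å) between two matched coordinate sets
    after optimal rigid-body superposition of P onto Q (Kabsch algorithm).
    Assumes P and Q are (N, 3) arrays with atoms in the same order."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P (row vectors) and measure the residual deviation.
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))
```

In practice the matched atoms would have to be chosen first (for example, backbone atoms of aligned nucleotides), which this sketch leaves to the caller.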

(C) We identified the RNA-ligand binding sites using a distance-based calculation. A nucleotide is considered a binding site if its distance to the ligand is less than 4 Å. The detailed interactions were generated using LigPlot+ with the HBPLUS program [37, 38]. LigPlot+ provides the hydrogen bonds and non-bond contacts between RNA and ligands at the atomic level.
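The distance criterion can be sketched in a few lines. This is a minimal illustration, assuming that "distance" means the smallest atom-atom distance between a nucleotide and the ligand; the function name `binding_site_residues` and the input layout (one residue identifier per RNA atom) are hypothetical.

```python
import numpy as np

def binding_site_residues(rna_coords, rna_residue_ids, ligand_coords, cutoff=4.0):
    """Return the sorted residue identifiers that have at least one atom
    closer than `cutoff` (Å) to any ligand atom."""
    rna = np.asarray(rna_coords, dtype=float)      # shape (N, 3)
    lig = np.asarray(ligand_coords, dtype=float)   # shape (M, 3)
    # All pairwise RNA-atom / ligand-atom distances, shape (N, M).
    diff = rna[:, None, :] - lig[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # An RNA atom is "close" if any ligand atom lies within the cutoff.
    close_atoms = (dist < cutoff).any(axis=1)
    return sorted({rid for rid, hit in zip(rna_residue_ids, close_atoms) if hit})
```

For large structures a spatial index (for example a k-d tree) would avoid the full N × M distance matrix, but the brute-force form keeps the criterion explicit.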

(D) The functional motifs were identified by the RegRNA program [39]. RegRNA identifies RNA motifs by integrating regulatory RNA motifs from the published literature and RNA motif databases. The functional motifs can be divided into 12 categories: transcriptional motifs, pre-mRNA motifs, translational motifs, UTR motifs, mRNA degradation elements, RNA cis-regulatory elements, RNA editing sites, riboswitches, RNA structural patterns, functional RNA sequences, RNA-RNA interaction regions, and user-defined motifs. In addition, the secondary structure units (stacking bases, interior loop, bulge loop, hairpin loop, multibranch loop, and pseudoknot) were identified and generated using RNA FRABASE 2.0 [40,41,42]. All the identified functional motifs can be downloaded from the RPocket website.

(E) The pockets were detected by the 3V server using the rolling probe method [30, 43,44,45]. The volume and surface area were calculated by rolling two virtual probes (a shell probe and a solvent probe) around the van der Waals surface [30, 43,44,45,46]. We used the default radius values (10 Å for the shell probe and 3 Å for the solvent probe) to extract the RNA pockets.

(F) We developed RPDescriptor (RNA Pocket Descriptor) to calculate the pocket geometric characteristics for RNA molecules. RPDescriptor can generate two descriptors based on Normalized Principal Moments of Inertia Ratios (NPRs) [47]. The shape of the RNA pocket can be visually displayed on an isosceles triangle by projecting the two descriptors (\(rpd_{1}\) and \(rpd_{2}\)) onto the two-dimensional plane. We defined a shape similarity score \(s_{i}\) that allows pockets to be classified quantitatively.

(G) The ligands functional groups, hydrogen bond and non-bond interactions, the secondary structure patterns, and pocket topology information were calculated and provided in the RPocket server.

Utility and discussion

A user-friendly bioinformatics tool for RNA pocket information has been missing. This gap motivated us to develop RPocket, a web server for analyzing RNA pockets through a simple graphical user interface. RPocket (1) contains information on 240 pockets extracted from 94 non-redundant RNA-ligand structures; (2) displays the sequence, secondary structure, and RNA-ligand interaction characteristic patterns; (3) provides pocket geometric topology information such as volume, surface area, and shape similarity scores; (4) provides a visualization tool for users to scale and rotate the structure; (5) provides an executable script for users to perform pocket topology analysis; and (6) offers related tools to predict or simulate RNA structures. The RPocket web server is a reliable and user-friendly tool that facilitates RNA pocket studies without installing programs locally.

RPocket consists of eight modules: Home, Search, Visualization, Download, Links, Tutorial, Statistics, and Contacts. The Home module provides a brief introduction to the RPocket database and navigation to the other modules. Users can identify and extract pocket information using the Search module (Fig. 2). The Search module consists of four parts: a pulldown search box, a summary table of RNA clusters, a table of RNA descriptions, and a sequence preview module. The pulldown search box identifies RNAs by sequence identity cutoff, RNA class, and PDB ID. The RNA cluster information table shows the RMSD between the representative RNA and the other cluster members. The comprehensive information table consists of three sections: experiment, RNA-ligand interaction, and pocket geometrical information. Users can click the highlighted links to view the detailed interaction graph of a complex and to download the pocket structure file. The Sequence Preview module shows the ligand-binding sites and sequence motifs with highlighted labels. Combining the topology information of pockets with functional motifs would guide RNA-related drug screening and docking. In the Visualization module, users can upload and investigate the pocket structure. In the Download module, users can download the pocket information in xlsx format and the pocket structures in MRC format. The Links module provides the RNA pocket shape classification scripts and other useful links to help RNA-related drug development and vaccine design. The Tutorial module introduces the use of RPocket and lists the abbreviations used in the database. Some results of the data analysis are shown in the Statistics module. The Contacts module provides email addresses for comments and questions. More details about the utility of the RPocket database are described in the Additional files (Additional file 1: Section User interface and utility and Figs. 8–11).

Fig. 2

The search module of the RPocket server. The user interface displays the RNA cluster, RNA-ligand interaction, pocket topology, and sequence motif characteristic patterns

Implementation

Pocket identification and topology calculation

All the pockets were identified using the rolling probe method implemented in the 3V program [30, 43,44,45]. The coordinates of the molecule are superimposed on a cubic grid. The pocket is detected by calculating the translational degrees of freedom of the probe ball. The center of the probe is recorded if the probe contacts more than two atoms of the molecule [43]. These discrete positions form the rolling boundary of the pocket [44]. The volume and surface area values were calculated using the discrete volume method. Here we used the tested parameters for RNA pocket detection, which are 10 Å for the shell probe radius and 3 Å for the solvent probe radius [30]. The effective radius was calculated using the following formula

$${\text{r}}_{{{\text{eff}}}} = \frac{{3V_{p} }}{{A_{p} }}$$
(1)

where \(V_{p}\) and \(A_{p}\) represent the volume and surface area. The sphericity (Ψ) was used to measure the similarity between the pocket and sphere using the following formula

$$\Psi = \frac{{A_{s} }}{{A_{p} }} = \frac{{\left( {36\pi V_{p}^{2} } \right)^{1/3} }}{{A_{p} }} = \frac{{\pi^{1/3} \left( {6V_{p} } \right)^{2/3} }}{{A_{p} }}$$
(2)

\(A_{s}\) represents the surface area of a sphere whose volume is the same as the pocket volume, \(V_{p}\). The center of mass, \(r_{c}\), pinpoints the location of the pocket [31, 48].
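Formulas (1) and (2) translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and the inputs are a pocket volume in Å³ and a surface area in Å².

```python
import math

def effective_radius(volume, area):
    """Formula (1): r_eff = 3 * V_p / A_p."""
    return 3.0 * volume / area

def sphericity(volume, area):
    """Formula (2): area of the equal-volume sphere divided by the pocket area."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area
```

For a perfect sphere of radius r, both definitions behave as expected: the effective radius equals r and the sphericity equals 1. Any other shape with the same volume has a larger surface area and therefore a sphericity below 1.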

Pocket geometric characteristics analysis and classification

The geometric characteristics of the RNA pockets were identified by Normalized Principal Moments of Inertia Ratios (NPRs). NPRs display the shape of a three-dimensional molecule by projecting two descriptors, calculated from the principal moments of inertia (PMI), onto a two-dimensional plane [47]. Previous studies have developed some methods to calculate the PMI for proteins [49]. However, these methods cannot be applied directly to RNA pockets. Thus, we developed RPDescriptor (RNA Pocket Descriptor) to calculate the pocket geometric characteristics for RNA molecules. Figure 3 shows the workflow of RPDescriptor, taking a particular pocket (1EVV_1) as an example.

Fig. 3

The workflow of the RPDescriptor. The process of the RPDescriptor contains four steps: A create the pocket coordinate file from the NetCDF format; B calculate the principal moments of inertia; C generate the pocket NPR space graph by projecting the two descriptors onto the two-dimensional plane and classify the pocket by connecting the three vertices of the triangle and the geometric center O; D calculate the shape similarity

The first step is to generate the RNA pocket's coordinate file for NPR analysis (Fig. 3A). The 240 pocket files in MRC format were converted to Network Common Data Format (NetCDF) by Chimera. In NetCDF, a box with length a Å, width b Å, and height c Å is divided into n (n = a*b*c) small grid cells with a size of 1 Å. A three-dimensional coordinate (i, j, k) encodes each cell's position in the box. The values of i, j, k are integers from 0 to a-1, b-1, and c-1, respectively. The value of each cell F(i, j, k) is either 1 or 0. The pocket structure is composed of the cells with a value of 1. Since the pocket density map is uniform, we represent each cell with a value of 1 by a point at its center, with coordinate (i + 0.5, j + 0.5, k + 0.5).
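The conversion from an occupancy grid to point coordinates can be sketched as follows. This assumes the NetCDF grid has already been read into a three-dimensional array of 0/1 values with 1 Å spacing; the function name `pocket_points` is hypothetical and file reading is left out.

```python
import numpy as np

def pocket_points(grid):
    """Convert a 0/1 occupancy grid F(i, j, k) into the coordinates of the
    centers of all occupied cells, i.e. (i + 0.5, j + 0.5, k + 0.5)."""
    grid = np.asarray(grid)
    # Integer indices (i, j, k) of every cell whose value is 1, shape (n, 3).
    occupied = np.argwhere(grid == 1)
    # Shift each index by half a cell to land on the cell center.
    return occupied.astype(float) + 0.5
```

Because every occupied cell has the same volume, the resulting points can be treated as equal-mass particles in the moment-of-inertia calculation.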

The second step is to calculate the PMI and the RNA pocket topology descriptors (Fig. 3B). RPDescriptor first calculates the center of mass and then the moment of inertia tensor around it. The PMI values (\(I_{11}\), \(I_{22}\), \(I_{33}\)) are obtained in ascending order. Finally, the RNA pocket topology descriptors, \(rpd_{1}\) and \(rpd_{2}\), are generated using formula (3).

$$rpd_{1} = \frac{{I_{11} }}{{I_{33} }},\;rpd_{2} = \frac{{I_{22} }}{{I_{33} }}$$
(3)
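The second step can be illustrated with NumPy. This is a sketch under stated assumptions rather than the RPDescriptor code itself: every pocket point carries unit mass, and the function name `npr_descriptors` is hypothetical.

```python
import numpy as np

def npr_descriptors(points):
    """Return (rpd1, rpd2) = (I11 / I33, I22 / I33) for equal-mass points,
    where I11 <= I22 <= I33 are the principal moments of inertia."""
    pts = np.asarray(points, dtype=float)
    # Shift the origin to the center of mass.
    centered = pts - pts.mean(axis=0)
    x, y, z = centered.T
    # Moment of inertia tensor about the center of mass (unit masses).
    inertia = np.array([
        [np.sum(y * y + z * z), -np.sum(x * y),        -np.sum(x * z)],
        [-np.sum(x * y),        np.sum(x * x + z * z), -np.sum(y * z)],
        [-np.sum(x * z),        -np.sum(y * z),        np.sum(x * x + y * y)],
    ])
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    i11, i22, i33 = np.linalg.eigvalsh(inertia)
    return i11 / i33, i22 / i33
```

Points on a line give (0, 1), four points on a square in a plane give (0.5, 0.5), and six points placed symmetrically on the three axes give (1, 1). A single point has no defined shape, since \(I_{33}\) is then zero.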

The third step is to calculate the pocket shape space quantitatively and classify the shape of the pocket (Fig. 3C). The shape can be visually displayed on an isosceles triangle by projecting the two descriptors (\(rpd_{1}\) and \(rpd_{2}\)) onto the two-dimensional plane. The upper-left, upper-right, and lower-middle vertices correspond to a standard rod, sphere, and disc shape, respectively. The geometric center of the isosceles triangle is O \(\left( {\frac{1}{2}, \frac{5}{6}} \right)\). The O point is then connected to the three vertices of the triangle, which divides the shape space qualitatively into three categories: sphere-, disc-, and rod-like pockets.
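The stated center can be checked by averaging the vertex coordinates. The vertex positions used here are an assumption based on the usual NPR plot layout, namely rod at (0, 1), sphere at (1, 1), and disc at (1/2, 1/2) in the (\(rpd_{1}\), \(rpd_{2}\)) plane:

$$O = \left( {\frac{{0 + 1 + \frac{1}{2}}}{3},\;\frac{{1 + 1 + \frac{1}{2}}}{3}} \right) = \left( {\frac{1}{2},\;\frac{5}{6}} \right)$$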

The fourth step is to calculate the shape similarity score (Fig. 3D). The scores \(s_{1}\) = \(rpd_{1}\) + \(rpd_{2}\) − 1, \(s_{2}\) = 2 − 2\(rpd_{2}\), and \(s_{3}\) = \(rpd_{2}\) − \(rpd_{1}\) represent the sphere-like, disc-like, and rod-like degree of the pocket, respectively [50]. Here, we defined a shape similarity score \(s_{i}\) that allows pockets to be classified quantitatively using formula (4). The value of \(s_{i}\) ranges from \(\frac{1}{3}\) to 1. For the O point, \(s_{i}\) = max \(\left( {s_{1} = \frac{1}{3},\;s_{2} = \frac{1}{3},\;s_{3} = \frac{1}{3}} \right)\) = \(\frac{1}{3}\). For the three vertices, \(s_{i}\) = 1. If \(s_{i}\) = \(s_{1}\), the pocket is assigned to the sphere-like type and \(s_{i}\) denotes its sphere-like degree. If \(s_{i}\) = \(s_{2}\) or \(s_{3}\), the pocket is assigned to the disc-like or rod-like type, and \(s_{i}\) denotes its disc-like or rod-like degree. We observed that the two shape classification methods (qualitative and quantitative) are equivalent.

$$s_{1} + s_{2} + s_{3} = 1,\;\;s_{i} = \max \left( {s_{1} ,\;s_{2} ,\;s_{3} } \right)$$
(4)
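The scoring and classification of formula (4) can be sketched as below. The sketch reads the rod-like score as \(s_{3}\) = \(rpd_{2}\) − \(rpd_{1}\), which is the form that makes the three scores sum to 1; the function names and the tie-breaking order (sphere, then disc, then rod) are illustrative choices, not taken from the source.

```python
def shape_scores(rpd1, rpd2):
    """Sphere-like, disc-like, and rod-like degrees of a pocket."""
    s1 = rpd1 + rpd2 - 1.0   # sphere-like degree
    s2 = 2.0 - 2.0 * rpd2    # disc-like degree
    s3 = rpd2 - rpd1         # rod-like degree
    return s1, s2, s3

def classify_pocket(rpd1, rpd2):
    """Return (label, s_i), where s_i = max(s1, s2, s3) as in formula (4)."""
    labels = ("sphere-like", "disc-like", "rod-like")
    scores = dict(zip(labels, shape_scores(rpd1, rpd2)))
    # On an exact tie, max() keeps the first label in insertion order.
    label = max(scores, key=scores.get)
    return label, scores[label]
```

At the vertices (1, 1), (0.5, 0.5), and (0, 1) the function returns a score of 1 for the sphere-like, disc-like, and rod-like classes, respectively, and at the center O all three scores equal 1/3.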

Results

Overview of the RNA pockets

We performed a systematic analysis of the 240 RNA pockets extracted from 94 non-redundant RNA-ligand complex structures (Additional file 5: Folder S1). RNAs can fold into various conformations and perform different functions. The representative RNAs include forty-four riboswitches, fifteen aptamer RNAs, seven ribozymes, five tRNAs, four rRNAs, three small RNAs, two xrRNAs, one mRNA, one telomeric RNA, and thirteen other RNAs [51] (Additional file 1: Fig. 2). For example, the RPocket dataset contains 44 riboswitches with 147 riboswitch pockets. A riboswitch RNA can bind small molecules to regulate gene expression through conformational changes. Understanding the riboswitch pocket provides a potential mechanism for the functional changes and a starting point for antibiotic drug design. To characterize the geometrical shapes of the pockets, we analyzed the topology features of all pockets using NPRs. RNA pockets can be divided into three categories: sphere-like (50), disc-like (39), and rod-like (151) pockets (Additional file 1: Fig. 3).

Topology characteristic of RNA pockets

The pocket topology characteristics help to identify small molecules for target-specific binding. We analyzed the topology properties (volume, surface area, and effective radius) using the rolling probe method of the 3V program [30]. The mean volume (m) and standard deviation (σ) of all the pockets are 1440.9 ± 2329.4 Å³. Three large pockets were removed because their volumes are larger than m + 3σ (Additional file 1: Fig. 4). Then, we calculated the shape similarity scores (\(s_{i}\)) (Additional file 1: Table S2). Figure 4A–C shows that rod-like pockets (volume of 985 Å³, surface area of 676 Å², and effective radius of 4.60 Å) are larger than sphere-like (volume of 536 Å³, surface area of 380 Å², and effective radius of 4.21 Å) and disc-like (volume of 802 Å³, surface area of 508 Å², and effective radius of 4.37 Å) pockets. We further analyzed the shape similarity scores to reflect pocket shape quantitatively. The continuous similarity scores range from \(\frac{1}{3}\) to 1. A score of 1 indicates a standard shape, which is a sphere, disc, or rod. A score of \(\frac{1}{3}\) suggests a very irregular shape. The shape similarity scores of sphere-like, disc-like, and rod-like pockets are 0.47, 0.49, and 0.61, respectively (Fig. 4D). The results suggest that the RNA pockets with rod-like shapes are typically highly rod-shaped, while the sphere- and disc-like classes lack highly spherical and discoid members, respectively.
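The outlier criterion can be sketched as follows. With the reported values, the threshold is m + 3σ = 1440.9 + 3 × 2329.4 = 8429.1 Å³. The text does not say whether σ is the population or the sample standard deviation; the sketch assumes the population form (NumPy's default), and the function name is hypothetical.

```python
import numpy as np

def remove_large_outliers(volumes, n_sigma=3.0):
    """Drop values above mean + n_sigma * std and return (kept, threshold).
    Uses the population standard deviation (ddof=0), an assumption."""
    v = np.asarray(volumes, dtype=float)
    threshold = v.mean() + n_sigma * v.std()
    return v[v <= threshold], threshold
```

Only the upper tail is filtered here, matching the removal of unusually large pockets described in the text.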

Fig. 4

The geometric information distribution of surface area (A), volume (B), effective radius (C), and shape similarity scores (D) for each pocket category, respectively. The median values are colored green

We performed a comparative analysis of the 50 ligand-binding and 190 non-ligand-binding pockets to obtain the topological principles of ligand binding. We classified the RNA pockets based on their geometric shapes using RPDescriptor. The 50 ligand-binding pockets comprise 9 sphere-like, 8 disc-like, and 33 rod-like pockets. The geometric shape distribution of the 190 non-ligand-binding pockets is similar: 41 sphere-like, 32 disc-like, and 117 rod-like pockets. To further compare the shape distributions of ligand-binding and non-ligand-binding pockets, an NPR space distribution graph with pocket-size information was generated. Figure 5 shows that the shape distributions of ligand-binding and non-ligand-binding pockets are similar. We also observed that the locations of the pockets in RNA are identical. These results emphasize the potential of non-ligand-binding pockets as small molecule targets. Besides, the loss of globularity with increasing pocket volume, for both ligand-binding and non-ligand-binding RNA pockets, is consistent with protein pockets, suggesting that RNA can be considered a drug target like proteins [50]. We further compared the volume and surface area between the ligand-binding and non-ligand-binding pockets (Additional file 1: Fig. 5). Most ligand-binding pockets (~75%) have a volume between 200 and 2000 Å³. The volume and surface area of the ligand-binding pockets (982 Å³ median volume and 622 Å² median area) are larger than those of the non-ligand-binding pockets (803 Å³ median volume and 543 Å² median area). Ligand binding may affect the pocket breathing motions.

Fig. 5

The NPR distribution of ligand-binding and non-ligand-binding pockets. Color code shows the volume size for each pocket

Secondary structure pattern of RNA pockets

The ligand-binding sites are usually located in a specific RNA secondary structure. Binding to the wrong secondary structure may destroy the interactions and the structural stability [52]. Thus, we analyzed the secondary structure distributions for all the RNA pocket binding sites (Additional file 3: Table S2). Here, we focused on the unpaired loop units. There are 10, 11, and 15 secondary structure patterns in the sphere-, disc-, and rod-like pockets, respectively. Figure 6 shows that the sphere-like pockets are located in the hairpin loop (22%), internal-hairpin loop (17.1%), internal loop (14.6%), multibranched-internal loop (12.2%), multi-branched loop (9.8%), multibranched-hairpin loop (9.8%), and others (14.5%). The disc-like pockets are observed in the internal loop (15.8%), followed by the multi-branched loop (15.8%), internal-hairpin loop (15.8%), hairpin loop (13.2%), internal-multibranched-hairpin-bulge loop (10.5%), multibranched-hairpin loop (7.9%), and others (21%). The rod-like pockets are located in the internal loop (19.4%), hairpin loop (17.2%), internal-hairpin loop (13.4%), internal-multibranched loop (12.7%), multibranched-hairpin loop (8.2%), multi-branched loop (8.2%), and others (20.9%). Sphere-like pockets are typically smaller than the other two types. This kind of pocket is often located in a hairpin loop with four to five nucleotides [53]. We further counted the number of base pairs between adjacent loops. The results show that most adjacent loops (86.5%) are separated by fewer than six base pairs (Additional file 1: Fig. 6). Notably, 92.6% of these tandem loops belong to pockets of the same shape.

Fig. 6

The secondary structure patterns in the RNA pockets. The gray circle represents a spherical-like pocket. The red pie represents a disc-like pocket. The blue rectangle represents a rod-like pocket

We analyzed the distributions of the nucleotides extracted from the RNA-ligand binding sites (Additional file 4: Table S3). The average proportion of G nucleotides (35.6%) is significantly higher than that of A (22.1%), C (20.6%), and U (21.7%) (Fig. 7B). G nucleotides form hydrogen bonds with small molecules more readily. Identifying RNA sequence motifs can help us understand RNA-ligand interactions and function [54]. Thus, we further performed a sequence pattern analysis of the nucleotides involved in RNA-ligand interactions. Here, a contiguous sequence and its reverse, for example ‘GU’ and ‘UG’, are counted as the same motif. There are 39 sequence motifs involved in RNA-ligand interactions (Fig. 7A). The sequence motifs ‘GU’ (11.7%), ‘GG’ (8.8%), ‘GA’ (8.8%), ‘GC’ (8.1%), ‘AU’ (5.7%), ‘CC’ (5.3%), ‘AC’ (4.9%), ‘UGG’ (4.9%), ‘CU’ (3.5%), ‘AA’ (3.2%), ‘UGC’ (2.8%), ‘AUC’ (2.8%), ‘AAC’ (2.5%), ‘ACU’ (1.8%), and ‘GUC’ (1.8%) are each observed more than five times across all the RNA-ligand interactions. Previous studies have indicated that the motifs ‘GU’, ‘GG’, ‘GA’, and ‘GC’ can modulate metal binding specifically [55]. Some of the sequence patterns have been identified as important motifs for RNA complex formation. For example, a previous study showed that some proteins specifically bind to the UC-rich region of AR (androgen receptor) mRNA and play a role in post-transcriptional regulation of AR expression in prostate cancer cells [56]. Besides, the most frequent trinucleotide, UGG (14 out of 283), is specifically recognized by Nitrosomonas MazF (a sequence-specific toxin endoribonuclease), which promotes RNA degradation selectively [57].
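Counting motifs so that a sequence and its reverse share one entry can be sketched as follows. This is an illustration of the convention described above, not the authors' script; the function names are hypothetical, and the choice of the alphabetically smaller string as the shared key is arbitrary (the text itself labels such motifs by one orientation, for example ‘UGG’).

```python
from collections import Counter

def canonical_motif(motif):
    """Map a motif and its reverse to one shared key
    (the alphabetically smaller of the two strings)."""
    return min(motif, motif[::-1])

def count_motifs(motifs):
    """Count motifs, treating each sequence and its reverse as the same motif."""
    return Counter(canonical_motif(m.upper()) for m in motifs)
```

Frequencies such as those quoted above would then follow by dividing each count by the total number of motif occurrences.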

Fig. 7

The motif (A) and binding site (B) distribution of the RNA-ligand structure. The statistical motif analysis is consistent with the confirmed functional sequences. The average distribution of G nucleotides (35.6%) is significantly higher than A (22.1%), C (20.6%), and U (21.7%)

Contribution of the short- and long-range interactions

We identified the RNA-ligand interactions and analyzed the interaction patterns using LigPlot+ (Additional file 4: Table S3). Figure 8A shows 16S RNA bound to gentamicin C1a (GE), an aminoglycoside antibiotic, in a rod-like pocket (volume of 979 Å³). Two hydrogen bonds and eight non-bond interactions are involved in this RNA-ligand interaction. The hydrogen bonds are located at adjacent nucleotides (A21, G22), whereas the eight non-bond interactions are dispersed over other parts of the RNA pocket. The other two examples show similar characteristics. The short- and long-range interactions are distributed over different parts of the small molecules and stabilize the interaction between RNA and small molecules (Fig. 8B, C). We also analyzed all the ligand functional groups of the 94 representative RNAs involved in hydrogen bond and non-bond interactions (Additional file 1: Fig. 7 and Additional file 6: Folder S2). The results indicate that long-range (polar or electrostatic) interactions bring the ligand and RNA together. Then, the short-range non-bond interactions optimize the RNA-ligand binding. Besides, we analyzed the sizes of the pockets and ligands. SAM occupies the smallest volume, followed by GE, and G4P is the largest, which is consistent with the pocket sizes. Together, the results suggest two steps for drug screening. First, the size and shape of the RNA pocket and the small molecule should be roughly the same. Second, the typical short- and long-range interactions should be considered to optimize the RNA-ligand binding.

Fig. 8

Examples of ligand-binding pockets. A The 16S RNA bound to gentamicin C1a (PDB: 1BYJ), B the ppGpp riboswitch bound to guanosine tetraphosphate (G4P) (PDB: 6DME), C the SAM-IV riboswitch bound to S-Adenosylmethionine (SAM) (PDB: 6UET). The ligands and pockets are colored in cyan and pink. The van der Waals and stacking interactions are emphasized with red arcs. The hydrogen bonds are shown with green dashed lines. The carbon, nitrogen, and oxygen atoms are shown as black, cream, and red spheres

Topology pattern improves tertiary structure prediction

At present, the structural base pairing and loop elements have been successfully applied to RNA tertiary structure prediction. However, the understanding of the higher-level structural element combinations is still limited. Our results show that 92.6% of the tandem loops (distance less than six base pairs) are typically in the same shape pockets. To test if the higher-level scale of structural elements can identify native-like RNA structures, we ran four popular RNA tertiary structure prediction programs (3dRNA, RNAcomposer, simRNA, and Vfold3D) on the given testing set to build several tertiary structures and evaluated the prediction accuracy (Additional file 1: Fig. 6, Additional file 7: Folder S3). All the tests can be downloaded from our website. We divided the prediction structures into Tandem loops with the Same pocket topology (TS) and Tandem loops with Different pocket topologies (TD). Figure 9 shows the all-atom root-mean-square deviation (RMSD) measured against the native structure. The predicted structure with the TS characteristic shows lower RMSDs (1.71 ± 1.66 Å) while the predicted structure with TD characteristic presents much larger RMSDs (7.23 ± 4.43 Å). The results suggest that the different scales of higher-level topology patterns may improve the RNA tertiary structure prediction.

Fig. 9

The RMSD values for predicted 3D structures for the tested RNAs. The 3D structures are generated by 3dRNA, RNAcomposer, simRNA, and Vfold3D. We divided the prediction structures into Tandem loops with the Same pocket topology (TS) and Tandem loops with Different pocket topologies (TD). The red dots indicate structures with TS characteristics. These structures (red dots) generally achieve lower RMSD values than the predicted structures with TD characteristics (blue dots)

Conclusions

In this work, we proposed RPDescriptor to calculate the topological properties of RNA pockets quantitatively. The topological information was then subjected to RNA-ligand binding analysis by incorporating sequence and secondary structure information. This approach takes advantage of both the atom-level precision of the structure and the residue-level tertiary interactions. Together, the results indicate that long-range interactions bring the ligand and RNA together, after which short-range non-bonded interactions optimize and stabilize RNA-ligand binding. We also developed a user-friendly bioinformatics tool, RPocket, to facilitate RNA-ligand engineering that regulates complex formation for biological or medical applications.

Availability of data and materials

All supplementary data and materials can be downloaded from the RPocket homepage at http://zhaoserver.com.cn/RPocket/RPocket.html.

Acknowledgements

Not applicable.

Funding

This work was supported by the National Natural Science Foundation of China, grants 11704140 (YZ) and 12175081 (YZ), and by the self-determined research funds of CCNU from the colleges' basic research and operation of MOE, grant CCNU20TS004 (YZ). The funders had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Author information

Contributions

T.Z. built the server and performed most computational analysis. H.W. and C.Z. helped to build the server. Y.Z. supervised the overall study, analyzed the data, and wrote the paper. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Yunjie Zhao.

Ethics declarations

Declarations

The authors declare that they have made the data and code publicly accessible.

Ethics approval and consent to participate

Not applicable.

Consent to publish

Not applicable.

Competing interests

All authors declare no conflicts of interest in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Supplementary material. This file includes an introduction to RPocket, Supplementary Figures 1–11, and Supplementary Table 1.

Additional file 2. Table S1: RNA-ligand complexes involved in this study.

Additional file 3. Table S2: Geometrical information of the RNA pockets and the secondary structure elements in which the pockets are located.

Additional file 4. Table S3: Binding sites of RNA-ligand complexes and functional groups of ligands involved in interactions with RNA.

Additional file 5. Folder S1: The structures of all RNA pockets.

Additional file 6. Folder S2: Interaction information of RNA-ligand complexes.

Additional file 7. Folder S3: Nine other experimental structures and their modeled structures and pockets.

Additional file 8. Folder S4: RPDescriptor program code for shape classification of RNA pockets.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhou, T., Wang, H., Zeng, C. et al. RPocket: an intuitive database of RNA pocket topology information with RNA-ligand data resources. BMC Bioinformatics 22, 428 (2021). https://doi.org/10.1186/s12859-021-04349-4

Keywords

  • Pocket database
  • RNA-ligand interaction
  • Structure prediction
  • Drug discovery