Identification of RNA silencing components in soybean and sorghum
BMC Bioinformatics volume 15, Article number: 4 (2014)
RNA silencing is a process triggered by 21–24 nucleotide small RNAs to repress gene expression. Many organisms, including plants, use RNA silencing to regulate development and physiology and to maintain genome stability. Plants possess two classes of small RNAs: microRNAs (miRNAs) and small interfering RNAs (siRNAs). The frameworks of the miRNA and siRNA pathways have been established in the model plant Arabidopsis thaliana (Arabidopsis).
Here we report the identification of putative genes that are required for the generation and function of miRNAs and siRNAs in soybean and sorghum, based on knowledge obtained from Arabidopsis. The gene families, including DCL, HEN1, SE, HYL1, HST, RDR, NRPD1, NRPD2/NRPE2, NRPE1, and AGO, were analyzed for gene structures, phylogenetic relationships, and protein motifs. The gene expression was validated using RNA-seq, expressed sequence tags (EST), and reverse transcription PCR (RT-PCR).
The identification of these components could provide not only insight into the RNA silencing mechanism in soybean and sorghum but also a basis for further investigation. All data are available at http://sysbio.unl.edu/.
Small RNAs, in particular those 20 to 24 nucleotides (nt) in length, belong to two classes: microRNAs (miRNAs) and short interfering RNAs (siRNAs). MiRNAs are regulators of gene expression and affect many biological processes, such as development and physiology, in plants and animals [1–3]. Their dysregulation often causes developmental defects and diseases in plants and animals. MiRNAs are released as a duplex from an imperfect stem-loop, which resides in the miRNA primary transcripts (pri-miRNAs) [1–3]. SiRNAs are chemically indistinguishable from miRNAs, but they originate from long perfect double-stranded RNAs (dsRNAs) [2, 3]. Plants encode several classes of siRNAs, including siRNAs derived from repetitive DNAs (ra-siRNAs) and trans-acting siRNAs (ta-siRNAs) [2, 3]. Ra-siRNAs regulate gene expression at the transcriptional level by directing DNA methylation at homologous loci through a process named RNA-directed DNA methylation (RdDM) [2, 3]. In contrast, ta-siRNAs act like miRNAs to regulate gene expression at the post-transcriptional level. The framework of plant miRNA/siRNA biogenesis and function has been established in Arabidopsis thaliana (Arabidopsis); several different categories of genes are involved in the pathways for their generation and loading.
In Arabidopsis, the generation of miRNAs and siRNAs requires the DICER-LIKE (DCL) proteins. DCLs are RNase III enzymes that cut dsRNAs to release ~22 nt RNA duplexes, which have 2 nt 3′ overhangs at each end. Arabidopsis encodes four DCLs: DCL1, DCL2, DCL3, and DCL4. DCL1, which associates with HYL1, a dsRNA-binding protein, and SERRATE (SE), a zinc finger protein, cuts pri-miRNAs twice to release a 21 nt miRNA duplex in the nucleus [4–6]. DCL2 is responsible for 22 nt viral-derived siRNAs when plants are infected. DCL3 generates 24 nt ra-siRNAs, and DCL4 produces 21 nt ta-siRNAs and some miRNAs [8–10]. The generation of both miRNAs and siRNAs also requires the single-stranded RNA (ssRNA)-binding proteins DAWDLE and TOUGH [11, 12]. After generation, miRNA and siRNA duplexes are 2′-O-methylated at the 3′-terminal nucleotide by the dsRNA methylase HEN1. The methylation protects miRNAs from degradation and from 3′ untemplated uridine addition. The Arabidopsis HASTY (HST) gene is an ortholog of the human exportin 5 gene. After generation, miRNAs are exported by HST-dependent or HST-independent pathways to the cytoplasm, where they function. Interestingly, some components of the miRNA biogenesis pathway are also targets of miRNAs. For example, in soybean, miR1515 can target DCL2, which leads to hypernodulation [16, 17].
RNA-dependent RNA polymerase (RDR) is another essential player in siRNA production. Among the six RDRs in Arabidopsis, RDR2 converts ssRNAs generated from repetitive DNAs into the dsRNA precursors of ra-siRNAs, while RDR6 produces the ta-siRNA precursors. The generation of ra-siRNAs also requires a plant-specific DNA-dependent RNA polymerase IV (Pol IV) [19–21]. Pol IV is a plant-specific polymerase derived from Pol II. It shares many identical subunits with Pol II, but the largest subunit NRPD1 and the second largest subunit NRPD2/NRPE2 of Pol IV are paralogs of their counterparts in Pol II. Over 90% of siRNAs require Pol IV for their production. Pol IV is thought to transcribe, from RdDM target loci, the ssRNAs that serve as templates for RDR2 [19–21]. Another plant-specific DNA-dependent RNA polymerase, Pol V, also plays crucial roles in the RdDM pathway [21, 24]. Pol V shares eight subunits with Pol IV, including NRPD2/NRPE2 [22, 25], while NRPE1 (the largest subunit) and three other subunits are distinct from their counterparts in Pol IV [22, 25]. Pol V associates with RdDM target loci and produces ~200 nt non-coding transcripts from the surrounding regions of some RdDM loci.
MiRNAs and siRNAs are loaded onto ARGONAUTE (AGO) proteins, which perform target mRNA cleavage and/or translational inhibition, or direct chromatin modification such as DNA methylation. By recognizing complementary sequences in the targets, miRNAs and siRNAs guide AGO to silence specific genes. In general, there are multiple AGO genes in a plant species. Arabidopsis possesses 10 AGOs, which are grouped into three clades based on sequence similarity: AGO1, AGO5, and AGO10 belong to the first clade; AGO2, AGO3, and AGO7 compose the second clade; and AGO4, AGO6, AGO8, and AGO9 form the third clade. AGO1 associates with miRNAs and some siRNAs, such as ta-siRNAs, to cleave target mRNAs and/or inhibit translation. AGO10 specifically sequesters miR166/165 from AGO1, which is essential for shoot apical meristem development [28, 29]. AGO7 binds miR390 to cleave the precursor RNA of ta-siRNAs. AGO4, AGO6, and AGO9 mainly bind 24 nt ra-siRNAs to direct DNA methylation, but seem to have different target preferences [31–33]. It has been proposed that Pol V may recruit the AGO4-siRNA complex to RdDM targets through its physical interaction with AGO4 and/or the interaction between its nascent transcripts and AGO4/6-associated siRNAs [34–36]. Recently, 19 and 18 AGOs were identified in rice and maize, respectively [37, 38].
In soybean and sorghum, our knowledge of the RNA silencing mechanism is still poor. In this study, taking advantage of the available genome information and the conservation of RNA silencing components across plant species, we identify putative RNA silencing components, including DCL, HEN1, SE, HYL1, HST, RDR, NRPD1, NRPD2/NRPE2, NRPE1, and AGO, in soybean and sorghum. The identification of these components could provide insight into the RNA silencing mechanism in soybean and sorghum as well as a basis for further investigation.
The DExD-helicase, helicase-C, Duf283, PAZ, RNase III, and double-stranded RNA-binding (dsRB) domains are conserved in plant and animal DCLs. Therefore, DCL genes in soybean and sorghum can be identified by searching for genes whose proteins carry these domains in an arrangement like that of the Arabidopsis DCLs. The protein domains were identified with Hidden Markov Models (HMMs). Using HMM analysis of the whole genomes combined with a sequence similarity search with TBLASTN, 7 and 3 DCLs were identified in soybean and sorghum, respectively (Table 1). Phylogenetic analysis assigns two DCL1, two DCL2, one DCL3, and two DCL4 in soybean, and one DCL2 and two DCL3 in sorghum (Figure 1). These genes are named using the prefix Gm (Glycine max) or Sb (Sorghum bicolor) to reflect the species in which they are present, followed by the name of their Arabidopsis ortholog, for example GmDCL1 for the soybean DCL1 gene. In this manuscript, the prefixes At (Arabidopsis thaliana) and Os (Oryza sativa) are used for Arabidopsis and rice, respectively. If there is more than one ortholog, a letter is attached according to sequence similarity. For instance, the one having the highest similarity to its Arabidopsis ortholog is designated “a”. If two proteins are identical, they are named “a” or “b” based on the order of chromosome numbers: the gene on the chromosome with the smaller number is “a” and the other is “b”. The same nomenclature is used for the other RNA silencing components described in the following sections. The HMM search failed to find DCL1 and DCL4 among the annotated sorghum genes, and hence TBLASTN was used to search the AtDCL1 and AtDCL4 protein sequences against the sorghum genome sequence. This approach identified SbDCL1 in an unannotated region of chromosome 1. This locus is named Sb01g049105 because it is located between loci Sb01g049100 and Sb01g049110. The expression of SbDCL1 was confirmed by RT-PCR.
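The naming rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Homolog` record, the `assign_names` helper, and the locus identifiers in the test are hypothetical, and the sketch assumes that equal similarity values are resolved by chromosome number, which the text states only for identical proteins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Homolog:
    locus: str          # hypothetical locus identifier
    chromosome: int     # chromosome number of the locus
    similarity: float   # percent similarity to the Arabidopsis ortholog


def assign_names(prefix, family, homologs):
    """Return {locus: name}, e.g. GmDCL1a, following the rule in the text.

    A single homolog gets no letter. Several homologs are ranked by
    decreasing similarity to the Arabidopsis ortholog; ties are broken
    by the smaller chromosome number (assumption for non-identical ties).
    """
    if len(homologs) == 1:
        return {homologs[0].locus: f"{prefix}{family}"}
    ranked = sorted(homologs, key=lambda h: (-h.similarity, h.chromosome))
    return {
        h.locus: f"{prefix}{family}{chr(ord('a') + i)}"
        for i, h in enumerate(ranked)
    }
```

With two equally similar soybean homologs on different chromosomes, the one on the lower-numbered chromosome receives the letter "a".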
No DCL4 homolog was identified by TBLASTN, but three predicted proteins, Sb06g022180, Sb06g022190, and Sb06g022200, show similarities to different portions of AtDCL4 from C-terminus to N-terminus. This indicates that these three predicted proteins might belong to one transcription unit. In fact, re-annotation of the region containing these three genes predicts one transcript that encodes a 1630 amino acid (AA) long protein. This newly predicted protein, named SbDCL4, has 63% and 83% similarity to AtDCL4 and OsDCL4, respectively. The soybean DCL1, DCL2, and DCL4 genes have each been duplicated once in the genome, and the duplicates are highly similar. In contrast, only DCL3 has duplicates in the sorghum genome. The GmDCLs are more similar to the AtDCLs than the SbDCLs are (Figure 1), presumably because both soybean and Arabidopsis are dicots while sorghum is a monocot. In fact, the SbDCLs are more similar to the OsDCLs than to the AtDCLs (Figure 1).
A Dicer gene and its homologs always contain two RNase III domains, termed “a” and “b”, each of which cleaves one strand of a dsRNA. The PAZ domain binds to the 3′-end of a dsRNA. The distance between the PAZ domain and the cleavage site of the RNase III domain determines the length of the small RNAs. The Duf283 domain is now known to be a dsRNA-binding domain. The protein domains present in the soybean and sorghum DCLs are similar to their counterparts in Arabidopsis, except for some differences in DCL1 and DCL3. The RNase IIIa domain of SbDCL1 is divided into two segments, whereas it is present as an undivided domain in AtDCL1. Duf283 exists in both soybean and sorghum DCL3, while it is absent in AtDCL3. Figure 2 shows the combinations of DCL domains in soybean, sorghum, and Arabidopsis.
HEN1, SE, HYL1, and HST
By searching the AtHEN1 protein sequence against the soybean and sorghum genomes with TBLASTN, two soybean and one sorghum HEN1 homologs were identified. The soybean HEN1 genes on chromosomes 5 and 8 are named GmHEN1a and GmHEN1b, respectively (Table 2). The protein sequences of GmHEN1a/b and SbHEN1 have only 40-50% similarity to AtHEN1. The identical protein sequences of GmHEN1a and GmHEN1b suggest a recent duplication event. Like AtHEN1, GmHEN1a/b and SbHEN1 contain two dsRNA-binding domains, a La-motif-containing domain (LCD), a PPIase-like domain (PLD), and a highly conserved methyltransferase (MTase) domain, which indicates that these HEN1s act on miRNA or siRNA duplexes.
The soybean and sorghum genomes each encode three homologs of Arabidopsis SE (Table 2). AtSE has around 75% and 50-67% sequence similarity to the GmSEs and SbSEs, respectively. Like AtSE, the soybean and sorghum SEs possess an N-terminal unstructured region followed by an N-terminal domain containing several nuclear localization signals, a middle domain, a core zinc-finger domain, and a C-terminal unstructured region. Although the similarities among the GmSEs and among the SbSEs are around 90%, their N-terminal unstructured regions (1–92 AA) are not conserved, which is consistent with the fact that the N-terminal unstructured region of SE is not essential for its function in miRNA metabolism.
The soybean and sorghum genomes each encode two homologs of Arabidopsis HST (Table 2). GmHSTa and GmHSTb are 79% similar to AtHST and 96% similar to each other. SbHSTa is 73% similar to AtHST, but SbHSTb is only 59% similar to SbHSTa and 49% similar to AtHST. The low similarity of SbHSTb to SbHSTa and AtHST indicates that SbHSTb might have evolved novel functions besides exporting miRNAs. Further research is needed to test this hypothesis.
The dsRNA-binding protein HYL1, which contains two dsRNA-binding domains at its N-terminus, is another essential component of miRNA biogenesis. The soybean genome encodes two HYL1 homologs that are 96% similar to each other, whereas sorghum encodes one HYL1 homolog (Table 2). GmHYL1a/b and SbHYL1 have more than 70% sequence identity with AtHYL1 in their N-terminal regions (~220 AA), which contain the two dsRNA-binding domains. However, their C-terminal regions have little or no homology to that of AtHYL1. This is consistent with the fact that the two dsRNA-binding domains of HYL1 are essential and sufficient for its activity in miRNA biogenesis.
RDR is another important component of gene silencing, and it has a conserved RDRP domain. The six RDRs in Arabidopsis can be divided into four families: RDR1, RDR2, RDR3, and RDR6 [45, 46]. The RDR3 family contains three members (RDR3a-c; also known as RDR3, 4, and 5), which share more than 80% similarity with each other. All proteins in soybean and sorghum were scanned for the RDRP domain with HMM, and the candidates were compared with the results from searching all Arabidopsis RDR protein sequences against the soybean and sorghum genomes with TBLASTN. Soybean and sorghum each encode seven RDRs, which can be grouped into four families as in Arabidopsis (Table 3). In soybean, the RDR1, 2, and 6 families each contain two members, and the RDR3 family has a single gene, which is most similar to RDR3b in Arabidopsis. In sorghum, the RDR1, 2, and 3 families each contain one member, and the RDR6 family possesses four. The phylogenetic tree of these RDR genes is shown in Figure 3.
Like other RDRs, these sorghum and soybean RDRs contain a common sequence motif corresponding to the catalytic β’ subunit of DNA-dependent RNA polymerases. The putative catalytic domains of the soybean and sorghum RDR1, 2, and 6 proteins all contain the DLDGD motif, which is highly conserved in other identified RDRs. Like RDRs in other plants, the RDR1, 2, and 6 proteins in soybean and sorghum also have the conserved subsequences CSGS, GSGG, and ASGS before the DLDGD motif. Protein sequence analysis shows that the second position of the DLDGD motif has some variation. As in AtRDR3, the motif sequence in the soybean and sorghum RDR3 proteins is DFDGD. There are two more conserved motifs in all RDR proteins. All RDR1, 2, and 6 sequences, including those of soybean and sorghum, carry a PCLH(P/S)GD(V/I)R motif, while RDR3 has PGLH(F/P)GDIH. The second motif is A(V/L/I)DxPKxG; RDR1, 2, and 6 proteins specifically have the AVD(F/S)(P/A)KTG motif, and RDR3 proteins have A(L/I)DAPKxG. As in other plants, the RDR1, 2, and 6 proteins in soybean and sorghum also have two additional conserved motifs: (A/T)(F/Y)QIRY and ASAWY. Figure 4 shows the combinations of domains in the RDRs of soybean, sorghum, and Arabidopsis.
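The motif patterns listed above translate directly into regular expressions. The following Python sketch is an illustration only, not the procedure used in the study: the function names are hypothetical, the motifs are searched anywhere in the sequence rather than at aligned positions, and requiring every motif of a set to be present is an assumption.

```python
import re

# Motifs reported in the text for the RDR1/2/6 families.
RDR126_MOTIFS = {
    "DLDGD": re.compile(r"DLDGD"),
    "PCLH(P/S)GD(V/I)R": re.compile(r"PCLH[PS]GD[VI]R"),
    "AVD(F/S)(P/A)KTG": re.compile(r"AVD[FS][PA]KTG"),
    "(A/T)(F/Y)QIRY": re.compile(r"[AT][FY]QIRY"),
    "ASAWY": re.compile(r"ASAWY"),
}

# Motifs reported in the text for the RDR3 family ("x" is any residue).
RDR3_MOTIFS = {
    "DFDGD": re.compile(r"DFDGD"),
    "PGLH(F/P)GDIH": re.compile(r"PGLH[FP]GDIH"),
    "A(L/I)DAPKxG": re.compile(r"A[LI]DAPK.G"),
}


def motif_hits(sequence, motifs):
    """Report, for each motif, whether it occurs in the protein sequence."""
    return {name: bool(p.search(sequence)) for name, p in motifs.items()}


def classify_rdr(sequence):
    """Label a sequence by the motif set it carries in full (assumed rule)."""
    sequence = sequence.upper()
    if all(motif_hits(sequence, RDR126_MOTIFS).values()):
        return "RDR1/2/6-like"
    if all(motif_hits(sequence, RDR3_MOTIFS).values()):
        return "RDR3-like"
    return "unclassified"
```

A sequence carrying only some motifs of a set is reported as "unclassified", and `motif_hits` can then show which motifs are missing.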
Soybean and sorghum Pol IV and Pol V
To gain insight into the Pol IV and Pol V complexes, the largest and second largest subunits of Pol IV and Pol V were identified by searching AtNRPD1, AtNRPE1, and AtNRPD2/NRPE2 against the soybean and sorghum genomes with TBLASTN. Soybean encodes two NRPD1, two NRPD2/NRPE2, and two NRPE1 genes, which are named GmNRPD1a/b, GmNRPD2a/NRPE2a, GmNRPD2b/NRPE2b, and GmNRPE1a/b. Sorghum encodes one NRPD1, one NRPD2/NRPE2, and one NRPE1, which are named SbNRPD1, SbNRPD2/NRPE2, and SbNRPE1. All genes are listed in Table 4. GmNRPD1a is 97% similar to GmNRPD1b, and both are around 67% similar to AtNRPD1, whereas GmNRPD2a/NRPE2a and GmNRPD2b/NRPE2b are 98% similar to each other and 80% similar to AtNRPD2/NRPE2. GmNRPE1a and GmNRPE1b share 79% similarity and are 61% similar to AtNRPE1. SbNRPD1, SbNRPD2/NRPE2, and SbNRPE1 show only 51% similarity to their Arabidopsis counterparts. Phylogenetic analysis shows the close evolutionary relationships of NRPD1/NRPE1 to RPB1 and of NRPD2/NRPE2 to RPB2 in both soybean and sorghum, which agrees with the proposed model that NRPD1/NRPE1 and NRPD2/NRPE2 are paralogs of RPB1 and RPB2, respectively. Protein sequence alignment also revealed the presence of conserved catalytic center residues within the NRPD1s and NRPE1s of soybean and sorghum.
AGO proteins often contain four domains: an N-terminal domain of unknown function (Pfam DUF1785), PAZ, MID, and a C-terminal PIWI domain. Proteins in soybean and sorghum with these four domains were identified by HMM analysis, and TBLASTN was used to align the Arabidopsis AGO proteins against the sorghum and soybean genomes for comparison. Fourteen AGOs in sorghum and 21 AGOs in soybean were identified (Table 5). Based on phylogenetic analysis, all AGO proteins can be grouped into three families: AGO1, AGO2, and AGO4. For sorghum, the AGO1 family consists of 10 members: four AGO1s, four AGO5s, one AGO10, and one AGO18, which is named after OsAGO18 because of their high similarity. The AGO2 family has two proteins, AGO2 and AGO7, and the AGO4 family contains two AGO4 proteins. In soybean, 11 AGOs are grouped into the AGO1 family: two cluster to form the AGO1 subfamily, two form the AGO5 subfamily, and seven form the AGO10 subfamily. Among the four soybean AGO proteins in the AGO2 family, two cluster with AGO2/3 and the others are more closely related to AGO7. The two genes in the AGO2/3 subfamily are named GmAGO3 because they are more similar to AtAGO3 than to AtAGO2. The soybean AGO4 family has six members: three AGO4s, two AGO6s, and one AGO9. Like the DCLs, these AGO proteins are named based on their similarity to their Arabidopsis counterparts (Figure 5). In the current genome annotation, GmAGO10g and GmAGO10e were predicted to encode 671 and 729 AA-long proteins, respectively, which lack the C-terminal portions of the PIWI domain. An additional gene annotation procedure was conducted and found that AGO10g and AGO10e may encode two longer proteins of 909 AA and 908 AA, respectively.
The domain combinations of these AGO proteins in Arabidopsis, soybean, and sorghum are shown in Figure 6. The PAZ and MID domains bind the 3′-end and the 5′-phosphate of RNAs, respectively [47, 48]. The PIWI domain has a structure similar to RNase H and is responsible for target mRNA cleavage. All the soybean and sorghum AGO proteins contain these four domains except for SbAGO6b, which does not have the N-terminal DUF1785 domain but possesses two tandem PIWI domains. The active site of a PIWI domain responsible for RNA cleavage often carries a conserved metal-chelating Asp–Asp–His (DDH) motif, which corresponds to D760, D845, and H986 of AtAGO1. Furthermore, a conserved histidine at position 798 of AGO1 in Arabidopsis has been shown to be essential for AGO cleavage activity. Protein sequence alignment of all newly discovered AGOs reveals that 10 soybean AGOs and 11 sorghum AGOs have the conserved DDH/H798 motifs (Table 6). In five soybean AGOs (GmAGO4a, GmAGO4b, GmAGO4c, GmAGO6a, and GmAGO9) and two sorghum AGOs (SbAGO4a and SbAGO4b), only H798 is replaced, by alanine, proline, or serine (Table 6). The histidine residue in the DDH motif is missing in GmAGO5b, GmAGO10g, and GmAGO10e, and is replaced by aspartic acid in GmAGO3a, GmAGO3b, and SbAGO2 (Table 6). GmAGO6b and AGO10g replace the second aspartic acid with alanine or lysine, and AGO10g lacks the third residue, histidine, of the DDH motif (Table 6). Alterations of the catalytic motif in these AGOs indicate that they may not cleave targets. It has been shown that some AGOs with the DDH motif do not have cleavage activity. Thus, it needs to be verified whether the AGOs with DDH motifs in sorghum and soybean have cleavage activity.
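The comparison of catalytic residues described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the residues aligned to AtAGO1 positions D760, H798, D845, and H986 have already been extracted from a multiple sequence alignment, uses "-" for an alignment gap, and the function name and the example residues are hypothetical.

```python
# Residues of AtAGO1 at the catalytic positions named in the text.
EXPECTED = {"D760": "D", "H798": "H", "D845": "D", "H986": "H"}


def altered_positions(residues):
    """Return the AtAGO1-numbered positions whose residue differs.

    `residues` maps each position label to the one-letter residue that
    aligns with it in the query protein; "-" denotes a missing residue.
    An empty result means the DDH/H798 residues are all conserved.
    """
    return [pos for pos, aa in EXPECTED.items() if residues.get(pos, "-") != aa]


def describe(residues):
    """Summarize the catalytic residues of one protein as a short string."""
    changed = altered_positions(residues)
    if not changed:
        return "conserved DDH/H798"
    details = ", ".join(f"{pos}->{residues.get(pos, '-')}" for pos in changed)
    return f"altered ({details})"
```

For a hypothetical protein in which only H798 is replaced by proline, `describe` returns "altered (H798->P)". A conserved motif does not by itself establish cleavage activity, as noted in the text.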
To confirm the expression of these RNA silencing components, we collected RNA-seq data from the Sequence Read Archive (SRA) and analyzed them to obtain gene expression profiles for the newly identified genes. According to the numbers of mapped short reads, most identified genes have many mapped reads in different tissues, and some of them have very large numbers of mapped RNA-seq reads. Figure 7 shows the RNA-seq signals for some of the discovered genes, and detailed results of the RNA-seq data analysis for all genes are shown in Additional file 1: Table S1. To further support the RNA-seq results, we searched the discovered RNA silencing components against the dbEST database and PlantGDB for expressed sequence tags (ESTs). We found ESTs of these genes in different tissues of soybean and sorghum (see Additional file 2: Table S2). To further confirm that these RNA silencing components are indeed expressed in sorghum and soybean, reverse transcription PCR (RT-PCR) was conducted. Reverse transcription was performed with oligo dT primers using RNAs from inflorescences as templates. The resulting cDNA was then subjected to PCR using gene-specific primers. RT-PCR detected the transcripts of these predicted RNA silencing components (see Additional file 3: Figure S1). The RT-PCR results agree with the results of the RNA-seq data analysis.
DCL is an essential component of miRNA and siRNA biogenesis. Although animals encode one DCL for the generation of both miRNAs and siRNAs, plants have evolved four DCL groups. These DCLs have overlapping and diversified functions in miRNA and siRNA biogenesis. Both sorghum and soybean possess four DCL families, which further supports the notion that the expansion of DCL family members in monocots and dicots happened after the divergence of animals and plants. Sorghum has two DCL3 paralogs, DCL3a and DCL3b, which have low similarity to each other, whereas soybean encodes one DCL3. This result is consistent with the hypothesis that the DCL3 paralog in monocots was generated after the divergence of monocots and dicots. OsDCL3a acts in non-canonical long miRNA biogenesis and 24-nt ra-siRNA biogenesis, whereas OsDCL3b functions in phased 24-nt siRNA biogenesis, indicating that the functions of the DCL3 paralogs have diversified. Because of the high similarities of SbDCL3a to OsDCL3a and of SbDCL3b to OsDCL3b, SbDCL3a and SbDCL3b most likely have different functions in the small RNA pathway.
In Arabidopsis, DCL1, SE, TOUGH, and HYL1 form a complex that processes pri-miRNAs in the nucleus to generate miRNA duplexes, which are methylated by HEN1 and exported into the cytoplasm by HST [4–6, 12, 13, 15]. The identification of DCL1, HYL1, SE, HEN1, and HST homologs in sorghum and soybean suggests that miRNA biogenesis in these species is similar to that in Arabidopsis. It is noted that in sorghum, the paralogs of HYL1, SE, HEN1, and HST are less similar to each other, but each has a closely related homolog in rice. This indicates that the duplication may have occurred before the divergence of rice and sorghum about 50–70 million years ago. However, SE has three paralogs in both soybean and sorghum, which is more than the other components have in soybean or sorghum and more than SE has in Arabidopsis. This indicates a selective duplication of SE in soybean and sorghum in addition to whole genome duplication.
RDR is essential for siRNA biogenesis as well. Studies in Arabidopsis, rice, and maize have shown that plants possess four groups of RDRs: RDR1, RDR2, RDR3, and RDR6. RDR2 from Arabidopsis and maize (MOP2) and RDR6 from Arabidopsis and rice are required for ra-siRNA and ta-siRNA biogenesis, respectively [45, 46]. Recently, it was shown that RDR6 acts redundantly with RDR1 in viral-derived siRNA biogenesis. The function of the RDR3 family is currently unknown. RDR1, RDR2, RDR3, and RDR6 homologs are identified in both soybean and sorghum, which further supports the notion that the RDR gene family in plants is derived from a common ancestor.
The putative largest and second largest subunits of Pol IV and Pol V, which are required for ra-siRNA-mediated DNA methylation, were discovered in soybean and sorghum. This agrees with the notion that Pol V and Pol IV are plant-specific polymerases. In maize, the lack of Pol IV and Pol V causes developmental defects, whereas in Arabidopsis, the nrpd and nrpe mutants appear to grow normally. It will be interesting to test whether Pol IV and Pol V are necessary for the development of soybean and sorghum.
AGO is the effector protein of small RNA-mediated silencing. It has been proposed that both plants and animals encode multiple AGOs to meet the diversified functions of small RNA silencing. Like rice, maize, and Arabidopsis, both soybean and sorghum encode three subfamilies of AGO proteins, indicating that small RNA functions are conserved in higher plants. Soybean encodes seven AGO10 paralogs. Among them, GmAGO10a/b/c share high similarity with each other, while GmAGO10d/e/f/g cluster together. The similarity between these two groups of GmAGO10 is relatively low, which indicates that their functions might be different. They might regulate the functions of different miRNAs. In Arabidopsis, AGO10 has been shown to regulate the function of miR166/165 [28, 29].
The identification of these putative RNA silencing components provides insight into the small RNA pathways in soybean and sorghum. However, the exact function and contribution of each individual component of the RNA silencing machinery need to be further examined, because their functions may differ among plant species.
Small RNA-mediated gene silencing is an important mechanism that regulates gene expression and genome stability in plants. The available sorghum and soybean genome information enables the identification of components that may be involved in small RNA-mediated gene silencing in soybean and sorghum [59, 60]. The gene families, including DCL, HEN1, SE, HYL1, HST, RDR, NRPD1, NRPD2/NRPE2, NRPE1, and AGO, were identified in soybean and sorghum. RNA-seq, EST, and RT-PCR analyses confirmed the expression of these candidate genes. In soybean, the similarities among paralogs are very high, which is consistent with the hypothesis that there have been 1–2 rounds of genome duplication in soybean since the separation of homologous sequences between soybean and Arabidopsis approximately 90 million years ago. Based on the knowledge of their counterparts in Arabidopsis, putative functions of these genes were annotated.
Genome sequence data
We collected the soybean (Gmax 189) and sorghum (v1.4) genome sequences from Phytozome (v9.0) (http://www.phytozome.net/), and the Arabidopsis sequences from TAIR (10) (http://www.arabidopsis.org/). The total numbers of genes are 55787, 35386, and 29448 for soybean, sorghum, and Arabidopsis, respectively.
Identification of miRNA components
HMM analysis was used to search for DCL, AGO, and RDR genes encoded in the soybean and sorghum genomes, in addition to searching for homologs of the Arabidopsis genes with TBLASTN. DCL proteins have the DExD-helicase, helicase-C, Duf283, PAZ, RNase III, and double-stranded RNA-binding (dsRB) domains. AGOs have PAZ, MID, and PIWI domains. RDRs have a conserved RDRP domain. The HMM profiles of the domains in the DCL, AGO, and RDR families were obtained from the Pfam database. With the HMM profiles, the corresponding conserved sequences of DCL, AGO, and RDR proteins were extracted by HMMER. These conserved sequences were used to search for all predicted DCL, AGO, and RDR genes. Protein sequences of all candidate genes were also aligned against the Arabidopsis genome with the BLASTP program (cutoff E-value = 0.001). The other genes, HEN1, SE, HYL1, and HST, each of which is a single gene in Arabidopsis, were searched against the soybean and sorghum genomes with the TBLASTN program (cutoff E-value = 0.001) to find candidate genes.
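The domain-based screening described above can be pictured as a set comparison between the domains detected in each protein and the domains expected for each family. The Python sketch below is a simplified illustration, not the authors' implementation: it assumes the HMM hits have already been parsed into plain domain labels per protein, the labels are the names used in the text rather than Pfam accessions, the protein identifiers are hypothetical, and Duf283 is deliberately left out of the DCL requirement because the text reports that AtDCL3 lacks it.

```python
# Domains required for each family (labels follow the text; assumption:
# Duf283 is optional for DCL because AtDCL3 is reported to lack it).
REQUIRED_DOMAINS = {
    "DCL": {"DExD-helicase", "helicase-C", "PAZ", "RNase III", "dsRB"},
    "AGO": {"PAZ", "MID", "PIWI"},
    "RDR": {"RDRP"},
}


def classify_candidates(domains_by_protein):
    """Map each protein to the families whose required domains it carries.

    `domains_by_protein` maps a protein identifier to an iterable of
    domain labels detected in that protein. Proteins that satisfy no
    family are omitted from the result.
    """
    candidates = {}
    for protein_id, domains in domains_by_protein.items():
        found = set(domains)
        families = sorted(
            family
            for family, required in REQUIRED_DOMAINS.items()
            if required <= found  # all required domains are present
        )
        if families:
            candidates[protein_id] = families
    return candidates
```

Candidates selected this way would still need to be cross-checked against the TBLASTN and BLASTP results, as in the text.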
Clustal-W was used for multiple sequence alignments. Phylogenetic analysis was performed with the PhyML and MEGA v5.0 programs by the maximum-likelihood method with 500 bootstrap replicates.
RNA-seq data analysis
RNA-seq data for soybean and sorghum were obtained from SRA (http://www.ncbi.nlm.nih.gov/Traces/sra/). The accession numbers are SRX062333 (floral bud), SRX113962 (cotyledons), and SRX265552 (seeds) for soybean, and SRX080311 (root), SRX080321 (shoot), SRX080322 (shoot), SRX080323 (shoot), SRX099022 (early inflorescence), and SRX099184 (embryo) for sorghum. After preprocessing of the RNA-seq data, the short reads were mapped against the G. max 189 and S. bicolor v1.4 genome sequences using Tophat (v1.3.2), allowing up to two mismatches. The numbers of reads in genes were counted by the HTSeq-count tool (Anders, 2010) with the “union” resolution mode, and they were normalized by scaling the total count of mapped reads to 10 million reads. For each gene, the number of mapped reads per kilobase of exon per million mapped reads (RPKM) is shown as well.
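The two normalizations mentioned above can be written out as a short Python sketch. It uses the standard RPKM definition and the 10-million-read scaling stated in the text; the function names and the numbers in the usage note are illustrative assumptions, not values from the study.

```python
def scale_to_10_million(read_count, total_mapped_reads):
    """Scale a gene's read count to a library of 10 million mapped reads."""
    return read_count * 10_000_000 / total_mapped_reads


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    RPKM = reads / (exon length in kb * mapped reads in millions)
         = reads * 1e9 / (exon length in bp * mapped reads)
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)
```

For example, a gene with 200 reads and 2000 bp of exon in a library of 20 million mapped reads has a scaled count of 100 and an RPKM of 5.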
EST expression analysis
To estimate the expression profiles, all miRNA components were searched against the dbEST database (http://www.ncbi.nlm.nih.gov/dbEST) and PlantGDB (http://www.plantgdb.org) with MEGABLAST (cutoff E-value = 10⁻¹⁰).
Total RNAs from inflorescences of soybean or sorghum were extracted as described in the work of Yu et al. After treatment with DNase I, 5 μg of RNA was reverse transcribed (RT) by the Superscript III reverse transcriptase (Invitrogen) using an oligo-T18 primer at 50°C for 1 hour to generate cDNAs. The resulting cDNAs were used as templates for PCR amplification with the primers listed in Additional file 4: Table S3. PCR was performed for 32 cycles (94°C for 30 seconds, 55°C for 30 seconds, and 72°C for 60 seconds). The amplification of UBIQUITIN 5 (UBQ5) was used as a loading control.
Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004, 116 (2): 281-297. 10.1016/S0092-8674(04)00045-5.
Baulcombe D: RNA silencing in plants. Nature. 2004, 431 (7006): 356-363. 10.1038/nature02874.
Chen X: Small RNAs and their roles in plant development. Annu Rev Cell Dev Biol. 2009, 25: 21-44. 10.1146/annurev.cellbio.042308.113417.
Song L, Han MH, Lesicka J, Fedoroff N: Arabidopsis primary microRNA processing proteins HYL1 and DCL1 define a nuclear body distinct from the Cajal body. Proc Natl Acad Sci U S A. 2007, 104 (13): 5437-5442. 10.1073/pnas.0701061104.
Fang Y, Spector DL: Identification of nuclear dicing bodies containing proteins for microRNA biogenesis in living Arabidopsis plants. Curr Biol. 2007, 17 (9): 818-823. 10.1016/j.cub.2007.04.005.
Fujioka Y, Utsumi M, Ohba Y, Watanabe Y: Location of a possible miRNA processing site in SmD3/SmB nuclear bodies in Arabidopsis. Plant Cell Physiol. 2007, 48 (9): 1243-1253. 10.1093/pcp/pcm099.
Bouche N, Lauressergues D, Gasciolli V, Vaucheret H: An antagonistic function for Arabidopsis DCL2 in development and a new function for DCL4 in generating viral siRNAs. EMBO J. 2006, 25 (14): 3347-3356. 10.1038/sj.emboj.7601217.
Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC: Genetic and functional diversification of small RNA pathways in plants. PLoS Biol. 2004, 2 (5): E104. 10.1371/journal.pbio.0020104.
Xie Z, Allen E, Wilken A, Carrington JC: DICER-LIKE 4 functions in trans-acting small interfering RNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2005, 102 (36): 12984-12989. 10.1073/pnas.0506426102.
Henderson IR, Zhang X, Lu C, Johnson L, Meyers BC, Green PJ, Jacobsen SE: Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nat Genet. 2006, 38 (6): 721-725. 10.1038/ng1804.
Yu B, Bi L, Zheng B, Ji L, Chevalier D, Agarwal M, Ramachandran V, Li W, Lagrange T, Walker JC, et al: The FHA domain proteins DAWDLE in Arabidopsis and SNIP1 in humans act in small RNA biogenesis. Proc Natl Acad Sci U S A. 2008, 105 (29): 10073-10078. 10.1073/pnas.0804218105.
Ren G, Xie M, Dou Y, Zhang S, Zhang C, Yu B: Regulation of miRNA abundance by RNA binding protein TOUGH in Arabidopsis. Proc Natl Acad Sci U S A. 2012, 109 (31): 12817-12821. 10.1073/pnas.1204915109.
Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X: Methylation as a crucial step in plant microRNA biogenesis. Science. 2005, 307 (5711): 932-935. 10.1126/science.1107130.
Li JJ, Yang ZY, Yu B, Liu J, Chen XM: Methylation protects miRNAs and siRNAs from a 3′-end uridylation activity in Arabidopsis. Curr Biol. 2005, 15 (16): 1501-1507. 10.1016/j.cub.2005.07.029.
Park MY, Wu G, Gonzalez-Sulser A, Vaucheret H, Poethig RS: Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A. 2005, 102 (10): 3691-3696. 10.1073/pnas.0405570102.
Li H, Deng Y, Wu T, Subramanian S, Yu O: Misexpression of miR482, miR1512, and miR1515 increases soybean nodulation. Plant Physiol. 2010, 153 (4): 1759-1770. 10.1104/pp.110.156950.
Song QX, Liu YF, Hu XY, Zhang WK, Ma B, Chen SY, Zhang JS: Identification of miRNAs and their target genes in developing soybean seeds by deep sequencing. BMC Plant Biol. 2011, 11: 5. 10.1186/1471-2229-11-5.
Yoshikawa M, Peragine A, Park MY, Poethig RS: A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev. 2005, 19 (18): 2164-2175. 10.1101/gad.1352605.
Herr AJ, Jensen MB, Dalmay T, Baulcombe DC: RNA polymerase IV directs silencing of endogenous DNA. Science. 2005, 308 (5718): 118-120. 10.1126/science.1106910.
Kanno T, Huettel B, Mette MF, Aufsatz W, Jaligot E, Daxinger L, Kreil DP, Matzke M, Matzke AJ: Atypical RNA polymerase subunits required for RNA-directed DNA methylation. Nat Genet. 2005, 37 (7): 761-765. 10.1038/ng1580.
Onodera Y, Haag JR, Ream T, Nunes PC, Pontes O, Pikaard CS: Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell. 2005, 120 (5): 613-622. 10.1016/j.cell.2005.02.007.
Ream TS, Haag JR, Wierzbicki AT, Nicora CD, Norbeck AD, Zhu JK, Hagen G, Guilfoyle TJ, Pasa-Tolic L, Pikaard CS: Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II. Mol Cell. 2009, 33 (2): 192-203. 10.1016/j.molcel.2008.12.015.
Mosher RA, Schwach F, Studholme D, Baulcombe DC: PolIVb influences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proc Natl Acad Sci U S A. 2008, 105 (8): 3145-3150. 10.1073/pnas.0709632105.
Pontier D, Yahubyan G, Vega D, Bulski A, Saez-Vasquez J, Hakimi MA, Lerbs-Mache S, Colot V, Lagrange T: Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA polymerases IV in Arabidopsis. Genes Dev. 2005, 19 (17): 2030-2040. 10.1101/gad.348405.
Huang L, Jones AM, Searle I, Patel K, Vogler H, Hubner NC, Baulcombe DC: An atypical RNA polymerase involved in RNA silencing shares small subunits with RNA polymerase II. Nat Struct Mol Biol. 2009, 16 (1): 91-93. 10.1038/nsmb.1539.
Vaucheret H: Plant ARGONAUTES. Trends Plant Sci. 2008, 13 (7): 350-358. 10.1016/j.tplants.2008.04.007.
Yu B, Wang H: Translational inhibition by microRNAs in plants. Prog Mol Subcell Biol. 2010, 50: 41-57. 10.1007/978-3-642-03103-8_3.
Zhu H, Hu F, Wang R, Zhou X, Sze SH, Liou LW, Barefoot A, Dickman M, Zhang X: Arabidopsis Argonaute10 specifically sequesters miR166/165 to regulate shoot apical meristem development. Cell. 2011, 145 (2): 242-256. 10.1016/j.cell.2011.03.024.
Ji L, Liu X, Yan J, Wang W, Yumul RE, Kim YJ, Dinh TT, Liu J, Cui X, Zheng B, et al: ARGONAUTE10 and ARGONAUTE1 regulate the termination of floral stem cells through two microRNAs in Arabidopsis. PLoS Genet. 2011, 7 (3): e1001358. 10.1371/journal.pgen.1001358.
Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE, Alexander AL, Chapman EJ, Fahlgren N, Allen E, Carrington JC: Specificity of ARGONAUTE7-miR390 interaction and dual functionality in TAS3 trans-acting siRNA formation. Cell. 2008, 133 (1): 128-141. 10.1016/j.cell.2008.02.033.
Havecker ER, Wallbridge LM, Hardcastle TJ, Bush MS, Kelly KA, Dunn RM, Schwach F, Doonan JH, Baulcombe DC: The Arabidopsis RNA-directed DNA methylation argonautes functionally diverge based on their expression and interaction with target loci. Plant Cell. 2010, 22 (2): 321-334. 10.1105/tpc.109.072199.
Zheng X, Zhu J, Kapoor A, Zhu JK: Role of Arabidopsis AGO6 in siRNA accumulation, DNA methylation and transcriptional gene silencing. EMBO J. 2007, 26 (6): 1691-1701. 10.1038/sj.emboj.7601603.
Zilberman D, Cao X, Jacobsen SE: ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science. 2003, 299 (5607): 716-719. 10.1126/science.1079695.
Wierzbicki AT, Haag JR, Pikaard CS: Noncoding transcription by RNA polymerase Pol IVb/Pol V mediates transcriptional silencing of overlapping and adjacent genes. Cell. 2008, 135 (4): 635-648. 10.1016/j.cell.2008.09.035.
Wierzbicki AT, Ream TS, Haag JR, Pikaard CS: RNA polymerase V transcription guides ARGONAUTE4 to chromatin. Nat Genet. 2009, 41 (5): 630-634. 10.1038/ng.365.
El-Shami M, Pontier D, Lahmy S, Braun L, Picart C, Vega D, Hakimi MA, Jacobsen SE, Cooke R, Lagrange T: Reiterated WG/GW motifs form functionally and evolutionarily conserved ARGONAUTE-binding platforms in RNAi-related components. Genes Dev. 2007, 21 (20): 2539-2544. 10.1101/gad.451207.
Qian Y, Cheng Y, Cheng X, Jiang H, Zhu S, Cheng B: Identification and characterization of Dicer-like, Argonaute and RNA-dependent RNA polymerase gene families in maize. Plant Cell Rep. 2011, 30 (7): 1347-1363. 10.1007/s00299-011-1046-6.
Kapoor M, Arora R, Lama T, Nijhawan A, Khurana JP, Tyagi AK, Kapoor S: Genome-wide identification, organization and phylogenetic analysis of Dicer-like, Argonaute and RNA-dependent RNA Polymerase gene families and their expression analysis during reproductive development and stress in rice. BMC Genomics. 2008, 9: 451. 10.1186/1471-2164-9-451.
Hammond SM: Dicing and slicing: the core machinery of the RNA interference pathway. FEBS Lett. 2005, 579 (26): 5822-5829. 10.1016/j.febslet.2005.08.079.
Qin H, Chen F, Huan X, Machida S, Song J, Yuan YA: Structure of the Arabidopsis thaliana DCL4 DUF283 domain reveals a noncanonical double-stranded RNA-binding fold for protein-protein interaction. RNA. 2010, 16 (3): 474-481. 10.1261/rna.1965310.
Huang Y, Ji L, Huang Q, Vassylyev DG, Chen X, Ma JB: Structural insights into mechanisms of the small RNA methyltransferase HEN1. Nature. 2009, 461 (7265): 823-827. 10.1038/nature08433.
Machida S, Chen HY, Adam Yuan Y: Molecular insights into miRNA processing by Arabidopsis thaliana SERRATE. Nucleic Acids Res. 2011, 39 (17): 7828-7836. 10.1093/nar/gkr428.
Vazquez F, Gasciolli V, Crete P, Vaucheret H: The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol. 2004, 14 (4): 346-351.
Yang SW, Chen HY, Yang J, Machida S, Chua NH, Yuan YA: Structure of Arabidopsis HYPONASTIC LEAVES1 and its molecular implications for miRNA processing. Structure. 2010, 18 (5): 594-605. 10.1016/j.str.2010.02.006.
Wassenegger M, Krczal G: Nomenclature and functions of RNA-directed RNA polymerases. Trends Plant Sci. 2006, 11 (3): 142-151. 10.1016/j.tplants.2006.01.003.
Zong J, Yao X, Yin J, Zhang D, Ma H: Evolution of the RNA-dependent RNA polymerase (RdRP) genes: duplications and possible losses before and after the divergence of major eukaryotic groups. Gene. 2009, 447 (1): 29-39. 10.1016/j.gene.2009.07.004.
Parker JS, Roe SM, Barford D: Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature. 2005, 434 (7033): 663-666. 10.1038/nature03462.
Ma JB, Yuan YR, Meister G, Pei Y, Tuschl T, Patel DJ: Structural basis for 5′-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature. 2005, 434 (7033): 666-670.
Liu J, Carmell MA, Rivas FV, Marsden CG, Thomson JM, Song JJ, Hammond SM, Joshua-Tor L, Hannon GJ: Argonaute2 is the catalytic engine of mammalian RNAi. Science. 2004, 305 (5689): 1437-1441. 10.1126/science.1102513.
Baumberger N, Baulcombe DC: Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc Natl Acad Sci U S A. 2005, 102 (33): 11928-11933. 10.1073/pnas.0505461102.
Tolia NH, Joshua-Tor L: Slicer and the argonautes. Nat Chem Biol. 2007, 3 (1): 36-43. 10.1038/nchembio848.
Boguski MS, Lowe TM, Tolstoshev CM: dbEST–database for “expressed sequence tags”. Nat Genet. 1993, 4 (4): 332-333. 10.1038/ng0893-332.
Schlueter SD, Dong Q, Brendel V: GeneSeqer@PlantGDB: Gene structure prediction in plant genomes. Nucleic Acids Res. 2003, 31 (13): 3597-3600. 10.1093/nar/gkg533.
Margis R, Fusaro AF, Smith NA, Curtin SJ, Watson JM, Finnegan EJ, Waterhouse PM: The evolution and diversification of Dicers in plants. FEBS Lett. 2006, 580 (10): 2442-2450. 10.1016/j.febslet.2006.03.072.
Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, Olson T, Young N, Concibido V, Wilcox J, Tamulonis JP, et al: Genome duplication in soybean (Glycine subgenus soja). Genetics. 1996, 144 (1): 329-338.
Song X, Li P, Zhai J, Zhou M, Ma L, Liu B, Jeong DH, Nakano M, Cao S, Liu C, et al: Roles of DCL4 and DCL3b in rice phased small RNA biogenesis. Plant J. 2012, 69 (3): 462-474. 10.1111/j.1365-313X.2011.04805.x.
Wang XB, Wu Q, Ito T, Cillo F, Li WX, Chen X, Yu JL, Ding SW: RNAi-mediated viral immunity requires amplification of virus-derived siRNAs in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2010, 107 (1): 484-489. 10.1073/pnas.0904086107.
Erhard KF, Stonaker JL, Parkinson SE, Lim JP, Hale CJ, Hollick JB: RNA polymerase IV functions in paramutation in Zea mays. Science. 2009, 323 (5918): 1201-1205. 10.1126/science.1164508.
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, et al: The B73 maize genome: complexity, diversity, and dynamics. Science. 2009, 326 (5956): 1112-1115. 10.1126/science.1178534.
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457 (7229): 551-556. 10.1038/nature07723.
Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.
HTSeq: analysing high-throughput sequencing data with python. [http://www-huber.embl.de/users/anders/HTSeq/]
Yu B, Bi L, Zhai J, Agarwal M, Li S, Wu Q, Ding SW, Meyers BC, Vaucheret H, Chen X: siRNAs compete with miRNAs for methylation by HEN1 in Arabidopsis. Nucleic Acids Res. 2010, 38 (17): 5844-5850. 10.1093/nar/gkq348.
This work was supported by CZ’s startup funds from the University of Nebraska, Lincoln, NE, USA, and by BY’s grant from the Nebraska Soybean Board, NE, USA.
The authors declare that they have no competing interests.
XL, BY, and CZ designed the experiment. XL conducted the experiment. TL, YD, and CZ conducted the bioinformatic study and data analysis. YD conducted the RNA-seq data analysis. CZ and BY supervised the whole project and drafted the manuscript. All authors read and approved the final manuscript.
Xiang Liu and Tao Lu contributed equally to this work.
Electronic supplementary material
Additional file 1: Table S1: Raw and normalized numbers of mapped RNA-seq reads for all discovered genes in different tissues of soybean and sorghum. RNA-seq data were obtained from SRA, and TopHat was used to map short reads to the soybean and sorghum genome sequences. The number of reads aligned to a gene reflects that gene's expression level in the given tissue. (XLS 52 KB)
Additional file 2: Table S2: Numbers of ESTs for all discovered genes in soybean and sorghum. ESTs from dbEST and PlantGDB, including PlantGDB-assembled unique transcripts (PUTs), were mapped to all identified genes with MEGABLAST. The number of aligned ESTs is related to gene expression, and an “X” for a tissue means that ESTs were identified in that tissue. (XLS 68 KB)
Additional file 3: Figure S1: Detection of predicted genes involved in the RNA silencing pathway in soybean (A) and sorghum (B). The transcripts are detected by RT-PCR. Amplification of UBIQUITIN5 (UBQ5) with or without RT (-RT) is shown as a control. (TIFF 1010 KB)
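The raw and normalized read counts described for Additional file 1 can be illustrated with a minimal sketch. The normalization used for Table S1 is not specified in this section, so the example below assumes a simple reads-per-million scaling. The function name, gene IDs, and all numbers are hypothetical and are not taken from the supplementary data.

```python
def reads_per_million(raw_counts, total_mapped_reads):
    """Scale raw per-gene read counts by library size.

    raw_counts: dict mapping a gene ID to the number of reads aligned to
        that gene in one tissue.
    total_mapped_reads: total number of mapped reads in that tissue's library.
    Returns a dict mapping each gene ID to reads per million mapped reads.

    Assumption: reads-per-million is used here only as an illustration;
    it may differ from the normalization applied in Table S1.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return {gene: count * scale for gene, count in raw_counts.items()}


# Hypothetical raw counts for three genes in one tissue (made-up values).
leaf_counts = {"geneA": 150, "geneB": 50, "geneC": 800}
leaf_rpm = reads_per_million(leaf_counts, total_mapped_reads=2_000_000)
```

Scaling by library size makes counts comparable across tissues that were sequenced to different depths. It does not correct for gene length, which matters when comparing different genes within one tissue.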
Cite this article
Liu, X., Lu, T., Dou, Y. et al. Identification of RNA silencing components in soybean and sorghum. BMC Bioinformatics 15, 4 (2014). https://doi.org/10.1186/1471-2105-15-4
- miRNA Biogenesis
- Sorghum Genome
- PIWI Domain
- dsRNA Binding Protein
- siRNA Biogenesis