In-silico and in-vivo analyses of EST databases unveil conserved miRNAs from Carthamus tinctorius and Cynara cardunculus
© Catalano et al.; licensee BioMed Central Ltd. 2012
Published: 28 March 2012
Skip to main content
© Catalano et al.; licensee BioMed Central Ltd. 2012
Published: 28 March 2012
MicroRNAs (miRNAs) are small RNAs (21-24 bp) providing an RNA-based system of gene regulation highly conserved in plants and animals. In plants, miRNAs control mRNA degradation or restrain translation, affecting development and responses to stresses. Plant miRNAs show imperfect but extensive complementarity to mRNA targets, making their computational prediction possible, useful when data mining is applied on different species. In this study we used a comparative approach to identify both miRNAs and their targets, in artichoke and safflower.
Two complete expressed sequence tags (ESTs) datasets from artichoke (3.6·104 entries) and safflower (4.2·104), were analysed with a bioinformatic pipeline and in vitro experiments, identifying 17 potential miRNAs. For each EST, using RNAhybrid program and 953 non redundant miRNA mature sequences, available in mirBase as reference, we searched matching putative targets. 8730 out of 42011 ESTs from safflower and 7145 of 36323 ESTs from artichoke showed at least one predicted miRNA target. BLAST analysis showed that 75% of all ESTs shared at least a common homologous region (E-value < 10-4) and about 50% of these displayed 400 bp or longer aligned sequences as conserved homologous/orthologous (COS) regions. 960 and 890 ESTs of safflower and artichoke organized in COS shared 79 different miRNA targets, considered functionally conserved, and statistically significant when compared with random sequences (signal to noise ratio > 2 and specificity ≥ 0.85). Four highly significant miRNAs selected from in silico data were experimentally validated in globe artichoke leaves.
Mature miRNAs and targets were predicted within EST sequences of safflower and artichoke. Most of the miRNA targets appeared highly/moderately conserved, highlighting an important and conserved function. In this study we introduce a stringent parameter for the comparative sequence analysis, represented by the identification of the same target in the COS region. After statistical analysis 79 targets, found on the COS regions and belonging to 60 miRNA families, have a signal to noise ratio > 2, with ≥ 0.85 specificity. The putative miRNAs identified belong to 55 dicotyledon plants and to 24 families only in monocotyledon.
The family Asteraceae represents one of the largest evolutive radiations of flowering plants, including more than 1.5·103 genera and 2.3·104 species, comprising economically important as well as ornamental crops [1, 2]. Among members of this dicotyledon family two crop species belonging to the Cardueae tribe, Carthamus tinctorius L. and Cynara cardunculus var. scolymus L., also hold a phyto-pharmaceutical interest. The former, known as safflower, is the only member of this genus widely cultivated for industrial oil, as a livestock feed or for use in traditional medicine . The second species, the globe artichoke, apart from its importance as food, is popular for its dietary and therapeutic potentials, especially for hepato-biliary dysfunctions and digestive complaints [4, 5].
Significant progress has been achieved in recent years on the mechanisms of gene regulation and expression in plant and animals [6, 7] and several studies already elucidated gene regulation in the model organisms Caenorhabditis elegans and Arabidopsis thaliana [8, 9]. Few data are available thus far on gene expression  or regulation based on small regulatory elements, on a wide scale, for C. cardunculus and C. tinctorius.
A class of tiny RNA molecules, lin-4 and let-7, both controlling the timing of juvenile development in C. elegans were the founding members of non-coding RNAs called microRNAs (miRNAs) [11–13]. Plant miRNAs came to light from A. thaliana studies . Their discovery broadened the phylogenetic distribution of miRNAs to plant genomes and highlighted their ancient origin and the important role played . Recently, a significant advancement was achieved through the discovery that small RNA molecules were not only active within the cell, but also as mediators at the cell-to-cell communication level . Plant miRNAs derive from long primary transcripts (pri-miRNA) giving rise to mature RNAs products of 21-24 bp, fundamental in gene regulation . In plants, miRNAs control the degradation of messengers or restrain translation, affecting development and response to biotic and abiotic stresses .
The biogenesis of miRNAs in plant is a multi step process: initially a miRNA gene is transcribed in the nucleus (pri-miRNA) by means of POLII enzyme [16, 18]. Subsequently, the pri-miRNA is cleaved in the miRNA precursor or "pre-miRNA". In this step the precursor is organized in a typical stem loop structure. A Dicer-like enzyme is involved in the plant miRNA maturation, functioning in concert with the dsRNA binding protein (dsRBP) [18, 19]. After this step, the processed miRNA is transported to the cytoplasm by the HASTY protein  to build a complex, termed "RNA-induced silencing complex (RISC)", with ARGONAUTE (AGO) proteins. The RISC complex guide has a functional activity of cleavage or translational repression of its cognate mRNA target by base-paring. In plants, miRNAs have several roles involved in most biologic processes  as well as in diverse signalling networks, like leaf [22, 23] and flower development [24, 25].
BLASTx analysis showed at least three miRNAs within coding regions, (E-values range: 1e-26 to 8e-127), corresponding to miR168a and miR834 in C. tinctorius, and to miR390 and miR834 in Cynara. The latter was detected within EMBL:GE610451, that is homologous to the BURP domain-containing proteins (E-value: 6e-24), identified in many plants, putatively involved in a variety of vegetative and reproductive developmental functions, as well as in responses to environmental stresses . MiR390, identified in EMBL:GE603045, showed a high similarity level with chaperonin family Tcp-1/cpn60 , (E-value: 8e-127). All sequences of precursors of the above mentioned pre-miRNAs, fold into a perfect hairpin structure with low free energy levels (additional file 1).
List of primer sequences for RT-PCR analysis for cloning and sequencing microRNAs from Cynara cardunculus leaves
Poly(T) adapter reverse*
Identification of homologous regions between C. cardunculus and C. tinctorius, and the conservation of identical putative targets in both analysed sequences, provided a guide for specific primer design to further assess C. cardunculus miRNAs in vivo. This approach allowed the experimental validation of 4 out of 10 selected miRNAs, using as template RNAs from young C. cardunculus leaves. However, to be able to capture all expressed miRNAs and considering that only young leaves were assayed, it appeared necessary to test various phenological stages or plant developmental phases, in order to get a more complete scenario of the miRNA expression patterns.
However, this study revealed that some sequences considered specific only for Arabidopsis are also present in Carduoideae (i.e. miR833-5p). All analyses carried out in this study highlighted and confirmed the broad conservation and homology of plant mature miRNAs. Results herein shown strengthen the hypothesis that miRNAs play a fundamental role at the cellular level, and that they are nearly always conserved in related organisms . In particular, their conservation suggests a feedback mechanism of genome stabilization and protection, through selective pressures active at the gene expression level. Mutations in miRNA sequences have in fact a very low probability of being mirrored by complementary changes occurring at their target loci or vice versa, thus setting in place a simple but efficient mechanism of control . However, as for orphan genes, during evolution some structures persist even if their function was lost. Since A. thaliana and Carduoideae do not share the same evolutionary line, caution is needed when inferring the functionality of the miRNA-target links observed.
RNAhybrid results showed that 7145 and 8730 ESTs in C. tinctorius and C. cardunculus, respectively, have at least one predicted miRNA target (additional files 2, 3). The selection parameters used in the analysis (see methods) were: maximum one loop, one mismatch, and overhangs not longer than 2 nucleotides and mfe cut off value > 70% of the perfect match obtained by RNAHybrid, and calculated for each miRNA mature sequence  (additional file 4). Data also showed a pool of 1443 ESTs, 990 for C. scolymus and 453 for C. tinctorius, in which repeated (identical or different) targets occur in the same EST. The prediction of multiple targets within the same EST in several positions, suggests the possibility of evolutive mechanisms related to co-regulation or amplification. Data indicate that, in Asteraceae, several types of miRNAs could be involved in this mechanism, in a way similar to the multiple and different targets on the same mRNAs proposed in Arabidopsis, i.e. miR400 and miR161 that target Tair:At1g06580, Tair:At1g62720 and Tair:At1g62670 mRNAs .
The comparative approach herein followed allowed the identification of conserved homologous/orthologous sequences (COS regions) in EST datasets of C. tinctorius and C. cardunculus, pointing out the unique features (shared sequences), present in this phylogenetic group. A systematic search for aligned regions between C. tinctorius and C. cardunculus ESTs, through BLAST analyses, resulted in the identification of 233891 homologous sequences. We inferred that about 75% of C. tinctorius and C. cardunculus ESTs shared at least one homologous region (E-value < 10-4). The aligned regions longer than 400 bp (E-value < 10-10 and identity > 75%), were considered as significant (additional file 5). They covered about 50% of the complete EST dataset (16921 sequences for C. cardunculus and 19582 for C. tinctorius) and were considered to represent sequences that code for proteins with similar functions in these two species.
About 8000 C. cardunculus and 9900 C. tinctorius EST sequences did not show a common region, probably due to incomplete transcriptome sequencing data.
Analyses carried out in COS regions resulted in the identification of miRNA targets, conserved in C. cardunculus and C. tinctorius, obtained after aligning of 960 and 890 ESTs of Cynara and Carthamus respectively. Following a statistical evaluation of the predicted targets, a pool of 79 (additional file 6) different mature miRNA targets with a signal to noise ratio > 2 and specificity > 0.85 was identified.
The complete EST sequences analyses, including target and relative functions, are reported in additional file 7. ESTs EMBL:GE592822 and EMBL:EL403073 (C. scolymus and C. tinctorius, respectively) showed a 600 bp homologous region in which a miR157a/b/c target was present. Members of this family are ubiquitous within plants, as they were also found in Fabaceae, Vitaceae, Solanaceae and now in Cardueae (this study). BLASTx analysis showed a significant similarity of these accessions to Arabidopsis cyclophillin [Refseq:NP_200679], an important protein involved in variety of cellular processes in plants, including protein trafficking and maturation, receptor signalling and complex stabilization [40, 41]. A target for miR397a was predicted on a COS region [EMBL:GE583552 and EMBL:EL386866 for C. cardunculus and C. tinctorius, respectively], similar to laccase [Refseq:NP_187533], a multi-copper containing glycoprotein, member of a multigene family present in plants as well as other organisms . Furthermore, miR398 was predicted in COS region with specificity > 0.85; comparison analysis of these ESTs [EMBL:GE585699, EMBL:GE592094, EMBL:GE589103, EMBL:GE602849] against the Arabidopsis proteome showed significant similarity to copper/zinc superoxide dismutase (Cu,Zn-SOD) [Refseq:NP_56566, Refseq:NP_001077494]. In Arabidopsis, down-regulation of Cu,Zn-SOD expression involves miR398, that directs the degradation of its mRNA at Cu++ low concentrations . Three additional miRNA families, miR397 (found in this study), miR408 and miR857 were predicted in Arabidopsis and other plants to target transcripts for the Cu++ protein plastocyanin and members of the laccase family, as shown by their accumulation in response to low Cu++ concentrations, demonstrating a microRNA-mediated down regulation . Furthermore, the four miRNAs above mentioned are known to perform important functions in Cu++ homeostasis within the plant cells . Moreover, other BLASTx analyses against the Arabidopsis proteome showed similarity of C. cardunculus with a nucleotide binding protein and to another protein with unknown function (additional file 7).
In this study, potential targets for miR390 were found within 12 ESTs of C. cardunculus, whereas only 6 were conserved in C. tinctorius COS regions. In Arabidopsis, mature ath-miR390a is obtained by two precursors [miRBase:MI0001000 and miRBase:MI0001001]. The mature sequence of ath-miR390a is ubiquitous in plants, since it is shared with other 24 species of Eudicotyledons, Monocotyledons as well as Physcomitrella patens (Embryophyta). A homologous sequence is also known in Coniferophyta (Pinus taeda) with a G→U substitution in position 11 . Ath-miR390 drives the cleavage of the non-protein-coding primary transcripts of trans-acting (ta) genes directing the formation of small-interfering ta-siRNAs. These miRNAs are involved in the cleavage of AGO proteins, with RNaseH-like activity cleaving ta-siRNAs single-stranded RNA transcripts in the region complementary to small RNAs . Two miR390 target sites (5' and 3' ta-siRNA sites) were shown to be necessary in Arabidopsis for ta-siRNAs 3 precursor RNA cleavage, dependent on a specific interaction between AGO7 and miR390, suggesting a similar putative role of miR390 in the Cardueae (additional file 7).
Within the C. cardunculus transcriptome, this pool of miRNAs conserved in COS regions was considered trustable for subsequent in vivo validation, since their identification was experimentally more probable than the vertical prevision carried out by RNAhybrid in each of the two datasets, separately.
Eight C. cardunculus EST sequences (EMBL:GE577936, EMBL:GE578910, EMBL:GE582576, EMBL:GE586807, EMBL:GE606316, EMBL:GE607689, EMBL:GE608348, EMBL:GE610548) were found as possible targets of ath-miR400 (signal to noise ratio: 26, specificity: 0.96). So far, ath-miR400 was considered specific in Arabidopsis [miRBase:MIMAT0001001 and TAIR:AT1G32582]. BLASTx analysis of these ESTs showed a significant similarity with an Arabidopsis 26 proteasome subunit-like protein [Uniprot:Q8RWF0]. In Arabidopsis, miR400 is predicted to target more than 10 genes of the pentatricopeptide repeat (PPR) protein family, characterized by tandem repeats of a degenerate 35 amino acid motif. Some proteins play a role in post-transcriptional processes within organelles, current evidence suggesting that PPR proteins bind RNA as well as other proteins [38, 48, 49].
The confirmed role played by miRNAs in both physiological and pathological processes make them an interesting object of study . Herein we followed a comparative transcriptome in silico approach that let us mining for putative miRNAs and their possible targets in globe artichoke and safflower. Using this procedure we identified 17 miRNAs in both species and after statistical analysis 79 conserved targets, found on the COS regions and belonging to 60 miRNA families, with a signal to noise ratio > 2, and ≥ 0.85 specificity, allowing for their in vivo functional analyses.
The identification of 17 ESTs codifying for miRNA precursors in Asteraceae family, showed that the rate of precursors retrieve is estimated at 0.021%, alike to the real rate found in Arabidopsis (see material and methods), highlighting the validity of the developed method; moreover catm0011, a novel miRNA predicted from sufflower , let us to retrieve a EST that codified the entire precursor (additional file 1). The complementarity between miRNAs, targets and sequence conservation in Cynara and Carthamus helped us predict regulated genes, as well. The bioinformatic approach identified ESTs whose sequences were maintained in both species as most probable targets. The occurrence and mining of COS regions from both species validated the analysed targets, suggesting that many true positives were also possibly missed. In fact, any real target identified in one species was ignored if the counterpart sequence was not found in the other species, a problem occurring when the ESTs in public databases do not represent the whole transcriptome. The presence of multiple targets within a gene agrees with the concept of cooperatively acting miRNAs. Finally, a possible outcome of our approach would be the development of bioinformatic tools detecting multi-species COS regions using publicly available ESTs, altogether with an evaluation of their level of conservation. In fact, recent data concerning miRNAs identified in safflower  confirmed the validity of the in-silico approach herein applied, since 52% of the miRNA groups identified in this study were also present in the deep sequencing data produced from different organs of this plant.
A computational approach to identify potential miRNA precursors was carried out in the EST db of C. cardunculus. Einverted program of the EMBOSS package  was employed to search in the Cynara dataset the inverted repeats longer than 30 bp. The results acquired were included in the database and used in the next step by RNAHybrid analysis, searching the putative mature miRNAs and miRNA* strand. The ESTs derived after the analysis were stored in a dedicated board in the database and used for the extraction of 200 bp region around the putative miRNA targets present in the inverted repeat. The 200 bp regions further used in the RNAfold analyses and the mfe-estimate and hairpin structure prediction have been compared with orthologous precursors from the following species Arabidopsis, Arachis hypogaea, Solanum lycopersicum, Medicago truncatula, Populus trichocarpa, Gossypium hirsutum, Zea mays, Oryza sativa, Vitis vinifera and Citrus. Furthermore we tested method accuracy on Arabidopsis EST dataset, by a BLAST analysis, using as targets 1.065.779 ESTs, and as query the Arabidopsis precursors. The analyses were conducted under stringent conditions, that consider a 90% identity, and pre-micro sequence alignment coverage ≥ 70%, in order to limit the false positive arose from the possible targets. The analysis identified 221 EST sequences, which represent a rate of 0.020%. On the same Arabidopsis EST dataset, our bioinformatic approach recognised 196 out of 221, corresponding to 0.018% rate of EST sequences present. (See also additional files 8 and 9).
The BLAST program was used to identify homologous regions within the ESTs of the species under study. In the BLASTn (e-value e10-4) analysis we considered C. cardunculus ESTs as query and C. tinctorius ESTs as target. The results obtained were stored in the relational database and used by MySQL for the extraction of COS regions longer than 400 bp (we considered 400 pb a significant value of an aligned region comparable to mean EST length in these two datasets), e-value < 10-10 and identity higher than 75%.
The RNAhybrid program was used to identify possible binding sites present in the ESTs, following specific base pairing rules. RNAhybrid finds the mfe hybridisation of two RNAs fragments with different lengths, i.e. long (ESTs) and short (mature miRNA sequences), respectively [27, 28].
The Arabidopsis mature miRNA sequences were extracted from miRNA.dat file available at miRbase release 17, 2011  including all published miRNA data in EMBL format, by means of an in house developed Perl and MySQL script. The miRNA.dat file contains 1581 miRNA hairpins precursor for the species analysed, from which we obtained 1803 mature miRNA sequences (in some cases the precursor give rises to more than one mature sequences). After comparative analysis of the mature miRNA, considering a perfect match, 953 non redundant mature miRNAs were identified. The two EST datasets were used as target sequences in forward and reverse mode, considering Arabidopsis mature miRNA sequences as query. The parameters used in the analysis were: maximum loop size: 1 nucleotide; maximum mismatch size: 1 nucleotide, overhangs: 2 nucleotides, the mfe considered for each target was higher than 70% of the minimum value assessable on perfect match between the mature miRNA and its target.
Where (TP + FP) is the number of all predicted miRNA:target relationships on authentic miRNAs versus authentic mRNAs, and FP is obtained from the randomized data.
Seedlings of globe artichoke (C. cardunculus var. scolymus L.) F1 hybrid (Harmony green variety, Nunhems, NL), were grown in a glasshouse at 22 ± 2°C, with a photoperiod of 16/8 h (light/dark). All plants were subjected to the same general care, including fertilization and watering regimes.
Total RNA was extracted from leaf (cotyledon and true leaves) of 3-weeks old globe artichoke plants, using Trizol reagent (Invitrogen, UK) followed by RNase-free DNase digestion (RNase-free DNase set and RNeasy Mini Kit, Qiagen GmbH, Germany). Samples concentration was determined by a spectrophotometer (NanoDrop©, Thermo Scientific, USA). Polyadenylation was performed as follows: 2 μg of total RNA were added to 25 μl of a reaction mixture in presence of 2.5 mM MnCl2, 80 μM ATP, 1× reaction buffer and 1 U of Escherichia coli Poly(A) Polymerase from Ambion's Poly(A) Tailing kit (Applied Biosystems, US). Subsequently, the reaction mixture was incubated for 15 min at 37°C to add a short adenine tail to the non-poly(A) low molecular weight RNA molecules . An aliquot of 4 μl (320 ng) of the polyadenylated RNA was reverse transcribed with a Poly(T) adapter in presence of High Capacity Reverse Transcription Reagents (Applied Biosystems, USA), performed at 25°C for 10 min followed by 50 min at 42°C, and finally 1 min at 85°C.
Amplification of miRNAs was carried out from 1 μl of a 1:10 dilution of the reverse transcribed reaction with a common reverse primer, Poly(T) adapter reverse, homologous to 23 nt in the Poly(T) adapter sequence, while the forward primers were chosen according to homologous sequences from miRNAs selected through the in silico analysis. Ten miRNAs were selected representing an assortment of Arabidopsis miRNA families that showed conservation in globe artichoke and safflower. EST databases with a specificity value higher than 65%. The primers design considered introduction of degenerated nucleotides to match multiple nucleotides present in the same positions in the three sequences considered. A 6-nt sequence was also included at the 5'- terminus to stabilize the amplification reaction . The reaction was carried out in a volume of 25 μl containing 200 nM primers (see table 1) dNTP, buffer. Cycling profile were 95°C for 2 min, followed by 40 cycles of 10 s at 95°C, 20 s at 55°C and 20 s at 72°C. Amplified products were cloned into pGEM-T vector and used to transform competent Escherichia coli DH5α cells. Plasmid containing inserts were used for DNA automated sequencing services (Macrogen, South Korea).
RNA-induced silencing complex
dsRNA binding protein
Expressed sequenced tags
Conserved homologous/orthologous sequences
minimum free energy
next generation sequencing
Zn-SOD: copper/zinc superoxide dismutase.
This work was partially supported by a dedicated grant from the Italian Ministry of Economy and Finance to the National Research Council. Project: "Innovazione e Sviluppo del Mezzogiorno - Conoscenze Integrate per Sostenibilità ed Innovazione del Made in Italy Agroalimentare - Legge n. 191/2009. The authors thank F. Cillo for kindly commenting the manuscript and the anonymous referees for their thorough review and highly valuable comments and suggestions, which significantly contributed to improving the quality of the paper.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 4, 2012: Italian Society of Bioinformatics (BITS): Annual Meeting 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S4.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.