VIRGO: visualization of A-to-I RNA editing sites in genomic sequences
© Distefano et al.; licensee BioMed Central Ltd. 2013
Published: 22 April 2013
RNA editing is a type of post-transcriptional modification that takes place in eukaryotes. It alters the sequence of primary RNA transcripts by deleting, inserting or modifying residues. Several forms of RNA editing have been discovered, including A-to-I, C-to-U, U-to-C and G-to-A. In recent years, the application of global approaches to the study of A-to-I editing, including high-throughput sequencing, has led to important advances. However, in spite of enormous efforts, the real biological mechanism underlying this phenomenon remains unknown.
In this work, we present VIRGO (http://atlas.dmi.unict.it/virgo/), a web-based tool that maps A-to-G mismatches between genomic and EST sequences as candidate A-to-I editing sites. VIRGO is built on top of a knowledge base integrating gene information from UCSC, ESTs from NCBI, SNPs, DARNED, and Next Generation Sequencing (NGS) data. The tool is equipped with a user-friendly interface that allows users to analyze genomic sequences in order to identify candidate A-to-I editing sites.
VIRGO is a powerful tool for the systematic identification of putative A-to-I editing sites in genomic sequences. The integration of NGS data allows the computation of p-values and adjusted p-values that measure the confidence of the mapped editing sites. The whole knowledge base is available for download and will be continuously updated as new NGS data become available.
RNA editing is a type of post-transcriptional modification that takes place in eukaryotes. It alters the sequence of primary RNA transcripts by deleting, inserting or modifying residues. Several forms of RNA editing have been discovered, including A-to-I, C-to-U, U-to-C and G-to-A. Here we focus on A-to-I (adenosine-to-inosine) editing, the most frequent and common form. Deamination converts adenosine (A) into inosine (I), which, in turn, is interpreted by both the translation and the splicing machinery as guanosine (G). Since inosine pairs with cytosine (C), editing turns A-U base pairs in the secondary structure into I:U mismatches. This process is catalyzed by enzymes of the Adenosine Deaminase Acting on RNA (ADAR) family and occurs only on dsRNA structures [1, 4, 5].
A-to-I RNA editing may be either promiscuous or specific. Promiscuous editing occurs within long duplexes [6, 7], while specific A-to-I editing occurs within shorter duplex regions, often formed by an exon and an intron sequence. Moreover, A-to-I RNA editing has been reported to target both exonic and intronic regions as well as 5' and 3' UTRs. This can have different consequences for mRNA biogenesis [2, 9], translation, mRNA export from the nucleus to the cytoplasm [10, 11], and the degradation of I-containing mRNA molecules. In the last few years, RNA editing has been reported to occur in small non-coding RNA molecules, in particular within precursor tRNAs and pri-miRNAs [14, 15]. It has been estimated that ~16% of pri-miRNAs undergo A-to-I editing, which influences their maturation and, consequently, the recognition of binding sites on target mRNAs [17–19].
It is well known that RNA editing activity is higher in mRNAs of the mammalian brain than in other tissues, which has led to the assumption that editing plays a crucial role in the central nervous system. Therefore, malfunctions of ADARs could have serious consequences; in particular, an imbalance of ADAR expression or activity has been observed to induce a variety of human diseases.
A common approach to identify putative A-to-I editing sites relies on aligning cloned cDNA sequences to the corresponding genomic sequence and highlighting A-to-G mismatches. Recent literature reports several screens designed to detect A-to-I RNA editing sites in humans, especially in ALU-type repetitive elements, including those located in UTRs [6, 7, 22–25]. Li et al. presented an unbiased assay to select more than 36,000 computationally predicted non-repetitive A-to-I sites. The sites were detected using amplified and sequenced padlock probes on cDNA and gDNA derived from several tissues of a single individual. These methods led to the discovery of thousands of ADAR substrates, which may help clarify the role of A-to-I RNA editing in the regulation of gene expression and quantify its impact on transcriptome and proteome diversity. Eggington et al. provide a web-based application that predicts editing sites in dsRNA of any sequence, relying on Sanger sequencing protocols for a more accurate quantitative analysis. More recently, in contrast to the previous approaches, new methods based on Next Generation Sequencing (NGS) data have been developed to identify A-to-I editing sites [28–32]. These approaches have allowed the detection of novel editing sites within coding and non-coding genes. On the other hand, they produce a high number of false editing sites, since NGS technology is error-prone.
Few such systems are available on the web. dbRES was the first web-oriented database of annotated RNA editing sites, but it was last updated in 2007 and contains only a few dozen human editing sites. More recently, Kiran and Baranov created DARNED, the largest database of human RNA editing sites, which provides centralized access to published data. RNA editing locations are mapped on the reference human genome. DARNED is periodically updated and contains more than 300,000 editing sites, but no statistical significance is provided. In 2011, Picardi et al. presented ExpEdit, a web application that, given individual sequence reads as input, maps the data and performs a comparative analysis against DARNED editing sites. No statistical significance of the results is given.
In this work, we present VIRGO (Visualization of A-to-I RNA editing sites into GenOmic sequences, http://atlas.dmi.unict.it/virgo/), a knowledge-base equipped with a web-interface allowing users to map putative and known A-to-I editing sites into gene regions (including coding sequences, introns, and UTRs). We consider as putative editing sites A-to-G mismatches between genomic and EST sequences, while known A-to-I editing sites are obtained from DARNED.
In particular, the VIRGO knowledge base has been created by matching all human gene regions obtained from UCSC (hg19) against the EST database, using filters and NGS data. The filters select candidate editing events that occur in clusters, lie in repeated and double-stranded regions, and are not classified as SNPs. Moreover, VIRGO locally maps all the editing events stored in DARNED, so that every DARNED editing site can be visualized through the VIRGO web interface. Finally, VIRGO uses the DARNED editing sites for which NGS information is available to compute the expected frequency of A-to-G substitutions in an aligned read column containing a mismatch. This knowledge is then used to compute p-values for all VIRGO editing events for which NGS information is available.
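The text does not say how the expected A-to-G frequency is estimated from DARNED sites. A minimal sketch, assuming it is simply the pooled fraction of G-supporting reads among all A or G reads at known sites with NGS coverage; the function name and the (a_reads, g_reads) input layout are hypothetical.

```python
from typing import Iterable, Tuple


def expected_ag_frequency(site_counts: Iterable[Tuple[int, int]]) -> float:
    """Pooled fraction of G-supporting reads among A+G reads at known sites.

    site_counts: one (a_reads, g_reads) pair per DARNED site with NGS coverage
    (hypothetical input format; the paper does not describe its data layout).
    """
    total_a = 0
    total_g = 0
    for a_reads, g_reads in site_counts:
        total_a += a_reads
        total_g += g_reads
    covered = total_a + total_g
    if covered == 0:
        raise ValueError("no A or G reads at the supplied sites")
    return total_g / covered


# Three hypothetical sites: 55 A reads and 45 G reads in total
print(expected_ag_frequency([(30, 10), (5, 15), (20, 20)]))  # 0.45
```

Pooling weights deeply covered sites more heavily; averaging per-site ratios would be an equally plausible alternative.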
The VIRGO web interface allows users to annotate their own genomic sequences with known editing sites and with the sites that pass the filters described above.
VIRGO is a knowledge base that integrates information retrieved from specialized biological databases. The core of the system has been developed in C++, while the front-end consists of a web interface developed in PHP.
The data integration process implemented in VIRGO consists of a sequence of steps carried out to identify putative A-to-I editing sites (see Figure 1). The database construction, performed offline, includes six steps. All filters are mandatory: a site that does not pass one of them is discarded. The last step is applied only when NGS reads align over the mismatches.
The steps are described below.
Step 1. We downloaded the whole set of human genes from UCSC (http://genome.ucsc.edu, build GRCh37/hg19). Then, using BLASTN, we aligned all the genes against the NCBI EST database. Although this step is very time-consuming, it allows us to identify all the potential A-to-I editing sites. VIRGO creates an initial database by selecting the A-to-G mismatches between the genes and the EST sequences.
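Selecting A-to-G mismatches from a gene-to-EST alignment can be sketched as a walk over the two aligned rows. This is an illustration, not VIRGO's C++ code: it assumes a gapped pairwise alignment (for example reconstructed from BLASTN output) with both rows on the transcript strand, and the function name is hypothetical.

```python
from typing import List


def a_to_g_mismatches(genomic: str, est: str, start: int = 0) -> List[int]:
    """Return genomic coordinates where the gene has A and the EST has G.

    genomic, est: equal-length aligned rows, '-' marking gaps.
    start: coordinate of the first aligned genomic base.
    Gaps in the genomic row do not advance the coordinate.
    """
    if len(genomic) != len(est):
        raise ValueError("alignment rows must have equal length")
    positions = []
    pos = start
    for g_base, e_base in zip(genomic.upper(), est.upper()):
        if g_base == "-":
            continue  # insertion in the EST: no genomic base consumed
        if g_base == "A" and e_base == "G":
            positions.append(pos)  # candidate A-to-I site
        pos += 1
    return positions


print(a_to_g_mismatches("ACGTA-A", "GCGTGTA", start=100))  # [100, 104]
```

For genes on the minus strand the same events appear as T-to-C on the forward strand, so sequences should be oriented along the transcript before scanning.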
Step 3. VIRGO partitions the selected mismatches into three categories, labelling them according to whether they fall in ALU regions (T0), in other repeat regions (T1), or in non-repeat regions (T2).
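The three-way labelling can be sketched as interval lookups. The sketch assumes half-open genomic intervals for ALU elements and for other repeats (for instance taken from a repeat annotation; the source does not say which one), and it checks ALU first because ALUs are themselves repeats. All names and coordinates here are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic interval


def in_any(pos: int, intervals: List[Interval]) -> bool:
    """True if pos lies inside at least one interval."""
    return any(start <= pos < end for start, end in intervals)


def classify_mismatch(pos: int, alu: List[Interval],
                      other_repeats: List[Interval]) -> str:
    """Label a mismatch T0 (ALU), T1 (other repeat) or T2 (non-repeat)."""
    if in_any(pos, alu):
        return "T0"  # ALU takes precedence over the generic repeat class
    if in_any(pos, other_repeats):
        return "T1"
    return "T2"


alu_regions = [(100, 400)]       # hypothetical ALU annotation
repeat_regions = [(1000, 1200)]  # hypothetical non-ALU repeat annotation
print([classify_mismatch(p, alu_regions, repeat_regions) for p in (150, 1100, 5000)])
```

A linear scan keeps the example short; with genome-wide annotations, sorted intervals and binary search would be the natural replacement.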
Step 4. VIRGO verifies whether the mismatches (from all the classes created above) occur in double-stranded regions. For this purpose we applied a technique already used in [6, 36] to predict the double-stranded portion of an RNA secondary structure. The technique builds a short reverse complementary sequence centered on each mismatch from the upstream and downstream flanking nucleotides, and then searches for this reverse complementary sequence in the gene where the mismatch was found. When a mismatch occurs in an ALU repetitive region, the length of the short sequence equals the length of the ALU region; otherwise, it is 251 nucleotides, including the mismatch.
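The double-strand check described above can be sketched as follows. The window length follows the text (the ALU length, or 251 nt otherwise), but the search is a simplifying assumption: an exact substring match, whereas the method in [6, 36] may tolerate mismatches. Function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def centered_window(gene: str, idx: int, length: int) -> str:
    """Window of `length` nt centered on gene[idx], clipped at the gene ends."""
    start = max(0, idx - length // 2)
    return gene[start:start + length]


def in_double_strand(gene: str, idx: int, alu_len: Optional[int] = None,
                     default_len: int = 251) -> bool:
    """Rough dsRNA test: does the reverse complement of the window around the
    mismatch occur in the same gene? Exact matching only (simplification)."""
    length = alu_len if alu_len else default_len
    window = centered_window(gene, idx, length)
    return reverse_complement(window) in gene.upper()


# Toy inverted repeat: "GATAACA" ... "TGTTATC"; the A at index 7 is the mismatch.
gene = "CCCC" + "GATAACA" + "CCCC" + "TGTTATC" + "CCCC"
print(in_double_strand(gene, 7, default_len=7))  # True
```

With default_len=125 // 2 on each side, the default 251-nt window holds 125 upstream nucleotides, the mismatch, and 125 downstream nucleotides.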
Step 5. VIRGO uses the All SNPs(135) database from UCSC to filter out mismatches that are already classified as SNPs.
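The SNP filter reduces to a set-membership test once both mismatches and SNPs are keyed the same way. A minimal sketch, assuming (chromosome, position) keys; the key layout and names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position): hypothetical key layout


def drop_known_snps(mismatches: Iterable[Site], snps: Set[Site]) -> List[Site]:
    """Keep only mismatches whose position is not a catalogued SNP."""
    return [site for site in mismatches if site not in snps]


candidates = [("chr1", 100), ("chr1", 200), ("chr2", 100)]
snp_sites = {("chr1", 200)}  # e.g. positions from the UCSC All SNPs(135) track
print(drop_known_snps(candidates, snp_sites))
```

UCSC tables store 0-based start coordinates, so the mismatch positions must use the same convention, or every SNP will be missed by one base.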
Step 6. VIRGO performs an alignment of the genes with a subset of NGS data taken from the following experiments: SRP002274 - GSE19166 (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP002274) and SRP007465 (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP007465).
The subset of short reads is constructed as follows. Short reads are aligned to the human genome with BOWTIE. To reduce noise, only the best alignments, with at most two mismatches, are accepted, using the -a and -v parameters: -a instructs BOWTIE to report all valid alignments, subject to the alignment policy -v 2 (at most two mismatches allowed).
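The alignment policy corresponds to a Bowtie 1 invocation with -a and -v 2 (Bowtie 2 uses different options). A minimal sketch that only builds the argument list; the index prefix and file names are hypothetical placeholders.

```python
import shlex
from typing import List


def bowtie_command(index_prefix: str, reads_fastq: str, output: str,
                   max_mismatches: int = 2) -> List[str]:
    """Bowtie 1 argv: report all valid alignments (-a) with at most
    `max_mismatches` mismatches (-v)."""
    return ["bowtie", "-a", "-v", str(max_mismatches),
            index_prefix, reads_fastq, output]


cmd = bowtie_command("hg19", "reads.fastq", "hits.txt")  # hypothetical paths
print(shlex.join(cmd))  # bowtie -a -v 2 hg19 reads.fastq hits.txt
# To run it (requires Bowtie and a built index):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list instead of a shell string avoids quoting problems with file names.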
The selected short reads are mapped onto each VIRGO mismatch, and only mismatches that occur in at least five short reads are retained. For some of the editing events, this alignment allows the computation of a p-value and an adjusted p-value expressing the confidence that the candidate mismatch is not a false positive.
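The statistical test itself is not named in the text. A minimal sketch, assuming a one-sided binomial test: with n reads covering a mismatch, k of them showing G, and an expected A-to-G frequency p0 (such as the one VIRGO derives from DARNED sites), the p-value is P(X ≥ k) for X ~ Binomial(n, p0). It also assumes that "at least five short reads" refers to read coverage, and it returns None where the significance would be annotated as unknown. Names are hypothetical.

```python
from math import comb
from typing import Optional

MIN_READS = 5  # minimum number of reads over a mismatch, as stated in the text


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def site_p_value(g_reads: int, total_reads: int, p0: float) -> Optional[float]:
    """One-sided binomial p-value for a mismatch, or None when coverage is
    too low for a test (significance reported as unknown)."""
    if total_reads < MIN_READS:
        return None
    return binomial_upper_tail(g_reads, total_reads, p0)


print(site_p_value(1, 4, 0.1))    # None: fewer than five reads
print(site_p_value(5, 5, 0.5))    # 0.03125
```

Which tail is appropriate depends on how p0 is interpreted; the source does not specify it.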
The significance of those mismatches for which it was not possible to compute the p-values was annotated as unknown.
Finally, p-values have been adjusted for multiple hypothesis testing using FDR correction, with α = 0.01. The p-values are periodically updated using new NGS experiments.
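The text mentions FDR correction without naming the procedure. A minimal sketch, assuming the standard Benjamini-Hochberg adjustment: each sorted p-value is scaled by m / rank, and a running minimum from the largest rank downwards keeps the adjusted values monotone. The p-values below are made-up inputs.

```python
from typing import List, Sequence

ALPHA = 0.01  # significance threshold quoted in the text


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adj = benjamini_hochberg([0.001, 0.008, 0.039, 0.041])
print([round(q, 3) for q in adj])   # [0.004, 0.016, 0.041, 0.041]
print([q <= ALPHA for q in adj])    # [True, False, False, False]
```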
VIRGO aims to be an efficient and user-friendly system, providing an interface through which users can analyze and visualize their data and export the results to xml and txt files.
The central purpose of VIRGO is to provide users with a periodically updated system which stores high-quality candidate editing sites. This will allow users to quickly and easily identify whether their genomic sequences are subject to A-to-I RNA Editing.
The sites identified by VIRGO are marked with different colors (yellow, orange, red, purple) according to the Number of Aligned ESTs (NAE): yellow for 1 ≤ NAE ≤ 5, orange for 5 < NAE ≤ 10, red for 10 < NAE ≤ 20, and purple for NAE > 20. The markers are placed at the bottom of the sequences (see number 2 in Figure 5). By clicking on a blue marker, VIRGO shows the following information: chromosome, genomic position, strand, p-value, tissue/organ (if known), whether the site is a SNP, and the PubMed resources.
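The color legend is a simple threshold mapping on NAE. A minimal sketch, assuming that NAE = 20 maps to red, as the 10 < NAE ≤ 20 range states, and that only larger values are purple; the function name is hypothetical.

```python
def nae_color(nae: int) -> str:
    """Marker color for a site supported by `nae` aligned ESTs."""
    if nae < 1:
        raise ValueError("a site needs at least one aligned EST")
    if nae <= 5:
        return "yellow"
    if nae <= 10:
        return "orange"
    if nae <= 20:
        return "red"
    return "purple"


print([nae_color(n) for n in (1, 5, 6, 10, 11, 20, 21)])
```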
Markers for newly predicted sites give information on chromosome, genomic position, strand, and p-value. When a mismatch occurs inside a repeat region, the repeat's start/end genomic position, strand, chromosome, name, class and family are given. The list of EST sequences in which the mismatch occurs is also given. For each EST sequence, VIRGO shows the EST name, tissue and organ (if known), the alignment between the input gene and the EST sequence, and the NCBI information. The list of isoforms in which the mismatch occurs is also provided. For each isoform, information such as the RefSeq ID, chromosome, strand, and start and end genomic positions is provided (see numbers 3, 4 and 5 in Figure 5). Finally, the results of the analysis are stored on the server for 5 days and then removed.
RNA editing is an important post-transcriptional mechanism that contributes to the diversity of the transcriptome. It alters the sequence of primary RNA transcripts by deleting, inserting or modifying residues. Here we focus on A-to-I (adenosine-to-inosine) editing, the most frequent and common form. The main goal of VIRGO is to provide a simple system for identifying known and putative A-to-I RNA editing sites in user-provided genomic sequences. By exploiting NGS data, VIRGO computes, for each predicted editing site covered by NGS reads, a p-value that measures the confidence of the prediction. Predictions can be downloaded in xml and txt format. Finally, the whole VIRGO database can be downloaded and used in third-party applications.
NAE: Number of Aligned ESTs.
We would like to thank the anonymous reviewers for their valuable comments. Funding: Valentina Macca has been supported by a fellowship sponsored by "Associazione Sclerosi Tuberosa", A.S.T.
The publication costs for this article were funded by PO - FESR 2007-2013 grant - CUP: G23F11000840004 -Title: "BIOWINE".
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 7, 2013: Italian Society of Bioinformatics (BITS): Annual Meeting 2012. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S7
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.