A new advance in alternative splicing databases: from catalogue to detailed analysis of regulation of expression and function of human alternative splicing variants
© de la Grange et al; licensee BioMed Central Ltd. 2007
Received: 11 December 2006
Accepted: 04 June 2007
Published: 04 June 2007
Most human genes produce several transcripts with different exon contents by using alternative promoters, alternative polyadenylation sites and alternative splice sites. Much effort has been devoted to describing known gene transcripts through the development of numerous databases. Nevertheless, owing to the diversity of the transcriptome, there is a need for interactive databases that provide information about the potential function of each splicing variant, as well as its expression pattern.
After setting up a database in which human and mouse splicing variants were compiled, we developed tools (1) to predict the production of protein isoforms from these transcripts, taking account of the presence of open reading frames and mechanisms that could potentially eliminate transcripts and/or inhibit their translation, i.e. nonsense-mediated mRNA decay and microRNAs; (2) to support studies of the regulation of transcript expression at multiple levels, including transcription and splicing, particularly in terms of tissue specificity; and (3) to assist in experimental analysis of the expression of splicing variants. Importantly, analyses of all features from transcript metabolism to functional protein domains were integrated in a highly interactive, user-friendly web interface that allows the functional and regulatory features of gene transcripts to be assessed rapidly and accurately.
In addition to identifying the transcripts produced by human and mouse genes, fast DB (http://www.fast-db.com) provides tools for analyzing the putative functions of these transcripts and the regulation of their expression. Therefore, fast DB has achieved an advance in alternative splicing databases by providing resources for the functional interpretation of splicing variants for the human and mouse genomes. Because gene expression studies are increasingly employed in clinical analyses, our web interface has been designed to be as user-friendly as possible and to be readily searchable and intelligible at a glance by the whole biomedical community.
Human genes are transcribed as messenger RNA precursor molecules (pre-mRNAs), which are composed of short exons separated by much longer introns. The introns are removed during the splicing process, which gives rise to mature mRNAs containing only exons. Genome-wide analyses indicate that most (up to 70%) human genes generate different transcripts with different exon contents by using alternative promoters, alternative polyadenylation sites and alternative splice sites [1–3]. About 40% of human genes produce at least five different splicing variants (SVs) and up to 10% of them produce more than 10 alternate transcripts each [4, 5]. With such diversity and complexity, it becomes difficult to apprehend the role and impact of the transcriptome in terms of cellular function. A first step was attained by several large-scale studies showing that 75% of alternative splicing (AS) events occur in translated regions of mRNAs and have consequences at the protein level. The resulting changes in amino acid sequence can alter the binding properties of proteins, influence their intracellular location, and modify their enzymatic activity and/or stability. This leads to a gain, an alteration, or even a complete loss of function. Importantly, although a given transcript can be predicted to produce a particular protein isoform, its translation can be inhibited by several mechanisms or it can be degraded. These mechanisms include nonsense-mediated mRNA decay (NMD) [6–8] and targeting by microRNAs (miRNAs) [9–11]. The NMD pathway leads to the degradation of mRNAs that contain a premature stop codon, i.e. a stop codon more than 50 nucleotides upstream of the last exon/exon junction [6, 7]. MiRNAs are 21- to 25-nucleotide-long RNAs that can either induce the degradation or suppress the translation of mRNAs, depending on whether they match their targets perfectly or approximately. Several databases based on different algorithms able to predict miRNA targets have been constructed [12–14].
In addition to considering the production of protein isoforms from SVs, biologists are interested in their expression patterns. Global gene expression, promoter use and SV expression can be specific to a tissue or a group of tissues [15, 16]. Therefore, to apprehend transcriptome diversity and complexity, it is now necessary to develop interactive databases that provide tools for predicting the functions of SVs as well as their tissue expression patterns.
Having compiled human and mouse splicing variants in a database named fast DB, we now provide resources for functional interpretation of SVs. Indeed, this new release of fast DB provides a predicted open reading frame (ORF) for each known transcript, and helps predict the functional consequences of AS events by an analysis of protein domains encoded by alternative exons. Moreover, fast DB now predicts whether transcripts are potential targets for the NMD pathway and/or miRNAs. Finally, this new version of fast DB provides tools for studying the expression and regulation of SVs. In particular, fast DB now provides tissue distribution charts of each gene and AS event to help study the tissue specificity of AS events.
Construction and content
Definition of exons and AS events
Genomic sequence and orthologous relationships
We recovered human genomic sequences of the 22,218 "protein coding" genes from the homo_sapiens_core_31_35d EnsEMBL database. Mouse genomic sequences came from the mus_musculus_core_35_34d EnsEMBL database. We used the ensemble_compara_36 database to associate each human gene with its orthologous mouse gene.
To define exon/intron structures and alternative splicing events, we aligned transcript sequences against genomic sequence using sim4. "Full length" mRNAs and expressed sequence tags (ESTs) came from the UCSC website as available in January 2006, and "partial" mRNAs were downloaded from the NCBI website. Transcripts were selected using very stringent criteria (see the fast DB documentation).
Definition of "genomic exons" and AS events
We defined a "genomic exon" as the most frequently-occurring transcript exon at a given genomic position. Alternative events were defined by comparing transcript exons with the corresponding genomic exons (see the fast DB documentation). Statistics of AS events in fast DB are available on the fast DB website.
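The definition of a "genomic exon" can be illustrated with a minimal sketch. It assumes that "a given genomic position" means a cluster of overlapping transcript exons, and that exons are given as (start, end) coordinate pairs; both are assumptions made for illustration, and the function name `genomic_exons` is hypothetical rather than part of fast DB.

```python
from collections import Counter

def genomic_exons(transcript_exons):
    """Return the most frequent (start, end) exon in each cluster of overlapping exons.

    Assumption: exons that overlap one another share a "genomic position".
    Ties are resolved arbitrarily (first exon seen among the most frequent).
    """
    clusters = []  # each entry: [cluster_end, Counter of (start, end) pairs]
    for start, end in sorted(transcript_exons):
        if clusters and start <= clusters[-1][0]:
            # Overlaps the current cluster: extend it and count this exon form.
            clusters[-1][0] = max(clusters[-1][0], end)
            clusters[-1][1][(start, end)] += 1
        else:
            clusters.append([end, Counter({(start, end): 1})])
    return [counts.most_common(1)[0][0] for _, counts in clusters]

# Hypothetical exons collected from several aligned transcripts.
exons = [(100, 200), (100, 200), (100, 180), (300, 400), (290, 400), (300, 400)]
print(genomic_exons(exons))
```

In this toy input, the exon forms (100, 180) and (290, 400) would then be compared with the chosen genomic exons to call alternative donor or acceptor sites.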
Known coding sequences
To display information about known translation product(s), fast DB shows the known coding sequence(s) (CDS) for most human genes. We downloaded data from the Consensus CDS database (CCDS) as available on March 2, 2005. Using EnsEMBL transcript accessions (e.g. ENST00000343008), we associated 12,201 fast DB human genes with at least one CDS from the CCDS database, which contains CDS from 13,142 human genes. On average, each fast DB human gene was associated with 1.1 CDS from the CCDS database.
Prediction of open reading frames
To help predict the impact of AS on protein function, fast DB presents interactive graphical representations of ORFs for most human and mouse cDNAs. In some cases, two ORFs are predicted for the same SV (Figure 2, items 5 of [GenBank:BC063849]). We used the "getorf" program from the EMBOSS package to find all ORFs of at least 120 nucleotides ("getorf -minsize 120 -reverse No file_in file_out"). We then selected the two longest ORFs. The first of these (blue ORFs in Figure 2, items 5) corresponds to the ORF covering the greatest number of transcript exons. If there are AS events before or within the first exon covered by the first selected ORF, fast DB displays the second (red ORF of the [GenBank:BC063849] transcript in Figure 2, item 5). Using this algorithm, fast DB provides ORF predictions for 111,893 human cDNAs (99.4% of the 112,609 human cDNAs) and 76,391 mouse cDNAs (97.7% of the 78,170 mouse cDNAs). Two predicted ORFs are displayed for 8,450 (7.5%) human cDNAs and for 4,780 (6.1%) mouse cDNAs.
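The selection logic described above can be sketched as follows. This is not a reimplementation of getorf: it simply scans the three forward frames for stop-free stretches of at least 120 nucleotides, keeps the two longest, and ranks them by the number of transcript exons they overlap. The function names and the transcript-coordinate exon spans are assumptions introduced for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def forward_orfs(seq, min_size=120):
    """Return half-open (start, end) spans of stop-free regions, forward frames only."""
    seq = seq.upper()
    spans = []
    for frame in range(3):
        region_start = frame
        # Walk the complete codons of this frame.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - region_start >= min_size:
                    spans.append((region_start, i))
                region_start = i + 3
        # Region running up to the last complete codon of the frame.
        frame_end = frame + ((len(seq) - frame) // 3) * 3
        if frame_end - region_start >= min_size:
            spans.append((region_start, frame_end))
    return spans

def exons_covered(orf, exon_spans):
    """Count exons (half-open spans in transcript coordinates) overlapping an ORF."""
    return sum(1 for start, end in exon_spans if start < orf[1] and orf[0] < end)

def rank_two_longest(spans, exon_spans):
    """Keep the two longest ORFs, then list first the one overlapping most exons."""
    longest = sorted(spans, key=lambda o: o[1] - o[0], reverse=True)[:2]
    return sorted(longest, key=lambda o: exons_covered(o, exon_spans), reverse=True)

# Toy transcript: stop codon, 120 nt open stretch, stop codon, short tail.
toy = "TAA" + "GCA" * 40 + "TAG" + "GCA" * 10
candidates = forward_orfs(toy)
print(rank_two_longest(candidates, [(0, 60), (60, 156)]))
```

Whether the second ORF is displayed would additionally depend on the AS events located before or within the first exon covered by the first ORF, which this sketch does not model.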
Prediction of nonsense-mediated mRNA decay
To predict whether a given transcript is a target for the NMD pathway, we calculated the distance in nucleotides between the stop codon position (on the transcript sequence) and the position of the last exon-exon junction. If the stop codon lies more than 50 nucleotides upstream of this junction, the corresponding SV is predicted to be targeted by NMD. Fast DB labels each predicted ORF with an "ORF" flag, or with an "NMD" flag if NMD targeting is predicted. Among all human cDNAs, 13% (14,667 of 112,609) carry at least one "NMD" label; this corresponds to 31% (14,667 of 47,923) of alternate human cDNAs.
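The 50-nucleotide rule can be written as a short check. The sketch below assumes that the transcript is described by its ordered exon lengths and that the stop position is a 0-based transcript coordinate; which nucleotide of the stop codon is used as the reference is not stated in the text, so that choice is left to the caller. The function name `predicts_nmd` is hypothetical.

```python
def predicts_nmd(exon_lengths, stop_pos, threshold=50):
    """Return True if the stop lies more than `threshold` nt upstream of the last junction.

    exon_lengths: lengths of the exons in transcript order.
    stop_pos: 0-based position of the stop codon on the transcript (assumed convention).
    """
    if len(exon_lengths) < 2:
        return False  # single-exon transcript: no exon-exon junction
    # Transcript coordinate at which the last exon begins.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_pos > threshold

# Three exons of 200, 150 and 300 nt: the last junction is at position 350.
print(predicts_nmd([200, 150, 300], stop_pos=250))  # stop 100 nt upstream
print(predicts_nmd([200, 150, 300], stop_pos=320))  # stop 30 nt upstream
print(predicts_nmd([200, 150, 300], stop_pos=500))  # stop in the last exon
```

A stop codon located in the last exon gives a negative distance and is therefore never flagged.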
Prediction of microRNA/transcript interaction sites
To predict whether a given exon is a potential target for a miRNA, we downloaded a file with all predicted miRNA/transcript interaction sites from the miRBase Targets database version 2.0. This file contains miRNA names, chromosome names and chromosomal positions. From these positions, we realigned the miRNA sequences with corresponding genomic sequences using the miRanda program. Interaction sites with a maximum energy of -19 were stored in the fast DB MySQL database. Each transcript sequence within the alignment region was aligned with the genomic sequence using Clustalw, and differences between genomic and transcript sequences were highlighted in red in order to predict whether they potentially affect miRNA/transcript interaction. A total of 78,903 miRNA/transcript interaction sites were predicted between 11,301 human genes (63% of the 18,008 human genes) and 413 microRNAs.
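The energy filter applied before storing interaction sites can be sketched as below. The record layout (`Site`) and the function name are hypothetical, the example values are invented for illustration, and the cutoff is interpreted as "keep sites whose predicted energy is -19 or lower".

```python
from typing import NamedTuple

class Site(NamedTuple):
    """Hypothetical record for one predicted miRNA/transcript interaction site."""
    mirna: str
    chrom: str
    start: int
    end: int
    energy: float  # predicted duplex energy; lower means a more stable pairing

def keep_sites(sites, max_energy=-19.0):
    """Keep only sites whose energy does not exceed the cutoff."""
    return [site for site in sites if site.energy <= max_energy]

# Invented example records.
predicted = [
    Site("miR-example-1", "chr6", 1000, 1022, -21.4),
    Site("miR-example-2", "chr6", 2000, 2021, -15.2),
    Site("miR-example-3", "chr6", 3000, 3023, -19.0),
]
for site in keep_sites(predicted):
    print(site.mirna, site.energy)
```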
Association of transcript with tissue
To provide a tissue-specificity analysis of each AS event, we associated each human transcript (cDNAs and ESTs) with the tissue from which it had been cloned. Where information was available, we recovered the name of the transcript library from CGAP, and where the library was associated with a tissue, we associated tissue with transcript using a keyword search system written in Perl among a collection of 36 tissues and groups of tissues (see the fast DB documentation). If a transcript was not associated with a library, or if its library was not associated with a tissue, we used our keyword search system on the "tissue_type" field of the transcript GenBank file (where this field was available). Using this algorithm, 87% of fast DB transcripts were associated with one of the 36 tissues. Among the 1,154,554 transcripts stored in the fast DB database, 875,479 (76%) were associated with a tissue using the library information and 132,740 (11%) were associated with a tissue using their "tissue_type" information. A histogram of tissue distribution of all the fast DB cDNAs and ESTs is available in the fast DB documentation and on the fast DB website (Figure 5A, item 1).
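The two-step association (library information first, then the "tissue_type" field) can be sketched as follows. The original keyword search system was written in Perl and covers 36 tissues; the Python version below is only an illustration, and the three-tissue keyword table, the function names and the example strings are all hypothetical.

```python
# Hypothetical keyword table; the real system covers 36 tissues and tissue groups.
TISSUE_KEYWORDS = {
    "brain": ("brain", "cerebellum"),
    "muscle": ("muscle", "myoblast"),
    "liver": ("liver", "hepatocyte"),
}

def match_tissue(text):
    """Return the first tissue whose keyword occurs in the text, or None."""
    if not text:
        return None
    lowered = text.lower()
    for tissue, keywords in TISSUE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return tissue
    return None

def assign_tissue(library_description, tissue_type_field):
    """Try the library description first, then fall back to the tissue_type field."""
    tissue = match_tissue(library_description)
    if tissue:
        return tissue, "library"
    tissue = match_tissue(tissue_type_field)
    if tissue:
        return tissue, "tissue_type"
    return "n/a", None

print(assign_tissue("fetal brain cDNA library", None))
print(assign_tissue("pooled unknown", "skeletal muscle"))
print(assign_tissue(None, None))
```

Returning the source of the annotation alongside the tissue makes it possible to reproduce the split between library-based and "tissue_type"-based assignments reported above.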
Tissue distribution histogram of all gene transcripts
To enable the expression pattern of a given gene to be visualized clearly, fast DB displays a histogram of the tissue distribution of all its transcripts. Once the transcripts had been associated with the tissue from which they were cloned, we used the Perl module GD::Graph to draw this histogram. Transcripts not associated with a tissue were placed in the "n/a" group. Only tissues in which transcripts are expressed were represented on the chart.
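The counting step behind such a histogram can be sketched in a few lines. fast DB draws its charts with the Perl module GD::Graph; the Python sketch below only tallies transcripts per tissue and prints a text bar chart. The function names and the example transcript identifiers are hypothetical.

```python
from collections import Counter

def tissue_counts(transcript_tissues):
    """Count transcripts per tissue; transcripts without a tissue go to 'n/a'."""
    return Counter(tissue if tissue else "n/a" for tissue in transcript_tissues.values())

def text_histogram(counts, width=40):
    """Render one bar per tissue, scaled to the most frequent tissue."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    for tissue, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{tissue:<10}{bar} {n}")
    return "\n".join(lines)

# Hypothetical transcripts of one gene, mapped to the tissue they were cloned from.
gene_transcripts = {"t1": "brain", "t2": "brain", "t3": None, "t4": "muscle"}
print(text_histogram(tissue_counts(gene_transcripts), width=20))
```

Because only observed tissues become keys of the counter, tissues without any transcript are absent from the chart, as described above.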
Tissue distribution histogram of gene transcripts for a specific event
To go further, fast DB provides a histogram of the tissue distribution of transcripts for each transcriptional and splicing event.
Analysis of alternative first exons
All the different alternative first exons were represented on the same chart. For each alternative first exon, we identified the transcripts for which this event was defined. All 5'-partial transcripts were excluded from this study, i.e. transcripts for which no alternative first exon was defined (see the fast DB documentation).
Analysis of alternative terminal exons
As for the alternative first exons, all the different alternative terminal exons were represented on the same chart. For each alternative terminal exon, we identified the transcripts for which this event was defined. All 3'-partial transcripts were excluded from this study, i.e. transcripts for which no alternative terminal exon was defined (see the fast DB documentation).
Several groups of transcripts were represented on the chart. The first group comprises transcripts for which exon skipping is defined (see the fast DB documentation). The next group comprises transcripts that include the corresponding exon(s). The last group can be divided into subgroups according to splicing events defined as adjacent to the event studied. For this purpose, we defined a pair of values corresponding to the positions of splice sites flanking the skipped exon(s) (Figure 5B, items 2). For each pair of values, a group was represented on the chart, in addition to the group of transcripts for which exon skipping was defined.
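One possible reading of this grouping is sketched below. It assumes that each transcript is a list of (start, end) genomic exons, that the "pair of values" is the end of the nearest upstream exon and the start of the nearest downstream exon, and that transcripts lacking either flank are uninformative. These are interpretations made for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

def group_by_skipping_event(transcripts, target):
    """Group transcripts by inclusion or skipping of `target` and by flanking splice sites.

    transcripts: dict mapping a transcript name to a list of (start, end) exons.
    target: (start, end) of the exon region under study.
    Keys of the result are (label, upstream_exon_end, downstream_exon_start).
    """
    t_start, t_end = target
    groups = defaultdict(list)
    for name, exons in transcripts.items():
        upstream_ends = [end for start, end in exons if end < t_start]
        downstream_starts = [start for start, end in exons if start > t_end]
        if not upstream_ends or not downstream_starts:
            continue  # partial transcript: the event cannot be assessed
        included = any(start <= t_end and end >= t_start for start, end in exons)
        label = "include" if included else "skip"
        groups[(label, max(upstream_ends), min(downstream_starts))].append(name)
    return dict(groups)

# Hypothetical transcripts around an exon located at 200-300.
example = {
    "A": [(1, 100), (200, 300), (400, 500)],
    "B": [(1, 100), (400, 500)],
    "C": [(1, 100), (200, 300), (600, 700)],
    "D": [(200, 300), (400, 500)],
}
print(group_by_skipping_event(example, (200, 300)))
```

Each key of the result would correspond to one group on the chart, and the tissue counts of the transcripts in each group would give the bars.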
As for exon skipping, several groups of transcripts were represented on the chart. The first group consists of transcripts for which intron retention is defined (see the fast DB documentation). The other consists of transcripts that splice introns (genomic or nongenomic introns), which are included in transcripts defining the alternative event. For a single chosen intron, four values were defined (corresponding to the splice sites of the exons flanking the chosen intron). For each group of values, a transcript group was represented on the chart, in addition to the group of transcripts for which intron retention was defined.
Alternative 3' and 5' splice sites
Several distinct groups of transcripts were represented on the chart. Each group corresponds to a different pair of acceptor/donor splice sites (see the fast DB documentation).
Of the 18,008 human genes present in our database, 11,071 (61%) were estimated to undergo AS. This result is consistent with other studies, which estimate that 40–75% of human genes are alternatively spliced [1–3, 28, 29]. Using 112,609 cDNAs mapped to the human genome, we defined 5,663 cassette exons, 3,940 intron retentions, 6,483 alternative acceptor sites and 5,551 alternative donor sites. Finally, 26% and 22% of the fast DB genes present alternative first and alternative terminal exons, respectively. These results are highly consistent with those of other databases [5, 30–35]. More detailed statistics can be found on the fast DB website.
Prediction of protein isoform production from gene transcripts in fast DB
Prediction of alternative coding sequence functions
Prediction of no protein production: NMD pathway and miRNA targeting
It is now recognized that many transcripts are not translated, or are poorly translated, owing to rapid degradation and/or translational repression through the NMD pathway and/or miRNA targeting. It is therefore important to predict whether transcripts are likely to be subject to such regulation. In addition to diversifying the proteome by changing the protein domain composition, AS may modulate gene expression by generating NMD-targeted variants. Therefore, in fast DB, transcripts that are predicted to be targeted by NMD (i.e. whose stop codon is located more than 50 nucleotides upstream of the most downstream exon/exon junction) are labeled "NMD" close to their stop codon (Figure 2, items 8).
By analyzing the 18,008 genes in fast DB, we found that 14,667 (31%) of 47,923 SVs were potential targets for the NMD pathway, which is consistent with a previous report. However, this number is an underestimate because mRNAs targeted by the NMD pathway are certainly underrepresented in cDNA databanks owing to their low level of expression and to the high proportion of 3'-partial transcripts potentially targeted by the NMD pathway.
Advanced analysis of the regulation of transcript expression in fast DB
Prediction of tissue distribution and regulatory features
The expression of SVs may depend on cell type, and some SVs are only present in a specific tissue or group of tissues. For each human gene, fast DB provides the tissue distribution of its transcripts (cDNAs and ESTs) (Figure 5A, item 3). Furthermore, for each alternative promoter, alternative polyadenylation site and AS event, fast DB provides the tissue distribution of the SVs defining the event (Figure 5B). For example, two distinct exon skipping events were defined for the human RANBP9 gene: skipping of exon 5 and skipping of exons 6 to 8. All AS events are listed on the right side of the "tissue specificity" page (Figure 5A, item 3). By clicking on a given event (e.g. skipping of exon 5), fast DB displays the tissue distribution histogram for the SVs that contain or do not contain exon 5 (in blue and red, respectively). The group of SVs containing exon 5 is divided into two subgroups according to splicing events defined adjacently to the studied event; the first subgroup corresponds to SVs that include exon 5 without defining other events (light blue), and the second corresponds to SVs that include exon 5 but do not include exons 6 to 8 (dark blue). These different subgroups of SVs are schematically represented on the left side of the page (Figure 5B, item 1). The tissue distribution chart shows that skipping of RANBP9 exon 5 occurs specifically in muscular (and possibly nervous) tissue(s) (Figure 5B, item 3). A table with the number of transcripts defining the events in the different tissues is presented (Figure 5B, item 4).
Tools to assist in experimental analysis of SV expression
Finally, to assist the interpretation of microarray results, the "probe alignment" tool graphically provides the location of any sequence within the gene exon/intron structure, such as probes used in microarrays. Input sequences can be located within an exon, an intron, an exon/intron or an intron/exon junction, or an exon/exon junction. In all cases, the input sequences must be at least 20 nucleotides long, but in the case of an exon/exon junction they must cover at least 16 nucleotides of each exon. The example displayed in Figure 7B corresponds to the alignment of all Affymetrix exon-array core-level probes for the human RANBP9 gene (69 sequences, each 25 nucleotides long). Each of these probes is represented on the chart (Figure 7B, item 1), and their genomic positions, percentage identities with the genomic sequence and alignment lengths are displayed in a table under the diagram. After clicking on a given probe on the scheme, corresponding data in the table are highlighted in red (Figure 7B, item 2), and the corresponding probe alignment with the genomic sequence is provided at the bottom of the page (Figure 7B, item 3).
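The length constraints stated above can be expressed as a small validity check. The sketch assumes transcript coordinates, with `junction` being the position at which the downstream exon starts; the function name and parameters are hypothetical and do not describe the actual fast DB implementation.

```python
def probe_is_acceptable(probe_start, probe_len, junction, min_len=20, min_overhang=16):
    """Apply the stated length rules to a probe placed on a transcript.

    probe_start: 0-based start of the probe on the transcript.
    junction: transcript position at which the downstream exon begins.
    """
    if probe_len < min_len:
        return False
    left = junction - probe_start               # nucleotides on the upstream exon
    right = probe_start + probe_len - junction  # nucleotides on the downstream exon
    if left <= 0 or right <= 0:
        return True  # the probe does not span this junction: only the length rule applies
    return left >= min_overhang and right >= min_overhang

print(probe_is_acceptable(100, 25, 112))  # spans the junction with 12 and 13 nt
print(probe_is_acceptable(100, 32, 116))  # spans the junction with 16 and 16 nt
print(probe_is_acceptable(100, 25, 200))  # lies entirely upstream of the junction
```

With these thresholds, an input that spans an exon/exon junction needs at least 32 nucleotides in total to be accepted.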
Forum and documentation
To provide interactive help in using fast DB, we set up a forum in the "Forum/Documentation" section where users are invited to post their comments. Furthermore, we developed a quick help section by assembling short explanations and legends of fast DB charts. A complete fast DB documentation is available in PDF and HTML formats in the "Forum/Documentation" section.
In addition to describing the transcripts produced by human and mouse genes, as already reported, the new release of fast DB now provides tools that analyze the putative function of these transcripts and the regulation of their expression and therefore achieves an advance in AS databases by allowing SVs for human and mouse genomes to be interpreted functionally. To do that, fast DB integrates information and tools for predicting the functional consequences of AS. Several other databases have assembled protein information or information regarding AS. To try to correlate AS events with their functional consequences, some AS databases integrate protein sequences to annotate splicing events [47, 48], while others indicate translational product start positions, end positions and amino acid sequences corresponding to some SVs [5, 31, 49–51]. The SpliceNest database even shows all 6-frame predicted ORFs. However, none of these databases integrates tools for predicting the functional consequences of a given AS event. To the best of our knowledge, fast DB is the first freely-available system to offer direct links for making interactive predictions of functional protein domains from alternative exons. The next objective in the development of fast DB will be to integrate structural protein domain analysis, as this kind of domain has been shown to be altered by AS [2, 53].
Furthermore, a given transcript can be degraded, or its translation inhibited, through the NMD pathway and/or miRNA targeting. Therefore, transcripts that are predicted to be targeted by the NMD pathway and/or miRNA are indicated in fast DB. Only two databases containing NMD data have previously been available. The first, called "NMD database", comprises yeast data based on Affymetrix chip analyses with several UPF gene deletion conditions. The second is the ASTRA database, which indicates transcripts potentially targeted by NMD. However, ASTRA only contains 5,751 human genes and NMD prediction is constructed only from full-length cDNA sequences from UniGene. Several other databases contain information on miRNA targets, but fast DB is the only AS database that integrates the miRNA target feature in its transcript catalogue.
To help predict the tissue specificity of AS, fast DB provides a tissue distribution chart for each splicing event. Several studies have identified tissue-specific AS events [15, 54], and some AS databases present EST tissue-expression histograms [49, 50]. ASAP even provides lists of tissue-specific AS events by comparing the number of ESTs in different tissues. However, fast DB is the only tool that provides a tissue distribution histogram for each splicing and transcriptional event. A limitation of this bioinformatic approach based on ESTs is that the number of ESTs varies considerably among tissues.
In addition to providing tools for predicting the translation product, function and expression pattern of each SV, fast DB offers tools to assist in their experimental analysis. The "in silico PCR" option enables the user to design primers for PCR amplification easily. Furthermore, the "probe alignment" tool provides a clear visualization of the genomic alignment of any input sequences, such as probes used in microarrays. In particular, the fast DB probe alignment tool allows each Affymetrix "probe selecting region" to be graphically associated with a given region (i.e. either constitutive exon or alternative region).
Much effort has been devoted to setting up the most complete transcript catalogue to date. This work is an ongoing project and researchers must sustain their effort by providing new SVs to fill public databanks. In order to apprehend the role and impact of this transcriptome diversity in gene function, biologists need tools that provide information about the potential function of each SV, as well as its expression pattern. In addition, the scientific community using genomic/transcriptomic databases is increasing in size and diversity, not least because splicing deregulation is involved in many diseases, and the use of gene expression studies is continually growing in clinical analyses. Therefore, bioinformatic web interfaces have to be as user-friendly as possible so that they are readily searchable and intelligible at a glance by the whole biomedical community. We think that fast DB has reached the goal of providing a large number of bioinformatic tools that facilitate the study of the regulation of human gene product expression and of integrating these tools in a user-friendly, attractive and interactive web interface.
Availability and requirements
We thank Dr. Sebastien Jauliac and the members of his team for their comments and suggestions. We thank the fast DB users for their support and comments. This work was financially supported by INSERM and the "Association Française contre les Myopathies".
- Brett D, Pospisil H, Valcarcel J, Reich J, Bork P: Alternative splicing and genome complexity. Nat Genet 2002, 30(1):29–30. 10.1038/ng803
- Stamm S, Ben-Ari S, Rafalska I, Tang Y, Zhang Z, Toiber D, Thanaraj TA, Soreq H: Function of alternative splicing. Gene 2005, 344: 1–20. 10.1016/j.gene.2004.10.022
- Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S: Increase of functional diversity by alternative splicing. Trends Genet 2003, 19(3):124–128. 10.1016/S0168-9525(03)00023-4
- de la Grange P, Dutertre M, Martin N, Auboeuf D: FAST DB: a website resource for the study of the expression regulation of human gene products. Nucleic Acids Res 2005, 33(13):4276–4284. 10.1093/nar/gki738
- Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, Tang Y, Barbosa-Morais NL, Thanaraj TA: ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res 2006, 34(Database issue):D46–55. 10.1093/nar/gkj031
- Hillman RT, Green RE, Brenner SE: An unappreciated role for RNA surveillance. Genome Biol 2004, 5(2):R8. 10.1186/gb-2004-5-2-r8
- Lejeune F, Maquat LE: Mechanistic links between nonsense-mediated mRNA decay and pre-mRNA splicing in mammalian cells. Curr Opin Cell Biol 2005, 17(3):309–315. 10.1016/j.ceb.2005.03.002
- Lewis BP, Green RE, Brenner SE: Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc Natl Acad Sci U S A 2003, 100(1):189–192. 10.1073/pnas.0136770100
- Bentwich I: Prediction and validation of microRNAs and their targets. FEBS Lett 2005, 579(26):5904–5910. 10.1016/j.febslet.2005.09.040
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol 2005, 3(3):e85. 10.1371/journal.pbio.0030085
- Pillai RS: MicroRNA function: multiple mechanisms for a tiny RNA? Rna 2005, 11(12):1753–1761. 10.1261/rna.2248605
- Sethupathy P, Corda B, Hatzigeorgiou AG: TarBase: A comprehensive database of experimentally supported animal microRNA targets. Rna 2006, 12(2):192–197. 10.1261/rna.2239606
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 2006, 34(Database issue):D140–4. 10.1093/nar/gkj112
- Hsu PW, Huang HD, Hsu SD, Lin LZ, Tsou AP, Tseng CP, Stadler PF, Washietl S, Hofacker IL: miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic Acids Res 2006, 34(Database issue):D135–9. 10.1093/nar/gkj135
- Xu Q, Modrek B, Lee C: Genome-wide detection of tissue-specific alternative splicing in the human transcriptome. Nucleic Acids Res 2002, 30(17):3754–3766. 10.1093/nar/gkf492
- Yeo G, Holste D, Kreiman G, Burge CB: Variation in alternative splicing across human tissues. Genome Biol 2004, 5(10):R74. 10.1186/gb-2004-5-10-r74
- Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Graf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, Parker A, Proctor G, Prlic A, Rae M, Rios D, Redmond S, Schuster M, Sealy I, Searle S, Severin J, Slater G, Smedley D, Smith J, Stabenau A, Stalker J, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Hubbard TJ: Ensembl 2006. Nucleic Acids Res 2006, 34(Database issue):D556–61. 10.1093/nar/gkj133
- Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W: A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res 1998, 8(9):967–974.
- Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res 2006, 34(Database issue):D590–8. 10.1093/nar/gkj144
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33(Database issue):D54–8. 10.1093/nar/gki031
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16(6):276–277. 10.1016/S0168-9525(00)02024-2
- microRNA targets [http://www.microrna.org]
- Aiyar A: The use of CLUSTAL W and CLUSTAL X for multiple sequence alignment. Methods Mol Biol 2000, 132: 221–241.
- Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 2003, 302(5653):2141–2144. 10.1126/science.1090100
- Modrek B, Resch A, Grasso C, Lee C: Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res 2001, 29(13):2850–2859. 10.1093/nar/29.13.2850
- Kim N, Alekseyenko AV, Roy M, Lee C: The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species. Nucleic Acids Res 2007, 35(Database issue):D93–8. 10.1093/nar/gkl884
- Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O: Automated classification of alternative splicing and transcriptional initiation and construction of visual database of classified patterns. Bioinformatics 2006, 22(10):1211–1216. 10.1093/bioinformatics/btl067
- Holste D, Huo G, Tung V, Burge CB: HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res 2006, 34(Database issue):D56–62. 10.1093/nar/gkj048
- Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, Yura K, Miyazaki S, Ikeo K, Homma K, Kasprzyk A, Nishikawa T, Hirakawa M, Thierry-Mieg J, Thierry-Mieg D, Ashurst J, Jia L, Nakao M, Thomas MA, Mulder N, Karavidopoulou Y, Jin L, Kim S, Yasuda T, Lenhard B, Eveno E, Suzuki Y, Yamasaki C, Takeda J, Gough C, Hilton P, Fujii Y, Sakai H, Tanaka S, Amid C, Bellgard M, Bonaldo Mde F, Bono H, Bromberg SK, Brookes AJ, Bruford E, Carninci P, Chelala C, Couillault C, de Souza SJ, Debily MA, Devignes MD, Dubchak I, Endo T, Estreicher A, Eyras E, Fukami-Kobayashi K, Gopinath GR, Graudens E, Hahn Y, Han M, Han ZG, Hanada K, Hanaoka H, Harada E, Hashimoto K, Hinz U, Hirai M, Hishiki T, Hopkinson I, Imbeaud S, Inoko H, Kanapin A, Kaneko Y, Kasukawa T, Kelso J, Kersey P, Kikuno R, Kimura K, Korn B, Kuryshev V, Makalowska I, Makino T, Mano S, Mariage-Samson R, Mashima J, Matsuda H, Mewes HW, Minoshima S, Nagai K, Nagasaki H, Nagata N, Nigam R, Ogasawara O, Ohara O, Ohtsubo M, Okada N, Okido T, Oota S, Ota M, Ota T, Otsuki T, Piatier-Tonneau D, Poustka A, Ren SX, Saitou N, Sakai K, Sakamoto S, Sakate R, Schupp I, Servant F, Sherry S, Shiba R, Shimizu N, Shimoyama M, Simpson AJ, Soares B, Steward C, Suwa M, Suzuki M, Takahashi A, Tamiya G, Tanaka H, Taylor T, Terwilliger JD, Unneberg P, Veeramachaneni V, Watanabe S, Wilming L, Yasuda N, Yoo HS, Stodolsky M, Makalowski W, Go M, Nakai K, Takagi T, Kanehisa M, Sakaki Y, Quackenbush J, Okazaki Y, Hayashizaki Y, Hide W, Chakraborty R, Nishikawa K, Sugawara H, Tateno Y, Chen Z, Oishi M, Tonellato P, Apweiler R, Okubo K, Wagner L, Wiemann S, Strausberg RL, Isogai T, Auffray C, Nomura N, Gojobori T, Sugano S: Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2004, 2(6):e162. 10.1371/journal.pbio.0020162PubMed CentralView ArticlePubMedGoogle Scholar
- Le Texier V, Riethoven JJ, Kumanduri V, Gopalakrishnan C, Lopez F, Gautheret D, Thanaraj TA: AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics 2006, 7: 169. 10.1186/1471-2105-7-169
- Gustincich S, Sandelin A, Plessy C, Katayama S, Simone R, Lazarevic D, Hayashizaki Y, Carninci P: The complexity of the mammalian transcriptome. J Physiol 2006, 575(Pt 2):321–332. 10.1113/jphysiol.2006.115568
- Touriol C, Bornes S, Bonnal S, Audigier S, Prats H, Prats AC, Vagner S: Generation of protein isoform diversity by alternative initiation of translation at non-AUG codons. Biol Cell 2003, 95(3–4):169–178. 10.1016/S0248-4900(03)00033-9
- Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P: SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 2006, 34(Database issue):D257–60. 10.1093/nar/gkj079
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH: InterPro, progress and status in 2005. Nucleic Acids Res 2005, 33(Database issue):D201–5. 10.1093/nar/gki106
- Obenauer JC, Cantley LC, Yaffe MB: Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 2003, 31(13):3635–3641. 10.1093/nar/gkg584
- Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A: Pfam: clans, web tools and services. Nucleic Acids Res 2006, 34(Database issue):D247–51. 10.1093/nar/gkj149
- Bru C, Courcelle E, Carrere S, Beausse Y, Dalmar S, Kahn D: The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res 2005, 33(Database issue):D212–5. 10.1093/nar/gki034
- Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ: The PROSITE database. Nucleic Acids Res 2006, 34(Database issue):D227–30. 10.1093/nar/gkj063
- Gupta S, Zink D, Korn B, Vingron M, Haas SA: Strengths and weaknesses of EST-based prediction of tissue-specific alternative splicing. BMC Genomics 2004, 5(1):72. 10.1186/1471-2164-5-72
- Wilkie GS, Dickson KS, Gray NK: Regulation of mRNA translation by 5'- and 3'-UTR-binding factors. Trends Biochem Sci 2003, 28(4):182–188. 10.1016/S0968-0004(03)00051-3
- Kim N, Lim D, Lee S, Kim H: ASePCR: alternative splicing electronic RT-PCR in multiple tissues and organs. Nucleic Acids Res 2005, 33(Web Server issue):W681–5. 10.1093/nar/gki407
- Nurtdinov RN, Neverov AD, Mal'ko DB, Kosmodem'ianskii IA, Ermakova EO, Ramenskii VE, Mironov AA, Gel'fand MS: [EDAS, databases of alternatively spliced human genes]. Biofizika 2006, 51(4):589–592.
- Huang HD, Horng JT, Lee CC, Liu BJ: ProSplicer: a database of putative alternative splicing information derived from protein, mRNA and expressed sequence tag sequence data. Genome Biol 2003, 4(4):R29. 10.1186/gb-2003-4-4-r29
- Lee Y, Lee Y, Kim B, Shin Y, Nam S, Kim P, Kim N, Chung WH, Kim J, Lee S: ECgene: an alternative splicing database update. Nucleic Acids Res 2007, 35(Database issue):D99–103. 10.1093/nar/gkl992
- Kim P, Kim N, Lee Y, Kim B, Shin Y, Lee S: ECgene: genome annotation for alternative splicing. Nucleic Acids Res 2005, 33(Database issue):D75–9. 10.1093/nar/gki118
- Lee C, Atanelov L, Modrek B, Xing Y: ASAP: the Alternative Splicing Annotation Project. Nucleic Acids Res 2003, 31(1):101–105. 10.1093/nar/gkg029
- Krause A, Haas SA, Coward E, Vingron M: SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein. Nucleic Acids Res 2002, 30(1):299–300. 10.1093/nar/30.1.299
- Yura K, Shionyu M, Hagino K, Hijikata A, Hirashima Y, Nakahara T, Eguchi T, Shinoda K, Yamaguchi A, Takahashi K, Itoh T, Imanishi T, Gojobori T, Go M: Alternative splicing in human transcriptome: Functional and structural influence on proteins. Gene 2006, 380(2):63–71. 10.1016/j.gene.2006.05.015
- Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG: Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res 2001, 29(11):2338–2348. 10.1093/nar/29.11.2338
- Faustino NA, Cooper TA: Pre-mRNA splicing and human disease. Genes Dev 2003, 17(4):419–437. 10.1101/gad.1048803
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.