SpliceCenter: A suite of web-based bioinformatic applications for evaluating the impact of alternative splicing on RT-PCR, RNAi, microarray, and peptide-based studies
© Ryan et al; licensee BioMed Central Ltd. 2008
Received: 25 March 2008
Accepted: 18 July 2008
Published: 18 July 2008
Over 60% of protein-coding genes in vertebrates express mRNAs that undergo alternative splicing. The resulting collection of transcript isoforms poses significant challenges for contemporary biological assays. For example, RT-PCR validation of gene expression microarray results may be unsuccessful if the two technologies target different splice variants. Effective use of sequence-based technologies requires knowledge of the specific splice variant(s) that are targeted. In addition, the critical roles of alternative splice forms in biological function and in disease suggest that assay results may be more informative if analyzed in the context of the targeted splice variant.
A number of contemporary technologies are used for analyzing transcripts or proteins. To enable investigation of the impact of splice variation on the interpretation of data derived from those technologies, we have developed SpliceCenter. SpliceCenter is a suite of user-friendly, web-based applications that includes programs for analysis of RT-PCR primer/probe sets, effectors of RNAi, microarrays, and protein-targeting technologies. Both interactive and high-throughput implementations of the tools are provided. The interactive versions of SpliceCenter tools provide visualizations of a gene's alternative transcripts and probe target positions, enabling the user to identify which splice variants are or are not targeted. The high-throughput batch versions accept user query files and provide results in tabular form. When, for example, we used SpliceCenter's batch siRNA-Check to process the Cancer Genome Anatomy Project's large-scale shRNA library, we found that only 59% of the 50,766 shRNAs in the library target all known splice variants of the target gene, 32% target some but not all, and 9% do not target any currently annotated transcript.
SpliceCenter http://discover.nci.nih.gov/splicecenter provides unique, user-friendly applications for assessing the impact of transcript variation on the design and interpretation of RT-PCR, RNAi, gene expression microarrays, antibody-based detection, and mass spectrometry proteomics. The tools are intended for use by bench biologists as well as bioinformaticists.
Technologies commonly used by biologists to investigate gene function include quantitative RT-PCR (qRT-PCR) assays, RNA interference (RNAi) mediated by small interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), gene expression microarrays, and antibody-based protein assays. Each of those technologies targets a small nucleic or amino acid sequence that, preferably, is unique to a specific gene.
More than 60% of protein-coding genes in vertebrates exhibit splice variation [1, 2]. Alternative splicing complicates the selection of target sequences and the interpretation of the resulting data. In many cases, the targeted sequence might not be present in all of a gene's transcript forms. The prevalence of alternative splice forms raises many questions that routinely confront biologists who use oligonucleotide- or peptide-based assays. For example:
Which specific splice variants are targeted by my assay? What other splice variants exist?
Do the RT-PCR primers/probes that I plan to use to validate microarray expression results target the same splice variants as were targeted by the microarray platform?
Did an siRNA fail to mediate RNAi silencing of a gene because it did not target the dominant splice isoform?
Is there known splice variation in my gene of interest that affects the protein coding portion of the transcript? Does the antibody that I plan to use target all potential protein products?
Where could I place RT-PCR primers to target all splice variants? Where could I place RT-PCR primers to amplify one specific splice variant to the exclusion of others?
Do expression values from one microarray fail to correlate with values from another microarray because the probesets target different splice variants?
Questions such as those are not motivated by a particular research focus on alternative splicing, but rather by the need to account for the impact of splice variation in almost every high-throughput biological study. In our laboratories, for example, we have experienced several such issues, including an siRNA that failed to target the dominant transcript in a particular cell line and RT-PCR results that failed to correlate with microarray expression data because different splice forms were targeted.
In addition to the pragmatic motivations for evaluating the splice variants targeted by a given assay, there may also be scientific benefit. Alternate splice forms have been associated with tissue-specific gene functions, developmental processes, and disease states (notably cancer) [4, 5]. Genes with splice variation in the coding region produce different proteins with potentially dramatically different functions. For example, the Epidermal Growth Factor Receptor (EGFr), a major target for cancer therapy, can be expressed in the transmembrane receptor form or as a soluble isoform that competes with the receptor for binding of ligand. Transcripts with variation in the untranslated regions (UTRs) may be differentially regulated and therefore exhibit differences in spatial or temporal expression patterns.
Comparison of SpliceCenter features with those of other web-based splicing resources
- Simple, intuitive interface
- Help and sample queries
- Graphical display of gene's splice variants
- Identifies coding regions
- Identifies NMD targets
- Human, mouse, and rat
- Primer sequence query showing position in variants
- Batch high-throughput primer query
- siRNA sequence query showing position in variants
- Batch high-throughput hit/miss report for siRNAs
- Display pre-computed target positions of Affymetrix, Agilent, Illumina, and ExonHit probesets in gene's variants
- Display integrated graphic of microarray probe targets and primer target positions
- Batch high-throughput query of pre-computed probe/probeset target positions
- Peptide sequence query displaying source coding region in splice variants
- Batch high-throughput peptide query
To meet the needs of biologists as well as bioinformaticists, we have developed SpliceCenter, a comprehensive, rapid, user-friendly suite of web-based tools for identifying the alternative transcripts targeted by contemporary technologies. The SpliceCenter tools include "Primer-Check" for evaluation of qRT-PCR primers/probes, "siRNA-Check" for analysis of RNAi effectors, "Array-Check" for analysis of mRNA expression microarray probes, and "Peptide-Check" for analysis of peptides (e.g., in antibody-based binding assays and mass spectrometry). SpliceCenter applications provide the novel ability to cross-compare the technologies (e.g., with a single graphical visualization that indicates both the splice forms of a gene that are targeted by RT-PCR primers and those that are targeted by a microarray probe set).
SpliceCenter was developed using infrastructure from our previously described SpliceMiner application, including the splice variant database structure and microarray probe assignment features. SpliceCenter represents a substantial advance over our previous work in terms of the database content, website utilities, and breadth of potential users. The database was rebuilt and extended to include the latest genomic and transcript data, mouse and rat genomes, positions of coding regions, identification of NMD targets, protein sequences, and pre-computed microarray probe targets. The new website utilities provide custom, user-friendly applications tailored to the needs of bench biologists who are applying such technologies as RT-PCR, RNAi, microarrays, or peptide-based assays. The focused nature of the SpliceCenter utilities provides a significant time savings compared to our previous application [see Additional file 1]. Additional query facilities were also implemented to process protein sequences and very short oligos (e.g., siRNA and PCR primer sequences).
Construction and Content
We needed to develop new sequence-alignment components for Primer-Check and siRNA-Check because the relatively short sequence lengths (e.g., the 21-nucleotide sequences of siRNAs) were problematic for our previously developed sequence alignment functions. The new sequence search component makes use of the open source PrimerMatch application to align user-provided PCR primer or siRNA sequences with transcript sequences. We selected PrimerMatch because, unlike BLAT and BLAST, it can rapidly and accurately align very short query sequences.
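The idea of locating a very short oligo within each transcript variant can be illustrated with a minimal Python sketch. It is not the PrimerMatch algorithm: it uses plain exact substring search, and the function names, exon sequences, and variant identifiers are hypothetical values chosen only for illustration.

```python
from __future__ import annotations

# Complement table for DNA bases (N = unknown base).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def normalize(seq: str) -> str:
    """Upper-case a sequence and write RNA bases (U) as DNA bases (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA or RNA sequence, written as DNA."""
    return "".join(COMPLEMENT[base] for base in reversed(normalize(seq)))


def find_exact_hits(query: str, transcripts: dict[str, str]) -> dict[str, list[int]]:
    """Map each transcript ID to the 0-based start positions of exact matches.

    Transcripts with no match are omitted from the result.
    """
    q = normalize(query)
    hits: dict[str, list[int]] = {}
    for transcript_id, sequence in transcripts.items():
        s = normalize(sequence)
        positions = []
        start = s.find(q)
        while start != -1:
            positions.append(start)
            start = s.find(q, start + 1)  # step by one to allow overlapping matches
        if positions:
            hits[transcript_id] = positions
    return hits


# Hypothetical toy gene: one variant contains exon 2, the other skips it.
EXON1 = "ATGGCCTTAGGC"
EXON2 = "ATCGATCGGATTACAGG"
EXON3 = "CTTAAGGCCTGA"
transcripts = {
    "variant_full": EXON1 + EXON2 + EXON3,
    "variant_skip": EXON1 + EXON3,
}

# An siRNA target site written as RNA; it lies entirely within exon 2.
sirna_hits = find_exact_hits("CGGAUUACAGG", transcripts)

# A reverse primer anneals to the sense strand, so search its reverse complement.
reverse_primer_hits = find_exact_hits(reverse_complement("TCAGGCCTT"), transcripts)

print(sirna_hits)
print(reverse_primer_hits)
```

In this toy case the siRNA site is found only in the variant that retains exon 2, whereas the reverse primer site in exon 3 is found in both variants. A production tool would also need to handle mismatches and genome-scale sequence sets, which this sketch does not attempt.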
For Peptide-Check, a peptide-to-protein alignment is performed with the translated protein sequences from the RefSeq and GenBank records in our splice variant database. Alignments are converted into chromosomal coordinates, and thus into the transcript position, via EVDB mappings.
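The coordinate conversion behind a peptide query can be sketched as follows. This is a simplified illustration rather than the SpliceCenter implementation: it maps a peptide directly to transcript coordinates from an assumed coding-sequence start offset, instead of passing through chromosomal coordinates and EVDB mappings, and all names and sequences are hypothetical.

```python
from __future__ import annotations


def locate_peptide(peptide: str, protein: str, cds_start: int) -> tuple[int, int] | None:
    """Return the transcript interval that encodes `peptide`, or None if absent.

    `cds_start` is the 0-based transcript position of the first base of the
    start codon. The returned interval is 0-based and half-open: [start, end).
    Only the first exact occurrence of the peptide is reported.
    """
    index = protein.upper().find(peptide.upper())
    if index == -1:
        return None
    start = cds_start + 3 * index  # three nucleotides per amino acid
    return (start, start + 3 * len(peptide))


# Hypothetical variants of one gene: the second lacks the residues "QEGA".
variants = {
    "variant_1": {"protein": "MSTLKQEGAVR", "cds_start": 50},
    "variant_2": {"protein": "MSTLKVR", "cds_start": 20},
}

peptide = "KQEG"
locations = {
    name: locate_peptide(peptide, record["protein"], record["cds_start"])
    for name, record in variants.items()
}
print(locations)
```

A peptide that spans a region absent from one isoform is reported as missing from that variant, which is the kind of information needed to judge whether an antibody or mass spectrometry peptide is isoform-specific.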
Utility and Discussion
The SpliceCenter web-based application suite is designed to assist biologists and bioinformaticists in analyzing the impact of alternative splicing on studies of transcripts and proteins. Each application identifies the target locations of oligonucleotides or peptides within the unique splice variants of the targeted gene or genes. Interactive investigation is supported through web-based applications that return graphical results with clickable hyperlinks. High-throughput applications for processing large query files are implemented as batch applications that return text-based result files. Currently supported species are human, mouse, and rat.
Primer design and selection: Whether designing custom primers or selecting commercial ones, it is important to identify the splice variants that will be targeted. Primer-Check can be used to ensure that selected primer pairs hybridize to all variants (or specifically targeted variants) and to screen for possible cross-hybridizations.
Investigation of anomalous results: One potential reason for failure of RT-PCR primers is that they are not targeting the splice variant(s) present in the particular sample being analyzed. Primer-Check is useful for trouble-shooting RT-PCR primers that fail to provide the expected amplification product.
Validation of microarray data: As already noted, qRT-PCR is considered to be the gold standard for validation of microarray results. Primer-Check can display the target locations of PCR primers and probes and the target locations of microarray probes in a single graphical display that shows directly whether PCR primers and microarray probes do, in fact, target the same variants.
The effect of splice variation on validation of microarray expression data by qRT-PCR data is by no means hypothetical. A study by Dallas and colleagues found that such correlations were negatively impacted by splice variation if PCR primers and microarray probes targeted differently spliced transcripts. Figure 3 shows Primer-Check results for SEC23IP (P125), a gene that showed discordance between microarray and qRT-PCR results in the Dallas study. Primer-Check identifies four known splice forms of the gene. Two Affymetrix probe sets on the U133A GeneChip target the RefSeq-annotated transcript (NM_007190) that corresponds to SEC23IP. However, one of those probe sets, 216392_s_at, also targets two additional transcript variants (BC063800 and AB019435), missing the fourth. The second probe set, 209175_at, targets only one of those additional variants (BC063800), missing the other two. In contrast, the PCR primers and probes from Applied Biosystems target all four reported variants corresponding to SEC23IP, leading to discordance between the qRT-PCR and microarray results. Primer-Check can help diagnose or even avoid such problems.
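The SEC23IP comparison described above amounts to comparing sets of targeted variants, which can be sketched in a few lines of Python. The variant assignments are taken from the text; the fourth variant is not named there, so the identifier "unnamed_variant_4" is a placeholder, and the function name is hypothetical.

```python
from __future__ import annotations


def compare_targets(first: set[str], second: set[str]) -> dict[str, set[str]]:
    """Split the variants targeted by two assays into shared and assay-specific sets."""
    return {
        "shared": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


# Four known SEC23IP splice forms; the fourth is not named in the text,
# so "unnamed_variant_4" is a placeholder identifier.
ALL_VARIANTS = {"NM_007190", "BC063800", "AB019435", "unnamed_variant_4"}

targets = {
    "216392_s_at": {"NM_007190", "BC063800", "AB019435"},
    "209175_at": {"NM_007190", "BC063800"},
    "qRT-PCR primers": set(ALL_VARIANTS),
}

for probe_set in ("216392_s_at", "209175_at"):
    result = compare_targets(targets["qRT-PCR primers"], targets[probe_set])
    # Variants amplified by the primers but not measured by the probe set.
    print(probe_set, "misses:", sorted(result["only_first"]))
```

A non-empty "only_first" or "only_second" set flags a pair of assays whose measurements may diverge whenever the unshared variants are expressed.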
RNAi technologies based on exogenously administered siRNAs or shRNAs are used extensively to investigate gene function. For compactness in the following descriptions, we will often use the term "siRNA" to include all of the standard RNAi effector molecules. siRNAs mediate sequence-specific gene silencing through targeted cleavage of a transcript via the RNA interference pathway. Selecting an siRNA sequence that effectively targets a gene is a complex task that requires in silico prediction of the ability of the siRNA to mediate cleavage of the targeted transcript(s) while avoiding partially homologous sequences of other genes. Databases of experimentally-validated siRNAs and several tools to aid in design are available [18, 19]. To achieve the goal of maximally silencing protein expression, it is safest to ensure that all protein-encoding transcript variants are targeted by the siRNA. Hence, in most cases siRNAs have been designed to target all splice variants of a gene that are found in the RefSeq database. But because RefSeq was not designed or intended to include all known transcripts, non-RefSeq splice variants may not be targeted. If, for example, an siRNA has been successful in one cell type and then fails to silence expression in another, the two cell types may be expressing different splice variants. siRNA-Check can be used to confirm targeting of all known variants or selective targeting of a particular variant (and, therefore, silencing of a particular protein isoform). The following are typical uses of siRNA-Check:
Selection or design of siRNA (or shRNA) sequences: Whether designing custom siRNAs or selecting commercial ones, it is important to understand which variants will be targeted. The siRNA-Check application can be used to confirm targeting of all variants or selective targeting of a particular variant (and, therefore, silencing of a particular protein isoform). In interactive mode, the application identifies siRNA target sequences within a gene via an intuitive graphical display. If an siRNA targets a sequence that occurs in more than one gene, multiple graphics panels, one for each gene, are displayed.
Clarification of anomalous results: siRNA-Check provides a quick, easy way to investigate the possibility that failure to silence a gene is due to splice variation. To cite one example from our own work, when we were trying to knock down expression of two apoptosis-associated genes, BAD and YWHAZ, we observed differential expression of some of the untargeted transcript variants. For example, as shown in Additional Figure 1a [see Additional file 2], two siRNAs that target BAD (siBAD.1 and siBAD.3) mediated a significant decrease in mRNA levels when all variants of the gene were assayed (using the Branched DNA-RNA Quantigene assay, Panomics, Fremont, CA). But siBAD.2 produced no knockdown. Transcript-specific qRT-PCR showed that NM_004322, the transcript variant targeted by siBAD.2, represents only 1% of BAD mRNA levels in the cell line studied. We saw analogous results for the gene YWHAZ (Additional Figure 1b) [see Additional file 2].
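A batch hit/miss report of the kind used to summarize an shRNA library (all, some, or no known variants targeted) can be approximated with the sketch below. The category labels, function names, and toy records are assumptions made for illustration; they do not reproduce the output format of the batch siRNA-Check application.

```python
from __future__ import annotations

from collections import Counter


def classify_coverage(targeted: set[str], known: set[str]) -> str:
    """Return 'all', 'some', or 'none' for one effector against a gene's known variants."""
    hit = targeted & known
    if not hit:
        return "none"
    return "all" if hit == known else "some"


def summarize(categories: list[str]) -> dict[str, float]:
    """Return the percentage of effectors in each coverage category."""
    if not categories:
        return {"all": 0.0, "some": 0.0, "none": 0.0}
    counts = Counter(categories)
    total = len(categories)
    return {label: 100.0 * counts[label] / total for label in ("all", "some", "none")}


# Hypothetical known variants per gene and the variants each shRNA matches.
known_variants = {
    "GENE_A": {"A.1", "A.2"},
    "GENE_B": {"B.1", "B.2", "B.3"},
}
batch = [
    ("sh1", "GENE_A", {"A.1", "A.2"}),
    ("sh2", "GENE_A", {"A.1"}),
    ("sh3", "GENE_B", set()),
    ("sh4", "GENE_B", {"B.1", "B.2", "B.3"}),
]

categories = [classify_coverage(hit, known_variants[gene]) for _, gene, hit in batch]
summary = summarize(categories)
print(categories)
print(summary)
```

Applied to a full library, the "some" and "none" categories identify effectors whose knockdown results deserve a closer look in the interactive view.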
Microarray platform evaluation: Figure 4 shows an Array-Check comparison of the Affymetrix U95A, U133A, and U133 Plus 2 arrays in their coverage of the ACP1 gene. Array-Check thus provides a quick means of performing a side-by-side comparison of the coverage of splice variants by microarray platforms. It can be used in the mining of historical microarray datasets to ensure that an older platform provided good coverage of all variants of the gene of interest. It may also be useful in selecting a platform for a new study. As shown in the figure, the U133A and U133 Plus 2 Affymetrix arrays provide better coverage of the ACP1 variants than does the U95A array.
Microarray platform correlation: Comparison of expression values from different microarray platforms is prone to misinterpretation if the platforms target different splice variants. Array-Check provides a mechanism for comparison of probe target locations to identify potential splicing-related differences. For example, Figure 4 shows that correlation between probe set 36611_at on the U95A array and probe set 1554808_at on the U133 Plus 2 array is unlikely because those probe sets measure non-overlapping subsets of the splice variants of ACP1.
Trouble-shooting anomalous results: Alternative splicing is a potential source of inconsistent expression measurements among the probes in a nominal probe set. Array-Check provides a rapid means for ascertaining the known variants that are targeted or missed by probes on a given microarray platform. Older microarrays were designed before the availability of detailed annotation of many of the recently-identified transcript variants. Array-Check indicates splice variant coverage in the context of up-to-date information on transcript variation.
Alternative splicing plays a critical role in higher organisms by increasing the functional diversity of proteins. Isoforms that differ minimally in structure may perform very different functions or may perform the same function in different cell types or at different stages of development. Failure to take splice variation into account can lead to inaccurate or incorrect interpretation of experimental results. Mass spectrometry and antibody-binding assays, the most common technologies in proteomic research, are susceptible to such problems. For those technologies, Peptide-Check provides a simple interface that accepts one or more short peptide sequences, generates a visualization of the known splice variants of the source gene, and shows the location, within the mRNA transcript, of the nucleotide sequence that codes for the peptide. Common use-cases include the following:
Design and analysis of peptide immunogens and antigens: Peptide-Check has many applications in the context of technologies that use antibodies or other ligands that target peptide sequences. To cite one common example, animals are often immunized with a peptide to generate antibodies against a specific protein. Peptide-Check can assist in selecting an immunizing peptide that occurs in all splice forms of the protein or, conversely, in only one particular form. The latter type of specificity may be particularly useful for identification of the biological or pathological roles of individual protein isoforms. For example, antibodies raised against peptides that represent unique splice variants of p53 have helped to elucidate details of the molecule's tumor suppressor function. Peptide-Check provides a rapid method for identifying the target variants of those p53 antibodies [see Additional file 3]. It should be noted that Peptide-Check is capable of processing only sequential peptide epitopes; it cannot help with conformational epitopes that are composed of multiple sequences within a protein.
Analysis of mass spectrometry results: Mass spectrometry is increasingly being used to identify and/or quantify proteins in a biological sample after peptidolysis. The first step is to identify peptides on the basis of mass/charge ratios, partial sequences, and/or chromatographic elution times. The identity of the original protein is then inferred from the peptides by any of a number of available software packages (reviewed by Xu and Ma). Peptide-Check can then be queried to explain the presence or absence of peptides that correspond to a given protein isoform and perhaps to give information on which isoforms are expressed in the sample. In principle, knowledge of splice variation could also be included in calculation of the protein identification probabilities provided by peptide fingerprinting programs.
Although alternative splicing is a ubiquitous and functionally critical phenomenon in eukaryotic gene expression, fluent software tools have not been available to assist researchers, particularly bench biologists, in determining which splice variants are targeted by particular qRT-PCR primer sets, RNAi effectors, microarray platforms or peptide-targeting reagents. Increasingly, all of those methods are being used, separately or in combination, to analyze gene function. SpliceCenter's integrated suite of applications for correlating experimental findings with the transcriptional structure of a gene should significantly aid in elucidating the roles played by splice variation in a wide range of biological processes and diseases. The SpliceCenter applications, currently including Primer-Check, Array-Check, siRNA-Check, and Peptide-Check, provide user-friendly, web-based tools for the biologist and bioinformaticist.
Availability and requirements
The SpliceCenter website is available for use by academic, government, or commercial users without restriction or charge. The address of the site is: http://discover.nci.nih.gov/splicecenter. We recommend using Internet Explorer (version 6.0 or later) or Firefox (version 2.0 or later) but are not aware of compatibility issues with other browsers.
This research was supported in part by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research and in part by Tiger Team Consulting. We thank Peter Dallas of the University of Western Australia for providing background and data from his study of the correlation between microarray and RT-PCR expression values. We also thank Dr. Nathan Edwards of the University of Maryland for developing the open source PrimerMatch application and for assisting us with its use. We thank Tamara Jones, Gene Silencing Section, GB, CCR, NCI, for assistance in assessing the siRNA-Check application; David Kane of SRA International and the Genomics & Bioinformatics Group, LMP, CCR, NCI, for assistance in deploying the applications on the web; and Dr. Philip Lorenzi, LMP, CCR, NCI, and Dr. Jennifer Weller, BRC, University of North Carolina at Charlotte, for useful discussion.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860–921. 10.1038/35057062
- Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 2003, 302(5653):2141–2144. 10.1126/science.1090100
- Martin SE, Jones TL, Thomas CL, Lorenzi PL, Nguyen DA, Runfola T, Gunsior M, Weinstein JN, Goldsmith PK, Lader E, Huppi K, Caplen NJ: Multiplexing siRNAs to compress RNAi-based screen size in human cells. Nucleic Acids Res 2007, 35(8):e57. 10.1093/nar/gkm141
- Hu GK, Madore SJ, Moldover B, Jatkoe T, Balaban D, Thomas J, Wang Y: Predicting splice variant from DNA chip expression data. Genome Res 2001, 11(7):1237–1245. 10.1101/gr.165501
- Garcia-Blanco MA, Baraniak AP, Lasda EL: Alternative splicing in disease and therapy. Nat Biotechnol 2004, 22(5):535–546. 10.1038/nbt964
- Thierry-Mieg D, Thierry-Mieg J: AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol 2006, 7 Suppl 1: S12.1–14.
- Kim N, Alekseyenko AV, Roy M, Lee C: The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species. Nucleic Acids Res 2007, 35(Database issue):D93–8. 10.1093/nar/gkl884
- Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, Tang Y, Barbosa-Morais NL, Thanaraj TA: ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res 2006, 34(Database issue):D46–55. 10.1093/nar/gkj031
- de la Grange P, Dutertre M, Correa M, Auboeuf D: A new advance in alternative splicing databases: from catalogue to detailed analysis of regulation of expression and function of human alternative splicing variants. BMC Bioinformatics 2007, 8: 180. 10.1186/1471-2105-8-180
- Holste D, Huo G, Tung V, Burge CB: HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res 2006, 34(Database issue):D56–62. 10.1093/nar/gkj048
- Bhasi A, Pandey RV, Utharasamy SP, Senapathy P: EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 2007, 23(14):1815–1823. 10.1093/bioinformatics/btm084
- Kim P, Kim N, Lee Y, Kim B, Shin Y, Lee S: ECgene: genome annotation for alternative splicing. Nucleic Acids Res 2005, 33(Database issue):D75–9. 10.1093/nar/gki118
- Rambaldi D, Felice B, Praz V, Bucher P, Cittaro D, Guffanti A: Splicy: a web-based tool for the prediction of possible alternative splicing events from Affymetrix probeset data. BMC Bioinformatics 2007, 8 Suppl 1: S17. 10.1186/1471-2105-8-S1-S17
- Kahn AB, Ryan MC, Liu H, Zeeberg BR, Jamison DC, Weinstein JN: SpliceMiner: a high-throughput database implementation of the NCBI Evidence Viewer for microarray splice variant analysis. BMC Bioinformatics 2007, 8: 75. 10.1186/1471-2105-8-75
- NCBI Evidence Viewer [http://www.ncbi.nlm.nih.gov/sutils/static/evvdoc.html]
- Primer Match [http://www.umiacs.umd.edu/~nedwards/research/primer_match.html]
- Dallas PB, Gottardo NG, Firth MJ, Beesley AH, Hoffmann K, Terry PA, Freitas JR, Boag JM, Cummings AJ, Kees UR: Gene expression levels assessed by oligonucleotide microarray analysis and quantitative real-time RT-PCR -- how well do they correlate? BMC Genomics 2005, 6(1):59. 10.1186/1471-2164-6-59
- Pei Y, Tuschl T: On the art of identifying effective and specific siRNAs. Nat Methods 2006, 3(9):670–676. 10.1038/nmeth911
- NCBI RNAi [http://www.ncbi.nlm.nih.gov/projects/genome/rnai/]
- Lee JC, Stiles D, Lu J, Cam MC: A detailed transcript-level probe annotation reveals alternative splicing based microarray platform differences. BMC Genomics 2007, 8: 284.
- Bourdon JC: p53 and its isoforms in cancer. Br J Cancer 2007, 97(3):277–282. 10.1038/sj.bjc.6603886
- Xu C, Ma B: Software for computational peptide identification from MS-MS data. Drug Discov Today 2006, 11(13–14):595–600. 10.1016/j.drudis.2006.05.011
- The Cancer Genome Anatomy Project, RNAi [http://cgap.nci.nih.gov/RNAi]
- Applied Biosystems [http://appliedbiosystems.com]
- Bourdon JC, Fernandes K, Murray-Zmijewski F, Liu G, Diot A, Xirodimas DP, Saville MK, Lane DP: p53 isoforms can regulate p53 transcriptional activity. Genes Dev 2005, 19(18):2122–2137. 10.1101/gad.1339905
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.