Different effects of the probe summarization algorithms PLIER and RMA on high-level analysis of Affymetrix exon arrays
© Qu et al. 2010
Received: 14 October 2009
Accepted: 28 April 2010
Published: 28 April 2010
Skip to main content
© Qu et al. 2010
Received: 14 October 2009
Accepted: 28 April 2010
Published: 28 April 2010
Alternative splicing is an important mechanism that increases protein diversity and functionality in higher eukaryotes. Affymetrix exon arrays are a commercialized platform used to detect alternative splicing on a genome-wide scale. Two probe summarization algorithms, PLIER (Probe Logarithmic Intensity Error) and RMA (Robust Multichip Average), are commonly used to compute gene-level and exon-level expression values. However, a systematic comparison of these two algorithms on their effects on high-level analysis of the arrays has not yet been reported.
In this study, we showed that PLIER summarization led to over-estimation of gene-level expression changes, relative to exon-level expression changes, in two-group comparisons. Consequently, it led to detection of substantially more skipped exons on up-regulated genes, as well as substantially more included (i.e., non-skipped) exons on down-regulated genes. In contrast, this bias was not observed for RMA-summarized data. By using a published human tissue dataset, we compared the tissue-specific expression and splicing detected by Affymetrix exon arrays with those detected based on expressed sequence databases. We found the tendency of PLIER was not supported by the expressed sequence data.
We showed that the tendency of PLIER in detection of alternative splicing is likely caused by a technical bias in the approach, rather than a biological bias. Moreover, we observed abnormal summarization results when using the PLIER algorithm, indicating that mathematical problems, such as numerical instability, may affect PLIER performance.
Alternative splicing (AS) contributes greatly to protein diversity throughout the evolution of complex organisms. According to Johnson et al. , about 70% of human multi-exon genes are predicted to have more than one isoform. Changes in the relative expression levels of the various isoforms may have significant biological implications (for example ). Genome-wide surveys of AS events have only become practical in recent years, largely due to the development of microarray technology. The Affymetrix exon array is one of the microarray platforms available for this purpose. It has been applied in a number of research areas [3–5], especially in cancer studies [6–8]. While traditional gene expression arrays target each transcript near the 3' end, Affymetrix Exon arrays target individual exons in the gene, thus enabling both gene-level expression analysis and detection of AS.
The Affymetrix GeneChip Human Exon 1.0 ST array has an extremely high probe density. The platform contains over 1.4 million probesets, each of which contains four perfect match (PM) probes that cover over 1 million exons. While this design has added new capacity to the microarray platform, it also poses new challenges in data analysis. First of all, gene-level and exon-level expression values need to be estimated using probe signals. This process is called summarization and several algorithms have been proposed for this purpose (for example ). The most commonly used summarization algorithms are RMA and PLIER, both of which are implemented in the Expression Console software provided by Affymetrix.
where BKG ij is the background value specific for array i and probe j.
Different approaches are taken to fit the models. RMA log-transforms the PM intensities, and uses median polish to obtain the robust estimates of log(t i ) and log(f j ). In contrast, PLIER algorithm works on the PM intensities directly without log-transformation. It defines r ij = log(e ij ). In order to down-weigh outliner probes with large absolute values of r ij , the loss function is specified as , where z is a tuning constant for robustness. Newton's method is applied to find the values of f and t that minimize the loss function. The IterPLIER method, which is an extension of PLIER algorithm, generates gene-level signals based on consecutive exons [10, 11]. A systematic comparison of PLIER and RMA summarization has not been reported. In this study, by using two public datasets, we found that IterPLIER and RMA derived different gene-level estimates from the same probe signals. Highly expressed probesets made more contribution to the gene-level signal in IterPLIER, compare to RMA.
Identification of differential splicing is also a challenge in microarray analysis. Changes in exon-level expression can be caused by two factors: differential splicing and differential gene expression. To detect differential splicing, the effect of differential gene expression must be removed. A commonly used strategy to achieve this is to calculate a NI (normalized index) for each exon, which is denoted as the ratio of the exon-level signal to the gene-level signal . NI represents the exon inclusion rate and can be used in statistical testing to detect differential splicing between sample groups. This strategy eliminates gene-level expression in a simple manner. However, one possible disadvantage of this method is that it relies heavily on correct estimation of gene-level expression. When comparing PLIER and RMA, we discovered that the two methods behaved differently in detection of alternatively spliced exons and we found that a major cause for this phenomenon was the difference in estimation of gene expression between the two methods.
To assess the ability of PLIER and RMA to detect AS, a relatively large number of validated AS events are required. RT-PCR is often performed to verify microarray results, but large-scale validation of exon array results with RT-PCR can be very laborious and impractical. In this study, we proposed a different strategy. A large amount of data on expressed sequences have been collected for various human tissues and are available in public sequence databases (Refseq, dbEST, etc.). Noh et al.  previously used these data to identify tissue-specific expression and splicing. Their results are summarized in the TISA database. Since human tissues are expected to have good homogeneity, we compared a published human tissue panel dataset of Affymetrix exon arrays with these sequence analysis results. We measured the level of consistency between the two platforms and tested whether the tendency of PLIER or RMA to detect AS was supported by the sequence data.
The colon cancer dataset is available from the Affymetrix website . Briefly, 10 matched pairs of human colon primary tumor and adjacent normal tissues were assayed on Affymetrix Human Exon 1.0 ST arrays. Only genes and exons with core annotation were considered in this study (the terms "exon" and "probeset" and the terms "gene" and "transcript" are used interchangeably hereafter). Patient 3 was removed since he was identified as an outlier by PCA analysis .
The human tissue panel dataset is also available from the Affymetrix website . This dataset contains 11 tissues: breast, cerebellum, heart, kidney, liver, muscle, spleen, pancreas, prostate, testis and thyroid. Each tissue was assayed on Affymetrix Human Exon 1.0 ST arrays in three biological replicates. Only genes and exons with core annotation were used in this study.
Gene-level and exon-level signals were generated from CEL files with Expression Console v1.1 using the PLIER and RMA algorithms. In this paper, "PLIER summarization" refers to gene-level and exon-level signals derived with the iterPLIER and PLIER algorithms, respectively; while "RMA summarization" refers to both gene-level and exon-level signals derived with the RMA algorithm. RMA-generated signals were reported on a log2 scale, while PLIER-generated signals were reported on a linear scale. As recommended in , a value of 16 was added to the gene- and exon-level PLIER signals prior to log transformation. Unless otherwise specified, gene- and exon- level signals mentioned hereafter are assumed to be presented on a log2 scale.
To reduce the false positive rate when comparing cancer vs. normal samples in the colon cancer dataset and when identifying tissue specificity in the human tissue dataset, we filtered probesets according to the suggestions in  and . For the colon cancer dataset, we defined a probeset as present in a group if its DABG (detection above background) p-value < 0.05 in at least 50% of the samples in the group. We defined a transcript as present in a group if at least 50% of the core probesets of the transcript were present in the group. We retained probesets present in either of the two groups and transcripts present in both groups. Probesets with cross-hybridization type other than 1 (i.e., not all probes uniquely match the targeted exon) were also removed. Moreover, we kept only the genes with IterPLIER signal > 70 in order to increase the true positive rate (according to ).
For the human tissue dataset, similar filtering steps were performed. First, we defined a probeset as present in a tissue if its DABG p < 0.05 in at least 2 samples (out of 3) of that tissue. Since there are a total of 11 tissues in the dataset, we filtered for: (1) probesets present in either the test tissue or five of the other tissues; (2) genes present both in the test tissue and in five of the other tissues; (3) probesets with type 1 cross-hybridization. In addition, we retained only genes for which the mean expression ranked in the top 50% for both IterPLIER and RMA, since genes with low expression levels may associate with higher false positive rates in the detection of AS .
where E i and G j are the expression values of exon i and gene j, respectively.
For the colon cancer dataset, paired t-tests on gene-level signals were used to detect differential expression between normal and cancer samples. Paired t-tests on NI were used to detect differential splicing between the two groups. For the human tissue dataset, two sample t-tests were used on gene-level signals to detect tissue-specific expression (one tissue vs. all the others). Two sample t-tests were used on NI to detect tissue-specific splicing (one tissue vs. all the others).
TISA data was obtained from the website http://tisa.kribb.re.kr/AGC. Genes and exons in the database were mapped to transcripts and probesets on the exon array based on physical position. For each tissue in the TISA database, we counted the number of exons located on tissue-specific genes and with reported tissue-specific splicing. Then we conducted the chi-square test to determine whether the ratio of relatively skipped exons to relatively included exons was different from 1:1 to see if there is an enrichment of skipped exons on the tissue-specific genes. Tissue-specific expression and splicing events detected using the human tissue dataset were compared to the TISA database (see Additional file 1 for details).
In microarray analysis, usually the fold change of gene expression between arrays, rather than the gene expression value itself, is of interest. In perfect case where the expression levels of exons change exactly proportionally across arrays, the "weighing scheme" may not affect the estimation of gene-level expression changes, however, as we will shown in the next section, in real situation, the two methods did result in different estimation of the fold change of gene expression, which lead to detection of different AS events with NI-based methods.
The two datasets were used to compare exons identified as alternatively spliced with either PLIER or RMA. For the colon cancer dataset, we compared cancer vs. normal samples and for the human tissue dataset, we compared cerebellum vs. non-cerebellum tissues, due to the high number of reported tissue specific splicing events in the brain.
After filtering, a total of 54,908 core probesets located in 3,552 core transcripts were retained in the colon cancer dataset. In the human tissue dataset, for the comparison of cerebellum vs. non-cerebellum tissues, 111,703 core probesets located in 7,686 core transcripts were kept for further analysis.
where SI represents the difference in NI for exon i in gene j in the kth sample pair, and ΔG represents the difference in expression levels of gene j for the kth sample pair.
where SI ij and ΔG j represent the mean difference in inclusion rates of exon i and in expression levels of gene j between the cerebellum vs. non-cerebellum tissues, respectively.
In both datasets, we found that SI was strongly negatively correlated with ΔG for PLIER-summarized data, while SI and ΔG were correlated to a much lesser extent (though the correlation was also significant, p < 0.01) for RMA-summarized data. Person and Spearmen correlations between SI i, j, kand ΔG j, kin the colon cancer dataset were as large as -0.492 and -0.456, respectively, for the PLIER-summarized data. In contrast, the correlations were 0.048 and 0.045 for the RMA-summarized data. Similarly, in the human tissue dataset, the Person and Spearmen correlations between SI i, jand ΔG j were -0.58 and -0.60 when using PLIER, and -0.021 and 0.055 when using RMA, respectively.
Detection of alternatively spliced exons on differentially expressed genes: (A) Human tissue dataset, (B) Colon cancer dataset
Significance level of AS
# of significant probesets
# of significant probesets located on differentially expressed genesa(1)
# of probesets in (1) and with mean gene expression difference (2)
# of probesets in (2) and with mean NI difference
>0 559 (8.6%)
<0 5954 (91.4%)
>0 4086 (96.7%)
<0 138 (3.3%)
>0 1736 (10.5%)
<0 14729 (89.5%)
>0 9641 (94.6%)
<0 573 (5.4%)
>0 2485 (56.7%)
<0 1896 (43.3%)
>0 1067 (64.6%)
<0 585 (35.4%)
>0 7295 (59.2%)
<0 5023 (40.8%)
>0 3575 (56.7%)
<0 2735 (43.3%)
Significance level of AS
# of significant probesets
# of significant probesets located on differentially expressed genes b (1)
# of probesets in (1) and with mean gene expression difference (2)
# of probesets in (2) and with and mean NI difference
>0 3 (1.5%)
<0 197 (98.5%)
>0 20 (100%)
<0 0 (0%)
>0 134 (4.7%)
<0 2696 (95.3%)
>0 436 (96.2%)
<0 17 (3.8%)
>0 35 (33.3%)
<0 70 (66.7%)
<0 2 (100%)
>0 1372 (55.3%)
<0 1110 (44.7%)
>0 60 (50%)
<0 60 (50%)
Table 1 shows the number of probesets that were identified as having significant AS and were located on up- or down-regulated genes. Two significance levels of AS were considered, p < 0.001 and p < 0.05. For PLIER-summarized data, at least 89.5% of the significant probesets detected on up-regulated genes were relatively skipped (with SI < 0) and at least 94.6% of the significant probesets detected on down-regulated genes were relatively included (with SI > 0). In contrast, for RMA summarized data, substantially more balanced numbers of skipped and included probesets were detected on both up- and down-regulated genes (the ratio of numbers of these two kinds of probesets ranged between 3:7 and 7:3 in all but one case, where the total number of probesets under consideration was only 2). (Table 1
One possible reason for the observed difference between PLIER and RMA summarization is that, the multiplicative error model may not be completely hold for probes with low intensities. As it is generally thought that low intensity features are likely to be associated with larger coefficients of variation, lowly expressed probesets are typically filtered out in microarray analysis. So to minimize the loss function , PLIER probably down-weighs features with low intensities. The iterPLIER algorithm may further strengthen the tendency of "selecting against" low intensity features, since it first computes the gene-level expression using all the probes with PLIER, and then iteratively selects a subset of probes whose signal varies in a similar pattern as the predicted gene-level expression to redo the calculation. Another point to be considered is that, as suggested by the moderately positive correlation between the average exon-level expression and expression difference between sample groups, the assumption that f j is independent of signal intensity is probably not completely hold either, which may also have an influence on gene-level estimation.
Enrichment of skipped exons on tissue-specific genes in the human tissue dataset
Pearson correlation between SI and ΔG
Spearman correlation between SI and ΔG
enrichment of skipped probesets (p < 0.001) on tissue-specific genes
enrichment of skipped probesets (p < 0.05) on tissue-specific genes
The TISA database contains 46 tissues. A total of 4,527 exons were found to be involved in the 3,695 tissue-specific splicing events (with p < 0.05), while 1,753 of these exons were located on genes that were specifically expressed in the same tissue (an exon may be counted multiple times if it is involved in more than one tissue-specific splicing event). Nine exons with inconsistent tissue specificity (i.e., presents in both isoform A, which was reported to be more enriched in a given tissue, and isoform B, which was reported to be less enriched in the same tissue) were removed, and the remaining 1,744 exons were kept for further analysis.
Numbers of exons located on tissue-specific genes and with tissue-specific splicing, according to TISA database
# exons with tissue-specific splicing and located on genes with tissue-specific expression (1)
# exons in (1) and were relatively included in that tissue
# exons in (1) and were relatively skipped in that tissue
3 splicing forms only
10 tissues only
3 splicing forms+10 tissues only
all exons mapped to core probesets
3 splicing forms+ mapped to core probesets only
10 tissues +mapped to core probesets only
3 splicing forms+10 tissues+ mapped to core probesets only
This study compared gene- and exon-level tissue specificity identified using the exon arrays with the TISA database to assess reliability of the exon array platform and the performance of PLIER and RMA. The mapping between the two platforms showed reasonable agreement in gene- and exon-sequence clustering. At the gene level, we observed significant consistency between the two platforms in detection of tissue-specific expression for both PLIER- and RMA-summarized data. RMA performing slightly better than PLIER in distinguishing "true positives" from "true negatives", where the tissue-specific expression reported in TISA were assumed to be true events. However, at the exon-level, the consistency between the two platforms was not significant, regardless of the summarization method. In addition, the difference between the two methods was not significant. Due to the lack of significant agreement between the TISA database and the tissue dataset, we did not reach a conclusion as to which method was better (see Additional file 1 for details).
Since the IterPLIER algorithm is based on PLIER, and PLIER relies on Newton's method to find the best solution for parameters, the observed phenomena may be due to numerical instability, which can cause the algorithm to be trapped in a local maximum, resulting in retrieval of an abnormal solution for the parameters. By using the APT software (Affymetrix Power Tools) and choosing different parameters for controlling the PLIER algorithm, similar problems can be avoided in some cases, but not in all (data not shown).
In this study, we found that the two commonly used summarization algorithms, PLIER and RMA, behaved differently in detection of AS. Due to different gene-level estimation, PLIER showed a strong tendency to detect relatively skipped exons on up-regulated genes and relatively included exons on down-regulated genes, while this tendency was not observed when using RMA. To determine whether this tendency of PLIER represents a real biological situation, we used tissue-specific expression and splicing events that have been identified with sequence data and summarized in the TISA database as references. The TISA data did not show significant enrichment of skipped exons on genes with tissue-specific expression, a finding that did not support the tendency of PLIER. So we concluded that the observed tendency of PLIER is due to technical bias. We also compared the performance of RMA and PLIER in detection of AS by using tissue-specific splicing events in the TISA database as true positives. The consistency between the exon array data and the TISA database was low for both summarization methods, and the difference between the two methods was not significant. Given the observed bias of PLIER, this result may suggest that the efficacy of the RMA algorithm can be further improved as well. More sophisticated methods that incorporate sequence information or other characteristics of the probes may help to achieve more accurate estimation of gene- and exon-level expression .
Probe Logarithmic Intensity Error
Robust Multichip Average
Detection above Background
Tissue-specific Alternative splicing
This work was supported by Shanghai Natural Science Foundation (07ZR14084).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.