It would be very useful to be able to predict which samples will go on to give interpretable results after hybridisation, based on their pre-chip quality control variables. In some cases, for instance when using post-mortem human tissue, over which the researcher often has little control before preparing the RNA, these considerations become paramount. We examined the results we had obtained using 117 brain RNA samples hybridised to U133A GeneChips to see if we could establish any relationships between pre- and post-chip variables. Because it might also aid the interpretation of data from publicly available resources, we examined whether the RIN could be predicted from the post-chip measures.
The two measures of RNA integrity we used were highly correlated; these in turn correlated with cRNA yield, indicating that more intact total RNA leads to better yields at the end of the sample preparation process. Indeed, the few samples that did not reach the Affymetrix-recommended 15 μg cRNA yield were largely generated from poor quality RNA. As we did not hybridise the samples we judged (by subjective assessment) to have RNA that was too degraded, or that did not generate sufficient cRNA, we cannot judge their impact on our post-chip measures. To date there are no reports assessing the performance of the Agilent-generated RIN against GeneChip data quality. Using the post-chip quality control measures in our experiment, we found that samples with a RIN > 5.5 produced expression data of sufficient quality to be included in analyses. We found that longer post-mortem intervals were associated with poorer quality RNA (lower RIN or SUBQUAL), as might be expected, although the level of correlation was low, around -0.2. Tomita et al. found that RNA integrity measures correlated well with their measure of post-chip performance but that PMI was not correlated with RNA integrity measures. In common with previous reports, they did find that agonal factors (e.g. terminal coma and pyrexia) correlated better with RNA integrity than PMI did [4, 18]. Of the three samples in our study derived from cases with a prolonged agonal state, none were clear outliers with respect to RNA quality or cRNA yield compared with the rest of the RNA samples in this experiment, yet all three were clear outliers on the post-chip measures; with so few such samples, and with quality so variable in our experiment, we cannot draw firm conclusions about agonal state here. RNA integrity is also related to tissue pH [4, 18–20], and pH has been used as a surrogate measure of RNA integrity resulting from pre- and post-agonal events.
We did not measure this in our samples, but given the relative ease of doing this test, it would be a worthwhile measure in the absence of clinical information. The variation in RNA quality observed in our samples may in part be due to PMI and agonal state, but these factors seem to play only a small part in the variation of quality observed. Other factors such as technical problems (e.g. freezing and storage) and unknown physiological processes likely had a greater impact.
The post-chip measures can be generated from data available in the public databases, but the pre-chip quality control measures are usually not provided. It might therefore prove useful to predict the RIN, as an objective measure of RNA integrity, retrospectively. It is not clear how generalisable the models generated from our data will be, although they indicate that there may be relationships that can provide an estimate of this information; only examination of a large number of varied data sets will give a true indication of their general validity. Nevertheless, one of these predictors might be used to obtain a 'quick and dirty' estimate of the quality of the RNA from which the data were derived. More importantly, this analysis highlights the post-chip measures that best predict RNA quality (B_ACTIN35 and GAPDH35, and to a lesser extent SF) and gives an estimate of their relationship. These measures can be used to flag chips for exclusion from analysis on the basis of their outlier status, which should improve data quality, particularly in small datasets or in datasets combined from several experiments.
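The kind of retrospective 'quick and dirty' estimate described above can be sketched as an ordinary least-squares fit of RIN on the post-chip measures. The sketch below uses the variable names from this study (B_ACTIN35, GAPDH35, SF), but the data and the simulated relationships between them are synthetic illustrations, not the coefficients estimated from our experiment.

```python
import numpy as np

# Illustrative data: per-chip post-chip QC measures. Names follow the paper;
# the values and generating relationships are synthetic, not from the study.
rng = np.random.default_rng(0)
n = 40
rin = rng.uniform(4.0, 9.0, n)                   # "true" RIN for the simulation
b_actin35 = 10.0 - rin + rng.normal(0, 0.3, n)   # 3'/5' ratio rises as RIN falls
gapdh35 = 4.0 - 0.4 * rin + rng.normal(0, 0.15, n)
sf = 6.0 - 0.5 * rin + rng.normal(0, 0.4, n)     # scaling factor

# Design matrix with intercept; fit RIN ~ B_ACTIN35 + GAPDH35 + SF.
X = np.column_stack([np.ones(n), b_actin35, gapdh35, sf])
coef, *_ = np.linalg.lstsq(X, rin, rcond=None)
rin_hat = X @ coef

# Flag chips whose predicted RIN falls below a working threshold
# (the empirical cut-off in this study was RIN > 5.5).
flagged = rin_hat < 5.5
print(f"r = {np.corrcoef(rin, rin_hat)[0, 1]:.2f}, flagged {flagged.sum()} chips")
```

In practice the coefficients would be refitted on whichever dataset is at hand, and, as noted above, their generalisability across experiments is untested.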
Yield of cRNA is significantly correlated with measures of RNA integrity. It is thus difficult to know whether yield per se is related to PMI or whether this is simply a consequence of its relationship with RNA integrity. Although yield clearly reflects RNA integrity, it also indexes the quality of the reactions taking total RNA to the cRNA applied to the chips. It is nevertheless clear that yield and RNA integrity have different relationships with the post-chip QC factors. In our experiment, where all reactions were carried out by the same person in large batches, it is unlikely that there were large variations in yield due to technical factors. Our study is limited by having no systematic technical replicates; as in most such studies, this would have been too expensive. The only sample that was re-hybridised to an A chip, with the cRNA regenerated from the original RNA, was rated as poor and failed on both occasions.
It is useful to distinguish between the various facets of the catch-all term 'quality'. In chronological order: there is first the condition of the starting RNA; next is the calibre of the experimental process and resulting hybridisation; finally comes the acceptability of the resulting expression measures, including identification of outliers.
We can think of the first four principal components as providing a grouping of post-chip measures, with each component representing a different aspect of quality (see Table 5): the first component reflects variables gauging array outliers, the second comprises variables assessing array adjustment, the third contains variables measuring hybridisation noise, and the fourth consists of the set of variables related to RNA integrity. Interestingly, these components correspond roughly to the three aspects mentioned above, but in the reverse order. The first and second components together give insight into the outlier status of a chip when it is considered as part of a set of chips. The component explaining most variability contains variables providing numerical assessments for outlier identification. A related but somewhat distinct aspect is given by an assessment of how far off the chip is from the others, or how the signal would need to be adjusted to make it more like the rest of the chips in the set; this is provided by variables strongly represented in the second component. The third and fourth components, respectively, reflect directly the second (hybridisation) and first (RNA integrity) areas of quality.
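The grouping step described above can be illustrated with a principal components analysis of a chips-by-measures QC table: standardise each measure, decompose, and read off which measures load on which component. The data below are synthetic (two latent quality factors each driving a pair of QC variables, plus noise measures); the real input would be the post-chip QC matrix from Table 5.

```python
import numpy as np

# Synthetic chips-by-measures QC table. Two latent quality factors each
# generate a correlated pair of QC variables; two further measures are noise.
rng = np.random.default_rng(1)
n_chips = 60
outlier_factor = rng.normal(size=n_chips)
integrity_factor = rng.normal(size=n_chips)
qc = np.column_stack([
    outlier_factor + 0.2 * rng.normal(size=n_chips),    # e.g. an outlier count
    outlier_factor + 0.2 * rng.normal(size=n_chips),    # e.g. a distance measure
    integrity_factor + 0.2 * rng.normal(size=n_chips),  # e.g. a 3'/5' ratio
    integrity_factor + 0.2 * rng.normal(size=n_chips),  # e.g. degradation slope
    rng.normal(size=n_chips),                           # unrelated noise measure
    rng.normal(size=n_chips),
])

# Standardise each measure, then PCA via SVD of the centred matrix.
z = (qc - qc.mean(axis=0)) / qc.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)
loadings = vt  # rows = components, columns = QC measures
print("variance explained:", np.round(explained, 2))
print("PC1 loadings:", np.round(loadings[0], 2))
```

With real data, the measures loading heavily on each leading component define the quality aspect that component represents, as in the four-way grouping above.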
It is notable that none of the quality assessment procedures have measures in all four categories covered by the principal components. The different algorithms appear mostly to view quality from different, though overlapping, perspectives. The measures provided by MAS 5.0 are most prominent in the noise and integrity aspects, but also touch on array adjustment. The RMA-QC measures dominate in outlier identification, but also include array adjustment; integrity is tangentially included in the affy package through the slope and corresponding p-value. DChip measures also focus on outlier identification and array adjustment, but include a noise variable as well.
These different software packages are often used in conjunction with one another, as indeed we did in our original analysis. Presumably the hope is that this increases the chance of capturing all of the important variation, although it may also increase the chance of excluding chips whose data are of sufficient quality.
The relationships revealed by the canonical correlations confirm that RNA perceived by subjective assessment or RIN to be of high quality correlates strongly with post-chip measures of 3'/5' integrity in the first canonical correlation. This is reflected in the ability of B_ACTIN35 and GAPDH35 to predict RIN retrospectively. RMA and dChip measures do not explicitly index the 3'/5' ratios; the nearest equivalent is the affy package's RNADEG_SL, which the canonical correlations reveal to be highly correlated with RIN. We did not use RNADEG_SL in our original decisions about which chips to exclude from analysis. In retrospect, it identified only 1 of the 6 samples that we dropped under our original method of exclusion, and it also identified two other chips that we decided, on balance, to include.
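An RNADEG_SL-style summary can be sketched as the slope of mean probe intensity against 5'-to-3' probe position, in the spirit of the affy package's RNA degradation plot. The sketch below runs on a synthetic probesets-by-positions intensity matrix for one chip; the probe counts, intensities, and the simulated degradation gradient are illustrative assumptions, not the affy implementation itself.

```python
import numpy as np

# Synthetic single-chip input: n_probesets x n_positions matrix of probe
# intensities, ordered 5' -> 3' within each probeset. With degraded RNA,
# intensity climbs toward the 3' end (higher position index).
rng = np.random.default_rng(2)
n_probesets, n_positions = 500, 11
positions = np.arange(n_positions)
base = rng.normal(8.0, 1.0, size=(n_probesets, 1))
chip = base + 0.15 * positions + rng.normal(0, 0.5, size=(n_probesets, n_positions))

# Average intensity at each probe position across probesets, then take
# the least-squares slope of that mean against position.
mean_by_pos = chip.mean(axis=0)
slope = np.polyfit(positions, mean_by_pos, 1)[0]
print(f"degradation slope: {slope:.3f}")
```

A steeper positive slope indicates more 3' bias, i.e. more degraded starting RNA; comparing slopes across the chips in a set is what makes the measure useful for exclusion decisions.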
However, the scaling factor and the median log ratios of RMA expression are also strongly related to the RNA integrity variables, indicating that scaling factors, or bias, should be as consistent as possible across all chips in an experiment. The relationship of SF to RIN is evidenced by its ability to predict RIN without invoking the measures of 3'/5' ratios. This is entirely logical, as higher RNA integrity is related to a higher yield of cRNA and to a signal better distinguished from the noise or background. Thus scaling factors will be smaller and percentage present calls higher if the RNA is of good quality, as shown by the first canonical correlation. The less adjustment that needs to be applied between different chips, the more likely it is that a clearly interpretable signal will be obtained from an experiment; that is, there will be greater sensitivity to detect changes. Including data from a poor quality chip markedly affects the number of probesets considered to be differentially expressed, and this effect is amplified with smaller numbers of biological replicates. An FDR adjustment can control the false discovery level in the presence of poor quality chip data, but poor quality chips appear to substantially increase the number of false negatives, as does reducing the number of biological replicates. This is a result of increased variance reducing the power to detect differential expression [24, 25]. Archer et al. suggest that one way of rescuing some data from these poor quality chips is to consider only those probesets for which all probes fall within a defined distance of the 3' end of the gene, although this fails to take into account the complexity of RNA degradation, which is influenced by mRNA higher-order structure as well as length [27, 28].
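The FDR adjustment referred to above is, in the standard case, the Benjamini-Hochberg step-up procedure; a minimal sketch follows. This is a generic implementation of that well-known procedure, not the study's specific analysis pipeline, and the p-values shown are invented for illustration.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Raw step-up quantities p_(i) * m / i on the sorted p-values.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0, 1)
    return out

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
adj = bh_adjust(pvals)
print(np.round(adj, 3))
```

The point made in the text survives the adjustment: BH controls the expected proportion of false discoveries, but the inflated variance from a poor quality chip still costs power, so true positives are lost rather than false positives admitted.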
All the other variables in the first canonical correlation have correlations of < 0.4, but users should note that lower RNA integrity and yield predict lower %P calls, lower median intensity and higher measures of signal variation. The second canonical correlation shows much weaker relationships, and a marked difference between the relationships of RIN and of SUBQUAL to the post-chip variables. Good yields combined with poor subjective quality appear to predict higher background measures, as though the entire signal, including the noise, has been amplified.
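For readers wishing to reproduce this kind of analysis, the canonical correlations between a pre-chip block and a post-chip block can be obtained by the standard QR/SVD route. The sketch below uses synthetic data with one shared latent 'quality' factor; the block compositions (a RIN-like, a SUBQUAL-like, and a yield-like variable against four post-chip measures) are illustrative assumptions, not the study's exact variable sets.

```python
import numpy as np

# Synthetic data: one latent quality factor drives variables in both blocks.
rng = np.random.default_rng(3)
n = 200
quality = rng.normal(size=n)

# Hypothetical pre-chip block (3 variables) and post-chip block (4 variables).
pre = np.column_stack([
    quality + 0.3 * rng.normal(size=n),   # RIN-like
    quality + 0.5 * rng.normal(size=n),   # subjective-quality-like
    rng.normal(size=n),                   # unrelated variable
])
post = np.column_stack([
    -quality + 0.3 * rng.normal(size=n),  # 3'/5' ratio-like (rises as quality falls)
    quality + 0.6 * rng.normal(size=n),   # %P-call-like
    rng.normal(size=n),
    rng.normal(size=n),
])

# Canonical correlations are the singular values of Qx' Qy, where Qx and Qy
# are orthonormal bases for the two centred blocks.
qx, _ = np.linalg.qr(pre - pre.mean(axis=0))
qy, _ = np.linalg.qr(post - post.mean(axis=0))
canon = np.linalg.svd(qx.T @ qy, compute_uv=False)
print("canonical correlations:", np.round(canon, 2))
```

With a single shared factor, only the first canonical correlation is large; the weaker second and third correlations mirror the pattern reported for the second canonical correlation in the text.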