From: A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database

Correlations between Classic RMA and refRMA data trained with GEO samples. Samples were randomly selected from normal GEO samples of multiple organs to train refRMA models at each of the indicated training-set sizes (I), and each model was applied to 50 exclusive test set samples. For each test sample, the correlation between its refRMA summary and its Classic RMA summary (i.e., RMA trained on the 50 test set samples) was calculated using all probe sets. The mean correlation across the 50 test samples was then calculated. This process was repeated 100 times with random selection of both training and test sets to yield the correlation distributions shown as box plots. The entry labeled "DB" is the Full refRMA model trained on 1,614 samples from Gene Logic's reference database. Note that the GEO-based models, for which the test set experiments are not completely exclusive of the training set, show slightly higher correlations than the Full refRMA model, for which the training and test sets are completely exclusive.
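
The resampling procedure described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`mean_test_correlation`, `split_train_test`, `correlation_distribution`) and the two summarizer callables are hypothetical, the RMA and refRMA summarization steps themselves are not implemented, and Pearson correlation is assumed because the caption does not name the correlation measure.

```python
import numpy as np


def mean_test_correlation(ref_expr, classic_expr):
    """Mean over test samples of the per-sample Pearson correlation.

    Both inputs have shape (n_probe_sets, n_test_samples); column j holds
    the summarized expression values of test sample j. Pearson correlation
    is an assumption, not something the caption specifies.
    """
    ref = np.asarray(ref_expr, dtype=float)
    classic = np.asarray(classic_expr, dtype=float)
    if ref.shape != classic.shape:
        raise ValueError("both summaries must have the same shape")
    # Center each sample (column) across all probe sets.
    ref_c = ref - ref.mean(axis=0)
    classic_c = classic - classic.mean(axis=0)
    numerator = (ref_c * classic_c).sum(axis=0)
    denominator = np.sqrt((ref_c ** 2).sum(axis=0) * (classic_c ** 2).sum(axis=0))
    per_sample = numerator / denominator
    return float(per_sample.mean())


def split_train_test(n_samples, n_train, n_test, rng):
    """Randomly draw disjoint training and test sample indices."""
    if n_train + n_test > n_samples:
        raise ValueError("not enough samples for disjoint training and test sets")
    perm = rng.permutation(n_samples)
    return perm[:n_train], perm[n_train:n_train + n_test]


def correlation_distribution(summarize_ref, summarize_classic, n_samples,
                             n_train, n_test=50, n_repeats=100, seed=0):
    """Repeat the train/test draw and collect one mean correlation per repeat.

    summarize_ref(train_idx, test_idx) and summarize_classic(test_idx) are
    hypothetical callables standing in for refRMA and Classic RMA; each must
    return an array of shape (n_probe_sets, n_test).
    """
    rng = np.random.default_rng(seed)
    means = np.empty(n_repeats)
    for r in range(n_repeats):
        train_idx, test_idx = split_train_test(n_samples, n_train, n_test, rng)
        ref = summarize_ref(train_idx, test_idx)
        classic = summarize_classic(test_idx)
        means[r] = mean_test_correlation(ref, classic)
    return means
```

Calling `correlation_distribution` once per training-set size I would give one array of 100 mean correlations per size, which is the quantity a box plot of this kind summarizes.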

