Novel and simple transformation algorithm for combining microarray data sets
 KiYeol Kim^{1},
 Dong Hyuk Ki^{2, 3, 4},
 Ha Jin Jeong^{2, 3, 4},
 HeiCheul Jeung^{2, 5},
 Hyun Cheol Chung^{2, 3, 4, 5, 6} and
 Sun Young Rha^{2, 3, 4, 5, 6}
DOI: 10.1186/1471-2105-8-218
© Kim et al; licensee BioMed Central Ltd. 2007
Received: 20 December 2006
Accepted: 25 June 2007
Published: 25 June 2007
Abstract
Background
With microarray technology, variability in experimental environments such as RNA sources, microarray production, or the use of different platforms, can cause bias. Such systematic differences present a substantial obstacle to the analysis of microarray data, resulting in inconsistent and unreliable information. Therefore, one of the most pressing challenges in the field of microarray technology is how to integrate results from different microarray experiments or combine data sets prior to the specific analysis.
Results
Two microarray data sets based on a 17k cDNA microarray system were used, consisting of 72 normal colon mucosa and 82 colorectal cancer tissues. Each data set was prepared from either total RNA or amplified mRNA, and the difference in RNA source between the two data sets was detected by an ANOVA (analysis of variance) model. A simple integration method was introduced, based on the distributions of gene expression ratios among different microarray data sets. The method transformed gene expression ratios into the form of a reference data set on a gene-by-gene basis. Hierarchical clustering analysis, density and box plots, and mixture scores with correlation coefficients revealed that the two data sets were well intermingled, indicating that the proposed method minimized the experimental bias. In addition, no RNA source effect was detected after the proposed transformation. In the mixed data set, the two previously identified subgroups, normal and tumor, were well separated, and the efficiency of integration was more prominent in the tumor groups than in the normal groups. The transformation method was slightly more effective when a data set with strong within-group homogeneity was used as the reference data set.
Conclusion
The proposed method is simple but useful for combining several data sets from different experimental conditions. With this method, biologically useful information can be detected by applying various analytic methods to the combined data set with its increased sample size.
Background
DNA microarrays are a useful tool for the study of complex systems and have applications in a wide variety of biological sciences. Despite their usefulness, however, systematic biases caused by different handling procedures present a challenge. Microarray experiments are often performed over many months, and samples are often collected and processed at different institutions. Further, the samples may be assayed using different microarray print batches or platforms, or using different array hybridization protocols. When two microarray data sets are directly compared, systematic biases arising from variability in experimental conditions can be erroneously detected as differences in gene expression patterns. Such systemic biases present a substantial obstacle in the analysis of microarray data. However, due to the limited numbers of available microarray experiments, the motivation to use an entire data set regardless of platforms or experimental procedure is increasing. Therefore, it is necessary to investigate new methods that can effectively combine microarray data sets which were derived from different experimental environments, while simultaneously minimizing systematic bias.
A commonly utilized method to integrate microarray data sets is to focus on differential expression, i.e. comparing significantly expressed genes selected separately from each data set [1–7]. Another type of comparison examines the variability in gene expression between human and mouse data sets, combining different microarray platforms [4]. These studies exploit multiple data sets, rather than a single data set, in order to obtain more robust results. Some studies overcome the limitations of a single microarray data set by using integration techniques, since integrating separate data sets has an effect similar to increasing the sample size [8]. However, a suitable integration method has not yet been established. Indeed, some studies suggest that microarray data sets derived from different experimental processes cannot be combined directly, as they are poorly correlated with each other [9].
Recently, the practice of integrating data sets prior to selecting significant genes was introduced and standardization has been used for this as the simplest method [10]. Singular Value Decomposition (SVD) corrects systematic bias of data sets and has been used in yeast cell cycle experiments [11] and in data sets containing samples from many soft tissue tumors [12]. Although SVD is a useful method for determining the direction of large variations so that systematic effects can be removed, it has been suggested that SVD is inappropriate for cases where the magnitude of the systematic variation is similar to the components of other variations [13]. Alternatively, Distance Weighted Discrimination (DWD), which is a modified form of SVM that adjusts for systematic effects, performed well and could eliminate source effects [13]. However, DWD could not regulate the dispersion of different data sets.
A method that transforms the gene expression distributions of two data sets to be similar has been proposed [14]. However, this method did not consider biological differences between two different experimental groups, such as normal and tumor, because it used the average expression value of the two groups to define a reference sample. A recent study introduced an ANOVA (analysis of variance) model to select discriminative genes from several data sets derived from different experimental environments [15]. This model is flexible enough to consider any clinical variables as well as genetic information, including several effect factors that represent experimental conditions. With this method, however, we cannot evaluate how well the data sets are intermixed, nor explore the expression patterns of genes of interest in the combined data set. Therefore, we propose a method to effectively integrate data sets from different experimental environments and evaluate its efficiency using a mixture score.
Results
Seventy-eight experiments (43 tumor and 35 normal) from data set A and 76 experiments (39 tumor and 37 normal) from data set B were used in this study. Each experiment had between 448 and 1298 genes with missing entries. A total of 12293 genes without missing entries were used for further analysis.
Exploration of expression patterns of two data sets before transformation
Exploration of expression patterns of combined data sets after transformation
Data set A had higher within-group correlation than data set B prior to integration, indicating that the experiments in data set A had a more homogeneous expression pattern than those in data set B (Figure 4). The tumor groups had lower correlations than the normal groups in both data sets, indicating that there were larger variations within the tumor groups than within the normal groups. Within-group correlations of the combined data sets after transformation were higher than those shown in Figure 4 (Figure 7). The homogeneity of the combined data set A'B was similar to that of data set B, because A'B resulted from transforming data set A into the expression pattern of data set B, which has low within-group homogeneity. Similarly, data set AB' preserved relatively strong homogeneity, because data set B was transformed into the expression pattern of data set A, which has higher correlation than data set B. Data set A'B', which was transformed using the weighted average of the dispersions of the two data sets, had a degree of homogeneity between those of A'B and AB', but still had a low correlation in the tumor group. The data sets integrated by the proposed transformation methods had higher within-group correlations than the data set integrated without transformation, indicating that the proposed transformation methods effectively preserved the within-group homogeneity of the separate data sets.
Comparison of significant gene sets selected by transformation methods
Table 1 shows the number of differentially expressed genes identified by t-test for each transformation method. Bonferroni-adjusted p-values of 0.05 and 0.01 were used as the significance levels.
Table 1. Comparison of the numbers of significant genes.
Data set (method)  α = 0.05     α = 0.01

A                  2325         2005
B                  1654         1367
AB                 2848         2488
A'B                3429         3088
AB'                4257         3868
A'B'               3453         3102
ANOVA              3337 (3302)  3004 (2961)
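The Bonferroni correction behind these significance levels can be sketched in a few lines. This is an illustration only: it assumes one test per gene over the 12293 genes without missing entries, and the raw p-values and the function name `bonferroni_adjust` are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# ASSUMPTION: one t-test per gene over the 12293 genes without missing entries.
n_genes = 12293
per_test_cutoff_05 = 0.05 / n_genes  # raw p-value needed for adjusted 0.05
per_test_cutoff_01 = 0.01 / n_genes  # raw p-value needed for adjusted 0.01

# Hypothetical raw p-values for three genes, for illustration only.
adjusted = bonferroni_adjust([0.001, 0.02, 0.5])
```

Comparing a raw p-value with `0.05 / n_genes` is equivalent to comparing the adjusted p-value with 0.05, which is why the correction becomes very strict when thousands of genes are tested.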
We compared the degree of concordance among the top 500 significant genes selected from each data set. Of the top 500 genes selected from data set A, 250 were also among the top 500 genes selected from data set B. Of the 750 genes in the union of the top 500 genes of data sets A and B, 496, 492, and 488 were among the top 500 genes of A'B, AB' and A'B', respectively, and 457 were among the top 500 genes of AB, indicating that our transformation methods preserved the original biology of the data sets.
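The counting in this comparison is plain set arithmetic: two lists of 500 genes sharing 250 genes have a union of 500 + 500 - 250 = 750 genes. A minimal sketch with toy gene identifiers follows; the identifiers and the function name `top_gene_concordance` are hypothetical.

```python
def top_gene_concordance(top_a, top_b, top_combined):
    """Return (shared genes of A and B, size of their union, combined-list genes in the union)."""
    shared = set(top_a) & set(top_b)
    union = set(top_a) | set(top_b)
    covered = set(top_combined) & union
    return len(shared), len(union), len(covered)


# Toy gene identifiers (hypothetical); real lists would hold the top 500 genes each.
top_a = ["g1", "g2", "g3", "g4"]
top_b = ["g3", "g4", "g5", "g6"]
top_ab = ["g2", "g3", "g4", "g7"]
shared, union_size, covered = top_gene_concordance(top_a, top_b, top_ab)
```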
Descriptions of 8 genes that were selected from AB' but not from A or B.
Gene ID  UniGene ID  Symbol  Gene name  Chromosomal location

AI972269  Hs.556600  MYLK  Myosin, light chain kinase  3q21
AA447632  Hs.75819  GPM6A  Glycoprotein M6A  4q34
AA485871  Hs.286226  MYO1C  Myosin IC  17p13
AI266457  Hs.527860  -  Transcribed locus  12
AI383497  Hs.189409  FNBP1  Formin binding protein 1  9q34
AA463926  Hs.444403  PPP1R12B  Protein phosphatase 1, regulatory (inhibitor) subunit 12B  1q32.1
AI524093  Hs.460109  MYH11  Myosin, heavy chain 11, smooth muscle  16p13.11
AA213816  Hs.369574  CDC42EP3  CDC42 effector protein (Rho GTPase binding) 3  2p21
(1) Take random subsamples from data set A and data set B (for example, 5 tumor and 5 normal tissues from each set) without replacement. This process was repeated 10 times with the same sample size to reduce sampling bias.
(2) Find the list of 500 significant genes from the subsample of A.
(3) Find the list of 500 significant genes from (subsample A)(subsample B)', the combined data set of the subsample of A and the transformed subsample of B.
(4) Compare (2) and (3) with the list of 500 significant genes selected from data set A.
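The subsampling steps can be sketched as follows. This is only a toy illustration of steps (1), (2) and (4) for data set A; step (3) would apply the same ranking to the combined, transformed data. It assumes genes are ranked by the absolute Welch t statistic (the text only says a t-test was used), and all names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch t statistic comparing two groups for one gene."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def top_genes(tumor, normal, n_top):
    """Rank genes by |t| and return the n_top gene identifiers as a set."""
    scores = {g: abs(welch_t(tumor[g], normal[g])) for g in tumor}
    return set(sorted(scores, key=scores.get, reverse=True)[:n_top])


def subsample(group, size, rng):
    """Draw the same `size` experiments for every gene, without replacement."""
    n = len(next(iter(group.values())))
    cols = rng.sample(range(n), size)
    return {g: [vals[c] for c in cols] for g, vals in group.items()}


rng = random.Random(1)
genes = [f"gene{i}" for i in range(20)]
# Toy data set A: the first 5 genes are shifted upward in tumor samples.
tumor_a = {g: [rng.gauss(3.0 if i < 5 else 0.0, 1.0) for _ in range(12)]
           for i, g in enumerate(genes)}
normal_a = {g: [rng.gauss(0.0, 1.0) for _ in range(12)] for g in genes}

reference = top_genes(tumor_a, normal_a, n_top=5)  # list from the full data set A
overlaps = []
for _ in range(10):  # 10 repeats with the same subsample size
    sub_t = subsample(tumor_a, 5, rng)
    sub_n = subsample(normal_a, 5, rng)
    overlaps.append(len(top_genes(sub_t, sub_n, n_top=5) & reference))
```

Each entry of `overlaps` counts how many genes from a small subsample agree with the list from the full data set, which is the quantity compared in step (4).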
The gene sets with a similar OOB (out-of-bag) error rate were compared (information on each gene set not shown), and none of the gene sets had a 0% OOB error, regardless of the transformation method. A set of only two significant genes from data set AB' had a 0.65% OOB error. Even when the number of significant genes was increased to 500, the OOB error rate did not decrease. The prediction accuracies of the top 500 significant genes were compared, and they were 100% for all of the transformation methods (data not shown). In this case, the data sets integrated by each transformation method were used as training data sets to create classifiers, and the separate data sets were used to test these classifiers.
Discussion
An inescapable problem with combining several microarray data sets is the variation of expressions between data sets. In cases where the microarray analyses are from different experimental conditions, integration without transformation may skew the expression ratios of the same genes from different data sets. When the experimental bias exceeds biological variation, the use of microarray data sets without adjustments for this bias may make biological variation unidentifiable, meaning that reliable results cannot be obtained. In addition, due to the limited numbers of available microarray experiments, the motivation to use the whole data set, regardless of platforms or experimental procedure, is increasing.
We attempted to minimize experimental bias by transforming the expression ratios of the data sets such that they have similar expression patterns in the corresponding experimental groups of different data sets. Compared with previous studies [11–13], the proposed transformation method is a relatively simple algorithm, and it performed well under various evaluation methods. While a previous study used a reference sample with the average expression of all experiments, including normal and tumor groups [14], the proposed method accounts for biological differences that can exist between different experimental groups by transforming expression ratios for each experimental group separately.
Our method transforms expression ratios by three approaches. In A'B, data set B was used as the reference data set; in AB', data set A was the reference. A'B' is the combined data set obtained after transforming both data sets using the pooled standard deviation. In selecting the reference data set, we did not consider biological meaning; rather, we compared the effects of transformation with diverse references. When a data set with strong homogeneity was used as the reference, performance was slightly better than with the other transformation methods, as shown in Figure 9, but there were no significant differences in efficiency among them, thus allowing a biological evaluation of the significant genes of the data sets integrated by each transformation method.
When a homogeneous data set is used as the reference data set, we observed that this characteristic is preserved more strongly in the combined data set. Further, the two separate data sets can be well intermixed in the combined data set because there is a greater chance that the kNNs of an experiment in one data set include experiments of the other data set (AB' in Figure 9). Also, the mean difference in expression between two experimental groups can become larger as the within-group homogeneity increases. Therefore, a larger number of significant genes can be selected from AB', as shown in Table 1.
RNA source effect in gene expression ratios was detected in more than 5000 genes before transformation and such effect was adjusted by the proposed methods.
Integration of data sets increases the sample size and improves the analytical accuracy and statistical power of the test. When focusing on significant gene selection, ANOVA can be a flexible model that considers any clinical information as well as genetic information, with several effect factors representing experimental conditions [15]. However, this method is applied to each gene separately and does not create a combined data set to which various statistical methods can be applied to identify additional useful biological information, such as gene expression patterns across the whole gene set. Therefore, for a given experimental question, i.e. complex genetic information including expression patterns, it is useful to integrate data sets by transformation prior to the specific analysis.
The proposed integration method makes the expression patterns of two data sets similar in corresponding experimental groups by transforming the location and the scale of the expression ratios, and it is applicable to any data set with more than two groups. We confirmed that the transformed data sets obtained from different experimental environments were well intermixed, meaning that the experimental bias was reduced. Most of the top 500 genes selected from the combined data sets after transformation were consistent with the top 500 genes selected from the two original data sets. This means that our method preserves the original biology of the two data sets. In addition, by using a combined data set, we detected colorectal cancer related genes that might be missed in the separate data sets. In a simulation study, we confirmed that the proposed method can detect more reliable information from a combined data set and that it is more effective for small data sets derived from different experimental conditions.
Conclusion
This method may not be appropriate when the different experimental features in data sets include biological variations (for example, early disease stages of I and II in data set A and advanced disease stages of III and IV in data set B) because the expression values of a specific experimental group are transformed into the form of the corresponding experimental group of a reference data set. Thus, we suggest that the proposed integration method is useful when each data set includes phenotypically or biologically homogenous experimental groups.
In conclusion, our method is simple and useful for combining several data sets obtained under different experimental conditions, and it is applicable to any data set including more than two groups. With this method, biologically useful information can be detected by applying various analytic methods to the combined data set with its increased sample size.
Methods
Tissue sample preparation
A total of 154 colorectal tissue samples (82 tumor and 72 normal) were obtained from colorectal cancer patients who had undergone surgery at the Severance Hospital, Yonsei University College of Medicine, Seoul, Korea. Informed consent was obtained from patients prior to using their surgical specimens and clinicopathologic data for research purposes. Fresh tissues obtained from patients were snap-frozen and stored at -80°C.
Microarrays
Total RNA was extracted from the tissues using Trizol (Invitrogen, USA) and then purified using an RNeasy kit (Qiagen, Germany). The purified RNA samples were divided into two groups for gene expression profiling using total RNA and amplified mRNA. Gene expression profiling using total RNA samples consisted of 20 paired normal and tumor colon tissue samples, 23 tumor samples, and 15 normal colon tissues. This data set is referred to as data set A in this study. The remaining samples (34 paired samples, 5 tumor, and 3 normal colon tissues) were used for gene expression profiling with amplified mRNA, which was obtained using the linear T7 mRNA amplification method with the Megascript T7 kit (Ambion, USA). This data set is referred to as data set B in this study. Each sample of total RNA (50 μg) and amplified mRNA (2 μg) was directly labeled with Cy5-dUTP and transcribed to cDNA. The microarray experiment was performed using a reference design with the Cy3-dUTP labeled Yonsei reference RNA [19]. We used the 17K human cDNA microarray (GenomicTree Co., Daejon, Korea) for probe hybridization based on the Yonsei Cancer Metastasis Research Center (CMRC, Yonsei University, Korea) protocol [19]. Following hybridization, microarrays were scanned using a GenePix 4000B (Axon Ins., USA) and images were analyzed using GenePix Pro 4.0 (Axon Ins., USA).
These two microarray data sets differ only in RNA source. Previous studies have concluded that it is vital to use equally treated samples for any particular study, and that all other samples should be amplified when one sample requires amplification. In addition, the sensitivity for detecting differential gene expression in a microarray data set using amplified RNA differed from that using total RNA [20, 21]. Therefore, we used these two data sets to evaluate our method.
Data normalization
Expression intensities were normalized such that they would have similar distributions across a series of arrays. In this study, the MAD (median absolute deviation) scale estimator was used as a robust estimate of scale, and both the A-values and the M-values were normalized. Within-slide normalization was used to make intensities consistent within each array, and between-slide normalization was used to achieve consistency between arrays. It was necessary to apply between-slide normalization to the expression data because the arrays had different dispersions after within-slide normalization. The normalization process was executed using the 'limma' package for R [22].
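The idea of scale normalization with a MAD estimator can be sketched as below. This is not the limma implementation; it is a minimal illustration that assumes each array's values are rescaled so that all arrays share a common MAD, taken here to be the geometric mean of the per-array MADs. The function names are hypothetical.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation about the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


def scale_normalize(arrays):
    """Rescale each array so that all arrays share the same MAD.

    ASSUMPTION: the common scale is the geometric mean of the per-array MADs;
    this choice is illustrative and not taken from the paper.
    """
    mads = [mad(a) for a in arrays]
    target = math.exp(sum(math.log(m) for m in mads) / len(mads))
    return [[v * target / m for v in a] for a, m in zip(arrays, mads)]


# Two toy arrays of log-ratios with different dispersions.
arrays = [[-1.0, 0.0, 1.0, 2.0, -2.0], [-4.0, 0.0, 4.0, 8.0, -8.0]]
normalized = scale_normalize(arrays)
normalized_mads = [mad(a) for a in normalized]
```

After rescaling, both toy arrays have the same MAD, which is the sense in which between-slide normalization removes differences in dispersion.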
Data transformation for combining data sets
The gene expression intensities of each data set were transformed, based on the reference data set, by the following three methods so that corresponding experimental groups would have similar expression patterns.
where AN', AT': transformed expression ratios of normal and tumor groups in data set A.
AN, AT: normal and tumor groups in data set A.
$\overline{BN}$, $\overline{BT}$: mean expression ratios of normal and tumor groups in data set B.
sd(AN), sd(AT), sd(BN), sd(BT): standard deviations of expression ratios of normal and tumor groups in data sets A and B.
where BN', BT': transformed expression ratios of normal and tumor groups in data set B.
BN, BT: normal and tumor groups in data set B.
$\overline{AN}$, $\overline{AT}$: mean expression ratios of normal and tumor groups in data set A.
where $\overline{N}$, $\overline{T}$: mean expression ratios of normal and tumor groups in data set A and data set B.
$sd(N)=\sqrt{\frac{(n_{AN}-1)\,sd(AN)^{2}+(n_{BN}-1)\,sd(BN)^{2}}{n_{AN}+n_{BN}-2}}$: pooled standard deviation of the normal group
$sd(T)=\sqrt{\frac{(n_{AT}-1)\,sd(AT)^{2}+(n_{BT}-1)\,sd(BT)^{2}}{n_{AT}+n_{BT}-2}}$: pooled standard deviation of the tumor group
$n_{AN}$, $n_{BN}$, $n_{AT}$, $n_{BT}$: numbers of experiments in AN, BN, AT and BT.
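The display equations for the three methods are not reproduced in this text, so the following is only a sketch under a stated assumption: each method is taken to be a per-gene, per-group location and scale mapping that standardizes a group with its own mean and standard deviation and then rescales it to the reference group's mean and standard deviation (or to the combined mean and pooled standard deviation for A'B'). The function names and toy values are hypothetical.

```python
import math
import statistics


def pooled_sd(x, y):
    """Pooled standard deviation of two samples, as defined for sd(N) and sd(T)."""
    nx, ny = len(x), len(y)
    num = (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    return math.sqrt(num / (nx + ny - 2))


def rescale(values, target_mean, target_sd):
    """Standardize one gene's ratios in one group, then map to the target mean and sd.

    ASSUMPTION: this location-scale form is inferred from the symbol definitions;
    it is not copied from the paper's equations.
    """
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s * target_sd + target_mean for v in values]


# Toy expression ratios for one gene in the normal groups of data sets A and B.
AN = [0.2, 0.5, 0.8, 1.1]
BN = [1.0, 2.0, 3.0]

# A'B: data set B is the reference, so AN is mapped onto the mean and sd of BN.
AN_prime = rescale(AN, statistics.mean(BN), statistics.stdev(BN))

# A'B': both groups are mapped onto a common mean and the pooled sd.
# ASSUMPTION: the common mean is taken over the experiments of both data sets.
N_mean = statistics.mean(AN + BN)
N_sd = pooled_sd(AN, BN)
AN_both = rescale(AN, N_mean, N_sd)
BN_both = rescale(BN, N_mean, N_sd)
```

The tumor groups would be handled in the same way with AT, BT, and sd(T), so that biological differences between normal and tumor are not averaged away.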
Evaluation of transformation method
We evaluated the proposed integration method using several plots and the mixture score, a metric defined in this study to evaluate the efficiency of the integration. The principle of this metric is to measure how many of the k-nearest neighbors (kNNs) of data set B in the combined data set belong to data set A. The metric was calculated as follows, where k is the number of nearest neighbors (NNs).
$\text{Mixture score} = \#\{x \mid x \in k\text{NNs}(\text{data set } B) \cap (\text{data set } A)\}/k$
where x is any experiment belonging to both kNNs(data set B) and data set A.
The mixture score ranges from 0 to 1. A value close to 0.5 is indicative of two different data sets that are perfectly intermixed. Conversely, values close to either 0 or 1 indicate a poor level of intermixing between the two different data sets.
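A minimal sketch of the mixture score is given below. It rests on two assumptions that the text does not settle: distances between experiments are Euclidean, and the per-experiment fraction is averaged over all experiments in data set B. The function name and toy profiles are hypothetical.

```python
import math


def mixture_score(set_a, set_b, k):
    """Average fraction of each B experiment's k nearest neighbors that come from A.

    ASSUMPTIONS: Euclidean distance between expression profiles, and averaging
    the per-experiment fraction over all experiments in B.
    """
    labelled = [(p, "A") for p in set_a] + [(p, "B") for p in set_b]
    fractions = []
    for i, (profile, label) in enumerate(labelled):
        if label != "B":
            continue
        # Distances to every other experiment in the combined data set.
        others = [
            (math.dist(profile, q), lab)
            for j, (q, lab) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        from_a = sum(1 for _, lab in others[:k] if lab == "A")
        fractions.append(from_a / k)
    return sum(fractions) / len(fractions)


# Well-separated toy data sets: the neighbors of B experiments all come from B.
separated = mixture_score(
    [(0.0, 0.0), (0.1, 0.0)], [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)], k=2
)

# Interleaved toy data sets: neighbors come from both sources.
mixed = mixture_score([(0.0,), (3.0,)], [(1.0,), (2.0,)], k=2)
```

In the separated case the score falls at the lower end of the range, while the interleaved case sits at the midpoint, matching the interpretation given above.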
(1) Generate n bootstrap data sets {B_{1}, B_{2},..., B_{n}} by sampling with replacement, so that the same sample can appear more than once.
(2) Use each bootstrap sample B_{k} to construct a tree classifier T_{k} and predict the samples that are not in B_{k}, called out-of-bag (OOB) samples. These predictions are called out-of-bag estimators.
(3) The final prediction is the average of the out-of-bag estimators over all bootstrap samples, and the resulting overall classification error is the OOB error.
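The three steps can be sketched in plain Python as follows. To keep the example short, a one-split decision stump on a single gene stands in for a full tree classifier, and the averaged out-of-bag votes are rounded to a class label; both simplifications, along with the function names and toy data, are assumptions made for illustration.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a one-split classifier on a single feature (a stand-in for a full tree)."""
    values = sorted(set(xs))
    # Candidate cut points: midpoints between neighboring values, plus +inf,
    # which yields a constant prediction when the bag holds a single class.
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] + [math.inf]
    best = None
    for cut in cuts:
        for low, high in ((0, 1), (1, 0)):
            errors = sum(1 for x, y in zip(xs, ys) if (low if x <= cut else high) != y)
            if best is None or errors < best[0]:
                best = (errors, cut, low, high)
    _, cut, low, high = best
    return lambda x: low if x <= cut else high


def oob_error(xs, ys, n_bootstrap=50, seed=0):
    """Out-of-bag error: each sample is predicted only by stumps that never saw it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[] for _ in range(n)]
    for _ in range(n_bootstrap):
        bag = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        stump = fit_stump([xs[i] for i in bag], [ys[i] for i in bag])
        for i in set(range(n)) - set(bag):  # out-of-bag samples
            votes[i].append(stump(xs[i]))
    scored = [(v, y) for v, y in zip(votes, ys) if v]
    wrong = sum(1 for v, y in scored if round(sum(v) / len(v)) != y)
    return wrong / len(scored)


# Toy data: one gene whose expression ratio separates normal (0) from tumor (1).
expression = [-2.0, -1.8, -1.5, -1.2, -1.0, 1.0, 1.2, 1.5, 1.8, 2.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
error = oob_error(expression, labels)
```

Because every sample is scored only by classifiers trained without it, the OOB error acts as a built-in estimate of prediction error without a separate test set.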
An ANOVA (analysis of variance) model was used to evaluate the RNA source effect in data sets derived from different experimental conditions. The ANOVA model used in this work is as follows.
$g_{ijk} = \mu + T_{i} + R_{j} + (TR)_{ij} + \varepsilon_{ijk}$, $\varepsilon_{ijk} \sim N(0, \sigma^{2})$, $i = 1, 2$; $j = 1, 2$; $k = 1, 2, \ldots, 154$.
where $g_{ijk}$ is the $k$-th expression ratio of a gene in the $i$-th treatment and $j$-th RNA source. $T_{i}$, $R_{j}$ and $(TR)_{ij}$ represent the treatment effect, the RNA source effect and the interaction effect, respectively.
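For a single gene, the RNA source effect in this two-by-two model can be tested as sketched below. The sketch assumes an unweighted-means contrast between the two RNA sources, which is one common choice for unbalanced cells; the text does not state which test was used, and the function name and toy values are hypothetical.

```python
import statistics


def source_effect_f(cells):
    """F statistic for the RNA source main effect in a 2x2 design with interaction.

    `cells[(i, j)]` holds one gene's expression ratios for treatment i and source j.
    ASSUMPTION: the effect is tested with an unweighted-means contrast; the paper
    does not specify the type of test.
    """
    means = {key: statistics.mean(vals) for key, vals in cells.items()}
    n_total = sum(len(vals) for vals in cells.values())
    # Residual sum of squares of the full model: deviations from the cell means.
    sse = sum((v - means[key]) ** 2 for key, vals in cells.items() for v in vals)
    mse = sse / (n_total - 4)
    # Contrast: mean of the source 1 cell means minus mean of the source 2 cell means.
    contrast = (means[(1, 1)] + means[(2, 1)] - means[(1, 2)] - means[(2, 2)]) / 2
    variance = mse * sum(1 / len(vals) for vals in cells.values()) / 4
    return contrast ** 2 / variance


# Toy ratios for one gene: treatment 1/2 (normal/tumor) by source 1/2 (total/amplified).
cells = {
    (1, 1): [1.0, 1.2, 0.8],
    (1, 2): [2.0, 2.2, 1.8],
    (2, 1): [3.0, 3.2, 2.8],
    (2, 2): [4.0, 4.2, 3.8],
}
f_source = source_effect_f(cells)
```

The statistic has 1 and N - 4 degrees of freedom, where N is the total number of experiments; a large value indicates that the RNA source shifts the gene's expression ratio after accounting for treatment.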
Abbreviations
SVD: Singular Value Decomposition
SVM: Support Vector Machine
KNN: K Nearest Neighbors
NNs: Nearest Neighbors
ANOVA: Analysis Of Variance
OOB error: Out Of Bag error
HCA: Hierarchical Cluster Analysis
Declarations
Acknowledgements
This study was supported by a grant of the Korea Health 21 R&D Project, Ministry of Health & Welfare (0405BC0106040002), and Korea Research Foundation Grant funded by Korean Government (KRF2005005J05904). We thank the members of National Biochip Research Center, Yonsei University, and the Genomic Tree Incorporation, Korea for the current project.
Authors’ Affiliations
References
1. Breitling R, Sharif O, Hartman ML, Krisans SK: Loss of compartmentalization causes misregulation of lysine biosynthesis in peroxisome-deficient yeast cells. Eukaryot Cell. 2002, 1: 978-986. 10.1128/EC.1.6.978-986.2002.
2. Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation. Bioinformatics (Suppl). 2003, 19: i84-i90. 10.1093/bioinformatics/btg1010.
3. Detours V, Dumont JE, Bersini H, Maenhaut C: Integration and cross-validation of high-throughput gene expression data: Comparing heterogeneous data sets. FEBS Letters. 2003, 546: 98-102. 10.1016/S0014-5793(03)00522-2.
4. Lee PD, Sladek R, Greenwood CM, Hudson TJ: Control genes and variability: Absence of ubiquitous reference transcripts in diverse mammalian expression studies. Genome Research. 2002, 12: 292-297. 10.1101/gr.217802.
5. Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis in primary solid tumors. Nat Genet. 2003, 33: 49-54. 10.1038/ng1060.
6. Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Meta-analysis of microarrays: Interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 2002, 62: 4427-4433.
7. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA. 2003, 100: 8418-8423. 10.1073/pnas.0932692100.
8. Choi JK, Choi JY, Kim DG, Choi DW, Kim BY, Lee KH, Yeom YI, Yoo HS, Yoo OJ, Kim SS: Integrative analysis of multiple gene expression profiles applied to liver cancer study. FEBS Letters. 2004, 565: 93-100. 10.1016/j.febslet.2004.05.087.
9. Kuo WP, Jenssen TK, Butte AJ, Lucila OM, Kohane IS: Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics. 2002, 18 (3): 405-412. 10.1093/bioinformatics/18.3.405.
10. Kim KY, Chung HC, Jeung HC, Shin JH, Kim TS, Rha SY: Significant gene selection using integrated microarray data set with batch effect. Genomics & Informatics. 2006, 4 (3): 110-117.
11. Alter O, Patrick OB, David B: Singular value decomposition for genome-wide expression data processing and modelling. Proc Natl Acad Sci USA. 2000, 97: 10101-10106. 10.1073/pnas.97.18.10101.
12. Nielsen TO, West RB, Linn SC, Alter O, Knowling MA, O'Connell J, Zhu S, Fero M, Sherlock G, Pollack JR, Patrick OB, Botstein D, Rijn M: Molecular characterisation of soft tissue tumours: a gene expression study. Lancet. 2002, 359: 1301-1307. 10.1016/S0140-6736(02)08270-3.
13. Benito M, Parker J, Du Q, Wu J, Xiang D, Perou CM, Marron JS: Adjustment of systematic microarray data biases. Bioinformatics. 2004, 20: 105-114. 10.1093/bioinformatics/btg385.
14. Jiang H, Deng Y, Chen HS, Tao L, Sha Q, Chen J, Tsai CJ, Zhang S: Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics. 2004, 5: 81. 10.1186/1471-2105-5-81.
15. Park TS, Yi SG, Shin YK, Lee SY: Combining multiple microarrays in the presence of controlling variables. Bioinformatics. 2006, 22 (14): 1682-1689. 10.1093/bioinformatics/btl183.
16. Kemp Z, Carvajal-Carmona L, Spain S, Barclay E, Gorman M, Martin L, Jaeger E, Brooks N, Bishop DT, Thomas H, Tomlinson I, Papaemmanuil E, Webb E, Sellick GS, Wood W, Evans G, Lucassen A, Maher ER, Houlston RS: Evidence for a colorectal cancer susceptibility locus on chromosome 3q21-q24 from a high-density SNP genome-wide linkage scan. Human Molecular Genetics. 2006, 15 (9): 2903-2910. 10.1093/hmg/ddl231.
17. Andersen LC, Wiuf C, Kruhøffer M, Korsgaard M, Laurberg S, Ørntoft TF: Frequent occurrence of uniparental disomy in colorectal cancer. Carcinogenesis. 2007, 28 (1): 38-48. 10.1093/carcin/bgl086.
18. Colebatch A, Hitchins M, Williams M, Meagher A, Hawkins NJ, Ward RL: The role of MYH and microsatellite instability in the development of sporadic colorectal cancer. British Journal of Cancer. 2006, 95: 1239-1243. 10.1038/sj.bjc.6603421.
19. Kim TM, Jeong HJ, Seo MY, Kim SC, Cho G, Park CH, Kim TS, Park KH, Chung HC, Rha SY: Determination of Genes Related to Gastrointestinal Tract Origin Cancer Cells Using a cDNA Microarray. Clinical Cancer Research. 2005, 11: 79-86.
20. Feldman AJ, Costouros NG, Wang E, Qian M, Marincola FM, Alexander HR, Libutti SK: Advantages of mRNA amplification for microarray analysis. Biotechniques. 2002, 33 (4): 906-914.
21. Schneider J, Buneß A, Huber A, Volz J, Kioschis P, Hafner M, Poustka A, Sültmann H: Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments. BMC Genomics. 2004, 5: 29. 10.1186/1471-2164-5-29.
22. R: A language and environment for statistical computing. [http://www.R-project.org]
23. Breiman L: Random Forests. 2001, Statistics Department, University of California, Berkeley, 1-33.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.