Validation and characterization of DNA microarray gene expression data distribution and associated moments

BMC Bioinformatics

Table 2 Fraction of null hypotheses rejected by the Anderson-Darling tests for best fit to 7 distribution functions.

Dataset	No. of probe sets	Normal	Weibull	Extreme Value	Logistic	Lognormal	Log-logistic	At least one of the distributions not rejected
Craniofacial	6215	0.72	0.79	0.82	0.69	0.75	0.71	0.46
Liver	6228	0.82	0.93	0.95	0.79	0.83	0.80	0.25
Brain	25146	0.77	0.92	0.93	0.69	0.77	0.68	0.35
Housekeeping	23	0.70	1.00	1.00	1.00	0.48	1.00	0.52
Male	19532	0.46	NA	0.96	0.24	NA	NA	0.82
Female	18915	0.43	NA	0.96	0.21	NA	NA	0.85

Each fraction is the number of probe sets for which the given null hypothesis was rejected, divided by the number of probe sets tested (second column). The final column instead gives the fraction of probe sets for which at least one of the distributions was not rejected. The probe sets counted in the second column are those assumed to be unaffected by the conditions under which the samples in each of the six data sets ("Craniofacial", "Liver", "Brain", "Housekeeping", "Male" and "Female") were generated. For the first three data sets, unaffected probe sets were identified using the Kruskal-Wallis test, as described in the Methods section. The "Housekeeping" data set had 6219 samples, and the 23 probe sets analyzed correspond to the so-called housekeeping genes [47], which are supposed to be essential for cell survival under most conditions. The probe sets for the "Male" and "Female" data sets were identified using the procedure detailed in the Methods section. Because some of the log-ratio data for the "Male" and "Female" data sets are negative, these data cannot be tested for goodness of fit to some of the distributions; the results for those distributions are listed as "NA".
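
As a rough illustration of how fractions of this kind can be tallied, the following Python sketch computes an Anderson-Darling statistic for a normal fit and counts rejections across probe sets. It is not the implementation used in this study: the function names, the synthetic example data, the 5% significance level, and the use of Stephens' modified statistic with critical value 0.752 are all assumptions made for the example, and only the normal case is shown.

```python
import math
from statistics import NormalDist, mean, stdev

# Assumed rejection threshold: Stephens' 5% critical value for the modified
# statistic when mean and standard deviation are estimated from the data.
CRIT_5PCT = 0.752


def anderson_darling_normal(values):
    """Modified Anderson-Darling statistic for a normal fit (parameters estimated)."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    eps = 1e-12  # keep CDF values strictly inside (0, 1) so the logs stay finite
    cdf = [min(max(fitted.cdf(v), eps), 1 - eps) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment for the case of estimated mean and variance.
    return a2 * (1 + 0.75 / n + 2.25 / n**2)


def fraction_rejected(probe_sets, critical=CRIT_5PCT):
    """Fraction of probe sets whose normality hypothesis is rejected."""
    rejected = sum(anderson_darling_normal(v) > critical for v in probe_sets)
    return rejected / len(probe_sets)


def fraction_any_not_rejected(rejections):
    """Fraction of probe sets with at least one distribution not rejected.

    `rejections` holds one list of booleans per probe set (True = rejected).
    """
    return sum(not all(r) for r in rejections) / len(rejections)


# Synthetic (hypothetical) probe sets: one close to normal, one strongly skewed.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [math.exp(k / 10) for k in range(60)]

print(fraction_rejected([normal_like, skewed]))
```

The first two functions correspond to the per-distribution columns of the table, and `fraction_any_not_rejected` corresponds to the final column. The other distributions would each need their own fitted distribution function and critical values, which are not reproduced here.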

ISSN: 1471-2105