The aim of our multi-platform integration method is to select a set of significant biomarkers that are involved in a biological process and thus behave differently in the treatment and control groups. To combine statistical evidence across platforms, our method requires that analogous hypotheses, based on the features being measured, be formulated for each platform. Each analogous null hypothesis asserts the unrelatedness of the biomarker in that particular experimental setting, and together they imply the unrelatedness of the biomarker to the biological process under investigation. Based on the set of Q analogous hypotheses for Q data sources, we construct Q corresponding test statistics, one for each type of data. The test statistics can differ and be tailored to the specific experimental settings. For example, if the microarray experiment has a multifactorial design, an appropriate test statistic is the F statistic from an ANOVA; if the proteomics experiment generates count data for diseased versus normal groups, an appropriate choice is the nonparametric Wilcoxon rank sum statistic. This yields a vector of observed statistics across the platforms. We then randomly permute the data across the diseased and control groups, permuting all measurements from the different platforms, and thereby obtain an empirical null distribution of the vector of test statistics. To pool the randomized values of the statistics across biomarkers into this empirical null distribution, we assume data from different biomarkers are independent or have an exchangeable correlation structure; for the validity of the randomization procedure, we also assume an exchangeable covariance structure for the measurements within each platform. Finally, we construct a weighted sum of the test statistics across the platforms, with the weights being the inverse of the empirical standard deviation of each statistic, and determine a set of significant biomarkers based on the aggregated test statistic.

In the following, we demonstrate our method by integrating microarray expression data and proteomic data as an example. We consider two experiments, the first having microarray expression data measured on *l*_{1} diseased samples and *l*_{2} control samples and the second having proteomic data measured on *m*_{1} diseased samples and *m*_{2} control samples. The objective is to find biomarkers significantly involved in disease development.

Step 1): Define two analogous null hypotheses. For microarray data, the null hypothesis would be *H*_{01}: the gene's mRNA level is the same in diseased and normal populations; for proteomic data, the null hypothesis would be *H*_{02}: the protein level is the same in diseased and normal populations.

Step 2): Based on the hypotheses, construct two test statistics, *t*_{m} and *t*_{p}, tailored to each type of data. Consequently, we obtain a vector of two observed statistics (*t*_{m},*t*_{p})^{′} across the two data platforms. The test statistics can be of any type as long as they summarize information from the data and can be used to assess the statistical significance of the data toward the hypotheses. Let $x_1,\ldots,x_{l_1}$ denote the *l*_{1} gene expression measurements in the disease group, $y_1,\ldots,y_{l_2}$ denote the *l*_{2} gene expression measurements in the control group, $\bar{x} = \frac{1}{l_1}\sum_{i=1}^{l_1} x_i$, and $\bar{y} = \frac{1}{l_2}\sum_{j=1}^{l_2} y_j$. Similarly, $u_1,\ldots,u_{m_1}$ denotes the *m*_{1} protein measurements in the disease group and $v_1,\ldots,v_{m_2}$ denotes the *m*_{2} protein measurements in the control group, $\bar{u} = \frac{1}{m_1}\sum_{i=1}^{m_1} u_i$, and $\bar{v} = \frac{1}{m_2}\sum_{j=1}^{m_2} v_j$. For illustration purposes, we adopt Student's t-statistic for each of the data:

$$t_m = \frac{\bar{x} - \bar{y}}{\sqrt{s_x^2/l_1 + s_y^2/l_2}}, \qquad t_p = \frac{\bar{u} - \bar{v}}{\sqrt{s_u^2/m_1 + s_v^2/m_2}},$$

where *s*^{2} with the corresponding subscript denotes each group's sample variance. The test statistics should be formulated so that a larger test statistic in the positive direction indicates more evidence toward the alternative hypotheses. For example, if Student's t-statistic is used, then a one-sided alternative hypothesis corresponds to a one-sided t-statistic, whereas a two-sided alternative leads to the absolute value of the t-statistic. Consider *n* genes measured in the experiments; we obtain *n* vectors of test statistics (*t*_{mi},*t*_{pi})^{′}, *i* = 1,…,*n*, from the data sets.
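Step 2 can be sketched in code. The following is a minimal NumPy sketch; the Welch-style denominator, the array shapes, and the 3/3 and 4/4 group splits are illustrative assumptions, not part of the original method.

```python
import numpy as np

def two_sample_t(disease, control):
    """Two-sample t-statistic per biomarker (rows are biomarkers).

    disease: (n, l1) array, control: (n, l2) array.
    A Welch-style denominator is assumed here for illustration.
    """
    l1, l2 = disease.shape[1], control.shape[1]
    num = disease.mean(axis=1) - control.mean(axis=1)
    den = np.sqrt(disease.var(axis=1, ddof=1) / l1
                  + control.var(axis=1, ddof=1) / l2)
    return num / den

rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 6))   # hypothetical microarray data: 3 diseased, 3 control
prot = rng.normal(size=(5, 8))   # hypothetical proteomic data: 4 diseased, 4 control

t_m = two_sample_t(expr[:, :3], expr[:, 3:])
t_p = two_sample_t(prot[:, :4], prot[:, 4:])

# two-sided alternatives: use absolute values of the t-statistics
t_obs = np.column_stack([np.abs(t_m), np.abs(t_p)])
```

Each row of `t_obs` is one biomarker's vector (*t*_{mi},*t*_{pi})^{′}.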

Step 3): The samples are randomly permuted across the diseased and control groups. If the same sample is measured across different platforms, all the measurements from the different platforms are permuted simultaneously. The simultaneous permutation preserves the dependency relationships among the measurements from different platforms. Based on random permutation, we obtain an empirical null distribution of the vector (*t*_{m},*t*_{p})^{′}.
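A sketch of this permutation step, assuming the same samples are measured on both platforms so that one permutation of sample indices applies to both (the function names and data are illustrative):

```python
import numpy as np

def welch_t(a, b):
    # two-sample t-statistic per biomarker (rows are biomarkers)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(
        a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])

def permutation_null(expr, prot, n_disease, n_perm=200, seed=0):
    """Empirical null of (|t_m|, |t_p|) under random relabeling.

    The SAME permutation of sample indices is applied to both platforms,
    preserving the dependency between measurements of a given sample.
    Returns an array of shape (n_perm, n_biomarkers, 2).
    """
    rng = np.random.default_rng(seed)
    n_samples = expr.shape[1]
    null = np.empty((n_perm, expr.shape[0], 2))
    for b in range(n_perm):
        idx = rng.permutation(n_samples)            # one shuffle for both platforms
        d, c = idx[:n_disease], idx[n_disease:]
        null[b, :, 0] = np.abs(welch_t(expr[:, d], expr[:, c]))
        null[b, :, 1] = np.abs(welch_t(prot[:, d], prot[:, c]))
    return null

rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 6))   # 5 biomarkers, 6 matched samples (3 diseased)
prot = rng.normal(size=(5, 6))   # same samples measured on the proteomic platform
null = permutation_null(expr, prot, n_disease=3)
```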

Step 4): The aggregated test statistic will be:

$$t_A = \frac{t_m}{\hat{\sigma}_m} + \frac{t_p}{\hat{\sigma}_p},$$

where $\hat{\sigma}_m$ and $\hat{\sigma}_p$ are the estimated standard deviations of *t*_{m} and *t*_{p} based on the empirical null distribution, and *t*_{m} and *t*_{p} are the observed t-statistics, or the absolute values of the t-statistics, depending on the direction of the alternative hypotheses. At significance level *α*, we choose a threshold *C*_{α} such that $P(t_A > C_\alpha \mid H_0) = \alpha$. Specifically, *C*_{α} is the 100(1−*α*)% percentile of *t*_{A}, which can be obtained from the empirical null distribution. Construct a decision line that separates selected significant biomarkers from nonsignificant biomarkers. The resulting separation line is:

$$\frac{t_m}{\hat{\sigma}_m} + \frac{t_p}{\hat{\sigma}_p} = C_\alpha.$$

All the biomarkers with (*t*_{m},*t*_{p}) above the separation line will be declared as significantly involved in the disease development.
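Putting Step 4 together, a sketch of the aggregation and thresholding, with synthetic observed statistics and a synthetic permutation null standing in for real data (all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
t_obs = np.abs(rng.normal(size=(200, 2)))        # observed (|t_m|, |t_p|) per biomarker
null = np.abs(rng.normal(size=(1000, 200, 2)))   # (permutations, biomarkers, platforms)

# pool the randomized statistics across biomarkers to form the empirical null
pooled = null.reshape(-1, 2)
sigma_hat = pooled.std(axis=0, ddof=1)           # empirical SDs of t_m and t_p

t_A = (t_obs / sigma_hat).sum(axis=1)            # aggregated statistic per biomarker
null_A = (pooled / sigma_hat).sum(axis=1)        # null distribution of t_A

alpha = 0.05
C_alpha = np.quantile(null_A, 1 - alpha)         # 100(1 - alpha)% percentile
selected = np.flatnonzero(t_A > C_alpha)         # biomarkers above the separation line
```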

In the more general case, suppose we have Q data platforms with the observed test statistics (*t*_{1},…,*t*_{Q})^{′}. From random permutation, we obtain the joint empirical distribution of this vector of test statistics under the global null hypothesis. Let $\hat{\sigma}_q^2$, *q* = 1,…,Q, denote the estimated variances of the individual test statistics. The aggregated test statistic takes the form:

$$t_A = \sum_{q=1}^{Q} \frac{t_q}{\hat{\sigma}_q}.$$

The resulting critical region will take the form:

$$\{t_A > C_\alpha\},$$

where *C*_{α} is the 100(1−*α*)% percentile of *t*_{A}. Any biomarker with *t*_{A} > *C*_{α} will be selected as behaving significantly differently between the diseased and control groups.
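The Q-platform generalization can be written compactly; here `t_obs` holds one column of observed statistics per platform and `null` is the permutation null, both hypothetical:

```python
import numpy as np

def aggregate(t_obs, null, alpha=0.05):
    """t_A = sum over q of t_q / sigma_q, with sigma_q estimated from the null.

    t_obs: (n_biomarkers, Q); null: (n_perm, n_biomarkers, Q).
    Returns the aggregated statistics and the critical value C_alpha.
    """
    pooled = null.reshape(-1, null.shape[-1])    # pool permutations and biomarkers
    sigma_hat = pooled.std(axis=0, ddof=1)
    t_A = (t_obs / sigma_hat).sum(axis=1)
    null_A = (pooled / sigma_hat).sum(axis=1)
    return t_A, np.quantile(null_A, 1 - alpha)

rng = np.random.default_rng(3)
t_A, C_alpha = aggregate(np.abs(rng.normal(size=(50, 3))),
                         np.abs(rng.normal(size=(500, 50, 3))))
hits = np.flatnonzero(t_A > C_alpha)             # selected biomarkers
```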

Our method aggregates the actual values of the test statistics across different data platforms, which preserves more information than the rank aggregation method. Moreover, our method assigns different weights to each data set according to the variability of the test statistics: the larger the variation in a test statistic, the smaller the weight assigned to it, and vice versa. The threshold *C*_{α} is determined from the empirical null distribution of the aggregated test statistic, which implicitly takes into account the dependency relationships among the test statistics. Furthermore, our method can handle different data types and formats generated by various experimental settings.

There are two major ways to perform the multiplicity adjustment. The first is the Bonferroni correction: if we wish to control the familywise type I error rate at *α*^{∗}, then the individual level is *α* = *α*^{∗}/*n*, where *n* is the total number of biomarkers. When *n* is large, the Bonferroni correction leads to very stringent tests with *α* being very small. Alternatively, we can control the number of false discoveries. To set the expected number of false discoveries equal to or less than *f*, we take $\alpha = f/(n\hat{\pi}_0)$, where $\hat{\pi}_0$ is the estimated proportion of non-differentially expressed biomarkers. If no estimate of $\hat{\pi}_0$ is available, we use $\hat{\pi}_0 = 1$, and that gives a conservative value for *α*.
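A quick numerical sketch of both adjustments, assuming the false-discovery rule takes the form α = *f*/(*n*·π̂₀) (an assumption of this sketch, with illustrative numbers):

```python
n = 10_000            # total number of biomarkers tested

# Bonferroni: familywise type I error rate alpha_star -> per-biomarker level
alpha_star = 0.05
alpha_bonf = alpha_star / n          # very stringent when n is large

# Expected false discoveries at most f (assumed rule: alpha = f / (n * pi0_hat))
f = 10
pi0_hat = 1.0                        # conservative default when no estimate exists
alpha_fd = f / (n * pi0_hat)
```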

Different platforms can be used to test different sub-hypotheses. All of these sub-hypotheses should be concordant in supporting the overall biological hypothesis. For example, the involvement of a gene in disease development can be supported by both mRNA expression level changes and proteomic level changes. In most cases, changes in measurements from different platforms are expected to occur in the same direction. However, our method remains applicable even if the changes are in different directions, as long as the statistical evidence from the two sources can be combined. For example, consider *H*_{10}: mRNA is increasing in the normal group, and *H*_{20}: antibody count is decreasing in the normal group. Even though the actual measurements from the two platforms are negatively correlated, we can construct the test statistics *t*_{1} and *t*_{2} so that positive values of the statistics support the alternative hypotheses, and the weighted average can be used as combined evidence of the involvement of the biomarker in the process.
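This sign-alignment idea can be illustrated with tiny, made-up measurement vectors (all values are purely illustrative):

```python
import numpy as np

def t_stat(a, b):
    # two-sample t-statistic (Welch-style denominator)
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)

# hypothetical data: mRNA is higher in disease, antibody count is lower
mrna_d = np.array([2.0, 2.5, 3.0, 2.2]); mrna_c = np.array([1.0, 1.2, 0.8, 1.1])
ab_d = np.array([0.5, 0.4, 0.6, 0.5]);   ab_c = np.array([1.5, 1.6, 1.4, 1.5])

t1 = t_stat(mrna_d, mrna_c)    # positive when mRNA is elevated in disease
t2 = -t_stat(ab_d, ab_c)       # sign flipped: positive when antibody is depressed

# both statistics now point in the positive direction under their alternatives,
# so the weighted average combines the evidence despite the negative correlation
```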