Two-part permutation tests for DNA methylation and microarray data

Background One important application of microarray experiments is to identify differentially expressed genes. Often, small and negative expression levels were clipped-off to be equal to an arbitrarily chosen cutoff value before a statistical test is carried out. Then, there are two types of data: truncated values and original observations. The truncated values are not just another point on the continuum of possible values and, therefore, it is appropriate to combine two statistical tests in a two-part model rather than using standard statistical methods. A similar situation occurs when DNA methylation data are investigated. In that case, there are null values (undetectable methylation) and observed positive values. For these data, we propose a two-part permutation test. Results The proposed permutation test leads to smaller p-values in comparison to the original two-part test. We found this for both DNA methylation data and microarray data. With a simulation study we confirmed this result and could show that the two-part permutation test is, on average, more powerful. The new test also reduces, without any loss of power, to a standard test when there are no null or truncated values. Conclusion The two-part permutation test can be used in routine analyses since it reduces to a standard test when there are positive values only. Further advantages of the new test are that it opens the possibility to use other test statistics to construct the two-part test and that it avoids the use of any asymptotic distribution. The latter advantage is particularly important for the analysis of microarrays since sample sizes are usually small.


Background
The addition of a methyl group at the carbon-5 position of cytosine is a modification of DNA called DNA methylation. In mammalian cells, DNA methylation is essential for proper development [1]. The methylation patterns of tumor cells are altered compared to those of normal cells, moreover, there are also differences between different types of cancer as shown for subtypes of leukemia [2] and lung cancer [3]. Thus, DNA methylation analysis promises to become a powerful tool in cancer diagnosis [4].
DNA methylation data can be obtained using the Methy-Light technology [5]. When the tested region is not or only partially methylated the result is negative (undetectable methylation, null values). In contrast, samples that show methylation will have a value greater than 0 [4]. Thus, DNA methylation data obtained with MethyLight have a clump of zero observations and a continuous nonzero part. For such a data structure, two-part models as proposed by Lachenbruch [6][7][8] are applicable. In that approach, the test statistic is the sum of two squared statistics, one comparing the proportions of zeros and one comparing the positive values. For example, one can use the binomial test and the Wilcoxon rank sum test. The asymptotic null distribution of the sum of the squares of the two test statistics is χ 2 with two degrees of freedom (df = 2).
In microarray data it is relatively common that small and negative expression levels were clipped-off to be equal to an arbitrarily chosen cutoff value. Recent examples used different cutoff values: 1, 20, and 50, respectively [9][10][11]. Aside from the fact that negative values make no biological sense, there are, with regard to oligonucleotide arrays from Affymetrix, two primary reasons for truncating the values [12]. First, spots at the low intensity range are generally more vulnerable to noise, thus, it is thought that the technology produces a poor discrimination at low levels of expression [13]. Second, the focus is on expression of identified genes and expressed sequence tags. Differences at negative or low values may result from differences in binding to the mismatch probes. Since it is generally not known what binds to the mismatch probes, the differences at negative or low values cannot be attributed to target genes.
Due to the truncation there are two different types of data: truncated values and original observations. Since the truncated values are not just another point on the continuum of possible values, it would be inappropriate to use a standard statistical method that would treat all values equally [14]. The two different types of data should be analyzed separately. Therefore, the two-part model, comparing the proportions of truncated values and the distribution of positive values, is applicable. Note that negative expression levels are not possible when the Affymetrix Microarray Suite (MAS) 5.0 software is used. However, small values are possible and may be truncated.
Since, in microarray experiments, the sample sizes, i.e. the numbers of replications, are usually very small [15,16], the use of the asymptotic distribution of Lachenbruch's two-part test statistic may be questionable. Thus, we carried out permutation tests with the two-part statistic. For a permutation test all possible permutations under the null hypothesis are generated. In our situation we permute the group labels for the whole sample, i.e. for truncated values (or the null values in case of methylation data) and original observations. Then, the test statistic is calculated for each permutation. The null hypothesis can then be accepted or rejected using the permutation distri-bution of the test statistic, the p-value being the probability of the permutations giving a value of the test statistic as supportive or more supportive of the alternative than the observed value [17,18]. Thus, inference is based upon how extreme the observed test statistic is relative to other values that could have been obtained under the null hypothesis.
We found that, in the case of a two-part model, the permutation test is not only a way to avoid the use of an asymptotic distribution, but also is a more powerful test, i.e. a test that produces, on average, smaller p-values. In addition, the permutation test reduces, without any loss of power, to a single test if no truncated (or null) values were present. Thus, the proposed test is applicable in routine use whether or not truncated (or null) values occur. After the definition of the tests in the following section, we present our findings for DNA methylation data and microarray data. We then confirm the results using simulations.

Two-part tests
As briefly mentioned above a two-part test statistic is the sum of two squared statistics, one comparing the proportions of truncated values and one comparing the positive values. Let n 1 and n 2 be the numbers of independent observations regarding one gene (or one region in case of methylation data, respectively), for two groups to be com- For the second part in Lachenbruch's two-part model one can use different tests, Lachenbruch [6][7][8] considered the Wilcoxon rank sum test, Student's t test, and the Kolmogorov-Smirnov test. The use of the latter test in a twopart model, however, was too liberal, the type I error rate was close to 0.065 (for a significance level of α = 0.05 [7,8]  original analyses of the data we use below [3,11]. Nonparametric tests based on ranks are more appropriate for nonnormally distributed data such as microarray data [19]. The test statistic for the two-part test is Under the null hypothesis of no difference between the groups, X 2 is asymptotically χ 2 -distributed with df = 2 [6,7]. Alternatively, a permutation test can be performed with the statistic X 2 . This test, called two-part permutation test here, is a permutation test based on the sum statistic X 2 . It is carried out by permuting the group labels for the whole sample. Thus, all observations, truncated and non-truncated values (or null and positive values in case of methylation data) are reallocated to the groups. When performing this two-part permutation test the exact permutation distribution of X 2 is determined. This distribution, computed by generating all possible permutations or, for the p-values given below in Table 1, using a simple random sample of 20,000 permutations, is used to compute the p-value.
Since it is a permutation test based on the sum X 2 , it is neither necessary to determine the permutation distributions of the summands B 2 and W 2 nor to calculate the p-values of the univariate tests related to B 2 and W 2 .

Application to actual methylation data
We use DNA methylation data from 7 regions and 87 lung cancer cell lines, 41 lines are from small cell lung cancer and 46 lines from non-small cell lung cancer [3,4]. The proportion of positive values for the different regions ranges from 39 to 100% for the small cell lung cancer and from 65 to 98% for the non-small cell lung cancer. The data are available at http://www-rcf.usc.edu/~kims/Sup plementaryInfo.html. Siegmund et al. [4] transformed the data by standardizing the positive values on the naturallog scale. However, for the tests applied here, this transformation has no influence.

Application to actual microarray data
Tschentscher et al. [11] performed an experiment with HG-U95Av2 oligonucleotide arrays in order to compare patients with uveal melanomas with and without monosomy 3. Expression values were calculated by use of the MAS 4.0 software. The data are available at http:// www.uni-essen.de/humangenetik/download. The sample size in this microarray experiment is 10 per group. As mentioned above, expression levels below 50 were set to 50. This data truncation occurred in 2,215 (28%) out of 7,902 genes. First, we consider these 2,215 genes. Figure 1 displays the number of genes for the different number of truncated values per gene.  Table 3, we consider the genes for which the p-value of the original twopart test is ≤ 0.1. Out of the 2,215 genes 514 remain. As shown in Table 3 the p-values of the two-part permutation test are, on average, distinctly smaller than those of the original two-part test.
For a large proportion of genes (72% in this data set) there are no truncated values. In that case, the two-part statistic reduces to the sum 0 plus the squared standardized rank sum W 2 . Of course, one has to define a priori whether the two-part test or the Wilcoxon test will be used to analyze  Number of genes with the given number of truncations per gene (data from the microarray experiment of Tschentscher et al. [11], only genes with at least one truncation)

Simulation study
The two different tests, the original two-part test and the two-part permutation test, were compared in a Monte Carlo simulation study performed using SAS version 8.2, 5,000 simulation runs were generated for each configuration. The sample size of 10 per group was chosen as in the microarray experiment presented above. In some configurations some randomly chosen values were set to 0 according to binomial distributions with the probabilities p 1 and p 2 , and the remaining observations were generated according to a lognormal distribution (with median 1 and σ = 1). Then, the values of one group were shifted if applicable. In some other configurations all observations were generated according to the lognormal distribution and, in one group, shifted. Then, values smaller than a cutoff value were truncated.
The type I error rates of the two tests are very similar. With e.g. p 1 = p 2 = 0.3 and a significance level of α = 0.05 the simulated type I error rates were 0.049 for both the original two-part test and the two-part permutation test. Table 4 displays results for situations with a difference between the two groups. As above, only those comparisons were regarded for which there is some indication of a difference, i.e. a p-value ≤ 0.1 of the original two-part test. In all considered configurations the median of the difference between the p-values of the original two-part test and the two-part permutation test is positive. The finding that the p-values of the permutation test are smaller corresponds to a higher power of this test. The power is given in Table 5, as shown the power of the twopart permutation test is at least as high as that of the original two-part test. There is only one exception, the latter test is slightly more powerful in one situation, i.e. when the proportion of zeros is higher in group 1 and the positive values are larger in group 1.

Discussion
Previous research demonstrated that a two-part test is appropriate and powerful in the presence of a clump of zero observations (i.e. truncated or null values). In this paper we propose a permutation test for such situations with two types of data. Usually, nonparametric tests can be performed based on an asymptotic distribution or based on a permutation null distribution. The two approaches often give similar results, especially when the sample sizes are large. However, in the case of a two-part test one cannot simply replace the asymptotic distributions of B 2 and W 2 by the exact permutation distributions. If so, one would compute two exact p-values although the aim of a two-part test is to receive one p-value that com- bines information from both parts. Therefore, the twopart permutation test uses the exact permutation distribution of the sum statistic X 2 . That this permutation distribution of the sum is generated rather than to simply replace the asymptotic distributions of the summands by their exact permutation distributions may be the reason why the permutation test is more powerful.
A disadvantage of a permutation test is that it can be computer-intensive. However, this issue is less relevant now due to faster algorithms [ [18], chap. 13] and the advent of high-speed PCs. Furthermore, one can carry out a permutation test based on a random sample out of the possible permutations, as we did for the DNA methylation data (see Table 1).
In microarray experiments it is common to investigate thousands of genes simultaneously. The approach presented here for the identification of differentially expressed genes is to consider a univariate testing problem for each gene. A correction for the multiplicity of genes is a subsequent step, that is, like the previous step of normalizing the data, outside the scope of this paper. A common approach to the multiplicity problem is to consider a procedure for testing the genes simultaneously for differential expression with the test on an individual gene being implied in the simultaneous test. For such a procedure different proposals have been made recently. For instance, there are methods based on the p-values of the tests from individual genes [21][22][23]. In a similar manner, the multiplicity of regions can be managed in DNA methylation data.

Conclusion
Aside from the shown improvement in power, the proposed two-part permutation test has three important advantages. First, it avoids the use of any asymptotic dis-tribution and, therefore, can safely be applied in case of small sample sizes that are common in microarray experiments. Second, it reduces without any loss of power to the exact Wilcoxon test if there were no truncated (or zero) values. Thus, it can be used in routine analyses. Third, the permutation test opens the possibility to use other tests to construct the two-part test. Thus, tests with unknown or non-standard null distributions can be used. For instance, one could replace the Wilcoxon test by the Baumgartner-Weiß-Schindler test [24] that was recently recommended for the analysis of gene expression data [19]. Table 5: Results of the simulation study: Powers of the original two-part test and the two-part permutation test; 5,000 data sets with n 1 = n 2 = 10 were generated for each configuration (notation as in Table 4, significance level α = 0.05)

Configuration
Original two-part test Two-part permutation test