Comments on: fold change rank ordering statistics: a new method for detecting differentially expressed genes

We published a new method (BMC Bioinformatics 2014, 15:14) for detecting differentially expressed genes in datasets from two biological conditions. The presentation of Theorem 1 in that paper was incomplete. An anonymous comment we received about the publication motivates the present work. Here, we present a complementary result that is necessary, from a theoretical point of view, to prove our theorem. We also show that this result has no negative impact on the conclusions we obtained with synthetic and experimental microarray datasets.


Background
To search for differentially expressed (DE) genes in profiling studies, we presented a new method based on fold change rank ordering statistics (FCROS). To derive this method, we considered microarray data from two biological conditions, where $n$ probes (genes) were measured in $m_1$ control and $m_2$ test samples. We performed $k = m_1 m_2$ pairwise comparisons of the samples and computed the fold changes (FC) of each gene. For each comparison, the FCs were sorted in increasing order and the corresponding ranks were associated with the genes. This yields a matrix of rank values $R$ with components $r_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$). We denoted by $\mathbf{r}_i = [r_{i1}\ r_{i2}\ \ldots\ r_{ik}]^T$ the vector of rank values associated with gene $i$, and by $\bar r_i$ the average of ranks (a.o.r) value for gene $i$. The values $\bar r_i$ lie between $a = \min_i\{\bar r_i\}$ and $b = \max_i\{\bar r_i\}$. This allows us to associate a unique vector of ordered a.o.r values with the $n$ genes: $\bar{\mathbf r} = [a,\ a + \delta_1,\ a + \delta_1 + \delta_2,\ \ldots,\ a + \delta_1 + \ldots + \delta_{n-2},\ b]^T$, where the scalars $\delta_i$ are the differences between consecutive ordered a.o.r values. Without loss of generality, we assumed that the differences $\delta_i$ all take the same value, approximated by their mean: $\delta = (b-a)/(n-1)$. Using these notations, we derived Theorem 1 in [1]. As shown in the following lemma, which we received from an anonymous reader, the content of this theorem was incomplete.
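The rank construction described above can be sketched in Python with NumPy. This is a minimal illustration, not the FCROS implementation of [1]: the helper name `rank_matrix` is hypothetical, fold changes are assumed to be plain ratios test/control of strictly positive expression values, and ties are assumed to be broken by position (a stable sort), which may differ from how the original method treats them.

```python
import numpy as np

def rank_matrix(control, test):
    """Hypothetical helper: build the (n, k) rank matrix R, with k = m1 * m2.

    control: (n, m1) array and test: (n, m2) array of strictly positive values.
    Column j holds the ranks (1 = smallest fold change, n = largest) obtained
    for the j-th pairwise comparison (control sample c, test sample t).
    """
    n, m1 = control.shape
    m2 = test.shape[1]
    R = np.empty((n, m1 * m2), dtype=int)
    j = 0
    for c in range(m1):
        for t in range(m2):
            fc = test[:, t] / control[:, c]         # fold change of every gene
            order = np.argsort(fc, kind="stable")   # genes sorted by increasing FC
            R[order, j] = np.arange(1, n + 1)       # rank of each gene in this comparison
            j += 1
    return R

# Example on simulated data (n = 1,000 genes, m1 = 3, m2 = 4, so k = 12)
rng = np.random.default_rng(0)
control = rng.lognormal(size=(1000, 3))
test = rng.lognormal(size=(1000, 4))
R = rank_matrix(control, test)
rbar = R.mean(axis=1)                  # a.o.r value of each gene
a, b = rbar.min(), rbar.max()
delta = (b - a) / (len(rbar) - 1)      # equal-spacing approximation of the delta_i
```

Because every column of `R` is a permutation of $1, \ldots, n$, the a.o.r values always average to $(n+1)/2$, whatever the data; this is the starting point of the lemma.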
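Lemma. Assume that each column of $R$ is a random permutation of $\{1, \ldots, n\}$. Then, for $k \to \infty$, the a.o.r vector $\bar{\mathbf r}$ has mean $\frac{n+1}{2}\mathbf 1_n$ and a degenerate $(n \times n)$ variance-covariance matrix $\Sigma$, with $\det \Sigma = 0$:
$$\Sigma = \frac{n+1}{12}\begin{pmatrix} n-1 & -1 & \cdots & -1 \\ -1 & n-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & n-1 \end{pmatrix}.$$

Proof. Note that for $k \to \infty$, under the assumed sampling model, all elements of the set $\{1, \ldots, n\}$ are equally likely to appear in each row of $R$. Hence, by the weak law of large numbers ([2], page 235), the asymptotic mean is the constant vector
$$\Big(\frac{1}{n}\sum_{i=1}^{n} i\Big)\mathbf 1_n = \frac{n+1}{2}\mathbf 1_n.$$
By the same observation, the asymptotic variance is, for all $\ell \in \{1, \ldots, n\}$,
$$\sigma_{\ell\ell} = \frac{1}{n}\sum_{i=1}^{n} i^2 - \Big(\frac{n+1}{2}\Big)^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2-1}{12}.$$
The asymptotic covariance is computed as a two-index summation over the set $\{1, \ldots, n\}$ with the restriction that the two indices cannot be equal, since the columns are permutations by construction. Hence, for all $\ell \neq m \in \{1, \ldots, n\}$,
$$\sigma_{\ell m} = \frac{1}{n(n-1)}\sum_{i \neq j} i\,j - \Big(\frac{n+1}{2}\Big)^2 = \frac{(n+1)(3n+2)}{12} - \frac{(n+1)^2}{4} = -\frac{n+1}{12}.$$
Thus, since $\Sigma\mathbf 1_n = \Big(\frac{n^2-1}{12} - (n-1)\frac{n+1}{12}\Big)\mathbf 1_n = \mathbf 0$, it follows that $\det \Sigma = 0$.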
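The lemma can be checked numerically. The sketch below is our own illustration (the function name `permutation_rank_moments` is hypothetical): it assumes the columns of $R$ are independent, uniformly random permutations of $1, \ldots, n$, and estimates, for each gene, the mean of its ranks and, for each pair of genes, the covariance of their ranks across the $k$ columns.

```python
import numpy as np

def permutation_rank_moments(n, k, seed=0):
    """Monte Carlo estimate of the rank moments in the lemma.

    Each of the k columns of R is a uniform random permutation of 1..n
    (the argsort of i.i.d. uniforms is a uniform random permutation).
    Returns the per-gene mean rank (the a.o.r vector) and the (n, n)
    sample covariance matrix of the ranks across the k columns.
    """
    rng = np.random.default_rng(seed)
    R = np.argsort(rng.random((n, k)), axis=0) + 1   # shape (n, k)
    mean = R.mean(axis=1)
    cov = np.cov(R)                                  # rows (genes) are the variables
    return mean, cov

n = 5
mean, cov = permutation_rank_moments(n, k=200_000)
print(mean)               # every entry close to (n + 1) / 2 = 3
print(np.diag(cov))       # every entry close to (n**2 - 1) / 12 = 2
print(cov[0, 1])          # close to -(n + 1) / 12 = -0.5
print(cov @ np.ones(n))   # zero up to rounding: Sigma 1_n = 0, hence det(Sigma) = 0
```

The last line is zero up to floating-point rounding for any $k$, not only asymptotically: each column of $R$ sums to $n(n+1)/2$, so the centered ranks of the $n$ genes always sum to zero.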
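This lemma shows that the covariance term was missing from our theorem. In the next section, we present a complete version of the theorem using the notations adopted in [1].

Results
Proof. From the definition of the ordered a.o.r vector, $\bar r_i = a + (i-1)\delta$ for $i = 1, \ldots, n$, so each a.o.r value is an affine function of a rank in $\{1, \ldots, n\}$ and the moments of the lemma apply after rescaling by $\delta$. Using $\delta = (b-a)/(n-1)$, a component of the mean of the normal distribution is:
$$\frac{1}{n}\sum_{i=1}^{n}\big(a + (i-1)\delta\big) = a + \frac{(n-1)\delta}{2} = \frac{a+b}{2}.$$
A component of the variance (diagonal element) of the variance-covariance matrix of the normal distribution is:
$$\delta^2\,\frac{n^2-1}{12} = \frac{(b-a)^2\,(n+1)}{12\,(n-1)}.$$
A component of the covariance (off-diagonal element) of the variance-covariance matrix is:
$$-\delta^2\,\frac{n+1}{12} = -\frac{(b-a)^2\,(n+1)}{12\,(n-1)^2}. \tag{8}$$
For the FCROS algorithm, we used standardized rank values, i.e., each observed rank value is divided by $n$. The mean and the variance-covariance components must then be divided by $n$ and $n^2$, respectively. Applied to the lemma's values $\frac{n+1}{2}$, $\frac{n^2-1}{12}$ and $-\frac{n+1}{12}$, this leads to a mean component $\bar r = \frac{1}{2} + \frac{1}{2n}$ and a variance-covariance matrix with diagonal component $\beta = \frac{n^2-1}{12n^2}$ and off-diagonal component $\alpha = -\frac{n+1}{12n^2}$. Table 1 shows the values of $\bar r$, $\beta$ and $\alpha$ as $n$ increases. For large $n$, the off-diagonal components of the variance-covariance matrix vanish. Hence, when $n$ is large, good approximations for the mean and variance components are $1/2$ and $1/12$, respectively.

Table 1 Values of the mean, the variance and the covariance components as n increases (n = 10, 100, 1,000 and 10,000)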
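Under the equal-spacing assumption $\bar r_i = a + (i-1)\delta$ with $\delta = (b-a)/(n-1)$, the rank moments of the lemma give mean, variance and covariance components $(a+b)/2$, $\delta^2 (n^2-1)/12$ and $-\delta^2 (n+1)/12$; standardizing the ranks $1, \ldots, n$ by $n$ amounts to taking $a = 1/n$ and $b = 1$. The sketch below (hypothetical function name `aor_components`) evaluates these closed forms with exact fractions for the values of $n$ listed in Table 1. The printed numbers are computed from the formulas, not copied from the original table.

```python
from fractions import Fraction

def aor_components(a, b, n):
    """Mean, variance and covariance components for n equally spaced a.o.r
    values running from a to b (the lemma's rank moments rescaled by delta)."""
    delta = (b - a) / (n - 1)
    mean = (a + b) / 2
    variance = delta**2 * (n**2 - 1) / 12
    covariance = -delta**2 * (n + 1) / 12
    return mean, variance, covariance

for n in (10, 100, 1_000, 10_000):
    # Standardized ranks r / n run from 1/n to 1, so delta = 1/n.
    mean, beta, alpha = aor_components(Fraction(1, n), Fraction(1), n)
    print(f"n = {n:>6}: mean = {float(mean):.5f}, "
          f"beta = {float(beta):.8f}, alpha = {float(alpha):.2e}")
```

For $n = 10{,}000$, $\beta$ differs from $1/12$ by less than $10^{-9}$ and $|\alpha|$ is below $10^{-5}$, which is the numerical sense in which the off-diagonal terms vanish.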

Discussion and conclusions
As shown above, the theorem we previously presented was incomplete, since the covariance term was missing. The present complementary result is necessary from a theoretical point of view, and we are grateful to the anonymous reader for pointing this out. This result will be useful for small values of $n$. However, for high-throughput biological datasets, $n$ is large, often greater than 10,000 ([1], page 2). For such values of $n$, the rank-deficient variance-covariance matrix of the normal distribution associated with the a.o.r values is close to a diagonal matrix. Hence, the a.o.r values of each gene behave as if they followed a normal distribution with mean $1/2$ and variance $1/12$.
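One way to see why this matrix is close to diagonal for large $n$: with diagonal entries $\beta = (n^2-1)/(12n^2)$ and off-diagonal entries $\alpha = -(n+1)/(12n^2)$, the standardized variance-covariance matrix equals $(\beta-\alpha)I_n + \alpha\,\mathbf 1_n\mathbf 1_n^T$. It therefore has one zero eigenvalue (the rank deficiency) and $n-1$ eigenvalues equal to $\beta-\alpha = (n+1)/(12n)$, which tends to $1/12$. The NumPy sketch below (hypothetical helper `standardized_cov`) checks this numerically for a few moderate values of $n$.

```python
import numpy as np

def standardized_cov(n):
    """Hypothetical helper: (n, n) matrix with beta on the diagonal, alpha elsewhere."""
    beta = (n**2 - 1) / (12 * n**2)
    alpha = -(n + 1) / (12 * n**2)
    S = np.full((n, n), alpha)
    np.fill_diagonal(S, beta)
    return S

for n in (10, 100, 1_000):
    S = standardized_cov(n)
    eig = np.linalg.eigvalsh(S)                   # eigenvalues in ascending order
    max_dev = np.abs(S - np.eye(n) / 12).max()    # largest entrywise gap to I/12
    print(f"n = {n:>5}: smallest eigenvalue = {eig[0]:.1e}, "
          f"other eigenvalues = {eig[1]:.6f} (1/12 = {1 / 12:.6f}), "
          f"max |S - I/12| = {max_dev:.1e}")
```

The largest entrywise deviation from $I_n/12$ is $|\alpha| = (n+1)/(12n^2)$, about $8.3 \times 10^{-6}$ for $n = 10{,}000$.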