We use an empirical Bayes approach to infer covariances and by simple extension, correlations. The theoretical basis of this, using conjugate priors, is derived in [3]. In that paper, Champion et al. use an independence prior, and a flat prior with constant covariance. We introduce a new prior that is a mixture of these two extremes, the block diagonal prior.
Bayes theorem relates the posterior distribution of the parameters given the data p(Θ|X) to the likelihood of the data p(X|Θ) and the prior distribution of the parameters p(Θ) via
$$ p(\Theta|X) \propto p(X|\Theta)p(\Theta) $$
We assume the data X are multivariate normal with sample size n, covariance matrix Σ, mean μ and number of variables (genes) p, and thus the likelihood is proportional to
$$p(X|\Theta) \propto |\Sigma|^{-\frac{1}{2}}{\exp}\left\{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right\} $$
We wish to obtain a Bayesian estimate for the covariance matrix Σ, which we denote by η, and we assume that μ is known. The conjugate prior for estimating a covariance matrix (η) is an inverse Wishart distribution. We adopt the same parameterisation as [3]: η∼W
−1(ψ,λ) where ψ is the mean of the Inverse Wishart distribution (\(\frac {\lambda z}{\lambda }\)) and λ+p+1 are the degrees of freedom. The probability density function (pdf) for the inverse Wishart is proportional to
$$ p(\Theta) \propto |\eta|^{-(\lambda +2p+2)/2)}{\exp}\left\{-\frac{1}{2}\text{tr}(\lambda z \eta^{-1})\right\} $$
To obtain the joint posterior distribution we multiply the likelihood for sample data (which is multivariate normal) with the prior. Let \(S=\sum {(x'-\mu ')(x'-\mu ')^{T}}\), where x
′ is the sample data and μ
′ the sample mean. Then the joint posterior distribution p(Θ|X) is proportional to
$$|\eta|^{-(n+\lambda+2p+2)/2}\exp\left\{-\frac{1}{2}\text{tr}(\eta^{-1}(\lambda z + S))\right\} $$
which is an inverse Wishart distribution with parameters \(\left (\frac {\lambda z+S} {\lambda +n},\lambda +n\right)\). We then estimate η by η
0, the expected value of the distribution:
$$\begin{array}{*{20}l} \hspace{70pt} \eta^{0} = \frac{\lambda z + S}{\lambda +n } \end{array} $$
From this we can see that the inverse Wishart requires two hyperparameters, the matrix z and scalar λ.
Calculating hyperparameters
In empirical Bayes methods both hyperparameters λ and z are estimated from the data, rather than using a hierarchical model and assigning a prior distribution to each of the parameters, or by having to choose the parameter values where little prior knowledge is available. In choosing the matrix z we are looking for an appropriate prior matrix for the covariance matrix η. The Inverse Wishart distribution requires that the matrix z is positive definite, block diagonal matrices are positive definite if and only if each of the blocks are positive definite. Therefore we construct a block diagonal prior matrix where each of the blocks have a constant (non perfect) correlation. The estimated correlation matrix from this method η
0 is then a mixture of the sample correlation matrix S and the block diagonal prior matrix z.
As we are using a block diagonal matrix for z we have added an extra level of estimation to the model over other methods that fix the structure of the prior matrix z a priori as, for example, the independence or flat correlation matrix. By using the block diagonal matrix the different groupings or blocks need to be determined for each data set separately. There are three methods for generating the blocks for the matrix z available in the package covEB. The first is to provide a list of the block assignments for each variable in the covariance matrix. Each entry in the list should comprise a set of variables that together will form one block. The algorithm then uses the average correlation within these blocks to calculate a constant value for this block in the matrix z, with all elements outside the block set to zero.
The second method is to provide a threshold parameter (or correlation level). This could be based on prior knowledge or calculated using an existing method for thresholding covariance matrices such as those described in Bickel et al. [4]. Given this correlation level (used for simplicity to allow the user to set thresholds in [-1,1]), we set all elements of the sample correlation matrix below this threshold to zero and identify the block diagonal structure. This is done by treating the correlation matrix as a weighted graph and using the cluster function in the R package igraph [5] to separate the graph into disjoint subnetworks such that each element in one subnetwork is completely separate from all elements in the other subnetworks, that is the correlation between them is zero. Each subnetwork then represents one block in the block diagonal matrix.
The third method requires no prior information from the user, instead the Akaike’s Information Criteria (AIC) metric is used to select the threshold level that determines the block diagonal structure. The AIC is defined as AIC=2p−2 ln(L), where p are the number of parameters estimated in the model and ln(L) is the log-likelihood of the data given the model. In our case, we calculate the likelihood of observing the given covariance matrix for different block diagonal matrices (models). We assume the data is from a multivariate normal distribution with covariance matrix Ω. Ω is the diagonal matrix calculated (as outlined above) for different threshold values, this threshold value is chosen to minimise the AIC statistic.
Once the groupings are known, it remains to find the correlation level within each block that together with the groupings will define the prior matrix z.This is done by averaging the correlations within each block to give a constant correlation value for that block in the prior matrix z. For example, in the block diagonal matrix z below we have 5 blocks, membership of variables in these 5 blocks having being determined by one of the three methods outlined above.
$$\begin{array}{*{20}l} \hspace{60pt} \left(\begin{array}{ccccc} \delta_{1} & 0 & 0 & 0 & 0 \\ 0 & \delta_{2} & 0 & 0 & 0 \\ 0 & 0 & \delta_{3} & 0 & 0 \\ 0 & 0 & 0 & \delta_{4} & 0 \\ 0 & 0 & 0 & 0 & \delta_{5} \end{array} \right) \end{array} $$
If, for example block 1 contains 4 variables (or genes) then, δ
1 (the first block in the matrix z) takes the form:
$$\hspace{70pt} \left(\begin{array}{cccc} 1 & \gamma & \gamma & \gamma \\ \gamma &1 & \gamma & \gamma \\ \gamma & \gamma & 1 & \gamma \\ \gamma & \gamma & \gamma & 1 \end{array} \right) $$
and γ is estimated from the data as the average of the sample correlations between the four variables.
Given the current estimate of z we then calculate λ using the following approximation suggested by [3]:
$$E((\rho_{ij}[\!\eta]-\rho_{ij}[\!z])^{2}) \simeq \frac{(1-\rho_{ij}[\!z]^{2})^{2}}{\lambda+3} \text{for} i \neq j, $$
where ρ
ij
[ η] is the correlation based on η. We approximate ρ
ij
[ η] by the sample correlations, and ρ
ij
[ z] are the correlations based on z for the selected value of γ. See [3] for full details of this derivation. However, briefly, this result follows from the distribution of the sample correlations ρ
ij
[ η] that is a two parameter distribution whose mean is ρ
ij
[ z]. Given the distribution of the ρ
ij
[ η], the variance var(ρ
ij
[ η])=E((ρ
ij
[η]−ρ
ij
[ z])2) is approximated using the moments of this distribution.
Simulated data
For the simulated data we use the method of Hardin to generate block diagonal matrices [6]. We generated 100 sample covariance matrices for 50 genes, each covariance matrix had a block diagonal structure with 5 blocks of different sizes. Correlations within blocks were set to 0.7, random samples were generated for each of these correlation matrices for three different sample sizes of 10, 15 and 20. We include a simulated dataset under these parameters in the R package covEB. We denote the empirical Bayes method using a block diagonal prior as EB, for comparison two other methods are used to calculate the correlation matrix, the Pearson correlation matrix, and the Corpcor method of Schäfer et al. are used to estimate elements within blocks and all the elements outside these blocks are set to zero. We calculated the Frobenius norm between each of the estimation methods and the true block diagonal matrix. We estimated the correlation matrices and calculated the average false and true positive rates and their standard deviations over the 100 simulations, for each of the methods mentioned above.
Biological data
The Bacillus subtilis data were taken from the paper [7]. A subset of 19 samples that are clustered according to the affinity propagation clustering with Euclidean distance was used. The genes were filtered leaving those with variance above the median. For each method we used known transcriptional unit information from BSubCyc (http://www.bsubcyc.org) to generate the prior groupings, selecting 56 known transcriptional units each with at least 5 genes in them.