In this section, we review Gaussian graphical models (GGMs) and the Ledoit-Wolf (LW) shrinkage [14, 16]. We illustrate how the shrinkage modifies the network structure (i.e. the magnitudes and orders of the partial correlations) in a non-linear way. To overcome this pitfall, we propose the ‘un-shrunk’ partial correlation and discuss its properties. Throughout the text, uppercase bold letters are used for matrices (e.g. \(\mathbf{P}\) is the matrix of partial correlations) and the *hat* symbol denotes statistical estimators (e.g. \(\widehat{\mathbf{P}}\) is an estimator of \(\mathbf{P}\)).

### The ‘shrunk’ partial correlation

The partial correlation is a measure of (linear) association between Gaussian variables, where confounding effects coming from the other variables are removed (i.e. a full-conditional correlation). A GGM is represented by a matrix \(\mathbf{P}\) of partial correlations, where the element \({\mathbf{P}}_{\mathrm{ij}}\) is the partial correlation between the i-th and j-th variable [29]. Partial correlations can be computed via the relationship

$${\mathbf{P}}_{\mathrm{ij}}=\frac{-{{\varvec{\Omega}}}_{\mathrm{ij}}}{\sqrt{{{\varvec{\Omega}}}_{\mathrm{ii}}{{\varvec{\Omega}}}_{\mathrm{jj}}}}$$

(1)

where \({\varvec{\Omega}}\) is the inverse of the covariance matrix \(\mathbf{C}\) (or equivalently, the inverse of the correlation matrix \(\mathbf{R}\), see Additional file 1: S1). For a dataset \(\mathbf{D}\) that consists of *p* variables and *n* samples, \(\mathbf{C}\) is a p × p matrix that can be estimated from data, e.g. by the sample covariance matrix \({\widehat{\mathbf{C}}}^{\mathrm{SM}}\). However, estimating \(\mathbf{C}\) is challenging when *n* is comparable to, or smaller than, *p* as the estimator then becomes ill-conditioned (numerically unstable) or non-invertible.
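Equation (1) is straightforward to apply when \(\mathbf{C}\) is well-conditioned. As an illustration (our own sketch, not part of the original analysis; the matrix values are hypothetical), the computation in Python/NumPy:

```python
import numpy as np

def partial_correlations(C):
    """Partial correlations from a covariance matrix via Eq. (1):
    P_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), with Omega = C^{-1}."""
    Omega = np.linalg.inv(C)
    d = np.sqrt(np.diag(Omega))
    P = -Omega / np.outer(d, d)
    np.fill_diagonal(P, 1.0)  # diagonal set to 1 by convention
    return P

# A small, well-conditioned covariance (= correlation) matrix
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
P = partial_correlations(C)
```

For three variables, the entry `P[0, 1]` agrees with the textbook formula \((r_{12}-r_{13}r_{23})/\sqrt{(1-r_{13}^{2})(1-r_{23}^{2})}\), which provides a quick sanity check of the implementation.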

The LW-shrinkage estimator \({\widehat{\mathbf{C}}}^{[\uplambda ]}\) consists of a convex linear combination of \({\widehat{\mathbf{C}}}^{\mathrm{SM}}\) and a target matrix \(\mathbf{T}\) (e.g. a diagonal matrix of variances), and it is defined as

$${\widehat{\mathbf{C}}}^{[\uplambda ]}:=\left(1-\uplambda \right){\widehat{\mathbf{C}}}^{\mathrm{SM}}+\uplambda \mathbf{T}$$

(2)

where \(\uplambda \in (0, 1)\), also called the shrinkage value, represents the weight allocated to \(\mathbf{T}\). The inverse of \({\widehat{\mathbf{C}}}^{[\uplambda ]}\), denoted by \({\widehat{{\varvec{\Omega}}}}^{[\uplambda ]}\), can then be plugged into Eq. (1), yielding

$${{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{\mathrm{ij}}=\frac{-{{\widehat{{\varvec{\Omega}}}}^{[\uplambda ]}}_{\mathrm{ij}}}{\sqrt{{{\widehat{{\varvec{\Omega}}}}^{[\uplambda ]}}_{\mathrm{ii}}{{\widehat{{\varvec{\Omega}}}}^{[\uplambda ]}}_{\mathrm{jj}}}}$$

(3)

This is the ‘shrunk’ partial correlation [17], and an edge in the GGM is selected according to its magnitude and/or statistical significance [18]. The operations involved in Eq. (3) (i.e. matrix inversion, square roots, and standardization) suggest that \({{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{\mathrm{ij}}\) is a non-linear function of λ. In addition, the value of λ is usually optimized by minimizing the mean squared error (MSE) between \({\widehat{\mathbf{C}}}^{[\uplambda ]}\) and \(\mathbf{C}\), but this λ does not necessarily minimize the MSE between \({\widehat{{\varvec{\Omega}}}}^{[\uplambda ]}\) and \({\varvec{\Omega}}\).
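Equations (2) and (3) combine into a short procedure. A minimal sketch in Python/NumPy (our own illustration; here λ is fixed by hand rather than optimized by the LW criterion, and the target \(\mathbf{T}\) defaults to the diagonal of variances):

```python
import numpy as np

def shrunk_partial_correlations(C_sm, lam, T=None):
    """'Shrunk' partial correlations: Eq. (2) followed by Eq. (3).
    T defaults to the diagonal of C_sm (shrinkage towards the variances)."""
    if T is None:
        T = np.diag(np.diag(C_sm))
    C_lam = (1.0 - lam) * C_sm + lam * T          # Eq. (2)
    Omega = np.linalg.inv(C_lam)                  # invertible for lam in (0, 1)
    d = np.sqrt(np.diag(Omega))
    P = -Omega / np.outer(d, d)                   # Eq. (3)
    np.fill_diagonal(P, 1.0)
    return P

# Rank-deficient 'sample' covariance (n < p), yet invertible after shrinkage
rng = np.random.default_rng(0)
D = rng.standard_normal((3, 5))                   # n = 3 samples, p = 5 variables
C_sm = D.T @ D / (D.shape[0] - 1)
P_shrunk = shrunk_partial_correlations(C_sm, lam=0.2)
```

Note that the sample covariance above is singular (rank at most 3 for a 5 × 5 matrix), so the plain Eq. (1) would fail; the shrinkage step makes the inversion well-defined.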

### Pitfalls of the ‘shrunk’ partial correlation

Let us consider the following covariance matrix \(\mathbf{C}\) and its inverse \({\varvec{\Omega}}\),

$$\begin{array}{cc}\mathbf{C}=\left(\begin{array}{cccc}1& \frac{1}{2}& \frac{-1}{4}& \frac{-1}{8}\\ \frac{1}{2}& 1& \frac{-3}{4}& \frac{-3}{4}\\ \frac{-1}{4}& \frac{-3}{4}& 1& \frac{3}{4}\\ \frac{-1}{8}& \frac{-3}{4}& \frac{3}{4}& 1\end{array}\right),&{\varvec{\Omega}}=\left(\begin{array}{cccc}\frac{160}{97}& \frac{-152}{97}& \frac{-8}{97}& \frac{-88}{97}\\ \frac{-152}{97}& \frac{416}{97}& \frac{124}{97}& \frac{200}{97}\\ \frac{-8}{97}& \frac{124}{97}& \frac{272}{97}& \frac{-112}{97}\\ \frac{-88}{97}& \frac{200}{97}& \frac{-112}{97}& \frac{320}{97}\end{array}\right)\end{array}$$

(4)

The matrix \(\mathbf{C}\) is invertible: its eigenvalues are 2.66, 0.94, 0.25, and 0.15, its determinant is 0.09, and its condition number is 16.89. On the one hand, from Eq. (1) we find the partial correlations \({\mathbf{P}}_{12}= 152/\left(\sqrt{160}\sqrt{416}\right)=0.59\) and \({\mathbf{P}}_{34}=112/\left(\sqrt{272}\sqrt{320}\right)=0.38\), concluding that \({\mathbf{P}}_{12}\) is *stronger* than \({\mathbf{P}}_{34}\). On the other hand, computing \({\widehat{\mathbf{C}}}^{[\uplambda ]}\) with Eq. (2) and substituting it into Eq. (3) gives the ‘shrunk’ partial correlations \({{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{12}\) and \({{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{34}\). As λ increases, the values of the two ‘shrunk’ partial correlations change. Figure 5a shows that for λ greater than 0.3, \({{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{12}\) becomes *weaker* than \({{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{34}\) and their relative order reverses. Although Eq. (2) is a linear shrinkage of \({\widehat{\mathbf{C}}}^{\mathrm{SM}}\), operations like matrix inversion and standardization in Eq. (3) propagate the effect of λ to \({{\widehat{\mathbf{P}}}}^{[\uplambda ]}\) in a non-linear way. An equivalent plot for the estimator gLASSO [15, 19] is presented in Additional file 1: Figure S1.
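The order reversal can be reproduced directly from the matrix in Eq. (4). A sketch in Python/NumPy (our own check; since all variances equal 1, the target is \(\mathbf{T}=\mathbf{I}\), and the two λ values are illustrative choices on either side of the crossing):

```python
import numpy as np

# Covariance matrix C of Eq. (4)
C = np.array([[ 1.0  ,  0.5 , -0.25, -0.125],
              [ 0.5  ,  1.0 , -0.75, -0.75 ],
              [-0.25 , -0.75,  1.0 ,  0.75 ],
              [-0.125, -0.75,  0.75,  1.0  ]])

def shrunk_partial(C, lam):
    """Eq. (2) with T = I (all variances are 1), then Eq. (3)."""
    Omega = np.linalg.inv((1 - lam) * C + lam * np.eye(len(C)))
    d = np.sqrt(np.diag(Omega))
    return -Omega / np.outer(d, d)

P_small = shrunk_partial(C, 0.01)   # almost no shrinkage: |P_12| > |P_34|
P_large = shrunk_partial(C, 0.9)    # strong shrinkage: the order reverses
```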

### Additional properties

Partial correlations can be found from the covariance matrix \(\mathbf{C}\) or, equivalently, from the inverse of the correlation matrix \(\mathbf{R}\), see Additional file 1: S3.1. Without loss of generality, we now switch to the correlation matrix \(\mathbf{R}\) (the standardized \(\mathbf{C}\)). Using Eq. (2), \(\mathbf{R}\) (instead of \(\mathbf{C}\)) can be ‘shrunk’ towards its diagonal (\(\mathbf{T}\) is the identity matrix). For small sample sizes, the sample correlation matrix \({\widehat{\mathbf{R}}}^{\mathrm{SM}}\) (the standardized \({\widehat{\mathbf{C}}}^{\mathrm{SM}}\)) is not positive definite: some of its eigenvalues can be (i) near zero, (ii) equal to zero, or (iii) slightly negative. This translates into \({\widehat{\mathbf{R}}}^{\mathrm{SM}}\) being (i) ill-conditioned, (ii) singular, or (iii) indefinite, respectively. The case of an indefinite matrix arises from numerical inaccuracies that can make zero eigenvalues slightly negative.

Let \({\mathrm{\alpha }}_{\mathrm{k}}\) \((\mathrm{k}=1, 2, \dots ,\mathrm{p})\) denote the eigenvalues of \({\widehat{\mathbf{R}}}^{\mathrm{SM}}\). Equation (2) yields \(\widehat{{\mathbf{R}}}^{{[\uplambda ]}}\), which has eigenvalues:

$${{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}=\left(1-\uplambda \right){\mathrm{\alpha }}_{\mathrm{k}}+\uplambda ,\qquad \mathrm{k}=1, 2, \dots ,\mathrm{p}$$

(5)

and the shrinkage \(\uplambda \in (0, 1)\) transforms each \({\mathrm{\alpha }}_{\mathrm{k}}\) into a positive \({{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}\), so that \({\widehat{\mathbf{R}}}^{[\uplambda ]}\) becomes positive definite (see Additional file 1: S2). Accordingly, the eigenvalues of the inverse \({\left({\widehat{\mathbf{R}}}^{[\uplambda ]}\right)}^{-1}\), obtained by inversion:

$$\frac{1}{{{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}}=\frac{1}{\left(1-\uplambda \right){\mathrm{\alpha }}_{\mathrm{k}}+\uplambda } (\mathrm{k}=1, 2, \dots ,\mathrm{p})$$

(6)

are positive as well. This ensures that the inverse \({\left({\widehat{\mathbf{R}}}^{[\uplambda ]}\right)}^{-1}\) is also a positive definite matrix.
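Equations (5) and (6) can be verified numerically: shrinking towards the identity shifts every eigenvalue by the same affine map while leaving the eigenvectors unchanged. A minimal sketch in Python/NumPy, using the singular all-ones correlation matrix (our own toy choice; its eigenvalues are 3, 0, 0):

```python
import numpy as np

R = np.ones((3, 3))                        # singular correlation matrix
lam = 0.25
R_lam = (1 - lam) * R + lam * np.eye(3)    # Eq. (2) with T = I

# Eigenvalues before and after shrinkage (sorted in decreasing order)
alpha = np.sort(np.linalg.eigvalsh(R))[::-1]
alpha_lam = np.sort(np.linalg.eigvalsh(R_lam))[::-1]
# Eq. (5): alpha_lam should equal (1 - lam) * alpha + lam, all positive,
# and by Eq. (6) the inverse of R_lam has eigenvalues 1 / alpha_lam.
```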

### Shrinking the data

Traditionally, the shrinkage is interpreted as a modification of the statistical model (a ‘shrunk’ covariance/correlation/partial correlation), while the data remain unchanged. However, most research questions need to be interpreted in terms of the dataset. We therefore propose to discuss the shrinkage from a different perspective, namely at the data level. To this end, we use the Singular Value Decomposition (SVD) of the data matrix \(\mathbf{D}\), together with the fact that the shrinkage modifies only the eigenvalues of \({\widehat{\mathbf{C}}}^{\mathrm{SM}}=\frac{1}{\mathrm{n}-1}{\mathbf{D}}^{\mathrm{t}}\mathbf{D}\), while the eigenvectors stay identical, see Additional file 1: S2. As the singular values of \(\mathbf{D}\) are the positive square roots of \((\mathrm{n}-1){\mathrm{\alpha }}_{\mathrm{k}}\), replacing the eigenvalues by their ‘shrunk’ counterparts \({\mathrm{\alpha }}^{[\uplambda ]}\) of Eq. (5) yields the SVD of the ‘shrunk’ data matrix \({\mathbf{D}}^{[\uplambda ]}\) as

$${\mathbf{D}}^{[\uplambda ]}=\mathbf{U}\mathrm{diag}\left(\sqrt{(\mathrm{n}-1){\mathrm{\alpha }}^{[\uplambda ]}}\right){\mathbf{V}}^{\mathrm{t}}$$

(7)

Here \(\mathbf{U}\) and \(\mathbf{V}\) are the matrices of (left and right) singular vectors of \(\mathbf{D}\), and the singular values are replaced by their ‘shrunk’ counterparts. This relationship allows us to study the shrinkage effect at the data level. That is, analyzing the original dataset \(\mathbf{D}\) with a ‘shrunk’ model \({\widehat{\mathbf{C}}}^{[\uplambda ]}\) is equivalent to analyzing \({\mathbf{D}}^{[\uplambda ]}\) with the classical model \({\widehat{\mathbf{C}}}^{\mathrm{SM}}\). To illustrate this, we generate data from a network (*p* = 8, *n* = 10) and investigate what happens to the original data points as the shrinkage increases; see Fig. 5b and Additional file 1: S3 for more details.
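The claimed equivalence can be checked numerically: the sample covariance of \({\mathbf{D}}^{[\uplambda ]}\) reproduces \({\widehat{\mathbf{C}}}^{[\uplambda ]}\) exactly. A sketch in Python/NumPy (dimensions, seed, and the target \(\mathbf{T}=\mathbf{I}\) are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 10, 4, 0.3
D = rng.standard_normal((n, p))
D = D - D.mean(axis=0)                         # column-centered data

C_sm = D.T @ D / (n - 1)                       # sample covariance
C_lam = (1 - lam) * C_sm + lam * np.eye(p)     # Eq. (2) with T = I

U, s, Vt = np.linalg.svd(D, full_matrices=False)
alpha = s**2 / (n - 1)                         # eigenvalues of C_sm
alpha_lam = (1 - lam) * alpha + lam            # Eq. (5)
D_lam = U @ np.diag(np.sqrt((n - 1) * alpha_lam)) @ Vt   # Eq. (7)

# Sample covariance of the 'shrunk' data equals the 'shrunk' estimator
C_from_D_lam = D_lam.T @ D_lam / (n - 1)
```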

### The ‘un-shrunk’ partial correlations

Here we propose the concept of the ‘un-shrunk’ partial correlation, which we define as the limit of \({{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{\mathrm{ij}}\) as λ approaches zero,

$${{\widehat{\mathbf{P}}}^{[0]}}_{\mathrm{ij}}:=\underset{\uplambda \to 0}{\mathrm{lim}}{{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{\mathrm{ij}}$$

(8)

A proof that \({{\mathbf{P}}^{[\uplambda ]}}_{\mathrm{ij}}\) is a continuous and bounded function of \(\uplambda\), as well as a general proof of the existence of this limit, can be found in Additional file 1: S4 and S5. The key idea is that there is no divergence in Eq. (8), and to illustrate this we start with the eigen-decomposition,

$${\widehat{{\varvec{\Omega}}}}^{[\uplambda ]}={\left({\widehat{\mathbf{R}}}^{[\uplambda ]}\right)}^{-1}=\mathbf{V}\,\mathrm{diag}\left(\frac{1}{{\mathrm{\alpha }}^{[\uplambda ]}}\right){\mathbf{V}}^{\mathrm{t}}$$

(9)

where \(\mathbf{V}\) is a matrix whose columns are the eigenvectors of \({{\mathbf{\widehat{R}}}}^{[\uplambda ]}\), and \(\mathrm{diag}\left(1/{\mathrm{\alpha }}^{[\uplambda ]}\right)\) is the diagonal matrix of eigenvalues 1/\({{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}\) \((\mathrm{k}=1, 2, \dots ,\mathrm{p})\).

Let us assume that \({\widehat{\mathbf{R}}}^{\mathrm{SM}}\) is singular and recall two facts from the previous subsection. First, \({\widehat{\mathbf{R}}}^{\mathrm{SM}}\) has a zero eigenvalue \({\mathrm{\alpha }}_{\mathrm{k}}=0\), which Eq. (5) transforms into \({{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}=\uplambda\); by Eq. (6), the corresponding eigenvalue of the inverse \({\left({\widehat{\mathbf{R}}}^{[\uplambda ]}\right)}^{-1}\) is \(1/{{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}=1/\uplambda\). Second, \({\widehat{\mathbf{R}}}^{\mathrm{SM}}\) and \({\widehat{\mathbf{R}}}^{[\uplambda ]}\) have the same eigenvectors, because the shrinkage only changes the eigenvalues (see Additional file 1: S2). Substituting Eq. (9) into Eq. (3) and factorizing out the term \(1/\uplambda\) gives

$${{\widehat{\mathbf{P}}}^{[\uplambda ]}}_{\mathrm{ij}}=\frac{-{\frac{1}{\uplambda }\left[\mathbf{V}\,\mathrm{diag}\left(\frac{\uplambda }{{\mathrm{\alpha }}^{[\uplambda ]}}\right){\mathbf{V}}^{\mathrm{t}}\right]}_{\mathrm{ij}}}{\frac{1}{\uplambda }\sqrt{{\left[\mathbf{V}\,\mathrm{diag}\left(\frac{\uplambda }{{\mathrm{\alpha }}^{[\uplambda ]}}\right){\mathbf{V}}^{\mathrm{t}}\right]}_{\mathrm{ii}}{\left[\mathbf{V}\,\mathrm{diag}\left(\frac{\uplambda }{{\mathrm{\alpha }}^{[\uplambda ]}}\right){\mathbf{V}}^{\mathrm{t}}\right]}_{\mathrm{jj}}}}$$

(10)

Any singularity disappears by cancelling the factor \(1/\uplambda\). As \({{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}\ge \uplambda\), the diagonal elements \(\uplambda /{{\mathrm{\alpha }}^{[\uplambda ]}}_{\mathrm{k}}\) in Eq. (10) have limits equal to (i) zero for \({\mathrm{\alpha }}_{\mathrm{k}}\ne 0\), or (ii) one for \({\mathrm{\alpha }}_{\mathrm{k}}=0\), see Equations (S43, S48, S52, S55, S58). In this sense, the ratio that defines the ‘shrunk’ partial correlation in Eq. (10) does not diverge when the shrinkage is removed. For a general proof of the existence of this limit, see Additional file 1: S5. We propose this limit as a generalization of the classical partial correlation.
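A quick numerical sanity check of this convergence (our own sketch, with hypothetical data): even when the sample correlation matrix is singular, the ‘shrunk’ partial correlation stabilizes as λ decreases towards zero.

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((3, 4))               # n = 3 < p = 4: singular sample correlation
R = np.corrcoef(D, rowvar=False)

def shrunk_p12(lam):
    """'Shrunk' partial correlation between variables 1 and 2 (Eqs. 2-3, T = I)."""
    Omega = np.linalg.inv((1 - lam) * R + lam * np.eye(4))
    return -Omega[0, 1] / np.sqrt(Omega[0, 0] * Omega[1, 1])

# Values along a decreasing sequence of shrinkage strengths
values = [shrunk_p12(lam) for lam in (1e-2, 1e-4, 1e-6, 1e-8)]
```

The successive differences shrink rapidly, consistent with a finite limit despite the \(1/\uplambda\) eigenvalue of the inverse.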

The idea resembles the classical example from Calculus where the limits at zero of \({\mathrm{x}}^{-1}\) and \({\mathrm{x}}^{-2}\) are both infinite, while the limit of their ratio \({\mathrm{x}}^{-1}/{\mathrm{x}}^{-2}=\mathrm{x}\) is finite (namely zero). For illustration, let us consider a 3 × 3 correlation matrix of ones (all variables are maximally correlated). The matrix is singular, with eigenvalues \({\mathrm{\alpha }}=(3, 0, 0)\), and Eq. (2) gives

$${\mathbf{R}}^{[\uplambda ]}=\left(\begin{array}{ccc}1& \left(1-\uplambda \right)& \left(1-\uplambda \right)\\ \left(1-\uplambda \right)& 1& \left(1-\uplambda \right)\\ \left(1-\uplambda \right)& \left(1-\uplambda \right)& 1\end{array}\right)$$

(11)

using Eq. (3) we obtain

$${\mathbf{P}}^{[\uplambda ]}=\frac{\frac{\left(1-\uplambda \right)}{\uplambda }}{\frac{1}{\uplambda }}\left(\begin{array}{ccc}1& {\left(2-\uplambda \right)}^{-1}& {\left(2-\uplambda \right)}^{-1}\\ {\left(2-\uplambda \right)}^{-1}& 1& {\left(2-\uplambda \right)}^{-1}\\ {\left(2-\uplambda \right)}^{-1}& {\left(2-\uplambda \right)}^{-1}& 1\end{array}\right)$$

(12)

and we see that \({{\mathbf{P}}^{[0]}}_{\mathrm{ij}}=\underset{\uplambda \to 0}{\mathrm{lim}}{{\mathbf{P}}^{[\uplambda ]}}_{\mathrm{ij}}=1/2\). It is worth noting that a simple linear re-scaling by \(\left(1-\uplambda \right)\) is *not sufficient* to remove the shrinkage effect (see Additional file 1: Figure S1). In this example, dividing \({{\mathbf{P}}^{[\uplambda ]}}_{\mathrm{ij}}\) by \(\left(1-\uplambda \right)\) gives \(1/\left(2-\uplambda \right)\), which for \(\uplambda =1/3\) yields \(3/5=0.6\) and for \(\uplambda =2/3\) yields \(3/4=0.75\), rather than the limit \(1/2\). More toy examples can be found in Additional file 1: S6.
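The toy example of Eqs. (11)-(12) can be checked numerically (our own sketch): for any \(\uplambda \in (0,1)\) the computed off-diagonal ‘shrunk’ partial correlation matches \(\left(1-\uplambda \right)/\left(2-\uplambda \right)\), and it tends to \(1/2\) as λ vanishes.

```python
import numpy as np

def shrunk_partial_offdiag(lam):
    """Off-diagonal 'shrunk' partial correlation for the 3x3 all-ones matrix."""
    R_lam = (1 - lam) * np.ones((3, 3)) + lam * np.eye(3)   # Eq. (11)
    Omega = np.linalg.inv(R_lam)
    return -Omega[0, 1] / np.sqrt(Omega[0, 0] * Omega[1, 1])

lams = np.array([0.5, 0.1, 0.01, 1e-4])
vals = np.array([shrunk_partial_offdiag(l) for l in lams])
analytic = (1 - lams) / (2 - lams)                           # Eq. (12)
```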

### Practical implementation

From a mathematical perspective, Eq. (8) can be computed by means of the analytical results in Equations (S43, S48, S52, S55, S58). However, these results are impractical, as numerical inaccuracies make the elements of \(\mathbf{V}\) unreliable and often render zero eigenvalues slightly positive or negative. To circumvent these numerical issues, we apply a simple approximation. We compute the ‘shrunk’ partial correlation \({{\mathbf{P}}^{[\uplambda ]}}_{\mathrm{ij}}\) for different values of \(\uplambda \in (0, 1)\) and apply Fisher’s transformation to make them approximately normally distributed. Finally, we fit a weighted smoothing spline through these points. Because small shrinkage values produce uncertain estimates, the weights are set according to the reciprocal condition number of the ‘shrunk’ correlation matrix. Extrapolating the fitted function to \(\uplambda =0\) then approximates the limit in Eq. (8). For more details, we refer the reader to Additional file 1: S7.
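The extrapolation step can be sketched as follows. This is a simplified stand-in for the implementation described in Additional file 1: S7: it uses a weighted polynomial fit in Fisher's z-scale instead of weighted smoothing splines, and the grid of λ values, the degree, and the function name are our own illustrative choices.

```python
import numpy as np

def unshrunk_estimate(R, i, j, lams=np.linspace(0.05, 0.5, 10), deg=3):
    """Approximate the 'un-shrunk' partial correlation (Eq. 8):
    compute the 'shrunk' value on a grid of lambdas, Fisher-transform,
    fit a weighted polynomial, and extrapolate to lambda = 0."""
    p = len(R)
    z, w = [], []
    for lam in lams:
        R_lam = (1 - lam) * R + lam * np.eye(p)
        Omega = np.linalg.inv(R_lam)
        rho = -Omega[i, j] / np.sqrt(Omega[i, i] * Omega[j, j])
        z.append(np.arctanh(rho))                 # Fisher's transformation
        w.append(1.0 / np.linalg.cond(R_lam))     # reciprocal condition number
    coeffs = np.polyfit(lams, z, deg, w=np.asarray(w))
    return np.tanh(np.polyval(coeffs, 0.0))       # back-transform at lambda = 0

# Toy check: for the 3x3 all-ones matrix the limit is exactly 1/2
R = np.ones((3, 3))
est = unshrunk_estimate(R, 0, 1)
```

On the toy matrix the extrapolated value lands close to the analytical limit of \(1/2\), illustrating why weighting down the ill-conditioned small-λ points still permits an accurate extrapolation.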

P-values are computed from the probability density of the ‘shrunk’ partial correlation \({\uprho }^{[\uplambda ]}\) (under the null-hypothesis)

$${{\mathrm{f}}_{0}}^{[\uplambda ]}\left({\uprho }^{[\uplambda ]}\right)=\frac{{\left({\left(1-\uplambda \right) }^{2}-{\left({\uprho }^{[\uplambda ]}\right)}^{2}\right)}^{(\mathrm{df}-3)/2}}{\mathrm{Beta}\left(\frac{1}{2},\frac{\mathrm{df}-1}{2}\right){\left(1-\uplambda \right) }^{(\mathrm{df}-2)}}$$

(13)

where \(\mathrm{Beta}\) denotes the beta function and \(\mathrm{df}\) the degrees of freedom, which are estimated via Maximum Likelihood. Further details can be found in our previous work [18]. The parameter \(\uplambda\) is the optimal shrinkage value for the ‘shrunk’ partial correlation, and zero for the ‘un-shrunk’ partial correlation. All computations are performed with the R package *GeneNet* version 1.2.13.
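Equation (13) defines a proper density on \(\left(-(1-\uplambda ),\, 1-\uplambda \right)\), and a two-sided p-value follows by integrating its tails. A sketch in Python/NumPy (our own illustration; the values of \(\mathrm{df}\) and \(\uplambda\) are hypothetical, and the beta function is computed from gamma functions):

```python
import math
import numpy as np

def null_density(rho, lam, df):
    """Probability density of the 'shrunk' partial correlation under the
    null hypothesis, Eq. (13); supported on (-(1 - lam), 1 - lam)."""
    beta = math.gamma(0.5) * math.gamma((df - 1) / 2) / math.gamma(df / 2)
    return ((1 - lam)**2 - rho**2) ** ((df - 3) / 2) / (beta * (1 - lam) ** (df - 2))

def _trapezoid(y, x):
    """Simple trapezoidal rule (avoids NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

lam, df = 0.2, 10.0
grid = np.linspace(-(1 - lam), 1 - lam, 20001)
total = _trapezoid(null_density(grid, lam, df), grid)   # should be ~1

def p_value(rho_obs, lam, df, m=20001):
    """Two-sided p-value: null mass beyond |rho_obs|."""
    x = np.linspace(abs(rho_obs), 1 - lam, m)
    return 2.0 * _trapezoid(null_density(x, lam, df), x)
```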