Skip to main content

Stochastic Lanczos estimation of genomic variance components for linear mixed-effects models



Linear mixed-effects models (LMM) are a leading method in conducting genome-wide association studies (GWAS) but require residual maximum likelihood (REML) estimation of variance components, which is computationally demanding. Previous work has reduced the computational burden of variance component estimation by replacing direct matrix operations with iterative and stochastic methods and by employing loose tolerances to limit the number of iterations in the REML optimization procedure. Here, we introduce two novel algorithms, stochastic Lanczos derivative-free REML (SLDF_REML) and Lanczos first-order Monte Carlo REML (L_FOMC_REML), that exploit problem structure via the principle of Krylov subspace shift-invariance to speed computation beyond existing methods. Both novel algorithms only require a single round of computation involving iterative matrix operations, after which their respective objectives can be repeatedly evaluated using vector operations. Further, in contrast to existing stochastic methods, SLDF_REML can exploit precomputed genomic relatedness matrices (GRMs), when available, to further speed computation.


Results of numerical experiments are congruent with theory and demonstrate that interpreted-language implementations of both algorithms match or exceed existing compiled-language software packages in speed, accuracy, and flexibility.


Both the SLDF_REML and L_FOMC_REML algorithms outperform existing methods for REML estimation of variance components for LMM and are suitable for incorporation into existing GWAS LMM software implementations.


Linear mixed-effects modeling (LMM) is a leading methodology employed in genome-wide association studies (GWAS) of complex traits in humans, offering the dual benefits of controlling for population stratification while permitting the inclusion of data from related individuals [1]. However, the implementation of LMM comes at the cost of increased computational burden relative to ordinary least-squares regression, particularly in performing residual maximum likelihood (REML) estimation of genomic variance components. Conventional REML algorithms require multiple \(\mathcal {O}\left (n^{3}\right)\) or \(\mathcal {O}\left (mn^{2}\right)\) matrix operations, where m and n are the numbers of markers and individuals, respectively, rendering them infeasible for large biobank scale data sets. Further, common numerical methods for REML estimation rely on sparse matrix methods suitable for traditional LMM applications (e.g., pedigree data or experiments with repeated measures [2]) that are inapplicable to genomics variance components models since these models involve dense relatedness matrices. As a result, the problem of increasing the computational efficiency of REML estimation of genomic variance components has generated considerable research activity [38].

In the case of the standard two variance component model (1), the estimation of which is the focus of the current research, previous efforts toward increasing computational efficiency fit into two primary categories: 1., reducing the number of cubic time complexity matrix operations needed to achieve convergence; and 2., substituting stochastic and iterative matrix operations for deterministic, direct methods to obtain procedures with quadratic time complexity. The first approach is embodied by the methods implemented in the FaST-LMM and GEMMA packages [3, 5, 6], which take advantage of the fact that the genetic relatedness matrix (GRM) and identity matrix comprising the covariance structure are simultaneously diagonalizable. As a result, after performing a single spectral decomposition of the GRM and a small number of matrix-vector multiplications, the REML criterion (3) and its gradient and Hessian can be repeatedly evaluated using only vector operations. The second approach is exemplified by the popular BOLT-LMM software [7, 8], which avoids all cubic operations by solving linear systems via the method of conjugate gradients (CG) and employing stochastic trace estimators in place of deterministic computations.

In the current research, we propose two algorithms, stochastic Lanczos derivative-free residual maximum likelihood (SLDF_REML; Algorithm 3) and Lanczos first-order Monte Carlo residual maximum likelihood (L_FOMC_REML; Algorithm 4), that combine features of both approaches (Fig. 1). Here, we translate the simultaneous diagonalizability of the heritable and non-heritable components of the covariance structure to stochastic and iterative methods via the principle of Krylov subspace shift-invariance. As a result, we only need to compute the costliest portions of the objective function once (via stochastic/iterative methods), computing all subsequent iterations of the REML optimization problem only using vector operations. We develop the theory underlying these methods and demonstrate their performance relative to previous methods via numerical experiment.

Fig. 1
figure 1

Time complexity analogies with respect to existing and proposed methods. Heuristically, the novel algorithms (bottom right) are to the stochastic, iterative algorithm implemented in the BOLT-LMM software [7, 8] (bottom left) as the direct methods exploiting the shifted structure of the two component genomic variance component model (1) (e.g., FaST-LMM and GEMMA [3, 5]; top right) are to standard direct methods (top left). For simplicity, we assume here that the number of markers is equal to the number of observations and omit low-order terms related to the spectral conditioning of the covariance structure and the number of random vectors generated by the stochastic methods; further details are provided in Table 1. neval denotes the number of objective function evaluations needed to achieve convergence

Table 1 Time complexity of stochastic algorithms


Across 20 replications per condition for random subsamples of n=16,000 to 256,000 unrelated European-ancestry individuals, both SLDF_REML and L_FOMC_REML produced heritability estimates for height consistent with those generated by the GCTA software package (Figs. 4 and 5). For large samples, the novel algorithms achieved greater accuracy than either version of BOLT-LMM (e.g., for n=250,000, mean-squared error was 1.74 ×10−6 for BOLT-LMM v2.3.2 versus 1.24 ×10−7 for L_FOMC_REML). Particularly, the time required per additional iteration after initial overhead computations was low for the novel algorithms (e.g., \(\overline {t}\)=20.07 min for BOLT-LMM v2.3.2 versus 2.06 min for L_FOMC_REML; Table 2), enabling increased precision at minor cost. With respect to total timings, SLDF_REML dramatically outperformed all other methods when the precomputed GRM was available (Table 2 and Fig. 3), which we expect whenever the number of markers exceeds the sample size. Examining methods taking genotype matrices as inputs, SLDF_REML and L_FOMC_REML performed similarly, whereas BOLT-LMM v2.3.2 converged more quickly than either in smaller samples (Fig. 3), though the differences for n =256,000 were relatively minor (e.g., \(\overline {t}=\)91.09 min for BOLT-LMM v2.3.2 versus 102.21 min for L_FOMC_REML; Table 2). The older version of BOLT-LMM, v2.1, performed significantly more slowly than any of the other implementations examined (e.g., average wall clock time was 177.95 min at n =256,000), demonstrating the importance of implementation optimization.

Fig. 2
figure 2

Overhead versus iterative optimization procedure timing results. Trimmed mean wall clock time for overhead computations and iterative REML optimization procedures across twenty replications per condition on the log10 scale (a) and natural scale (b). Error bars reflect per condition standard errors and lines connect per condition means

Fig. 3
figure 3

Timing results. Trimmed mean wall clock time across twenty replications for per condition on the log10 scale (a) and natural scale (b). Error bars reflect per condition standard errors and lines connect per condition trimmed means

Fig. 4
figure 4

Accuracy results. Comparison of heritability estimates for height generated by BOLT-LMM versions 2.1 and 2.3.2, SLDF_REML, and L_FOMC_REML versus those generated by the deterministic algorithm implemented in the GCTA software package [4], for varying sub-samples of 16,000 to 256,000 unrelated European-ancestry UK Biobank participants. Data are comprised of twenty independent replications per condition. Red dashed lines indicate standard errors of GCTA estimate. Points represent individual observations whereas boxes indicate the 95% confidence intervals for the trimmed mean estimate after a Bonferroni correction for 25 comparisons. The bias evidenced by the BOLT-LMM estimators is likely due to the combination of performing a small number of secant iterations with fixed start values and loose tolerances for determining convergence. For n=256,000, memory requirements prohibited the use of GCTA, so we instead averaged ten estimates generated by the high-accuracy stochastic estimator implemented in BOLT-REML [31] (standard errors were 6.32e-5 and 2.45e-7 for the mean REML heritability estimate and its standard error, respectively)

Fig. 5
figure 5

Numerical experiments: accuracy versus time. Average absolute error on the log10 scale with respect to the GCTA estimate versus trimmed mean wall clock time across twenty replications per condition. Error bars reflect per condition standard errors and lines connect per condition trimmed means. For n=256,000, memory requirements prohibited the use of GCTA, so we instead averaged ten estimates generated by the high-accuracy stochastic estimator implemented in BOLT-REML v2.3.2 [31] (standard errors were 6.32e-5 and 2.45e-7 for the mean heritability and its standard error, respectively)

Table 2 Overhead and per objective function evaluation timings of stochastic algorithms for n=256,000

As the computations needed to compute the Lanczos decompositions in L_FOMC_REML and BOLT-LMM v2.3.2 are equivalent in time and memory complexity, we expect that an optimized compiled-language implementation of L_FOMC_REML would reduce the overhead computation time by a significant linear factor (≈3 for n =256,000, comparing the sum of the overhead time and single objective function evaluation time for BOLT-LMM v2.3.2 to its total running time; Table 2). Consistent with theory, the wall clock times per objective function evaluation for the novel algorithms were trivial given the Lanczos decompositions (e.g., for n =256,000, \(\overline {t}\) = 2.06 versus 20.07 min for L_FOMC_REML and BOLT-LMM v2.3.2, respectively; Table 2 and Fig. 2).


We have proposed stochastic algorithms for estimating the two variance component model (1), both of which theoretically offer substantial time savings relative to existing methods. Our methods capitalize on the principle of Krylov subspace shift invariance to reduce the number of steps involving \(\mathcal {O} \left (n^{2}\right)\) or \(\mathcal {O} (mn)\) computations to one, whereas existing methods perform equivalent computations at each iteration of the REML optimization procedure. For large samples, when taking genotype matrices as inputs, our interpreted-language implementations of L_FOMC_REML and SLDF_REML [9] produced more accurate variance component estimates than the highly-optimized, compiled BOLT-LMM implementations, while taking similar amounts of time. Thus, we expect comparably-optimized implementations of the novel algorithms to compute high accuracy REML estimates in close to the time required by BOLT-LMM v2.3.2 for a single objective function evaluation. Further, in contrast to the BOLT_LMM algorithm, which requires the genotype matrix, SLDF_REML can exploit precomputed GRMs to reduce operation count by an \(\mathcal {O}(2m/n)\) factor (Table 1), which yields dramatic time savings when the number of markers greatly exceeds the number of individuals (Fig. 3). While GRM precomputation is itself \(\mathcal {O}\left (mn^{2}\right)\), it can be effectively and asynchronously parallelized across multiple compute nodes, substantially mitigating computational burden (though we note that serial input/output constraints can interfere with efficient parallelization). However, as the L_FOMC_REML algorithm involves the computation of BLUPs of SNP effects, L_FOMC_REML is preferable to SLDF_REML when BLUP estimates are desired for prediction (as in [10]).

There are several limitations to the proposed approaches. First, SLDF_REML, which benefits from the ability to take GRMs as input, depends linearly on the number of included covariates, which might grow prohibitive in samples spanning numerous genotyping batches and ascertainment locations. However, as in BOLT_LMM, L_FOMC_REML requires \(\mathcal {O}(mn)\) matrix multiplications for BLUP computation at each step of the REML optimization procedure, whereas SLDF_REML requires only vector operations. Thus, though the options provided by the two novel algorithms increase researchers’ flexibility overall, the choice of whether to employ SLDF_REML versus L_FOMC_REML is problem-specific and necessitates greater researcher attention to resource allocation. For example, even when a precomputed GRM is available, it might be preferable to use L_FOMC_REML if BLUPs of latent SNP effects are desired. On the other hand, if a researcher intends to sequentially analyze a large number of phenotypes in a relatively small sample of individuals, it might prove most efficient to compute a GRM, despite the involved computational burden, in order to speed subsequent computations by supplying the GRM to the SLDF_REML algorithm. Second, neither algorithm mitigates the substantial \(\mathcal {O}(mn)\) or \(\mathcal {O}\left (n^{2}\right)\) memory complexity common to all algorithms for REML estimation of genomic variance components, requiring that researchers have access to high-memory compute nodes to work with large samples (though we note that neither of the novel algorithms substantial increases this burden either). Finally, for the same reasons that the spectral decomposition-based direct methods implemented in the FaST-LMM and GEMMA packages [3, 5, 6] are restricted to the simple two component model (1) (i.e., whereas the GRM and identity matrix are simultaneously diagonalizable, the same doesn’t hold for arbitrary collections of three or more symmetric positive semidefinite matrices), the shift-invariance property exploited by the proposed methods does not extend to multiple genomic variance components. Given that the two component model is insufficient for precise heritability estimation for many complex traits [11], our novel algorithms apply to the particular, though common, tasks of variance component and BLUP estimation for LMM in association studies.

Despite these limitations, the proposed algorithms have clear advantages over existing methods in terms of flexibility, accuracy, and speed of computation. We provide both pseudocode and heavily annotated Python 3 implementations [9] to facilitate their incorporation into existing software packages. Further, though our algorithms are restricted to the two variance component model, they can be used to generate the inputs necessary for estimation of more complex models, such as the mixture model estimated via variational approximation implemented in [7], and thus have applications to non-infinitesimal models. Finally, we suggest that the methods presented in our theoretical development, in particular stochastic trace estimation and stochastic Lanczos quadrature, are likely to find uses in REML estimation of other models of interest to researchers in genomics. In particular, we suggest the development of models that exploit Krylov subspace shift-invariance to speed up variance/covariance component estimation for the case of multivariate phenotypes as a target for future research. Such models necessarily involve the computation or approximation of Hessian matrices, thereby introducing additional complexity in comparison to the univariate case considered above. However, the extension of fast cubic complexity methods based on the spectral decomposition of the covariance matrix [3, 5] to the multivariate case [6] suggests the potential for multivariate analogues of the algorithms presented here.


The proposed algorithms, SLDF_REML and L_FOMC_REML, unify previous approaches to estimating the two variance component model (1) by exploiting the simultaneous diagonalizability of the covariance structure components while avoiding matrix operations with cubic time complexity. As a result, the most expensive operations only need to be performed once, as with the spectral decomposition performed in the FaST-LMM and GEMMA software packages [3, 5, 6], but these operations consist only of matrix-vector products, as in the BOLT-LMM software package [7, 8]. All but one iteration of the REML optimization procedure requires only vector operations, yielding increased speed and numerical precision relative to existing methods. Furthermore, the unique strengths of the two methods lead to a flexible approach depending on researcher goals: SLDF_REML is capable of operating on precomputed GRMs when available, whereas L_FOMC_REML can generate BLUPs of latent SNP effects without added computational burden. We recommend these algorithms for incorporation into GWAS LMM implementations.


We consider the two component genomic variance components model commonly employed in LMM association studies [1], which is of the form

$$\begin{array}{*{20}l} y&=X\beta+\frac{1}{\sqrt{m}}Zu+e,\\ u&\stackrel{{i.i.d.}}{\sim}\mathcal{N}\left(0,\sigma_{g}^{2}\right),\quad e\stackrel{{i.i.d.}}{\sim}\mathcal{N}\left(0,\sigma_{e}^{2}\right), \end{array} $$

where y is a measured phenotype, the cn columns of \(X\in \mathbb {R}^{n\times c}\) are covariates (including an intercept term) with corresponding fixed effects β, and \(Z\in \mathbb {R}^{n\times m}\) is a matrix of n individuals’ standardized genotypes at m loci. Without loss of generality, we assume that X has full column rank; in the case of numerical rank deficiency we can simply replace X by the optimal full rank approximation generated by its economy singular value decomposition or rank revealing QR decomposition. The latent genetic effects \(u\in \mathbb {R}^{m}\) and residuals \(e\in \mathbb {R}^{n}\) are random variables with distributions parametrized by the heritable and non-heritable variance components, \(\sigma _{g}^{2}\) and \(\sigma _{e}^{2}\), respectively. The REML criterion corresponds to the marginal likelihood of \(\sigma _{g}^{2},\sigma _{e}^{2}\vert K^{T}y\), where KT projects to an (nc)-dimensional subspace orthogonal to the covariate vectors such that the null space of KT is exactly the column space of X [12]. In other words \(K^{T}:\mathbb {R}^{n}\rightarrow \mathcal {S}\subset \mathbb {R}^{n-c}\) such that \(\mathbb {R}^{n}=\mathcal {S}\oplus \text {col}\ X\). The transformed random variable KTy has the marginal distribution \(K^{T}y\sim \mathcal {MVN}\left (0,\sigma _{g}^{2}\frac {1}{m}K^{T}ZZ^{T}K+\sigma _{e}^{2}KK^{T}\right)\), which we reparametrize as \(K^{T}y\sim \mathcal {MVN}\left (0,\sigma _{g}^{2}K^{T}H_{\tau }K\right)\), where

$$ H_{\tau}=\frac{1}{m}ZZ^{T}+\tau I_{n},\quad \tau=\sigma_{e}^{2}/\sigma_{g}^{2}. $$

Here, \(\frac {1}{m}ZZ^{T}\), which indicates the average covariance between individuals’ standardized genotypes, is often referred to as the genomic relatedness matrix (GRM). The REML criterion, or marginal log likelihood, can be expressed as a function of τ:

$$\begin{array}{*{20}l} \ell\left(\tau\vert K^{T}y\right)\propto & -(n-c)\ln\left(\hat{\sigma}^{2}_{g}(\tau)\right)-\hat{\sigma}^{2}_{e}(\tau)^{-1}y^{T}P_{\tau} y \\\ &-\ln\left(\det\left(K^{T}H_{\tau} K\right)\right), \end{array} $$

where Pτ=K(KTHτK)−1KT, and, as implied by the REML first-order (stationarity) conditions, \(\hat {\sigma }^{2}_{e}(\tau)\) is the expected residual variance component given τ and \(\hat {\sigma }^{2}_{g}(\tau)=\hat {\sigma }^{2}_{e}(\tau)/\tau \) [12, 13]. In practice, K is never explicitly formed.

Naïve procedures for maximizing the REML criterion require evaluating (3) or its derivatives at each iteration of the optimization procedure. Previous methods either reduce the number of necessary cubic time complexity operations to one by exploiting problem structure, or substitute quadratic time complexity iterative and stochastic matrix operations for direct computations (Fig. 1). Here, we unify these approaches via the principle of Krylov subspace shift invariance to achieve methods that only require a single iteration of quadratic time complexity operations.

In what follows, we first present a brief survey of the Lanczos process, its applications to families of shifted linear systems, and its use in constructing Gaussian quadratures for spectral matrix functions. We assume familiarity with the method of conjugate gradients, an iterative procedure for approximating solutions to symmetric positive definite linear systems, and Gaussian quadrature, a method for approximating the integral of a given function by a well chosen weighted sum of its values; if not, see [14] and [15], respectively. We present these methods toward the goal of efficiently evaluating the quadratic form and log-determinant terms appearing in the REML criterion (3). We then present the details of the SLDF_REML and L_FOMC_REML algorithms, both of which exploit problem structure via Lanczos process-based methods in order to speed computation. Finally, we derive expressions for the computational complexity of the present algorithms, which we confirm via numerical experiment.


The notation in this section is self-contained. Our presentation borrows from the literature extensively; further details on the (block) Lanczos procedure [14, 16], conjugate gradients for shifted linear systems [17, 18], stochastic trace estimation [19, 20], and stochastic Lanczos quadrature [2123] are suggested in the bibliography.

Krylov subspaces

Consider a symmetric positive-definite matrix A and nonzero vector b. Define the mth Krylov subspace by the span of the first m−1 monomials in A applied to b; that is, \(\mathcal {K}_{m}(A,b)=\text {span}\, \left \{A^{k}b:k=0,\ldots,m-1\right \}\). Krylov subspaces are shift invariant—i.e., for real numbers σ, we have \(\mathcal {K}_{m}(A,b)=\mathcal {K}_{m}(A+\sigma I,b)\).

The Lanczos procedure

The Lanczos procedure generates the decomposition AUm=UmTm, where the columns u1,…,um of Um form an orthonormal basis for \(\mathcal {K}_{m}(A,b)\) and the Jacobi matrices\(T_{m}\in \mathbb {R}^{m\times m}\) are symmetric tridiagonal. Choosing u1=b/b, successive columns are uniquely determined by the sequence of Lanczos polynomials \(\{p_{k}\}_{k=1}^{m-1}\) such that each uk=pk−1(A)u1 and each pk is the characteristic polynomial of Jacobi matrix Tk consisting of the first k rows and columns of Tm. The Lanczos procedure is equivalent to the well-known method of conjugate gradients (CG) for solving the linear system Ax=b in that the mth step CG approximate solution x(m) is obtained from the above decomposition using only vector operations (see Algorithm 1). The number of steps m prior to termination corresponds to the number of CG iterations need to bound the norm of the residual below a specified tolerance: Ax(m)b<ε. The rate of convergence depends on the spectral properties of A and can be controlled in terms of the spectral condition number κ(A). In the present application, the fact that all complex traits of interest generally have a non-trivial non-heritable component results in well-conditioned systems [7, 9].

Solving families of shifted linear systems

Having applied the Lanczos process to the seed system Ax=b, shift-invariance can be exploited to obtain the mth step CG approximate solution \(x_{\sigma }^{(m)}\) to the shifted linear system Aσxσ=(A+σI)xσ=b, only using vector operations [17]. It can be shown that any positive shift by σ≥0 improves the rate of convergence such that \(\left \Vert A_{\sigma }x_{\sigma }^{(m)}-b\left \Vert = \frac {\delta _{m}}{\delta _{m}+\sigma }\right \Vert Ax^{(m)}-b\right \Vert \), where δm>0 is the mth diagonal element of the Lanczos Jacobi matrix corresponding to \(\mathcal {K}_{m}(A,b)\).

Lanczos polynomials and Gaussian quadrature

Additionally, the Lanczos polynomials comprise a sequence of orthogonal polynomials with respect to the spectral measure

$$\mu_{A,v}(t)=\sum_{j=1}^{\ell:\lambda_{\ell}\leq t} \left(Q^{T}v\right)_{j}^{2}, $$

where A=QΛQT is the spectral decomposition [21, 22]. Quadratic forms vTf(A)v involving spectral functions f(A)=Qf(Λ)QT, e.g., for the matrix logarithm, \(v^{T}(\log A)v = \sum _{i=1}^{n} \left [\ln (\lambda _{i})\left (Q^{T}v\right)_{i}^{2}\right ]\), can be written as Riemann–Stieltjes integrals of the form

$$ v^{T}Qf(\Lambda)Q^{T}v=\int_{\lambda_{1}}^{\lambda_{n}}f(t)d\mu_{A,v}(t). $$

The Lanczos decomposition AUm=UmTm generates the weights and nodes for an m-point Gaussian quadrature approximating the above integral. Denoting the spectral decomposition of the jth Jacobi matrix \(T_{j}=W_{j}D_{j}W_{j}^{T}\) for j=1,…,m, we approximate (4) as

$$\int_{\lambda_{1}}^{\lambda_{n}}f(t)d\mu_{A,v}(t)\approx\sum_{\ell=1}^{m}\omega_{j,\ell}f(\theta_{j,\ell}), $$

where θj,={Dj}, and \(\omega _{j,\ell }=\left \{e_{1}^{T}W_{j}\right \}_{\ell }\). As m here corresponds to the number of CG iterations needed to ensure that Ax(m)v is smaller than a specified tolerance, the tridiagonal Jacobi matrices are small and calculating their spectral decompositions is computationally trivial.

Stochastic Lanczos quadrature

Stochastic Lanczos quadrature (SLQ) combines the above quadrature formulation with Hutchinson-type stochastic trace estimators [21]. Such estimators approximate the trace of a matrix \(H\in \mathbb {R}^{n\times n}\) by a weighted sum of quadratic forms \(\text {tr}(H)\approx \frac {n}{n_{\text {rand}}}\sum _{k=1}^{n_{\text {rand}}}v_{k}^{T}{Hv}_{k}\) for normalized, suitably distributed i.i.d. random probing vectors\(\{v_{j}\}_{j=1}^{n_{\text {rand}}}\) [19]. The SLQ approximate trace of a spectral function of a matrix, tr(f(A)), is then

$$\begin{array}{*{20}l} \text{tr}(f(A)) & \approx \frac{n}{n_{\text{rand}}} \sum_{k=1}^{n_{\text{rand}}}v_{k}^{T}Qf(A)Q^{T}v_{k} \\ &=\frac{n}{n_{\text{rand}}}\sum_{k=1}^{n_{\text{rand}}}\int_{\lambda_{1}}^{\lambda_{n}}f(t)d\mu_{A,v_{k}}(t)\\ &\approx\frac{n}{n_{\text{rand}}}\sum_{k=1}^{n_{\text{rand}}}\sum_{\ell=1}^{m_{\kappa}}\omega_{k,\ell}f(\theta_{k,\ell}). \end{array} $$

Whereas the number of probing vectors nrand is chosen a priori, the number quadrature nodes mκ corresponds to the number of conjugate gradient iterations needed to ensure \(\left \Vert A_{\sigma }x_{j \sigma }^{(m_{\kappa })}-v_{j}\right \Vert \) is less than a specified tolerance for each j=1,…,nrand.

SLQ and shift invariance

For a fixed probing vector vi, we can exploit the shift invariance of \(\mathcal {K}_{m}(A,v_{i})\) to efficiently update Gaussian quadrature generated by the corresponding Lanczos decomposition AUm=UmTm. Again denoting the spectral decomposition of the Jacobi matrix \(T_{i}=W_{i}D_{i}W_{i}^{T}\), the Lanczos decomposition of the shifted system is simply \(A_{\sigma }U_{m}=U_{m}W_{m}(D_{m}+\sigma I_{m})W_{m}^{T}\). Thus, given the approximation (5) for tr(f(A)), we can efficiently compute an approximation of tr(f(Aσ)) for any σ>0. In Algorithm 2 we implement a method for estimating tr(log(Aσ)) in \(\mathcal {O}(n_{\text {rand}})\) operations given the spectral decompositions of the Jacobi matrices corresponding to \(\mathcal {K}_{m}(A,v_{j})\) for probing vectors \(\{v_{j}\}_{j=1}^{n_{\text {rand}}}\).

Block methods

For multiple right hand sides B=[b1||bc], the Lanczos procedure can be generalized to the block Krylov subspace\(\mathcal {K}_{m}(A,B)=\bigotimes _{j=1}^{c}\mathcal {K}_{m}(A,b_{j})\), resulting in a collection of Lanczos decompositions AUj=UjTj such that {Uj}1=bj/bj for j=1,…,c. This process is equivalent to block CG methods in that the Jacobi matrices can again be used to generate an approximate solution X(m) to the matrix equation AX(m)=B. We provide an implementation of the block Lanczos procedure in L_Seed [9], employing a conservative convergence criterion defined in terms of the magnitude of the (1,2) operator norm of the residual \(\left \Vert AB- X^{(m)}\right \Vert _{1\rightarrow 2}=\max _{j}\left \Vert {Ab}_{j}- x^{(m)}_{j}\right \Vert _{2}\). Compared to performing c separate Lanczos procedures with respect to \(\{\mathcal {K}_{m}(A,b_{j})\}_{j=1}^{c}\), block Lanczos with respect to \(\mathcal {K}_{m}(A,b)\), with B=[b1||bc], produces the same result (for a fixed number of steps). However, block Lanczos employs BLAS-3 operations and is thus more performant, especially when implemented on top of parallelized linear algebra subroutines.

A derivative-free REML algorithm

We propose the stochastic Lanczos derivative-free residual maximum likelihood algorithm (SLDF_REML; Algorithm 3), a method for efficiently and repeatedly evaluating the REML criterion, which is then subject to a zeroth-order optimization scheme. To achieve this goal, we first identify the parameter space of interest with a family of shifted linear systems. We then develop a scheme for evaluating the quadratic form yTPτy and log determinant ln(det(KTHτK)) terms in the REML criterion (3) that use the previously discussed Lanczos methods to exploit this shifted structure. Specifically, after obtaining a collection of Lanczos decompositions, we can repeatedly solve the linear systems involved in the quadratic form term via Lanczos conjugate gradients and approximate the log determinant term via stochastic Lanczos quadrature.

The parameter space as shifted linear systems

Given a range of possible values of the standardized genetic variance component, or heritability,

$$ h^{2}=\sigma_{g}^{2}/\left(\sigma_{g}^{2}+\sigma_{e}^{2}\right),\quad h^{2}\in\left[h^{2}_{\min},h^{2}_{\max}\right], $$

we set \(\phantom {\dot {i}\!}\tau _{0}=(1-h^{2}_{\max })/h^{2}_{\max }\) and define \(H_{0}=H_{\tau _{0}}\phantom {\dot {i}\!}\), noting that for all \(\tau \in \Theta =\left \{\left (1-h^{2}\right)/h^{2}: h^{2}\in \left [h^{2}_{\min },h^{2}_{\max }\right ]\right \}\phantom {\dot {i}\!}\), the spectral condition number of Hτ will be less than that of H0 as the identity component of Hτ will only increase. Further, we have now identified elements of our parameter space τΘ with the family of shifted linear systems

$$\mathcal{H}_{\tau_{0}} = \{H_{\sigma} = H_{\tau}=H_{0}+\sigma I_{n}:\sigma=\tau-\tau_{0}\}. $$

For any vector v for which we have computed the Lanczos decomposition H0U=UT with the first column of U equal to v/v, we can use Algorithm 1 to obtain the CG approximate solution \(\phantom {\dot {i}\!}x_{\sigma }\approx H_{\sigma }^{-1}v\) for all σ≥0 in O(n) operations.

The quadratic form

Directly evaluating the quadratic form

$$ y^{T}P_{\tau} y = y^{T}K\left(K^{T}H_{\tau} K\right)^{-1}K^{T}y $$

is computationally demanding and is typically avoided in direct estimation methods [12, 13]. Writing the complete QR decomposition of the covariate matrix \(\phantom {\dot {i}\!}X=[Q_{X}\vert Q_{X^{\perp }}]R\) allows us to define \(\phantom {\dot {i}\!}K^{T}=Q_{X^{\perp }}^{T}\), noting that substituting \(\phantom {\dot {i}\!}Q_{X^{\perp }}Q_{X^{\perp }}^{T}\) for KT preserves the value of (7). \(Q_{X^{\perp }}Q_{X^{\perp }}^{T}\) is equivalent to the orthogonal projection operator \(\phantom {\dot {i}\!}S:v\mapsto v - Q_{X}Q_{X}^{T}v\), which admits an efficient implicit construction and is computed in O(nc2) operations via the economy QR decomposition X=QXRX. Then, reexpressing (7) as yTS(SHτS)Sy, we can use the Lanczos process to construct an orthonormal basis and corresponding Jacobi matrix for the Krylov subspace \(\mathcal {K}({SH}_{0}S,Sy)\). We can then obtain the CG approximation of yTS(SHσS)−1Sy using vector operations as, for any shift σ, we have yTS(SHσS)Sy=yTS(SH0S+σIn)−1Sy (see Lemma 1 in Additional file 1 for proof).

The log determinant

We use an equivalent formulation [12, 24] of the term ln(det(KTHτK)), rewriting it as

$$\ln(\det(H_{\tau}))+\ln\left(\det\left(X^{T}H^{-1}_{\tau} X\right)\right)- \ln\left(\det\left(X^{T}X\right)\right). $$

The det(XTX) term is constant with respect to τ and can be disregarded. For cn, \(\det \left (X^{T}H^{-1}_{\tau } X\right)\) is computationally trivial via direct methods given \(H^{-1}_{\tau } X\), which we can compute for all parameter values of interest in O(n) operations having first applied the block Lanczos process with respect to \(\mathcal {K}(H_{0},X)\). Computing the block Lanczos decomposition corresponding to \(\mathcal {K}(H_{0},X)\), which is only performed once, unfortunately scales with the number of covariates c, a disadvantage not shared by our second algorithm (Algorithm 4). The remaining term, ln(det(Hτ)), is approximated by applying SLQ (Algorithm 2) to a special case of (5): We rewrite the log determinant as the trace of the matrix logarithm

$$\begin{array}{*{20}l} \ln(\det(H_{\tau}))&=\text{tr\,}(\log(H_{\tau}))\\ &=\text{tr\,}Q[\ln(\lambda_{1}+\sigma)\vert\cdots\vert\ln(\lambda_{n} + \sigma)]Q^{T}, \end{array} $$

where we have spectrally decomposed H0=QΛQT for some τ0τ with σ=ττ0. We draw nrand i.i.d. normalized Rademacher random vectors \(v_{1},\ldots,v_{n_{\text {rand}}}\phantom {\dot {i}\!}\), where each element of each vector vi takes values of either 1/vi or −1/vi with equal probability. The SLQ approximate of the log determinant for the seed system is

$$\ln(\det(H_{\sigma}))\approx\frac{n}{n_{\text{rand}}}\sum_{i=1}^{n_{\text{rand}}}\sum_{\ell=1}^{m_{i}}\omega_{i,\ell}\ln(\theta_{i,\ell}+\sigma), $$

where the weights wi, and nodes θi, are respectively derived by using the Lanczos process to construct orthonormal bases for \(\mathcal {K}(H_{0},v_{i})\) (in practice, we apply block Lanczos to \(\mathcal {K}(H_{0},(v_{1},\ldots,v_{n_{\text {rand}}}))\)) [21, 22].

The SLDF_REML algorithm

Stochastic Lanczos derivative-free residual maximum likelihood (SLDF_REML; Algorithm 3), conceptually similar to the derivative-free algorithm of Graser and colleagues [13], applies the previously introduced Lanczos methods to approximate the above reparametrization of the REML criterion. Shift-invariance is then exploited such that, with the exception of the initial Lanczos decompositions, the REML log likelihood can be repeatedly evaluated using only vector operations. SLDF_REML takes a phenotype vector \(y\in \mathbb {R}^{n}\), a covariate matrix \(X\in \mathbb {R}^{n\times c}\), either the genetic relatedness matrix \(ZZ^{T}\in \mathbb {R}^{n\times n}\) or the standardized genotype matrix \(Z\in \mathbb {R}^{n\times m}\) (in which case the action of the GRM as a linear operator is coded implicitly as vZ(ZTv)), and a range of possible standardized genomic variance component values \(\Theta = \left [h^{2}_{\text {min}},h^{2}_{\max }\right ]\) as arguments and generates a function REML_criterion\(:\Theta \rightarrow \mathbb {R}\) that efficiently computes the log-likelihood of τ|KTy. This function is then subject to scalar optimization via Brent’s method, which is feasible given the low cost of evaluation and low dimension of Θ. Hyperparameters include the number of probing vectors to be used for the SLQ approximation of the log determinant nrand, as well as tolerances corresponding to the REML criterion, parameter estimates, and the Lanczos residual norms. Convergence to a given tolerance on a sensible scale is ensured by optimizing with respect to the heritability h2Θ [ 0,1] and evaluating the REML criterion at τ=(1−h2)/h2. The REML criterion can be repeatedly evaluated in \(\mathcal {O}(n)\) operations, making high accuracy computationally feasible.

A first-order Monte Carlo REML algorithm

We additionally propose the Lanczos first-order Monte Carlo residual maximum likelihood algorithm (L_FOMC_REML; Algorithm 4), which also takes advantage of the shifted structure of the standard genomic variance components model to speed computation. We first present the related first-order algorithm implemented in the efficient and widely-used BOLT-LMM software [7, 8], which we refer to as BOLT_LMM and of which the proposed L_FOMC_REML algorithm is a straightforward extension.

BOLT_LMM (First-order Monte Carlo REML)

The BOLT_LMM algorithm is based on the observation that at stationary points of the REML criterion (3), the first-order REML conditions (i.e., =0) imply that

$$ \mathbb{E}\left[\tilde{u}^{T}\tilde{u}\vert y\right] =\tilde{u}^{T}\tilde{u},\quad \mathbb{E}\left[\tilde{e}^{T}\tilde{e}\vert y\right] =\tilde{e}^{T}\tilde{e}, $$

where \(\tilde {u}\) and \(\tilde {e}\) are the best linear unbiased predictions (BLUPs) of the latent genetic effects and residuals, respectively [25]. The BLUPs are functions of τ given by

$$\begin{array}{*{20}l} \tilde{u}(\tau)&=m^{-1/2}Z^{T}S\acute{H}_{\tau}^{-1}Sy, \\ \tilde{e}(\tau)&=\tau\acute{H}_{\tau}^{-1}Sy, \end{array} $$

where we have defined \(\acute {H}_{\tau }= \frac {1}{m}SZZ^{T}S+\tau I_{n}\). The expectations (8) are approximated via the following stochastic procedure: Monte Carlo samples of the latent variables, \(\check {u}_{k}\stackrel {i.i.d}{\sim }\mathcal {MVN}(0,I_{m})\), \(\check {e}_{k}\stackrel {i.i.d}{\sim }\mathcal {MVN}(0,S)\) are used to generate samples of the projected phenotype vector

$$\check{y}_{k} = m^{-1/2}SZ\check{u}_{k} + \check{e}_{k},\quad k=1,\ldots n_{\text{rand}}. $$

BLUPs are then computed as in (9), yielding the approximations

$$\begin{array}{*{20}l} \mathop{\mathbb{E}}_{\text{MC}}\left[\tilde{u}^{T}\tilde{u}\vert y\right] &= \frac{n_{\text{rand}}^{-1}}{\sqrt{m}} \sum_{k=1}^{n_{\text{rand}}} \left\Vert Z^{T}S\acute{H}_{\tau}^{-1}S\check{y}_{k} \right\Vert^{2}, \\ \mathop{\mathbb{E}}_{\text{MC}}\left[\tilde{e}^{T}\tilde{e}\vert y\right] &= n_{\text{rand}}^{-1} \sum_{k=1}^{n_{\text{rand}}} \left\Vert \tau\acute{H}_{\tau}^{-1}S\check{y}_{k} \right\Vert^{2}. \end{array} $$

Using the above expressions, Loh et al. [7, 8] apply a zeroth-order root-finding algorithm to the quantity

$$f_{r}(\tau)= \ln \left[ \frac{\tilde{u}^{T}\tilde{u}}{\tilde{e}^{T}\tilde{e}} \right] - \ln \left[ \frac{\mathop{\mathbb{E}}_{\text{MC}}\left[\tilde{u}^{T}\tilde{u}\vert y\right]}{\mathop{\mathbb{E}}_{\text{MC}}\left[\tilde{e}^{T}\tilde{e}\vert y\right]} \right], $$

noting that fr=0 is a necessary condition (and, in practice, a sufficient condition) for (8). Using CG to approximate solutions to the linear systems involved in BLUP computations results in an efficient REML estimation procedure involving O(n·m·nrand) operations for well-conditioned covariance structures (i.e., for nontrivial non-heritable variance component values). As noted in [8], implicit preconditioning of H0 can be achieved by including the first few right singular vectors of the genotype matrix (or eigenvectors of the GRM) as columns of the covariate matrix X.

The L_FOMC_REML algorithm

The BOLT_LMM algorithm described above involves solving nrand+1 linear systems

$$\acute{H}_{\tau_{\ell}}^{-1}S\check{y},\ \acute{H}_{\tau_{\ell}}^{-1}S\check{y}_{1},\ \ldots,\ \acute{H}_{\tau_{\ell}}^{-1}S\check{y}_{n_{\text{rand}}}, $$

at each iteration of the optimization scheme in order to compute BLUPs of the latent variables for the observed phenotype vector and each of the Monte Carlo samples. However, each iteration involves spectral shifts of the left hand side of the form

$$\acute{H}_{\tau_{\ell+1}}^{-1}=\left(\acute{H}_{\tau_{\ell}}+\sigma I_{n}\right)^{-1},\quad \sigma=(\tau_{\ell+1}-\tau_{\ell}). $$

As in the SLDF_REML algorithm, the underlying block Krylov subspace is invariant to these shifts (i.e., \(\mathcal {K}_{m}\left (\acute {H}_{\tau },Y\right)=\mathcal {K}_{m}\left (\acute {H}_{\tau }+\sigma I,Y\right)\), where \(Y=\left [y\vert \check {y}_{1}\vert \cdots \vert \check {y}_{n_{\text {rand}}}\right ]\)). Thus, having performed the Lanczos process for an initial parameter value τ0, we can use L_Solve (Algorithm 1) to obtain the block CG approximate solution \(X_{\sigma }^{(m)}\approx \acute {H}_{\tau +\sigma }^{-1}Y\) in O(n·nrand) operations. We are thus able to avoid solving linear systems in all subsequent iterations, though the relatively small number of matrix-vector products involved in computing BLUPs for the latent genetic effects at each step are unavoidable. The requirement of the genotype matrix for computing (9) prevents both L_FOMC_REML and BOLT_LMM from efficiently exploiting precomputed GRMs.

Comparison of methods

We compare theoretical and empirical properties of our proposed algorithms, SLDF_REML and L_FOMC_REML, to those of BOLT_LMM.

Computational complexity

In contrast to BOLT_LMM, the Lanczos-decomposition based algorithms we have proposed only need to perform the computationally demanding operations necessary to evaluate the REML criterion once. As such, we differentiate between overhead computations, which occur once and do not depend on the number of iterations needed to achieve convergence, and per-iteration computations, which are repeated until convergence of the optimization process (Table 1 and Fig. 2).

The overhead computations of SLDF_REML are dominated by the need to construct bases for the nrand+c+1 subspaces \(\mathcal {K}(H_{0},[\check {v}_{1},\ldots,\check {v}_{n_{\text {rand}}},x_{1},\ldots,x_{c},y])\), and are thus \(\mathcal {O}\left (n^{2}(n_{\text {rand}}+c)n_\kappa \right)\) when a precomputed GRM is available and \(\mathcal {O}(2m\cdot n(n_{\text {rand}}+c)n_\kappa)\) otherwise. Here, nκ denotes the number of Lanczos iterations needed to achieve convergence at a pre-specified tolerance and increases with \(h^{2}_{\max }\). Subsequent iterations are dominated by the cost of solving c+1 shifted linear systems via L_Solve and are thus \(\mathcal {O}(n\cdot c\cdot n_\kappa)\). The overhead computations in L_FOMC_REML are dominated by the Lanczos decompositions corresponding to the 2nrand+1 seed systems, where the GRM is implicitly represented in terms of the standardized genotype matrix, and is thus \(\mathcal {O}(4m\cdot n\cdot n_{\text {rand}} \cdot n_\kappa)\). Operations of equivalent complexity are needed at every iteration of BOLT_LMM.

Numerical experiments

We compared wall clock times for genomic variance component estimation for height in nested random subsets of 16,000, 32,000, 64,000, 128,000, and 256,000 unrelated \((\hat {\pi }<.05)\) European ancestry individuals from the widely used UK Biobank data set [26]. All analyses included 24 covariates consisting of age, sex, and testing center and used hard-called genotypes from 330,723 array SNPs remaining after enforcing a 1% minor allele frequency cutoff. We compared SLDF_REML, with and without a precomputed GRM, to L_FOMC_REML which requires the genotype matrix. For the novel algorithms, absolute tolerances for the Lanczos iterations and the REML optimization procedure were set to 5e-5 and 1e-5, respectively. Additionally, we compared our interpreted Python 3.6 code to BOLT-LMM versions 2.1 and 2.3.3 (C++ code compiled against the Intel MKL and Boost libraries) [7, 8, 27, 28]. We ran each algorithm twenty times per condition, trimming away the two most extreme timings in each condition. Mirroring the default settings of the BOLT-LMM software packages, we set nrand=15 across both of our proposed methods.

Novel algorithms were implemented in the Python v3.6.5 computing environment [9], using NumPy v1.14.3 and SciPy v1.1.0 compiled against the Intel Math Kernel Library v2018.0.2 [2830]. Optimization was performed using SciPy’s implementation of Brent’s method, with convergence determined via absolute tolerance of the standardized genomic variance component \(\hat {h}^{2}\). Timing results (Table 2 and Figs. 3 and 5) do not include time required to read genotypes into memory, or, when applicable, to compute GRMs, and reflect total running time on an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz with 32 physical cores and 1 terabyte of RAM. Timing experiments excluded methods with cubic time complexity, including GCTA, FaST-LMM, and GEMMA. Accuracy was assessed by comparing heritability estimates generated by the stochastic algorithms to those estimated via the direct, deterministic average-information Newton–Raphson algorithm as implemented in the GCTA software package v1.92.0b2 [4] (Figs. 4 and 5).

Availability of data and materials

The UK Biobank data are available to qualified researchers via the UK Biobank Access Management System ( The code used in the numerical experiments is available on Github (



Basic linear algebra subprogram


Basic linear unbiased prediction


Conjugate gradients method


Genome-wide complex trait analysis [4]


Genomic relatedness matrix


Genome-wide association study


Linear mixed-effects model


Residual maximum likelihood


Stochastic Lanczos quadrature


  1. Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and Pitfalls in the Application of Mixed Model Association Methods. Nat Genet. 2014; 46(2):100–6.

    Article  Google Scholar 

  2. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using Lme4. 2014. arXiv preprint arXiv:14065823.

  3. Zhou X, Stephens M. Genome-Wide Efficient Mixed Model Analysis for Association Studies. Nat Genet. 2012; 44(7):821–4.

    Article  CAS  Google Scholar 

  4. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A Tool for Genome-Wide Complex Trait Analysis. Am J Hum Genet. 2011; 88(1):76–82.

    Article  CAS  Google Scholar 

  5. Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. FaST Linear Mixed Models for Genome-Wide Association Studies. Nat Methods. 2011; 8(10):833–5.

    Article  CAS  Google Scholar 

  6. Zhou X, Stephens M. Efficient Multivariate Linear Mixed Model Algorithms for Genome-Wide Association Studies. Nat Methods. 2014; 11(4):407.

    Article  CAS  Google Scholar 

  7. Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian Mixed-Model Analysis Increases Association Power in Large Cohorts. Nat Genet. 2015; 47(3):284–90.

    Article  CAS  Google Scholar 

  8. Loh PR, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-Model Association for Biobank-Scale Datasets. Nat Genet. 2018; 50:906–8.

    Article  CAS  Google Scholar 

  9. Border R. Stochastic Lanczos Likelihood Estimation of Genomic Variance Components. Appl Math Grad Theses Dissertations. 2018;120.

  10. de los Campos G, Vazquez AI, Fernando R, Klimentidis YC, Sorensen D. Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor. PLoS Genet. 2013; 9(7):e1003608.

    Article  Google Scholar 

  11. Evans LM, Tahmasbi R, Vrieze SI, Abecasis GR, Das S, Gazal S, et al. Comparison of Methods That Use Whole Genome Data to Estimate the Heritability and Genetic Architecture of Complex Traits. Nat Genet. 2018; 50(5):737–45.

    Article  CAS  Google Scholar 

  12. Searle SR, Casella G, McCulloch CE. Variance Components, vol. 391. United States: Wiley; 2009.

    Google Scholar 

  13. Graser HU, Smith SP, Tier B. A Derivative-Free Approach for Estimating Variance Components in Animal Models by Restricted Maximum Likelihood. J Anim Sci. 1987; 64(5):1362–70.

    Article  Google Scholar 

  14. Björck A. Numerical Methods in Matrix Computations, vol. 59. Switzerland: Springer; 2015.

    Book  Google Scholar 

  15. Atkinson KE. An Introduction to Numerical Analysis. United Kingdom: Wiley; 2008.

    Google Scholar 

  16. O’Leary DP. The Block Conjugate Gradient Algorithm and Related Methods. Linear Algebra Appl. 1980; 29:293–322.

    Article  Google Scholar 

  17. Frommer A, Maass P. Fast CG-Based Methods for Tikhonov-Phillips Regularization. SIAM J Sci Comput. 1999; 20(5):1831–50.

    Article  Google Scholar 

  18. Sogabe T. A Fast Numerical Method for Generalized Shifted Linear Systems with Complex Symmetric Matrices. Recent Dev Num Anal Num Comput Algoritm. 2010;:13.

  19. Hutchinson MF. A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines. Commun Stat Simul Comput. 1990; 19(2):433–50.

    Article  Google Scholar 

  20. Avron H, Toledo S. Randomized Algorithms for Estimating the Trace of an Implicit Symmetric Positive Semi-Definite Matrix. J ACM. 2011; 58(2):8:1–8:34.

    Article  Google Scholar 

  21. Golub GH, Matrices MG. Moments and Quadrature with Applications. Princeton: Princeton University Press; 2009.

    Book  Google Scholar 

  22. Ubaru S, Chen J, Saad Y.Fast Estimation of Tr(f(A)) via Stochastic Lanczos Quadrature. SIAM J Matrix Anal Appl. 2017; 38(4):1075–99.

    Article  Google Scholar 

  23. Chen J, Saad Y. A Posteriori Error Estimate for Computing Tr(f(A)) by Using the Lanczos Method. 2018. arXiv:180204928 [math].

  24. Zhu S, Wathen AJ. Essential Formulae for Restricted Maximum Likelihood and Its Derivatives Associated with the Linear Mixed Models. 2018. arXiv:180505188 [stat].

  25. McCulloch C, Searle SR, Neuhaus JM. Generalized, Linear, and Mixed Models. Hoboken: Wiley; 2008.

    Google Scholar 

  26. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015; 12(3):e1001779.

    Article  Google Scholar 

  27. Schling B. The Boost C++ Libraries. USA: XML Press; 2011.

    Google Scholar 

  28. Wang E, Zhang Q, Shen B, Zhang G, Lu X, Wu Q, et al. Intel Math Kernel Library. In: High-Performance Computing on the Intel®Xeon Phi™. New York: 2014. p. 167–88.

  29. Oliphant T. NumPy: A Guide to NumPy. 2006.

  30. Jones E, Oliphant T, Peterson P, et al. SciPy: Open Source Scientific Tools for Python. 2001.

  31. Loh PR, Bhatia G, Gusev A, Finucane HK, Bulik-Sullivan BK, Pollack SJ, et al. Contrasting Genetic Architectures of Schizophrenia and Other Complex Diseases Using Fast Variance Components Analysis. Nat Genet. 2015; 47(12):1385–92.

    Article  CAS  Google Scholar 

Download references


The authors wish to thank UK Biobank participants. Additionally, the authors thank Matthew C. Keller and Luke M. Evans for their thoughtful comments and provision of computational resources. Publication of this article was funded by the University of Colorado Boulder Libraries Open Access Fund.


Richard Border was supported by a training grant from the National Institute of Mental Health (T32 MH016880) and by the Institute for Behavioral Genetics. Stephen Becker acknowledges funding by NSF grant DMS-1819251.

Author information

Authors and Affiliations



RB wrote the manuscript, developed the algorithms, wrote the code used in numerical experiments, and analyzed the data. SB supervised the project and contributed to the development of the algorithms and the writing of the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Richard Border.

Ethics declarations

Ethics approval and consent to participate

UK Biobank data collection procedures were approved by the UK Biobank Research Ethics Committee (reference 11/NW/0382).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1

Proof of result used to efficiently compute the quadratic form (7). (PDF 125 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Border, R., Becker, S. Stochastic Lanczos estimation of genomic variance components for linear mixed-effects models. BMC Bioinformatics 20, 411 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: