 Research article
 Open Access
 Published:
Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis
BMC Bioinformatics volume 19, Article number: 3 (2018)
Abstract
Background
Running multiplechain Markov Chain Monte Carlo (MCMC) provides an efficient parallel computing method for complex Bayesian models, although the efficiency of the approach critically depends on the length of the nonparallelizable burnin period, for which all simulated data are discarded. In practice, this burnin period is set arbitrarily and often leads to the performance of far more iterations than required. In addition, the accuracy of genomic predictions does not improve after the MCMC reaches equilibrium.
Results
Automatic tuning of the burnin length for running multiplechain MCMC was proposed in the context of genomic predictions using BayesA and BayesCπ models. The performance of parallel computing versus sequential computing and tunable burnin MCMC versus fixed burnin MCMC was assessed using simulation data sets as well by applying these methods to genomic predictions of a Chinese Simmental beef cattle population. The results showed that tunable burnin parallel MCMC had greater speedups than fixed burnin parallel MCMC, and both had greater speedups relative to sequential (singlechain) MCMC. Nevertheless, genomic estimated breeding values (GEBVs) and genomic prediction accuracies were highly comparable between the various computing approaches. When applied to the genomic predictions of four quantitative traits in a Chinese Simmental population of 1217 beef cattle genotyped by an Illumina Bovine 770 K SNP BeadChip, tunable burnin multiplechain BayesCπ (TBMBayesCπ) outperformed tunable burnin multiplechain BayesCπ (TBMBayesA) and Genomic Best Linear Unbiased Prediction (GBLUP) in terms of the prediction accuracy, although the differences were not necessarily caused by computational factors and could have been intrinsic to the statistical models per se.
Conclusions
Automatically tunable burnin multiplechain MCMC provides an accurate and costeffective tool for highperformance computing of Bayesian genomic prediction models, and this algorithm is generally applicable to highperformance computing of any complex Bayesian statistical model.
Background
Genomic predictions have been proposed as a method of providing accurate estimates of the genetic merits of breeding animals using genomewide SNP markers [1]. This new technology does not require the actual phenotyping of breeding candidates and therefore offers great promise for traits that are difficult or expensive to measure, such as carcass traits [2]. A noted feature of genomic predictions is that selection can be performed on breeding candidates at birth or young ages, which in turn accelerates genetic improvement progress more rapidly than conventional breeding approaches in farm animals [3,4,5,6].
Many genomic prediction models have been proposed, such as the Genomic Best Linear Unbiased Prediction (GBLUP) [7] and Bayesian alphabets [1, 8]. Bayesian genomic models are widely used [8,9,10,11], although complex calculations are required for Bayesian models implemented via Markov chain Monte Carlo (MCMC) and may take hours, days or even weeks to complete. Hence, parallel computing for genomic predictions is of importance for applying genomic selection in practice [12]. However, parallelization in MCMC is difficult because the procedure is iterative in the sense that simulating the next value of the chain depends on the current value, which violates Bernstein’s condition of independence for parallel computing [13]. This problem increases the difficulty of delivering parallelism for a single Markov chain. Thus, Wu et al. proposed the use of a multiplechain MCMC method to calculate Bayesian genomic prediction models [14].
Running multiplechain MCMC provides a naïve yet efficient form of parallel computing for Bayesian genomic prediction models, although the speedup in computing is limited by the burnin requirement, which is included to give the Markov chain time to reach equilibrium distribution. A burnin period corresponds to the first n samples during the MCMC, and these samples are discarded after the burnin is initiated from a poor starting point to the period before each chain moves into a high probability state. Often, the length of the burnin is assumed to be onetenth or onefifth of the entire length of the MCMC iterations or even half of the total iterations [1, 8,9,10]. This rule of thumb is often used for the sake of convenience but is not necessarily optimal for computing.
In the present paper, a multiplechain MCMC computing strategy utilizing automatic tuning of the burnin period was proposed and demonstrated with two Bayesian genomic prediction models (BayesA and BayesCπ). Using this strategy, a convergence diagnosis based on Gelman and Rubin [15] was conducted periodically in accordance with the multiplechain situation. The burnin period ended as soon as the convergence criteria were met, and posterior samples of unknown model parameters were then collected to perform statistical inferences. This strategy was assessed on a simulation data set and applied to genomic predictions of four quantitative traits in a Chinese Simmental population.
Methods
Simulation data
The simulated data set consisted of 1000 animals in scenario 1 and scenario 2 and 2000 animals in scenario 3, with each presenting a phenotype and genotypes on five chromosomes. The GPOPSIM software package [16] was used to generate the simulation data set, including the markers and QTLs based on a mutationdrift equilibrium model in the three scenarios. In scenario 1, the heritability of the trait was set at 0.1, and each chromosome had 4000 markers. In scenario 2, the heritability was 0.5, and each chromosome had 10,000 markers. In scenario 3, the heritability was 0.3, and each chromosome had 40,000 markers. In each of the three scenarios, 200 QTLs were simulated. The mutation rate of the markers and QTLs was set at 1.25 × 10^{−3} for each generation.
Real phenotype and genotype data
The experimental population consisted of 1302 Simmental cattle born between 2008 and 2013 in Ulgai, Xilingol League, Inner Mongolia, China. After weaning, all cattle were transferred to the Beijing Jinweifuren farm and raised under the same nutritional and management conditions. Each animal was evaluated regularly for growth and development traits until slaughter at between 16 and 18 months of age. At slaughter, the carcass traits and meat quality traits were assessed according to the Institutional Meat Purchase Specifications [17] for Fresh Beef Guidelines. The quantitative traits used in the present study included carcass back fat thickness (CBFT), strip loin weight (SLW), carcass weight (CW), and average daily gain (ADG). Prior to the genomic predictions, the phenotypes were adjusted for systematic environmental factors, which included the farms, seasons and years, and age at slaughter, using a linear regression model. The genetic and residual variances for each of the four traits were estimated by restricted maximum likelihood (REML) based on equivalent animal models.
Each animal was genotyped by an Illumina Bovine 770 K SNP BeadChip. SNP quality control was conducted using PLINK v1.07 software [18], which excluded SNPs under the following categories: 1) SNPs on the X and Y chromosomes, 2) SNPs with minor allele frequencies less than 0.05, 3) SNPs with > 5% missing genotypes, and 4) SNPs that violated HardyWeinberg equilibrium (p < 10^{−6}). After data cleaning, 1217 Simmental cattle remained for subsequent data analyses, and each had genotypes on up to 671,220 SNPs on 29 autosomes.
Statistical model
Adjusted phenotypes were described by the following linear regression model:
where y_{ i } is an adjusted phenotype for individual i, M is the number of SNPs, μ is the overall mean of the traits, a_{ j } is the additive (association) effect of the jth SNP, X_{ ij } is the genotype (0, 1, or 2) of the jth SNP observed on the ith individual, and e_{ i } is the residual term.
BayesA
The BayesA model [1] assumed a priori a normal distribution for SNP effects, with zero mean and SNPspecific variances denoted by \( {\upsigma}_j^2 \), where j = 1, 2, …, M. The variances of SNP effects were independent of one another, and each followed an identical and independently distributed (IID) scaled inverse chisquare prior distribution, \( \mathrm{p}\left({\upsigma}_j^2\right)={\upchi}^{2}\Big({\upsigma}_j^2\mid \nu, {S}^2 \)), where ν is the degree of freedom parameter and S^{2} is the scale parameter, both of which are assumed to be known. Thus, the marginal prior distribution of each marker effect, \( \mathrm{p}\left({\upalpha}_j\nu, {S}^2\right)=\int N\left({\upalpha}_j0,{\upsigma}_j^2\right){\upchi}^{2}\left({\upsigma}_j^2\nu, {S}^2\right)d{\upsigma}_j^2 \), was a tdistribution [19].
BayesCπ
The BayesCπ model [8] assumed a priori that each SNP effect was null with probability π or followed a normal distribution, \( N\left(0,{\sigma}_a^2\right) \), with probability 1π.
In the above, \( {\sigma}_a^2 \) is a variance common to all nonzero SNP effects, and it is assigned a scaled inverse chisquare prior distribution, \( {\chi}^{2}\left({v}_a,{s}_a^2\right) \). The value of π in the model is unknown, and it is inferred based on the prior distribution of π, which is considered uniform between 0 and 1, or π~Uniform(0, 1).
GBLUP
GBLUP [7] can be considered a reparameterization of the Bayesian RKHS (reproducing kernel Hilbert spaces) regression [20]. In RKHS, each SNP effect is assumed to follow a normal distribution with a zero mean and common variance; and in GBLUP, genomic estimated breeding values (GEBVs) are assumed to follow a normal distribution \( \boldsymbol{u}\sim \mathrm{N}\left(0,\mathbf{G}{\upsigma}_u^2\right) \), where G is a n × n genomic (co)variance matrix that is formulated as follows [7]:
where n is the number of SNPs, q_{ i } is the frequency of an allele of SNP i, and X is a centered incidence matrix of SNP effects, which are corrected for allele frequencies. The additive genetic variances and residual variances of the four traits were estimated by REML based on an animal model equivalent to (1).
Tunable versus fixed burnin multiplechain MCMC
Tunable burnin multiplechain MCMC
In multiplechain MCMC simulations, the following processes occur. Assume that we want to estimate some target distribution p(X) but cannot directly draw samples of X from p(X). Instead, a Markov chain X_{0}, X_{1}, … can be generated that converges to p(X) at equilibrium via a transition density u(X_{t + 1} X_{ t }). Now, let there be i = 1, 2, …, K parallel chains, with each initialized and burnedin independently for B_{ i } updating steps before more samples are drawn at intervals. As K → ∞ and all B_{ i } → ∞, the ensemble is ergodic (i.e., tending in the limit) to p(X) [14].
To assess the convergence of multiple parallel chains simulated for each model, both the interchain and withinchain variances were calculated for each selected model parameter, e.g., x. Briefly, the interchain variance I was calculated as follows:
The withinchain variance W was determined as follows:
where \( {s}_i^2=\frac{1}{n1}{\sum}_{j=1}^n{\left({x}_{ij}{\overline{x}}_i\right)}^2 \), \( {\overline{x}}_i=\frac{1}{n}{\sum}_{j=1}^n{x}_{ij} \), and \( \overline{x}=\frac{1}{m}{\sum}_{i=1}^m{\overline{x}}_i \). Then, the marginal posterior variance of x was estimated by a weighted average of W and I as follows:
Under the assumption that the starting distribution of x was appropriately overdispersed, the above quantity tended to overestimate the marginal posterior variance but was unbiased under stationarity (i.e., when the starting distribution equals the target distribution) or within the limit, n → ∞.
Following Gelman et al. [21], we assessed convergence by estimating the factor by which the scale of the current distribution for x might be reduced if the posterior simulation were continued within the limit, n → ∞. This potential scale reduction was estimated by the following shrink factor:
which reduced to 1 as n → ∞. A highscale reduction indicated that proceeding with more simulations could further improve the inference about the target distribution of the model parameter. To run multiplechain MCMC simulations, a collection of shrink factors was obtained, R = (r_{0}, r_{1}, …, r_{N − 1}), where N is the length of a chain, R^{(j)}_{=(}r_{(j − 1) × p}, r_{(j − 1) × p + 1}, …, r_{j × p − 1)} is the jth subsection of R, and p is the length of R^{(j)}. Let T be a threshold that was arbitrarily provided; the mean \( {\overline{\mathrm{r}}}^{(j)} \) and standard deviation S^{(j)} were calculated as follows:
The multiple chains were considered to converge in the jth subsection when \( \left{\overline{r}}^{(j)}1\right<T \) and S^{(j)} < T.
In the simulation study, each of the parallel MCMC chains was initiated independently. Then, the convergence diagnosis during burnin used samples from each parallel chain to determine the convergence state of these chains. The end of burnin iterations occurred when the convergence criteria were met. Then, the simulated posterior samples were collected to calculate the posterior summary statistics of the model parameters of interest, which were subject to the thinning of the MCMC chains.
For the Simmental cattle data set, we evaluated genomic prediction accuracies (GPAs) by running up to 16 parallel chains for both TBMBayesA and TBMBayesCπ, and the results were compared with those obtained from GBLUP.
Speedup ratio
According to Amdahl’s law [22], the speedup ratio of multiplechain MCMC over that of singlechain MCMC was calculated as follows:
where N_{burnin} is the number of burnin iterations that cannot be parallelized, N_{T} is the total number of MCMC iterations, and K is the number of computer cores available for running multiplechain Markov chains in parallel. The parallel computing efficiency was assessed as follows:
Evidently, when E = 1, the parallel computing scales linearly with the number of cores used for computing; thus, S(K) = K. However, because of the nonparallelizable burnins, the parallel computing efficiency is upper bounded by N_{ T }/N_{burn − in} (as K → ∞).
Parameter setting
Fixed burnin multiplechain MCMC jobs were also run on the simulation data, with the length of burnin set at onetenth of the total sequential MCMC iterations in scenario 2 and at onefifth of the total sequential MCMC iterations in scenarios 1 and 3. To assess the effect of the burnin length, we ran 50,000 iterations for each chain of fixed burnin multiplechain BayesA (FBMBayesA), which included burnins of 2000, 4000, 6000, 8000, and 10,000 iterations in scenario 2. Threshold T was set to 0.001 in this study.
Evaluation of genomic prediction accuracy
The GEBVs were calculated as the sum of all SNP effects of each individual (say i) as follows:
where X_{ ij } is a genotype (coded 0, 1, or 2) for SNP j of animal i and g_{ j } is the estimated genetic effect of the jth SNP.
The GPA relative to that of phenotypic selection was calculated as \( r/\sqrt{h^2} \), where r is Pearson’s correlation between GEBVs and true breeding values in the simulation study or Pearson’s correlation between GEBVs and adjusted phenotypes. This criterion of relative genomic prediction accuracy (RGPA) was used so that the GPAs were comparable regardless of their respective heritabilities [23].
A fivefold crossvalidation [24] was used to evaluate the genomic predictions in the Simmental data set. Briefly, the entire data set of 1217 Simmental cattle was randomly divided into five approximately equal subsets. Then, four subsets were used to estimate the SNP effects (i.e., training), and the remaining subset was used for testing the GPA (i.e., validation). The above process was rotated five times until each subset was used for testing once and only once. For each trait, fivefold crossvalidations were randomly duplicated 10 times, and the GPA for each trait was calculated as the average of GPAs across the ten replicates.
Computer system
The calculations were conducted on an HP ProLiant DL585 G7 (708686AA1) server, which was equipped with an AMD Opteron 6344 (2.6G Hz) CPU, 272 G of memory and an L2 cache size of 4 M and an L3 cache size of 16 M. The operating system was Microsoft Windows. A C program with Message Passing Interface (MPI) for parallel computing was developed to implement the aforementioned multiplechain MCMC. MPICH2 is an open source MPI implementation and a standard for message passing in parallel computing, and it is available freely (http://www.mpich.org/downloads). The Integrated Development Environment that we used is DevC++ 5.1, which is available freely at the following link: http://www.bloodshed.net/index.html.
Results
Simulation studies
Speedup ratios
Running multiple chains of genomic prediction models led to substantially reduced computing time compared with running a single chain (Additional file 1: Tables S1–S3 and Figures S9–S11). The speedups increased nonlinearly with the number of parallelized chains or available computer cores (Fig. 1) because of the nonparallel burnins, and perfect speedups were not practically observed when calculating these Bayesian genomic prediction models. In scenario 1, for example, the speedup obtained by TBMBayesA was 1.86 when running two parallel chains and was 13.63 when running 18 parallel chains. However, the speedup obtained by FBMBayesA increased from 1.57 when running two chains to 3.79 when running 18 parallel chains. For the results obtained with 18 parallel chains, the speedups were approximately between 3 and 6 when running fixed burnin MCMC and between 10 and 14 when running tunable burnin MCMC. More precisely, TBMBayesA had considerably greater speedups than those of FBMBayesA, and the speedups by TBMBayesA scaled better than those by FBMBayesA. Similar trends were found in the comparison of computing time between TBMBayesCπ and FBMBayesCπ (fixed burnin, multiplechain BayesCπ). Thus, tunable burnin MCMC had greater parallel computing efficiencies than fixed burnin MCMC because the use of automatic convergence diagnosis and tuning of burnins effectively shortened the computing time by TBMBayesA (or TBMBayesCπ), resulting in increased speedups in computing time with tunable burnins. We also noted that the loss of parallel computing efficiency relative to an assumedly perfect speedup increased with the model dimension, which is proportional to the number of SNPs in the genomic prediction models (Fig. 1).
Theoretically, the speedups achieved by running multiplechain MCMC are limited by the length of nonparallel parts (i.e., burnins). Frequently, the rule of thumb for the length of burnin tends to result in far more burnin iterations than are required. Thus, with automatic tuning of the convergence diagnosis on multiplechain MCMC, the burnin length can be drastically reduced, resulting in greater speedups in computing. With all other factors equal, the speedup obviously increased with a greater number of chains (or CPU cores) running in parallel (Fig. 1).
Estimated model (SNP) effects
Various computing forms of the same genomic prediction models essentially generated highly comparable estimated model effect results. Trace plots of the residual variance obtained by various computational approaches are shown in Fig. 2. Each chain mixed very well, and all were centered near zero.
The estimated SNP effects were also highly comparable among various forms of calculating the same genomic prediction models. In parallel computing, between 2 and 18 chains (with an increment of two chains) were run for each model, and the posterior mean of a SNP effect was calculated as the average of all saved posterior samples from all the chains running for that model. In sequential computing, a SNP effect was calculated as the average of all the saved posterior samples from the single chain. The results showed that the correlations of estimated SNP effects, such as in scenario 2, were greater than 0.80 between parallel MCMC and sequential MCMC and even greater than 0.90 between tunable burnin MCMC and fixed burnin MCMC (Fig. 3). For example, the correlations of estimated SNP effects were from 0.901 to 0.908 between FBMBayesA and TBMBayesA. Similar results were obtained in scenarios 1 and 3 (data not presented). Thus, the observed differences in estimated SNP effects for the same method were attributable to Monte Carlo errors, and they were essentially made trivial as soon as the MCMC chains converged to the expected stationary distributions.
GEBVs and GPAs
The GEBV of each animal was calculated as the sum of all SNP effects for that animal. The results showed that the GEBVs obtained from various forms of calculating the same genomic prediction models were almost identical and presented correlations that were greater than or close to unity (> 0.99). The GEBVs obtained from different models were also highly correlated but with some noticeable differences. These differences did not result from the use of varied computational strategies but reflected the use of different statistical models (and the underlying model assumptions).
Similarly, the GPAs were also analogous between different computational forms of the same model, regardless of the number of MCMC chains and types of burnin mechanisms, although noticeable differences were observed in the GPAs between different statistical models (Tables 1, 2 and 3). In simulation scenario 1, for example, the GPA was approximately 0.523 for the various forms of BayesA with either fixed or automatically tunable burnin periods running between 1 and 18 chains. However, the GPA obtained by various computing forms of BayesCπ varied only slightly between 0.630 and 0.632 (Table 1). Similar trends were observed in simulation scenarios 2 and 3. As a comparison, GPAs were also obtained using GBLUP; however, these values were mostly lower than the GPAs obtained from the various computing forms of BayesA and BayesCπ, although BayesA models in scenario 1 were exceptions (Table 1). Again, the slight differences in GPA among the various computing forms of the same genomic prediction models were caused by Monte Carlo errors in the simulation of posterior samples of SNP effects, whereas the differences in GPA between the various statistical models were not necessarily computational but were attributable to intrinsic differences between the methods per se.
Application in Chinese Simmental beef cattle
Convergence diagnoses
Convergence diagnoses were conducted for residual variances as well as for a randomly selected number of SNP effects. Generally, the MCMC simulations of residual variances converged quickly, which primarily occurred within the first 1000 iterations, and the posterior modes were highly comparable among the various computing forms of the same genomic prediction models. Nevertheless, certain differences were observed in the posterior modes of the residual variances between different models (e.g., between TBMBayesA and TBMBayesCπ). These results were consistent with our observations in the simulation studies, with trivial differences in the estimated SNP effects and GEBVs (and hence GPAs) among various computing forms of the same statistical models caused by Monte Carlo errors and intrinsic differences observed between different statistical models. With the estimated residual variances of the four traits used as examples, the difference was the lowest for SLW, and the posterior mode of residual variance approached 0.16 with TBMBayesA (Figure S1 in Additional file 1) and 0.18 with BayesCπ (Figure S2 in Additional file 1); the difference was largest for CBFT, and the posterior mode of the residual variance approached 0.19 (TBMBayesA; Additional file 1: Figure S3) and 0.30 (TBMBayesCπ; Additional file 1: Figure S4). Trace plots of the MCMC chains of residual variance for the remaining two traits (CW and ADG) are also provided in Additional file 1: Figures S5–S8. Trace plots of MCMC chains of selected SNP effects on SLW obtained by TMBBayesA and TMBBayesCπ are shown in Additional file 1: Figures S12 and S13, respectively. Evidently, these MCMC chains also all converged quickly within the first 1000 iterations.
Estimated heritabilities and GPAs
The estimated heritabilities for the four traits were within the range of previous reports [25, 26]. The differences may have been caused by differences in the genomic architectures of distinct breeds. In this Chinese Simmental beef population, CBFT had a smaller heritability compared with the other three traits. Consequently, the GPAs for CBFT were also lower (0.100 ~ 0.106) than those for the other three traits (0.202 ~ 0.271) (Table 4).
Nevertheless, the RGPAs were comparable among the four traits because this criterion assessed the GPAs relative to the square root of the heritability of each trait, with the latter reflecting the selection accuracy based on phenotypes and pedigree information. The RGPAs were also roughly comparable among the three models but with slight differences: TBMBayesA and TBMBayesCπ had a greater RGPA for CBFT, SLW, and CW but a lower RGPA for ADG; and TBMBayesCπ had the greatest average RGPA for the four traits calculated across the three computationalstatistical models. Again, these differences might not be based on computational differences but could be intrinsic to the differences in the data and statistical models.
Discussion
Parallel computing of Bayesian genomic prediction models: tunable burnin versus fixed burnin
Bayesian regression models are of high value for genomic prediction, although the complexity of computing of these models can be intensive [14], which is increasingly becoming the bottleneck in practical genomic selection programs. The challenges are found primarily in two aspects. First, genotype and phenotype data have been accumulating drastically in the past 10 years, and these “big data” are not managed efficiently because traditional data processing methods and tools are inadequate. Hence, highperformance computing (e.g., via parallel programming and computing strategies) is required to increase the computational efficiency and generate high computational throughputs for genomic selection. Nevertheless, the computing of Bayesian genomic prediction models is not parallelizable by the nature of the iterative algorithms, which poses the second and most likely greater challenge. Although Bayesian genomic prediction models can be calculated in parallel by running multiple MCMC chains of the same model, the speedup of computing heavily depends on the length of the burnin period, which cannot be parallelized. Often, the length of burnins is set arbitrarily and thus can be too short or too long. When too short, the Markov chains are not converged, and the generated samples do not represent those drawn from the targeted posterior distributions. When too long, running a longer burnin period after the convergence of Markov chains does not improve the accuracies of the posterior estimates of the model parameters [27, 28] but does consume more time than necessary. In the present study, we proposed a tunable, multiplechain MCMC algorithm that is capable of automatically tuning an appropriate length of burnins, depending only on the actual status of MCMC convergence of the Bayesian statistical model. Our results showed that this tunable burnin algorithm was effective and able to reduce the computing time remarkably compared with its counterpart with fixed burnins. In the present study, we used Gelman and Rubin’s convergence diagnostic method [15] to monitor the convergence state of the multiplechain MCMC method. The shrink factor was calculated using posterior samples of the residual variance and a selected number of SNP effects from multiple chains of each model. When the multiple chains reached convergence, the burnin period was terminated immediately, and the posterior samples generated afterward were collected and used for statistical inferences of the model parameters of interest, and they were subject to the frequency of thinning.
In the discussion that follows, we explain numerically how tunable burnin MCMC could achieve greater speedups than fixed burnin MCMC. Consider again the formula of the speedup ratio as in (10). Let N_{burnin} = m and N_{T} = 5 m; in FBMBayesA and FBMBayesCπ, the speedup ratios are S(5) = 2.78 and S(20) = 4.16 when 5 and 20 cores are available for computing, respectively. Nevertheless, the maximal speedup ratio is \( \underset{\mathrm{K}\to \infty }{\lim}\mathrm{S}\left(\mathrm{K}\right)=5 \), regardless of how many cores are available for computing. Using our strategy, N_{burnin} was adjusted to reduce unnecessary burnin iterations based on the convergence diagnosis. If N_{burnin} was shortened to m/2 by the convergence diagnosis, then the corresponding speedup ratios were doubled to S(5) = 3.84 and S(20) = 7.14. Furthermore, if N_{burnin} was reduced to m/4, then the speedup ratios were quadrupled to S(5) = 4.76 and S(20) = 11.11. In the simulation studies, our results showed that TBMBayesA (or TBMBayesCπ) tended to a burnin length that was half or even onefourth as long as that of FBMBayesA (or FBMBayesCπ). This result demonstrated that the automatic tunable burnin MCMC method effectively shortened the length of burnins compared with the fixedburnin MCMC; therefore, the tunable burnin MCMC led to a greater speedup and greater parallel computing efficiency.
Our results showed that the use of tunable burnin MCMC did not change the GPA as long as the MCMC chains converged. This conclusion is also supported by previous research [27, 28]. The GEBV is a critical concept in genomic prediction, and it was calculated as the sum of all SNP effects for each animal. Our results showed that the estimated SNP effects were highly analogous between the tunable burnin MCMC and fixed burnin MCMC used to calculate the same statistical model; however, the former had greater speedups than the latter. The differences were essentially caused by Monte Carlo errors. Because of such small differences in the estimated SNP effects among various forms of calculating the same genomic prediction model, the differences in the calculated GEBV for individual animals could mostly be ignored.
Genomic predictions of Chinese Simmental beef cattle: a real application
The tunable burnin MCMC and fixed burnin MCMC methods were applied to genomic predictions of four quantitative traits in a Chinese Simmental beef population, and the GPAs obtained were also compared with those of the GBLUP. In the Chinese Simmental data set, the RGPAs were roughly comparable among the three models GBLUP, TBMBayesA and TBMBayesCπ because the RGPA assesses the GPA relative to the heritability of each trait in the present study. Without adjusting for the heritability differences of these traits, our results indicated that the GPA was higher for a trait with higher heritability than for traits with lower heritability. Nevertheless, both the GPA and RGPA showed noticeable differences among the different genomic prediction models, and these differences were not necessarily based on computational factors but could be intrinsic to the varied assumptions of these models. In reality, GPAs can vary with a number of factors, such as the size of the reference populations, heritability of the trait, density of the SNP panels, level of LD, and the statistical models used for prediction [2].
In this real application, our results supported that this tunable burnin MCMC was effective and outperformed the fixed burnin MCMC regarding speedup and parallel computing efficiency. The GPAs were comparable among the various computing forms of BayesA and BayesCπ, and these two models had greater GPAs for three of the four traits than GBLUP.
Toward greater parallel computing efficiencies: what to consider?
Finally, several strategies deserve mention, such as adaptive MCMC algorithms [29, 30] and tempering [31, 32], which can further improve the convergence of multiplechain MCMC. These strategies were not investigated in the present study but are worthy of investigation in future studies for further increasing parallel computing efficiency for genomic prediction. For example, Metropoliscoupled MCMC is a method that is related to simulated tempering and tempered transitions [31, 32] and simultaneously runs several different Markov chains governed by different (yet related) Markov chain transition probabilities. Occasionally, the algorithm “swaps” values between different chains, with the probability governed by the Metropolis algorithm to preserve the stationarity of the target distribution. Possibly, these swaps can speed up the convergence of the algorithm substantially. Craiu et al. proposed an ensemble of MCMC chains using the covariance of samples across all chains to adopt the proposed covariance for a set of MetropolisHastings chains [33]. Somewhat different from these multiplechain methods, which use a synchronous exchange of samples to expedite convergence, Murray et al. mixed in an additional independent proposal that represents some hitherto best estimate or summary of the posterior and cooperative adapting across chains [34]. Therefore, a globally best estimate of the posterior is generated at any given step, and then this estimate is mixed as a remote component with whatever local proposal that a chain has adopted. This method does not preclude adaptive treatment or tempering of that local proposal but also permits a heterogeneous blend of remote proposals, thus allowing the ensemble of chains to mix well.
Conclusions
An automatically tunable burnin MCMC method for calculating Bayesian genomic prediction models was proposed and manifested using BayesA and BayesCπ models. Our results from the simulation study showed a better speedup in computing with tunable burnin MCMC than with fixed burnin MCMC. However, the estimated SNPs and GPAs were highly comparable regardless of the various forms of parallel computing when the same Bayesian genomic prediction model was used. In a Chinese Simmental beef population, the average GPAs for four quantitative traits obtained by the tunable burnin BayesA and BayesCπ models were better than those obtained by the GBLUP, and although these differences may have been caused by computational factors, they might also have been attributable to intrinsic differences in the statistical model assumptions. The proposed tunable burnin strategy for running parallel (i.e., multiplechain) MCMC can lead to dramatically increased computational efficiency and is applicable to the computing of all complex Bayesian models, either sequentially or in parallel.
Abbreviations
 ADG:

Average daily gain
 CBFT:

Carcass back fat thickness
 CW:

Carcass weight
 FBM:

Fixed burnin, multiplechain
 GBLUP:

Genomic Best Linear Unbiased Prediction
 GEBVs:

Genomic estimated breeding values
 GPA:

Genomic prediction accuracy
 h^{2}:

heritability
 MCMC:

Markov chain Monte Carlo
 MPI:

Message Passing Interface
 QTL:

Quantitative trait locus
 REML:

Restricted maximum likelihood
 RGPA:

Relative genomic prediction accuracy
 SLW:

Strip loin weight
 TBM:

Tunable burnin, multiplechain
References
 1.
Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genomewide dense marker maps. Genetics. 2001;157(4):1819–29.
 2.
Chen L, Vinsky M, Li C. Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle. Anim Genet. 2015;46(1):55–9.
 3.
Moser G, Tier B, Crump RE, Khatkar MS, Raadsma HW. A comparison of five methods to predict genomic breeding values of dairy bulls from genomewide SNP markers. Genet Sel Evol. 2009;41:56.
 4.
Neves HH, Carvalheiro R, O’Brien AM, Utsunomiya YT, do Carmo AS, Schenkel FS, Solkner J, McEwan JC, Van Tassell CP, Cole JB, et al. Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genet Sel Evol. 2014;46:17.
 5.
de Campos CF, Lopes MS, e Silva FF, Veroneze R, Knol EF, Lopes PS, Guimarães SE. Genomic selection for boar taint compounds and carcass traits in a commercial pig population. Livest Sci. 2015;174:10–7.
 6.
Duchemin SI, Colombani C, Legarra A, Baloche G, Larroque H, Astruc JM, Barillet F, RobertGranie C, Manfredi E. Genomic selection in the French Lacaune dairy sheep breed. J Dairy Sci. 2012;95(5):2723–33.
 7.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.
 8.
Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186.
 9.
Brondum RF, Su G, Lund MS, Bowman PJ, Goddard ME, Hayes BJ. Genome position specific priors for genomic prediction. BMC Genomics. 2012;13:543.
 10.
Yang W, Tempelman RJ. A Bayesian antedependence model for whole genome prediction. Genetics. 2012;190(4):1491–501.
 11.
Zhu B, Zhu M, Jiang J, Niu H, Wang Y, Wu Y, Xu L, Chen Y, Zhang L, Gao X, et al. The impact of variable degrees of freedom and scale parameters in Bayesian methods for genomic prediction in Chinese Simmental beef cattle. PLoS One. 2016;11(5):e0154118.
 12.
Wu XL, Beissinger TM, Bauck S, Woodward B, Rosa GJ, Weigel KA, Gatti Nde L, Gianola D. A primer on highthroughput computing for genomic selection. Front Genet. 2011;2:4.
 13.
Bernstein AJ. Analysis of programs for parallel processing. IEEE Trans Electron Comput. 1966;EC15(5):757–63.
 14.
Wu XL, Sun C, Beissinger TM, Rosa GJ, Weigel KA, Gatti Nde L, Gianola D. Parallel Markov chain Monte Carlo  bridging the gap to highperformance Bayesian computation in animal breeding and genetics. Genet Sel Evol. 2012;44:29.
 15.
Gelman A, Rubin D. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457–511.
 16.
Zhang Z, Li X, Ding X, Li J, Zhang Q. GPOPSIM: a simulation tool for wholegenome genetic data. BMC Genet. 2015;16:10.
 17.
Service AM. Institutional meat purchase specifications. In: Fresh beef series 100; 2010. http://www.ams.usda.gov/AMSv1.0/getfile?dDocName=STELDEV3003281.
 18.
Purcell S, Neale B, ToddBrown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. PLINK: a tool set for wholegenome association and populationbased linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
 19.
Gianola D. Priors in wholegenome regression: the bayesian alphabet returns. Genetics. 2013;194(3):573–96.
 20.
de Los Campos G, Gianola D, Rosa GJ. Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. J Anim Sci. 2009;87(6):1883–7.
 21.
Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis. New York: Chapman and Hall; 2004.
 22.
Amdahl GM. Validity of the single processor approach to achieving large scale computing capabilities. In: Proceeding AFIPS '67 (Spring) Proceedings of the April 1820, 1967, spring joint computer conference; 1967. p. 483–5.
 23.
Rolf MM, Garrick DJ, Fountain T, Ramey HR, Weaber RL, Decker JE, Pollak EJ, Schnabel RD, Taylor JF. Comparison of Bayesian models to estimate direct genomic values in multibreed commercial beef cattle. Genet Sel Evol. 2015;47:23.
 24.
Saatchi M, McClure MC, McKay SD, Rolf MM, Kim J, Decker JE, Taxis TM, Chapple RH, Ramey HR, Northcutt SL, et al. Accuracies of genomic breeding values in American Angus beef cattle using Kmeans clustering for crossvalidation. Genet Sel Evol. 2011;43:40.
 25.
Mehrban H, Lee DH, Moradi MH, IlCho C, Naserkheil M, IbanezEscriche N. Predictive performance of genomic selection methods for carcass traits in Hanwoo beef cattle: impacts of the genetic architecture. Genet Sel Evol. 2017;49(1):1.
 26.
Fernandes Junior GA, Rosa GJ, Valente BD, Carvalheiro R, Baldi F, Garcia DA, Gordo DG, Espigolan R, Takada L, Tonussi RL, et al. Genomic prediction of breeding values for carcass traits in Nellore cattle. Genet Sel Evol. 2016;48:7.
 27.
Rosenthal J. Parallel computing and Monte Carlo algorithms. Far East J Theor Stat. 2000;4:207–36.
 28.
Guo J, Jain R, Yang P, Fan R, Kwoh CK, Zheng J. Reliable and fast estimation of recombination rates by convergence diagnosis and parallel Markov chain Monte Carlo. IEEE/ACM Trans Comput Biol Bioinform. 2013;11:6372.
 29.
Gilks W, Robertson G, Sahu S. Adaptive Markov chain Monte Carlo through regeneration. J Am Stat Assoc. 1998;93:1045–54.
 30.
Andrieu C, Thoms J. A tutorial on adaptive MCMC. Stat Comput. 2008;18:343–73.
 31.
Marinari E, Parisi G. Simulated tempering: a new Monte Carlo scheme. Europhys Lett. 1992;19:451–8.
 32.
Neal R. Sampling from multimodal distributions using tempered transitions. Stat Comput. 1996;6:353–66.
 33.
Craiu R, Rosenthal J, Yang C. Learn from thy neighbor: parallelchain and regional adaptive MCMC. J Am Stat Assoc. 2009;104:1454–66.
 34.
Murray L. Distributed Markov chain Monte Carlo. In: Proceedings of neural information processing systems workshop on learning on cores, clusters and clouds: 2010. Mt. Currie South. http://lccc.eecs.berkeley.edu/Papers/dmcmc_short.pdf.
Acknowledgements
We are grateful to the editor and the two anonymous reviewers whose comments have greatly helped improve this paper.
Funding
This work was supported by the National Natural Science Foundation of China (31372294, 31201782 and 31672384), National High Technology Research and Development Program of China (863 Program 2013AA1025054), the Agricultural Science and Technology Innovation Program (ASTIPIAS03, ASTIPIASTS16 and ASTIPIASTS9), Cattle Breeding Innovative Research Team of Chinese Academy of Agricultural Sciences (cxgcias03), Beijing Natural Science Foundation (6154032).
Availability of data and materials
We confirm that all data used to generate our findings are publicly available without restriction. Data are available from the Dryad Digital Repository: http://datadryad.org/resource/doi:10.5061/dryad.4qc06.
Author information
Affiliations
Contributions
JYL, HJG, HMN and YG conceived and designed the study. PG and BZ performed statistical analyses and programming in C language. PG, LYX, XLW and EH wrote the paper. PG, ZZW and YHL participated in data analyses. HN and XG carried out quantification of phenotypic data and SNP data preprocessing. YC and LPZ participated in the design of the study and contributed to acquisition of data. All authors read, commented and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval
Animal experiments were approved by the Science Research Department of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS) (Beijing, China). Human participants, data or tissues were not used.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1: Figure S1.
Trace plots of posterior samples of residual variance from TBMBayesA (16 chains) for SW of the first 2000 iterations. Figure S2. Trace plots of posterior samples of residual variance from TBMBayesCπ (16 chains) for SW of the first 2000 iterations. Figure S3. Trace plots of posterior samples of residual variance from TBMBayesA (16 chains) for CBFT of the first 1800 iterations. Figure S4. Trace plots of posterior samples of residual variance from TBMBayesCπ (16 chains) for CBFT of the first 2500 iterations. Figure S5. Trace plots of posterior samples of residual variance from TBMBayesA (16 chains) for CW of the first 2000 iterations. Figure S6. Trace plots of posterior samples of residual variance from TBMBayesCπ (16 chains) for CW of the first 2000 iterations. Figure S7. Trace plots of posterior samples of residual variance from TBMBayesA (16 chains) for ADG of the first 1400 iterations. Figure S8. Trace plots of posterior samples of residual variance from TBMBayesCπ (16 chains) for ADG of the first 1400 iterations. Figure S9. Plot of running time in scenario1.FBMBayesA: Fixed burnin multiple chains parallel BayesA, TBMBayesA: Tunable burnin multiple chains parallel BayesA, FBMBayesCπ: Fixed burnin multiple chains parallel BayesCπ, TBMBayesCπ: Tunable burnin multiple chains parallel BayesCπ. One chain is equivalent to sequential genomic selection. Figure S10. Plot of running time in scenario 2. Figure S11. Plot of running time in scenario 3 Figure S12. Convergence of TBMBayesA for SW. Iteration: 50,000, initial Burnin:10,000, threshold:0.001. Figure S13. Convergence of TBMBayesCπ for SW. Iteration: 50,000, initial Burnin:10,000, threshold:0.001. Table S1. Running time using five genomic prediction approaches in scenario 1. Table S2. Running time using five genomic prediction approaches in scenario 2. Table S3. Running time using five genomic prediction approaches in scenario 3. (ZIP 405 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Guo, P., Zhu, B., Niu, H. et al. Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis. BMC Bioinformatics 19, 3 (2018). https://doi.org/10.1186/s1285901720033
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285901720033
Keywords
 Bayesian models
 Convergence diagnosis
 Genomic prediction
 Highperformance computing
 Tunable burnin