Two Bayesian model averaging methods that address the statistical drawbacks of BayesA and BayesB were developed for genomic prediction. These two models were termed BayesCπ and BayesDπ to emphasize that the prior probability π that a SNP has zero effect was treated as an unknown. The objectives of this study were to evaluate the ability of these methods to make inferences about the number of QTL (NQTL) of a quantitative trait by simulated and real data, and to compare accuracies of GEBVs from these new methods compared to existing methods.
Simulations
In ideal simulations, all loci were in linkage equilibrium and both SNPs and QTL were modeled. BayesCπ was able to distinguish the QTL that had non-zero effects from the SNPs that had zero effects as training data size increased and when QTL effects were normally distributed. In contrast, when QTL effects were gamma distributed many QTL remained undetected. This may have been because the gamma distribution generated fewer large effects and more small effects than the normal (Figure 1). Further, the prior of SNP effects in BayesCπ given the common effects variance was normal and not gamma; a gamma prior may produce better results and should be investigated in a subsequent study. In conclusion, even in this ideal case the estimate of
obtained from BayesCπ is a poor indicator for NQTL, unless the QTL distribution is normal. BayesDπ was insensitive to NQTL and inappropriate to estimate NQTL.
In realistic simulations, SNPs and QTL were in LD and only the SNP genotypes were known. As expected, BayesCπ fitted more SNPs than there were QTL, because every QTL was in LD with several SNPs. However, the number of SNPs fitted per QTL depended on both training data size and effect size of a QTL, which was varied here by the distribution of QTL effects, h2 and NQTL; the size of simulated QTL effects increased with h2 and decreased with NQTL. If QTL effects were generally large and easy to detect (Table 4, h2 = 0.9), NSNP was small with 1,000 training individuals and increased with training data size. In addition, the larger a QTL effect, the more SNPs were fitted per QTL (Table 4, h2 = 0.9, 10 vs. 20 NQTL). The cause for these findings may be that SNPs in low LD with the QTL were more likely to be fitted as either QTL effect size or training data size increased. The increase in NSNP with training data size could also have been the result of detecting QTL with smaller effects. If QTL effects were smaller and less easy to detect (Table 4, h2 = 0.2), NSNP was larger with 1,000 training individuals, which may be explained by false positive SNPs in the model, because the power of detection was likely to be low. In contrast to h2 = 0.9, NSNP decreased substantially with training data size. However, the fact that NSNP increased with training data size for h2 = 0.03, normally distributed QTL effects, and a starting value of π = 0.5 (
small) points to another explanation why many SNPs were fitted with small QTL effect size or small training data size: a higher number of SNPs explains differences between training individuals better than a smaller number, and thus more SNPs were required to explain those differences as training data size increased. In conclusion, BayesCπ overestimates NQTL, the extent depending on the size of QTL effects, which makes inference difficult. However, information about NQTL can be gained by analyzing the trend of NSNP with training data size, and starting with different π values. Furthermore, as SNP density increases in the future, overestimation of NQTL is expected to be smaller, because LD between SNPs and QTL will be higher such that fewer SNPs are modeled per QTL. Sufficiently high SNP density guarantees near perfect LD between at least one SNPs and each QTL in which case the scenario of the ideal simulations will be approached.
Real data analysis
Number of QTL and size of QTL effects
In agreement with the realistic simulations, estimates of π from BayesDπ were insensitive to both trait and training data size (Table 6). BayesCπ, in contrast, showed clear differences for both: NSNP increased with training data size for milk and fat yield, and decreased for protein yield and somatic cell score. Thus, milk and fat yield may have more QTL with large effects than protein yield and somatic cell score, which can be derived from the trends of NSNP in the realistic simulations. This is also supported by the accuracies of GEBVs where milk and fat yield had a higher accuracy than protein yield and somatic cell score. Furthermore, fat yield may have more QTL with large effect than milk yield, because both the increase of NSNP from 1,000 to 4,000 training bulls and accuracy of GEBVs was higher for fat yield.
The number of SNPs in the model estimated by BayesCπ may primarily result from the QTL with the largest effects, assuming that QTL with small effects were not detectable. The rather low accuracies of GEBVs and especially the low increase in accuracy from 4,000 to 6,500 bulls may also point to this conclusion, because many more training individuals seem to be necessary to estimate small QTL effects. Another reason may be that LD between SNPs and QTL was still too low, but this will change as SNP density increases; QTL with large effects will be estimated with fewer SNPs and additional QTL with smaller effect will be detected.
As mentioned earlier, a possible overestimation of NQTL results from the fact that several SNPs are in LD with a QTL, where each of these SNPs explains a part of the QTL effect. These SNPs are likely to surround the QTL on the chromosome, and thus NQTL can be estimated more precisely by calculating the variance of GEBVs explained by the effects of all SNPs in a specified chromosomal region. This can be done by defining a window containing a certain number of consecutive SNPs that are used to calculate this variance. By sliding the window over the chromosome and observing peaks that are higher than for single SNPs, NQTL may be inferred better. This can be done with all methods that estimate SNP effects.
Comparison of the accuracy of GEBVs
North American Holstein bulls were partitioned into training and validation data sets such that bulls of both data sets were as unrelated as possible. As a result, the contribution of additive-genetic relationships to the accuracy of GEBVs was negligible for fat yield, protein yield and somatic cell score as demonstrated by the low accuracy of P-BLUP. However, that accuracy was unexpectedly high for milk yield, which might be an artifact of previous selection for milk yield because genotypes in the validation data set were only available from selected parents. Accuracies of GEBVs were similar for the different methods, and no one outperformed all the others across all traits or training data sizes. Nevertheless, BayesA performed remarkably well for this SNP density despite the statistical drawback of BayesA as described by [11]. However, as demonstrated in [11], it is important that the degrees of freedom used for the scaled inverse chi-square prior of the locus-specific variances express little prior belief. BayesA always fits all SNPs, hence the shrinkage of SNP effects results completely from the locus-specific variances, and, in contrast to the other methods, SNP effects are not fully shrunk to zero. Thus, even SNPs that truly have zero effects are expected to have small estimated effects adding noise to the GEBVs. This applies also to G-BLUP, which is equivalent to ridge-regression fitting all SNPs with equal variance. This did not seem to affect the accuracy of GEBVs here, but in the simulations of [1] BayesB performed better than BayesA and ridge-regression. The explanation may be that the traits analyzed here are determined by many more QTL than in those simulations. Thus, BayesA may be inferior to BayesCπ and BayesDπ for traits that are determined by only a few QTL and when many more SNPs effects are modeled as SNP density increases. Applying BayesA to the data sets of the realistic simulations with only 10 QTL confirmed its inferiority to BayesCπ and BayesDπ.
Treating π as known with a high value as in BayesB may be a poor choice. This agrees with Daetwyler et al. [23] who reported that G-BLUP outperformed BayesC with a fixed π when the number of simulated QTL was large. This can be explained partly by the fact that [23] considered the GEBV accuracy of the offspring of training individuals, meaning that genetic-relationships were important; these were captured better by the SNPs, when more SNPs were fitted as in G-BLUP [7]. Note further that BayesC with π = 0 is similar to G-BLUP. Consider ridge-regression as the equivalent model of G-BLUP to see this similarity. Both methods are equivalent either 1) if the single effect variance of BayesC is treated as known, 2) if ν
a
is very large and
equals to the single effect variance of ridge regression, or 3) if the single effect variance of ridge regression is treated as unknown with own scaled inverse chi-square prior. Thus the lower accuracy for BayesC in that study results most likely from treating π as known. Another reason may be that the scale parameter of the inverse chi-square prior for the common effect variance in [23] did not depend on the additive-genetic variance nor on the fixed π value as proposed by [1].
The finding that BayesCπ and BayesDπ give similar accuracies but different π values reveals that the two methods have different mechanisms for shrinking SNP effects. BayesDπ primarily used the locus-specific variances, whereas BayesCπ was only able to vary the shrinkage at different SNPs by using δ
k
; if a SNP is not fitted to the model the effect is shrunk completely to zero, otherwise they are all shrunk using the same ratio of residual to common effect variance. In principle, BayesDπ is expected to be more flexible in shrinking SNP effects because it could use both locus-specific variances and δ
k
for this purpose. The poor mixing of π in BayesDπ indicates that locus-specific variances dominated over δ
k
, which may explain why π is not an indicator for NQTL.
Effect of training data size on Bayesian model averaging
Another insight into the mechanisms of Bayesian model averaging comes from the large increase in accuracy of GEBVs with training data size obtained by BayesB for milk yield. The parameter π was treated as known with value 0.99 resulting in about 400 SNPs fitted in each iteration of the MCMC algorithm for both 1,000 and 4,000 training bulls (Table 6). This indicates that π is a strong prior for δ
k
= 0. Therefore, setting π = 0.99 is analogous to searching for models that fit about 400 SNPs in each iteration of the algorithm and to average them. These models change from one iteration to another as some SNPs are removed from the model, while others are included. This interchange of SNPs, however, is expected to be more frequent with a small training data size, because the power to detect significant SNPs is low. On the other hand, if the training data size is large, fewer SNPs are interchanged less often so that models differ less from one iteration to the other. This becomes most apparent in the increasing number of SNPs having moderate to high model frequency as training data size increased from 1,000 to 4,000 bulls as shown in Figure 2. The implication is that the effects of those SNPs were less shrunk with larger training data size, whereas effects of all other SNPs were shrunk more.
Comparison of GEBV accuracy with other studies
Accuracies of GEBVs reported by [24] and [2] for the North American and Australian Holstein populations, respectively, are not comparable to the accuracies found here. Accuracies for the milk production traits were higher in those studies, because validation bulls were closely related to those comprising the training data as demonstrated by [9]. In that study, accuracy of GEBVs due to LD was estimated from 1,048 and 2,096 German Holstein bulls using BayesB with π = 0.99. Most of those bulls were born between 1998 and 2004, and 60% were offspring of North American Holstein bulls revealing the high genetic relationships between the German and the North American Holstein population. The strategy used to estimate the accuracy due to LD was a regression approach based on pairs of training and validation data sets with different additive-genetic relationships between the bulls of both data sets. That strategy is very time-consuming when several methods must be compared, and therefore a different approach was chosen here. GEBV accuracies obtained by BayesB compared to those in [9] were similar for milk yield, comparable for fat yield when the training data size was greater than 1,000, but lower for protein yield and somatic cell score. The increase of accuracy with training data size tended to be higher in [9]. Moreover, in contrast to the present study, G-BLUP was inferior in [9]. The difficulties in comparing the accuracies found here to those in [9], apart from the standard errors, is that there might be genotype-environment interactions, because the environment in which the daughters of the bulls born before 1975 have been tested might be different from the environment of the last decade relevant to the daughters of the training bulls. In addition, selection and genetic drift may have changed the LD structure in the population so that the accuracies of this study may not represent the GEBV accuracies due to LD in the current population.
Computing time
Computing time, which may become more important as SNP density increases, is an advantage of BayesCπ, because its Gibbs algorithm is faster than the Metropolis-Hastings algorithm of the other methods. The reason is that the MH step for sampling the locus-specific variances in this implementation of BayesA, BayesB and BayesDπ is repeated in each iteration to improve mixing; the Gibbs step for fitting a SNP in BayesCπ is only performed once. Furthermore, computing time depends largely on the number of SNPs fitted in each iteration, because the following two computation steps are the most demanding ones in the algorithm: The phenotypes have to be unadjusted for the genotypic effects of a SNP if that SNP was fitted in the previous iteration; similarly, if a new SNP effect was sampled in the current iteration, the phenotypes have to be adjusted for the genotypic effects of that SNP. BayesCπ was more sensitive to both the genetic architecture of a trait and training data size than BayesDπ, and thus computing time was shorter for BayesCπ. In this implementation, BayesA always had the longest computing time because all SNPs were fitted. For example, using 1,000 training bulls for milk yield and a 2.4 GHz AMD 280 Opteron processor, computing time for 100,000 iterations was 10.3, 14.1, 18 and 21.3 hr for BayesCπ, BayesB, BayesDπ and BayesA, respectively.