In this section, we describe Bayesian models for multitrait GBV prediction and computational procedures for Bayesian estimation of the model parameters, including construction of posterior distributions of the parameters using MCMC iteration and variational approximation. Here, we consider the statistical models for BSR and BSSVS, which are shown to be equivalent to BayesD and similar to BayesDπ, respectively. In these Bayesian models concerned, BSR is regarded as a special version of BSSVS by setting π =0, where π is a prior probability that a SNP does not contribute to traits; thus, we present the multitrait GBV prediction models in a unified fashion in terms of BSSVS.
Models for Bayesian stochastic search variable selection in multitrait genomic selection
We propose the following Bayesian multilocus linear model for the phenotypes of
T traits of the
ith individual, which are denoted as a
Tx1 vector
y
_{
i
} (
i = 1,2,…,
n), as a BSSVS model for multitrait GBV prediction:
where b is a f × 1 vector of nongenetic effects including interceptions of the model with X
_{
i
} being a T × f design matrix linking b to the ith individual; u
_{
il
} is a variable indicating the genotype of the ith individual at the lth SNP taking a value of −1, 0 or 1 corresponding to the genotypes, ‘0_0’, ‘0_1’, or ‘1_1’, respectively; g
_{
l
} is a T × 1 vector of the effects of the lth SNP on the phenotypes of T traits; γ
_{
l
} is a variable indicating the inclusion of the lth SNP in the model with 1 or 0 depending on whether or not the lth SNP is included in the model; and e
_{
i
} is a T × 1 vector of residual errors following a Tvariate normal distribution N(0, Σ
_{
e
}) with 0 being a T × 1 vector of zeros and Σ
_{
e
} being a TxT covariance matrix of e
_{
i
}.
Within the Bayesian framework, prior distributions are assigned to the parameters of the model (1). We assume that the priors of the elements of
b are the improper uniform distribution over the possible values. The prior probabilities that
γ
_{
l
} =1 and
γ
_{
l
} =0 are presented as 1
π and
π, respectively, considering the prior probability that a SNP does not contribute to the trait and is excluded from the model is
π. The prior distribution of
g
_{
l
} given
γ
_{
l
} has the form
where
Σ
_{
gl
} is a
T × T matrix that is the variance and covariance matrix of
g
_{
l
} given
γ
_{
l
} =1 and
δ (
0) is the delta function that concentrates a total mass at zero for all
T elements of
g
_{
l
}. We induce a hierarchical structure for
Σ
_{
gl
} by assigning an inverse Wishart distribution with degree of freedom
ν and scale parameter
S, denoted by IW
_{
T
}(
ν,
S), to the prior distribution of
Σ
_{
gl
} :
Although the SNP effect g
_{
l
} is zero and irrelevant to Σ
_{
gl
} when γ
_{
l
} =0 as shown in (2), we assume that Σ
_{
gl
} is a priori distributed as given in (3) regardless of γ
_{
l
}. We treat ν as a given fixed value but infer S from the data within the Bayesian framework. For simplicity, we assume here that S is a diagonal matrix with the kth diagonal element being s
_{
k
} , that is S = diag(s
_{
k
}) (k = 1,2,…,T), and we adopt the improper uniform distribution over positive values for a prior distribution of s
_{
k
}. The residual variance and covariance matrix Σ
_{
e
} is assumed to have a uniform distribution over the positive definite matrices as a prior distribution.
Denoting these parameters of the Bayesian model collectively by
θ, the prior distribution of
θ by
p(
θ) and observed phenotypes
y
_{
i
} and SNP genotypes
u
_{
il
} (
i = 1,2,…,
n;
l = 1,2,…,
N) by
Y and
U, respectively, we can write the posterior distribution of
θ,
g(
θ
ν,
Y,
U), as
from (1), (2) and (3), where y
_{
i
}* is a residual given by
Given that S is fixed, posterior distribution (4) is equivalent to that of BayesA extended to multitrait case when π = 0 leading to γ
_{
l
} =1 for all SNPs. When π > 0, the selection of SNPs to be included in the model is performed as in BayesB. However, it should be noted that posterior distribution (4) is not equivalent to that of the multitrait version of BayesB, which supposes that Σ
_{
gl
} is a zero matrix as well as g
_{
l
} when γ
_{
l
} =0, while, in (4), the prior distribution of Σ
_{
gl
} is assumed to be IW_{
T
}(ν, S) regardless of γ
_{
l
}. The form of posterior distribution (4) allows Gibbs sampling in the MCMC estimation for all parameters including g
_{
l
} and Σ
_{
gl
}. The full conditional posterior distributions of parameters used in MCBayes are described in Additional file
1.
Variational approximation procedure for multitrait Bayesian model
We adopted variational approximation as an alternative to MCMC iteration for constructing marginal posterior distributions of parameters based on joint posterior distribution (4). In the variational approximation procedure, the joint posterior distribution is approximated by the product of functions for subsets of parameters with lower dimension. Briefly, we assume that
g(
θ
ν,
Y,
U) is approximated by a function of
θ,
q(
θ), which is factorized as
q(
θ) =
q
_{1}(
θ
_{1})
q
_{2}(
θ
_{2})…
q
_{
K
}(
θ
_{
K
}), where
θ
_{1},
θ
_{2},… ,
θ
_{
K
} are mutually exclusive subsets of
θ such that ∪
_{
k = 1}
^{
K
}
θ
_{
k
} =
θ and
q
_{
k
} (
θ
_{
k
}) (
k = 1,2,…,
K) may generally depend on
ν
Y and
U although this dependence is omitted here for simplicity. This approximating marginal posterior distribution,
q
_{
k
}(
θ
_{
k
}), is referred to as the variational posterior of
θ
_{
k
}[
19]. The form of
q
_{
k
}(
θ
_{
k
}) is determined such that the Kullback–Leibler divergence between g(
θ
ν
Y,
U) and
q(
θ
q
_{1}(
θ
_{1})
q
_{2}(
θ
_{2})…
q
_{
K
}(
θ
_{
K
}),
D(
q
g), is minimized
[
20], where
D(
q
g) is defined as
It can be shown that
q
_{
k
}(
θ
_{
k
}) is expressed as
where C is a normalized constant and E_{

}
_{
θ
}
_{
k
}[.] indicates the expectations of parameters other than θ
_{
k
} that are calculated with respect to every other parameter’s variational posteriors except q
_{
k
}(θ
_{
k
})
[21].
In the varBayes method that applies the variational approximation procedure to the multitrait Bayesian model considered here, g(
θ
ν,
Y,
U) is assumed to be approximated by a factorized function
q(
θ) that is written as
where we denote all of the variational posteriors for the different parameters in the righthand side by q(.) for simplicity with the understanding that q(b), q (Σ
_{
e
}) and so on take different forms depending on the parameters. The forms of these variational posteriors are derived from (5) in a manner similar to that used for derivation of the full conditional posteriors for the parameters in Gibbs sampling and the computational details are given in Additional file
2. In the following, we will give the variational posteriors for parameters, γ
_{
l
}, g
_{
l
}, Σ
_{
gl
} (l = 1,2,…,N), b, Σ
_{
e
} and S.
The variational posterior for
b,
q(
b),is a multivariate normal density with mean
and variancecovariance matrix
where
E(.) indicates the expectation calculated with respect to the variational posteriors of relevant parameters, while
q (
Σ
_{
e
}) is IW
_{
T
} (
nT1,
S
_{
e
}) with
The variational posterior of
Σ
_{
gl
} is represented by IW
_{
T
} (
ν
_{
gl
},
S
_{
gl
}), where
The expectation of
Σ
_{
gl
}
^{1} with respect to this posterior distribution is
For
γ
_{
l
} and
g
_{
l
}, we consider joint distribution for variational posterior
q(
γ
_{
l
},
g
_{
l
}) that is expressed as
where
is a density function of a multivariate normal distribution
with mean
and variancecovariance matrix
The marginal posterior distribution of
γ
_{
l
} is a binomial distribution with probabilities of
γ
_{
l
} =1 and 0 given by
The conditional distribution of
g
_{
l
} given
γ
_{
l
} is given by
From (4), the variational posterior of
S is a
Tvariate Wishart distribution with a scale matrix
Σ
_{
S
} and degree of freedom
ν
_{
s
}, W
_{
T
}(
ν
_{
s
},
Σ
_{
S
}), where
and
ν
_{
s
} =
Nν +
T + 1, and the expectation of
S is expressed as
As outlined above, a wellknown probability distribution, such as normal, inverse Wishart and so on, is assigned to the variational posterior of each parameter, which is characterized by the expectations of the functions of other parameters, taken with respect to their variational posteriors. The relationships between these expectations are given by (6), (8), (9), (13), (14), (15) and (16), from which the expectations can be calculated with numerical iterations to obtain the variational posteriors.
Moreover, the prior probability for a SNP to have zero effect,
π, can be treated as a variable parameter to be inferred from the data as in BayesC
π and BayesD
π when 0 <
π < 1. Here, we assume a uniform prior on 0 <
π < 1 to obtain a Beta distribution, B(
a,
b), as a variational posterior from (4), where
and
, with the expectation
Accordingly, the variational posteriors of the other parameters are modified with π substituted by E(π) when π is involved in the Bayesian inference.
Simulation experiments
We simulated datasets to evaluate the accuracy of the predicted GBVs using the proposed Bayesian methods, MCBayes and varBayes, for multiple traits. In generating the datasets, three traits, denoted as A, B and C, were considered. The simulation of population and genome was carried out following
[3] where a single trait was treated, while multiple traits were generated through a modified approach. The simulated population, genotypes and phenotypes are described in the following.
Populations with an effective population size of 100 were maintained by random mating for 1000 generations to attain mutation drift balance and LD between SNPs and QTLs. In generation 1001 and 1002, the population size was increased to 1000. The population in the 1001st generation was treated as a training population, where the phenotypes of three traits and SNP genotypes of the individuals were simulated and analyzed to estimate the SNP effects in the model. The phenotype of each trait for each individual in the 1001st generation was given as the sum of QTL effects over the polymorphic QTLs and environmental effects, which were sampled as described later. For simplicity, no other fixed effects were assumed. The population in the 1002nd generation was used as a test population, where the individuals were only genotyped for SNP markers without phenotypic records and GBVs of three traits were predicted for each individual using a model with SNP effects estimated based on the training population in the 1001st generation. The true breeding value (TBV) of the individual in the 1002nd generation was also simulated as the sum of QTL effects corresponding to the QTL genotype for each trait and used for evaluating the accuracy of predicted GBV, but was regarded as unknown and unavailable in the estimation of SNP effects in the models. Accuracy was measured based on the correlation between the TBV and predicted GBV, r
_{TBV,pGBV}, for each trait and regression of TBV on predicted GBV, b
_{TBV,pGBV}, was also obtained for assessing the bias of the prediction.
The genome was assumed to consist of 10 chromosomes each 100cM in length. Two scenarios were considered for the number of available SNP markers and the datasets under these two scenarios were denoted as Data I and Data II. In Data I, 101 marker loci were located every 1cM on each chromosome for a total of 1010 markers on a genome. In Data II, 1010 equidistant marker loci were located on each chromosome for a total of 10100 markers. We assumed that 100 equidistant QTLs were located on each chromosome such that a QTL was in the middle between two marker loci in both Data I and Data II. Therefore, there were a total of 1000 QTLs located on a whole genome. The mutation rates assumed per locus per meiosis were 2.5 × 10^{3} and 5.0 × 10^{5} for the marker locus and QTL, respectively. At least one mutation occurred in the most of the marker loci with a high mutation rate during the simulated generations. In the marker loci experiencing more than one mutation, the mutation remaining at the highest minor allele frequency (MAF) was regarded as visible, whereas the others were ignored, which resulted in the marker loci having two alleles similar to SNP markers. Although the mutation rate for QTL was assumed 2.5 × 10^{5} in the simulation for a single trait conducted in
[3], we here doubled it for generating TBVs of multiple traits for the reason as described below.
The polymorphic QTLs at which mutation occurred were used to simulate the three traits, A, B and C, the heritabilities of which, denoted by h
_{A}
^{2}, h
_{B}
^{2} and h
_{C}
^{2}, respectively, were assumed to be h
_{A}
^{2} = 0.8, h
_{B}
^{2} = 0.1 and h
_{C}
^{2} = 0.1. We divided all polymorphic QTLs into four groups, Group1, Group2, Group3 and Group4, according to the causal relationship with traits, where Group1 was a group of QTLs affecting both traits A and B, Group2 was a group of QTLs affecting only trait A, Group3 was a group of QTLs affecting only trait B and Group4 was a group of QTLs affecting only trait C. Therefore, it was assumed that QTLs of Group1 had pleiotropic effects on traits A and B while QTLs of other groups affected only a single trait. Each polymorphic QTL was randomly assigned to one of these groups, that is, to Group1, Group2, Group3 and Group4 with respective probabilities 0.4, 0.1, 0.1 and 0.4. Therefore, 80% of the QTL loci affecting trait A were shared by trait B on average and whereas all the QTL loci affecting trait C influence neither traits A nor B. In this setting of multitrait case, the mutation rate of QTL was increased to 5.0 × 10^{5} from 2.5 × 10^{5}, the mutation rate adopted in
[3] for generating a single trait, to retain the number of QTL per trait.
The pleiotropic effects of each QTL in Group1 were assumed to be correlated between traits A and B with correlation coefficient of 0.9. Consequently, genetic correlations between traits A and B, between A and C and between B and C, denoted as ρ
_{GAB}, ρ
_{GAC} and ρ
_{GBC}, respectively, were ρ
_{GAB} = 0.72 and ρ
_{GAC} = ρ
_{GBC} = 0 on average although the values of correlation coefficients were somewhat fluctuated in each data generation. This setting of traits was adopted to investigate how the prediction accuracy was increased for a lowheritability trait (trait B) by simultaneous analysis for multiple traits including a correlated highheritability trait (trait A).
The effects of QTL alleles were sampled from gamma distributions independently for each QTL. Pleiotropic effects of QTL alleles in Group1 were determined for traits A and B by generating two correlated gamma random variables, x and y, with the correlation coefficient of 0.9, and assigning them with positive or negative values with equal probabilities. The effects of QTL alleles in the other groups, which affected each single trait only, was determined by sampling a gamma random variable z and by similarly assigning it with the positive or negative sign. We generated these three random variables x, y and z such that their marginal distributions were the same gamma distribution with scale parameter 0.4 and shape parameter 1.66, Gamma(0.4,1.66). For obtaining correlated variables x and y, we generated three independent gamma variables, x
_{1}, x
_{2} and x
_{3}, which were sampled from Gamma(0.36,1.66), Gamma(0.04,1.66) and Gamma(0.04,1.66), respectively, and determined the values of x and y as x = x
_{1} + x
_{2} and y = x
_{1} + x
_{3}. It can be shown that x and y had a correlation coefficient of 0.9 and the same marginal distribution Gamma(0.4,1.66)
[23].
The environmental correlation coefficients between three traits were denoted by
ρ
_{EAB},
ρ
_{EAC} and
ρ
_{EBC}, respectively, and assumed as
ρ
_{EAB} = 0.1,
ρ
_{EAC} = 0.2 and
ρ
_{EBC} = 0.3. The environmental effects were sampled from a trivariate normal distribution with a mean vector
0 and a variancecovariance matrix
R
_{E}, where
with σ
_{EA}
^{2}, σ
_{EB}
^{2} and σ
_{EC}
^{2} indicating environmental variances of three traits. Environmental variance of trait A was given by σ
_{EA}
^{2} = (1/h
_{A}
^{2}1)σ
_{GA}
^{2} with its heritability h
_{A}
^{2} and genetic variance σ
_{GA}
^{2}, which was variance of TBV of trait A, and those of traits B and C were obtained similarly. The environmental effects were added to TBVs which were given by the sum of QTL effects to determine phenotypic values of three traits for individuals in the 1001st generation.
In Data I, 100 replicated datasets were simulated while Data II consisted of 20 replicated datasets due to the larger number of SNPs. Each of replicated datasets included records of phenotypes of three traits and genotypes of SNPs for the training population (1001st generation) and only SNP genotypes for the test population (1002nd generation). To simulate the situation of missing phenotypes, we generated additional datasets by deleting the phenotypic records of some traits for some individuals in the 100 replicated training datasets of Data I. These 100 replicated datasets were referred to as Data III, where the phenotypic records of traits A, B and C were respectively deleted for individuals of i = 801–1000, individuals of i = 1500 and individuals of i = 201700 in 1000 individuals (i = 11000) of the 1001st generation of Data I. Therefore, in Data III, the prediction model for GBVs was constructed with a training dataset consisting of the phenotypes of 800, 500 and 500 individuals for traits A, B and C, respectively, where only 100 individuals (i = 701800) had phenotypic records of all three traits, and 1000 individuals in the 1002nd generation were predicted for GBV based on the genotypes of 1010 markers. The setting of nonzero environmental correlations, i.e., ρ
_{EAB} = 0.1, ρ
_{EAC} = 0.2 and ρ
_{EBC} = 0.3, was adopted here to assess the benefit from the estimation process of missing phenotypes implemented in multitrait analyses for the prediction accuracy, where the information of environmental covariance between observed and missing phenotypes was utilized as in (18).
Each replicated dataset in Data I, Data II and Data III was analyzed using the proposed methods for multiple traits, MCBayes and varBayes, to construct the GBV prediction model in the1001st generation and investigate r
_{TBV,pGBV} and b
_{TBV, pGBV} for each trait in the 1002nd generation. For a comparison with conventional singletrait GBV prediction, the same datasets were also analyzed with the singletrait setting of MCBayes where three traits were separately treated without considering the correlation structure between traits.
We conducted crossvalidation as well to evaluate the prediction performance within a population in the 1001st generation without the 1002nd generation since the techniques of crossvalidation have commonly been used for evaluation of the accuracy in the studies of genomic selection with the actual datasets of animals
[24–26] and plants
[27–29], where the prediction accuracy of the model for the unobserved future samples was of concern. We applied 10fold crossvalidation. In brief, we randomly split a population of 1000 individuals in the 1001st generation into ten subpopulations with each size 100. By using a single subpopulation used as a test set and the remaining nine subpopulations as a training set to construct prediction model, the prediction accuracy and bias were assessed for the test set. This process was repeated ten times until all single subpopulations were used as test sets exactly once. We averaged the prediction accuracy and bias over ten repetitions to evaluate the prediction performance in each dataset. Because of much computational burden with MCMC analysis for repeated model constructions in crossvalidation, evaluation with 10fold crossvalidation was carried out for Data I only, where mean of the prediction accuracies and biases in 100 datasets were obtained as r
_{TBV,pGBV} and b
_{TBV, pGBV} for each trait and for each prediction method. In the process of crossvalidation, we also investigated the correlation between predicted GBV and the phenotypic value, r
_{y,pGBV}, in place of TBV, and regression of predicted GBV on phenotypic value, b
_{y, pGBV}, for each trait.
Two settings for the prior probability that a SNP has zero effect, π, were adopted in model construction, i.e., π was fixed at 0 or π was variable over the range 0 < π <1 and inferred from the data. The values of hyperparameter ν, degree of freedom of prior inverse Wishart distribution of Σ
_{
gl
}, were set to 5.0 and 3.2 corresponding to π =0 and 0 < π <1 after preliminary analyses to evaluate the effects of the values of ν on prediction accuracy over 3 < ν <6 although the accuracy was little affected by the values of ν.
In the MCMC iteration of MCBayes, we repeated 11000 cycles including a burnin period of the first 1000 cycles. The values of parameters were sampled every 10 cycles to obtain the posterior means that were used to determine a prediction model for each generated dataset. In the method of varBayes, we adopted the criterion
< 10^{8} for convergence in numerical iteration for computing the expectations of parameters, where
and
are the current and previous value of the expectations of parameters. When this criterion was satisfied, the computational procedure with variational approximation was regarded as converged.