Method
The gamavar.90 program estimates the (co)variance of all possible gametic values that can be generated from an individual genome and meiosis process using data on phased genotype, allelic substitution effect, and recombination rate between variants. Since only the heterozygous loci of an individual will contribute to \( {\upsigma}_{\mathrm{gamete}}^2 \), the variance of two biallelic loci, j and k, of an individual i, with the true allele substitution effect αj and αk, can be calculated from the variance of a binomial distribution as \( {\upsigma}_{\left[\mathrm{j}+\mathrm{k}\right]}^2={\upsigma}_{\mathrm{j}}^2+{\upsigma}_{\mathrm{k}}^2+2{\upsigma}_{\mathrm{j}\mathrm{k}} \), where \( {\upsigma}_{\mathrm{j}}^2= npq{\alpha}_j^2,{\upsigma}_{\mathrm{k}}^2= npq{\alpha}_k^2 \), σij = n(pjk − pjpk)αjαk, and p = q = 0.5 and n = 1. Thus, the total variance is computed across all N heterozygous loci for trait x as \( {\upsigma}_{\mathrm{Xgamete}}^2=\left[{\hat{\alpha}}_{x_1}\kern0.5em \dots \kern0.5em {\hat{\alpha}}_{x_N}\right]P\left[{\hat{\alpha}}_{x_1}\kern0.5em \dots \kern0.5em {\hat{\alpha}}_{x_n}\right]^{\prime } \) and the covariance between the traits x and y can be computed using the same matrix P (as in Santos et al. [2]), and the allele substitution effect of the two traits as in (Bonk et al. [4]), as \( {\sigma}_{XYgamete}=\left[{\hat{\alpha}}_{x_1}\kern0.5em \dots \kern0.5em {\hat{\alpha}}_{x_N}\right]P\left[\begin{array}{ccc}{\hat{\alpha}}_{y1}& \dots & {\hat{\alpha}}_{y_N}\end{array}\right]^{\prime } \), where \( \hat{\alpha} \) is the allele substitution effect estimated with genomic model. The (co)variance matrix of the Mendelian transmission probabilities, P, with only the heterozygous loci can be represented as \( P=\left[\begin{array}{ccc}0.25& \dots & {al}_{1N}\left(-\frac{cM_{1N}}{200}+0.25\right)\\ {}\vdots & \ddots & \vdots \\ {}{al}_{N1}\left(-\frac{cM_{N1}}{200}+0.25\right)& \dots & 0.25\end{array}\right] \), where aljk is a phase indicator for loci j and k, with value 1 when both loci have the reference allele on the same chromosome and − 1 otherwise; cMjk is the genetic distance between the 2 loci (in centimorgans). Loci with genetic distances greater than 50 cM on the same chromosome, are assumed to be independent. If the recombination rates between the SNP markers are directly used instead of cM, the off-diagonal elements of the P matrix will be \( {P}_{jk}={al}_{jk}\left(-\frac{rate_{jk}}{2}+0.25\right) \) when the recombination rate is < 0.5; and Pjk = 0 when the rate is ≥0.5.
The gamevar.f90 software also calculates the chromosome-level statistic HOM = \( \sum \limits_i^{NHom}{\alpha}_i^2 \) (sum of squared effects of the homozygous loci from an individual) and coefficient of relative variation (CRV), \( {CRV}_i=\frac{\sigma_{gamete}}{\sqrt{0.5{\sum}_i^{NHom}{\alpha}_i^2+{\upsigma}_{\mathrm{gamete}}^2}} \), as described by Santos et al. [2]. The statistics \( {\upsigma}_{\mathrm{gamete}}^2 \) and CRV include all chromosomes used in the calculation of genomic breeding values. Gamevar.f90 calculates these statistics for each of the chromosomes separately. Math for the sex chromosomes could differ by sex of parent and progeny but we treated all chromosomes as autosomes. The total statistics can be obtained as a simple total across the chromosomes. Details on these variability statistics and algorithms have been described in Santos et al. [2].
The gamevar.f90 program directly uses allele effects of the markers estimated from existing genomic evaluations. Since the allele effects have been estimated, gamevar.f90 can also calculate the genomic breeding values (it computes by chromosomes) according to Meuwissen et al. [7] as M[α1…αN]′, where M is a matrix of genotypes coded in − 1,0 and 1 for aa, Aa and AA, with rows corresponding to individuals and column to markers.
Input and output files
A parameter file is required to run gamevar.f90. The parameter file provides some user-specified options, including file names. The program automatically performs an initial check of the parameters from the input file, such as the options defined by users, initial data descriptions, warnings, stoppings, cases of incorrect inputs, and output messages. Parameters are annotated in more details in the user’s manual (Additional File 1; https://github.com/djordand2008/gamevar.f90). Gamevar.f90 also requires some pre-processed files as input, such as allelic substitution effects and phased genotypes, as well as the chromosome information with recombination rate/genetic distance between markers. The program can optionally produce up to five easily-handled output files in text format for the (co)variance of gametic diversity, EBV, CRV and HOM by individuals. To reduce memory required by the program, output files are written during the analyses so that memory can be reused. In additional to the manual, ready-to-run example files are also provided in the package.
Efficiency
The software is written in Fortran with the intrinsic library (Additional File 2). Executable files are currently available for the Linux platform (Additional File 3). It is free software with open-access code that is portable to other operating systems for compiling. The standard compilers for Fortran 90 and 95, such as gfortran, are recommended for use. In an example run, the computing time for analyzing eight traits (lifetime net merit, productive live, somatic cell score, daughter pregnancy rate, heifer conception rate, cow conception rate, livability, and early calving) with 4340 Markers on chromosome 1 and 100 bulls was around 4 to 5 min or less than 3 s per individual on an Intel Xeon X7560 server, running at 2.27GHz with 660GB RAM. A maximum of 0.15GB of RAM was used for the example run.