GVCBLUP: a computer package for genomic prediction and variance component estimation of additive and dominance effects
 Chunkao Wang^{1},
 Dzianis Prakapenka^{2},
 Shengwen Wang^{1},
 Sujata Pulugurta^{1},
 Hakizumwami Birali Runesha^{2} and
 Yang Da^{1}
DOI: 10.1186/1471-2105-15-270
© Wang et al.; licensee BioMed Central Ltd. 2014
Received: 12 February 2014
Accepted: 30 July 2014
Published: 9 August 2014
Abstract
Background
Dominance effects may play an important role in the genetic variation of complex traits. Full-featured and easy-to-use computing tools for genomic prediction and variance component estimation of additive and dominance effects using genome-wide single nucleotide polymorphism (SNP) markers are necessary to understand the contribution of dominance to a complex trait and to utilize dominance for selecting individuals with favorable genetic potential.
Results
The GVCBLUP package is a shared memory parallel computing tool for genomic prediction and variance component estimation of additive and dominance effects using genome-wide SNP markers. This package currently has three main programs (GREML_CE, GREML_QM, and GCORRMX) and a graphical user interface (GUI), GVCeasy, that integrates the three main programs with an existing program for the graphical viewing of SNP additive and dominance effects. The GREML_CE and GREML_QM programs offer complementary computing advantages with identical results for genomic prediction of breeding values, dominance deviations and genotypic values, and for genomic estimation of additive and dominance variances and heritabilities using a combination of the expectation-maximization (EM) algorithm and the average information restricted maximum likelihood (AI-REML) algorithm. GREML_CE is designed for large numbers of SNP markers and GREML_QM for large numbers of individuals. Test results showed that GREML_CE could analyze 50,000 individuals with 400K SNP markers and GREML_QM could analyze 100,000 individuals with 50K SNP markers. GCORRMX calculates genomic additive and dominance relationship matrices using SNP markers. GVCeasy is the GUI for GVCBLUP, integrated with an existing software tool for the graphical viewing of SNP effects and a function for editing the parameter files of the three main programs.
Conclusion
The GVCBLUP package is a powerful and versatile computing tool for assessing the type and magnitude of genetic effects affecting a phenotype by estimating whole-genome additive and dominance heritabilities, for genomic prediction of breeding values, dominance deviations and genotypic values, for calculating genomic relationships, and for research and education in genomic prediction and estimation.
Keywords
GVCBLUP, Genomic selection, Variance component, Heritability, BLUP
Background
Genomic prediction using genome-wide single nucleotide polymorphism (SNP) markers has become a powerful approach to capture genetic effects dispersed over the genome for predicting an individual’s genetic potential for a phenotype [1–3]. Genomic estimation of variance components using genome-wide SNP markers is a powerful tool for estimating the genetic contribution of the whole genome to a phenotype and for addressing the missing heritability problem, where a large number of causal variants explained only a small fraction of the phenotypic variation. The dominance effect of a quantitative trait is measured as the deviation of the mean value of the heterozygous genotype from the average of the two alternative homozygous genotypes [4, 5]. The inclusion of dominance in the prediction model may improve the accuracy of genomic prediction when dominance effects are present [6–9]. However, currently available software packages for genomic prediction and variance component estimation either are designed for additive effects only (GCTA [10]), or require users to prepare a dominance-specific file to estimate dominance effects (BLR or BGLR [11], GenSel [12], DMU [13], BLUPF90 [14]). The user-friendliness of a computing tool also affects the efficiency of data analysis for genomic prediction and estimation. To fill these gaps, we implemented two computationally complementary strategies with identical results, together with various definitions of genomic relationships, in the GVCBLUP package, which offers a wide range of flexibility and functionality for broad applicability to genomic prediction and estimation of additive and dominance effects.
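The definition of a dominance effect given above can be illustrated with a minimal sketch. The function name, the genotype labels and the phenotype values below are hypothetical and chosen only for illustration; they are not part of GVCBLUP.

```python
from statistics import mean


def dominance_deviation(hom1_values, het_values, hom2_values):
    """Heterozygote mean minus the midpoint of the two homozygote means."""
    midpoint = (mean(hom1_values) + mean(hom2_values)) / 2.0
    return mean(het_values) - midpoint


# Hypothetical phenotypes grouped by genotype at a single SNP.
d = dominance_deviation(
    hom1_values=[10.0, 12.0],  # mean 11
    het_values=[15.0, 17.0],   # mean 16
    hom2_values=[14.0, 16.0],  # mean 15
)
print(d)  # 16 - (11 + 15) / 2 = 3.0
```

A value of zero would indicate no dominance at this SNP under this definition.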
Implementation
GVCBLUP currently has three main programs and a graphical user interface (GUI) named GVCeasy that integrates the three main programs with an existing program for graphical viewing of SNP effects. The three main programs are GREML_CE, GREML_QM, and GCORRMX, which were developed using shared memory parallel computing technology. GVCeasy provides users with a user-friendly platform to run GVCBLUP.
Two complementary computing strategies
where Z_{1} = ZT_{α} and Z_{2} = ZT_{δ}. The main computing difficulty for the CE set of Equations 1–2 is the calculation of V^{−1} and P = V^{−1} − V^{−1}X(X’V^{−1}X)^{−}X’V^{−1}; for the QM set of Equations 3–4, it is the inverse of the coefficient matrix of the mixed model equations after absorbing fixed non-genetic effects (denoted by C^{−1}). The CE set has the best potential for using large numbers of SNP markers because the size of the V^{−1} and P matrices is determined by the number of individuals (assuming one observation per individual) and does not change for different numbers of SNPs. Similarly, the QM set has the best potential for using large numbers of individuals because the size of the C^{−1} matrix is determined by the number of SNP markers and does not change for different numbers of individuals.
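The trade-off between the two formulations can be made concrete with a rough sizing sketch. It assumes one record per individual, dense double-precision storage, and a C^{−1} matrix of order 2m (one additive and one dominance effect per SNP); the order 2m and the "pick the smaller matrix" rule are assumptions made for illustration, not the rule used by GVCBLUP, where the user chooses the program.

```python
def ce_dimension(num_individuals):
    """Order of V^{-1} and P: one row per individual (one record each)."""
    return num_individuals


def qm_dimension(num_snps):
    """Assumed order of C^{-1}: additive plus dominance effect per SNP."""
    return 2 * num_snps


def dense_gib(order, bytes_per_value=8):
    """Memory needed for one dense order-by-order matrix, in GiB."""
    return order * order * bytes_per_value / 2**30


def smaller_strategy(num_individuals, num_snps):
    """Illustrative heuristic: prefer the formulation with the smaller matrix."""
    if ce_dimension(num_individuals) <= qm_dimension(num_snps):
        return "CE"
    return "QM"


print(smaller_strategy(1000, 3000))  # CE
print(smaller_strategy(3000, 1000))  # QM
print(round(dense_gib(ce_dimension(50000)), 1))  # GiB for one 50,000 x 50,000 matrix
```

Under these assumptions, a single dense matrix for 50,000 individuals already needs roughly 18.6 GiB, which is why the number of individuals limits the CE formulation and the number of SNP markers limits the QM formulation.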
EM-REML and AI-REML
Two algorithms for restricted maximum likelihood (REML) estimation of variance components are implemented in both GREML_CE and GREML_QM: an EM-type algorithm (EM-REML) and the AI-REML algorithm [5]. AI-REML is generally much faster than EM-REML but is less robust and may be sensitive to the initial values of the variance components. The programs require at least two EM-REML iterations, and the user may specify a larger number of EM-REML iterations so that AI-REML starts from better initial values than those the user provided. When AI-REML yields a negative estimate for any variance component, the program automatically returns to EM-REML, which yields non-negative estimates of variance components. This strategy is designed to guarantee that the GREML_CE and GREML_QM estimates of variance components are positive.
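The switching strategy described above can be sketched as a small control loop. This is only an outline of the control flow, not GVCBLUP's code: `em_step` and `ai_step` are hypothetical callables that each return updated variance components, and the sketch assumes that a rejected AI-REML update is replaced by an EM-REML update for that iteration only.

```python
def estimate_variances(em_step, ai_step, initial, min_em_iters=2,
                       tol=1e-8, max_iters=1000):
    """Run EM-REML first, then AI-REML, falling back to EM-REML whenever
    an AI-REML update contains a negative variance component."""
    if min_em_iters < 2:
        raise ValueError("at least two EM-REML iterations are required")
    current = list(initial)
    history = []  # algorithm that produced each accepted update
    for iteration in range(max_iters):
        use_em = iteration < min_em_iters
        if not use_em:
            candidate = ai_step(current)
            if any(value < 0 for value in candidate):
                use_em = True  # reject the AI update and redo it with EM
        if use_em:
            candidate = em_step(current)
        history.append("EM" if use_em else "AI")
        change = max(abs(new - old) for new, old in zip(candidate, current))
        current = list(candidate)
        if change < tol and iteration + 1 >= min_em_iters:
            break
    return current, history


# Toy stand-ins for the real updates (purely artificial behaviour):
# EM moves halfway to a fixed target; AI jumps to the target when close,
# and returns a negative value when far away.
target = [2.0, 1.0, 4.0]


def toy_em(v):
    return [x + 0.5 * (t - x) for x, t in zip(v, target)]


def toy_ai(v):
    return [t if abs(x - t) < 1.0 else -1.0 for x, t in zip(v, target)]


estimates, history = estimate_variances(toy_em, toy_ai, [10.0, 10.0, 10.0])
print(estimates, history)
```

With these toy updates, the first two iterations are forced EM steps, the next two are EM fallbacks after rejected AI updates, and the last two are accepted AI steps.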
Shared memory parallel computing
GVCBLUP is programmed in C++ using the Eigen library [16] and the Intel Math Kernel Library (MKL) [17]. Eigen is a C++ template library for linear algebra that supports large dense and sparse matrices and provides easy-to-use expressions for coding linear algebra. Intel MKL provides BLAS and LAPACK linear algebra routines and is optimized for multi-core Intel processors through shared memory parallel computing. In GVCBLUP, MKL is used for dense matrix inversion, including V^{−1} and C^{−1}, and for dense matrix multiplications involving those two matrices.
Calculation and graphical viewing of SNP effects and heritabilities
where $\sigma_{y}^{2} = \sigma_{\alpha}^{2} + \sigma_{\delta}^{2} + \sigma_{e}^{2}$ = phenotypic variance, $h_{\alpha}^{2}$ = total additive heritability of all SNP markers, and $h_{\delta}^{2}$ = total dominance heritability of all SNP markers. The output file for the SNP effects and heritabilities of Equations 5–9 is designed such that the SNP effects and heritability estimates can be directly used as the input file for graphing and graphical viewing by SNPEVG2 [18].
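As a small worked example of the quantities just defined, the sketch below computes the phenotypic variance as the sum of the three variance components and, assuming the usual definition of heritability as the ratio of a genetic variance to the phenotypic variance, the total additive and dominance heritabilities. The function name and the input values are hypothetical.

```python
def heritabilities(var_additive, var_dominance, var_residual):
    """Phenotypic variance and total additive / dominance heritabilities."""
    var_phenotypic = var_additive + var_dominance + var_residual
    if var_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return {
        "phenotypic_variance": var_phenotypic,
        "h2_additive": var_additive / var_phenotypic,    # assumed ratio definition
        "h2_dominance": var_dominance / var_phenotypic,  # assumed ratio definition
    }


# Hypothetical variance component estimates.
result = heritabilities(var_additive=3.0, var_dominance=3.0, var_residual=4.0)
print(result)
```

With these inputs the phenotypic variance is 10 and both heritabilities are 0.3.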
Simulated test data
Two simulated datasets are supplied in the GVCBLUP package for testing purposes. One dataset (dataset_1) has 1000 genotyped individuals with 3000 SNP markers and the other (dataset_2) has 3000 genotyped individuals with 1000 SNP markers. The parameter files for running the GVCBLUP programs on the simulated datasets are also included in the package. These simulated data are designed for GVCBLUP exercises and for showing the complementary advantages of the CE and QM sets of formulations. Users interested in GVCBLUP exercises using large datasets could use a publicly available swine dataset with over 45,000 SNP markers on 3534 individuals [19], which was used for comparing GREML estimates by GVCBLUP with the corresponding REML estimates using pedigree relationships [5].
Results and discussion
GREML_CE and GREML_QM programs
Table 1. Computing time (seconds) using GREML_CE and GREML_QM for simulated datasets^{1}
                                       q = 1000, m = 3000 (Dataset_1)   q = 3000, m = 1000 (Dataset_2)
                                       GREML_CE       GREML_QM          GREML_CE       GREML_QM
Time for SNP input, A_{g} and D_{g}    1              1                 1              1
Time per iteration                     ~0.2           6                 3              ~0.6
Number of iterations                   10             10                7              7
Total time                             5              69                32             6
Table 2. Capacity and speed of GVCBLUP for genomic estimation of additive, dominance and residual variances (tolerance = 10^{−8}) on the ItascaSB supercomputer
                                       GREML_CE     GREML_CE     GREML_QM     GREML_QM^{1}
Number of individuals (q)              20,000       50,000       200,000      100,000
Number of SNP markers (m)              1 million    400,000      10,000       50,000
Time for SNP input, A_{g} and D_{g}    3.7 hrs      6.0 hrs      14.9 min     0.33 hrs
Time per iteration                     3.1 min      0.77 hrs     1.5 min      2.25 hrs
Total time                             4.8 hrs      23.2 hrs     2 hrs        ~45.83 hrs
Number of iterations                   12           13           20           20
Table 3. Comparison of iteration numbers of EM-REML and AI-REML (tolerance = 10^{−8}) using simulated data with different heritability levels
               h_{α}^{2} = 0.0, h_{δ}^{2} = 0.0    h_{α}^{2} = 0.3, h_{δ}^{2} = 0.3
Replication    EM-REML      AI-REML                EM-REML      AI-REML
1              173          −^{1}                  322          9
2              231                                 386          12
3              348                                 348          9
4              359                                 354          8
5              481          18                     458          10
6              138                                 295          10
7              871                                 416          8
8              134                                 353          9
9              291          16                     336          12
10             1000         1000^{1}               431          11
In addition to the tests in Table 1 using the simulation datasets we provide with the GVCBLUP package, the GREML_CE and GREML_QM programs were extensively evaluated using simulation data under various assumptions, and the GREML estimates were compared to the REML estimates of additive heritabilities of five traits using pedigree relationships in a publicly available swine dataset of 3534 pigs with 60K SNP data [5]. GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance, and GREML was able to differentiate true additive and dominance heritability levels [5]. The inclusion of dominance in the prediction model resulted in improved accuracy of genomic prediction [8], and genomic models with additive and dominance effects were more accurate for the estimation of variance components than their pedigree-based counterparts [7]. In a study of trout propensity to migrate, genomic-predicted additive effects completely separated migratory and non-migratory fish in the wild population with 95.5% additive heritability and 4.5% dominance heritability, whereas genomic-predicted dominance effects achieved such complete separation in the dam-blocked population with 0% additive heritability and 39.3% dominance heritability [22], showing the importance of accounting for the exact effect type in the prediction model.
GCORRMX program
The GCORRMX program is designed to calculate measures of genomic similarities among individuals. This program currently calculates the A_{g} and D_{g} matrices for six definitions [23]. An example of the GCORRMX output files is given in Additional file 1: Supplementary output files.
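The six GCORRMX definitions are not listed in this section, so the following sketch uses one widely known definition of an additive genomic relationship matrix, the allele-frequency-centered form of VanRaden [2], purely as an assumed example; it is not necessarily one of the definitions implemented in GCORRMX. Genotypes are assumed to be coded as 0, 1 or 2 copies of a counted allele, and allele frequencies are estimated from the sample.

```python
def additive_relationship(genotypes):
    """Additive genomic relationships G = WW' / (2 * sum p(1-p)),
    where W holds genotype codes centered by twice the allele frequency."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)]
                for row in genotypes]
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / scale
             for k in range(n)]
            for i in range(n)]


# Three hypothetical individuals genotyped at two SNPs.
G = additive_relationship([[2, 0], [0, 2], [1, 1]])
for row in G:
    print(row)
```

In this toy example the first two individuals carry opposite homozygous genotypes at both SNPs and therefore receive a negative relationship, while the third individual sits exactly at the sample mean and has zero relationship with everyone, including itself.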
GVCeasy: Graphical user interface (GUI) for GVCBLUP
Conclusions
The GVCBLUP package is a powerful and user-friendly computing tool for assessing the type and magnitude of genetic effects affecting a phenotype by estimating whole-genome additive and dominance heritabilities using genome-wide SNP markers. It is a full-featured tool for genomic prediction of breeding values, dominance deviations and genotypic values for both training and validation datasets, and it provides an important computing utility for research and education in the area of genomic prediction and estimation.
Availability and requirements
Project name: GVCBLUP
Project home page: http://animalgene.umn.edu/
Operating system(s): Windows, Linux and Mac OS X
Programming language: C++, Java
License: None
Abbreviations
SNP: Single nucleotide polymorphism
BLUP: Best linear unbiased prediction
GBLUP: Genomic BLUP
REML: Restricted maximum likelihood estimation
GREML: Genomic REML
EM: Expectation-maximization
AI-REML: Average information REML
GUI: Graphical user interface
MME: Mixed model equations
Declarations
Acknowledgements
This research was supported by USDA National Institute of Food and Agriculture Grant no. 2011-67015-30333 and by project MN-16-043 of the Agricultural Experiment Station at the University of Minnesota. Supercomputer computing time was provided by the Minnesota Supercomputer Institute at the University of Minnesota and by the Research Computing Center at The University of Chicago.
Authors’ Affiliations
References
1. Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157 (4): 1819–1829.
2. VanRaden P: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91 (11): 4414–4423. 10.3168/jds.2007-0980.
3. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW: Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010, 42 (7): 565–569. 10.1038/ng.608.
4. Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, Harlow, Essex, UK: Longmans Green, 4th edition.
5. Da Y, Wang C, Wang S, Hu G: Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. PLoS One. 2014, 9 (1): e87666. 10.1371/journal.pone.0087666.
6. Hu G, Wang C, Da Y: Genomic heritability estimation for the early life-history transition related to propensity to migrate in wild rainbow and steelhead trout populations. Ecol Evol. 2014, doi:10.1002/ece3.1038.
7. Vitezica ZG, Varona L, Legarra A: On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics. 2013, 195 (4): 1223–1230. 10.1534/genetics.113.155176.
8. Nishio M, Satoh M: Including dominance effects in the genomic BLUP method for genomic evaluation. PLoS One. 2014, 9 (1): e85792. 10.1371/journal.pone.0085792.
9. Sun C, VanRaden P, O’Connell J, Weigel K, Gianola D: Mating programs including genomic relationships and dominance effects. J Dairy Sci. 2013, 96 (12): 8014–8023. 10.3168/jds.2013-6969.
10. Yang J, Lee SH, Goddard ME, Visscher PM: GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011, 88 (1): 76–82. 10.1016/j.ajhg.2010.11.011.
11. Pérez P, de los Campos G, Crossa J, Gianola D: Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome. 2010, 3 (2): 106–116. 10.3835/plantgenome2010.04.0005.
12. Fernando R, Garrick D: GenSel - User Manual for a Portfolio of Genomic Selection Related Analyses. 2008, Ames: Animal Breeding and Genetics, Iowa State University, [http://taurus.ansci.iastate.edu/]
13. Su G, Christensen OF, Ostersen T, Henryon M, Lund MS: Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS One. 2012, 7 (9): e45293. 10.1371/journal.pone.0045293.
14. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ: Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010, 93 (2): 743–752. 10.3168/jds.2009-2730.
15. Da Y, Wang S: Joint genomic prediction and estimation of variance components of additive and dominance effects using SNP markers. Abstract P1004. Plant and Animal Genome XXI, January 12–16, 2013. San Diego. [https://pag.confex.com/pag/xxi/webprogram/Paper7396.html]
16. Eigen V3. [http://eigen.tuxfamily.org]
17. Intel Math Kernel Library Reference Manual. Doc. No. 630813-061US, MKL 11.0, update 5. [http://downloadsoftware.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/mklman.pdf]
18. Wang S, Dvorkin D, Da Y: SNPEVG: a graphical tool for GWAS graphing with mouse clicks. BMC Bioinformatics. 2012, 13 (1): 319. 10.1186/1471-2105-13-319.
19. Cleveland MA, Hickey JM, Forni S: A common dataset for genomic analysis of livestock populations. G3: Genes, Genomes, Genetics. 2012, 2 (4): 429–435.
20. Ma L, Runesha HB, Dvorkin D, Garbe J, Da Y: Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies. BMC Bioinformatics. 2008, 9 (1): 315. 10.1186/1471-2105-9-315.
21. Ma L, Wiggans G, Wang S, Sonstegard T, Yang J, Crooker B, Cole J, Van Tassell C, Lawlor T, Da Y: Effect of sample stratification on dairy GWAS results. BMC Genomics. 2012, 13 (1): 536. 10.1186/1471-2164-13-536.
22. Hu G, Wang C, Da Y: Genomic heritability estimation for the early life-history transition related to propensity to migrate in wild rainbow and steelhead trout populations. Ecol Evol. 2014, 4 (8): 1381–1388. 10.1002/ece3.1038.
23. Wang C, Prakapenka D, Wang S, Runesha HB, Da Y: GVCBLUP: a computer package for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. Version 3.3. Department of Animal Science, University of Minnesota. 2013
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.