MARV: a tool for genome-wide multi-phenotype analysis of rare variants
© The Author(s). 2017
Received: 21 June 2016
Accepted: 6 February 2017
Published: 16 February 2017
Genome-wide association studies have enabled identification of thousands of loci for hundreds of traits. Yet, for most human traits a substantial part of the estimated heritability is unexplained. This and recent advances in technology to produce high-dimensional data cost-effectively have led to method development beyond standard common variant analysis, including single-phenotype rare variant and multi-phenotype common variant analysis, with the latter increasing power for locus discovery and providing suggestions of pleiotropic effects. However, there are currently no optimal methods and tools for the combined analysis of rare variants and multiple phenotypes.
We propose a user-friendly software tool MARV for Multi-phenotype Analysis of Rare Variants. The tool is based on a method that collapses rare variants within a genomic region and models the proportion of minor alleles in the rare variants on a linear combination of multiple phenotypes. MARV provides analyses of all phenotype combinations within one run and calculates the Bayesian Information Criterion to facilitate model selection. The running time increases with the size of the genetic data while the number of phenotypes to analyse has little effect both on running time and required memory. We illustrate the use of MARV with analysis of triglycerides (TG), fasting insulin (FI) and waist-to-hip ratio (WHR) in 4,721 individuals from the Northern Finland Birth Cohort 1966. The analysis suggests novel multi-phenotype effects for these metabolic traits at APOA5 and ZNF259, and at ZNF259 provides stronger support for association (P TG+FI = 1.8 × 10−9) than observed in single phenotype rare variant analyses (P TG = 6.5 × 10−8 and P FI = 0.27).
MARV is a computationally efficient, flexible and user-friendly software tool allowing rapid identification of rare variant effects on multiple phenotypes, thus paving the way for novel discoveries and insights into biology of complex traits.
Keywords: Rare variant analysis; Multi-phenotype analysis; High-dimensional data
In the past decade, genetic locus discovery for human traits and diseases has been advanced via genome-wide association studies (GWAS). Recent improvements in technology to produce genotype data in a very cost- and time-effective manner and powerful easy-to-use software tools have played a major role in these advances, facilitating fast analysis of constantly increasing amounts of data. Clearly, the next advances in the field of genomics will be based on large-scale sequencing and other high-dimensional omics data. A key challenge for successful utilisation of these data lies, once again, in the availability of powerful methods and user-friendly software tools that enable researchers to make rapid discoveries.
Large-scale sequencing efforts, such as the 1000 Genomes Project or, more recently, the UK10K Project and the Haplotype Reference Consortium, have enabled better characterization of variation in the human genome, especially in the low-frequency and rare variant range. Here, we refer to all variants with minor allele frequency (MAF) < 5% as rare variants (RVs). Imputation based on the variant density detected by these projects yields high-quality genotype data even down to 0.01% allele frequency. The generation of large-scale sequencing data encourages method and software development for elucidating RV effects, since traditional single-variant methods are underpowered to detect RV associations. Several methods and related software tools have been proposed, including burden tests using collapsing techniques, variance-component tests and combinations of the two.
There has also been increasing interest in the analysis of high-dimensional phenotypic and omic data, such as metabolomics, in relation to human genome variation. Multi-phenotype analysis (MPA), i.e. joint analysis of multiple phenotypes, is an example of recent developments in the field. Several methods and related software for single-variant MPA, including Bayesian and frequentist approaches, have recently been published. The MPA approach is motivated by several factors: 1) it boosts power for locus discovery [8–11]; 2) it provides more precise parameter estimates; and 3) it has biological advantages, including the possibility to identify multi-phenotype effects such as pleiotropy, where one locus affects multiple phenotypes. The power improvement offered by the MPA approach is especially relevant from a computational point of view: to enable the discovery of further loci for complex traits, analyses will need to be based on hundreds of thousands of individuals, such as those available from the UK Biobank and other new large-scale efforts based on sequencing. Storage and computational load for such amounts of data will pose a challenge, so strategies that boost power for locus discovery without increasing sample size bring an enormous advantage.
We propose a novel tool, MARV, for RV MPA, which enables joint analysis of large-scale, high-dimensional genomic and phenotypic data. It extends the burden test for RVs to high-dimensional phenotypic data by applying the MPA approach. Methods designed for MPA of RVs have recently been proposed [14–16], but these have limitations regarding scalability and the ability to combine continuous and discrete phenotypes and, more importantly, regarding the associated software: the tools either lack an easy user interface or are computationally inefficient, and both features are key to facilitating fast discoveries. MARV enables analysis of both continuous and binary phenotypes, and of genotyped, imputed or sequenced data. It is computationally efficient for large-scale data. From a user point of view, it accepts the standard data formats used in other GWAS software, and analyses are run through a command line interface familiar from widely used GWAS software such as Plink and SNPTEST. This enables researchers to move quickly and effortlessly from standard single-variant, single-phenotype GWAS to region-based analysis of multiple phenotypes.
The method on which MARV is based is briefly introduced in Methods and is described extensively, including power simulations, elsewhere. MARV is written in C++ and has a command line user interface. A single run of MARV consists of just one step; the required input files, commands and resulting output files are described below.
Data input and commands
The user then needs to specify the phenotypes to be analysed (--pheno_name), each corresponding to a column name in the sample file, and the method to use for the analysis: either the genotype dosages derived by the software from the imputation probabilities (--method expected) or the thresholded genotypes based on a pre-defined cut-off (--method threshold, with a default cut-off of 0.9, which can be changed with the --call_thres option) (Fig. 1). Additionally, the user may specify several other options, such as individuals or SNPs to extract or exclude from the analysis. It is important to specify the threshold used for the minor allele frequency (--rare_thresh, by default 0.05, which means that only variants with MAF < 5% are included in the analysis). All available options of the latest version of MARV are listed in the online manual of MARV.
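As an illustration of the two genotype-handling methods described above, the following Python sketch shows how an expected dosage and a thresholded call could be derived from a triple of imputation probabilities. It is not taken from the MARV source code: the function names, the ordering of the probabilities (AA, AB, BB) and the choice to count copies of allele B are assumptions made for this example.

```python
from typing import Optional, Sequence


def expected_dosage(probs: Sequence[float]) -> float:
    """Expected number of B alleles given (P(AA), P(AB), P(BB)).

    Assumption: the probabilities are ordered AA, AB, BB and B is the
    allele being counted.
    """
    _p_aa, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb


def thresholded_genotype(probs: Sequence[float], cutoff: float = 0.9) -> Optional[int]:
    """Most probable genotype (0, 1 or 2 copies of B) if its probability
    reaches the cut-off; otherwise None, i.e. the genotype is missing."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= cutoff else None


if __name__ == "__main__":
    probs = (0.02, 0.93, 0.05)
    print(expected_dosage(probs))                     # dosage close to 1
    print(thresholded_genotype(probs))                # called at cut-off 0.9
    print(thresholded_genotype(probs, cutoff=0.95))   # missing at cut-off 0.95
```

With a stricter cut-off more genotypes become missing, which in turn lowers the number of successfully called variants per individual.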
MARV works across the genome by going through the specified gene regions one by one. Based on the gene boundaries and the desired rare variant cut-off, it calculates, for each individual, the proportion of minor alleles at rare variants within the region. After this calculation has been performed for all individuals, a linear regression is fitted with the proportion as the outcome and the listed phenotypes as predictors. The likelihood contribution of each individual is further weighted by the number of successfully genotyped/imputed RVs in the region of interest. For each genomic region, weighted linear regression is performed for all phenotype combinations. For example, if a user specifies phenotypes pheno_a and pheno_b, three models for the proportion are fitted with the following predictor combinations: 1) pheno_a + pheno_b, 2) pheno_a, 3) pheno_b. MARV calculates the Bayesian Information Criterion (BIC) for each model to help the user identify the best-fitting phenotype combination.
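The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MARV implementation: the proportion is taken to be the number of minor alleles divided by twice the number of called variants, the weighting is modelled as a Gaussian likelihood with variance sigma^2 / w_i, the BIC counts the regression coefficients plus the variance as parameters, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def rare_allele_proportion(genotypes: Sequence[Optional[int]]) -> Tuple[float, int]:
    """Return (proportion of minor alleles, number of called rare variants).

    `genotypes` holds one individual's minor-allele counts (0, 1 or 2) at the
    rare variants of a region; None marks a missing genotype.
    Assumption: proportion = minor alleles / (2 * called variants).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0
    return sum(called) / (2 * len(called)), len(called)


def phenotype_combinations(names: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty phenotype subsets, largest first (2**k - 1 models)."""
    return [c for k in range(len(names), 0, -1) for c in combinations(names, k)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weighted(y: Sequence[float], X: Sequence[Sequence[float]],
                 w: Sequence[float]) -> Tuple[List[float], float, float]:
    """Weighted least squares of y on an intercept plus the columns of X.

    Returns (coefficients, log-likelihood, BIC). Weights must be positive,
    so individuals without any called variant have to be dropped beforehand.
    """
    n = len(y)
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtwx = [[sum(w[i] * rows[i][j] * rows[i][k] for i in range(n))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w[i] * rows[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtwx, xtwy)
    resid = [y[i] - sum(b * v for b, v in zip(beta, rows[i])) for i in range(n)]
    sigma2 = sum(w[i] * resid[i] ** 2 for i in range(n)) / n
    loglik = sum(0.5 * math.log(w[i] / (2 * math.pi * sigma2))
                 - w[i] * resid[i] ** 2 / (2 * sigma2) for i in range(n))
    bic = (p + 1) * math.log(n) - 2 * loglik
    return beta, loglik, bic


def fit_all_models(prop: Sequence[float], weights: Sequence[float],
                   phenos: Dict[str, List[float]]) -> Dict[Tuple[str, ...], Tuple[List[float], float, float]]:
    """Fit one weighted regression per phenotype combination."""
    results = {}
    for combo in phenotype_combinations(list(phenos)):
        X = [[phenos[name][i] for name in combo] for i in range(len(prop))]
        results[combo] = fit_weighted(prop, X, weights)
    return results


if __name__ == "__main__":
    prop = [0.0, 0.1, 0.05, 0.2, 0.0, 0.15]   # made-up proportions
    weights = [3, 4, 4, 2, 3, 4]              # called rare variants per individual
    phenos = {"pheno_a": [-1.0, 0.5, 0.1, 1.2, -0.7, 0.9],
              "pheno_b": [0.3, -0.2, 1.0, 0.4, -1.1, 0.0]}
    for combo, (_beta, _loglik, bic) in fit_all_models(prop, weights, phenos).items():
        print(" + ".join(combo), round(bic, 2))
```

Because the number of models grows as 2^k − 1 with k phenotypes, enumerating them lazily per region, as here, avoids holding all fits in memory at once.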
MARV produces three files by default: an .error file, a .log file and a .result file (Fig. 1). The error file is empty if the run was completed successfully; otherwise it reports details about problems encountered during the run (Fig. 2). The log file gives specific details of the analysis, including the number of samples in the sample file and genotype file, and the number of phenotypes used for the analysis. It also lists the variants included in the analysis of each genomic region, along with their MAFs. The results file includes one row of results per genomic region. If the user requests printing of all possible model combinations (--print_all), there are as many rows per gene as there were model combinations fitted. This file reports the log-likelihood, the BIC and the P-value of each model. We note that the P-value is not corrected for multiple testing. If the user is interested in the effect estimates and their standard errors for each phenotype included in the fitted model, a separate .betas file can be requested (--betas) (Figs. 1 and 2). A complete list of the columns in the output file with their meanings is provided in the online tutorial of MARV.
To illustrate the use of MARV across the genome, we have applied it to data from the Northern Finland Birth Cohort 1966 (NFBC1966), which covers over 96% of all births in the two northernmost provinces of Finland in 1966 (N = 12,068 live-born children). We included data from 4,721 cohort members who had participated in the 31-year clinical examination and had genetic data as well as data on triglycerides (TG), fasting insulin (FI) and waist-to-hip ratio (WHR). The Ethics Committees of the University of Oulu and Northern Ostrobothnia Hospital District have approved the study. Individuals used for the analyses have provided written, informed consent.
Motivation for the selection of the traits comes from a common variant single-trait GWAS, which showed an enrichment of FI associations among SNPs preselected on Metabochip for TG and waist phenotypes. For the selected traits, we applied the following exclusion criteria: 1) FI: exclude non-fasting individuals and/or those having type 1 or 2 diabetes mellitus or on diabetes treatment or having fasting blood glucose ≥ 7 mmol/l and/or being pregnant, 2) TG: exclude non-fasting individuals and/or individuals known to be on lipid-lowering medication. We regressed each trait on sex, body mass index and the first three principal components derived from the genetic data to control for potential population structure. An inverse normal transformation was further applied to the residuals of WHR and TG to reduce skewness.
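The inverse normal transformation mentioned above can be sketched as follows. The text does not state which variant of the transformation was used, so this example assumes the common rank-based form with the Blom offset (3/8); the function name and the handling of ties by average rank are likewise assumptions.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_normal_transform(values: Sequence[float], offset: float = 0.375) -> List[float]:
    """Rank-based inverse normal transform; tied values share their average rank.

    Assumption: Blom offset, i.e. quantile = (rank - 3/8) / (n + 1/4).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j to the end of the current group of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


if __name__ == "__main__":
    skewed_residuals = [3.2, 0.1, 15.0, 0.7, 1.4]  # made-up, right-skewed values
    print(inverse_normal_transform(skewed_residuals))
```

The transformation keeps the ordering of the residuals but replaces their values with standard normal quantiles, which removes the skewness by construction.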
DNA was extracted from blood samples drawn after overnight fasting at the 31-year clinical examination. Genotyping was performed with the Illumina HumanCNV370DUO Analysis BeadChip platform at the Broad Institute, USA, with the Beadstudio algorithm being used for genotype calling. Detailed genotyping and sample quality control (QC) of the first set of data have been reported before. Additional samples were genotyped afterwards, resulting in 5,402 subjects and 324,896 SNPs available for analysis. The 1000 Genomes Project “all ancestries” reference panel (March 2012) was used for imputation, resulting in ~38 M SNPs for analysis.
We analysed the transformed residuals in MARV with the method “threshold” (option -m threshold), i.e. genotypes with probability of 0.95 or higher were considered called, whilst all others were considered missing. The gene list from the University of California Santa Cruz (UCSC, NCBI genome sequence build 37, hg19) was used to define gene regions, and a level of significance of 1.67 × 10−6 was adopted based on a Bonferroni correction for 30,000 genes. We analysed all variants irrespective of their annotation across autosomal chromosomes using the following cut-offs: MAF < 5% and imputation quality > 0.4.
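The significance level quoted above follows from simple arithmetic, shown here as a short Python sketch. The family-wise error rate of 0.05 is an assumption: the text does not state it, but it is the value that reproduces 1.67 × 10−6 for 30,000 genes. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: alpha = 0.05; 30,000 gene regions as stated in the text.
threshold = bonferroni_threshold(0.05, 30_000)
print(f"{threshold:.2e}")  # prints 1.67e-06

# P-values quoted for ZNF259 in the abstract: TG + FI jointly, and FI alone.
for label, p_value in [("TG + FI", 1.8e-9), ("FI", 0.27)]:
    print(label, p_value < threshold)
```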
Results and discussion
Table 1. Results for loci reaching genome-wide significance in the multi-phenotype rare variant analysis of NFBC1966 (N = 4,721). Regression coefficients with their standard errors (SE) are reported, followed by the P-value and the Bayesian Information Criterion (BIC) for the analysed model. TG, triglycerides; ln(FI), natural logarithm transformed fasting insulin; WHR, waist-to-hip ratio

| Model | APOA5 (Chr 11: 116,660,086-116,663,136): P-value; BIC | ZNF259 (Chr 11: 116,649,276-116,658,739): P-value; BIC |
| --- | --- | --- |
| TG + ln(FI) + WHR, full model a | 3.32 × 10−8; −19877.3 | 6.3 × 10−8; −25069.6 |
| TG + ln(FI) | 2.00 × 10−8; −19883.5 | 1.8 × 10−9; −25077.3 |
| TG + WHR | 3.34 × 10−7; −19877.9 | 4.1 × 10−7; −25066.5 |
| ln(FI) + WHR | 9.15 × 10−8; −19885.1 | 6.5 × 10−8; −25074.7 |
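To show how the BIC values can guide model selection, the sketch below ranks the models reported above for ZNF259, i.e. the locus with P = 1.8 × 10−9 for TG + ln(FI). The model labels and BIC values are copied as they appear in the table; the variable names are hypothetical, and "lower BIC is better" is the usual convention rather than something specific to MARV.

```python
# BIC per model for ZNF259, as listed in the results table above.
znf259_bic = {
    "TG + ln(FI) + WHR": -25069.6,
    "TG + ln(FI)": -25077.3,
    "TG + WHR": -25066.5,
    "ln(FI) + WHR": -25074.7,
}

# The model with the lowest BIC is preferred.
best_model = min(znf259_bic, key=znf259_bic.get)

# Difference in BIC between each model and the preferred one.
delta_bic = {model: round(bic - znf259_bic[best_model], 1)
             for model, bic in znf259_bic.items()}

print(best_model)
for model, delta in sorted(delta_bic.items(), key=lambda item: item[1]):
    print(f"{model}: delta BIC = {delta}")
```

Among the listed models, TG + ln(FI) has the lowest BIC, in line with it also having the smallest P-value at this locus.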
Common variants at these two identified genes have previously been associated with TG, total cholesterol, high-density lipoprotein, low-density lipoprotein, apolipoprotein A1 and B, coronary heart disease, coronary artery disease, plasma viscosity, Lp-PLA2 activity, prostate cancer, and circulating vitamin E levels [25–33]. A recent large-scale GWAS also reported RV associations at ZNF259 with triglyceride levels. Our analysis pointed to multi-phenotype effects with TG and FI. A recent study in Japanese individuals showed evidence for associations between variation in ZNF259 and type 2 diabetes, making this locus of interest for further investigation in the pathogenesis of the disease. Interestingly, however, in our MPA the effects of TG and FI on the rare allele load at ZNF259 were in opposite directions, contrary to our expectations, since elevated TG levels usually correlate with elevated rather than decreased FI levels.
Running time and memory
We measured the running time and memory usage of MARV by performing additional analyses on the NFBC1966 data with different numbers of individuals and phenotypes and on chromosomes of different sizes. For these analyses, we used 2,405 and 4,809 (i.e. approximately double the first) individuals with complete data on eight continuous phenotypes. We analysed combinations of two, four and eight continuous phenotypes and used 1000 Genomes imputed data for chromosomes 1 and 22 in the association analyses. All analyses were run, and their performance data collected, on the Imperial College HPC Cluster. The compute nodes were equipped with Intel(R) Xeon(R) E5-2620 v2 CPUs @ 2.10 GHz.
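Table 2. Computational time and peak memory usage of MARV by varying sample size, chromosomal size and number of phenotypes. Cells give the running time (hours:minutes:seconds) with the peak memory usage in parentheses.

| Sample size | Number of phenotypes (number of fitted models) | Chr 1 (249 Mbp) | Chr 22 (35 Mbp) |
| --- | --- | --- | --- |
| N = 2,405 | 2 (3) | 4:06:54 (215 MB) | 00:38:04 (260 MB) |
| N = 2,405 | 4 (15) | 3:51:23 (215 MB) | 00:38:37 (260 MB) |
| N = 2,405 | 8 (255) | 5:07:11 (215 MB) | 00:55:58 (260 MB) |
| N = 4,809 | 2 (3) | 14:47:34 (780 MB) | 02:26:00 (500 MB) |
| N = 4,809 | 4 (15) | 13:40:11 (780 MB) | 02:26:10 (600 MB) |
| N = 4,809 | 8 (255) | 17:26:08 (780 MB) | 03:03:00 (600 MB) |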
The memory usage of MARV depends more on the size of the genetic data and the number of individuals analysed than on the number of phenotypes. In our example, the peak memory usage was almost constant across the chromosome 1 and chromosome 22 analyses when the sample size remained the same, independent of the number of phenotypes in the model (Table 2). Considering the size difference between these two chromosomes (Table 2), we note, however, that the increase in memory usage is not linear.
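The scaling behaviour can be quantified directly from the running times in Table 2, as in the sketch below. It assumes that the three rows listed under each sample size correspond, in order, to two, four and eight phenotypes; the dictionary layout and function names are hypothetical.

```python
from typing import Dict, Tuple


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' running time to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (sample size, number of phenotypes, chromosome) -> running time from Table 2.
# Assumption: rows within each sample size are ordered as 2, 4, 8 phenotypes.
RUN_TIMES: Dict[Tuple[int, int, int], str] = {
    (2405, 2, 1): "4:06:54", (2405, 2, 22): "00:38:04",
    (2405, 4, 1): "3:51:23", (2405, 4, 22): "00:38:37",
    (2405, 8, 1): "5:07:11", (2405, 8, 22): "00:55:58",
    (4809, 2, 1): "14:47:34", (4809, 2, 22): "02:26:00",
    (4809, 4, 1): "13:40:11", (4809, 4, 22): "02:26:10",
    (4809, 8, 1): "17:26:08", (4809, 8, 22): "03:03:00",
}


def time_ratio(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> float:
    """Running time of configuration a relative to configuration b."""
    return to_seconds(RUN_TIMES[a]) / to_seconds(RUN_TIMES[b])


if __name__ == "__main__":
    # Effect of doubling the sample size (chromosome 1, two phenotypes).
    print(round(time_ratio((4809, 2, 1), (2405, 2, 1)), 2))
    # Effect of going from two to eight phenotypes (chromosome 1, N = 2,405).
    print(round(time_ratio((2405, 8, 1), (2405, 2, 1)), 2))
```

Under these assumptions, doubling the sample size multiplies the chromosome 1 running time by roughly 3.6, whereas going from two to eight phenotypes (3 to 255 fitted models) multiplies it by only about 1.2.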
Our novel tool MARV allows for RV analysis of multiple phenotypes in a computationally efficient and user-friendly manner. The data input formats and the command line interface familiar from widely-used GWAS software will offer researchers a quick setup for the analyses. Moreover, the feature of analysing all phenotype combinations within one run and the calculation of BIC to help in model selection will pave the way for rapid discoveries and novel insights into biology of complex traits.
The type I error rate and power of the method have been tested under various scenarios with simulated phenotype and genotype data, and the results from these analyses are described in detail elsewhere.
Availability and requirements
Project name: MARV
Project home page: https://github.com/ImperialStatGen/MARV
Operating system(s): UNIX
Programming language: C++
Other requirements: Standard Linux/UNIX build tools to compile the program.
License: BSD 3-Clause License
Any restrictions to use by non-academics: None
Abbreviations
BIC: Bayesian information criterion
GWAS: Genome-wide association studies
MAF: Minor allele frequency
MARV: Multi-phenotype analysis of rare variants
NFBC: Northern Finland Birth Cohort
Northern Finland Birth Cohort (NFBC1966) would like to thank the late professor Paula Rantakallio (launch of NFBC1966), the participants in the 31 year study and the NFBC project center.
This work used the computing resources of the UK MEDical BIOinformatics partnership - aggregation, integration, visualisation and analysis of large, complex data (UK MED-BIO), which is supported by the Medical Research Council [grant number MR/L01632X/1], and of the Imperial College High Performance Computing Service (http://www.imperial.ac.uk/admin-services/ict/self-service/research-support/hpc/).
MK is funded by the European Commission under the Marie Curie Intra-European Fellowship (project MARVEL (WPGA-P48951)). APM is a Wellcome Trust Senior Fellow in Basic Biomedical Science (WT098017). IP was in part funded by the Elsie Widdowson Fellowship. NFBC1966 received financial support from University of Oulu Grant no. 65354, Oulu University Hospital Grant no. 2/97, 8/97, Ministry of Health and Social Affairs Grant no. 23/251/97, 160/97, 190/97, National Institute for Health and Welfare, Helsinki Grant no. 54121, Regional Institute of Occupational Health, Oulu, Finland Grant no. 50621, 54231.
Availability of data and materials
The MARV software tool is freely available at URL: https://github.com/ImperialStatGen/MARV.
The Northern Finland Birth Cohort data that were used for the application of the developed tool are available upon collaboration and formal data request only; please see http://www.oulu.fi/nfbc/node/18136. The results from the case study are available as Additional files 1, 2, 3 and 4.
MK, RM, APM and JH wrote the MARV code and developed the software. MK, RM, APM and IP developed the methodology. MK performed the analyses and drafted the manuscript. JH tested the running time and memory usage. MRJ contributed to NFBC1966 data acquisition and study design. All authors read, edited and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The Ethics Committees of the University of Oulu and Northern Ostrobothnia Hospital District have approved the study. Individuals used for the analyses have provided written, informed consent.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Schatz MC. Biological data sciences in genome research. Genome Res. 2015;25:1417–22.
- McVean GA, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, Flicek P, Gabriel SB, Gibbs RA, Green ED, Hurles ME, Knoppers BM, Korbel JO, Lander ES, Lee C, Lehrach H, Mardis ER, Marth GT, McVean GA, Nickerson DA, Schmidt JP, Sherry ST, Wang J, Wilson RK, Gibbs RA, Dinh H, Kovar C, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(V):56–65.
- Walter K, Min JL, Huang J, Crooks L, Memari Y, McCarthy S, Perry JRB, Xu C, Futema M, Lawson D, Iotchkova V, Schiffels S, Hendricks AE, Danecek P, Li R, Floyd J, Wain LV, Barroso I, Humphries SE, Hurles ME, Zeggini E, Barrett JC, Plagnol V, Brent Richards J, Greenwood CMT, Timpson NJ, Durbin R, Soranzo N, Bala S, Clapham P, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526:82–90.
- The Haplotype Reference Consortium [http://www.haplotype-reference-consortium.org/home]. Accessed 8 Feb 2017.
- Huang J, Howie B, McCarthy S, Memari Y, Walter K, Min JL, Danecek P, Malerba G, Trabetti E, Zheng H-F, UK10K Consortium, Gambaro G, Richards JB, Durbin R, Timpson NJ, Marchini J, Soranzo N. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat Commun. 2015;6:8111.
- Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: Study designs and statistical tests. Am J Hum Genet. 2014;95:5–23.
- Galesloot TE, Van Steen K, Kiemeney LA, Janss LL, Vermeulen SH. A comparison of multivariate genome-wide association methods. PLoS One. 2014;9:1–8.
- Amos CI, Laing A. A comparison of univariate and multivariate tests for genetic linkage. Genet Epidemiol. 1993;10:671–6.
- Allison DB, Thiel B, St Jean P, Elston RC, Infante MC, Schork NJ. Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages. Am J Hum Genet. 1998;63:1190–201.
- Banerjee S, Yandell BS, Yi NJ. Bayesian quantitative trait loci mapping for multiple traits. Genetics. 2008;179(August):2275–89.
- Kim S, Xing EP. Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 2009;5:e1000587.
- Jiang C, Zeng ZB. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995;140:1111–27.
- Shriner D. Moving toward system genetics through multiple trait analysis in genome-wide association studies. Front Genet. 2012;3(January):1.
- Zhao J, Thalamuthu A. Gene-based multiple trait analysis for exome sequencing data. BMC Proc. 2011;5 Suppl 9:S75.
- Marttinen P, Gillberg J, Havulinna A, Corander J, Kaski S. Genome-wide association studies with high-dimensional phenotypes. Stat Appl Genet Mol Biol. 2013;12:413–31.
- Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, Xiong M, Wu CO, Fan R. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol. 2015;39:259–75.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(September):559–75.
- Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–13.
- Kaakinen M, Mägi R, Fischer K, Heikkinen J, Järvelin M-R, Morris AP, Prokopenko I. A rare variant test for high-dimensional data. Eur J Hum Genet. 2017. Under revision.
- Mägi R, Asimit JL, Day-Williams AG, Zeggini E, Morris AP. Genome-wide association analysis of imputed rare variants: application to seven common complex diseases. Genet Epidemiol. 2012;36:785–96.
- Rantakallio P. Groups at risk in low birth weight infants and perinatal mortality. Acta Paediatr Scand. 1969;193 Suppl 193:1+.
- Scott R, Lagou V, Welch RP, Wheeler E, Montasser ME, Luan J, Mägi R, Strawbridge RJ, Rehnberg E, Gustafsson S, Kanoni S, Rasmussen-Torvik LJ, Yengo L, Lecoeur C, Shungin D, Sanna S, Sidore C, Johnson PCD, Jukema JW, Johnson T, Mahajan A, Verweij N, Thorleifsson G, Hottenga J-J, Shah S, Smith AV, Sennblad B, Gieger C, Salo P, Perola M, et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet. 2012;44:991–1005.
- Sabatti C, Service SK, Hartikainen A-L, Pouta A, Ripatti S, Brodsky J, Jones CG, Zaitlen NA, Varilo T, Kaakinen M, Sovio U, Ruokonen A, Laitinen J, Jakkula E, Coin L, Hoggart C, Collins A, Turunen H, Gabriel S, Elliot P, McCarthy MI, Daly MJ, Järvelin M-R, Freimer NB, Peltonen L. Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet. 2009;41:35–46.
- Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hickey G, Hinrichs AS, Hubley R, Karolchik D, Learned K, Lee BT, Li CH, Miga KH, Nguyen N, Paten B, Raney BJ, Smit AF, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC genome browser database: 2015 update. Nucleic Acids Res. 2015;43(November 2014):D670–81.
- Ganna A, Salihovic S, Sundström J, Broeckling CD, Hedman ÅK, Magnusson PKE, Pedersen NL, Larsson A, Siegbahn A, Zilmer M, Prenni J, Ärnlöv J, Lind L, Fall T, Ingelsson E. Large-scale metabolomic profiling identifies novel biomarkers for incident coronary heart disease. PLoS Genet. 2014;10:e1004801.
- Feitosa MF, Wojczynski MK, Straka R, Kammerer CM, Lee JH, Kraja AT, Christensen K, Newman AB, Province MA, Borecki IB. Genetic analysis of long-lived families reveals novel variants influencing high density-lipoprotein cholesterol. Front Genet. 2014;5(June):159.
- Major JM, Yu K, Wheeler W, Zhang H, Cornelis MC, Wright ME, Yeager M, Snyder K, Weinstein SJ, Mondul A, Eliassen H, Purdue M, Hazra A, McCarty CA, Hendrickson S, Virtamo J, Hunter D, Chanock S, Kraft P, Albanes D. Genome-wide association study identifies common variants associated with circulating vitamin E levels. Hum Mol Genet. 2011;20:3876–83.
- Major JM, Yu K, Weinstein SJ, Berndt SI, Hyland PL, Yeager M, Chanock S, Albanes D. Genetic variants reflecting higher vitamin e status in men are associated with reduced risk of prostate cancer. J Nutr. 2014;144:729–33.
- Cha S, Yu H, Park AY, Song KH. Effects of apolipoprotein A5 haplotypes on the ratio of triglyceride to high-density lipoprotein cholesterol and the risk for metabolic syndrome in Koreans. Lipids Health Dis. 2014;13:45.
- Gaunt TR, Zabaneh D, Shah S, Guyatt A, Ladroue C, Kumari M, Drenos F, Shah T, Talmud PJ, Casas JP, Lowe G, Rumley A, Lawlor DA, Kivimaki M, Whittaker J, Hingorani AD, Humphries SE, Day IN. Gene-centric association signals for haemostasis and thrombosis traits identified with the HumanCVD BeadChip. Thromb Haemost. 2013;110:995–1003.
- Grallert H, Dupuis J, Bis JC, Dehghan A, Barbalic M, Baumert J, Lu C, Smith NL, Uitterlinden AG, Roberts R, Khuseyinova N, Schnabel RB, Rice KM, Rivadeneira F, Hoogeveen RC, Fontes JD, Meisinger C, Keaney JF, Lemaitre R, Aulchenko YS, Vasan RS, Ellis S, Hazen SL, Van Duijn CM, Nelson JJ, März W, Schunkert H, McPherson RM, Stirnadel-Farrant H, Psaty BM, et al. Eight genetic loci associated with variation in lipoprotein-associated phospholipase A2 mass and activity and coronary heart disease: Meta-analysis of genome-wide association studies from five community-based studies. Eur Heart J. 2012;33(September 2010):238–51.
- Waterworth DM, Ricketts SL, Song K, Chen L, Zhao JH, Ripatti S, Aulcheko Y, Zhang W, Yuan X, Lim N. Genetic variants influencing circulating lipid levels and risk of coronary artery disease. Atheroscler Thromb Vasc Biol. 2014;30:2264–76.
- Suchindran S, Rivedal D, Guyton JR, Milledge T, Gao X, Benjamin A, Rowell J, Ginsburg GS, McCarthy JJ. Genome-wide association study of Lp-PLA(2) activity and mass in the Framingham Heart Study. PLoS Genet. 2010;6:e1000928.
- Surakka I, Horikoshi M, Mägi R, Sarin A-P, Mahajan A, Lagou V, Marullo L, Ferreira T, Miraglio B, Timonen S, Kettunen J, Pirinen M, Karjalainen J, Thorleifsson G, Hägg S, Hottenga J-J, Isaacs A, Ladenvall C, Beekman M, Esko T, Ried JS, Nelson CP, Willenborg C, Gustafsson S, Westra H-J, Blades M, de Craen AJM, de Geus EJ, Deelen J, Grallert H, et al. The impact of low-frequency and rare variants on lipid levels. Nat Genet. 2015;47:589–97.
- Tokoro F, Matsuoka R, Abe S, Arai M, Noda T, Watanabe S, Horibe H, Fujimaki T, Oguri M, Kato K, Minatoguchi S, Yamada Y. Association of a genetic variant of the ZPR1 zinc finger gene with type 2 diabetes mellitus. Biomed Rep. 2015;3:88–92.
- O’Reilly PF, Hoggart CJ, Pomyen Y, Calboli FCF, Elliott P, Jarvelin M-R, Coin LJM. MultiPhen: Joint model of multiple phenotypes can increase discovery in GWAS. PLoS One. 2012;7:e34861.
- Mägi R, Suleimanov YV, Clarke GM, Kaakinen M, Fischer K, Prokopenko I, Morris AP. SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes. BMC Bioinformatics. 2017. Accepted.