A novel approach for multi-SNP GWAS and its application in Alzheimer’s disease
© Bodily et al. 2016
Published: 25 July 2016
Genome-wide association studies (GWAS) have effectively identified genetic factors for many diseases. Many diseases, including Alzheimer’s disease (AD), have epistatic causes, requiring more sophisticated analyses to identify groups of variants which together affect phenotype.
Based on the GWAS statistical model, we developed a multi-SNP GWAS analysis to identify pairs of variants whose common occurrence signaled the Alzheimer’s disease phenotype.
Despite not having sufficient data to demonstrate significance, our preliminary experimentation identified a high correlation between GRIA3 and HLA-DRB5 (an AD gene). GRIA3 has not been previously reported in association with AD, but is known to play a role in learning and memory.
Until recently linkage studies were the best approach to identify genes responsible for genetic diseases. However, linkage studies are most successful in monogenic disorders with highly penetrant variants, and were most often only possible in families. Unfortunately, the majority of genetic diseases are complex and therefore their genetic architectures could not be studied with linkage studies. Following completion of the human genome and the development of accurate SNP arrays, the Human HapMap project  cataloged the majority of common variants in the human genome. This catalog of SNPs facilitated the creation of genome-wide association studies (GWAS) . In a GWAS, the co-occurence of a given SNP and a phenotype are assessed. SNPs present (or absent) significantly more often in individuals with a particular phenotype or more extreme phenotype, are reported as disease markers (i.e., genomic variation correlated with a trait, but not necessarily causative). GWAS can be used to study both quantitative and binary phenotypes, and were the first effective approach for studying the genetics of complex traits.
In 2005, Haines et al.  conducted the first GWAS, examining statistical significance between single SNPs and age-related macular degeneration. In the decade since, the success of this technique has been used to identify genetic factors associated with dozens of traits (e.g., coronary heart disease, type-1 diabetes, type-2 diabetes, rheumatoid arthritis, Crohn’s disease, bipolar disorder, hypertension, Alzheimer’s disease, and others ). The GWAS catalog co-curated by the National Human Genome Research Institute and the European Bioinformatics Institute contains reported associations for thousands of GWAS .
Unfortunately, GWAS have several limitations. First, most GWAS markers are thought to be non-functional, so while the marker may provide insights into regions of the genome important to a particular phenotype, unless the marker is functional it typically does not provide information about the specific biological mechanisms driving disease. Second, the majority of GWAS markers have only a very modest effect on risk for disease. Third, despite thousands of GWAS performed using progressively bigger datasets, collectively GWAS SNPs only explain a portion—often a small portion—of the total estimated genetic variance for a given trait .
A number of explanations exist for the unexplained genetic variance. One possibility is that gene-gene (i.e., epistasis) interactions are a feature of the genetic architecture of these traits. GWAS assume an additive model (i.e., SNPs confer disease risk independent of other SNPs) and therefore cannot be used to detect epistatic interactions. Numerous approaches have been attempted to identify epistasis including multifactor dimensionality reduction, regression (i.e., GWAS with interaction terms), and others , each with different pros and cons. In this manuscript we present a novel multi-SNP GWAS approach for identifying epistatic interactions and demonstrate its utility in Alzheimer’s disease (AD).
Genes most highly associated with Alzheimer’s disease
This research used 802 whole genomes (279 control, 191 case, 332 unknown, and 444 males and 358 females) from the Alzheimer’s Disease Neuroimaging Intiative (ADNI). The genomes were processed by ADNI using the Burrows-Wheeler aligner (BWA)  and the best practices of the Genome Analysis Toolkit (GATK) . Genomes were obtained from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see www.adni-info.org.
Single-SNP GWAS analysis
In GWAS, LD is used to select which SNPs from the genome to analyze. For example, if two SNPs provide the exact same information (i.e., they always or almost always are inherited together), then only one of the SNPs is analyzed. This reduces the total number of tests (i.e., preserves statistical power) by eliminating redundant tests.
In our research, rather than using p-values from a regression to assess the relationship between a SNP (or multiple SNPs) and AD case/control status (i.e., which SNPs are correlated with AD), we calculated r 2 between each SNP in the dataset and AD case/control status. In this approach, the two events we are measuring are the co-occurence of a SNP and AD case/control status (i.e., do case status and a particular SNP co-occur with high confidence). We accomplished this by writing our own algorithm that computes LD between each SNP and AD case/control status.
An outline of our single-SNP GWAS algorithm is given in Algorithm 1. Note that there are multiple genotype values of A i because there are two haplotypes. The computation of Pearson’s r is as described above given the computed probabilities. This approach on a set of individuals, S, and a set of SNPs, L, runs in O(∥S∥×∥L∥) time and with O(1) space.
Multi-SNP GWAS analysis
To calculate the correlation of all SNPs with every other SNP would have exceeded our computational resources. We therefore calculated correlations between all SNPs and a subset of the SNPs that were located in genes with strongest and most consistent associations with AD (Table 1). Even this matrix was too large to fit into memory, so we computed subsections of the matrix in parallel on different machines. We found 2101 pairs of SNPs with an r 2 correlation greater than 0.04. These we plotted using R(Studio) to look for genes that had a (relatively) high correlation with other SNPs, as the relationship between two genes could provide important insights into disease processes.
An outline of our multi-SNP parallelized GWAS algorithm is given in Algorithm 2. The required START and END allow the task to be partitioned and run in parallel. There are multiple genotype values of A i and B j because there are two haplotypes. Our solution on on a set of individuals, S, and a set of SNPs, L, runs in O(∥S∥×∥L∥2/k) time and with O(∥L∥) space for each of k parallel runs (for our run k=201 with each run allocated 32GB of RAM and a wall time of 6 h).
Single-SNP GWAS analysis
SNPs with r 2≥ 0.025
Multi-SNP GWAS analysis
Among the 10 SNP pairs with strongest correlation, half of the pairings include a SNP from HLA-DRB5, two from INPP5D, two from FERMT2, and one from PTK2B. All but one of the SNPs from the top 10 pairs mentioned comes from a unique SNP with no prior reported AD involvement. Among the top 10, ELF4 had two SNPs with significant pairings. The first (rs6637686) was correlated with rs78623109 in FERMT2. The second (rs3788847) was correlated with rs377610860 in PTK2B.
Although our sample size is too small to identify statistically significant correlations, this work demonstrates the utility of taking a linkage disequilibrium approach in our single- and multi-SNP GWAS. First, we identified APOE as an important AD gene in the single-SNP GWAS using our new approach. This is hardly novel as two different APOE SNPs, rs7412 and rs429358, are the most established genetic protective/risk factors for AD . However, it is significant because the APOE effect is strong enough that we would expect to detect the signal in even a small dataset. In contrast, all of the other known variations that affect risk for AD are either very rare (PLD3 , APP , TREM2 , etc.), or have very small effect sizes . Consequently, because of our sample size it is not surprising that we did not detect any other Results of the AD markers in the single-SNP GWAS.
Next, we applied our multi-SNP GWAS to identify pairs of SNPs jointly correlated with AD. In principle, epistatic interactions can exist between two genes that individually are not correlated with disease. In this study we tested the pairing of each genic SNP in known AD genes with every other SNP in the dataset. In this analysis there were five strong bands corresponding with AD genes with a large number of identified SNPxSNP pairings.
Among the 10 strongest correlations, ELF4 had two correlated SNPs (one with FERMT2 and one with HLA-DRB5). ELF4 (E74-like factor 4) is a transcription factor involved in immunity and cell cycle control [18–20]. The function of FERMT2 is unknown. However, HLA-DRB5 has a direct role in immunity . From a biological standpoint, an interaction between HLA-DRB5 and ELF4 makes sense, and the immune system has a known role in AD .
GRIA3 (glutamate receptor, ionotropc, ampa 3) is another gene with SNPs of potential interest. A SNP in GRIA3 (rs7061304) was the second strongest correlation in our analyses (paired with rs67588672 in HLA-DRB5), and GRIA3 is among the non AD genes with the highest number of correlations. GRIA3 is a glutamate receptor. Glutamate receptors are the primary neurotransmitter receptors in human brains and GRIA3 specifically has a role in learning and memory. Furthermore, GRIA3 has been implicated in numerous disorders in the brain including bipolar disorder, mental retardation, and encephalopathy with epileptic seizures (Rasmussen encephalitis) [23, 24]. Finally, mutations in GRIA3 have been associated with cognitive impairment . Although GRIA3 has not been associated with AD, glutamate receptors have been studied (including members in the same family of genes as GRIA3) for their effect on Alzheimer’s disease based on the hypothesis that malfunction of glutamate receptors leads to AD-specific cell loss . When considering the function of GRIA3, especially in relationship with a known AD SNP (HLA-DRB5), GRIA3 is an attractive candidate gene for further studies.
In summary, we developed a novel multi-SNP GWAS method and demonstrated its utility in an AD dataset. Using this approach we identified potential epistatic interactions that affect risk for AD. GRIA3, in particular, is especially intriguing and warrants followup studies in larger datasets. Due to the difficulty identifying epistatic interactions, relatively few interactions are known. Future work will focus on developing an appropriate dataset with experimentally validated epistatic interactions for testing new models, integrating known biology of genes in identified interactions to identify mechanisms of synergistic functionality, and modification of the approach to more appropriately ascbribe statistical significance (i.e., p-values) to identified interactions.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (adni.loni.usc.edu) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. See http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf for a complete listing of ADNI investigators. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Publication of this article was funded by the Department of Biology and the College of Life Sciences at Brigham Young University.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 7, 2016: Selected articles from the 12th Annual Biotechnology and Bioinformatics Symposium: bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-7.
Availability of data and materials
Data is available from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). More information at http://adni.loni.usc.edu/data-samples/access-data/.
PMB, MSF, JTP formulated the study. MTWE and PGR aided with statistical analyses and biological insights. PMB, MSF, JTP, and PGR drafted the manuscript. MJC was involved in collaborative revision and direction of research. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Data and analyses in this manuscript were approved by the Brigham Young University Institutional Review Board.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H, Ch’ang L-Y, Huang W, Liu B, Shen Y, et al.The international hapmap project. Nature. 2003; 426(6968):789–96.View ArticleGoogle Scholar
- McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008; 9(5):356–69.View ArticlePubMedGoogle Scholar
- Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, Gallins P, Spencer KL, Kwan SY, Noureddine M, Gilbert JR, et al.Complement factor h variant increases the risk of age-related macular degeneration. Science. 2005; 308(5720):419–21.View ArticlePubMedGoogle Scholar
- Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ, et al.Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007; 447(7145):661–78.View ArticleGoogle Scholar
- Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, et al.The nhgri gwas catalog, a curated resource of snp-trait associations. Nucleic Acids Res. 2014; 42(D1):D1001–6.View ArticlePubMedGoogle Scholar
- Dickson SP, Wang K, Krantz I, Hakonarson H, Goldstein DB. Rare variants create synthetic genome-wide associations. PLoS Biol. 2010; 8(1):34.View ArticleGoogle Scholar
- Ebbert MT, Ridge PG, Kauwe JS. Bridging the gap between statistical and biological epistasis in alzheimer’s disease. BioMed Res Int. 2015; 201:121–9.Google Scholar
- Ridge PG, Ebbert MTW, Kauwe JSK. Genetics of alzheimer’s disease. BioMed Res Int. 2013; 2013:13. Article ID 254954. doi:10.1155/2013/254954.View ArticleGoogle Scholar
- Ebbert MT, Ridge PG, Wilson AR, Sharp AR, Bailey M, Norton MC, Tschanz JT, Munger RG, Corcoran CD, Kauwe JS. Population-based analysis of alzheimer’s disease risk alleles implicates genetic interactions. Biol Psychiatry. 2014; 75(9):732–7.View ArticlePubMedGoogle Scholar
- Ebbert MT, Boehme KL, Wadsworth ME, Staley LA, Mukherjee S, Crane PK, Ridge PG, Kauwe JS. Interaction between variants in clu and ms4a4e modulates alzheimer’s disease risk. Alzheimers Dement J Alzheimers Assoc. 2015; 12(2):121–9.View ArticleGoogle Scholar
- Cruchaga C, Karch CM, Jin SC, Benitez BA, Cai Y, Guerreiro R, Harari O, Norton J, Budde J, Bertelsen S, et al.Rare coding variants in the phospholipase d3 gene confer risk for alzheimer/’s disease. Nature. 2014; 505(7484):550–4.View ArticlePubMedGoogle Scholar
- Guerreiro R, Wojtas A, Bras J, Carrasquillo M, Rogaeva E, Majounie E, Cruchaga C, Sassi C, Kauwe JS, Younkin S, et al.Trem2 variants in alzheimer’s disease. N Engl J Med. 2013; 368(2):117–27.View ArticlePubMedGoogle Scholar
- Jonsson T, Atwal JK, Steinberg S, Snaedal J, Jonsson PV, Bjornsson S, Stefansson H, Sulem P, Gudbjartsson D, Maloney J, et al.A mutation in app protects against alzheimer/’s disease and age-related cognitive decline. Nature. 2012; 488(7409):96–9.View ArticlePubMedGoogle Scholar
- Ridge PG, Mukherjee S, Crane PK, Kauwe JS, et al.Alzheimer’s disease: analyzing the missing heritability. PloS ONE. 2013; 8(11):e79771.View ArticlePubMedPubMed CentralGoogle Scholar
- Li H, Durbin R. Fast and accurate long-read alignment with burrows–wheeler transform. Bioinformatics. 2010; 26(5):589–95.View ArticlePubMedPubMed CentralGoogle Scholar
- DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Del Angel G, Rivas MA, Hanna M, et al.A framework for variation discovery and genotyping using next-generation dna sequencing data. Nat Genet. 2011; 43(5):491–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Lambert J-C, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, Jun G, DeStefano AL, Bis JC, Beecham GW, et al.Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for alzheimer’s disease. Nat Genet. 2013; 45(12):1452–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Miyazaki Y, Sun X, Uchida H, Zhang J, Nimer S. Mef, a novel transcription factor with an elf-1 like dna binding domain but distinct transcriptional activating properties. Oncogene. 1996; 13(8):1721–9.PubMedGoogle Scholar
- Aryee D, Petermann R, Kos K, Henn T, Haas O, Kovar H. Cloning of a novel human elf-1-related ets transcription factor, elfr, its characterization and chromosomal assignment relative to elf-1. Gene. 1998; 210(1):71–8.View ArticlePubMedGoogle Scholar
- Yamada T, Park CS, Mamonkin M, Lacorazza HD. Transcription factor elf4 controls the proliferation and homing of cd8+ t cells via the krüppel-like factors klf4 and klf2. Nat Immunol. 2009; 10(6):618–26.View ArticlePubMedPubMed CentralGoogle Scholar
- Tieber V, Abruzzini L, Didier D, Schwartz B, Rotwein P. Complete characterization and sequence of an hla class ii dr beta chain cdna from the dr5 haplotype. J Biol Chem. 1986; 261(6):2738–42.PubMedGoogle Scholar
- McGeer P, Akiyama H, Itagaki S, McGeer E. Immune system response in alzheimer’s disease. Can J Neurol Sci Le journal canadien des sciences neurologiques. 1989; 16(4 Suppl):516–27.Google Scholar
- Gécz J, Barnett S, Liu J, Hollway G, Donnelly A, Eyre H, Eshkevari HS, Baltazar R, Grunn A, Nagaraja R, et al.Characterization of the human glutamate receptor subunit 3 gene (gria3), a candidate for bipolar disorder and nonspecific x-linked mental retardation. Genomics. 1999; 62(3):356–68.View ArticlePubMedGoogle Scholar
- Rogers SW, Andrews PI, Gahring LC, Whisenand T, Cauley K, Crain B, Hughes TE, Heinemann SF, McNamara JO. Autoantibodies to glutamate receptor glur3 in rasmussen’s encephalitis. Science. 1994; 265(5172):648–51.View ArticlePubMedGoogle Scholar
- Wu Y, Arai AC, Rumbaugh G, Srivastava AK, Turner G, Hayashi T, Suzuki E, Jiang Y, Zhang L, Rodriguez J, et al.Mutations in ionotropic ampa receptor 3 alter channel properties and are associated with moderate cognitive impairment in humans. Proc Natl Acad Sci. 2007; 104(46):18163–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Pellegrini-Giampietro D, Bennett M, Zukin R. Ampa/kainate receptor gene expression in normal and alzheimer’s disease hippocampus. Neuroscience. 1994; 61(1):41–9.View ArticlePubMedGoogle Scholar