- Methodology article
- Open Access

# GLOSSI: a method to assess the association of genetic loci-sets with complex diseases

- High-Seng Chai
^{1}, - Hugues Sicotte
^{1}, - Kent R Bailey
^{1}, - Stephen T Turner
^{2}, - Yan W Asmann
^{1}and - Jean-Pierre A Kocher
^{1}Email author

**10**:102

https://doi.org/10.1186/1471-2105-10-102

© Chai et al; licensee BioMed Central Ltd. 2009

**Received: **07 July 2008

**Accepted: **03 April 2009

**Published: **03 April 2009

## Abstract

### Background

The developments of high-throughput genotyping technologies, which enable the simultaneous genotyping of hundreds of thousands of single nucleotide polymorphisms (SNP) have the potential to increase the benefits of genetic epidemiology studies. Although the enhanced resolution of these platforms increases the chance of interrogating functional SNPs that are themselves causative or in linkage disequilibrium with causal SNPs, commonly used single SNP-association approaches suffer from serious multiple hypothesis testing problems and provide limited insights into combinations of loci that may contribute to complex diseases. Drawing inspiration from Gene Set Enrichment Analysis developed for gene expression data, we have developed a method, named GLOSSI (G ene-lo ci S et Analysi s), that integrates prior biological knowledge into the statistical analysis of genotyping data to test the association of a group of SNPs (loci-set) with complex disease phenotypes. The most significant loci-sets can be used to formulate hypotheses from a functional viewpoint that can be validated experimentally.

### Results

In a simulation study, GLOSSI showed sufficient power to detect loci-sets with less than 10% of SNPs having moderate-to-large effect sizes and intermediate minor allele frequency values. When applied to a biological dataset where no single SNP-association was found in a previous study, GLOSSI was able to identify several loci-sets that are significantly related to blood pressure response to an antihypertensive drug.

### Conclusion

GLOSSI is valuable for association of SNPs at multiple genetic loci with complex disease phenotypes. In contrast to methods based on the Kolmogorov-Smirnov statistic, the approach is parametric and only utilizes information from within the interrogated loci-set. It properly accounts for dependency among SNPs and allows the testing of loci-sets of any size.

## Keywords

- Minor Allele Frequency
- Atrial Natriuretic Peptide
- Single Nucleotide Polymorphism
- Hypothetical Gene
- Causal Locus

## Background

The genetic component of complex disorders such as hypertension, Parkinson's disease, cancer, and diabetes is believed to result from the compound effect of multiple DNA variations in different chromosomal regions. In this context, the paradigm of searching across the genome for univariate single nucleotide polymorphism (SNP) associations may not be the most appropriate or realistic strategy. A preferred approach would consider the effects of multiple SNPs jointly. Unstructured enumeration of all possible combinations of SNPs for association is computationally demanding, if not infeasible. Variable selection needs to be performed before testing such multi-locus effects due to the discrepancy between numbers of SNPs and sample size in a typical genome-wide association study. In the current work, we focused the proposed association analyses of SNPs belonging to genes that are biologically related. The criteria for grouping SNPs can be based on biological theory, expert opinion, or localization in genes that control the same functional process or are co-regulated. Such groups of SNPs will be referred to as loci-sets. We have developed a method called GLOSSI (G ene-lo ci S et Analysi s) to score loci-sets as a function of the significance level of the individual SNPs comprising each loci-set. In what follows, we will use the terms locus and SNP interchangeably.

The idea of directly scoring a predefined set of genetic features is not new. It has sparked considerable interest in the context of gene expression data analysis since the publication of the pioneering paper by Mootha *et al*. [1, 2]. These authors designated and implemented the Gene Set Enrichment Analysis (GSEA) approach to identify functionally related genes that display overall coordinated expression changes with respect to biological states or disease phenotypes. The annotated biological function is expected to be more relevant if the set is 'enriched' with genes showing good-to-moderate association signals as compared to the remaining genes.

Recently, Wang *et al*. [3] built on work of Subramanian *et al*. [2] and extended it to genotyping data. Since many SNPs can be assigned to the same gene, the authors used the best signal (biggest *χ*^{2}-value) from each gene in their calculation. Similar to GSEA, enrichment of association signals was measured by using a modified Kolmogorov-Smirnov (KS) statistic and statistical significance determined through permutation testing. One drawback of the KS statistic is that it depends, in part, on the signals outside of the tested loci-set. Put another way, it assumes the 'real' causal SNPs are fully contained in a single relevant loci-set, if such a set exists. In practice, causal SNPs can probably span across multiple loci-sets, without accounting for the imperfect SNP classification that might arise, for instance, from the empirical definition of the boundary of a loci-set. Under these conditions, application of the KS statistic will result in the attenuation of the overall significance of the relevant loci-set. Another limitation pointed out by the authors is the need to carry out the computationally demanding permutation of sample labels, instead of the faster gene label permutation to properly assess the statistical significance of the KS statistic. When many loci-sets have to be tested, the computational challenge is increased since a larger number of permutations have to be performed so as to detect significant association with correction for multiple hypothesis testing.

The method we describe below addresses these two issues. GLOSSI scores loci-sets by an alternative strategy that only focuses on information from within a loci-set and allows the determination of significance level with relative computational ease.

## Results and discussion

### Fisher's combined probability test

*I*independent subjects on

*J*loci, where the number of loci genotyped is typically much larger than the sample size,

*i.e*.

*J*>>

*I*. Let

*y*

_{ i }represents the phenotypic measurement for the

*i*-th subject,

*i*= 1,...,

*I*, with the phenotype being understood in a broad sense as for instance a binary or multiclass label, a continuous quantity, a censored variable, or even count data. For illustration purposes, we assume here that the phenotype is binary (coded as 0/1). Using a standard encoding of the genotype as a count of the major allele, we denote the genotyping data as:

*i*= 1,...,

*I*and

*j*= 1,...,

*J*. Also, let

*g*

_{ jk }be an indicator variable indexing the

*k*-th loci-set, i.e. for

*j*= 1,...,

*J*and

*k*= 1,...,

*K*(assuming there are

*K*loci-sets of interest)

A measure of statistical significance is first calculated between each of the *J* loci with a chosen binary phenotype (eg case versus control). Either allele or genotype frequencies can be used as the basis for testing a locus in terms of its ability to distinguish the two phenotypic classes under study. Various statistical approaches are appropriate for deriving the p-value, from contingency-table-based methods: Fisher's exact test, Pearson's *χ*^{2} test, or Cochran-Armitage trend test; to regression-based techniques: logistic analysis, probit analysis, or complementary-log-log analysis. These approaches could have widely different methodological assumptions, specifically on the way in which the phenotype depends on the loci (eg additive, recessive, dominant, or unconstrained). Because it is a common belief that the additive assumption is generally adequate for complex disorders, we opt for the Cochran-Armitage trend test in view of its statistical power. The trend test statistic is formulated in the Method Section. Henceforth, we denote p-value for the *j-* th locus by *p*_{
j
}.

The null hypothesis of no association between *y*_{
i
}and *s*_{
ij
}implies that *p*_{
j
}is distributed as a standard uniform random variable, taking values in the interval (0,1]. Furthermore, *t*_{
j
}= -2log *p*_{
j
}has a chi-square distribution with two degrees of freedom.

*p*

_{ j }of the

*k*-th loci-set follows a chi-square distribution with degrees of freedom equal 2 times the total number of SNPs in the set. The simplicity of this approach is very appealing. In addition, it was shown by Little and Folks [5, 6] that Fisher's combined probability test is asymptotically (as the number of tests is increased) Bahadur optimal. However, the assumption of independence is not tenable in a high-throughput GWAS even if only tag SNPs are used in the study. Basing inference on the independence assumption could therefore greatly inflate the apparent statistical significance of a loci-set, leading to more false positive findings. For this reason, we suggest correcting for correlation using Brown's approximation [7]:

with
in which Ω is the covariance matrix of *t*. Brown showed in his paper that the approximation works well in general except when *t*_{
j
}'s are highly negatively correlated. Since correlations between *t*_{
j
}s from two-sided tests can only be positive, the approximation should be adequate for most genetic association studies. Note that Ω is unknown and needs to be estimated. We chose to perform the estimation through shuffling the phenotype labels 100 times, though smaller number of permutations are often sufficient to attain a stable estimate for Ω. Details of the permutation scheme are deferred to the Method Section.

### Simulation study

In order to objectively assess the potential of the proposed methodology, we conducted a simulation study. A web-based tool, namely HapSample [8], was used to generate case-control samples with genetically realistic genotypes. We restricted the simulation to a subset of SNPs interrogated by the Sentrix^{®} HumanHap300 BeadChip [9], which consists mostly of tag SNPs. More explicitly, a total of 59,140 SNPs no more than 20 millions bases away from the end of each autosome were retained in the study. Of these, we filtered out 770 loci based on the following criteria: 737 are not in the HapMap phase I/II data [10]; 20 have a minor allele frequency (MAF) of zero according to the HapMap project (Utah population); and 13 lie near the edge of some chromosomes which possess linkage patterns (inferred from HapMap data) that are not compatible with the simulator recombination algorithm.

^{2}> 0.8 via Hapmap) in Scenarios 19–21, as containing a high risk allele. Note that either none or only one SNP per autosome was assumed causative in compliance with the constraint of HapSample. Moreover, we assumed additivity on the phenotype-genotype relationships and considered the causal SNPs as independent from each other. All SNPs located within 15 kilobases upstream or downstream (average size of a real gene) of the 44 selected loci constitute a loci-set. This resulted in 231 members in the MAF = 0.05 loci-set while the size of the other set was 254.

Parameter specification in the simulated examples

Scenario | Number of causal SNPs | RR (OR)* | MAF | Whether causal SNPs were 'genotyped' |
---|---|---|---|---|

1–3 | 1, 5 or 20 | 1.07 (1.10) | 0.25 | Yes |

4–6 | 1, 5 or 20 | 1.34 (1.52) | 0.05 | Yes |

7–9 | 1, 5 or 20 | 1.34 (1.50) | 0.25 | Yes |

10–12 | 1, 5, or 20 | 1.61 (2.00) | 0.05 | Yes |

13–15 | 1, 5, or 20 | 1.61 (1.94) | 0.25 | Yes |

16–18 | 1 or 5 | 2.00 (2.67) | 0.25 | Yes |

19–21 | 1, 5, or 17 | 1.61 (~1.94) | ~0.25 | No |

*α*= 0.05, 95% CI 0.036 – 0.064;

*α*= 0.001, 95% CI 0.004 – 0.016;

*α*= 0.001 95% CI 0 – 0.003). This leads us to conclude that GLOSSI offer adequate results in the true null scenario.

Estimated type I error rates for GLOSSI in the simulated examples

Total sample size | Nominal rate, | Proportion of p-value < |
---|---|---|

400 | 0.05 | 0.057 |

0.01 | 0.011 | |

0.001 | 0 | |

1200 | 0.05 | 0.045 |

0.01 | 0.012 | |

0.001 | 0.002 | |

2000 | 0.05 | 0.049 |

0.01 | 0.010 | |

0.001 | 0.003 |

It is of interest to compare the performance of GLOSSI against the modified KS approach proposed by Wang *et al*. [3]. To this end, eight additional non-overlapping regions (hypothetical genes) were randomly picked from each chromosome. We altered the extension of the newly selected regions to either double, equal or halve the size of the hypothetical genes in the loci-set. These were used to create a reference distribution in the modified KS test. Without loss of generality, we focused the comparison on the null and two alternative hypotheses with the use of our data set of 200 cases and 200 controls. Scenarios 17 and 18 were chosen here because their powers were near 80% in the case of GLOSSI.

Estimated type I error rates for the modified KS statistic

Type of permutation | Relative size of genes: out/in loci-set | Proportion of p-value < 0.05 | 95% CI |
---|---|---|---|

Phenotype | 50% | 0.052 | (0.038,0.066) |

100% | 0.053 | (0.039,0.067) | |

200% | 0.051 | (0.037,0.065) | |

Gene | 50% | 0.27 | (0.24,0.29) |

100% | 0.046 | (0.033,0.059) | |

200% | 0.004 | (0,0.008) |

Power of the modified KS statistic when two genes in the reference set consist of a causal SNP

Number of genes outside of loci-set | Relative size of genes: out/in loci-set | Proportion of p-value < 0.05 | |
---|---|---|---|

Scenario 17 | Scenario 18 | ||

50% | 0.464 | 0.842 | |

176 | 100% | 0.581 | 0.854 |

200% | 0.684 | 0.860 | |

50% | 0.371 | 0.804 | |

88 | 100% | 0.495 | 0.829 |

200% | 0.629 | 0.854 | |

50% | 0.145 | 0.683 | |

44 | 100% | 0.187 | 0.732 |

200% | 0.161 | 0.773 |

### Antihypertensive response example

^{®}Human Mapping 100 K Set [12] were available for 194 African Americans (97 good and 97 poor responders, 50 women and 47 men in each of these response groups) and 195 non-Hispanic Caucasians (98 good – 42 females and 56 males; and 97 poor – 42 females and 55 males responders). Genotype calls were made through the use of Dynamic Modeling algorithm [13]. The following SNPs were excluded from the analysis: on the X-chromosome, monomorphic, MAF < 2%, "call"-rate < 80% or deviated from Hardy-Weinberg equilibrium at p < .001. Thus our illustration is based on 102,334 and 95,221 post-filtering SNPs in the black and white samples, respectively. See Turner

*et al*. [14] for a detailed description of the study design and procedures taken to preprocess the Affymetrix data. We derived loci-sets from 1412 generic and human-specific functional sets of the publicly accessible Molecular Signature Database (MSigDB version 2.1 [15]). Only SNPs located within 5000 base pairs upstream or downstream of a gene (defined using Affymetrix build na24 annotation files) were considered relevant to that gene. Table 5 reports the most significant loci-sets from applying GLOSSI on all non-empty loci-sets (1405 for Whites and 1404 for Blacks). q-values [16, 17] were calculated to guard against the cost of multiple hypothesis testing. This provides an expected proportion of false positives among loci-sets with unadjusted p-values at least as extreme as the current set of interest.

Loci-sets with unadjusted p-value no greater than 0.1% in the antihypertensive response example

Loci-set | MsigDB ID | No. SNP | No. relevant gene | p-value | q-value |
---|---|---|---|---|---|

| |||||

TPO signaling pathway | c2:338 | 48 | 10 | 0.0001 | 0.035 |

Erk1/Erk2 Mapk signaling pathway | c2:178 | 74 | 16 | 0.0001 | 0.035 |

Sprouty regulation of tyrosine kinase signals | c2:316 | 36 | 10 | 0.0001 | 0.035 |

Multiple antiapoptotic pathways from IGF-1R signaling lead to bad phosphorylation | c2:214 | 24 | 8 | 0.0002 | 0.035 |

PTEN pathway | c2:557 | 30 | 8 | 0.0002 | 0.035 |

Transcription factor CREB and its extracellular signals | c2:152 | 83 | 16 | 0.0002 | 0.035 |

Growth hormone signaling pathway | c2:198 | 50 | 11 | 0.0002 | 0.035 |

PTEN dependent cell cycle arrest and apoptosis | c2:292 | 24 | 8 | 0.0003 | 0.035 |

Upregulated in acute rejection transplanted kidney biopsies | c2:834 | 132 | 25 | 0.0003 | 0.035 |

IL 3 signaling pathway | c2:223 | 18 | 6 | 0.0003 | 0.035 |

Trka receptor signaling pathway | c2:339 | 41 | 5 | 0.0003 | 0.035 |

IL-2 receptor beta chain in T cell activation | c2:222 | 40 | 11 | 0.0004 | 0.035 |

B cell antigen receptor | c2:569 | 49 | 18 | 0.0004 | 0.035 |

IL 4 receptor signaling in B lymphocytes | c2:563 | 39 | 12 | 0.0004 | 0.035 |

Calcium signaling by HBx of Hepatitis B virus | c2:569 | 16 | 4 | 0.0005 | 0.035 |

Glycogen processing | c2:602 | 39 | 8 | 0.0005 | 0.035 |

IGF-1 signaling pathway | c2:213 | 31 | 9 | 0.0005 | 0.035 |

Down regulated following Apc loss | c2:1048 | 156 | 32 | 0.0005 | 0.035 |

Liver selective | c2:979 | 300 | 104 | 0.0005 | 0.035 |

TrkA receptor | c2:559 | 19 | 6 | 0.0006 | 0.035 |

Inhibition of cellular proliferation by gleevec | c2:199 | 37 | 10 | 0.0006 | 0.035 |

IL 6 signaling pathway | c2:226 | 25 | 8 | 0.0006 | 0.035 |

Insulin signaling pathway | c2:229 | 26 | 8 | 0.0007 | 0.039 |

Upregulated in fibroblasts following infection with human cytomegalovirus | c2:1269 | 131 | 24 | 0.0008 | 0.040 |

Down regulated by both curcumin and sulindac in SW260 colon carcinoma cells | c2:1412 | 50 | 10 | 0.0010 | 0.047 |

Upregulated by TPA in resistant HL-525 cells | c2:1679 | 90 | 19 | 0.0010 | 0.048 |

| |||||

Upregulated by UV-B light in epidermal keratinocytes | c2:1717 | 55 | 12 | 0.0004 | 0.56 |

Upregulated in well functioning transplanted kidney biopsies | c2:836 | 1347 | 285 | 0.0009 | 0.63 |

GLOSSI reported 26 loci-set with a q-value lower than 5% in Whites but no loci-set passed this cutoff in Blacks (Table 5). The size of the 26 loci-sets ranges from 16 to 300 SNPs. This result is quite encouraging since single SNP methods previously applied to the same datasets could not detect any SNP that was statistically significantly associated with DBP response to hydrochlorothiazide (unpublished results). Among the top ranking loci-sets of the populations, two were derived from the same gene expression experiment of kidney transplant biopsies [18]. These loci-sets are 'upregulated in acute rejection transplanted kidney biopsies' (MsigDB ID = c2:834, p-value = 0.0003) for non-Hispanic Caucasians and 'upregulated in well functioning transplanted kidney biopsies' (MsigDB ID = c2:836, p-value = 0.0009) for African Americans. Although biological interpretation of the results is not straightforward, one can hypothesize that genes in those two loci-sets are related to kidney pathophysiology or normal physiology and, therefore, may be relevant to sodium excretion, blood pressure regulation, and DBP response to diuretic therapy. One could also speculate that the different physiological mechanisms indexed by these two loci-sets are consistent with known differences in diuretic response between Black and White individuals with hypertension.

Other loci-sets are less informative and harder to interpret. Inspection of their names suggests that several of the significant loci-sets in Whites could conceivably be involved in regulation of antihypertensive drug response. These include 'growth hormone signaling pathway' (MsigDB ID = c2:198), 'calcium signaling by HBx of Hepatitis B virus' (MsigDB ID = c2:569), and 'insulin signaling pathway' (MsigDB ID = c2:229). The relationship of some other loci-sets with DBP response to hydrochlorothiazide requires a more speculative interpretation. For example, a few of them appear to be related to cell growth regulation but with no obvious relationship to blood pressure. However, a possible connection could exist through mitogenic hormones that are often vasoconstrictive and antinatriuretic and, therefore, would elevate blood pressure (eg, angiotensin II). Conversely, vasodilating and natriuretic hormones that lower blood pressure are often anti-mitogenic (eg, atrial natriuretic peptide).

## Conclusion

The GLOSSI methodology for scoring loci-sets (a priori defined groups of SNPs) overcomes limitations of commonly-used single SNP approaches. The origin of a loci-set facilitates the interpretation of statistical outputs, providing a biological understanding of the mechanisms that underlie diseases or other phenotypes of interest. In contrast to the approach of Wang *et al*. [3], the proposed procedure is parametric: it assumes that p-values from individual SNPs follow a standard uniform distribution under the null hypothesis of no association and infers statistical relevance of each loci-set against a *χ*^{2} distribution. Consequently it has the advantage of computational speed, demands measurements only of SNPs within the query loci-set, and imposes no constraint on the size of the set. Although we only focus on binary phenotypes in this communication, the technique is general and equally applicable to other kinds of outcomes or any types of genome-scale data. In particular, the locus-specific p-values could be generated by statistical methods equipped with the ability to control for the presence of covariates (eg age, gender, etc). Appropriate adjustment for additional covariates would allow more accurate estimation of the true genotype-phenotype effect. The performance of the proposed method was evaluated by using computer simulated data as well as data from an antihypertensive pharmacogenomic study. In the simulation study, GLOSSI yielded the anticipated type-I error rate when no SNP in the loci-set was related to the binary outcome. Also, it demonstrated sufficiently high power for detecting loci-sets in which a fair number of SNPs (< 10%) had moderate to large effect sizes and intermediate MAF values. In the real data example, the proposed method appears to have been able to identify novel loci-sets not previously known or suspected to be involved in blood pressure regulation or antihypertensive drug response.

The lack of firm biological interpretation in the antihypertensive response example underlines one of the limitations of our method. Although GLOSSI is capable of detecting relevant loci-sets as demonstrated in the simulation experiment, its usefulness depends directly on the definition and availability of loci-sets when applying it to biological data. The currently available functionally annotated loci-sets are biased toward groups of genes involved in cancers since most of them were derived from such disease studies but very few of them focus on blood pressure or kidney-related investigations. Undoubtedly, more annotated and curated loci-sets will be available over time, which in turn will increase the applicability of GLOSSI for a given disease phenotype. The definition of a loci-set itself can also be challenged. The current assignment of SNPs to a gene, according to fixed physical distance boundaries from that gene, might not be optimal, not even in principle, let alone given the uncertainty in determining the appropriate fixed distance.

It must be stressed that GLOSSI only accounts for the additive, independent effect of individual SNPs and, therefore, ignores possible biological interactions that might exist. The joint effect of SNPs within a loci-set can be captured using multivariate methods [19–21]. However, a fair comparison of multivariate models derived from various loci-sets is hard to achieve since it demands sample label permutation testing. More specifically, the statistical model needs to be rebuilt for every loci-set in each permutation, which quickly becomes impractical as the numbers of loci-sets and permutations increase. Other complications that might arise during the application of multivariate analysis include overfitting and model instability. To balance the need for joint effects modeling with computational time effectiveness, one can envision developing a hierarchical approach that first uses GLOSSI for rapid identification of significant loci-sets followed by more extensive multivariate modeling. This approach is currently being investigated in our group.

## Methods

### Cochran-Armitage trend test

with *δ*(.) signifies an indicator function taking value one if its argument is correct and zero otherwise. The null hypothesis of the test is no linear trend in the proportion of group memberships at each of the three SNP genotypes, i.e. the proportion of 0/1 class distinction is the same for all levels. The statistic follows a standard normal distribution under this null hypothesis. Hence the evidence of association (p-value) between *y*_{
i
}and *s*_{
ij
}can be inferred by comparing *T*_{
j
}^{2} to a *χ*^{2} distribution with 1 degree of freedom, i.e. this will be a two-sided test.

### Estimating covariance matrix by permutation

The existence of local LD implies that *t*_{
j
}, *j* = 1,..., *J*, are not independent. Their covariance matrix, Ω, under the null hypothesis can be estimated through the use of permutation as follows. Assume that the phenotype measurements are independently and identically distributed. The subscript of *y*_{1}, *y*_{2},..., *y*_{
I
}are first shuffled reiteratively. One could generate either all *I*! permissible permutations or just a random sample of them. Then recalculate *t*_{
j
}for all loci over every permuted datasets. Each of the resulting set of *t*-values represents a joint observation from the sampling distribution of *t* = (*t*_{
j
},..., *t*_{
j
})^{T} that is consistent with the null hypothesis. Given enough permutations, the empirical covariance of the *t*-values from the above should approximate Ω. Note that because not all SNPs are assigned to loci-sets, it is more computationally efficient to perform the calculations only on the relevant loci.

## Declarations

### Acknowledgements

This work was supported partly by HL 74735, HL 53330 and Mayo Foundation. We are grateful to the authors of HapSample, in particular Fred Wright, for their assistance in the simulation study.

## Authors’ Affiliations

## References

- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1
*α*-responsive genes involved in oxidative phosphorylation are coordinately down-regulated in human diabetes.*Nat Genet*2003, 34: 267–273. 10.1038/ng1180View ArticlePubMedGoogle Scholar - Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.
*Proc Natl Acad Sci USA*2005, 102: 15545–15550. 10.1073/pnas.0506580102PubMed CentralView ArticlePubMedGoogle Scholar - Wang K, Li M, Bucan M: Pathway-based approaches for analysis of genome-wide association studies.
*Am J Hum Genet*2007, 81: 1278–1283. 10.1086/522374PubMed CentralView ArticlePubMedGoogle Scholar - Fisher RA:
*Statistical methods for research workers*. London: Oliver and Boyd; 1932.Google Scholar - Little RC, Folks JL: Asymptotic optimality of Fisher's method of combining independent tests.
*J Am Stat Assoc*1971, 66: 802–806. 10.2307/2284230View ArticleGoogle Scholar - Little RC, Folks JL: Asymptotic optimality of Fisher's method of combining independent tests II.
*J Am Stat Assoc*1973, 68: 193–194. 10.2307/2284167View ArticleGoogle Scholar - Brown MB: A method for combining non-independent, one-sided tests of significance.
*Biometrics*1975, 31: 987–992. 10.2307/2529826View ArticleGoogle Scholar - Wright FA, Huang H, Guan X, Gamiel K, Jeffries C, Barry WT, Pardo-Manuel F, Sullivan PF, Wihelmsen KC, Zou F: Simulating association studies: a data-based resampling method for candidate regions or whole genome scans.
*Bioinformatics*2007, 23: 2581–2588. 10.1093/bioinformatics/btm386View ArticlePubMedGoogle Scholar - Illumina Inc[http://www.illumina.com]
- International HapMap Project[http://www.hapmap.org]
- Chapman AB, Schwartz GL, Boerwinkle E, Turner ST: Predictors of antihypertensive response to a standard dose of hydrochlorothiazide for essential hypertension.
*Kidney Int*2002, 61: 1047–1055. 10.1046/j.1523-1755.2002.00200.xView ArticlePubMedGoogle Scholar - Affymetrix Inc[http://www.affymetrix.com]
- Cutler DJ, Zwick ME, Carrasquillo MM, Yohn CT, Tobin KP, Kashuk C, Mathews DJ, Shah NA, Eichler EE, Warrington JA, Chakravarti A: High-throughput variation detection and genotyping using microarrays.
*Genome Res*2001, 11: 1913–1925.PubMed CentralPubMedGoogle Scholar - Turner ST, Bailey KR, Fridley BL, Chapman AB, Schwartz GL, Chai HS, Sicotte H, Kocher JPA, Rodin AS, Boerwinkle E: Large-scale genomic association analysis suggests a pharmacogenomic locus on chromosome 12 influencing antihypertensive response to thiazide diuretic.
*Hypertension*2008, 52: 359–365. 10.1161/HYPERTENSIONAHA.107.104273PubMed CentralView ArticlePubMedGoogle Scholar - Molecular Signature Database – Broad Institute, Cambridge MA[http://www.broad.mit.edu/gsea/msigdb]
- Storey JD: direct approach to false discovery rates.
*J Roy Stat Soc B*2002, 64: 479–498. 10.1111/1467-9868.00346View ArticleGoogle Scholar - Storey JD: The positive false discovery rate: a Bayesian interpretation and the q-value.
*Ann Stat*2003, 31: 2013–2035. 10.1214/aos/1074290335View ArticleGoogle Scholar - Flechner SM, Kurian SM, Head SR, Sharp SM, Whisenant TC, Zhang J, Chismar JD, Horvath S, Mondala T, Gilmartin T, Cook DJ, Kay SA, Walker JR, Salomon DR: Kidney transplant rejection and tissue injury by gene profiling of biopsies and peripheral blood lymphocytes.
*Am J Transplant*2004, 4: 1475–1489. 10.1111/j.1600-6143.2004.00526.xPubMed CentralView ArticlePubMedGoogle Scholar - Dinu V, Zhoa H, Miller PL: Integrating domain knowledge with statistical and data mining methods for high-density genomic SNP disease association analysis.
*J Biomed Inform*2007, 40: 750–760. 10.1016/j.jbi.2007.06.002View ArticlePubMedGoogle Scholar - Lesnick TG, Papapetropoulos S, Mash DC, Ffrench-Mullen J, Shehadeh L, de Andrade M, Henley JR, Rocca WA, Ahlskog JE, Maragonore DM: A genomic pathway approach to a complex disease: axon guidance and Parkinson disease.
*PLoS Genet*2007, 3: e98. 10.1371/journal.pgen.0030098PubMed CentralView ArticlePubMedGoogle Scholar - Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactor dimensionality reduction reveals high-order interactions among estrogen metabolism genes in sporadic breast cancer.
*Am J Hum Genet*2001, 69: 138–147. 10.1086/321276PubMed CentralView ArticlePubMedGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.