Skip to main content
  • Poster presentation
  • Open access
  • Published:

Evaluation of two-step iterative resampling procedure for internal validation of genome-wide association studies


Genome-wide association (GWA) studies have successfully identified many common genetic variants associated with complex diseases over the past decade. The standard method for validating the top single nucleotide polymorphisms (SNPs) identified in GWA studies is to identify and replicate the findings in external cohorts. However, in cases of rare diseases like retinoblastoma and Ewing sarcoma that have prevalence rates of about 0.000001, it can be difficult to find an external validation cohort within a reasonable time frame. Even when disease outcomes are not rare, it can be hard to find a suitable external validation cohort.

Materials and methods

In these situations, resampling approaches or two-stage validation approaches such as the one proposed by Yang et al. [1] have been used to identify SNPs associated with the outcome of interest. The two-stage validation approach proposed in [1] is based on dividing the original cohort equally into discovery and replication cohorts. A SNP is considered discovered and replicated if the p-values corresponding to the tests are significant at levels α1 and α2 in the two stages. This process is repeated 100 times and a SNP is considered to be “validated” if the SNP is discovered and replicated at least 10 times in 100 repetitions. However, there was no justification provided for the 50:50 split and the statistical properties of the approach were also not evaluated. In this paper, we undertook extensive simulation studies to assess the effect of various split ratios, various type I error cut-offs for stage I, and different cut-offs used in 100 repetitions on the performance of the approach regarding the overall type I error and statistical power. Two independent simulation studies, one for the binary phenotype and the other for continuous phenotype, were conducted.


Our simulation results indicate that, for GWA studies, the two-stage approach with a 70:30 split and a cut-off of 20 in 100 replications is reasonable as it is able to maintain an overall type I error at α1α2 and provides near “optimal” power for internally validated SNPs. We then applied this approach to the two GWA study examples and validated the SNPs identified in the original GWA studies. The first GWA study was conducted to identify SNPs associated with obesity (evaluated as a binary phenotype) in survivors of childhood cancer treated with cranial radiation, while the aim of the second study was to identify SNPs associated with heart failure (evaluated as a continuous phenotype) in cancer survivors who received cardiotoxic therapy.


  1. Yang JJ, Cheng C, Devidas M, Cao X, Campana D, Yang W, Fan Y, Neale G, Cox N, Scheet P, Borowitz MJ, Winick NJ, Martin PL, Bowman WP, Camitta B, Reaman GH, Carrroll WL, Willman CL, Hunger SP, Evans WE, Pui CH, Loh M, Relling MV: Genome-wide association study identifies germline polymorphisms associated with relapse of childhood acute lymphoblastic leukemia. Blood. 2012, 120 (20): 4197-4204. 10.1182/blood-2012-07-440107.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Kumar Srivastava.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kang, G., Liu, W., Cheng, C. et al. Evaluation of two-step iterative resampling procedure for internal validation of genome-wide association studies. BMC Bioinformatics 15 (Suppl 10), P34 (2014).

Download citation

  • Published:

  • DOI: