Background
Next-generation sequencing technology is changing genomic research because its large sequencing capacity makes it possible to identify rare susceptibility variants that affect complex diseases. Because sequencing remains costly, a two-stage extreme phenotype sequencing (TS-EPS) design has been proposed as an alternative approach [1]. In stage I, genetic variants are discovered by whole-genome or whole-exome sequencing of individuals with extreme phenotypes; in stage II, associated variants are detected by sequencing a large number of individuals at the discovered variants. TS-EPS can efficiently discover more than half of the causal variants using about 0.2% of all individuals [2] and can therefore have higher power than random sampling for a given sample size and given effect sizes of the causal variants [2, 3]. Using simulated data for unrelated individuals, we further explore the efficiency of TS-EPS in terms of different sample sizes and varying effect sizes.
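To make the stage I selection step concrete, the following minimal Python sketch simulates a quantitative phenotype for unrelated individuals and selects the two phenotypic tails for sequencing. It is an illustration only, not the simulation procedure used in this study: the function names (simulate_cohort, select_extremes, count_carriers), the single-variant additive model with standard normal noise, and all parameter values are assumptions introduced here.

```python
import random


def simulate_cohort(n, maf, effect, rng):
    """Simulate n unrelated individuals as (phenotype, carrier) pairs.

    Assumed model (hypothetical): one rare variant with minor allele
    frequency `maf`; carriers have their phenotype shifted by `effect`
    on top of standard normal noise.
    """
    cohort = []
    for _ in range(n):
        # Draw two independent alleles; a carrier has at least one rare allele.
        allele_a = rng.random() < maf
        allele_b = rng.random() < maf
        carrier = allele_a or allele_b
        phenotype = rng.gauss(0.0, 1.0) + (effect if carrier else 0.0)
        cohort.append((phenotype, carrier))
    return cohort


def select_extremes(cohort, k):
    """Stage I selection: the k lowest and k highest phenotypes."""
    ranked = sorted(cohort, key=lambda pair: pair[0])
    return ranked[:k] + ranked[-k:]


def count_carriers(sample):
    """Number of individuals in the sample who carry the rare variant."""
    return sum(1 for _, carrier in sample if carrier)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    cohort = simulate_cohort(n=10000, maf=0.005, effect=1.5, rng=rng)
    k = 10  # 2k = 20 individuals, i.e. 0.2% of the cohort
    extremes = select_extremes(cohort, k)
    randoms = rng.sample(cohort, 2 * k)
    print("carriers among extremes:", count_carriers(extremes))
    print("carriers among random sample:", count_carriers(randoms))
```

Under this assumed model, carriers are enriched in the tail toward which the variant shifts the phenotype, which is the intuition behind sequencing extremes rather than a random sample of the same size. The enrichment seen in any single run depends on the seed and on the chosen effect size.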