In this paper we discussed several selection rules for two-stage designs, where after an interim analysis only promising hypotheses are considered in the second stage.

For the choice of the selection rule, different criteria may apply. With the FNS design, the total number of observations is known in advance, which facilitates the planning of resources. However, this design does not adapt to the number of hypotheses that show an effect in the interim analysis. Such adaptation can be achieved with the FDRS design, where, on the other hand, the total number of observations is random and the planning of resources becomes more difficult. As an extension, one can consider an FDRS design where the overall number of observations (across all hypotheses and both stages) is fixed and the observations allocated to the second stage are equally distributed among the selected hypotheses. This comes at the cost of lower per-hypothesis power when the alternative holds for a larger number of hypotheses.
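
The fixed-budget extension can be illustrated with a minimal sketch. The function and parameter names (`second_stage_size`, `total_obs`, `m`, `n1`, `m_selected`) are hypothetical and not taken from the paper; the sketch assumes that every hypothesis receives the same number of first-stage observations and that the remaining budget is split equally, rounding down.

```python
def second_stage_size(total_obs, m, n1, m_selected):
    """Per-hypothesis second-stage sample size under a fixed overall budget.

    Assumptions (illustrative only): all m hypotheses receive n1 first-stage
    observations, and the remaining budget is split equally (rounded down)
    among the m_selected hypotheses carried into the second stage.
    """
    if m_selected <= 0:
        raise ValueError("at least one hypothesis must be selected")
    remaining = total_obs - m * n1
    if remaining < 0:
        raise ValueError("first stage alone exceeds the overall budget")
    return remaining // m_selected


# The more hypotheses are selected, the fewer observations each one receives.
few_selected = second_stage_size(total_obs=12000, m=1000, n1=6, m_selected=10)
many_selected = second_stage_size(total_obs=12000, m=1000, n1=6, m_selected=100)
```

With these illustrative numbers, selecting ten times as many hypotheses leaves each with a tenth of the second-stage observations, which is the mechanism behind the loss in per-hypothesis power.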

For the FNS design, the testing procedures provided sound FDR control in the considered scenarios where more than 5 hypotheses are selected for the second stage, for independent as well as for correlated data. For the modified FDRS procedure, FDR control is likewise given in all scenarios with *m*_{s} > 5. Comparing the integrated approaches for both selection rules with the corresponding pilot approaches showed an advantage of the integrated approach in many scenarios. This holds particularly for the FNS design but in many scenarios also for the FDRS design. The advantage of the integrated design increases with the proportion of observations allocated to the first stage. This is in line with earlier findings [7, 8], where scenarios with small first-stage sample sizes were considered and only small differences between the integrated and the pilot design were observed. In particular, if the effect sizes in microarray studies are low (as, e.g., shown in examples in [20]) and the number of observations in the first stage is sufficiently large compared to the number of observations for the second stage, the integrated design is superior.

On the other hand, using only the second-stage data for testing has the advantage of increased flexibility and simplicity. For example, the pilot FNS procedure controls the FDR even if the hypotheses for the second stage are selected in an arbitrary way. Furthermore, standard (non-sequential) tests can be applied and FDR control can be shown analytically under suitable assumptions.

In the simulations, the BH procedure was applied to the sequential *p*-values to control the FDR. As described above, this method is conservative if *π*_{0} < 1, as it actually controls the FDR at level *π*_{0}*α*. Following the suggestion of one of the referees, we additionally considered so-called adaptive FDR-controlling procedures that are based on an estimate of *π*_{0} (see Additional file 2). Under independence, these adaptive tests are less conservative than the BH tests but did not exceed the nominal level in the considered simulation scenarios. However, as shown earlier (e.g., [21]), under strong correlation adaptive procedures may inflate the FDR.
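
To make the difference between the BH procedure and an adaptive variant concrete, the following minimal sketch applies the BH step-up rule to a vector of *p*-values and optionally rescales the level by an estimate of *π*_{0}. The estimator shown is a Storey-type plug-in with a hypothetical tuning parameter `lam`; it is an assumption chosen for illustration and not necessarily the estimator used in Additional file 2. The function names are likewise hypothetical.

```python
import numpy as np


def bh_reject(pvals, alpha=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up rule.

    With pi0 = 1 this is the standard BH procedure; with an estimate
    pi0 < 1 the level is rescaled to alpha / pi0 (adaptive variant).
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha / pi0 * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its bound
        reject[order[:k + 1]] = True      # reject all p-values up to that rank
    return reject


def estimate_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true nulls, capped at 1.

    Illustrative assumption: lam = 0.5 is an arbitrary tuning choice.
    """
    p = np.asarray(pvals, dtype=float)
    return min(1.0, (np.sum(p > lam) + 1) / (p.size * (1 - lam)))


# Toy p-values (made up for illustration, not data from the paper).
p = [0.001, 0.012, 0.025, 0.045, 0.6]
n_bh = int(bh_reject(p, alpha=0.05).sum())
pi0_hat = estimate_pi0(p)
n_adaptive = int(bh_reject(p, alpha=0.05, pi0=pi0_hat).sum())
```

In this toy example the adaptive variant rejects one hypothesis more than the BH procedure, because the estimated *π*_{0} below 1 relaxes every threshold by the factor 1/*π*_{0}.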

It is well known that two-stage designs may lead to a considerable improvement in efficiency compared to single-stage designs [1–8], and this also applies to the procedures investigated in this paper (see Additional file 3 for a simulation study comparing the two-stage tests to corresponding single-stage designs). Furthermore, the methods can be extended to designs where an explicit early rejection boundary is applied in the interim analysis, as in many group-sequential applications. In this case the calculation of the sequential *p*-values is slightly modified (the integral boundaries depend on the early rejection boundaries). However, unless the fraction of hypotheses for which the alternative holds is large, it is expected that the addition of an early rejection boundary at the interim analysis has only a marginal impact on the efficiency of the procedure. Furthermore, for hypotheses that are rejected in the interim analysis based on few observations, a confirmation with a larger sample size might be important anyway.