Average ROC curves for 13 Latin Square experiments. The performance of ranking true and false positives for pairs of N = 1 experiments is depicted. The first experiment from 13 2× Latin Square experiments was selected for analysis. For each of the 13 comparisons, an ROC curve was generated; shown is the average of all 13 ROC curves. Figure 1A shows the full-scale performance across all false positives. Figure 1B is a zoomed-in view of 1A, with the x- and y-axes rescaled to show detail at restrictive cutoffs with few false positives. Figure 1C is a box plot of the number of true positives (TP) detected at an arbitrary cutoff of 4 false positives (FP; vertical dashed line in 1B).
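
The averaging described in the caption could be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how the 13 curves were averaged, so the sketch assumes vertical averaging (mean TP count at each fixed FP count), assumes higher scores rank first, and does not treat tied scores specially. The function names `tp_at_fp_counts` and `average_roc` and their parameters are hypothetical, not taken from the original analysis.

```python
import numpy as np

def tp_at_fp_counts(scores, is_true, max_fp):
    """TP count reached once k FP have been admitted, for k = 0..max_fp.

    Assumption: higher score means stronger evidence; ties are broken
    by input order (stable sort), which the source does not specify.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    labels = np.asarray(is_true, dtype=bool)[order]
    tp_cum = np.cumsum(labels)    # true positives seen up to each rank
    fp_cum = np.cumsum(~labels)   # false positives seen up to each rank
    out = np.empty(max_fp + 1)
    for k in range(max_fp + 1):
        # Last rank at which no more than k false positives were admitted.
        idx = np.searchsorted(fp_cum, k, side="right") - 1
        out[k] = tp_cum[idx] if idx >= 0 else 0
    return out

def average_roc(comparisons, max_fp):
    """Mean TP count at each FP count over all comparisons.

    `comparisons` is a list of (scores, is_true) pairs, one per comparison.
    """
    curves = np.vstack([tp_at_fp_counts(s, t, max_fp) for s, t in comparisons])
    return curves.mean(axis=0)
```

Under these assumptions, the quantity summarized in Figure 1C would correspond to element 4 of each per-comparison curve returned by `tp_at_fp_counts`, collected across the 13 comparisons.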