From: Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments
Filter | Description | Considered thresholds |
---|---|---|
Mean-based | These filters are based on the gene-wise overall mean count across both conditions. Genes whose mean expression falls below a threshold, defined as the specified percentile of the mean counts, are removed by the filter and not considered for the test decision (e.g., [2]). | Percentile % \(=\{\)1, 2, 3, 4, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 55, 60, 65, 70, 75, 80, 85, 90\(\}\) |
Max-based | Genes whose maximum count (over both conditions) falls below a threshold, defined as the specified percentile of the maximum counts, are removed from the analysis and not considered for the test decision (e.g., [2]). | Percentile % \(=\{\)1, 2, 3, 4, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 55, 60, 65, 70, 75, 80, 85, 90\(\}\) |
CPM | Robinson and Oshlack (2010) [13] propose basing the filter on counts per million (CPM). Genes with CPM values below the threshold c in more than \(\min (n_1,n_2)\) samples are removed. | \(c=\{1,2,5,25,50,100\}\) |
Jaccard | Max-based filter [2] in which the filter threshold \(v^*\) is determined with the Jaccard similarity index. To compute the index for a pair of replicates, the gene counts are first dichotomised at a cut-off v: a gene count is either larger than v or not. The index is then the number of genes with counts larger than v in both replicates divided by the number of genes with counts larger than v in at least one of the two replicates, giving values between 0 (dissimilar) and 1 (similar). The global Jaccard index is the average of the index across all pairs of replicates in each condition. The calculations are repeated for several threshold values v, and the threshold \(v^*\) with the greatest similarity is found by fitting a loess curve through the set of candidate thresholds. \(v^*\) is then used as the threshold in a max-based filter. | |
Zero-based | This filter counts the number of zero counts per gene and removes genes with more than u zeros from the analysis. Note that the basic filter is the zero-based filter with threshold \(u=n\). | \(u=\{16,\dots , 1\}\) |
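
The filter rules in the table can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. It assumes a count matrix with genes in rows and samples in columns, uses NumPy's default (linearly interpolated) percentile, and takes each removal rule literally as worded in the table. All function names, the toy matrix, and the toy CPM threshold are hypothetical. For the Jaccard filter, only the index for a single pair of replicates is shown; the averaging over pairs and the loess fit are omitted.

```python
import numpy as np

def mean_filter(counts, pct):
    """Keep genes whose mean count is not below the pct-th percentile of all gene means."""
    means = counts.mean(axis=1)
    return means >= np.percentile(means, pct)

def max_filter(counts, pct):
    """Keep genes whose maximum count is not below the pct-th percentile of all gene maxima."""
    maxima = counts.max(axis=1)
    return maxima >= np.percentile(maxima, pct)

def cpm_filter(counts, c, n1, n2):
    """Remove genes with CPM < c in more than min(n1, n2) samples (rule as worded in the table)."""
    cpm = counts / counts.sum(axis=0) * 1e6  # scale each sample by its library size
    n_below = (cpm < c).sum(axis=1)
    return n_below <= min(n1, n2)

def zero_filter(counts, u):
    """Remove genes with more than u zero counts."""
    return (counts == 0).sum(axis=1) <= u

def jaccard_index(x, y, v):
    """Jaccard index for one pair of replicates after dichotomising counts at cut-off v."""
    a, b = x > v, y > v
    union = (a | b).sum()
    # Undefined when no gene exceeds v in either replicate; NaN is an assumption here.
    return (a & b).sum() / union if union > 0 else float("nan")

# Toy data (hypothetical): 4 genes x 4 samples, two samples per condition.
counts = np.array([
    [0,   0,   0,  1],
    [5,   3,   4,  6],
    [100, 120, 90, 110],
    [0,   2,   0,  3],
])

keep_mean = mean_filter(counts, 50)
keep_max = max_filter(counts, 50)
keep_cpm = cpm_filter(counts, c=10000, n1=2, n2=2)  # c chosen for the tiny toy library sizes
keep_zero = zero_filter(counts, u=1)
j01 = jaccard_index(counts[:, 0], counts[:, 1], v=1)
```

Each filter returns a boolean mask over genes, so `counts[keep_mean]` gives the filtered matrix that would be passed on to the test.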