Table 2 Types of filtering methods

From: Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments

Filter

Description

Considered thresholds

Mean-based

These filters are based on the gene-wise overall mean counts across both conditions. Genes whose mean expression is below a threshold, given by the specified percentile of the mean counts, are removed by the filter and not considered for the test decision (e.g., [2]).

Percentile % \(=\{\)1, 2, 3, 4, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 55, 60, 65, 70, 75, 80, 85, 90\(\}\)

Max-based

Genes whose maximum count (over both conditions) is below a threshold, given by the specified percentile of the maximum counts, are removed from the analysis and not considered for the test decision (e.g., [2]).

Percentile % \(=\{\)1, 2, 3, 4, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 55, 60, 65, 70, 75, 80, 85, 90\(\}\)

CPM

Robinson and Oshlack (2010) [13] propose basing the filter on counts per million (CPM). Genes with CPM values below a threshold \(c\) in more than \(\min (n_1,n_2)\) samples are removed.

\(c=\{1,2,5,25,50,100\}\)

Jaccard

Max-based filter [2] in which the filter threshold \(v^*\) is determined with the Jaccard similarity index. To compute the index for a pair of replicates, the gene counts are first dichotomised at a cut-off \(v\): a gene count is either larger than \(v\) or not. The number of genes with counts larger than \(v\) in both replicates is then divided by the number of genes with counts larger than \(v\) in at least one of the two replicates, giving a value between 0 (dissimilar) and 1 (similar). The global Jaccard index is the average of the index across all pairs in each condition. The calculations are repeated for several threshold values \(v\), and the threshold \(v^*\) with the greatest similarity is found by fitting a loess curve through the index values at the candidate thresholds. \(v^*\) is then used as the threshold in a max-based filter.

 

Zero-based

This filter counts the number of zero counts per gene and removes genes with more than \(u\) zeros from the analysis. Note that the basic filter is the zero-based filter with threshold \(u=n\).

\(u=\{16,\dots , 1\}\)
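
The filter rules in Table 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names are hypothetical, and several details are assumptions not stated in the table: the percentile uses linear interpolation between order statistics, CPM is computed from raw column totals as library sizes, and an empty union in the Jaccard index is scored as 0. The loess step used to select \(v^*\) is omitted. Counts are given as one row per gene and one column per sample.

```python
from itertools import combinations
from statistics import mean


def percentile(values, p):
    """p-th percentile (0-100), linear interpolation (an assumed definition)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def stat_filter(counts, p, stat):
    """Mean-based (stat=mean) or max-based (stat=max) filter.

    Returns one boolean per gene: True means the gene is kept, i.e. its
    statistic is not below the p-th percentile of all genes' statistics.
    """
    scores = [stat(row) for row in counts]
    cut = percentile(scores, p)
    return [s >= cut for s in scores]


def cpm_filter(counts, c, n1, n2):
    """Remove genes with CPM < c in more than min(n1, n2) samples."""
    # Assumption: library size is the raw column total (must be non-zero).
    lib = [sum(col) for col in zip(*counts)]
    keep = []
    for row in counts:
        cpm = [1e6 * x / s for x, s in zip(row, lib)]
        n_low = sum(v < c for v in cpm)
        keep.append(n_low <= min(n1, n2))
    return keep


def zero_filter(counts, u):
    """Remove genes with more than u zero counts."""
    return [sum(x == 0 for x in row) <= u for row in counts]


def jaccard(rep_a, rep_b, v):
    """Jaccard index of two replicates after dichotomising counts at v."""
    a = [x > v for x in rep_a]
    b = [x > v for x in rep_b]
    both = sum(i and j for i, j in zip(a, b))
    either = sum(i or j for i, j in zip(a, b))
    # Assumption: score an empty union as 0 (the table leaves this undefined).
    return both / either if either else 0.0


def global_jaccard(counts, groups, v):
    """Average Jaccard index over all replicate pairs within each condition.

    groups holds one list of column indices per condition.
    """
    cols = list(zip(*counts))
    return mean(
        jaccard(cols[i], cols[j], v)
        for g in groups
        for i, j in combinations(g, 2)
    )


# Toy data: 4 genes x 4 samples, columns 0-1 and 2-3 are the two conditions.
counts = [
    [0, 0, 1, 0],
    [5, 8, 6, 9],
    [100, 120, 90, 110],
    [0, 3, 0, 2],
]
groups = [[0, 1], [2, 3]]

print(stat_filter(counts, 50, mean))
print(stat_filter(counts, 50, max))
print(cpm_filter(counts, 50000, 2, 2))
print(zero_filter(counts, 1))
print(global_jaccard(counts, groups, 4))
```

In this sketch a threshold \(v^*\) could be approximated by evaluating `global_jaccard` over a grid of candidate values of \(v\) and taking the maximiser; the loess smoothing described in the table would replace that raw maximum.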