Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments

Zehetmayer, Sonja; Posch, Martin; Graf, Alexandra

doi:10.1186/s12859-022-04928-z

BMC Bioinformatics

Table 2 Types of filtering methods

From: Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments

Filter	Description	Considered thresholds
Mean-based	This filter is based on the overall mean count of each gene across both conditions. Genes whose mean count is below a threshold, defined as the specified percentile of the gene-wise mean counts, are removed and not considered for the test decision (e.g., [2]).	Percentile (%) \(= \{1, 2, 3, 4, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 55, 60, 65, 70, 75, 80, 85, 90\}\)
Max-based	Genes whose maximum count (over both conditions) is below a threshold, defined as the specified percentile of the gene-wise maximum counts, are removed from the analysis and not considered for the test decision (e.g., [2]).	Percentile (%) \(= \{1, 2, 3, 4, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 55, 60, 65, 70, 75, 80, 85, 90\}\)
CPM	Robinson and Oshlack (2010) [13] propose basing the filter on counts per million (CPM). Genes with CPM values below the threshold \(c\) in more than \(\min(n_1, n_2)\) samples are removed, where \(n_1\) and \(n_2\) are the sample sizes of the two conditions.	\(c = \{1, 2, 5, 25, 50, 100\}\)
Jaccard	Max-based filter [2] in which the filter threshold \(v^*\) is determined with the Jaccard similarity index. To compute the index for a pair of replicates, the gene counts are first dichotomised at a cut-off \(v\): each gene count is either larger than \(v\) or not. The index is then the number of genes with counts larger than \(v\) in both replicates divided by the number of genes with counts larger than \(v\) in at least one of the two replicates, giving values between 0 (dissimilar) and 1 (similar). The global Jaccard index is the average of the index across all pairs of replicates within each condition. These calculations are repeated for several candidate thresholds \(v\), and the threshold \(v^*\) with the greatest similarity is identified by fitting a loess curve through the global Jaccard index over the set of candidate thresholds. \(v^*\) is then used as the threshold in a max-based filter.
Zero-based	This filter counts the number of zero counts per gene and removes genes with more than \(u\) zeros from the analysis. Note that the basic filter corresponds to the zero-based filter with threshold \(u = n\).	\(u = \{16, \dots, 1\}\)
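
The four count-based rules in the table (mean-based, max-based, CPM and zero-based) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: counts are a genes × samples list whose first \(n_1\) columns belong to condition 1; percentiles use linear interpolation (the R type-7 / NumPy default convention); a gene exactly at the cut-off is kept; CPM uses raw library sizes without normalisation factors. All function names are hypothetical, and each function returns the indices of the genes that survive the filter.

```python
from typing import List

Counts = List[List[int]]  # genes x samples; first n1 columns = condition 1 (assumption)


def percentile(values: List[float], pct: float) -> float:
    """Linear-interpolation percentile (R type 7 / NumPy default; an assumed convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def mean_filter(counts: Counts, pct: float) -> List[int]:
    """Keep genes whose mean count over all samples is >= the pct-th percentile of gene means."""
    means = [sum(row) / len(row) for row in counts]
    cutoff = percentile(means, pct)
    return [g for g, m in enumerate(means) if m >= cutoff]


def max_filter(counts: Counts, pct: float) -> List[int]:
    """Keep genes whose maximum count is >= the pct-th percentile of gene maxima."""
    maxima = [max(row) for row in counts]
    cutoff = percentile(maxima, pct)
    return [g for g, m in enumerate(maxima) if m >= cutoff]


def cpm_filter(counts: Counts, n1: int, n2: int, c: float) -> List[int]:
    """Remove genes with CPM < c in more than min(n1, n2) samples (raw library sizes assumed)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    kept = []
    for g, row in enumerate(counts):
        below = sum(1 for j in range(n_samples) if row[j] / lib_sizes[j] * 1e6 < c)
        if below <= min(n1, n2):
            kept.append(g)
    return kept


def zero_filter(counts: Counts, u: int) -> List[int]:
    """Remove genes with more than u zero counts."""
    return [g for g, row in enumerate(counts) if sum(1 for x in row if x == 0) <= u]


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes x 4 samples, n1 = n2 = 2.
    counts = [
        [0, 0, 0, 0],
        [1, 0, 2, 1],
        [10, 12, 8, 10],
        [100, 90, 110, 100],
    ]
    print(mean_filter(counts, 50))      # [2, 3]
    print(max_filter(counts, 50))       # [2, 3]
    print(cpm_filter(counts, 2, 2, 1))  # [1, 2, 3]
    print(zero_filter(counts, 2))       # [1, 2, 3]
```

Sweeping `pct`, `c` or `u` over the grids in the "Considered thresholds" column reproduces the kind of threshold scan the table describes.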

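The Jaccard row describes a data-driven choice of the max-based threshold \(v^*\). The sketch below follows that description step by step, under several assumptions that the table does not pin down. Replicate pairs from both conditions are pooled into one average. The pairwise index is set to 0 when no gene exceeds \(v\). The candidate grid is supplied by the caller. To stay standard-library only, the loess fit is replaced by a centred moving average, which is a plainly different smoother; a real loess fit (for example `statsmodels`' `lowess`) could be substituted. Function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Counts = List[List[int]]  # genes x samples; first n1 columns = condition 1 (assumption)


def jaccard_pair(x: Sequence[int], y: Sequence[int], v: float) -> float:
    """Genes with count > v in both replicates / genes with count > v in at least one."""
    both = sum(1 for a, b in zip(x, y) if a > v and b > v)
    either = sum(1 for a, b in zip(x, y) if a > v or b > v)
    return both / either if either else 0.0  # assumption: empty union scores 0


def global_jaccard(counts: Counts, n1: int, v: float) -> float:
    """Average pairwise index over all replicate pairs within each condition (pooled).

    Needs at least two replicates in at least one condition.
    """
    n = len(counts[0])
    columns = [[row[j] for row in counts] for j in range(n)]
    groups = [range(0, n1), range(n1, n)]
    scores = [jaccard_pair(columns[i], columns[k], v)
              for grp in groups for i, k in combinations(grp, 2)]
    return sum(scores) / len(scores)


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centred moving average, truncated at the ends (a stand-in for loess)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def jaccard_threshold(counts: Counts, n1: int, candidates: List[float]) -> float:
    """Return the candidate v* at which the smoothed global Jaccard index peaks."""
    raw = [global_jaccard(counts, n1, v) for v in candidates]
    smooth = moving_average(raw)
    best = max(range(len(candidates)), key=smooth.__getitem__)  # first maximum wins ties
    return candidates[best]


def jaccard_filter(counts: Counts, n1: int, candidates: List[float]) -> List[int]:
    """Max-based filter with threshold v*: keep genes whose maximum count is >= v*."""
    v_star = jaccard_threshold(counts, n1, candidates)
    return [g for g, row in enumerate(counts) if max(row) >= v_star]


if __name__ == "__main__":
    # Hypothetical toy data: 5 genes x 4 samples, n1 = n2 = 2.
    counts = [
        [0, 4, 0, 3],
        [4, 0, 3, 0],
        [20, 22, 18, 21],
        [30, 28, 35, 33],
        [3, 3, 2, 2],
    ]
    candidates = [0, 1, 2, 3, 4, 5, 10, 20]
    print(jaccard_threshold(counts, 2, candidates))  # 5
    print(jaccard_filter(counts, 2, candidates))     # [2, 3]
```

In this toy example, the low, inconsistently detected genes (rows 0, 1 and 4) make replicates disagree at small cut-offs, so the smoothed index peaks at \(v^* = 5\) and only the two consistently expressed genes pass.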

ISSN: 1471-2105
