False discovery rates in spectral identification

BMC Bioinformatics

Table 3 Comparison between searches using reversed or shuffled decoy databases

Search#	Spectra	Database	PMTol	Decoy	EmpiricalFDR¹ fixed			FactFDR² fixed
					N_target³	FactFDR(%)	p-value(%)⁴	N_target	EmpiricalFDR(%)
I-1	ISB-02	ISB	2.5 Da	Rev	2329/1009	5.8/4.4	10.9/30.9	2279/1024	3.9/5.7
I-2	ISB-02	ISB	2.5 Da	Shfl	2339/1023	6.0/4.6	7.2/38.6	2279/1025	3.6/5.1
I-3	ISB-02	ISB+Yeast	2.5 Da	Rev	1578/602	4.7/5.1	40.8/50.3	1583/596	5.1/3.9
I-4	ISB-02	ISB+Yeast	2.5 Da	Shfl	1597/577	5.0/4.2	50.2/34.5	1589/588	4.8/5.6
I-5	ISB-02+AB-TC	ISB+Yeast	2.5 Da	Rev	1490/569	5.0/5.8	50.2/31.2	1480/553	4.5/4.3
I-6	ISB-02+AB-TC	ISB+Yeast	2.5 Da	Shfl	1488/530	5.0/4.0	50.2/28.7	1478/550	4.9/5.5
I-7	ISB-02+AB-TC	ISB+AT	2.5 Da	Rev	1320/441	4.6/4.1	36.6/38.0	1342/464	5.8/7.3
I-8	ISB-02+AB-TC	ISB+AT	2.5 Da	Shfl	1287/441	3.4/4.1	7.5/38.0	1342/464	6.8/7.3
Y-1	Y-Small+AB-TC	Yeast+AT	30 ppm	Rev	2574/1988	1.0/1.3	50.1/22.7	2588/1759	1.0/0.5
Y-2	Y-Small+AB-TC	Yeast+AT	30 ppm	Shfl	2554/1940	0.9/1.2	38.7/27.3	2620/1758	1.1/0.6

All searches followed the standard TDA procedure, except for step 2 in the shuffled database searches. The results in the columns labeled "EmpiricalFDR fixed" were obtained at an empirical FDR threshold of 5% (searches I-1 to I-8) or 1% (searches Y-1 and Y-2). The results in the columns labeled "FactFDR fixed" were obtained at a factual FDR threshold of 5% (searches I-1 to I-8) or 1% (searches Y-1 and Y-2). The underlined characters represent either dummy spectra, dummy databases, or dummy tolerance. In each N_target, FactFDR, EmpiricalFDR, and p-value field, the first number is from MS-GFDB and the second is from X!Tandem. Note that we do not aim to compare database search engines (i.e., MS-GFDB vs. X!Tandem). We only evaluate how reliable FDR estimation via TDA is and how the number of positive PSMs (or peptides) changes across search strategies with different parameters or protocols.
Contrary to popular belief, we did not observe more conservative FDR estimates with the shuffled decoy database than with the reversed decoy database.
¹ the empirical FDR; ² the factual FDR; ³ the number of positive target PSMs; ⁴ Fisher p-value (see Table 2); Fisher p-values below 5% are emphasized in bold.
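
As an illustration of the quantities compared in this table, the following minimal Python sketch builds the two decoy types (Rev and Shfl) and counts the target PSMs accepted at a fixed empirical FDR. It is not the procedure used for the searches above. The function names are hypothetical, the estimator #decoy/#target is an assumption (other TDA variants use different formulas), and the sketch assumes that higher scores are better and ignores score ties.

```python
import random


def reversed_decoy(seq: str) -> str:
    """Reverse a protein sequence (the 'Rev' decoy type)."""
    return seq[::-1]


def shuffled_decoy(seq: str, seed: int = 0) -> str:
    """Randomly permute the residues of a sequence (the 'Shfl' decoy type).

    A fixed seed keeps the sketch reproducible.
    """
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


def targets_at_fdr(psms, max_fdr):
    """Count the target PSMs accepted at a fixed empirical FDR.

    psms: iterable of (score, is_decoy) pairs; higher score = better (assumed).
    Returns (n_target, fdr) for the longest score-ranked prefix whose
    estimated FDR, taken here as #decoy / #target (assumed estimator),
    does not exceed max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_target = n_decoy = 0
    best = (0, 0.0)
    for _score, is_decoy in ranked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target and n_decoy / n_target <= max_fdr:
            best = (n_target, n_decoy / n_target)
    return best


if __name__ == "__main__":
    print(reversed_decoy("PEPTIDE"))
    print(shuffled_decoy("PEPTIDE"))
    toy = [(10, False), (9, False), (8, False), (7, True),
           (6, False), (5, True), (4, False)]
    print(targets_at_fdr(toy, 0.25))
```

On real data, the (score, is_decoy) pairs would come from the search engine output; the toy list here only exercises the counting logic.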

ISSN: 1471-2105