
Table 2 Test data set analysis.

From: RepSeq – A database of amino acid repeats present in lower eukaryotic pathogens

  

| Repeat threshold | Total proteins | SRR proteins | 2+ SRRs: total | 2+ SRRs: true positives | 2+ SRRs: false positives | 3+ SRRs: total | 3+ SRRs: true positives | 3+ SRRs: false positives |
|---|---|---|---|---|---|---|---|---|
| Loose | 5000 | 250 | 342 | 250 (100%) | 92 | 248 | 248 (99.2%) | 0 |
| Loose | 5000 | 1250 | 1306 | 1250 (100%) | 56 | 1237 | 1237 (99.0%) | 0 |
| Loose | 10000 | 500 | 674 | 500 (100%) | 174 | 492 | 492 (98.4%) | 0 |
| Loose | 10000 | 2500 | 2633 | 2500 (100%) | 133 | 2466 | 2466 (98.6%) | 0 |
| Normal | 5000 | 250 | 256 | 250 (100%) | 6 | 248 | 248 (99.2%) | 0 |
| Normal | 5000 | 1250 | 1253 | 1248 (99.8%) | 5 | 1237 | 1237 (99.0%) | 0 |
| Normal | 10000 | 500 | 506 | 499 (99.8%) | 7 | 492 | 492 (98.4%) | 0 |
| Normal | 10000 | 2500 | 2504 | 2496 (99.8%) | 8 | 2466 | 2466 (98.6%) | 0 |
| Strict | 5000 | 250 | 245 | 245 (98.0%) | 0 | 244 | 244 (97.6%) | 0 |
| Strict | 5000 | 1250 | 1220 | 1220 (97.6%) | 0 | 1219 | 1219 (97.5%) | 0 |
| Strict | 10000 | 500 | 485 | 485 (97.0%) | 0 | 484 | 484 (96.8%) | 0 |
| Strict | 10000 | 2500 | 2424 | 2424 (97.0%) | 0 | 2420 | 2420 (96.8%) | 0 |

  1. Proteomes containing 5000 or 10000 proteins (5% or 25% of which contained repeat regions) were created and analysed using RepSeq.
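The arithmetic behind the table can be checked with a short sketch. It assumes, based on the figures themselves, that the percentage shown beside each true-positive count is that count divided by the number of SRR proteins in the test proteome, and that each "Total" is the sum of true and false positives. The function name `summarise` and the labels "sensitivity" and "precision" are illustrative choices and do not come from the source; only three rows of the 2+ SRRs columns are reproduced here.

```python
# Three rows copied from the 2+ SRRs columns of Table 2:
# (threshold, total_proteins, srr_proteins, reported_total, true_pos, false_pos)
ROWS = [
    ("Loose", 5000, 250, 342, 250, 92),
    ("Normal", 5000, 1250, 1253, 1248, 5),
    ("Strict", 10000, 2500, 2424, 2424, 0),
]


def summarise(srr_proteins, true_pos, false_pos):
    """Return (sensitivity %, precision %) for one table row.

    "sensitivity" and "precision" are labels chosen for this sketch;
    the table itself reports only counts and one percentage.
    """
    # Share of the seeded SRR proteins that were recovered
    # (assumed to be the percentage printed in the table).
    sensitivity = 100.0 * true_pos / srr_proteins
    # Share of all reported proteins that really contain repeats.
    precision = 100.0 * true_pos / (true_pos + false_pos)
    return sensitivity, precision


for threshold, total, srr, reported, tp, fp in ROWS:
    # The "Total" column should equal true positives plus false positives.
    assert reported == tp + fp
    sens, prec = summarise(srr, tp, fp)
    print(f"{threshold:6s} n={total:5d} "
          f"sensitivity={sens:.1f}% precision={prec:.1f}%")
```

Read this way, the rows suggest a trade-off: the loose threshold recovers every seeded protein but reports extra ones, while the strict threshold reports no false positives and misses a few percent of the seeded proteins.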