Table 2. Total number of plant proteins taken from UniProt and the number of proteins remaining after redundancy filtering at three thresholds (90%, 50%, and 30%).

From: A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins

| Negative dataset (Plant host) | Total number of proteins (reviewed) | 90% redundancy | 50% redundancy | 30% redundancy |
|---|---|---|---|---|
| Triticum aestivum | 357 | 315 | 292 | 291 |
| Oryza sativa | 87 | 86 | 86 | 86 |
| Solanum tuberosum | 390 | 314 | 308 | 308 |
| Arabidopsis thaliana | 1000 | 968 | 857 | 852 |
| Cucurbita maxima | 26 | 25 | 25 | 25 |
| Citrus sinensis | 93 | 91 | 91 | 91 |
| Vitis vinifera | 161 | 154 | 152 | 152 |
| Hordeum vulgare | 348 | 323 | 307 | 307 |
| Pisum sativum | 371 | 347 | 335 | 334 |
| Glycine max | 373 | 346 | 324 | 324 |
| Total | 3206 | 2969 | 2777 | 2770 |
| Total after blastclust on cumulative data | - | 2631 | 2284 | 2277 |
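
The "Total" row can be checked against the per-species counts with a short script. The following is a minimal sketch in Python (a language chosen here for illustration, not one the source specifies). The names `counts`, `reported_totals`, and `column_totals` are hypothetical, and the numbers are transcribed directly from the table above. It also reports what share of the reviewed proteins remains at each threshold, which is simple arithmetic on the table and not a result stated by the authors.

```python
# Per-species counts transcribed from Table 2, in column order:
# (reviewed total, 90% redundancy, 50% redundancy, 30% redundancy)
counts = {
    "Triticum aestivum": (357, 315, 292, 291),
    "Oryza sativa": (87, 86, 86, 86),
    "Solanum tuberosum": (390, 314, 308, 308),
    "Arabidopsis thaliana": (1000, 968, 857, 852),
    "Cucurbita maxima": (26, 25, 25, 25),
    "Citrus sinensis": (93, 91, 91, 91),
    "Vitis vinifera": (161, 154, 152, 152),
    "Hordeum vulgare": (348, 323, 307, 307),
    "Pisum sativum": (371, 347, 335, 334),
    "Glycine max": (373, 346, 324, 324),
}

# "Total" row as reported in the table.
reported_totals = (3206, 2969, 2777, 2770)


def column_totals(rows):
    """Sum each column across all species."""
    return tuple(sum(column) for column in zip(*rows.values()))


totals = column_totals(counts)

# Percentage of the reviewed proteins remaining in each column.
retained_pct = [round(100 * t / totals[0], 1) for t in totals]

if __name__ == "__main__":
    labels = ("reviewed", "90%", "50%", "30%")
    for label, total, reported, pct in zip(labels, totals, reported_totals, retained_pct):
        status = "matches" if total == reported else "differs from"
        print(f"{label}: computed {total} {status} reported {reported} ({pct}% retained)")
```

The last row of the table, "Total after blastclust on cumulative data", cannot be reproduced this way, because it depends on clustering the pooled sequences and not on summing per-species counts.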