Table 3 Data efficiency of the STSS algorithm.

From: Self-training in significance space of support vectors for imbalanced biomedical event data

| Class label | Original training dataset: number of instances | Original training dataset: imbalance ratio | STSS training dataset (Ds): number of instances | STSS training dataset (Ds): imbalance ratio |
|---|---|---|---|---|
| AtLoc | Pos: 48; Neg: 3661527 | 1:76282 | Pos: 48; Neg: 128761 | 1:2682 |
| Cause | Pos: 1117; Neg: 366045 | 1:3277 | Pos: 1117; Neg: 27505 | 1:24 |
| Cause-Theme | Pos: 6; Neg: 3661521 | 1:610261 | Pos: 6; Neg: 6000 | 1:1000 |
| Site | Pos: 425; Neg: 36661150 | 1:8614 | Pos: 425; Neg: 36627 | 1:86 |
| Theme | Pos: 9246; Neg: 3652329 | 1:395 | Pos: 9246; Neg: 30915 | 1:3 |
| ToLoc | Pos: 50; Neg: 3661525 | 1:73231 | Pos: 50; Neg: 167120 | 1:3342 |

  1. An imbalance ratio close to 1:1 (roughly equal numbers of positive and negative instances) is preferred.
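
As a rough illustration of how the imbalance ratios in this table relate to the instance counts, the sketch below divides the number of negative instances by the number of positive instances for a few rows. The function name `imbalance_ratio` and the rounding to the nearest integer are assumptions made for this example; the source does not state how the printed ratios were rounded, so a computed value may differ from the table by one in the last digit.

```python
def imbalance_ratio(pos: int, neg: int) -> float:
    """Return N for an imbalance ratio written as 1:N (negatives per positive)."""
    if pos <= 0:
        raise ValueError("pos must be a positive count")
    return neg / pos


# (positives, negatives in original set, negatives in STSS set Ds),
# copied from the AtLoc, Theme and ToLoc rows of the table.
rows = {
    "AtLoc": (48, 3661527, 128761),
    "Theme": (9246, 3652329, 30915),
    "ToLoc": (50, 3661525, 167120),
}

for label, (pos, neg_orig, neg_stss) in rows.items():
    before = imbalance_ratio(pos, neg_orig)
    after = imbalance_ratio(pos, neg_stss)
    # Rounding to the nearest integer is an assumption, not the paper's stated rule.
    print(f"{label}: 1:{before:.0f} (original) -> 1:{after:.0f} (STSS)")
```

In every row the positive count is unchanged between the two datasets, so the drop in the ratio comes entirely from the smaller number of negative instances kept in Ds.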