Table 2 Characteristics of the genome-wide data sets containing true and decoy acceptor and donor splice sites for our five model organisms.

From: Accurate splice site prediction using support vector machines

 

| | Worm acceptor | Worm donor | Fly acceptor | Fly donor | Cress acceptor | Cress donor | Fish acceptor | Fish donor | Human acceptor | Human donor |
|---|---|---|---|---|---|---|---|---|---|---|
| Training total | 1,105,886 | 1,744,733 | 1,289,427 | 2,484,854 | 1,340,260 | 2,033,863 | 3,541,087 | 6,017,854 | 6,635,123 | 9,262,241 |
| Fraction positives | 3.6% | 2.3% | 1.4% | 0.7% | 3.6% | 2.3% | 2.4% | 1.5% | 1.5% | 1.1% |
| Evaluation total | 371,897 | 588,088 | 425,287 | 820,172 | 448,924 | 680,998 | 3,892,454 | 10,820,985 | 10,820,985 | 15,201,348 |
| Fraction positives | 3.6% | 2.3% | 1.4% | 0.7% | 3.6% | 2.3% | 0.7% | 0.4% | 0.3% | 0.2% |
| Testing total | 364,967 | 578,621 | 441,686 | 851,539 | 445,585 | 673,732 | 3,998,521 | 11,011,875 | 11,011,875 | 15,369,748 |
| Fraction positives | 3.6% | 2.3% | 1.4% | 0.7% | 3.5% | 2.3% | 0.7% | 0.4% | 0.3% | 0.2% |

  1. The sequence length in all sets is 141 nt. For acceptor splice sequences the consensus dimer AG is at position 61; for donor sequences the consensus dimer GT/GC is at position 81. The negative examples in the training sets of fish and human were sub-sampled by factors of three and five, respectively.
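
The window layout and sub-sampling described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that positions 61 and 81 are 1-based and mark the first nucleotide of the dimer, and that "sub-sampled by a factor of k" means keeping roughly one in k negative examples. The function names `has_consensus` and `subsample_negatives` are hypothetical.

```python
import random

WINDOW = 141       # sequence length stated in the footnote
ACCEPTOR_POS = 61  # assumption: 1-based start of the AG dimer
DONOR_POS = 81     # assumption: 1-based start of the GT/GC dimer


def has_consensus(seq, kind):
    """Return True if a 141-nt window carries the expected consensus dimer."""
    if len(seq) != WINDOW:
        return False
    if kind == "acceptor":
        start, dimers = ACCEPTOR_POS - 1, {"AG"}
    elif kind == "donor":
        start, dimers = DONOR_POS - 1, {"GT", "GC"}
    else:
        raise ValueError("kind must be 'acceptor' or 'donor'")
    return seq[start:start + 2].upper() in dimers


def subsample_negatives(examples, factor, seed=0):
    """Keep every positive and a random 1/factor share of the negatives.

    `examples` is a list of (sequence, label) pairs with label 1 for true
    sites and 0 for decoys. This reading of "sub-sampled by a factor of
    three" is an assumption, not a detail taken from the paper.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    kept = rng.sample(negatives, len(negatives) // factor)
    return positives + kept
```

For example, a window built as 60 nt, then `AG`, then 79 nt passes the acceptor check, and a window built as 80 nt, then `GT`, then 59 nt passes the donor check.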