Table 2 Characteristics of the genome-wide data sets containing true and decoy acceptor and donor splice sites for our five model organisms.

From: Accurate splice site prediction using support vector machines

| | Worm acceptor | Worm donor | Fly acceptor | Fly donor | Cress acceptor | Cress donor | Fish acceptor | Fish donor | Human acceptor | Human donor |
|---|---|---|---|---|---|---|---|---|---|---|
| Training total | 1,105,886 | 1,744,733 | 1,289,427 | 2,484,854 | 1,340,260 | 2,033,863 | 3,541,087 | 6,017,854 | 6,635,123 | 9,262,241 |
| Training fraction positives | 3.6% | 2.3% | 1.4% | 0.7% | 3.6% | 2.3% | 2.4% | 1.5% | 1.5% | 1.1% |
| Evaluation total | 371,897 | 588,088 | 425,287 | 820,172 | 448,924 | 680,998 | 3,892,454 | 10,820,985 | 10,820,985 | 15,201,348 |
| Evaluation fraction positives | 3.6% | 2.3% | 1.4% | 0.7% | 3.6% | 2.3% | 0.7% | 0.4% | 0.3% | 0.2% |
| Testing total | 364,967 | 578,621 | 441,686 | 851,539 | 445,585 | 673,732 | 3,998,521 | 11,011,875 | 11,011,875 | 15,369,748 |
| Testing fraction positives | 3.6% | 2.3% | 1.4% | 0.7% | 3.5% | 2.3% | 0.7% | 0.4% | 0.3% | 0.2% |

  1. The sequence length in all sets is 141 nt. For acceptor splice sequences the consensus dimer AG is at position 61; for donor sequences the consensus dimer GT/GC is at position 81. The negative examples in the fish and human training sets were sub-sampled by factors of three and five, respectively.
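
The window layout and the sub-sampling described in the note can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that positions are 1-based and refer to the first nucleotide of the consensus dimer, and the function names `has_consensus` and `positive_fraction_after_subsampling` are hypothetical.

```python
WINDOW_LENGTH = 141  # nt, as stated in the table note

# Assumption: positions are 1-based and name the first nucleotide of the dimer.
ACCEPTOR_POS = 61
DONOR_POS = 81


def has_consensus(window: str, site: str) -> bool:
    """Return True if a 141-nt window carries the expected consensus dimer."""
    if len(window) != WINDOW_LENGTH:
        raise ValueError(f"expected {WINDOW_LENGTH} nt, got {len(window)}")
    window = window.upper()
    if site == "acceptor":
        start, dimers = ACCEPTOR_POS - 1, {"AG"}
    elif site == "donor":
        start, dimers = DONOR_POS - 1, {"GT", "GC"}
    else:
        raise ValueError("site must be 'acceptor' or 'donor'")
    # Convert the 1-based position to a 0-based slice of length two.
    return window[start:start + 2] in dimers


def positive_fraction_after_subsampling(fraction: float, factor: float) -> float:
    """Positive fraction once only 1/factor of the negatives are kept."""
    negatives_kept = (1.0 - fraction) / factor
    return fraction / (fraction + negatives_kept)


# Synthetic windows, for illustration only (not real genomic sequence).
acceptor_window = "C" * 60 + "AG" + "C" * 79
donor_window = "C" * 80 + "GT" + "C" * 59
```

As a rough consistency check, keeping one in five negatives from a set with 0.3% positives gives `positive_fraction_after_subsampling(0.003, 5)` of about 0.0148, in line with the 1.5% listed for the human acceptor training set. This assumes that the evaluation set's positive fraction approximates the rate before sub-sampling.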