Table 4 Composition of the 8 datasets

From: Spliceator: multi-species splice site prediction using convolutional neural networks

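| Dataset | Quality of sequences | No. of positive sequences (donor) | No. of positive sequences (acceptor) | No. of negative sequences | Type of negative sequences | Ratio |
|---|---|---|---|---|---|---|
| AS_0 | Unconfirmed and confirmed | 12,000 | 12,000 | 12,000 | FP only | 1:1 |
| AS_1 | Unconfirmed and confirmed | 12,000 | 12,000 | 12,000 | 4000 exons, 4000 introns and 4000 FP | 1:1 |
| AS_2 | Unconfirmed and confirmed | 12,000 | 12,000 | 24,000 | FP only | 1:2 |
| AS_10 | Unconfirmed and confirmed | 12,000 | 12,000 | 120,000 | FP only | 1:10 |
| GS_0 | Confirmed | 10,973 | 11,179 | 11,000 | FP only | 1:1 |
| GS_1 | Confirmed | 10,973 | 11,179 | 11,000 | 3650 exons, 3650 introns and 3700 FP | 1:1 |
| GS_2 | Confirmed | 10,973 | 11,179 | 22,000 | FP only | 1:2 |
| GS_10 | Confirmed | 10,973 | 11,179 | 110,000 | FP only | 1:10 |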

  1. Composition of the 8 datasets used to study the impact of (i) the type of negative examples (only FP sequences vs. heterogeneous data with exons, introns and FP sequences), (ii) the ratio of positive to negative examples (1:1, 1:2 and 1:10), and (iii) data quality (‘Confirmed’ and ‘Unconfirmed’ sequences in the AS datasets vs. only ‘Confirmed’ sequences in the GS datasets).
  2. FP, False Positive; GS, Gold Standard; AS, All Sequences
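
The arithmetic behind the Ratio column can be checked with a short sketch. The counts below are copied from Table 4; the dictionary layout and the function names `implied_positives` and `mixed_total` are illustrative assumptions and are not part of Spliceator.

```python
# Counts copied from Table 4. The data layout and helper names are
# hypothetical, chosen only to illustrate the ratio arithmetic.
NEGATIVES = {
    "AS_0": 12_000, "AS_1": 12_000, "AS_2": 24_000, "AS_10": 120_000,
    "GS_0": 11_000, "GS_1": 11_000, "GS_2": 22_000, "GS_10": 110_000,
}

# k in the nominal positive:negative ratio 1:k.
RATIO = {
    "AS_0": 1, "AS_1": 1, "AS_2": 2, "AS_10": 10,
    "GS_0": 1, "GS_1": 1, "GS_2": 2, "GS_10": 10,
}

# Heterogeneous negative sets as (exons, introns, FP).
MIXED = {
    "AS_1": (4_000, 4_000, 4_000),
    "GS_1": (3_650, 3_650, 3_700),
}


def implied_positives(name: str) -> int:
    """Positive count implied by the negative count and the nominal 1:k ratio."""
    return NEGATIVES[name] // RATIO[name]


def mixed_total(name: str) -> int:
    """Total size of a heterogeneous negative set (exons + introns + FP)."""
    return sum(MIXED[name])


if __name__ == "__main__":
    for name in NEGATIVES:
        print(name, implied_positives(name))
```

For the AS datasets the implied count matches the 12,000 positive sequences listed in the table. For the GS datasets the implied count is 11,000, whereas the table lists 10,973 donor and 11,179 acceptor sequences, so the GS ratios appear to be nominal rather than exact.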