Table 1 Main characteristics of the dataset used for the accuracy assessment of PIntron

From: PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text

| Region | Genomic length (nt) | Number of genes | Number of transcripts | Overall transcript length (nt) |
|---|---:|---:|---:|---:|
| ENm004 | 1,700,000 | 18 | 6,964 | 4,497,709 |
| ENm006 | 1,338,447 | 35 | 18,230 | 11,377,148 |
| ENr111 | 500,000 | 2 | 171 | 113,356 |
| ENr114 | 500,000 | 1 | 35 | 120,734 |
| ENr132 | 500,000 | 4 | 855 | 551,266 |
| ENr222 | 500,000 | 2 | 461 | 277,554 |
| ENr223 | 500,000 | 5 | 50,607 | 32,732,634 |
| ENr231 | 500,000 | 11 | 5,637 | 3,534,406 |
| ENr232 | 500,000 | 9 | 4,779 | 2,505,934 |
| ENr323 | 500,000 | 5 | 1,670 | 997,647 |
| ENr324 | 500,000 | 1 | 487 | 343,220 |
| ENr333 | 500,000 | 12 | 7,179 | 4,381,534 |
| ENr334 | 500,000 | 7 | 989 | 611,795 |
| Total | 8,538,447 | 112 | 98,064 | 62,044,937 |
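
The Total row can be cross-checked against the per-region values. The following is a minimal sketch of that check in Python; the names `ROWS`, `REPORTED_TOTAL`, and `column_totals` are illustrative assumptions and are not part of PIntron or of the original article.

```python
# Values copied from Table 1, in column order:
# (region, genomic length (nt), genes, transcripts, overall transcript length (nt))
ROWS = [
    ("ENm004", 1_700_000, 18, 6_964, 4_497_709),
    ("ENm006", 1_338_447, 35, 18_230, 11_377_148),
    ("ENr111", 500_000, 2, 171, 113_356),
    ("ENr114", 500_000, 1, 35, 120_734),
    ("ENr132", 500_000, 4, 855, 551_266),
    ("ENr222", 500_000, 2, 461, 277_554),
    ("ENr223", 500_000, 5, 50_607, 32_732_634),
    ("ENr231", 500_000, 11, 5_637, 3_534_406),
    ("ENr232", 500_000, 9, 4_779, 2_505_934),
    ("ENr323", 500_000, 5, 1_670, 997_647),
    ("ENr324", 500_000, 1, 487, 343_220),
    ("ENr333", 500_000, 12, 7_179, 4_381_534),
    ("ENr334", 500_000, 7, 989, 611_795),
]

# The Total row as reported in the table.
REPORTED_TOTAL = (8_538_447, 112, 98_064, 62_044_937)


def column_totals(rows):
    """Sum each numeric column, skipping the region name in position 0."""
    return tuple(sum(column) for column in zip(*(row[1:] for row in rows)))


# Raises AssertionError if any column sum disagrees with the reported total.
assert column_totals(ROWS) == REPORTED_TOTAL
```

The check only confirms that the table is internally consistent; it says nothing about how the regions or transcripts were selected.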