
Table 1 Main characteristics of the dataset used for the accuracy assessment of PIntron

From: PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text

| Region | Genomic length (nt) | Number of genes | Number of transcripts | Overall transcript length (nt) |
|---|---|---|---|---|
| ENm004 | 1,700,000 | 18 | 6,964 | 4,497,709 |
| ENm006 | 1,338,447 | 35 | 18,230 | 11,377,148 |
| ENr111 | 500,000 | 2 | 171 | 113,356 |
| ENr114 | 500,000 | 1 | 35 | 120,734 |
| ENr132 | 500,000 | 4 | 855 | 551,266 |
| ENr222 | 500,000 | 2 | 461 | 277,554 |
| ENr223 | 500,000 | 5 | 50,607 | 32,732,634 |
| ENr231 | 500,000 | 11 | 5,637 | 3,534,406 |
| ENr232 | 500,000 | 9 | 4,779 | 2,505,934 |
| ENr323 | 500,000 | 5 | 1,670 | 997,647 |
| ENr324 | 500,000 | 1 | 487 | 343,220 |
| ENr333 | 500,000 | 12 | 7,179 | 4,381,534 |
| ENr334 | 500,000 | 7 | 989 | 611,795 |
| Total | 8,538,447 | 112 | 98,064 | 62,044,937 |
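
The Total row can be checked against the per-region figures with a few lines of Python. This is only a minimal sketch for readers who want to verify the arithmetic. It is not part of the original article, and the names `ROWS`, `REPORTED_TOTAL`, and `column_sums` are hypothetical, chosen here for illustration. The numbers are copied from the table as published.

```python
# Per-region figures copied from Table 1:
# (region, genomic length in nt, genes, transcripts, overall transcript length in nt)
ROWS = [
    ("ENm004", 1_700_000, 18, 6_964, 4_497_709),
    ("ENm006", 1_338_447, 35, 18_230, 11_377_148),
    ("ENr111", 500_000, 2, 171, 113_356),
    ("ENr114", 500_000, 1, 35, 120_734),
    ("ENr132", 500_000, 4, 855, 551_266),
    ("ENr222", 500_000, 2, 461, 277_554),
    ("ENr223", 500_000, 5, 50_607, 32_732_634),
    ("ENr231", 500_000, 11, 5_637, 3_534_406),
    ("ENr232", 500_000, 9, 4_779, 2_505_934),
    ("ENr323", 500_000, 5, 1_670, 997_647),
    ("ENr324", 500_000, 1, 487, 343_220),
    ("ENr333", 500_000, 12, 7_179, 4_381_534),
    ("ENr334", 500_000, 7, 989, 611_795),
]

# Values reported in the Total row of the table.
REPORTED_TOTAL = (8_538_447, 112, 98_064, 62_044_937)


def column_sums(rows):
    """Sum each numeric column, skipping the region name in position 0."""
    numeric_columns = zip(*(row[1:] for row in rows))
    return tuple(sum(column) for column in numeric_columns)


if __name__ == "__main__":
    # Compare the recomputed sums with the published Total row.
    print(column_sums(ROWS) == REPORTED_TOTAL)
```

Keeping the rows as plain tuples makes it easy to compare them against the table by eye, and the same list could be reused for other summaries, such as the share of transcripts contributed by each region.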