
Table 1. Random undersampling was used for training; thus, the number of negative instances was equal to the number of positive instances.

From: Stepwise approach for combining many sources of evidence for site-recognition in genomic sequences

Dataset          Training data         Testing data
                 Positives/Negatives   Positives   Negatives
Chr. 1    TIS    17,638                2156        8,074,590
          STOP   17,404                2154        23,573,031
Chr. 3    TIS    18,631                1163        7,291,951
          STOP   18,444                1114        21,522,500
Chr. 13   TIS    19,454                340         3,664,164
          STOP   19,225                333         10,878,302
Chr. 19   TIS    18,383                1411        1,698,891
          STOP   18,136                1422        4,665,804
Chr. 21   TIS    19,561                233         1,303,634
          STOP   19,558                237         3,726,959
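
The caption's random undersampling step can be illustrated with a minimal sketch: keep every positive instance and draw an equally sized random subset of the negatives, so that the training set is balanced. The function name random_undersample, the seed parameter, and the toy data are assumptions made for illustration; the table does not describe the authors' actual implementation.

```python
import random


def random_undersample(positives, negatives, seed=0):
    """Keep all positives and draw an equally sized random subset of negatives.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(negatives) < len(positives):
        raise ValueError("fewer negatives than positives; cannot undersample")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Sample without replacement so no negative instance is duplicated.
    sampled_negatives = rng.sample(negatives, k=len(positives))
    return list(positives), sampled_negatives


# Toy data standing in for candidate sites (not taken from the table).
positives = [f"pos_{i}" for i in range(5)]
negatives = [f"neg_{i}" for i in range(1000)]

train_pos, train_neg = random_undersample(positives, negatives)
print(len(train_pos), len(train_neg))
```

Only the training set is balanced this way. The testing columns keep the full set of negatives, which is why they outnumber the positives by several orders of magnitude.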