Table 1 Length distributions of disordered regions. The VSL2 training dataset contained 1,327 sequences, while the blind-test dataset had 1,304 recent PDB chains that were unrelated to any training sequences. Both datasets were non-redundant, with pairwise sequence identity ≤25%.

From: Length-dependent prediction of protein intrinsic disorder

Length range    VSL2 training dataset       Blind-test dataset
                # regions   # residues      # regions   # residues
1–3                   483        1,044            791        1,440
4–15                  758        5,650          1,012        7,343
16–30                 148        3,118            151        3,173
31–100                154        8,039             50        2,236
>100                   63       17,060              4          545
Total               1,606       34,911          2,008       14,737
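
The counts in Table 1 can be cross-checked with a short script. The sketch below is illustrative only: it transcribes the table by hand, confirms that each column sums to the reported total, and computes the share of disordered residues that fall in regions longer than 30 residues. The names `TABLE`, `REPORTED_TOTALS`, `LONG_BINS`, `totals` and `long_residue_fraction` are hypothetical, and the 30-residue cutoff is an assumption that simply follows the table's bin boundaries.

```python
# Counts transcribed from Table 1 as (regions, residues) per length bin.
TABLE = {
    "VSL2 training": {
        "1-3": (483, 1044),
        "4-15": (758, 5650),
        "16-30": (148, 3118),
        "31-100": (154, 8039),
        ">100": (63, 17060),
    },
    "Blind test": {
        "1-3": (791, 1440),
        "4-15": (1012, 7343),
        "16-30": (151, 3173),
        "31-100": (50, 2236),
        ">100": (4, 545),
    },
}

# Totals as printed in the last row of Table 1.
REPORTED_TOTALS = {"VSL2 training": (1606, 34911), "Blind test": (2008, 14737)}

# Assumed cutoff for illustration: bins covering regions longer than 30 residues.
LONG_BINS = ("31-100", ">100")


def totals(bins):
    """Sum region and residue counts over all length bins."""
    regions = sum(r for r, _ in bins.values())
    residues = sum(n for _, n in bins.values())
    return regions, residues


def long_residue_fraction(bins):
    """Fraction of disordered residues located in the long-region bins."""
    long_residues = sum(bins[b][1] for b in LONG_BINS)
    return long_residues / totals(bins)[1]


if __name__ == "__main__":
    for name, bins in TABLE.items():
        # Each column should add up to the total reported in the table.
        assert totals(bins) == REPORTED_TOTALS[name]
        share = long_residue_fraction(bins)
        print(f"{name}: {share:.1%} of disordered residues are in regions >30")
```

Under these assumptions, roughly 72% of the disordered residues in the training dataset lie in regions longer than 30 residues, against roughly 19% in the blind-test dataset, although short regions dominate the region counts in both.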