
Table 1. Length distributions of disordered regions. The VSL2 training dataset contained 1,327 sequences, while the blind-test dataset contained 1,304 recent PDB chains unrelated to any training sequence. Both datasets were non-redundant, with pairwise sequence identity ≤25%.

From: Length-dependent prediction of protein intrinsic disorder

 

                VSL2 training dataset       Blind-test dataset
Length range    # regions    # residues     # regions    # residues
1–3                   483         1,044           791         1,440
4–15                  758         5,650         1,012         7,343
16–30                 148         3,118           151         3,173
31–100                154         8,039            50         2,236
>100                   63        17,060             4           545
Total               1,606        34,911         2,008        14,737
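The column totals can be checked, and the two length distributions compared, with a few lines of arithmetic. The sketch below is an illustration only: the counts are transcribed from Table 1, while the names `TRAINING`, `BLIND_TEST`, `LONG_BINS`, `totals`, and `long_residue_fraction` are hypothetical, and treating regions of more than 30 residues as "long" is an assumption made for this example.

```python
# Counts transcribed from Table 1: (length range, # regions, # residues).
TRAINING = [
    ("1-3", 483, 1044),
    ("4-15", 758, 5650),
    ("16-30", 148, 3118),
    ("31-100", 154, 8039),
    (">100", 63, 17060),
]
BLIND_TEST = [
    ("1-3", 791, 1440),
    ("4-15", 1012, 7343),
    ("16-30", 151, 3173),
    ("31-100", 50, 2236),
    (">100", 4, 545),
]

# Assumption for this sketch: "long" regions are those longer than 30 residues.
LONG_BINS = {"31-100", ">100"}


def totals(rows):
    """Return (total regions, total residues) summed over all length bins."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


def long_residue_fraction(rows):
    """Fraction of disordered residues that fall in the long-region bins."""
    long_residues = sum(res for name, _, res in rows if name in LONG_BINS)
    return long_residues / totals(rows)[1]


if __name__ == "__main__":
    for label, rows in (("training", TRAINING), ("blind-test", BLIND_TEST)):
        n_regions, n_residues = totals(rows)
        frac = long_residue_fraction(rows)
        print(f"{label}: {n_regions} regions, {n_residues} residues, "
              f"{frac:.1%} of residues in regions longer than 30")
```

Under that assumption, the sums reproduce the Total row, and the residue fractions show that long regions account for most disordered residues in the training set but only a minority in the blind-test set.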