Table 1 Quantification of the structural word extraction from the non-redundant data set.

From: Mining protein loops using a structural alphabet and statistical exceptionality

| Words | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Number of words (%) | 3310 (11.7%) | 166 (5.0%) | 2214 (66.9%) | 930 (28.1%) |
| Number of fragments (%) | 249953 (60.2%) | 11435 (4.6%) | 129781 (51.9%) | 108737 (43.5%) |
| Nb fragments/word | 75.5 | 68.9 | 58.6 | 116.9* |
| All-loop coverage rate | 72.7% | 5.1% | 46.5% | 40.2% |
| Short-loop coverage rate | 70.3% | 4.4% | 38.9% | 39.3% |
| Long-loop coverage rate | 74.9% | 5.7% | 53.9% | 41.1% |
| Loops containing at least one word | 84.8% | 9.8% | 60.3% | 58.2% |
| Short loops containing at least one word | 79.7% | 6.1% | 48.1% | 49.4% |
| Long loops containing at least one word | 97.8% | 19.1% | 90.9% | 80.4% |

Columns: 1, words seen more than 30 times; 2, under-represented words; 3, non-significant words; 4, over-represented words. '*': significantly higher occurrence according to a Kruskal-Wallis test. Coverage rates are given on a per-structural-letter basis. Numbers in parentheses denote the percentage of words/fragments with respect to the 28274 words/415071 fragments of the whole data set (column 1) and with respect to the 3310 words/249953 fragments in W set≥30 (columns 2 to 4).
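
The percentages and the fragments-per-word ratios reported in the table follow directly from the raw counts. The following is a minimal sketch of that arithmetic, assuming one-decimal rounding; the names `words`, `fragments`, `pct` and `summarize` are illustrative and do not come from the paper.

```python
# Raw counts copied from Table 1. Column keys follow the table note:
# "1" = words seen more than 30 times, "2" = under-represented,
# "3" = non-significant, "4" = over-represented.
TOTAL_WORDS = 28274        # words in the whole data set
TOTAL_FRAGMENTS = 415071   # fragments in the whole data set

words = {"1": 3310, "2": 166, "3": 2214, "4": 930}
fragments = {"1": 249953, "2": 11435, "3": 129781, "4": 108737}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal (assumed)."""
    return round(100.0 * part / whole, 1)


def summarize():
    """Recompute the bracketed percentages and the fragments/word ratios."""
    rows = {}
    for col in words:
        # Column 1 is relative to the whole data set;
        # columns 2 to 4 are relative to column 1.
        word_base = TOTAL_WORDS if col == "1" else words["1"]
        frag_base = TOTAL_FRAGMENTS if col == "1" else fragments["1"]
        rows[col] = {
            "words_pct": pct(words[col], word_base),
            "fragments_pct": pct(fragments[col], frag_base),
            "fragments_per_word": round(fragments[col] / words[col], 1),
        }
    return rows


if __name__ == "__main__":
    for col, row in summarize().items():
        print(col, row)
```

Columns 2 to 4 partition column 1, so their word and fragment counts add up to the column 1 totals, which gives a quick consistency check on the table.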